pyspark.pandas.DataFrame.replace

DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

Returns a new DataFrame replacing a value with another value.

Parameters
to_replace : int, float, string, list, tuple or dict

Value to be replaced.

value : int, float, string, list or tuple

Value to use as the replacement. The replacement value must be an int, float, or string. If value is a list or tuple, it must have the same length as to_replace.

inplace : bool, default False

If True, replace in place (do not create a new object).

limit : int, default None

Maximum size gap to forward or backward fill.

Deprecated since version 4.0.0.

regex : bool or str, default False

Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression in which case to_replace must be None.

method : 'pad', default 'pad'

The method to use for replacement when to_replace is a scalar, list or tuple and value is None.

Deprecated since version 4.0.0.

Returns
DataFrame

Object after replacement.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"name": ['Ironman', 'Captain America', 'Thor', 'Hulk'],
...                    "weapon": ['Mark-45', 'Shield', 'Mjolnir', 'Smash']},
...                   columns=['name', 'weapon'])
>>> df
              name   weapon
0          Ironman  Mark-45
1  Captain America   Shield
2             Thor  Mjolnir
3             Hulk    Smash

Scalar to_replace and value

>>> df.replace('Ironman', 'War-Machine')
              name   weapon
0      War-Machine  Mark-45
1  Captain America   Shield
2             Thor  Mjolnir
3             Hulk    Smash

List-like to_replace and value

>>> df.replace(['Ironman', 'Captain America'], ['Rescue', 'Hawkeye'], inplace=True)
>>> df
      name   weapon
0   Rescue  Mark-45
1  Hawkeye   Shield
2     Thor  Mjolnir
3     Hulk    Smash
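
The list-like form pairs each entry of to_replace with the entry of value at the same position, which is why the two must have the same length. That pairing can be sketched in plain Python. `pair_replacements` is a hypothetical helper written for illustration, not part of pyspark.pandas, and the error message is an assumption.

```python
def pair_replacements(to_replace, value):
    """Pair each old value with its replacement by position (illustrative only)."""
    if len(to_replace) != len(value):
        # Assumption: mismatched lengths are rejected rather than silently truncated.
        raise ValueError("to_replace and value must have the same length")
    return dict(zip(to_replace, value))


mapping = pair_replacements(['Ironman', 'Captain America'], ['Rescue', 'Hawkeye'])
names = ['Ironman', 'Captain America', 'Thor', 'Hulk']
# Values without an entry in the mapping are left unchanged.
replaced = [mapping.get(name, name) for name in names]
print(replaced)  # ['Rescue', 'Hawkeye', 'Thor', 'Hulk']
```

The result matches the name column in the output above: only the two listed names change.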

Dicts can be used to specify different replacement values for different existing values. To use a dict in this way, the value parameter should be None.

>>> df.replace({'Mjolnir': 'Stormbuster'})
      name       weapon
0   Rescue      Mark-45
1  Hawkeye       Shield
2     Thor  Stormbuster
3     Hulk        Smash

A dict can specify that different values should be replaced in different columns. The value parameter should not be None in this case.

>>> df.replace({'weapon': 'Mjolnir'}, 'Stormbuster')
      name       weapon
0   Rescue      Mark-45
1  Hawkeye       Shield
2     Thor  Stormbuster
3     Hulk        Smash

Nested dictionaries map a column name to a dict of replacements for that column. The value parameter should be None to use a nested dict in this way.

>>> df.replace({'weapon': {'Mjolnir': 'Stormbuster'}})
      name       weapon
0   Rescue      Mark-45
1  Hawkeye       Shield
2     Thor  Stormbuster
3     Hulk        Smash
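
The nested-dict form scopes each set of replacements to one column. A rough plain-Python model of that behaviour follows, using a dict of lists as a stand-in for a DataFrame. `replace_nested` is a hypothetical name, and this is a sketch of the semantics, not how pyspark.pandas implements them.

```python
def replace_nested(table, spec):
    """Apply {column: {old: new}} to a dict-of-lists table, returning a new table."""
    result = {}
    for column, values in table.items():
        # Columns absent from spec get an empty mapping, so they pass through unchanged.
        mapping = spec.get(column, {})
        result[column] = [mapping.get(v, v) for v in values]
    return result


table = {
    'name': ['Rescue', 'Hawkeye', 'Thor', 'Hulk'],
    'weapon': ['Mark-45', 'Shield', 'Mjolnir', 'Smash'],
}
out = replace_nested(table, {'weapon': {'Mjolnir': 'Stormbuster'}})
print(out['weapon'])  # ['Mark-45', 'Shield', 'Stormbuster', 'Smash']
```

Because the mapping is looked up per column, a value such as 'Mjolnir' appearing in the name column would not be touched.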