pyspark.pandas.Series.map

Series.map(arg: Union[Dict, Callable]) → pyspark.pandas.series.Series

Map values of Series according to input correspondence.

Used for substituting each value in a Series with another value, which may be derived from a function or a dict.

Note

Make sure the size of the dictionary is not huge, because it could degrade performance or throw an OutOfMemoryError due to the huge expression generated within Spark. In that case, consider passing a function as the input instead (see the last example below).

Parameters
arg : function or dict

Mapping correspondence.

Returns
Series

Same index as caller.

See also

Series.apply

For applying more complex functions on a Series.

DataFrame.applymap

Apply a function elementwise on a whole DataFrame.

Notes

When arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to None. However, if the dictionary is a dict subclass that defines __missing__ (i.e. provides a method for default values), then this default is used rather than None.
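
For example, a minimal sketch of a dict subclass defining __missing__; the class name Fallback and the default string 'other' are purely illustrative:

>>> import pyspark.pandas as ps
>>> class Fallback(dict):
...     def __missing__(self, key):
...         # Record and return the default, mirroring defaultdict's behavior.
...         self[key] = 'other'
...         return 'other'
>>> ps.Series(['cat', 'dog']).map(Fallback({'cat': 'kitten'}))
0    kitten
1     other
dtype: object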

Examples

>>> import pyspark.pandas as ps
>>> s = ps.Series(['cat', 'dog', None, 'rabbit'])
>>> s
0       cat
1       dog
2      None
3    rabbit
dtype: object

map accepts a dict. Values that are not found in the dict are converted to None, unless the dict has a default value (e.g. defaultdict):

>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0    kitten
1     puppy
2      None
3      None
dtype: object
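
A collections.defaultdict works the same way: values missing from the mapping get the default instead of None. A minimal sketch, where the fallback string 'unknown' is only illustrative:

>>> from collections import defaultdict
>>> s.map(defaultdict(lambda: 'unknown', {'cat': 'kitten', 'dog': 'puppy'}))
0     kitten
1      puppy
2    unknown
3    unknown
dtype: object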

It also accepts a function:

>>> def format(x) -> str:
...     return 'I am a {}'.format(x)
>>> s.map(format)
0       I am a cat
1       I am a dog
2      I am a None
3    I am a rabbit
dtype: object