pyspark.pandas.DataFrame.mask#
- DataFrame.mask(cond, other=nan)[source]#
Replace values where the condition is True.
- Parameters
- condboolean DataFrame
Where cond is False, keep the original value. Where True, replace with corresponding value from other.
- otherscalar, DataFrame
Entries where cond is True are replaced with corresponding value from other.
- Returns
- DataFrame
Examples
>>> from pyspark.pandas.config import set_option, reset_option >>> set_option("compute.ops_on_diff_frames", True) >>> df1 = ps.DataFrame({'A': [0, 1, 2, 3, 4], 'B':[100, 200, 300, 400, 500]}) >>> df2 = ps.DataFrame({'A': [0, -1, -2, -3, -4], 'B':[-100, -200, -300, -400, -500]}) >>> df1 A B 0 0 100 1 1 200 2 2 300 3 3 400 4 4 500 >>> df2 A B 0 0 -100 1 -1 -200 2 -2 -300 3 -3 -400 4 -4 -500
>>> df1.mask(df1 > 0).sort_index() A B 0 0.0 NaN 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 NaN NaN
>>> df1.mask(df1 > 1, 10).sort_index() A B 0 0 10 1 1 10 2 10 10 3 10 10 4 10 10
>>> df1.mask(df1 > 1, df1 + 100).sort_index() A B 0 0 200 1 1 300 2 102 400 3 103 500 4 104 600
>>> df1.mask(df1 > 1, df2).sort_index() A B 0 0 -100 1 1 -200 2 -2 -300 3 -3 -400 4 -4 -500
>>> reset_option("compute.ops_on_diff_frames")