pyspark.pandas.DataFrame.mask

DataFrame.mask(cond, other=nan)

Replace values where the condition is True.

Parameters
cond : boolean DataFrame

Where cond is False, keep the original value. Where cond is True, replace it with the corresponding value from other.

other : scalar or DataFrame, default nan

Entries where cond is True are replaced with the corresponding value from other. If other is omitted, those entries become NaN.

Returns
DataFrame
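The replacement rule is element-wise: each cell is kept when its cond cell is False and taken from other when it is True. The following is a minimal plain-Python sketch of that rule, assuming a hypothetical `mask` helper that works on a dict of column lists rather than on a pyspark.pandas DataFrame. It deliberately ignores index alignment and dtype handling; in the real API an integer column that receives NaN is upcast to float, which is why column A prints as 0.0 in the first example.

```python
import math
from typing import Dict

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, list]


def mask(frame: Frame, cond: Frame, other=math.nan) -> Frame:
    """Sketch of mask semantics (not the pyspark.pandas implementation).

    Keep frame[col][i] where cond[col][i] is False; otherwise use `other`,
    which is either a scalar or another Frame with the same shape.
    """
    result: Frame = {}
    for col, values in frame.items():
        if isinstance(other, dict):
            # Frame-valued other: take the cell at the same column and row.
            replacements = other[col]
        else:
            # Scalar other (NaN by default): broadcast it to every row.
            replacements = [other] * len(values)
        result[col] = [
            new if flag else old
            for old, flag, new in zip(values, cond[col], replacements)
        ]
    return result


df1 = {"A": [0, 1, 2, 3, 4], "B": [100, 200, 300, 400, 500]}
cond = {c: [v > 1 for v in vals] for c, vals in df1.items()}
print(mask(df1, cond, 10))
# {'A': [0, 1, 10, 10, 10], 'B': [10, 10, 10, 10, 10]}
```

Passing a frame as other, or omitting it to get NaN, follows the same per-cell rule.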

Examples

>>> import pyspark.pandas as ps
>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)
>>> df1 = ps.DataFrame({'A': [0, 1, 2, 3, 4], 'B':[100, 200, 300, 400, 500]})
>>> df2 = ps.DataFrame({'A': [0, -1, -2, -3, -4], 'B':[-100, -200, -300, -400, -500]})
>>> df1
   A    B
0  0  100
1  1  200
2  2  300
3  3  400
4  4  500
>>> df2
   A    B
0  0 -100
1 -1 -200
2 -2 -300
3 -3 -400
4 -4 -500
>>> df1.mask(df1 > 0).sort_index()
     A   B
0  0.0 NaN
1  NaN NaN
2  NaN NaN
3  NaN NaN
4  NaN NaN
>>> df1.mask(df1 > 1, 10).sort_index()
    A   B
0   0  10
1   1  10
2  10  10
3  10  10
4  10  10
>>> df1.mask(df1 > 1, df1 + 100).sort_index()
     A    B
0    0  200
1    1  300
2  102  400
3  103  500
4  104  600
>>> df1.mask(df1 > 1, df2).sort_index()
   A    B
0  0 -100
1  1 -200
2 -2 -300
3 -3 -400
4 -4 -500
>>> reset_option("compute.ops_on_diff_frames")
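In the last example, other is a separate DataFrame (df2), so its cells are paired with df1 by index label and column, not by row position. That cross-frame combination is why the examples enable `compute.ops_on_diff_frames`, and the `sort_index()` calls give a deterministic row order. The following plain-Python sketch illustrates label-based pairing only. It assumes a hypothetical `mask_aligned` helper and a (labels, columns) tuple in place of a DataFrame. It also assumes that cond is already in frame's row order and that other contains every label of frame; missing labels are not modeled.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical indexed frame: (index labels, column name -> values in row order).
IndexedFrame = Tuple[List[Hashable], Dict[str, list]]


def mask_aligned(frame: IndexedFrame, cond: Dict[str, List[bool]],
                 other: IndexedFrame) -> IndexedFrame:
    """Replace cells of `frame` where cond is True with the cell of `other`
    that has the same index label and column (not the same position)."""
    index, cols = frame
    other_index, other_cols = other
    # Map each label of `other` to its row position for label-based lookup.
    pos = {label: i for i, label in enumerate(other_index)}
    result: Dict[str, list] = {}
    for col, values in cols.items():
        result[col] = [
            other_cols[col][pos[label]] if flag else old
            for label, old, flag in zip(index, values, cond[col])
        ]
    return list(index), result


df1 = ([0, 1, 2, 3, 4], {"A": [0, 1, 2, 3, 4], "B": [100, 200, 300, 400, 500]})
# Same data as df2 in the examples, but stored in reverse row order.
df2 = ([4, 3, 2, 1, 0], {"A": [-4, -3, -2, -1, 0], "B": [-500, -400, -300, -200, -100]})
cond = {c: [v > 1 for v in vals] for c, vals in df1[1].items()}
print(mask_aligned(df1, cond, df2))
# ([0, 1, 2, 3, 4], {'A': [0, 1, -2, -3, -4], 'B': [-100, -200, -300, -400, -500]})
```

Even though df2's rows are stored in reverse, the result matches the `df1.mask(df1 > 1, df2)` output above. Pairing by position would instead put -1 at label 3 of column A.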