pyspark.pandas.DataFrame.mask

DataFrame.mask(cond, other=nan)

Replace values where the condition is True.

Parameters

cond : boolean DataFrame
    Where cond is False, keep the original value. Where True, replace with the corresponding value from other.
other : scalar, DataFrame
    Entries where cond is True are replaced with the corresponding value from other.
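The rule described by the two parameters can be sketched in plain Python for a single column. This is only an illustration of the element-wise semantics, not how pandas-on-Spark implements it. The helper name `mask_column` is hypothetical, and the sketch ignores index alignment and dtype promotion.

```python
import math


def mask_column(values, cond, other=math.nan):
    """Hypothetical helper: element-wise mask for one column.

    Keeps values[i] where cond[i] is False and takes the replacement
    where cond[i] is True. `other` may be a scalar or a list that is
    aligned with `values`.
    """
    if isinstance(other, list):
        replacements = other
    else:
        # Repeat a scalar so every position has a replacement.
        replacements = [other] * len(values)
    return [r if c else v for v, c, r in zip(values, cond, replacements)]


a = [0, 1, 2, 3, 4]
cond = [x > 1 for x in a]

# Scalar replacement: masked entries become 10.
scalar_result = mask_column(a, cond, 10)

# Aligned replacement: masked entries are taken from the same position.
aligned_result = mask_column(a, cond, [x + 100 for x in a])

# Default replacement: masked entries become NaN.
default_result = mask_column(a, [x > 0 for x in a])

print(scalar_result)   # [0, 1, 10, 10, 10]
print(aligned_result)  # [0, 1, 102, 103, 104]
```

Unlike this sketch, the real method returns a DataFrame, and a column that receives NaN is shown as floating point (for example `0.0` rather than `0`).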

Returns

DataFrame

Examples

>>> import pyspark.pandas as ps
>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)
>>> df1 = ps.DataFrame({'A': [0, 1, 2, 3, 4], 'B':[100, 200, 300, 400, 500]})
>>> df2 = ps.DataFrame({'A': [0, -1, -2, -3, -4], 'B':[-100, -200, -300, -400, -500]})
>>> df1
   A    B
0  0  100
1  1  200
2  2  300
3  3  400
4  4  500
>>> df2
   A    B
0  0 -100
1 -1 -200
2 -2 -300
3 -3 -400
4 -4 -500

>>> df1.mask(df1 > 0).sort_index()
     A   B
0  0.0 NaN
1  NaN NaN
2  NaN NaN
3  NaN NaN
4  NaN NaN

>>> df1.mask(df1 > 1, 10).sort_index()
    A   B
0   0  10
1   1  10
2  10  10
3  10  10
4  10  10

>>> df1.mask(df1 > 1, df1 + 100).sort_index()
     A    B
0    0  200
1    1  300
2  102  400
3  103  500
4  104  600

>>> df1.mask(df1 > 1, df2).sort_index()
   A    B
0  0 -100
1  1 -200
2 -2 -300
3 -3 -400
4 -4 -500

>>> reset_option("compute.ops_on_diff_frames")