pyspark.pandas.DataFrame.combine_first#

DataFrame.combine_first(other)[source]#

Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.

New in version 3.3.0.

Parameters

otherDataFrame: Provided DataFrame to use to fill null values.

Returns

DataFrame

Examples

>>> ps.set_option("compute.ops_on_diff_frames", True)
>>> df1 = ps.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = ps.DataFrame({'A': [1, 1], 'B': [3, 3]})

>>> df1.combine_first(df2).sort_index()
     A    B
0  1.0  3.0
1  0.0  4.0

Null values persist if the location of that null value does not exist in other

>>> df1 = ps.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = ps.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

>>> df1.combine_first(df2).sort_index()
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
>>> ps.reset_option("compute.ops_on_diff_frames")