pyspark.pandas.DataFrame.transform
DataFrame.transform(func: Callable[[…], Series], axis: Union[int, str] = 0, *args: Any, **kwargs: Any) → DataFrame
Call func on self, producing a Series with transformed values that has the same length as its input.

See also: Transform and apply a function.

Note: This API executes the function once to infer the type, which is potentially expensive, for instance when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in func, for instance, as below:

>>> def square(x) -> ps.Series[np.int32]:
...     return x ** 2

pandas-on-Spark then uses the return type hint and does not try to infer the type.

Note: The series within func is actually multiple pandas series, each a segment of the whole pandas-on-Spark series; therefore, the length of each series is not guaranteed. As an example, an aggregation against each series does not work as a global aggregation but as an aggregation of each segment. See below:

>>> def func(x) -> ps.Series[np.int32]:
...     return x + sum(x)

- Parameters
- func : function
- Function to use for transforming the data. It must work when a pandas Series is passed.
- axis : int, default 0 or ‘index’
- Currently, only 0 is supported.
- *args
- Positional arguments to pass to func. 
- **kwargs
- Keyword arguments to pass to func. 
 
- Returns
- DataFrame
- A DataFrame that must have the same length as self. 
 
- Raises
- Exception: If the returned DataFrame has a different length than self.
 
- See also
- DataFrame.aggregate
- Only perform aggregating type operations. 
- DataFrame.apply
- Invoke function on DataFrame. 
- Series.transform
- The equivalent function for Series. 
- Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': range(3), 'B': range(1, 4)}, columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  1  2
2  2  3

>>> def square(x) -> ps.Series[np.int32]:
...     return x ** 2
>>> df.transform(square)
   A  B
0  0  1
1  1  4
2  4  9

You can omit type hints and let pandas-on-Spark infer its type.

>>> df.transform(lambda x: x ** 2)
   A  B
0  0  1
1  1  4
2  4  9

For multi-index columns:

>>> df.columns = [('X', 'A'), ('X', 'B')]
>>> df.transform(square)
   X
   A  B
0  0  1
1  1  4
2  4  9

>>> (df * -1).transform(abs)
   X
   A  B
0  0  1
1  1  2
2  2  3

You can also specify extra arguments.

>>> def calculation(x, y, z) -> ps.Series[int]:
...     return x ** y + z
>>> df.transform(calculation, y=10, z=20)
      X
      A      B
0    20     21
1    21   1044
2  1044  59069
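
The note about segments can be illustrated without Spark. The following is a minimal sketch in plain pandas, not pandas-on-Spark itself: the split of a four-row series into two segments is a hypothetical stand-in for whatever partitioning Spark actually chooses, and the names `global_result` and `segmented_result` are introduced only for illustration. It shows why a function that aggregates inside `func` can give per-segment results rather than a global one.

```python
import pandas as pd


def func(x: pd.Series) -> pd.Series:
    # Adds the sum of whatever Series it receives to each element.
    return x + x.sum()


s = pd.Series([1, 2, 3, 4])

# Global view: func sees all four rows at once, so the sum is 10.
global_result = func(s)

# Segmented view (assumed split, for illustration only): func is called
# once per segment, so each call only sees the sum of its own rows.
segments = [s.iloc[:2], s.iloc[2:]]
segmented_result = pd.concat([func(seg) for seg in segments])

print(global_result.tolist())     # [11, 12, 13, 14]
print(segmented_result.tolist())  # [4, 5, 10, 11]
```

Both results have the same length as the input, so the length check is satisfied in either case, but the values differ. Functions that work element-wise, such as `square` above, are unaffected by how the data is segmented.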