pyspark.pandas.DataFrame.pivot_table
- DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None)
Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
- Parameters
- values : column to aggregate.
Either a single column name (string) or a list of fewer than three column names.
- index : column (string) or list of columns
If an array is passed, it must be the same length as the data. The list should contain strings.
- columns : column
Column used in the pivot operation. Only one column is supported, and it must be given as a string.
- aggfunc : function (string), dict, default 'mean'
If a dict is passed, each key is a column to aggregate and each value is the function or list of functions to apply to it.
- fill_value : scalar, default None
Value used to replace missing values in the resulting table.
- Returns
- table : DataFrame
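To make the roles of the parameters concrete, here is a minimal pure-Python sketch of the same semantics over a list of dict records. It is an illustration under stated assumptions, not how pyspark.pandas implements `pivot_table` (which runs on Spark): the function name `pivot_records`, the record format, the small `AGGREGATES` table of supported function names, and the requirement that `values` be given are all assumptions of the sketch. Row keys are tuples of the `index` values and column keys are `(value column, pivot value)` pairs, mirroring the two hierarchical levels described above.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical table of supported aggregate names (an assumption of this sketch).
AGGREGATES = {"sum": sum, "mean": mean, "min": min, "max": max, "count": len}


def pivot_records(rows, values, index, columns, aggfunc="mean", fill_value=None):
    """Illustrative pivot over a list of dicts; not the pyspark.pandas implementation."""
    values = [values] if isinstance(values, str) else list(values)
    index = [index] if isinstance(index, str) else list(index)
    # A string aggfunc applies to every value column; a dict picks one per column.
    funcs = aggfunc if isinstance(aggfunc, dict) else {v: aggfunc for v in values}

    # Bucket raw values by (row key, value column, pivot-column value).
    groups = defaultdict(list)
    row_keys, col_keys = set(), set()
    for row in rows:
        row_key = tuple(row[name] for name in index)
        row_keys.add(row_key)
        col_keys.add(row[columns])
        for v in values:
            groups[(row_key, v, row[columns])].append(row[v])

    # Row keys form the index levels; (value column, pivot value) pairs form the column levels.
    table = {}
    for row_key in sorted(row_keys):
        cells = {}
        for v in values:
            for col_key in sorted(col_keys):
                bucket = groups.get((row_key, v, col_key))
                if bucket:
                    cells[(v, col_key)] = AGGREGATES[funcs[v]](bucket)
                else:
                    # No source rows: the cell is missing unless fill_value is given.
                    cells[(v, col_key)] = math.nan if fill_value is None else fill_value
        table[row_key] = cells
    return table


# The data used in the Examples section, as a list of records.
rows = [
    dict(A=a, B=b, C=c, D=d, E=e)
    for a, b, c, d, e in zip(
        ["foo"] * 5 + ["bar"] * 4,
        ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
        ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
        [1, 2, 2, 3, 3, 4, 5, 6, 7],
        [2, 4, 5, 5, 6, 6, 8, 9, 9],
    )
]

table = pivot_records(rows, values="D", index=["A", "B"], columns="C",
                      aggfunc="sum", fill_value=0)
print(table[("foo", "two")])  # {('D', 'large'): 0, ('D', 'small'): 6}
```

Grouping first and laying out cells second is what lets a missing combination show up as an empty cell rather than a dropped row.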
Examples
```python
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]},
...                   columns=['A', 'B', 'C', 'D', 'E'])
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9
```
This first example aggregates values by taking the sum.
```python
>>> table = df.pivot_table(values='D', index=['A', 'B'],
...                        columns='C', aggfunc='sum')
>>> table.sort_index()
C        large  small
A   B
bar one    4.0      5
    two    7.0      6
foo one    4.0      1
    two    NaN      6
```
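The sums in the table above can be checked by hand. The following pure-Python sketch (plain lists standing in for the DataFrame columns, an assumption for illustration) groups D by every (A, B, C) combination that actually occurs. No row has A = foo, B = two, C = large, so that cell is missing. The NaN is consistent with `large` being shown as floats (4.0) while `small` stays integer, since pandas cannot hold NaN in an integer column.

```python
import math
from collections import defaultdict

# Columns A, B, C and D of the example DataFrame, copied as plain lists.
A = ["foo"] * 5 + ["bar"] * 4
B = ["one", "one", "one", "two", "two", "one", "one", "two", "two"]
C = ["small", "large", "large", "small", "small", "large", "small", "small", "large"]
D = [1, 2, 2, 3, 3, 4, 5, 6, 7]

# Sum D for every (A, B, C) combination that occurs in the data.
sums = defaultdict(int)
for a, b, c, d in zip(A, B, C, D):
    sums[(a, b, c)] += d

# Lay the sums out as the pivot table; combinations with no rows become NaN.
row_keys = sorted({(a, b) for a, b in zip(A, B)})
table = {
    rk: {c: sums.get((*rk, c), math.nan) for c in ("large", "small")}
    for rk in row_keys
}
for rk, cells in table.items():
    print(rk, cells)
```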
We can also fill missing values using the fill_value parameter.
```python
>>> table = df.pivot_table(values='D', index=['A', 'B'],
...                        columns='C', aggfunc='sum', fill_value=0)
>>> table.sort_index()
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6
```
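As the output above suggests, `fill_value` acts on the aggregated result. The only cell that changes is (foo, two, large), which has no source rows, and with no NaN left the `large` column is shown as integers. The pure-Python sketch below illustrates that post-aggregation fill; the helper `fill_missing` and the nested-dict layout are hypothetical, not part of the pyspark.pandas API.

```python
import math

# The sum table shown above, with NaN in the one cell that has no source rows.
aggregated = {
    ("bar", "one"): {"large": 4, "small": 5},
    ("bar", "two"): {"large": 7, "small": 6},
    ("foo", "one"): {"large": 4, "small": 1},
    ("foo", "two"): {"large": math.nan, "small": 6},
}


def fill_missing(table, fill_value):
    """Hypothetical helper: replace NaN cells of an already-aggregated table."""
    return {
        row_key: {
            col: fill_value if isinstance(v, float) and math.isnan(v) else v
            for col, v in cells.items()
        }
        for row_key, cells in table.items()
    }


filled = fill_missing(aggregated, 0)
print(filled[("foo", "two")])  # {'large': 0, 'small': 6}
```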
We can also pass a dict as aggfunc to choose the aggregation function for each value column.
```python
>>> table = df.pivot_table(values=['D'], index=['C'],
...                        columns="A", aggfunc={'D': 'mean'})
>>> table.sort_index()
         D
A      bar       foo
C
large  5.5  2.000000
small  5.5  2.333333
```
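Each cell above is the arithmetic mean of the D values in its (C, A) group. Working from the rows of `df`:

$$
\begin{aligned}
\text{(large, bar)}: &\ \tfrac{4 + 7}{2} = 5.5, & \text{(large, foo)}: &\ \tfrac{2 + 2}{2} = 2,\\
\text{(small, bar)}: &\ \tfrac{5 + 6}{2} = 5.5, & \text{(small, foo)}: &\ \tfrac{1 + 3 + 3}{3} = \tfrac{7}{3} \approx 2.333333.
\end{aligned}
$$

Since 7/3 does not terminate, the foo column appears to be printed with six decimal places, and 2 is shown as 2.000000 to match.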
The next example aggregates multiple value columns, each with its own function.
```python
>>> table = df.pivot_table(index=['C'], columns="A", values=['D', 'E'],
...                        aggfunc={'D': 'mean', 'E': 'sum'})
>>> table.sort_index()
         D             E
A      bar       foo bar foo
C
large  5.5  2.000000  15   9
small  5.5  2.333333  17  13
```
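The two-level column header above (D and E on top, bar and foo below) comes from pairing each value column with each value of the pivot column. The pure-Python sketch below reproduces that layout with `(value column, A)` tuple keys. It uses the Python callables `statistics.mean` and `sum` in place of the strings `'mean'` and `'sum'`, an assumption made for illustration rather than a description of pyspark.pandas internals.

```python
from statistics import mean

# Columns A, C, D and E of the example DataFrame, copied as plain lists.
A = ["foo"] * 5 + ["bar"] * 4
C = ["small", "large", "large", "small", "small", "large", "small", "small", "large"]
D = [1, 2, 2, 3, 3, 4, 5, 6, 7]
E = [2, 4, 5, 5, 6, 6, 8, 9, 9]

aggfunc = {"D": mean, "E": sum}  # one function per value column, as in the dict above
data = {"D": D, "E": E}

# Outer column level: value column name; inner level: value of A.
table = {}
for c_key in sorted(set(C)):
    row = {}
    for value_col, func in aggfunc.items():
        for a_key in sorted(set(A)):
            picked = [x for x, a, c in zip(data[value_col], A, C)
                      if a == a_key and c == c_key]
            row[(value_col, a_key)] = func(picked)
    table[c_key] = row

print(list(table["large"]))  # [('D', 'bar'), ('D', 'foo'), ('E', 'bar'), ('E', 'foo')]
```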