pyspark.pandas.window.Rolling.sum

Calculate rolling summation of given DataFrame or Series.

Note

The current implementation of this API uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid using this method on very large datasets.

Returns

Series or DataFrame: Same type as the input, with the same index, containing the rolling summation.

See also

pyspark.pandas.Series.expanding: Calling object with Series data.
pyspark.pandas.DataFrame.expanding: Calling object with DataFrames.
pyspark.pandas.Series.sum: Reducing sum for Series.
pyspark.pandas.DataFrame.sum: Reducing sum for DataFrame.

Examples

>>> s = ps.Series([4, 3, 5, 2, 6])
>>> s
0    4
1    3
2    5
3    2
4    6
dtype: int64

>>> s.rolling(2).sum()
0    NaN
1    7.0
2    8.0
3    7.0
4    8.0
dtype: float64

>>> s.rolling(3).sum()
0     NaN
1     NaN
2    12.0
3    10.0
4    13.0
dtype: float64

For DataFrame, each rolling summation is computed column-wise.

>>> df = ps.DataFrame({"A": s.to_numpy(), "B": s.to_numpy() ** 2})
>>> df
   A   B
0  4  16
1  3   9
2  5  25
3  2   4
4  6  36

>>> df.rolling(2).sum()
     A     B
0  NaN   NaN
1  7.0  25.0
2  8.0  34.0
3  7.0  29.0
4  8.0  40.0

>>> df.rolling(3).sum()
      A     B
0   NaN   NaN
1   NaN   NaN
2  12.0  50.0
3  10.0  38.0
4  13.0  65.0
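
The rolling sums shown in the examples can be checked by hand. The following is a minimal pure-Python sketch of the arithmetic only, not the pandas-on-Spark implementation. It assumes, as the outputs in the examples suggest, that a position yields NaN until a full window of observations is available. The helper name `rolling_sum` is hypothetical and is not part of the pyspark.pandas API.

```python
import math


def rolling_sum(values, window):
    """Hypothetical helper: sum each trailing window of `window` values.

    Positions with fewer than `window` preceding observations yield NaN,
    mirroring the leading NaN rows in the examples.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough observations yet to fill the window.
            out.append(math.nan)
        else:
            # Sum the current value and the (window - 1) values before it.
            out.append(float(sum(values[i + 1 - window : i + 1])))
    return out


data = [4, 3, 5, 2, 6]
print(rolling_sum(data, 2))  # [nan, 7.0, 8.0, 7.0, 8.0]
print(rolling_sum(data, 3))  # [nan, nan, 12.0, 10.0, 13.0]
```

Applying the same helper to the squared values `[16, 9, 25, 4, 36]` gives column B of the DataFrame examples, since each column is summed independently.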