pyspark.streaming.DStream.countByWindow

DStream.countByWindow(windowDuration: int, slideDuration: int) → pyspark.streaming.dstream.DStream[int]

Return a new DStream in which each RDD has a single element: the number of elements in a sliding window over this DStream. windowDuration and slideDuration are as defined in the window() operation.

This is equivalent to window(windowDuration, slideDuration).count(), but is more efficient when the window is large.
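The efficiency difference can be illustrated with a minimal pure-Python sketch. This is not Spark code and does not claim to mirror Spark's internals. It assumes each batch is a plain list and that the window and slide are measured in whole batches. The helper names naive_window_counts and incremental_window_counts are hypothetical. The first function recounts the whole window on every slide, as window(...).count() conceptually does. The second keeps a running total, adding batches that enter the window and subtracting batches that leave it, which is one way a windowed count can avoid rescanning a large window.

```python
from collections import deque


def naive_window_counts(batches, window, slide):
    """Recount every element in the window each time it slides."""
    counts = []
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window)
        counts.append(sum(len(b) for b in batches[start:end]))
    return counts


def incremental_window_counts(batches, window, slide):
    """Keep a running total: add entering batches, subtract leaving ones."""
    counts = []
    in_window = deque()  # per-batch sizes currently inside the window
    total = 0
    for i, batch in enumerate(batches, start=1):
        in_window.append(len(batch))
        total += len(batch)
        if len(in_window) > window:
            total -= in_window.popleft()
        if i % slide == 0:  # emit a count once per slide interval
            counts.append(total)
    return counts


# Hypothetical input: six batches, window of 3 batches, slide of 2 batches.
batches = [[1, 2, 3], [4], [], [5, 6], [7, 8, 9, 10], [11]]
naive = naive_window_counts(batches, window=3, slide=2)
incremental = incremental_window_counts(batches, window=3, slide=2)
print(naive, incremental)
```

Both functions return the same counts. The naive version does work proportional to the window size on every slide, while the incremental version does a constant amount of work per batch.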