pyspark.sql.streaming.StatefulProcessor.handleInputRows#
- abstract StatefulProcessor.handleInputRows(key, rows, timerValues)[source]#
- Function that allows users to interact with input data rows along with the grouping key.
- The types of the input data and the return value differ depending on which method is called:
  - For transformWithStateInPandas, it should take parameters (key, Iterator[pandas.DataFrame]) and return another Iterator[pandas.DataFrame].
  - For transformWithState, it should take parameters (key, Iterator[pyspark.sql.Row]) and return another Iterator[pyspark.sql.Row].
- Note that the function should not assume the number of elements in the iterator: to process all of the data, handleInputRows must iterate over every element. It is, however, not required to exhaust the iterator if it only intends to read part of the data.
- Parameters
- key : Any
- Grouping key.
- rows : iterable of pandas.DataFrame or iterable of pyspark.sql.Row
- Iterator of input rows associated with the grouping key.
- timerValues : TimerValues
- Timer values for the current batch that processes the input rows. Users can get the processing-time or event-time timestamp from TimerValues.
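The contract above can be sketched as follows. This is a minimal, hypothetical example of the transformWithStateInPandas shape of handleInputRows: the class name `RunningCountProcessor` is invented, and to keep the sketch runnable without a Spark cluster it keeps state in a plain dict rather than in a real state handle obtained from a `StatefulProcessorHandle`. A real implementation would subclass `pyspark.sql.streaming.StatefulProcessor` and use the state APIs instead.

```python
from typing import Any, Iterator, Optional
import pandas as pd

class RunningCountProcessor:
    """Hypothetical sketch of the handleInputRows contract.

    Counts how many rows have been seen per grouping key. A real
    StatefulProcessor would store the count in a ValueState; here a
    plain dict stands in so the example runs standalone.
    """

    def __init__(self) -> None:
        self._counts: dict = {}  # stand-in for per-key state

    def handleInputRows(
        self,
        key: Any,
        rows: Iterator[pd.DataFrame],
        timerValues: Optional[object] = None,
    ) -> Iterator[pd.DataFrame]:
        # Iterate over EVERY chunk in the iterator -- the contract says
        # not to assume how many elements Spark will deliver.
        total = self._counts.get(key, 0)
        for pdf in rows:
            total += len(pdf)
        self._counts[key] = total
        # Emit one output DataFrame for this key, matching the
        # Iterator[pandas.DataFrame] return type.
        yield pd.DataFrame({"key": [key], "count": [total]})

# Usage: simulate Spark handing the processor two chunks for one key.
proc = RunningCountProcessor()
chunks = iter([pd.DataFrame({"v": [1, 2]}), pd.DataFrame({"v": [3]})])
out = list(proc.handleInputRows("user1", chunks))
print(out[0]["count"].iloc[0])  # 3 rows seen so far for this key
```

Because the output is itself an iterator of DataFrames, a processor may `yield` several result chunks per key rather than materializing one large DataFrame.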