Class StatefulProcessor<K,I,O>

Object
org.apache.spark.sql.streaming.StatefulProcessor<K,I,O>
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
StatefulProcessorWithInitialState

public abstract class StatefulProcessor<K,I,O> extends Object implements Serializable
Represents the arbitrary stateful logic that needs to be provided by the user to perform stateful manipulations on keyed streams.

Users can also explicitly use import implicits._ to access the EncoderImplicits and use the state variable APIs relying on implicit encoders.

See Also:
  • Constructor Details

    • StatefulProcessor

      public StatefulProcessor()
  • Method Details

    • close

      public void close()
      Function called as the last method that allows for users to perform any cleanup or teardown operations.
    • getHandle

      public final StatefulProcessorHandle getHandle()
      Function to get the stateful processor handle that will be used to interact with the state

      Returns:
      handle - instance of StatefulProcessorHandle
    • handleExpiredTimer

      public scala.collection.Iterator<O> handleExpiredTimer(K key, TimerValues timerValues, ExpiredTimerInfo expiredTimerInfo)
      Function that will be invoked when a timer is fired for a given key. Users can choose to evict state, register new timers and optionally provide output rows.

      Note that in microbatch mode, this function will be called once for each unique timer expiry for a given key. If no timer expires for a given key, this function will not be invoked for that key.

      Parameters:
      key - \- grouping key
      timerValues - \- instance of TimerValues that provides access to current processing/event
      expiredTimerInfo - \- instance of ExpiredTimerInfo that provides access to expired timer

      Returns:
      Zero or more output rows
    • handleInputRows

      public abstract scala.collection.Iterator<O> handleInputRows(K key, scala.collection.Iterator<I> inputRows, TimerValues timerValues)
      Function that will allow users to interact with input data rows along with the grouping key and current timer values and optionally provide output rows.

      Note that in microbatch mode, input rows for a given grouping key will be provided in a single function invocation. If the grouping key is not seen in the current microbatch, this function will not be invoked for that key.

      Parameters:
      key - \- grouping key
      inputRows - \- iterator of input rows associated with grouping key
      timerValues - \- instance of TimerValues that provides access to current processing/event time if available

      Returns:
      \- Zero or more output rows
    • implicits

      public StatefulProcessor<K,I,O>.implicits$ implicits()
    • init

      public abstract void init(OutputMode outputMode, TimeMode timeMode)
      Function that will be invoked as the first method that allows for users to initialize all their state variables and perform other init actions before handling data.
      Parameters:
      outputMode - \- output mode for the stateful processor
      timeMode - \- time mode for the stateful processor.
    • setHandle

      public final void setHandle(StatefulProcessorHandle handle)
      Function to set the stateful processor handle that will be used to interact with the state store and other stateful processor related operations.

      Parameters:
      handle - \- instance of StatefulProcessorHandle