Interface SupportsTriggerAvailableNow
- All Superinterfaces:
SparkDataStream,SupportsAdmissionControl
An interface for streaming sources that supports running in Trigger.AvailableNow mode, which
will process all the available data at the beginning of the query in (possibly) multiple batches.
This mode will have better scalability comparing to Trigger.Once mode.
- Since:
- 3.3.0
-
Method Summary
Modifier and TypeMethodDescriptionvoidThis will be called at the beginning of streaming queries with Trigger.AvailableNow, to let the source record the offset for the current latest data at the time (a.k.a the target offset for the query).Methods inherited from interface org.apache.spark.sql.connector.read.streaming.SparkDataStream
commit, deserializeOffset, initialOffset, stopMethods inherited from interface org.apache.spark.sql.connector.read.streaming.SupportsAdmissionControl
getDefaultReadLimit, latestOffset, reportLatestOffset
-
Method Details
-
prepareForTriggerAvailableNow
void prepareForTriggerAvailableNow()This will be called at the beginning of streaming queries with Trigger.AvailableNow, to let the source record the offset for the current latest data at the time (a.k.a the target offset for the query). The source will behave as if there is no new data coming in after the target offset, i.e., the source will not return an offset higher than the target offset whenlatestOffsetis called.Note that there is an exception on the first uncommitted batch after a restart, where the end offset is not derived from the current latest offset. Sources need to take special considerations if wanting to assert such relation. One possible way is to have an internal flag in the source to indicate whether it is Trigger.AvailableNow, set the flag in this method, and record the target offset in the first call of
latestOffset.
-