@Evolving public interface ContinuousStream extends SparkDataStream
SparkDataStream
for streaming queries with continuous mode.Modifier and Type | Method and Description |
---|---|
ContinuousPartitionReaderFactory |
createContinuousReaderFactory()
Returns a factory to create a
ContinuousPartitionReader for each
InputPartition . |
Offset |
mergeOffsets(PartitionOffset[] offsets)
Merge partitioned offsets coming from
ContinuousPartitionReader instances
for each partition to a single global offset. |
default boolean |
needsReconfiguration()
The execution engine will call this method in every epoch to determine if new input
partitions need to be generated, which may be required if for example the underlying
source system has had partitions added or removed.
|
InputPartition[] |
planInputPartitions(Offset start)
Returns a list of
input partitions given the start offset. |
commit, deserializeOffset, initialOffset, stop
InputPartition[] planInputPartitions(Offset start)
input partitions
given the start offset. Each
InputPartition
represents a data split that can be processed by one Spark task. The
number of input partitions returned here is the same as the number of RDD partitions this scan
outputs.
If the Scan
supports filter pushdown, this stream is likely configured with a filter
and is responsible for creating splits for that filter, which is not a full scan.
This method will be called to launch one Spark job for reading the data stream. It will be
called more than once, if needsReconfiguration()
returns true and Spark needs to
launch a new job.
ContinuousPartitionReaderFactory createContinuousReaderFactory()
ContinuousPartitionReader
for each
InputPartition
.Offset mergeOffsets(PartitionOffset[] offsets)
ContinuousPartitionReader
instances
for each partition to a single global offset.default boolean needsReconfiguration()
If true, the Spark job to scan this continuous data stream will be interrupted and Spark will
launch it again with a new list of input partitions
.