All Superinterfaces:: SparkDataStream

@Evolving public interface ContinuousStream extends SparkDataStream

A SparkDataStream for streaming queries with continuous mode.

Since:: 3.0.0

Method Summary

Modifier and Type

Method

Description

ContinuousPartitionReaderFactory

createContinuousReaderFactory()

Returns a factory to create a ContinuousPartitionReader for each InputPartition.

Offset

mergeOffsets(PartitionOffset[] offsets)

Merge partitioned offsets coming from ContinuousPartitionReader instances for each partition to a single global offset.

default boolean

needsReconfiguration()

The execution engine will call this method in every epoch to determine if new input partitions need to be generated, which may be required if for example the underlying source system has had partitions added or removed.

InputPartition[]

planInputPartitions(Offset start)

Returns a list of input partitions given the start offset.

Methods inherited from interface org.apache.spark.sql.connector.read.streaming.SparkDataStream
commit, deserializeOffset, initialOffset, stop

Method Details
- planInputPartitions
  
  InputPartition[] planInputPartitions(Offset start)
  
  Returns a list of input partitions given the start offset. Each InputPartition represents a data split that can be processed by one Spark task. The number of input partitions returned here is the same as the number of RDD partitions this scan outputs.
  If the Scan supports filter pushdown, this stream is likely configured with a filter and is responsible for creating splits for that filter, which is not a full scan.
  
  This method will be called to launch one Spark job for reading the data stream. It will be called more than once, if needsReconfiguration() returns true and Spark needs to launch a new job.
- createContinuousReaderFactory
  
  ContinuousPartitionReaderFactory createContinuousReaderFactory()
  
  Returns a factory to create a ContinuousPartitionReader for each InputPartition.
- mergeOffsets
  
  Offset mergeOffsets(PartitionOffset[] offsets)
  
  Merge partitioned offsets coming from ContinuousPartitionReader instances for each partition to a single global offset.
- needsReconfiguration
  
  default boolean needsReconfiguration()
  
  The execution engine will call this method in every epoch to determine if new input partitions need to be generated, which may be required if for example the underlying source system has had partitions added or removed.
  If true, the Spark job to scan this continuous data stream will be interrupted and Spark will launch it again with a new list of input partitions.

Interface ContinuousStream

Method Summary

Methods inherited from interface org.apache.spark.sql.connector.read.streaming.SparkDataStream

Method Details

planInputPartitions

createContinuousReaderFactory

mergeOffsets

needsReconfiguration