package streaming
Type Members
- trait AcceptsLatestSeenOffset extends SparkDataStream
Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.
Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.
Note that this interface aims to only support DSv2 streaming sources. Spark may throw error if the interface is implemented along with DSv1 streaming sources.
The callback method will be called once per run.
- Since
3.3.0
- final class CompositeReadLimit extends ReadLimit
/** Represents a
ReadLimit
where theMicroBatchStream
should scan approximately given maximum number of rows with at least the given minimum number of rows./** Represents a
ReadLimit
where theMicroBatchStream
should scan approximately given maximum number of rows with at least the given minimum number of rows.- Annotations
- @Evolving()
- Since
3.2.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
- trait ContinuousPartitionReader[T] extends PartitionReader[T]
A variation on
PartitionReader
for use with continuous streaming processing.A variation on
PartitionReader
for use with continuous streaming processing.- Annotations
- @Evolving()
- Since
3.0.0
- trait ContinuousPartitionReaderFactory extends PartitionReaderFactory
A variation on
PartitionReaderFactory
that returnsContinuousPartitionReader
instead ofPartitionReader
.A variation on
PartitionReaderFactory
that returnsContinuousPartitionReader
instead ofPartitionReader
. It's used for continuous streaming processing.- Annotations
- @Evolving()
- Since
3.0.0
- trait ContinuousStream extends SparkDataStream
A
SparkDataStream
for streaming queries with continuous mode.A
SparkDataStream
for streaming queries with continuous mode.- Annotations
- @Evolving()
- Since
3.0.0
- trait MicroBatchStream extends SparkDataStream
A
SparkDataStream
for streaming queries with micro-batch mode.A
SparkDataStream
for streaming queries with micro-batch mode.- Annotations
- @Evolving()
- Since
3.0.0
- abstract class Offset extends AnyRef
An abstract representation of progress through a
MicroBatchStream
orContinuousStream
.An abstract representation of progress through a
MicroBatchStream
orContinuousStream
.During execution, offsets provided by the data source implementation will be logged and used as restart checkpoints. Each source should provide an offset implementation which the source can use to reconstruct a position in the stream up to which data has been seen/processed.
- Annotations
- @Evolving()
- Since
3.0.0
- trait PartitionOffset extends Serializable
Used for per-partition offsets in continuous processing.
Used for per-partition offsets in continuous processing. ContinuousReader implementations will provide a method to merge these into a global Offset.
These offsets must be serializable.
- Annotations
- @Evolving()
- Since
3.0.0
- final class ReadAllAvailable extends ReadLimit
Represents a
ReadLimit
where theMicroBatchStream
must scan all the data available at the streaming source.Represents a
ReadLimit
where theMicroBatchStream
must scan all the data available at the streaming source. This is meant to be a hard specification as being able to return all available data is necessary forTrigger.Once()
to work correctly. If a source is unable to scan all available data, then it must throw an error.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
- trait ReadLimit extends AnyRef
Interface representing limits on how much to read from a
MicroBatchStream
when it implementsSupportsAdmissionControl
.Interface representing limits on how much to read from a
MicroBatchStream
when it implementsSupportsAdmissionControl
. There are several child interfaces representing various kinds of limits.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
ReadAllAvailable
ReadMaxRows
- class ReadMaxBytes extends ReadLimit
Represents a
ReadLimit
where theMicroBatchStream
should scan files which total size doesn't go beyond a given maximum total size.Represents a
ReadLimit
where theMicroBatchStream
should scan files which total size doesn't go beyond a given maximum total size. Always reads at least one file so a stream can make progress and not get stuck on a file larger than a given maximum.- Annotations
- @Evolving()
- Since
4.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
- class ReadMaxFiles extends ReadLimit
Represents a
ReadLimit
where theMicroBatchStream
should scan approximately the given maximum number of files.Represents a
ReadLimit
where theMicroBatchStream
should scan approximately the given maximum number of files.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
- final class ReadMaxRows extends ReadLimit
Represents a
ReadLimit
where theMicroBatchStream
should scan approximately the given maximum number of rows.Represents a
ReadLimit
where theMicroBatchStream
should scan approximately the given maximum number of rows.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
- final class ReadMinRows extends ReadLimit
Represents a
ReadLimit
where theMicroBatchStream
should scan approximately at least the given minimum number of rows.Represents a
ReadLimit
where theMicroBatchStream
should scan approximately at least the given minimum number of rows.- Annotations
- @Evolving()
- Since
3.2.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
- trait ReportsSinkMetrics extends AnyRef
A mix-in interface for streaming sinks to signal that they can report metrics.
A mix-in interface for streaming sinks to signal that they can report metrics.
- Annotations
- @Evolving()
- Since
3.4.0
- trait ReportsSourceMetrics extends SparkDataStream
A mix-in interface for
SparkDataStream
streaming sources to signal that they can report metrics.A mix-in interface for
SparkDataStream
streaming sources to signal that they can report metrics.- Annotations
- @Evolving()
- Since
3.2.0
- trait SparkDataStream extends AnyRef
The base interface representing a readable data stream in a Spark streaming query.
The base interface representing a readable data stream in a Spark streaming query. It's responsible to manage the offsets of the streaming source in the streaming query.
Data sources should implement concrete data stream interfaces:
MicroBatchStream
andContinuousStream
.- Annotations
- @Evolving()
- Since
3.0.0
- trait SupportsAdmissionControl extends SparkDataStream
A mix-in interface for
SparkDataStream
streaming sources to signal that they can control the rate of data ingested into the system.A mix-in interface for
SparkDataStream
streaming sources to signal that they can control the rate of data ingested into the system. These rate limits can come implicitly from the contract of triggers, e.g. Trigger.Once() requires that a micro-batch process all data available to the system at the start of the micro-batch. Alternatively, sources can decide to limit ingest through data source options.Through this interface, a MicroBatchStream should be able to return the next offset that it will process until given a
ReadLimit
.- Annotations
- @Evolving()
- Since
3.0.0
- trait SupportsTriggerAvailableNow extends SupportsAdmissionControl
An interface for streaming sources that supports running in Trigger.AvailableNow mode, which will process all the available data at the beginning of the query in (possibly) multiple batches.
An interface for streaming sources that supports running in Trigger.AvailableNow mode, which will process all the available data at the beginning of the query in (possibly) multiple batches.
This mode will have better scalability comparing to Trigger.Once mode.
- Annotations
- @Evolving()
- Since
3.3.0