Packages

package streaming

Type Members

  1. trait AcceptsLatestSeenOffset extends SparkDataStream

    Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.

    Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.

    Note that this interface aims to only support DSv2 streaming sources. Spark may throw error if the interface is implemented along with DSv1 streaming sources.

    The callback method will be called once per run.

    Since

    3.3.0

  2. final class CompositeReadLimit extends ReadLimit

    /** Represents a ReadLimit where the MicroBatchStream should scan approximately given maximum number of rows with at least the given minimum number of rows.

    /** Represents a ReadLimit where the MicroBatchStream should scan approximately given maximum number of rows with at least the given minimum number of rows.

    Annotations
    @Evolving()
    Since

    3.2.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

  3. trait ContinuousPartitionReader[T] extends PartitionReader[T]

    A variation on PartitionReader for use with continuous streaming processing.

    A variation on PartitionReader for use with continuous streaming processing.

    Annotations
    @Evolving()
    Since

    3.0.0

  4. trait ContinuousPartitionReaderFactory extends PartitionReaderFactory

    A variation on PartitionReaderFactory that returns ContinuousPartitionReader instead of PartitionReader.

    A variation on PartitionReaderFactory that returns ContinuousPartitionReader instead of PartitionReader. It's used for continuous streaming processing.

    Annotations
    @Evolving()
    Since

    3.0.0

  5. trait ContinuousStream extends SparkDataStream

    A SparkDataStream for streaming queries with continuous mode.

    A SparkDataStream for streaming queries with continuous mode.

    Annotations
    @Evolving()
    Since

    3.0.0

  6. trait MicroBatchStream extends SparkDataStream

    A SparkDataStream for streaming queries with micro-batch mode.

    A SparkDataStream for streaming queries with micro-batch mode.

    Annotations
    @Evolving()
    Since

    3.0.0

  7. abstract class Offset extends AnyRef

    An abstract representation of progress through a MicroBatchStream or ContinuousStream.

    An abstract representation of progress through a MicroBatchStream or ContinuousStream.

    During execution, offsets provided by the data source implementation will be logged and used as restart checkpoints. Each source should provide an offset implementation which the source can use to reconstruct a position in the stream up to which data has been seen/processed.

    Annotations
    @Evolving()
    Since

    3.0.0

  8. trait PartitionOffset extends Serializable

    Used for per-partition offsets in continuous processing.

    Used for per-partition offsets in continuous processing. ContinuousReader implementations will provide a method to merge these into a global Offset.

    These offsets must be serializable.

    Annotations
    @Evolving()
    Since

    3.0.0

  9. final class ReadAllAvailable extends ReadLimit

    Represents a ReadLimit where the MicroBatchStream must scan all the data available at the streaming source.

    Represents a ReadLimit where the MicroBatchStream must scan all the data available at the streaming source. This is meant to be a hard specification as being able to return all available data is necessary for Trigger.Once() to work correctly. If a source is unable to scan all available data, then it must throw an error.

    Annotations
    @Evolving()
    Since

    3.0.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

  10. trait ReadLimit extends AnyRef

    Interface representing limits on how much to read from a MicroBatchStream when it implements SupportsAdmissionControl.

    Interface representing limits on how much to read from a MicroBatchStream when it implements SupportsAdmissionControl. There are several child interfaces representing various kinds of limits.

    Annotations
    @Evolving()
    Since

    3.0.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

    ReadAllAvailable

    ReadMaxRows

  11. class ReadMaxBytes extends ReadLimit

    Represents a ReadLimit where the MicroBatchStream should scan files which total size doesn't go beyond a given maximum total size.

    Represents a ReadLimit where the MicroBatchStream should scan files which total size doesn't go beyond a given maximum total size. Always reads at least one file so a stream can make progress and not get stuck on a file larger than a given maximum.

    Annotations
    @Evolving()
    Since

    4.0.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

  12. class ReadMaxFiles extends ReadLimit

    Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of files.

    Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of files.

    Annotations
    @Evolving()
    Since

    3.0.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

  13. final class ReadMaxRows extends ReadLimit

    Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of rows.

    Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of rows.

    Annotations
    @Evolving()
    Since

    3.0.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

  14. final class ReadMinRows extends ReadLimit

    Represents a ReadLimit where the MicroBatchStream should scan approximately at least the given minimum number of rows.

    Represents a ReadLimit where the MicroBatchStream should scan approximately at least the given minimum number of rows.

    Annotations
    @Evolving()
    Since

    3.2.0

    See also

    SupportsAdmissionControl#latestOffset(Offset, ReadLimit)

  15. trait ReportsSinkMetrics extends AnyRef

    A mix-in interface for streaming sinks to signal that they can report metrics.

    A mix-in interface for streaming sinks to signal that they can report metrics.

    Annotations
    @Evolving()
    Since

    3.4.0

  16. trait ReportsSourceMetrics extends SparkDataStream

    A mix-in interface for SparkDataStream streaming sources to signal that they can report metrics.

    A mix-in interface for SparkDataStream streaming sources to signal that they can report metrics.

    Annotations
    @Evolving()
    Since

    3.2.0

  17. trait SparkDataStream extends AnyRef

    The base interface representing a readable data stream in a Spark streaming query.

    The base interface representing a readable data stream in a Spark streaming query. It's responsible to manage the offsets of the streaming source in the streaming query.

    Data sources should implement concrete data stream interfaces: MicroBatchStream and ContinuousStream.

    Annotations
    @Evolving()
    Since

    3.0.0

  18. trait SupportsAdmissionControl extends SparkDataStream

    A mix-in interface for SparkDataStream streaming sources to signal that they can control the rate of data ingested into the system.

    A mix-in interface for SparkDataStream streaming sources to signal that they can control the rate of data ingested into the system. These rate limits can come implicitly from the contract of triggers, e.g. Trigger.Once() requires that a micro-batch process all data available to the system at the start of the micro-batch. Alternatively, sources can decide to limit ingest through data source options.

    Through this interface, a MicroBatchStream should be able to return the next offset that it will process until given a ReadLimit.

    Annotations
    @Evolving()
    Since

    3.0.0

  19. trait SupportsTriggerAvailableNow extends SupportsAdmissionControl

    An interface for streaming sources that supports running in Trigger.AvailableNow mode, which will process all the available data at the beginning of the query in (possibly) multiple batches.

    An interface for streaming sources that supports running in Trigger.AvailableNow mode, which will process all the available data at the beginning of the query in (possibly) multiple batches.

    This mode will have better scalability comparing to Trigger.Once mode.

    Annotations
    @Evolving()
    Since

    3.3.0

Ungrouped