package streaming
Type Members
- trait StreamingDataWriterFactory extends Serializable
A factory of DataWriter returned by StreamingWrite#createStreamingWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing the actual data writer at the executor side.
Note that the writer factory will be serialized and sent to executors; the data writer is then created on the executors to do the actual writing. So this interface must be serializable, while DataWriter doesn't need to be.
- Annotations
- @Evolving()
- Since
3.0.0
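For illustration, a minimal Scala sketch of an implementation, assuming the hypothetical names CountingWriterFactory and RowCountMessage (neither is part of the Spark API): the factory is constructed on the driver, serialized, and deserialized on each executor, which then asks it for one DataWriter per partition per epoch.

  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}
  import org.apache.spark.sql.connector.write.streaming.StreamingDataWriterFactory

  // Hypothetical commit message: reports how many rows a task wrote.
  case class RowCountMessage(partitionId: Int, epochId: Long, rows: Long)
    extends WriterCommitMessage

  // Hypothetical factory: built on the driver, serialized to executors,
  // then asked for one writer per (partition, epoch).
  class CountingWriterFactory extends StreamingDataWriterFactory {
    override def createWriter(
        partitionId: Int,
        taskId: Long,
        epochId: Long): DataWriter[InternalRow] =
      new DataWriter[InternalRow] {
        private var rows = 0L

        override def write(record: InternalRow): Unit = {
          // The actual writing side effect happens here, on the executor.
          rows += 1
        }

        // Called when all records of this task were written successfully.
        override def commit(): WriterCommitMessage =
          RowCountMessage(partitionId, epochId, rows)

        // Called on failure: discard any partial output of this task.
        override def abort(): Unit = ()

        override def close(): Unit = ()
      }
  }

Keeping the factory itself free of non-serializable state (connections, file handles) and opening such resources inside createWriter is what makes the serialization requirement above easy to satisfy.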
- trait StreamingWrite extends AnyRef
An interface that defines how to write data to a data source in streaming queries.
The writing procedure is:
- Create a writer factory by #createStreamingWriterFactory(PhysicalWriteInfo), serialize it, and send it to all the partitions of the input data (RDD).
- For each epoch in each partition, create the data writer and write the data of the epoch in that partition with this writer. If all the data are written successfully, call DataWriter#commit(). If an exception happens during the writing, call DataWriter#abort().
- If the writers in all partitions of one epoch are successfully committed, call #commit(long, WriterCommitMessage[]). If some writers are aborted, or the job fails for an unknown reason, call #abort(long, WriterCommitMessage[]).
While Spark will retry failed writing tasks, it won't retry failed writing jobs; users who want job-level retries should implement them in their Spark applications.
Please refer to the documentation of commit/abort methods for detailed specifications.
- Annotations
- @Evolving()
- Since
3.0.0
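A minimal Scala sketch of this procedure, assuming the hypothetical CountingStreamingWrite below and the CountingWriterFactory/RowCountMessage sketched under StreamingDataWriterFactory above (none of these names is part of the Spark API):

  import org.apache.spark.sql.connector.write.{PhysicalWriteInfo, WriterCommitMessage}
  import org.apache.spark.sql.connector.write.streaming.{StreamingDataWriterFactory, StreamingWrite}

  class CountingStreamingWrite extends StreamingWrite {

    // Step 1 of the procedure: runs on the driver; the returned factory
    // is serialized and shipped to every partition of the input RDD.
    override def createStreamingWriterFactory(
        info: PhysicalWriteInfo): StreamingDataWriterFactory = {
      // info.numPartitions() says how many writer tasks will be launched.
      new CountingWriterFactory
    }

    // Step 3, success path: every partition of this epoch committed.
    // Finalize the epoch atomically, e.g. record epochId in a metadata
    // log so a replayed epoch can be detected and skipped.
    override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
      val total = messages.collect { case m: RowCountMessage => m.rows }.sum
      println(s"epoch $epochId committed: $total rows")
    }

    // Step 3, failure path: clean up whatever partial output the epoch
    // left behind.
    override def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = ()
  }

Because failed jobs are not retried automatically, making commit idempotent per epochId (as the metadata-log comment suggests) is the usual way sinks achieve exactly-once behavior under manual retries.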