@InterfaceStability.Evolving
public interface StreamWriter extends DataSourceWriter

A DataSourceWriter for use with structured streaming. This writer handles commits and aborts relative to an epoch ID determined by the execution engine.

DataWriter implementations generated by a StreamWriter may be reused for multiple epochs, and so must reset any internal state after a successful commit.
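For example, a minimal sketch of a reusable writer, assuming a hypothetical BufferedRowsMessage commit message and a Row-parameterized DataWriter (this is an illustration, not part of the interface):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical commit message carrying the rows buffered during one epoch.
class BufferedRowsMessage implements WriterCommitMessage {
    final List<Row> rows;
    BufferedRowsMessage(List<Row> rows) { this.rows = rows; }
}

// A writer that is reused across epochs: commit() hands off the buffered
// rows in a message and then resets the buffer, so the next epoch starts
// from a clean state as the contract above requires.
class BufferingDataWriter implements DataWriter<Row> {
    private List<Row> buffer = new ArrayList<>();

    @Override
    public void write(Row record) throws IOException {
        buffer.add(record);
    }

    @Override
    public WriterCommitMessage commit() throws IOException {
        WriterCommitMessage message = new BufferedRowsMessage(buffer);
        buffer = new ArrayList<>();  // reset internal state for the next epoch
        return message;
    }

    @Override
    public void abort() throws IOException {
        buffer = new ArrayList<>();  // drop the failed epoch's buffered rows
    }
}
```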
Modifier and Type | Method and Description |
---|---|
void | abort(long epochId, WriterCommitMessage[] messages): Aborts this writing job because some data writers have failed and keep failing when retried, the Spark job failed for an unknown reason, or commit(WriterCommitMessage[]) failed. |
default void | abort(WriterCommitMessage[] messages): Aborts this writing job because some data writers have failed and keep failing when retried, the Spark job failed for an unknown reason, or DataSourceWriter.commit(WriterCommitMessage[]) failed. |
void | commit(long epochId, WriterCommitMessage[] messages): Commits this writing job for the specified epoch with a list of commit messages. |
default void | commit(WriterCommitMessage[] messages): Commits this writing job with a list of commit messages. |
Methods inherited from interface DataSourceWriter: createWriterFactory
void commit(long epochId, WriterCommitMessage[] messages)

Commits this writing job for the specified epoch with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter.commit().

If this method fails (by throwing an exception), this writing job is considered to have failed, and the execution engine will attempt to call abort(WriterCommitMessage[]).

To support exactly-once processing, writer implementations should ensure that this method is idempotent: the execution engine may call commit() multiple times for the same epoch in some circumstances.
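A minimal sketch of such an idempotent commit, assuming a hypothetical EpochLog that the sink exposes (neither EpochLog nor its methods are part of this API):

```java
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;
import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter;

// EpochLog stands in for a sink-side transaction log that can atomically
// publish an epoch's data and remember the highest committed epoch id.
// createWriterFactory() and abort() are left abstract to keep this short.
abstract class IdempotentStreamWriter implements StreamWriter {
    private final EpochLog log;

    IdempotentStreamWriter(EpochLog log) {
        this.log = log;
    }

    @Override
    public void commit(long epochId, WriterCommitMessage[] messages) {
        // The engine may deliver the same epoch twice, e.g. when a retry
        // races with a driver failure; replaying an already-committed
        // epoch must be a no-op, or data will be duplicated.
        if (log.lastCommittedEpoch() >= epochId) {
            return;
        }
        // Publish the epoch's data and record the epoch id in one atomic
        // step, so a crash cannot leave the two out of sync.
        log.publishAtomically(epochId, messages);
    }

    // Hypothetical sink-side log interface.
    interface EpochLog {
        long lastCommittedEpoch();
        void publishAtomically(long epochId, WriterCommitMessage[] messages);
    }
}
```

Recording the epoch id and the data in one atomic operation is what makes the duplicate-commit check reliable across driver restarts.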
void abort(long epochId, WriterCommitMessage[] messages)

Aborts this writing job because some data writers have failed and keep failing when retried, the Spark job failed for an unknown reason, or commit(WriterCommitMessage[]) failed.

If this method fails (by throwing an exception), the underlying data source may require manual cleanup.

Unless the abort is triggered by the failure of commit, the given messages may contain null slots: only some of the data writers may have committed before the abort happened, or some data writers committed but their commit messages hadn't reached the driver when the abort was triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.
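A best-effort cleanup therefore skips the null slots and acts only on the messages that did arrive. A sketch, assuming a hypothetical file-based commit message with a paths() accessor:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Collects the staging paths named by the non-null commit messages, so an
// abort() implementation can delete only what was actually reported.
// FileCommitMessage and paths() are hypothetical, not part of this API.
final class AbortCleanup {
    interface FileCommitMessage extends WriterCommitMessage {
        List<String> paths();
    }

    static List<String> stagedPaths(WriterCommitMessage[] messages) {
        List<String> paths = new ArrayList<>();
        for (WriterCommitMessage message : messages) {
            if (message == null) {
                continue;  // this writer never committed; nothing to clean up
            }
            paths.addAll(((FileCommitMessage) message).paths());
        }
        return paths;
    }
}
```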
default void commit(WriterCommitMessage[] messages)

Description copied from interface: DataSourceWriter

Commits this writing job with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter.commit().

If this method fails (by throwing an exception), this writing job is considered to have failed, and DataSourceWriter.abort(WriterCommitMessage[]) will be called. The state of the destination is then undefined, and DataSourceWriter.abort(WriterCommitMessage[]) may not be able to deal with it.

Note that one partition may have multiple committed data writers because of speculative tasks. Spark will pick the first successful one and use its commit message. Implementations should be aware of this and handle it correctly, e.g., have a coordinator to make sure only one data writer can commit, or have a way to clean up the data of already-committed writers.

Specified by: commit in interface DataSourceWriter
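One way to take the cleanup option is to treat the received messages as the source of truth and discard any other staged output. A sketch under that assumption, reusing the hypothetical FileCommitMessage from above; the abstract helpers are placeholders, not part of this API:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// The winning attempt's files are named in the commit messages; anything
// else found in the staging directory was written by a losing speculative
// attempt and is removed before publishing.
abstract class SpeculationAwareWriter implements DataSourceWriter {
    interface FileCommitMessage extends WriterCommitMessage {
        List<String> paths();
    }

    @Override
    public void commit(WriterCommitMessage[] messages) {
        Set<String> winning = new HashSet<>();
        for (WriterCommitMessage message : messages) {
            winning.addAll(((FileCommitMessage) message).paths());
        }
        for (String staged : listStagingDir()) {
            if (!winning.contains(staged)) {
                deleteStaged(staged);  // output of a losing speculative attempt
            }
        }
        publish(winning);
    }

    // Hypothetical sink-specific helpers.
    abstract List<String> listStagingDir();
    abstract void deleteStaged(String path);
    abstract void publish(Set<String> paths);
}
```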
default void abort(WriterCommitMessage[] messages)

Description copied from interface: DataSourceWriter

Aborts this writing job because some data writers have failed and keep failing when retried, the Spark job failed for an unknown reason, or DataSourceWriter.commit(WriterCommitMessage[]) failed.

If this method fails (by throwing an exception), the underlying data source may require manual cleanup.

Unless the abort is triggered by the failure of commit, the given messages may contain null slots: only some of the data writers may have committed before the abort happened, or some data writers committed but their commit messages hadn't reached the driver when the abort was triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.

Specified by: abort in interface DataSourceWriter