DataSourceWriter (Spark 2.3.0 JavaDoc)

All Known Subinterfaces:

StreamWriter, SupportsWriteInternalRow
```
@InterfaceStability.Evolving
public interface DataSourceWriter
```
A data source writer that is returned by WriteSupport.createWriter(String, StructType, SaveMode, DataSourceOptions)/ StreamWriteSupport.createStreamWriter( String, StructType, OutputMode, DataSourceOptions). It can mix in various writing optimization interfaces to speed up the data saving. The actual writing logic is delegated to DataWriter. If an exception was throw when applying any of these writing optimizations, the action would fail and no Spark job was submitted. The writing procedure is: 1. Create a writer factory by createWriterFactory(), serialize and send it to all the partitions of the input data(RDD). 2. For each partition, create the data writer, and write the data of the partition with this writer. If all the data are written successfully, call DataWriter.commit(). If exception happens during the writing, call DataWriter.abort(). 3. If all writers are successfully committed, call commit(WriterCommitMessage[]). If some writers are aborted, or the job failed with an unknown reason, call abort(WriterCommitMessage[]). While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should do it manually in their Spark applications if they want to retry. Please refer to the documentation of commit/abort methods for detailed specifications.

Method Summary

All Methods Instance Methods Abstract Methods
Modifier and Type	Method and Description
`void`	`abort(WriterCommitMessage[] messages)` Aborts this writing job because some data writers are failed and keep failing when retry, or the Spark job fails with some unknown reasons, or `commit(WriterCommitMessage[])` fails.
`void`	`commit(WriterCommitMessage[] messages)` Commits this writing job with a list of commit messages.
`DataWriterFactory<Row>`	`createWriterFactory()` Creates a writer factory which will be serialized and sent to executors.

- Method Detail
  - createWriterFactory
```
DataWriterFactory<Row> createWriterFactory()
```
    Creates a writer factory which will be serialized and sent to executors. If this method fails (by throwing an exception), the action would fail and no Spark job was submitted.
  - commit
```
void commit(WriterCommitMessage[] messages)
```
    Commits this writing job with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter.commit(). If this method fails (by throwing an exception), this writing job is considered to to have been failed, and abort(WriterCommitMessage[]) would be called. The state of the destination is undefined and @abort(WriterCommitMessage[]) may not be able to deal with it. Note that, one partition may have multiple committed data writers because of speculative tasks. Spark will pick the first successful one and get its commit message. Implementations should be aware of this and handle it correctly, e.g., have a coordinator to make sure only one data writer can commit, or have a way to clean up the data of already-committed writers.
  - abort
```
void abort(WriterCommitMessage[] messages)
```
    Aborts this writing job because some data writers are failed and keep failing when retry, or the Spark job fails with some unknown reasons, or commit(WriterCommitMessage[]) fails. If this method fails (by throwing an exception), the underlying data source may require manual cleanup. Unless the abort is triggered by the failure of commit, the given messages should have some null slots as there maybe only a few data writers that are committed before the abort happens, or some data writers were committed but their commit messages haven't reached the driver when the abort is triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.

Interface DataSourceWriter

Method Summary

Method Detail

createWriterFactory

commit

abort