@InterfaceStability.Evolving
public interface DataWriterFactory<T>
extends java.io.Serializable
A factory of DataWriter returned by DataSourceWriter.createWriterFactory(), which is responsible for creating and initializing the actual data writer at the executor side.
Note that the writer factory will be serialized and sent to executors, then the data writer will be created on executors and do the actual writing. So DataWriterFactory must be serializable, while DataWriter doesn't need to be.

Modifier and Type | Method and Description
---|---
DataWriter<T> | createDataWriter(int partitionId, int attemptNumber) Returns a data writer to do the actual writing work.
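The driver-to-executor hop explains the serializability split: only the factory crosses the wire, so it should hold plain configuration, while the writer (which may own open streams) is built on arrival. A minimal sketch, using simplified stand-in interfaces rather than the real Spark API, and a hypothetical TextFileWriterFactory:

```java
import java.io.*;

// Simplified stand-ins for the Spark interfaces (illustration only).
interface DataWriter<T> {
    void write(T record) throws IOException;
}

interface DataWriterFactory<T> extends Serializable {
    DataWriter<T> createDataWriter(int partitionId, int attemptNumber);
}

// The factory carries only serializable configuration; the writer itself
// is created lazily on the executor side.
class TextFileWriterFactory implements DataWriterFactory<String> {
    private final String outputDir;  // plain String: safe to serialize

    TextFileWriterFactory(String outputDir) { this.outputDir = outputDir; }

    @Override
    public DataWriter<String> createDataWriter(int partitionId, int attemptNumber) {
        String path = outputDir + "/part-" + partitionId + "-attempt-" + attemptNumber;
        return record -> { /* append record to path; omitted */ };
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        TextFileWriterFactory factory = new TextFileWriterFactory("/tmp/out");

        // Simulate the driver -> executor hop: round-trip the factory
        // through Java serialization, then create the writer "on the executor".
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(factory);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            DataWriterFactory<String> onExecutor =
                (DataWriterFactory<String>) in.readObject();
            DataWriter<String> writer = onExecutor.createDataWriter(0, 0);
            System.out.println("factory survived serialization: " + (writer != null));
        }
    }
}
```

If the factory captured a non-serializable field (a file handle, a connection), serialization would fail on the driver, which is why such resources belong in the writer, not the factory.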
DataWriter<T> createDataWriter(int partitionId, int attemptNumber)

Parameters:

partitionId - A unique id of the RDD partition that the returned writer will process. Usually Spark processes many RDD partitions at the same time; implementations should use the partition id to distinguish writers for different partitions.

attemptNumber - Spark may launch multiple tasks with the same task id. For example, if a task fails, Spark launches a new task with the same task id but a different attempt number; if a task is too slow, Spark launches new tasks with the same task id but different attempt numbers, which means multiple tasks with the same task id may run at the same time. Implementations can use this attempt number to distinguish writers of different task attempts.
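Because speculative execution can run several attempts of the same partition concurrently, a common approach is to fold both parameters into the output name so attempts never clobber each other. A small sketch with a hypothetical pathFor helper (the naming scheme is an assumption, not Spark's actual convention):

```java
// Hypothetical helper: derive a collision-free output path from the
// (partitionId, attemptNumber) pair.
public class AttemptPaths {
    static String pathFor(String dir, int partitionId, int attemptNumber) {
        return String.format("%s/part-%05d-attempt-%d", dir, partitionId, attemptNumber);
    }

    public static void main(String[] args) {
        // Two concurrent attempts of partition 3 must not share a file.
        String a = pathFor("/tmp/out", 3, 0);
        String b = pathFor("/tmp/out", 3, 1);
        System.out.println(a);
        System.out.println(b);
        System.out.println("distinct: " + !a.equals(b));
    }
}
```

A commit protocol then decides which attempt's output is kept; the point here is only that the two writers produced by createDataWriter(3, 0) and createDataWriter(3, 1) target distinct destinations.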