package write
Type Members
-    trait BatchWrite extends AnyRef
  An interface that defines how to write data to a data source for batch processing. The writing procedure is:
  - Create a writer factory by #createBatchWriterFactory(PhysicalWriteInfo), serialize it, and send it to all the partitions of the input data (RDD).
  - For each partition, create a data writer and write the data of the partition with this writer. If all the data is written successfully, call DataWriter#commit(). If an exception happens during writing, call DataWriter#abort().
  - If all writers committed successfully, call #commit(WriterCommitMessage[]). If some writers aborted, or the job failed for an unknown reason, call #abort(WriterCommitMessage[]).
  While Spark will retry failed writing tasks, it won't retry failed writing jobs; users should do that manually in their Spark applications if they want to. Please refer to the documentation of the commit/abort methods for detailed specifications. A sketch of this procedure follows this entry.
- Annotations
- @Evolving()
- Since
- 3.0.0 
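  A minimal sketch of the three steps above. ExampleWriterFactory is the hypothetical factory sketched under DataWriterFactory below, and the commit/abort bodies are placeholders for real staging and publish logic:

```scala
import org.apache.spark.sql.connector.write._

// Hypothetical sink: tasks stage output under `stagingDir`; the driver
// publishes or discards the staged files when the job finishes.
class ExampleBatchWrite(stagingDir: String) extends BatchWrite {

  // Step 1: the returned factory is serialized and sent to every input partition.
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
    new ExampleWriterFactory(stagingDir)

  // Step 3, success: every task writer committed; publish the staged output.
  override def commit(messages: Array[WriterCommitMessage]): Unit = {
    // e.g. atomically move the files named in `messages` to their final location
  }

  // Step 3, failure: some writers aborted or the job failed for an unknown
  // reason; discard staged output, including that of already-committed tasks.
  override def abort(messages: Array[WriterCommitMessage]): Unit = {
    // e.g. delete everything under stagingDir
  }
}
```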
 
-    trait DataWriter[T] extends Closeable
  A data writer returned by DataWriterFactory#createWriter(int, long), responsible for writing data for an input RDD partition.
  One Spark task has one exclusive data writer, so there is no thread-safety concern. #write(Object) is called for each record in the input RDD partition. If one record fails #write(Object), #abort() is called afterwards and the remaining records will not be processed. If all records are successfully written, #commit() is called.
  Once a data writer returns successfully from #commit() or #abort(), Spark will call #close() to let the data writer clean up resources. After #close() is called, its lifecycle is over and Spark will not use it again.
  If this data writer succeeds (all records are written successfully and #commit() succeeds), a WriterCommitMessage will be sent to the driver side and passed to BatchWrite#commit(WriterCommitMessage[]) along with the commit messages from other data writers. If this data writer fails (one record fails to write or #commit() fails), an exception will be sent to the driver side, and Spark may retry this writing task a few times. In each retry, DataWriterFactory#createWriter(int, long) will receive a different taskId. Spark calls BatchWrite#abort(WriterCommitMessage[]) when the configured number of retries is exhausted.
  Besides the retry mechanism, Spark may launch speculative tasks if an existing writing task takes too long to finish. Unlike retried tasks, which are launched one by one after the previous one fails, speculative tasks run simultaneously. It is possible for one input RDD partition to have multiple data writers with different taskIds running at the same time, and data sources must guarantee that these writers don't conflict and can work together. Implementations can coordinate with the driver during #commit() to make sure only one of these writers can commit successfully, or they can allow all of them to commit and provide a way to revert committed writers without the commit message, because Spark only accepts the commit message that arrives first and ignores the others. A lifecycle sketch follows this entry.
  Note that currently the type T can only be org.apache.spark.sql.catalyst.InternalRow.
- Annotations
- @Evolving()
- Since
- 3.0.0 
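  A sketch of the write/commit/abort/close lifecycle above. The single string column, file-based staging, and TaskCommitted message are all hypothetical:

```scala
import java.io.{BufferedWriter, File, FileWriter}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

// WriterCommitMessage is a marker interface; sources define their own message.
case class TaskCommitted(stagedFile: String) extends WriterCommitMessage

// Hypothetical writer that stages one partition's records in a task-private file.
class ExampleDataWriter(stagedPath: String) extends DataWriter[InternalRow] {
  private val staged = new File(stagedPath)
  private val out = new BufferedWriter(new FileWriter(staged))

  // Called once per record; if this throws, Spark calls abort() on this writer.
  override def write(record: InternalRow): Unit =
    out.write(record.getString(0) + "\n") // assumes a single string column

  // All records written: report the staged file back to the driver.
  override def commit(): WriterCommitMessage = {
    out.flush()
    TaskCommitted(staged.getAbsolutePath)
  }

  // A record or commit() failed: drop whatever this task staged.
  override def abort(): Unit = { staged.delete() }

  // Always called after commit() or abort(): release local resources.
  override def close(): Unit = out.close()
}
```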
 
-    trait DataWriterFactory extends Serializable
  A factory of DataWriter returned by BatchWrite#createBatchWriterFactory(PhysicalWriteInfo), responsible for creating and initializing the actual data writer at the executor side.
  Note that the writer factory will be serialized and sent to executors, and the data writer will then be created on the executors to do the actual writing. So this interface must be serializable, while DataWriter doesn't need to be. A sketch follows this entry.
- Annotations
- @Evolving()
- Since
- 3.0.0 
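  A sketch matching the hypothetical ExampleDataWriter above; only cheap, serializable state (a path) crosses the wire, while the writer itself is created on the executor:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, DataWriterFactory}

// Hypothetical factory; it is shipped to executors, so it holds only a String.
class ExampleWriterFactory(stagingDir: String) extends DataWriterFactory {
  override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] =
    // taskId differs across retries and speculative attempts of one partition,
    // so each attempt stages into its own file.
    new ExampleDataWriter(s"$stagingDir/part-$partitionId-$taskId")
}
```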
 
-    trait DeltaBatchWrite extends BatchWrite
  An interface that defines how to write a delta of rows during batch processing.
- Annotations
- @Experimental()
- Since
- 3.4.0 
 
-    trait DeltaWrite extends Write
  A logical representation of a data source write that handles a delta of rows.
- Annotations
- @Experimental()
- Since
- 3.4.0 
 
-    trait DeltaWriteBuilder extends WriteBuilder
  An interface for building a DeltaWrite.
- Annotations
- @Experimental()
- Since
- 3.4.0 
 
-    trait DeltaWriter[T] extends DataWriter[T]
  A data writer returned by DeltaWriterFactory#createWriter(int, long), responsible for writing a delta of rows. A sketch follows this entry.
- Annotations
- @Experimental()
- Since
- 3.4.0 
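  A sketch of a delta writer, assuming InternalRow ids with a single string column; the in-memory buffers and DeltaCommitted message are hypothetical stand-ins for real file output:

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DeltaWriter, WriterCommitMessage}

case class DeltaCommitted(deleted: Int, written: Int) extends WriterCommitMessage

// Hypothetical delta writer: Spark routes each row to delete/update/insert
// instead of the single write(...) path of a plain DataWriter.
class ExampleDeltaWriter extends DeltaWriter[InternalRow] {
  private val deletedIds = ArrayBuffer.empty[String]
  private val newRows = ArrayBuffer.empty[String]

  // `id` identifies the existing row; `metadata` carries source-specific row
  // metadata (e.g. which file or shard the row came from).
  override def delete(metadata: InternalRow, id: InternalRow): Unit =
    deletedIds += id.getString(0)

  override def update(metadata: InternalRow, id: InternalRow, row: InternalRow): Unit = {
    deletedIds += id.getString(0) // an update is a delete plus a rewrite
    newRows += row.getString(0)
  }

  override def insert(row: InternalRow): Unit =
    newRows += row.getString(0)

  // Plain writes are treated as inserts.
  override def write(row: InternalRow): Unit = insert(row)

  override def commit(): WriterCommitMessage = DeltaCommitted(deletedIds.size, newRows.size)
  override def abort(): Unit = { deletedIds.clear(); newRows.clear() }
  override def close(): Unit = ()
}
```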
 
-    trait DeltaWriterFactory extends DataWriterFactory
  A factory for creating DeltaWriters returned by DeltaBatchWrite#createBatchWriterFactory(PhysicalWriteInfo), responsible for creating and initializing writers at the executor side.
- Annotations
- @Experimental()
- Since
- 3.4.0 
 
-    trait LogicalWriteInfo extends AnyRef
  This interface contains logical write information that data sources can use when generating a WriteBuilder.
- Annotations
- @Evolving()
- Since
- 3.0.0 
 
-    trait MergeSummary extends WriteSummary
  Provides an informational summary of the MERGE operation producing the write.
- Annotations
- @Evolving()
- Since
- 4.1.0 
 
-    trait PhysicalWriteInfo extends AnyRef
  This interface contains physical write information that data sources can use when generating a DataWriterFactory or a StreamingDataWriterFactory.
- Annotations
- @Evolving()
- Since
- 3.0.0 
 
-    trait RequiresDistributionAndOrdering extends Write
  A write that requires a specific distribution and ordering of data. A sketch follows this entry.
- Annotations
- @Experimental()
- Since
- 3.2.0 
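  A sketch of a write asking Spark to cluster incoming rows by a hypothetical "date" column before they reach the data writers, with no required in-partition order; it reuses the hypothetical ExampleBatchWrite from above:

```scala
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions, SortOrder}
import org.apache.spark.sql.connector.write.{BatchWrite, RequiresDistributionAndOrdering}

// Hypothetical: all rows sharing a "date" value land in the same task, so each
// writer produces at most one staged file per date value.
class ClusteredWrite extends RequiresDistributionAndOrdering {
  override def requiredDistribution(): Distribution =
    Distributions.clustered(Array[Expression](Expressions.column("date")))

  // No ordering inside each partition is required by this sink.
  override def requiredOrdering(): Array[SortOrder] = Array.empty

  override def toBatch(): BatchWrite = new ExampleBatchWrite("/tmp/example-staging")
}
```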
 
-    trait RowLevelOperation extends AnyRef
  A logical representation of a data source DELETE, UPDATE, or MERGE operation that requires rewriting data.
- Annotations
- @Experimental()
- Since
- 3.3.0 
 
-    trait RowLevelOperationBuilder extends AnyRef
  An interface for building a RowLevelOperation.
- Annotations
- @Experimental()
- Since
- 3.3.0 
 
-    trait RowLevelOperationInfo extends AnyRef
  An interface with logical information for a row-level operation such as DELETE, UPDATE, or MERGE.
- Annotations
- @Experimental()
- Since
- 3.3.0 
 
-    trait SupportsDelta extends RowLevelOperation
  A mix-in interface for RowLevelOperation. Data sources can implement this interface to indicate they support handling deltas of rows.
- Annotations
- @Experimental()
- Since
- 3.4.0 
 
-    trait SupportsDynamicOverwrite extends WriteBuilder
  Write builder trait for tables that support dynamic partition overwrite.
  A write that dynamically overwrites partitions removes all existing data in each logical partition for which the write will commit new data. Any existing logical partition for which the write does not contain data will remain unchanged. This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use the overwrite-by-filter API to explicitly replace data.
- Annotations
- @Evolving()
- Since
- 3.0.0 
 
-    trait SupportsOverwrite extends SupportsOverwriteV2
  Write builder trait for tables that support overwrite by filter. Overwriting data by filter will delete any data that matches the filter and replace it with data that is committed in the write. A sketch follows this entry.
- Annotations
- @Evolving()
- Since
- 3.0.0 
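  A sketch of a builder implementing overwrite by filter; the stored filters and the build() body are hypothetical, and ExampleWrite is sketched under Write below:

```scala
import org.apache.spark.sql.connector.write.{SupportsOverwrite, Write, WriteBuilder}
import org.apache.spark.sql.sources.Filter

// Hypothetical builder: Spark hands over the filters of an overwrite operation
// (e.g. EqualTo("date", "2024-01-01")); build() must produce a Write that first
// deletes the matching rows and then appends the new data.
class ExampleWriteBuilder extends SupportsOverwrite {
  private var deleteFilters: Array[Filter] = Array.empty

  override def overwrite(filters: Array[Filter]): WriteBuilder = {
    deleteFilters = filters
    this
  }

  override def build(): Write = {
    // hypothetical: a Write whose commit deletes by deleteFilters, then writes
    new ExampleWrite
  }
}
```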
 
-    trait SupportsOverwriteV2 extends WriteBuilder with SupportsTruncate
  Write builder trait for tables that support overwrite by filter. Overwriting data by filter will delete any data that matches the filter and replace it with data that is committed in the write.
- Annotations
- @Evolving()
- Since
- 3.4.0 
 
-    trait SupportsTruncate extends WriteBuilder
  Write builder trait for tables that support truncation. Truncation removes all data in a table and replaces it with data that is committed in the write.
- Annotations
- @Evolving()
- Since
- 3.0.0 
 
-    trait V1Write extends Write
  A logical write that should be executed using the V1 InsertableRelation interface. Tables that have TableCapability#V1_BATCH_WRITE in the list of their capabilities must build a V1Write. A sketch follows this entry.
- Annotations
- @Unstable()
- Since
- 3.2.0 
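  A sketch of the V1 bridge; the insert body is a placeholder for a source's existing V1 writing code:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.connector.write.V1Write
import org.apache.spark.sql.sources.InsertableRelation

// Hypothetical bridge for a source that still has a V1 writing path: the table
// reports TableCapability#V1_BATCH_WRITE and builds this instead of a BatchWrite.
class ExampleV1Write extends V1Write {
  override def toInsertableRelation(): InsertableRelation =
    new InsertableRelation {
      override def insert(data: DataFrame, overwrite: Boolean): Unit = {
        // reuse the source's existing V1 writing code here
      }
    }
}
```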
 
-    trait Write extends AnyRef
  A logical representation of a data source write. This logical representation is shared between batch and streaming writes. Data sources must implement the corresponding methods in this interface to match what the table promises to support. For example, #toBatch() must be implemented if the Table that creates this Write returns TableCapability#BATCH_WRITE support in its Table#capabilities(). A sketch follows this entry.
- Annotations
- @Evolving()
- Since
- 3.2.0 
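  A sketch of a batch-only Write, reusing the hypothetical ExampleBatchWrite from above; toStreaming() keeps its default unsupported behavior for tables without streaming capability:

```scala
import org.apache.spark.sql.connector.write.{BatchWrite, Write}

// Hypothetical: the table advertises TableCapability#BATCH_WRITE, so only
// toBatch() is overridden.
class ExampleWrite extends Write {
  override def description(): String = "example-sink"
  override def toBatch(): BatchWrite = new ExampleBatchWrite("/tmp/example-staging")
}
```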
 
-    trait WriteBuilder extends AnyRef
  An interface for building the Write.
-    trait WriteSummary extends AnyRef
  An informational summary of the operation producing the write.
- Annotations
- @Evolving()
- Since
- 4.1.0 
 
-    trait WriterCommitMessage extends Serializable
  A commit message returned by DataWriter#commit() that will be sent back to the driver side as the input parameter of BatchWrite#commit(WriterCommitMessage[]) or StreamingWrite#commit(long, WriterCommitMessage[]).
  This is an empty interface; data sources should define their own message class and use it when generating messages at the executor side and handling the messages at the driver side. A sketch follows this entry.
- Annotations
- @Evolving()
- Since
- 3.0.0 
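  A sketch of the pattern: a source-defined message type created on executors, then aggregated on the driver (all names hypothetical):

```scala
import org.apache.spark.sql.connector.write.WriterCommitMessage

// The marker interface carries no data, so the source defines what a task
// reports back, e.g. staged files plus a row count.
case class FilesCommitted(taskFiles: Seq[String], rowCount: Long)
  extends WriterCommitMessage

object FilesCommitted {
  // Driver side: BatchWrite#commit receives one message per successful task
  // and can aggregate them before publishing the result.
  def summarize(messages: Array[WriterCommitMessage]): (Seq[String], Long) = {
    val all = messages.collect { case m: FilesCommitted => m }
    (all.flatMap(_.taskFiles).toSeq, all.map(_.rowCount).sum)
  }
}
```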
 
Value Members
-  object LogicalWriteInfoImpl extends Serializable