All Superinterfaces:: CatalogPlugin, TableCatalog

@Evolving public interface StagingTableCatalog extends TableCatalog

An optional mix-in for implementations of TableCatalog that support staging creation of the a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.

It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first drop the table via TableCatalog.dropTable(Identifier), then create the table via TableCatalog.createTable(Identifier, StructType, Transform[], Map), and then perform the write via SupportsWrite.newWriteBuilder(LogicalWriteInfo). However, if the write operation fails, the catalog will have already dropped the table, and the planner cannot roll back the dropping of the table.

If the catalog implements this plugin, the catalog can implement the methods to "stage" the creation and the replacement of a table. After the table's BatchWrite.commit(WriterCommitMessage[]) is called, StagedTable.commitStagedChanges() is called, at which point the staged table can complete both the data write and the metadata swap operation atomically.

Since:: 3.0.0

Field Summary

Fields inherited from interface org.apache.spark.sql.connector.catalog.TableCatalog
OPTION_PREFIX, PROP_COMMENT, PROP_EXTERNAL, PROP_IS_MANAGED_LOCATION, PROP_LOCATION, PROP_OWNER, PROP_PROVIDER
Method Summary

Modifier and Type

Method

Description

default StagedTable

stageCreate(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties)

Stage the creation of a table, preparing it to be committed into the metastore.

StagedTable

stageCreate(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties)

Deprecated.

default StagedTable

stageCreateOrReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties)

Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

StagedTable

stageCreateOrReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties)

Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

default StagedTable

stageReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties)

Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

StagedTable

stageReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties)

Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

Methods inherited from interface org.apache.spark.sql.connector.catalog.CatalogPlugin
defaultNamespace, initialize, name

Methods inherited from interface org.apache.spark.sql.connector.catalog.TableCatalog
alterTable, capabilities, createTable, createTable, dropTable, invalidateTable, listTables, loadTable, loadTable, loadTable, purgeTable, renameTable, tableExists, useNullableQuerySchema

Method Details
- stageCreate
  
  @Deprecated StagedTable stageCreate(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
  
  Deprecated.
  
  Stage the creation of a table, preparing it to be committed into the metastore.
  This is deprecated. Please override stageCreate(Identifier, Column[], Transform[], Map) instead.
  
  Throws:
  
  org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
  
  org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- stageCreate
  
  default StagedTable stageCreate(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
  
  Stage the creation of a table, preparing it to be committed into the metastore.
  When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw an exception accordingly. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by StagedTable.commitStagedChanges().
  
  Parameters:
  
  ident - a table identifier
  
  columns - the column of the new table
  
  partitions - transforms to use for partitioning data in the table
  
  properties - a string map of table properties
  
  Returns:
  
  metadata for the new table
  
  Throws:
  
  org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException - If a table or view already exists for the identifier
  
  UnsupportedOperationException - If a requested partition transform is not supported
  
  org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
- stageReplace
  
  StagedTable stageReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
  
  Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.
  This is deprecated, please override stageReplace(Identifier, StructType, Transform[], Map) instead.
  
  Throws:
  
  org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
  
  org.apache.spark.sql.catalyst.analysis.NoSuchTableException
- stageReplace
  
  default StagedTable stageReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
  
  Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.
  When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
  If the table does not exist, committing the staged changes should fail with NoSuchTableException. This differs from the semantics of stageCreateOrReplace(Identifier, StructType, Transform[], Map), which should create the table in the data source if the table does not exist at the time of committing the operation.
  
  Parameters:
  
  ident - a table identifier
  
  columns - the columns of the new table
  
  partitions - transforms to use for partitioning data in the table
  
  properties - a string map of table properties
  
  Returns:
  
  metadata for the new table
  
  Throws:
  
  UnsupportedOperationException - If a requested partition transform is not supported
  
  org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
  
  org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table does not exist
- stageCreateOrReplace
  
  StagedTable stageCreateOrReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
  
  Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.
  This is deprecated, please override stageCreateOrReplace(Identifier, Column[], Transform[], Map) instead.
  
  Throws:
  
  org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- stageCreateOrReplace
  
  default StagedTable stageCreateOrReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
  
  Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.
  When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
  If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of stageReplace(Identifier, StructType, Transform[], Map), which should fail when the staged changes are committed but the table doesn't exist at commit time.
  
  Parameters:
  
  ident - a table identifier
  
  columns - the columns of the new table
  
  partitions - transforms to use for partitioning data in the table
  
  properties - a string map of table properties
  
  Returns:
  
  metadata for the new table
  
  Throws:
  
  UnsupportedOperationException - If a requested partition transform is not supported
  
  org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)

Interface StagingTableCatalog

Field Summary

Fields inherited from interface org.apache.spark.sql.connector.catalog.TableCatalog

Method Summary

Methods inherited from interface org.apache.spark.sql.connector.catalog.CatalogPlugin

Methods inherited from interface org.apache.spark.sql.connector.catalog.TableCatalog

Method Details

stageCreate

stageCreate

stageReplace

stageReplace

stageCreateOrReplace

stageCreateOrReplace