Interface StagingTableCatalog
- All Superinterfaces:
CatalogPlugin
,TableCatalog
TableCatalog
that support staging creation of
the a table before committing the table's metadata along with its contents in CREATE TABLE AS
SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS
SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE
TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first
drop the table via TableCatalog.dropTable(Identifier)
, then create the table via
TableCatalog.createTable(Identifier, StructType, Transform[], Map)
, and then perform
the write via SupportsWrite.newWriteBuilder(LogicalWriteInfo)
.
However, if the write operation fails, the catalog will have already dropped the table, and the
planner cannot roll back the dropping of the table.
If the catalog implements this plugin, the catalog can implement the methods to "stage" the
creation and the replacement of a table. After the table's
BatchWrite.commit(WriterCommitMessage[])
is called,
StagedTable.commitStagedChanges()
is called, at which point the staged table can
complete both the data write and the metadata swap operation atomically.
- Since:
- 3.0.0
-
Field Summary
Fields inherited from interface org.apache.spark.sql.connector.catalog.TableCatalog
OPTION_PREFIX, PROP_COMMENT, PROP_EXTERNAL, PROP_IS_MANAGED_LOCATION, PROP_LOCATION, PROP_OWNER, PROP_PROVIDER
-
Method Summary
Modifier and TypeMethodDescriptiondefault StagedTable
stageCreate
(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) Stage the creation of a table, preparing it to be committed into the metastore.stageCreate
(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) Deprecated.default StagedTable
stageCreateOrReplace
(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.stageCreateOrReplace
(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.default StagedTable
stageReplace
(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) Stage the replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.stageReplace
(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) Stage the replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.Methods inherited from interface org.apache.spark.sql.connector.catalog.CatalogPlugin
defaultNamespace, initialize, name
Methods inherited from interface org.apache.spark.sql.connector.catalog.TableCatalog
alterTable, capabilities, createTable, createTable, dropTable, invalidateTable, listTables, loadTable, loadTable, loadTable, purgeTable, renameTable, tableExists, useNullableQuerySchema
-
Method Details
-
stageCreate
@Deprecated StagedTable stageCreate(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceExceptionDeprecated.Stage the creation of a table, preparing it to be committed into the metastore.This is deprecated. Please override
stageCreate(Identifier, Column[], Transform[], Map)
instead.- Throws:
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
-
stageCreate
default StagedTable stageCreate(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceExceptionStage the creation of a table, preparing it to be committed into the metastore.When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw an exception accordingly. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by
StagedTable.commitStagedChanges()
.- Parameters:
ident
- a table identifiercolumns
- the column of the new tablepartitions
- transforms to use for partitioning data in the tableproperties
- a string map of table properties- Returns:
- metadata for the new table
- Throws:
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
- If a table or view already exists for the identifierUnsupportedOperationException
- If a requested partition transform is not supportedorg.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- If the identifier namespace does not exist (optional)
-
stageReplace
StagedTable stageReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableExceptionStage the replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.This is deprecated, please override
stageReplace(Identifier, StructType, Transform[], Map)
instead.- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
-
stageReplace
default StagedTable stageReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableExceptionStage the replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
If the table does not exist, committing the staged changes should fail with
NoSuchTableException
. This differs from the semantics ofstageCreateOrReplace(Identifier, StructType, Transform[], Map)
, which should create the table in the data source if the table does not exist at the time of committing the operation.- Parameters:
ident
- a table identifiercolumns
- the columns of the new tablepartitions
- transforms to use for partitioning data in the tableproperties
- a string map of table properties- Returns:
- metadata for the new table
- Throws:
UnsupportedOperationException
- If a requested partition transform is not supportedorg.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- If the identifier namespace does not exist (optional)org.apache.spark.sql.catalyst.analysis.NoSuchTableException
- If the table does not exist
-
stageCreateOrReplace
StagedTable stageCreateOrReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceExceptionStage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.This is deprecated, please override
stageCreateOrReplace(Identifier, Column[], Transform[], Map)
instead.- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
-
stageCreateOrReplace
default StagedTable stageCreateOrReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceExceptionStage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table'sStagedTable.commitStagedChanges()
is called.When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of
stageReplace(Identifier, StructType, Transform[], Map)
, which should fail when the staged changes are committed but the table doesn't exist at commit time.- Parameters:
ident
- a table identifiercolumns
- the columns of the new tablepartitions
- transforms to use for partitioning data in the tableproperties
- a string map of table properties- Returns:
- metadata for the new table
- Throws:
UnsupportedOperationException
- If a requested partition transform is not supportedorg.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- If the identifier namespace does not exist (optional)
-