Interface StagingTableCatalog

All Superinterfaces:
CatalogPlugin, TableCatalog

@Evolving public interface StagingTableCatalog extends TableCatalog
An optional mix-in for implementations of TableCatalog that support staging the creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.

It is highly recommended to implement this interface whenever possible so that CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE TABLE AS SELECT operation and the catalog does not implement this interface, the planner first drops the table via TableCatalog.dropTable(Identifier), then creates the table via TableCatalog.createTable(Identifier, StructType, Transform[], Map), and then performs the write via SupportsWrite.newWriteBuilder(LogicalWriteInfo). If the write operation fails, the catalog has already dropped the table, and the planner cannot roll back the drop.
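
The failure mode described above can be illustrated with a toy model. The sketch below does not use Spark's classes: NonAtomicReplaceSketch, its in-memory catalog map, and replaceTableAsSelect are hypothetical names that only mimic the drop, create, write order of the fallback plan.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a map from table name to rows stands in for a catalog.
// None of these names are part of the Spark API.
class NonAtomicReplaceSketch {
    final Map<String, List<String>> catalog = new HashMap<>();

    // Mimics the fallback plan: drop, then create, then write.
    void replaceTableAsSelect(String name, List<String> rows, boolean writeFails) {
        catalog.remove(name);                 // step 1: drop the existing table
        catalog.put(name, List.of());         // step 2: create the new, empty table
        if (writeFails) {
            // step 3 fails: the original rows are already gone
            throw new IllegalStateException("write failed");
        }
        catalog.put(name, List.copyOf(rows)); // step 3: write the query result
    }

    public static void main(String[] args) {
        NonAtomicReplaceSketch sketch = new NonAtomicReplaceSketch();
        sketch.catalog.put("t", List.of("old-row"));
        try {
            sketch.replaceTableAsSelect("t", List.of("new-row"), true);
        } catch (IllegalStateException e) {
            // The old contents cannot be restored at this point.
            System.out.println("rows after failed replace: " + sketch.catalog.get("t"));
        }
    }
}
```

In this model the failed write leaves an empty table behind, and the previous rows are unrecoverable because nothing recorded them before the drop.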

If the catalog implements this interface, it can implement the methods to "stage" the creation and the replacement of a table. After the table's BatchWrite.commit(WriterCommitMessage[]) is called, StagedTable.commitStagedChanges() is called, at which point the staged table can complete both the data write and the metadata swap operation atomically.
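
As a rough illustration of the staged flow, the following toy model buffers writes and publishes them in one step. It is a sketch under stated assumptions, not Spark code: StagedReplaceSketch and its nested Staged class are hypothetical, and the commitStagedChanges and abortStagedChanges methods here only borrow the names of the corresponding StagedTable operations.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and not part of the Spark API.
class StagedReplaceSketch {
    final Map<String, List<String>> catalog = new HashMap<>();

    // Stands in for a staged table: rows are buffered until commit.
    final class Staged {
        private final String name;
        private List<String> pending = List.of();

        Staged(String name) {
            this.name = name;
        }

        void write(List<String> rows) {
            pending = List.copyOf(rows);
        }

        // Plays the role of commitStagedChanges(): a single put publishes the new table.
        void commitStagedChanges() {
            catalog.put(name, pending);
        }

        // Plays the role of an abort: drop the buffer and leave the catalog untouched.
        void abortStagedChanges() {
            pending = List.of();
        }
    }

    Staged stageReplace(String name) {
        return new Staged(name);
    }

    public static void main(String[] args) {
        StagedReplaceSketch sketch = new StagedReplaceSketch();
        sketch.catalog.put("t", List.of("old-row"));

        // A failed write is aborted, so readers still see the old table.
        Staged failed = sketch.stageReplace("t");
        failed.abortStagedChanges();
        System.out.println("after abort: " + sketch.catalog.get("t"));

        // A successful write becomes visible only at commit time.
        Staged ok = sketch.stageReplace("t");
        ok.write(List.of("new-row"));
        System.out.println("before commit: " + sketch.catalog.get("t"));
        ok.commitStagedChanges();
        System.out.println("after commit: " + sketch.catalog.get("t"));
    }
}
```

The design point is that nothing touches the visible table until the commit call, so a failure at any earlier step leaves the original table intact.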

Since:
3.0.0
  • Method Details

    • stageCreate

      @Deprecated StagedTable stageCreate(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
      Deprecated.
      Stage the creation of a table, preparing it to be committed into the metastore.

      This is deprecated. Please override stageCreate(Identifier, Column[], Transform[], Map) instead.

      Throws:
      org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
      org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
    • stageCreate

      default StagedTable stageCreate(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
      Stage the creation of a table, preparing it to be committed into the metastore.

      When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw TableAlreadyExistsException. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by StagedTable.commitStagedChanges().

      Parameters:
      ident - a table identifier
      columns - the columns of the new table
      partitions - transforms to use for partitioning data in the table
      properties - a string map of table properties
      Returns:
      metadata for the new table
      Throws:
      org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException - If a table or view already exists for the identifier
      UnsupportedOperationException - If a requested partition transform is not supported
      org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
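
The two existence checks described for stageCreate (one at staging time, one at commit time) can be sketched with a toy model. This is not Spark code: StageCreateSketch and its map-based catalog are hypothetical, the returned Runnable stands in for the staged table's commit step, and IllegalStateException stands in for TableAlreadyExistsException.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model only: hypothetical names, and IllegalStateException stands in
// for TableAlreadyExistsException.
class StageCreateSketch {
    final ConcurrentMap<String, List<String>> catalog = new ConcurrentHashMap<>();

    // Returns the commit action for a staged create.
    Runnable stageCreate(String name, List<String> rows) {
        // Check 1: reject the request if the table exists at staging time.
        if (catalog.containsKey(name)) {
            throw new IllegalStateException("table already exists: " + name);
        }
        List<String> staged = List.copyOf(rows);
        return () -> {
            // Check 2: putIfAbsent detects a table created since staging.
            if (catalog.putIfAbsent(name, staged) != null) {
                throw new IllegalStateException("table created concurrently: " + name);
            }
        };
    }

    public static void main(String[] args) {
        StageCreateSketch sketch = new StageCreateSketch();
        Runnable commit = sketch.stageCreate("t", List.of("row"));
        // Simulate another process creating the table before the commit.
        sketch.catalog.put("t", List.of("other"));
        try {
            commit.run();
        } catch (IllegalStateException e) {
            System.out.println("commit rejected: " + e.getMessage());
        }
    }
}
```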
    • stageReplace

      StagedTable stageReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
      Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

      This is deprecated. Please override stageReplace(Identifier, Column[], Transform[], Map) instead.

      Throws:
      org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
      org.apache.spark.sql.catalyst.analysis.NoSuchTableException
    • stageReplace

      default StagedTable stageReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
      Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

      When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyway or abort the commit operation.

      If the table does not exist, committing the staged changes should fail with NoSuchTableException. This differs from the semantics of stageCreateOrReplace(Identifier, Column[], Transform[], Map), which should create the table in the data source if the table does not exist at the time of committing the operation.

      Parameters:
      ident - a table identifier
      columns - the columns of the new table
      partitions - transforms to use for partitioning data in the table
      properties - a string map of table properties
      Returns:
      metadata for the new table
      Throws:
      UnsupportedOperationException - If a requested partition transform is not supported
      org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
      org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table does not exist
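
The commit-time rule for stageReplace (fail when the table is missing) can be sketched with a toy model. This is not Spark code: StageReplaceCommitSketch and commitReplace are hypothetical names, and IllegalStateException stands in for NoSuchTableException.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model only: hypothetical names, and IllegalStateException stands in
// for NoSuchTableException.
class StageReplaceCommitSketch {
    final ConcurrentMap<String, List<String>> catalog = new ConcurrentHashMap<>();

    // Commit step of a staged replace: succeeds only if the table still exists.
    void commitReplace(String name, List<String> rows) {
        // ConcurrentMap.replace returns null when there is no current mapping.
        if (catalog.replace(name, List.copyOf(rows)) == null) {
            throw new IllegalStateException("no such table: " + name);
        }
    }

    public static void main(String[] args) {
        StageReplaceCommitSketch sketch = new StageReplaceCommitSketch();
        try {
            sketch.commitReplace("missing", List.of("row"));
        } catch (IllegalStateException e) {
            System.out.println("commit rejected: " + e.getMessage());
        }
    }
}
```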
    • stageCreateOrReplace

      StagedTable stageCreateOrReplace(Identifier ident, StructType schema, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
      Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

      This is deprecated. Please override stageCreateOrReplace(Identifier, Column[], Transform[], Map) instead.

      Throws:
      org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
    • stageCreateOrReplace

      default StagedTable stageCreateOrReplace(Identifier ident, Column[] columns, Transform[] partitions, Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
      Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable.commitStagedChanges() is called.

      When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyway or abort the commit operation.

      If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of stageReplace(Identifier, Column[], Transform[], Map), which should fail when the staged changes are committed but the table doesn't exist at commit time.

      Parameters:
      ident - a table identifier
      columns - the columns of the new table
      partitions - transforms to use for partitioning data in the table
      properties - a string map of table properties
      Returns:
      metadata for the new table
      Throws:
      UnsupportedOperationException - If a requested partition transform is not supported
      org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
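
The commit-time rule for stageCreateOrReplace (create the table when it is missing, otherwise replace it) can be sketched with a toy model. This is not Spark code: StageCreateOrReplaceCommitSketch and commitCreateOrReplace are hypothetical names, and the boolean return value exists only to make the two outcomes visible.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model only: these names are hypothetical and not part of the Spark API.
class StageCreateOrReplaceCommitSketch {
    final ConcurrentMap<String, List<String>> catalog = new ConcurrentHashMap<>();

    // Commit step of a staged create-or-replace: never fails on a missing table.
    // Returns true if the table was created, false if an existing one was replaced.
    boolean commitCreateOrReplace(String name, List<String> rows) {
        // put returns the previous mapping, or null if there was none.
        return catalog.put(name, List.copyOf(rows)) == null;
    }

    public static void main(String[] args) {
        StageCreateOrReplaceCommitSketch sketch = new StageCreateOrReplaceCommitSketch();
        System.out.println("created: " + sketch.commitCreateOrReplace("t", List.of("v1")));
        System.out.println("created: " + sketch.commitCreateOrReplace("t", List.of("v2")));
        System.out.println("rows: " + sketch.catalog.get("t"));
    }
}
```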