Packages

t

org.apache.spark.sql.connector.catalog

StagingTableCatalog

trait StagingTableCatalog extends TableCatalog

An optional mix-in for implementations of TableCatalog that support staging creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.

It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first drop the table via TableCatalog#dropTable(Identifier), then create the table via TableInfo), and then perform the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo). However, if the write operation fails, the catalog will have already dropped the table, and the planner cannot roll back the dropping of the table.

If the catalog implements this plugin, the catalog can implement the methods to "stage" the creation and the replacement of a table. After the table's BatchWrite#commit(WriterCommitMessage[]) is called, StagedTable#commitStagedChanges() is called, at which point the staged table can complete both the data write and the metadata swap operation atomically.

Annotations
@Evolving()
Source
StagingTableCatalog.java
Since

3.0.0

Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. StagingTableCatalog
  2. TableCatalog
  3. CatalogPlugin
  4. AnyRef
  5. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Abstract Value Members

  1. abstract def alterTable(ident: Identifier, changes: <repeated...>[TableChange]): Table

    Apply a set of changes to a table in the catalog.

    Apply a set of changes to a table in the catalog.

    Implementations may reject the requested changes. If any change is rejected, none of the changes should be applied to the table.

    The requested changes must be applied in the order given.

    If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.

    ident

    a table identifier

    changes

    changes to apply to the table

    returns

    updated metadata for the table. This can be null if getting the metadata for the updated table is expensive. Spark always discard the returned table here.

    Definition Classes
    TableCatalog
    Exceptions thrown

    IllegalArgumentException If any change is rejected by the implementation.

    NoSuchTableException If the table doesn't exist or is a view

  2. abstract def dropTable(ident: Identifier): Boolean

    Drop a table in the catalog.

    Drop a table in the catalog.

    If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.

    ident

    a table identifier

    returns

    true if a table was deleted, false if no table exists for the identifier

    Definition Classes
    TableCatalog
  3. abstract def initialize(name: String, options: CaseInsensitiveStringMap): Unit

    Called to initialize configuration.

    Called to initialize configuration.

    This method is called once, just after the provider is instantiated.

    name

    the name used to identify and load this catalog

    options

    a case-insensitive string map of configuration

    Definition Classes
    CatalogPlugin
  4. abstract def listTables(namespace: Array[String]): Array[Identifier]

    List the tables in a namespace from the catalog.

    List the tables in a namespace from the catalog.

    If the catalog supports views, this must return identifiers for only tables and not views.

    namespace

    a multi-part namespace

    returns

    an array of Identifiers for tables

    Definition Classes
    TableCatalog
    Exceptions thrown

    NoSuchNamespaceException If the namespace does not exist (optional).

  5. abstract def loadTable(ident: Identifier): Table

    Load table metadata by identifier from the catalog.

    Load table metadata by identifier from the catalog.

    If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.

    ident

    a table identifier

    returns

    the table's metadata

    Definition Classes
    TableCatalog
    Exceptions thrown

    NoSuchTableException If the table doesn't exist or is a view

  6. abstract def name(): String

    Called to get this catalog's name.

    Called to get this catalog's name.

    This method is only called after CaseInsensitiveStringMap) is called to pass the catalog's name.

    Definition Classes
    CatalogPlugin
  7. abstract def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit

    Renames a table in the catalog.

    Renames a table in the catalog.

    If the catalog supports views and contains a view for the old identifier and not a table, this throws NoSuchTableException. Additionally, if the new identifier is a table or a view, this throws TableAlreadyExistsException.

    If the catalog does not support table renames between namespaces, it throws UnsupportedOperationException.

    oldIdent

    the table identifier of the existing table to rename

    newIdent

    the new table identifier of the table

    Definition Classes
    TableCatalog
    Exceptions thrown

    NoSuchTableException If the table to rename doesn't exist or is a view

    TableAlreadyExistsException If the new table name already exists or is a view

    UnsupportedOperationException If the namespaces of old and new identifiers do not match (optional)

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def capabilities(): Set[TableCatalogCapability]

    returns

    the set of capabilities for this TableCatalog

    Definition Classes
    TableCatalog
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  7. def createTable(ident: Identifier, tableInfo: TableInfo): Table

    Create a table in the catalog.

    Create a table in the catalog.

    ident

    a table identifier

    tableInfo

    information about the table.

    returns

    metadata for the new table. This can be null if getting the metadata for the new table is expensive. Spark will call #loadTable(Identifier) if needed (e.g. CTAS).

    Definition Classes
    TableCatalog
    Since

    4.1.0

    Exceptions thrown

    NoSuchNamespaceException If the identifier namespace does not exist (optional)

    TableAlreadyExistsException If a table or view already exists for the identifier

    UnsupportedOperationException If a requested partition transform is not supported

  8. def defaultNamespace(): Array[String]

    Return a default namespace for the catalog.

    Return a default namespace for the catalog.

    When this catalog is set as the current catalog, the namespace returned by this method will be set as the current namespace.

    The namespace returned by this method is not required to exist.

    returns

    a multi-part namespace

    Definition Classes
    CatalogPlugin
  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  11. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  13. def invalidateTable(ident: Identifier): Unit

    Invalidate cached table metadata for an identifier.

    Invalidate cached table metadata for an identifier.

    If the table is already loaded or cached, drop cached data. If the table does not exist or is not cached, do nothing. Calling this method should not query remote services.

    ident

    a table identifier

    Definition Classes
    TableCatalog
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. def listTableSummaries(namespace: Array[String]): Array[TableSummary]

    List the table summaries in a namespace from the catalog.

    List the table summaries in a namespace from the catalog.

    This method should return all tables entities from a catalog regardless of type (i.e. views should be listed as well).

    namespace

    a multi-part namespace

    returns

    an array of Identifiers for tables

    Definition Classes
    TableCatalog
    Exceptions thrown

    NoSuchNamespaceException If the namespace does not exist (optional).

    NoSuchTableException If certain table listed by listTables API does not exist.

  16. def loadTable(ident: Identifier, timestamp: Long): Table

    Load table metadata at a specific time by identifier from the catalog.

    Load table metadata at a specific time by identifier from the catalog.

    If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.

    ident

    a table identifier

    timestamp

    timestamp of the table, which is microseconds since 1970-01-01 00:00:00 UTC

    returns

    the table's metadata

    Definition Classes
    TableCatalog
    Exceptions thrown

    NoSuchTableException If the table doesn't exist or is a view

  17. def loadTable(ident: Identifier, version: String): Table

    Load table metadata of a specific version by identifier from the catalog.

    Load table metadata of a specific version by identifier from the catalog.

    If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.

    ident

    a table identifier

    version

    version of the table

    returns

    the table's metadata

    Definition Classes
    TableCatalog
    Exceptions thrown

    NoSuchTableException If the table doesn't exist or is a view

  18. def loadTable(ident: Identifier, writePrivileges: Set[TableWritePrivilege]): Table

    Load table metadata by identifier from the catalog.

    Load table metadata by identifier from the catalog. Spark will write data into this table later.

    If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.

    ident

    a table identifier

    returns

    the table's metadata

    Definition Classes
    TableCatalog
    Since

    3.5.3

    Exceptions thrown

    NoSuchTableException If the table doesn't exist or is a view

  19. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  21. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  22. def purgeTable(ident: Identifier): Boolean

    Drop a table in the catalog and completely remove its data by skipping a trash even if it is supported.

    Drop a table in the catalog and completely remove its data by skipping a trash even if it is supported.

    If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.

    If the catalog supports to purge a table, this method should be overridden. The default implementation throws UnsupportedOperationException.

    ident

    a table identifier

    returns

    true if a table was deleted, false if no table exists for the identifier

    Definition Classes
    TableCatalog
    Since

    3.1.0

    Exceptions thrown

    UnsupportedOperationException If table purging is not supported

  23. def stageCreate(ident: Identifier, tableInfo: TableInfo): StagedTable

    Stage the creation of a table, preparing it to be committed into the metastore.

    Stage the creation of a table, preparing it to be committed into the metastore.

    When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw an exception accordingly. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by StagedTable#commitStagedChanges().

    ident

    a table identifier

    tableInfo

    information about the table

    returns

    metadata for the new table. This can be null if the catalog does not support atomic creation for this table. Spark will call #loadTable(Identifier) later.

    Exceptions thrown

    NoSuchNamespaceException If the identifier namespace does not exist (optional)

    TableAlreadyExistsException If a table or view already exists for the identifier

    UnsupportedOperationException If a requested partition transform is not supported

  24. def stageCreateOrReplace(ident: Identifier, tableInfo: TableInfo): StagedTable

    Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.

    If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of StructType, Transform[], Map), which should fail when the staged changes are committed but the table doesn't exist at commit time.

    ident

    a table identifier

    tableInfo

    information about the table

    returns

    metadata for the new table. This can be null if the catalog does not support atomic creation for this table. Spark will call #loadTable(Identifier) later.

    Exceptions thrown

    NoSuchNamespaceException If the identifier namespace does not exist (optional)

    UnsupportedOperationException If a requested partition transform is not supported

  25. def stageReplace(ident: Identifier, tableInfo: TableInfo): StagedTable

    Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.

    If the table does not exist, committing the staged changes should fail with NoSuchTableException. This differs from the semantics of StructType, Transform[], Map), which should create the table in the data source if the table does not exist at the time of committing the operation.

    ident

    a table identifier

    tableInfo

    information about the table

    returns

    metadata for the new table. This can be null if the catalog does not support atomic creation for this table. Spark will call #loadTable(Identifier) later.

    Exceptions thrown

    NoSuchNamespaceException If the identifier namespace does not exist (optional)

    NoSuchTableException If the table does not exist

    UnsupportedOperationException If a requested partition transform is not supported

  26. def supportedCustomMetrics(): Array[CustomMetric]

    returns

    An Array of commit metrics that are supported by the catalog. This is analogous to Write#supportedCustomMetrics(). The corresponding StagedTable#reportDriverMetrics() method must be called to retrieve the actual metric values after a commit. The methods are not in the same class because the supported metrics are required before the staged table object is created and only the staged table object can capture the write metrics during the commit.

  27. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  28. def tableExists(ident: Identifier): Boolean

    Test whether a table exists using an identifier from the catalog.

    Test whether a table exists using an identifier from the catalog.

    If the catalog supports views and contains a view for the identifier and not a table, this must return false.

    ident

    a table identifier

    returns

    true if the table exists, false otherwise

    Definition Classes
    TableCatalog
  29. def toString(): String
    Definition Classes
    AnyRef → Any
  30. def useNullableQuerySchema(): Boolean

    If true, mark all the fields of the query schema as nullable when executing CREATE/REPLACE TABLE ...

    If true, mark all the fields of the query schema as nullable when executing CREATE/REPLACE TABLE ... AS SELECT ... and creating the table.

    Definition Classes
    TableCatalog
  31. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  32. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  33. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def createTable(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): Table

    Create a table in the catalog.

    Create a table in the catalog.

    Definition Classes
    TableCatalog
    Annotations
    @Deprecated
    Deprecated

    (Since version 4.1.0)

  2. def createTable(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): Table

    Create a table in the catalog.

    Create a table in the catalog.

    Definition Classes
    TableCatalog
    Annotations
    @Deprecated
    Deprecated

    (Since version 3.4.0)

  3. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

  4. def stageCreate(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): StagedTable

    Stage the creation of a table, preparing it to be committed into the metastore.

    Stage the creation of a table, preparing it to be committed into the metastore.

    Annotations
    @Deprecated
    Deprecated

    (Since version 4.1.0)

  5. def stageCreate(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable

    Stage the creation of a table, preparing it to be committed into the metastore.

    Stage the creation of a table, preparing it to be committed into the metastore.

    Annotations
    @Deprecated
    Deprecated

    (Since version 3.4.0)

  6. def stageCreateOrReplace(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): StagedTable

    Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    This is deprecated, please override TableInfo) instead.

    Annotations
    @Deprecated
    Deprecated

    (Since version 4.1.0)

  7. def stageCreateOrReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable

    Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    This is deprecated, please override Column[], Transform[], Map) instead.

    Annotations
    @Deprecated
    Deprecated

    (Since version 3.4.0)

  8. def stageReplace(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): StagedTable

    Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    This is deprecated, please override TableInfo) instead.

    Annotations
    @Deprecated
    Deprecated

    (Since version 4.1.0)

  9. def stageReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable

    Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.

    This is deprecated, please override Column[], Transform[], Map) instead.

    Annotations
    @Deprecated
    Deprecated

    (Since version 3.4.0)

Inherited from TableCatalog

Inherited from CatalogPlugin

Inherited from AnyRef

Inherited from Any

Ungrouped