trait StagingTableCatalog extends TableCatalog
An optional mix-in for implementations of TableCatalog that support staging creation of
a table before committing the table's metadata along with its contents in CREATE TABLE AS
SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS
SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE
TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first
drop the table via TableCatalog#dropTable(Identifier), then create the table via
TableInfo), and then perform
the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo).
However, if the write operation fails, the catalog will have already dropped the table, and the
planner cannot roll back the dropping of the table.
If the catalog implements this plugin, the catalog can implement the methods to "stage" the
creation and the replacement of a table. After the table's
BatchWrite#commit(WriterCommitMessage[]) is called,
StagedTable#commitStagedChanges() is called, at which point the staged table can
complete both the data write and the metadata swap operation atomically.
- Annotations
- @Evolving()
- Source
- StagingTableCatalog.java
- Since
3.0.0
- Alphabetic
- By Inheritance
- StagingTableCatalog
- TableCatalog
- CatalogPlugin
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Abstract Value Members
- abstract def alterTable(ident: Identifier, changes: <repeated...>[TableChange]): Table
Apply a set of
changesto a table in the catalog.Apply a set of
changesto a table in the catalog.Implementations may reject the requested changes. If any change is rejected, none of the changes should be applied to the table.
The requested changes must be applied in the order given.
If the catalog supports views and contains a view for the identifier and not a table, this must throw
NoSuchTableException.- ident
a table identifier
- changes
changes to apply to the table
- returns
updated metadata for the table. This can be null if getting the metadata for the updated table is expensive. Spark always discard the returned table here.
- Definition Classes
- TableCatalog
- Exceptions thrown
IllegalArgumentExceptionIf any change is rejected by the implementation.NoSuchTableExceptionIf the table doesn't exist or is a view
- abstract def dropTable(ident: Identifier): Boolean
Drop a table in the catalog.
Drop a table in the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
- ident
a table identifier
- returns
true if a table was deleted, false if no table exists for the identifier
- Definition Classes
- TableCatalog
- abstract def initialize(name: String, options: CaseInsensitiveStringMap): Unit
Called to initialize configuration.
Called to initialize configuration.
This method is called once, just after the provider is instantiated.
- name
the name used to identify and load this catalog
- options
a case-insensitive string map of configuration
- Definition Classes
- CatalogPlugin
- abstract def listTables(namespace: Array[String]): Array[Identifier]
List the tables in a namespace from the catalog.
List the tables in a namespace from the catalog.
If the catalog supports views, this must return identifiers for only tables and not views.
- namespace
a multi-part namespace
- returns
an array of Identifiers for tables
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchNamespaceExceptionIf the namespace does not exist (optional).
- abstract def loadTable(ident: Identifier): Table
Load table metadata by
identifierfrom the catalog.Load table metadata by
identifierfrom the catalog.If the catalog supports views and contains a view for the identifier and not a table, this must throw
NoSuchTableException.- ident
a table identifier
- returns
the table's metadata
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableExceptionIf the table doesn't exist or is a view
- abstract def name(): String
Called to get this catalog's name.
Called to get this catalog's name.
This method is only called after
CaseInsensitiveStringMap)is called to pass the catalog's name.- Definition Classes
- CatalogPlugin
- abstract def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit
Renames a table in the catalog.
Renames a table in the catalog.
If the catalog supports views and contains a view for the old identifier and not a table, this throws
NoSuchTableException. Additionally, if the new identifier is a table or a view, this throwsTableAlreadyExistsException.If the catalog does not support table renames between namespaces, it throws
UnsupportedOperationException.- oldIdent
the table identifier of the existing table to rename
- newIdent
the new table identifier of the table
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableExceptionIf the table to rename doesn't exist or is a viewTableAlreadyExistsExceptionIf the new table name already exists or is a viewUnsupportedOperationExceptionIf the namespaces of old and new identifiers do not match (optional)
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def capabilities(): Set[TableCatalogCapability]
- returns
the set of capabilities for this TableCatalog
- Definition Classes
- TableCatalog
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def createTable(ident: Identifier, tableInfo: TableInfo): Table
Create a table in the catalog.
Create a table in the catalog.
- ident
a table identifier
- tableInfo
information about the table.
- returns
metadata for the new table. This can be null if getting the metadata for the new table is expensive. Spark will call
#loadTable(Identifier)if needed (e.g. CTAS).
- Definition Classes
- TableCatalog
- Since
4.1.0
- Exceptions thrown
NoSuchNamespaceExceptionIf the identifier namespace does not exist (optional)TableAlreadyExistsExceptionIf a table or view already exists for the identifierUnsupportedOperationExceptionIf a requested partition transform is not supported
- def defaultNamespace(): Array[String]
Return a default namespace for the catalog.
Return a default namespace for the catalog.
When this catalog is set as the current catalog, the namespace returned by this method will be set as the current namespace.
The namespace returned by this method is not required to exist.
- returns
a multi-part namespace
- Definition Classes
- CatalogPlugin
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def invalidateTable(ident: Identifier): Unit
Invalidate cached table metadata for an
identifier.Invalidate cached table metadata for an
identifier.If the table is already loaded or cached, drop cached data. If the table does not exist or is not cached, do nothing. Calling this method should not query remote services.
- ident
a table identifier
- Definition Classes
- TableCatalog
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def listTableSummaries(namespace: Array[String]): Array[TableSummary]
List the table summaries in a namespace from the catalog.
List the table summaries in a namespace from the catalog.
This method should return all tables entities from a catalog regardless of type (i.e. views should be listed as well).
- namespace
a multi-part namespace
- returns
an array of Identifiers for tables
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchNamespaceExceptionIf the namespace does not exist (optional).NoSuchTableExceptionIf certain table listed by listTables API does not exist.
- def loadTable(ident: Identifier, timestamp: Long): Table
Load table metadata at a specific time by
identifierfrom the catalog.Load table metadata at a specific time by
identifierfrom the catalog.If the catalog supports views and contains a view for the identifier and not a table, this must throw
NoSuchTableException.- ident
a table identifier
- timestamp
timestamp of the table, which is microseconds since 1970-01-01 00:00:00 UTC
- returns
the table's metadata
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableExceptionIf the table doesn't exist or is a view
- def loadTable(ident: Identifier, version: String): Table
Load table metadata of a specific version by
identifierfrom the catalog.Load table metadata of a specific version by
identifierfrom the catalog.If the catalog supports views and contains a view for the identifier and not a table, this must throw
NoSuchTableException.- ident
a table identifier
- version
version of the table
- returns
the table's metadata
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableExceptionIf the table doesn't exist or is a view
- def loadTable(ident: Identifier, writePrivileges: Set[TableWritePrivilege]): Table
Load table metadata by
identifierfrom the catalog.Load table metadata by
identifierfrom the catalog. Spark will write data into this table later.If the catalog supports views and contains a view for the identifier and not a table, this must throw
NoSuchTableException.- ident
a table identifier
- returns
the table's metadata
- Definition Classes
- TableCatalog
- Since
3.5.3
- Exceptions thrown
NoSuchTableExceptionIf the table doesn't exist or is a view
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def purgeTable(ident: Identifier): Boolean
Drop a table in the catalog and completely remove its data by skipping a trash even if it is supported.
Drop a table in the catalog and completely remove its data by skipping a trash even if it is supported.
If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
If the catalog supports to purge a table, this method should be overridden. The default implementation throws
UnsupportedOperationException.- ident
a table identifier
- returns
true if a table was deleted, false if no table exists for the identifier
- Definition Classes
- TableCatalog
- Since
3.1.0
- Exceptions thrown
UnsupportedOperationExceptionIf table purging is not supported
- def stageCreate(ident: Identifier, tableInfo: TableInfo): StagedTable
Stage the creation of a table, preparing it to be committed into the metastore.
Stage the creation of a table, preparing it to be committed into the metastore.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw an exception accordingly. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by
StagedTable#commitStagedChanges().- ident
a table identifier
- tableInfo
information about the table
- returns
metadata for the new table. This can be null if the catalog does not support atomic creation for this table. Spark will call
#loadTable(Identifier)later.
- Exceptions thrown
NoSuchNamespaceExceptionIf the identifier namespace does not exist (optional)TableAlreadyExistsExceptionIf a table or view already exists for the identifierUnsupportedOperationExceptionIf a requested partition transform is not supported
- def stageCreateOrReplace(ident: Identifier, tableInfo: TableInfo): StagedTable
Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of
StructType, Transform[], Map), which should fail when the staged changes are committed but the table doesn't exist at commit time.- ident
a table identifier
- tableInfo
information about the table
- returns
metadata for the new table. This can be null if the catalog does not support atomic creation for this table. Spark will call
#loadTable(Identifier)later.
- Exceptions thrown
NoSuchNamespaceExceptionIf the identifier namespace does not exist (optional)UnsupportedOperationExceptionIf a requested partition transform is not supported
- def stageReplace(ident: Identifier, tableInfo: TableInfo): StagedTable
Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
If the table does not exist, committing the staged changes should fail with
NoSuchTableException. This differs from the semantics ofStructType, Transform[], Map), which should create the table in the data source if the table does not exist at the time of committing the operation.- ident
a table identifier
- tableInfo
information about the table
- returns
metadata for the new table. This can be null if the catalog does not support atomic creation for this table. Spark will call
#loadTable(Identifier)later.
- Exceptions thrown
NoSuchNamespaceExceptionIf the identifier namespace does not exist (optional)NoSuchTableExceptionIf the table does not existUnsupportedOperationExceptionIf a requested partition transform is not supported
- def supportedCustomMetrics(): Array[CustomMetric]
- returns
An Array of commit metrics that are supported by the catalog. This is analogous to
Write#supportedCustomMetrics(). The correspondingStagedTable#reportDriverMetrics()method must be called to retrieve the actual metric values after a commit. The methods are not in the same class because the supported metrics are required before the staged table object is created and only the staged table object can capture the write metrics during the commit.
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def tableExists(ident: Identifier): Boolean
Test whether a table exists using an
identifierfrom the catalog.Test whether a table exists using an
identifierfrom the catalog.If the catalog supports views and contains a view for the identifier and not a table, this must return false.
- ident
a table identifier
- returns
true if the table exists, false otherwise
- Definition Classes
- TableCatalog
- def toString(): String
- Definition Classes
- AnyRef → Any
- def useNullableQuerySchema(): Boolean
If true, mark all the fields of the query schema as nullable when executing CREATE/REPLACE TABLE ...
If true, mark all the fields of the query schema as nullable when executing CREATE/REPLACE TABLE ... AS SELECT ... and creating the table.
- Definition Classes
- TableCatalog
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def createTable(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): Table
Create a table in the catalog.
Create a table in the catalog.
- Definition Classes
- TableCatalog
- Annotations
- @Deprecated
- Deprecated
(Since version 4.1.0)
- def createTable(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): Table
Create a table in the catalog.
Create a table in the catalog.
- Definition Classes
- TableCatalog
- Annotations
- @Deprecated
- Deprecated
(Since version 3.4.0)
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)
- def stageCreate(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation of a table, preparing it to be committed into the metastore.
Stage the creation of a table, preparing it to be committed into the metastore.
- Annotations
- @Deprecated
- Deprecated
(Since version 4.1.0)
- def stageCreate(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation of a table, preparing it to be committed into the metastore.
Stage the creation of a table, preparing it to be committed into the metastore.
- Annotations
- @Deprecated
- Deprecated
(Since version 3.4.0)
- def stageCreateOrReplace(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.This is deprecated, please override
TableInfo)instead.- Annotations
- @Deprecated
- Deprecated
(Since version 4.1.0)
- def stageCreateOrReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.This is deprecated, please override
Column[], Transform[], Map)instead.- Annotations
- @Deprecated
- Deprecated
(Since version 3.4.0)
- def stageReplace(ident: Identifier, columns: Array[Column], partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.This is deprecated, please override
TableInfo)instead.- Annotations
- @Deprecated
- Deprecated
(Since version 4.1.0)
- def stageReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's
StagedTable#commitStagedChanges()is called.This is deprecated, please override
Column[], Transform[], Map)instead.- Annotations
- @Deprecated
- Deprecated
(Since version 3.4.0)