Class Catalog

Object
org.apache.spark.sql.catalog.Catalog

public abstract class Catalog extends Object
Catalog interface for Spark. To access this, use SparkSession.catalog.

Since:
2.0.0
  • Constructor Details

    • Catalog

      public Catalog()
  • Method Details

    • cacheTable

      public abstract void cacheTable(String tableName)
      Caches the specified table in-memory.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.
      Since:
      2.0.0
    • cacheTable

      public abstract void cacheTable(String tableName, StorageLevel storageLevel)
      Caches the specified table with the given storage level.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.
      storageLevel - the storage level at which to cache the table.
      Since:
      2.3.0
    • clearCache

      public abstract void clearCache()
      Removes all cached tables from the in-memory cache.

      Since:
      2.0.0
    • createExternalTable

      public Dataset<Row> createExternalTable(String tableName, String path)
      Deprecated.
      Use createTable instead. Since 2.2.0.
      Creates a table from the given path and returns the corresponding DataFrame. It will use the default data source configured by spark.sql.sources.default.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      path - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • createExternalTable

      public Dataset<Row> createExternalTable(String tableName, String path, String source)
      Deprecated.
      Use createTable instead. Since 2.2.0.
      Creates a table from the given path based on a data source and returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      path - (undocumented)
      source - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • createExternalTable

      public Dataset<Row> createExternalTable(String tableName, String source, Map<String,String> options)
      Deprecated.
      Use createTable instead. Since 2.2.0.
      Creates a table from the given path based on a data source and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • createExternalTable

      public Dataset<Row> createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options)
      Deprecated.
      Use createTable instead. Since 2.2.0.
      (Scala-specific) Creates a table from the given path based on a data source and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • createExternalTable

      public Dataset<Row> createExternalTable(String tableName, String source, StructType schema, Map<String,String> options)
      Deprecated.
      Use createTable instead. Since 2.2.0.
      Creates a table from the given path based on a data source, a schema and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      schema - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • createExternalTable

      public Dataset<Row> createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options)
      Deprecated.
      Use createTable instead. Since 2.2.0.
      (Scala-specific) Creates a table from the given path based on a data source, a schema and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      schema - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • createTable

      public abstract Dataset<Row> createTable(String tableName, String path)
      Creates a table from the given path and returns the corresponding DataFrame. It will use the default data source configured by spark.sql.sources.default.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      path - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.2.0
    • createTable

      public abstract Dataset<Row> createTable(String tableName, String path, String source)
      Creates a table from the given path based on a data source and returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      path - (undocumented)
      source - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.2.0
    • createTable

      public Dataset<Row> createTable(String tableName, String source, Map<String,String> options)
      Creates a table based on the dataset in a data source and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.2.0
    • createTable

      public abstract Dataset<Row> createTable(String tableName, String source, scala.collection.immutable.Map<String,String> options)
      (Scala-specific) Creates a table based on the dataset in a data source and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.2.0
    • createTable

      public Dataset<Row> createTable(String tableName, String source, String description, Map<String,String> options)
      Creates a table based on the dataset in a data source and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      description - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.1.0
    • createTable

      public abstract Dataset<Row> createTable(String tableName, String source, String description, scala.collection.immutable.Map<String,String> options)
      (Scala-specific) Creates a table based on the dataset in a data source and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      description - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.1.0
    • createTable

      public Dataset<Row> createTable(String tableName, String source, StructType schema, Map<String,String> options)
      Creates a table based on the dataset in a data source, a schema and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      schema - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.2.0
    • createTable

      public abstract Dataset<Row> createTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options)
      (Scala-specific) Creates a table based on the dataset in a data source, a schema and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      schema - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.2.0
    • createTable

      public Dataset<Row> createTable(String tableName, String source, StructType schema, String description, Map<String,String> options)
      Creates a table based on the dataset in a data source, a schema and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      schema - (undocumented)
      description - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.1.0
    • createTable

      public abstract Dataset<Row> createTable(String tableName, String source, StructType schema, String description, scala.collection.immutable.Map<String,String> options)
      (Scala-specific) Creates a table based on the dataset in a data source, a schema and a set of options. Then, returns the corresponding DataFrame.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      source - (undocumented)
      schema - (undocumented)
      description - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.1.0
    • currentCatalog

      public abstract String currentCatalog()
      Returns the current catalog in this session.

      Returns:
      (undocumented)
      Since:
      3.4.0
    • currentDatabase

      public abstract String currentDatabase()
      Returns the current database (namespace) in this session.

      Returns:
      (undocumented)
      Since:
      2.0.0
    • databaseExists

      public abstract boolean databaseExists(String dbName)
      Check if the database (namespace) with the specified name exists (the name can be qualified with catalog).

      Parameters:
      dbName - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.1.0
    • dropGlobalTempView

      public abstract boolean dropGlobalTempView(String viewName)
      Drops the global temporary view with the given view name in the catalog. If the view has been cached before, then it will also be uncached.

      A global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it is automatically dropped when the application terminates. It is tied to the system-preserved database global_temp, and the qualified name must be used to refer to a global temp view, e.g. SELECT * FROM global_temp.view1.

      Parameters:
      viewName - the unqualified name of the temporary view to be dropped.
      Returns:
      true if the view is dropped successfully, false otherwise.
      Since:
      2.1.0
    • dropTempView

      public abstract boolean dropTempView(String viewName)
      Drops the local temporary view with the given view name in the catalog. If the view has been cached before, then it will also be uncached.

      A local temporary view is session-scoped. Its lifetime is the lifetime of the session that created it, i.e. it is automatically dropped when the session terminates. It is not tied to any database, i.e. db1.view1 cannot be used to reference a local temporary view.

      Note that the return type of this method was Unit in Spark 2.0, but changed to Boolean in Spark 2.1.

      Parameters:
      viewName - the name of the temporary view to be dropped.
      Returns:
      true if the view is dropped successfully, false otherwise.
      Since:
      2.0.0
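The scoping rules described for dropTempView and dropGlobalTempView can be pictured with a small toy model. This is only a sketch in plain Java and is not how Spark implements views: the class names TempViewScopeSketch, Application and Session, and the methods createTempView, createGlobalTempView and canResolve, are hypothetical and exist only for illustration. It shows that a local view is visible only in its own session under its bare name, that a global view is visible from every session only under the global_temp qualifier, and that a drop returns true only when a view was actually removed.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of temp view scoping (hypothetical names; not Spark's implementation). */
final class TempViewScopeSketch {

    /** Application-wide registry, standing in for the global_temp database. */
    static final class Application {
        final Set<String> globalTempViews = new HashSet<>();
    }

    /** One session: owns its local temp views and shares the application's global ones. */
    static final class Session {
        private static final String GLOBAL_PREFIX = "global_temp.";
        private final Application app;
        private final Set<String> localTempViews = new HashSet<>();

        Session(Application app) {
            this.app = app;
        }

        void createTempView(String name) {
            localTempViews.add(name);
        }

        void createGlobalTempView(String name) {
            app.globalTempViews.add(name);
        }

        /** Mirrors the documented contract: true if a view was dropped, false otherwise. */
        boolean dropTempView(String name) {
            return localTempViews.remove(name);
        }

        boolean dropGlobalTempView(String name) {
            return app.globalTempViews.remove(name);
        }

        /** Local views resolve by bare name only; global views only as global_temp.NAME. */
        boolean canResolve(String reference) {
            if (reference.startsWith(GLOBAL_PREFIX)) {
                String viewName = reference.substring(GLOBAL_PREFIX.length());
                return app.globalTempViews.contains(viewName);
            }
            return localTempViews.contains(reference);
        }
    }

    public static void main(String[] args) {
        Application app = new Application();
        Session first = new Session(app);
        Session second = new Session(app);

        first.createTempView("view1");
        first.createGlobalTempView("view2");

        System.out.println("second sees view1: " + second.canResolve("view1"));
        System.out.println("second sees global_temp.view2: " + second.canResolve("global_temp.view2"));
        System.out.println("first drop of view1: " + first.dropTempView("view1"));
        System.out.println("second drop of view1: " + first.dropTempView("view1"));
    }
}
```

The model deliberately ignores caching; in Spark, dropping a cached view also uncaches it, as stated above.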
    • functionExists

      public abstract boolean functionExists(String functionName)
      Check if the function with the specified name exists. This can be either a temporary function or a permanent function.

      Parameters:
      functionName - is either a qualified or unqualified name that designates a function. It follows the same resolution rule as SQL: built-in and temporary functions are searched first, then functions in the current database (namespace).
      Returns:
      (undocumented)
      Since:
      2.1.0
    • functionExists

      public abstract boolean functionExists(String dbName, String functionName)
      Check if the function with the specified name exists in the specified database under the Hive Metastore.

      To check the existence of functions in other catalogs, use functionExists(functionName) with a qualified function name instead.

      Parameters:
      dbName - is an unqualified name that designates a database.
      functionName - is an unqualified name that designates a function.
      Returns:
      (undocumented)
      Since:
      2.1.0
    • getDatabase

      public abstract Database getDatabase(String dbName) throws AnalysisException
      Get the database (namespace) with the specified name (can be qualified with catalog). This throws an AnalysisException when the database (namespace) cannot be found.

      Parameters:
      dbName - (undocumented)
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.1.0
    • getFunction

      public abstract Function getFunction(String functionName) throws AnalysisException
      Get the function with the specified name. This can be either a temporary function or a permanent function. This throws an AnalysisException when the function cannot be found.

      Parameters:
      functionName - is either a qualified or unqualified name that designates a function. It follows the same resolution rule as SQL: built-in and temporary functions are searched first, then functions in the current database (namespace).
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.1.0
    • getFunction

      public abstract Function getFunction(String dbName, String functionName) throws AnalysisException
      Get the function with the specified name in the specified database under the Hive Metastore. This throws an AnalysisException when the function cannot be found.

      To get functions in other catalogs, use getFunction(functionName) with a qualified function name instead.

      Parameters:
      dbName - is an unqualified name that designates a database.
      functionName - is an unqualified name that designates a function in the specified database.
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.1.0
    • getTable

      public abstract Table getTable(String tableName) throws AnalysisException
      Get the table or view with the specified name. This table can be a temporary view or a table/view. This throws an AnalysisException when no Table can be found.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. It follows the same resolution rule as SQL: temporary views are searched first, then tables/views in the current database (namespace).
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.1.0
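The resolution rule for tableName (temporary views first, then tables/views in the current database) can be illustrated with a toy model. The sketch below is plain Java and does not call Spark: the class TableNameResolutionSketch and its resolve method are hypothetical, and the parsing is a deliberate simplification that only understands an optional single database qualifier (no catalog qualifier, no quoting).

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Toy model of unqualified-name resolution (hypothetical; not Spark's implementation). */
final class TableNameResolutionSketch {
    private final Set<String> tempViews;
    private final Map<String, Set<String>> tablesByDatabase;
    private final String currentDatabase;

    TableNameResolutionSketch(Set<String> tempViews,
                              Map<String, Set<String>> tablesByDatabase,
                              String currentDatabase) {
        this.tempViews = tempViews;
        this.tablesByDatabase = tablesByDatabase;
        this.currentDatabase = currentDatabase;
    }

    /** Describes what the name resolved to, or returns empty if nothing matched. */
    Optional<String> resolve(String name) {
        int dot = name.indexOf('.');
        if (dot >= 0) {
            // Qualified name: look only in the named database.
            String db = name.substring(0, dot);
            String table = name.substring(dot + 1);
            return inDatabase(db, table)
                    ? Optional.of("table " + db + "." + table)
                    : Optional.empty();
        }
        // Unqualified name: temporary views take precedence.
        if (tempViews.contains(name)) {
            return Optional.of("temp view " + name);
        }
        // Then fall back to the current database.
        if (inDatabase(currentDatabase, name)) {
            return Optional.of("table " + currentDatabase + "." + name);
        }
        return Optional.empty();
    }

    private boolean inDatabase(String db, String table) {
        return tablesByDatabase.getOrDefault(db, Collections.emptySet()).contains(table);
    }

    public static void main(String[] args) {
        TableNameResolutionSketch catalog = new TableNameResolutionSketch(
                Set.of("events"),
                Map.of("default", Set.of("events", "users")),
                "default");
        System.out.println(catalog.resolve("events"));
        System.out.println(catalog.resolve("users"));
        System.out.println(catalog.resolve("default.events"));
        System.out.println(catalog.resolve("missing"));
    }
}
```

In this model a temporary view named events shadows the table default.events for unqualified lookups, and qualifying the name is the way to reach the table.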
    • getTable

      public abstract Table getTable(String dbName, String tableName) throws AnalysisException
      Get the table or view with the specified name in the specified database under the Hive Metastore. This throws an AnalysisException when no Table can be found.

      To get a table/view in another catalog, use getTable(tableName) with a qualified table/view name instead.

      Parameters:
      dbName - (undocumented)
      tableName - (undocumented)
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.1.0
    • isCached

      public abstract boolean isCached(String tableName)
      Returns true if the table is currently cached in-memory.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.
      Returns:
      (undocumented)
      Since:
      2.0.0
    • listCatalogs

      public abstract Dataset<CatalogMetadata> listCatalogs()
      Returns a list of catalogs available in this session.

      Returns:
      (undocumented)
      Since:
      3.4.0
    • listCatalogs

      public abstract Dataset<CatalogMetadata> listCatalogs(String pattern)
      Returns a list of catalogs that are available in this session and whose names match the specified pattern.

      Parameters:
      pattern - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.5.0
    • listColumns

      public abstract Dataset<Column> listColumns(String tableName) throws AnalysisException
      Returns a list of columns for the given table/view or temporary view.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. It follows the same resolution rule as SQL: temporary views are searched first, then tables/views in the current database (namespace).
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.0.0
    • listColumns

      public abstract Dataset<Column> listColumns(String dbName, String tableName) throws AnalysisException
      Returns a list of columns for the given table/view in the specified database under the Hive Metastore.

      To list columns for a table/view in another catalog, use listColumns(tableName) with a qualified table/view name instead.

      Parameters:
      dbName - is an unqualified name that designates a database.
      tableName - is an unqualified name that designates a table/view.
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.0.0
    • listDatabases

      public abstract Dataset<Database> listDatabases()
      Returns a list of databases (namespaces) available within the current catalog.

      Returns:
      (undocumented)
      Since:
      2.0.0
    • listDatabases

      public abstract Dataset<Database> listDatabases(String pattern)
      Returns a list of databases (namespaces) that are available within the current catalog and whose names match the specified pattern.

      Parameters:
      pattern - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.5.0
    • listFunctions

      public abstract Dataset<Function> listFunctions()
      Returns a list of functions registered in the current database (namespace). This includes all temporary functions.

      Returns:
      (undocumented)
      Since:
      2.0.0
    • listFunctions

      public abstract Dataset<Function> listFunctions(String dbName) throws AnalysisException
      Returns a list of functions registered in the specified database (namespace) (the name can be qualified with catalog). This includes all built-in and temporary functions.

      Parameters:
      dbName - (undocumented)
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.0.0
    • listFunctions

      public abstract Dataset<Function> listFunctions(String dbName, String pattern) throws AnalysisException
      Returns a list of functions registered in the specified database (namespace) whose names match the specified pattern (the database name can be qualified with catalog). This includes all built-in and temporary functions.

      Parameters:
      dbName - (undocumented)
      pattern - (undocumented)
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      3.5.0
    • listTables

      public abstract Dataset<Table> listTables()
      Returns a list of tables/views in the current database (namespace). This includes all temporary views.

      Returns:
      (undocumented)
      Since:
      2.0.0
    • listTables

      public abstract Dataset<Table> listTables(String dbName) throws AnalysisException
      Returns a list of tables/views in the specified database (namespace) (the name can be qualified with catalog). This includes all temporary views.

      Parameters:
      dbName - (undocumented)
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      2.0.0
    • listTables

      public abstract Dataset<Table> listTables(String dbName, String pattern) throws AnalysisException
      Returns a list of tables/views in the specified database (namespace) whose names match the specified pattern (the database name can be qualified with catalog). This includes all temporary views.

      Parameters:
      dbName - (undocumented)
      pattern - (undocumented)
      Returns:
      (undocumented)
      Throws:
      AnalysisException
      Since:
      3.5.0
    • recoverPartitions

      public abstract void recoverPartitions(String tableName)
      Recovers all the partitions in the directory of a table and updates the catalog. Only works with a partitioned table, and not a view.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.
      Since:
      2.1.1
    • refreshByPath

      public abstract void refreshByPath(String path)
      Invalidates and refreshes all the cached data (and the associated metadata) for any Dataset that contains the given data source path. Paths are matched by checking for sub-directories, i.e. "/" would invalidate everything that is cached and "/test/parent" would invalidate everything that is under "/test/parent".

      Parameters:
      path - (undocumented)
      Since:
      2.0.0
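The sub-directory matching described for refreshByPath can be sketched in plain Java. This is a rough model only: Spark resolves paths against the underlying file system, whereas the hypothetical helper isUnder below uses java.nio.file paths and assumes Unix-style absolute paths. The class name RefreshByPathSketch and the method invalidated are likewise made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of sub-directory path matching (hypothetical; not Spark's implementation). */
final class RefreshByPathSketch {

    /** True if cachedPath is refreshPath itself or lies somewhere beneath it. */
    static boolean isUnder(String cachedPath, String refreshPath) {
        Path cached = Paths.get(cachedPath).normalize();
        Path parent = Paths.get(refreshPath).normalize();
        // Path.startsWith compares whole name elements, so "/test/parent2"
        // is not treated as being under "/test/parent".
        return cached.startsWith(parent);
    }

    /** Returns the cached paths that a refresh of refreshPath would invalidate in this model. */
    static List<String> invalidated(List<String> cachedPaths, String refreshPath) {
        return cachedPaths.stream()
                .filter(p -> isUnder(p, refreshPath))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cached = Arrays.asList(
                "/test/parent/child/part-0.parquet",
                "/test/parent2/data",
                "/other/table");
        System.out.println(invalidated(cached, "/test/parent"));
        System.out.println(invalidated(cached, "/"));
    }
}
```

Matching on whole path elements rather than on raw string prefixes is a design choice of this sketch; it keeps sibling directories with a shared name prefix from being invalidated by accident.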
    • refreshTable

      public abstract void refreshTable(String tableName)
      Invalidates and refreshes all the cached data and metadata of the given table. For performance reasons, Spark SQL or the external data source library it uses might cache certain metadata about a table, such as the location of blocks. When those change outside of Spark SQL, users should call this function to invalidate the cache.

      If this table is cached as an InMemoryRelation, the original cached version is dropped and the new version is cached lazily.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.
      Since:
      2.0.0
    • setCurrentCatalog

      public abstract void setCurrentCatalog(String catalogName)
      Sets the current catalog in this session.

      Parameters:
      catalogName - (undocumented)
      Since:
      3.4.0
    • setCurrentDatabase

      public abstract void setCurrentDatabase(String dbName)
      Sets the current database (namespace) in this session.

      Parameters:
      dbName - (undocumented)
      Since:
      2.0.0
    • tableExists

      public abstract boolean tableExists(String tableName)
      Check if the table or view with the specified name exists. This can either be a temporary view or a table/view.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. It follows the same resolution rule as SQL: temporary views are searched first, then tables/views in the current database (namespace).
      Returns:
      (undocumented)
      Since:
      2.1.0
    • tableExists

      public abstract boolean tableExists(String dbName, String tableName)
      Check if the table or view with the specified name exists in the specified database under the Hive Metastore.

      To check the existence of a table/view in another catalog, use tableExists(tableName) with a qualified table/view name instead.

      Parameters:
      dbName - is an unqualified name that designates a database.
      tableName - is an unqualified name that designates a table.
      Returns:
      (undocumented)
      Since:
      2.1.0
    • uncacheTable

      public abstract void uncacheTable(String tableName)
      Removes the specified table from the in-memory cache.

      Parameters:
      tableName - is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.
      Since:
      2.0.0