pyspark.sql.Catalog
- class pyspark.sql.Catalog(sparkSession)
Spark SQL catalog interface.
Use the catalog attribute on an active SparkSession to obtain an instance; this class is a thin wrapper around org.apache.spark.sql.catalog.Catalog.
Changed in version 3.4.0: Supports Spark Connect.
Methods
- analyzeTable(tableName[, noScan]): Computes table statistics (same as SQL ANALYZE TABLE COMPUTE STATISTICS).
- cacheTable(tableName[, storageLevel]): Caches the specified table in-memory or with the given storage level.
- clearCache(): Removes all cached tables from the in-memory cache.
- createDatabase(dbName[, ifNotExists, properties]): Creates a namespace (database/schema).
- createExternalTable(tableName[, path, ...]): Creates a table based on the dataset in a data source.
- createTable(tableName[, path, source, ...]): Creates a table based on the dataset in a data source.
- currentCatalog(): Returns the current catalog in this session.
- currentDatabase(): Returns the current database (namespace) in this session.
- databaseExists(dbName): Check if the database with the specified name exists.
- dropDatabase(dbName[, ifExists, cascade]): Drops a namespace.
- dropGlobalTempView(viewName): Drops the global temporary view with the given view name in the catalog.
- dropTable(tableName[, ifExists, purge]): Drops a persistent table.
- dropTempView(viewName): Drops the local temporary view with the given view name in the catalog.
- dropView(viewName[, ifExists]): Drops a persistent view.
- functionExists(functionName[, dbName]): Check if the function with the specified name exists.
- getCreateTableString(tableName[, asSerde]): Returns the SHOW CREATE TABLE DDL string for a relation.
- getDatabase(dbName): Get the database with the specified name.
- getFunction(functionName): Get the function with the specified name.
- getTable(tableName): Get the table or view with the specified name.
- getTableProperties(tableName): Returns all table properties as a dict (same as SHOW TBLPROPERTIES).
- isCached(tableName): Returns true if the table is currently cached in-memory.
- Lists named in-memory cache entries (same as SHOW CACHED TABLES).
- listCatalogs([pattern]): Returns a list of catalogs available in this session.
- listColumns(tableName[, dbName]): Returns a list of columns for the given table/view in the specified database.
- listDatabases([pattern]): Returns a list of databases (namespaces) available within the current catalog.
- listFunctions([dbName, pattern]): Returns a list of functions registered in the current database (namespace), or in the database given by dbName when provided (the name may be qualified with catalog).
- listPartitions(tableName): Lists partition value strings for a table (same as SHOW PARTITIONS).
- listTables([dbName, pattern]): Returns a list of tables/views in the current database (namespace), or in the database given by dbName when provided (the name may be qualified with catalog).
- listViews([dbName, pattern]): Lists views in a namespace.
- recoverPartitions(tableName): Recovers all the partitions in the directory of a table and updates the catalog.
- refreshByPath(path): Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path.
- refreshTable(tableName): Invalidates and refreshes all the cached data and metadata of the given table.
- registerFunction(name, f[, returnType]): An alias for spark.udf.register().
- setCurrentCatalog(catalogName): Sets the current catalog in this session.
- setCurrentDatabase(dbName): Sets the current database (namespace) in this session.
- tableExists(tableName[, dbName]): Check if the table or view with the specified name exists.
- truncateTable(tableName): Truncates a table (removes all data from the table; not supported for views).
- uncacheTable(tableName): Removes the specified table from the in-memory cache.