abstract class Catalog extends AnyRef

Catalog interface for Spark. To access this, use SparkSession.catalog.
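
As a minimal sketch of the access path described above, the following obtains the Catalog from a SparkSession and reads the current database. The local master, the application name, and the CatalogAccessSketch object are assumptions made for illustration; only SparkSession.catalog and currentDatabase come from this page.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalog.Catalog

// Hypothetical wrapper object used only for this illustration.
object CatalogAccessSketch {
  // Reads the current database name through the session's Catalog.
  def currentDb(spark: SparkSession): String = {
    val catalog: Catalog = spark.catalog
    catalog.currentDatabase
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("catalog-access-sketch")
      .getOrCreate()
    try println(s"current database: ${currentDb(spark)}")
    finally spark.stop()
  }
}
```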

Annotations
@Stable()
Source
Catalog.scala
Since

2.0.0

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new Catalog()

Abstract Value Members

  1. abstract def cacheTable(tableName: String, storageLevel: StorageLevel): Unit

    Caches the specified table with the given storage level.

    tableName

    is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.

    storageLevel

    the storage level to use when caching the table.

    Since

    2.3.0

  2. abstract def cacheTable(tableName: String): Unit

    Caches the specified table in-memory.

    tableName

    is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.

    Since

    2.0.0
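
The two cacheTable overloads above can be sketched together with isCached and uncacheTable, which are documented further down this page. The view name numbers, the chosen storage level, and the CacheTableSketch object are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical wrapper object used only for this illustration.
object CacheTableSketch {
  // Caches a small temporary view, then uncaches it, and reports
  // what isCached returned after each step.
  def cacheRoundTrip(spark: SparkSession): (Boolean, Boolean) = {
    spark.range(100).createOrReplaceTempView("numbers")
    // Overload with an explicit storage level (available since 2.3.0).
    spark.catalog.cacheTable("numbers", StorageLevel.MEMORY_AND_DISK)
    val afterCache = spark.catalog.isCached("numbers")
    spark.catalog.uncacheTable("numbers")
    val afterUncache = spark.catalog.isCached("numbers")
    spark.catalog.dropTempView("numbers")
    (afterCache, afterUncache)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("cache-table-sketch")
      .getOrCreate()
    try println(cacheRoundTrip(spark))
    finally spark.stop()
  }
}
```

Calling cacheTable("numbers") without a storage level would use the single-argument overload instead.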

  3. abstract def clearCache(): Unit

    Removes all cached tables from the in-memory cache.

    Since

    2.0.0

  4. abstract def createTable(tableName: String, source: String, schema: StructType, description: String, options: Map[String, String]): DataFrame

    (Scala-specific) Creates a table based on the dataset in a data source, a schema, and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    3.1.0

  5. abstract def createTable(tableName: String, source: String, schema: StructType, options: Map[String, String]): DataFrame

    (Scala-specific) Creates a table based on the dataset in a data source, a schema, and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.2.0

  6. abstract def createTable(tableName: String, source: String, description: String, options: Map[String, String]): DataFrame

    (Scala-specific) Creates a table based on the dataset in a data source and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    3.1.0

  7. abstract def createTable(tableName: String, source: String, options: Map[String, String]): DataFrame

    (Scala-specific) Creates a table based on the dataset in a data source and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.2.0

  8. abstract def createTable(tableName: String, path: String, source: String): DataFrame

    Creates a table from the given path based on a data source and returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.2.0

  9. abstract def createTable(tableName: String, path: String): DataFrame

    Creates a table from the given path and returns the corresponding DataFrame. It will use the default data source configured by spark.sql.sources.default.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.2.0
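
A minimal sketch of the schema-based createTable overload listed above, assuming a local session with a throwaway warehouse directory. The table name people_sketch, the two-column schema, and the CreateTableSketch object are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical wrapper object used only for this illustration.
object CreateTableSketch {
  // Creates an empty Parquet table from an explicit schema and
  // returns the column names of the DataFrame that createTable hands back.
  def createEmptyTable(spark: SparkSession, tableName: String): Seq[String] = {
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))
    spark.sql(s"DROP TABLE IF EXISTS $tableName") // keeps the sketch re-runnable
    // No "path" option is passed, so the table location is left to Spark.
    val df = spark.catalog.createTable(tableName, "parquet", schema, Map.empty[String, String])
    df.columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a temporary warehouse directory, so nothing is left in the working directory.
    val warehouse = Files.createTempDirectory("catalog-sketch-warehouse").toString
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("create-table-sketch")
      .config("spark.sql.warehouse.dir", warehouse)
      .getOrCreate()
    try {
      println(createEmptyTable(spark, "people_sketch"))
      println(spark.catalog.tableExists("people_sketch"))
    } finally spark.stop()
  }
}
```

The path-based overloads take an existing data location instead of a schema, so the schema is then taken from the data source rather than declared up front.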

  10. abstract def currentCatalog(): String

    Returns the current catalog in this session.

    Since

    3.4.0

  11. abstract def currentDatabase: String

    Returns the current database (namespace) in this session.

    Since

    2.0.0

  12. abstract def databaseExists(dbName: String): Boolean

    Check if the database (namespace) with the specified name exists (the name can be qualified with catalog).

    Since

    2.1.0

  13. abstract def dropGlobalTempView(viewName: String): Boolean

    Drops the global temporary view with the given view name in the catalog. If the view has been cached before, then it will also be uncached.

    A global temporary view is cross-session. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system-preserved database global_temp, and we must use the qualified name to refer to a global temp view, e.g. SELECT * FROM global_temp.view1.

    viewName

    the unqualified name of the temporary view to be dropped.

    returns

    true if the view is dropped successfully, false otherwise.

    Since

    2.1.0

  14. abstract def dropTempView(viewName: String): Boolean

    Drops the local temporary view with the given view name in the catalog. If the view has been cached before, then it will also be uncached.

    A local temporary view is session-scoped. Its lifetime is the lifetime of the session that created it, i.e. it will be automatically dropped when the session terminates. It's not tied to any database, i.e. we can't use db1.view1 to reference a local temporary view.

    Note that the return type of this method was Unit in Spark 2.0, but changed to Boolean in Spark 2.1.

    viewName

    the name of the temporary view to be dropped.

    returns

    true if the view is dropped successfully, false otherwise.

    Since

    2.0.0
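
The difference between the two kinds of temporary view, and the Boolean result of the drop methods, can be sketched as follows. The view name view1 mirrors the example in the text above; the TempViewSketch object and the local session are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object used only for this illustration.
object TempViewSketch {
  // Drops a local temporary view twice: the first call finds the view,
  // the second call has nothing left to drop.
  def dropLocalTwice(spark: SparkSession): (Boolean, Boolean) = {
    spark.range(3).createOrReplaceTempView("view1")
    val first = spark.catalog.dropTempView("view1")
    val second = spark.catalog.dropTempView("view1")
    (first, second)
  }

  // Registers a global temporary view, reads it through the qualified
  // global_temp name, and drops it using the unqualified name.
  def globalViewRowCount(spark: SparkSession): Long = {
    spark.range(3).createOrReplaceGlobalTempView("view1")
    val rows = spark.sql("SELECT * FROM global_temp.view1").count()
    spark.catalog.dropGlobalTempView("view1")
    rows
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("temp-view-sketch")
      .getOrCreate()
    try {
      println(dropLocalTwice(spark))
      println(globalViewRowCount(spark))
    } finally spark.stop()
  }
}
```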

  15. abstract def functionExists(dbName: String, functionName: String): Boolean

    Check if the function with the specified name exists in the specified database under the Hive Metastore.

    To check existence of functions in other catalogs, please use functionExists(functionName) with qualified function name instead.

    dbName

    is an unqualified name that designates a database.

    functionName

    is an unqualified name that designates a function.

    Since

    2.1.0

  16. abstract def functionExists(functionName: String): Boolean

    Check if the function with the specified name exists. This can either be a temporary function or a function.

    functionName

    is either a qualified or unqualified name that designates a function. It follows the same resolution rule as SQL: search for built-in/temp functions first, then functions in the current database (namespace).

    Since

    2.1.0

  17. abstract def getDatabase(dbName: String): Database

    Get the database (namespace) with the specified name (can be qualified with catalog). This throws an AnalysisException when the database (namespace) cannot be found.

    Annotations
    @throws( "database does not exist" )
    Since

    2.1.0

  18. abstract def getFunction(dbName: String, functionName: String): Function

    Get the function with the specified name in the specified database under the Hive Metastore. This throws an AnalysisException when the function cannot be found.

    To get functions in other catalogs, please use getFunction(functionName) with qualified function name instead.

    dbName

    is an unqualified name that designates a database.

    functionName

    is an unqualified name that designates a function in the specified database.

    Annotations
    @throws( ... )
    Since

    2.1.0

  19. abstract def getFunction(functionName: String): Function

    Get the function with the specified name. This function can be a temporary function or a function. This throws an AnalysisException when the function cannot be found.

    functionName

    is either a qualified or unqualified name that designates a function. It follows the same resolution rule as SQL: search for built-in/temp functions first, then functions in the current database (namespace).

    Annotations
    @throws( "function does not exist" )
    Since

    2.1.0

  20. abstract def getTable(dbName: String, tableName: String): Table

    Get the table or view with the specified name in the specified database under the Hive Metastore. This throws an AnalysisException when no Table can be found.

    To get table/view in other catalogs, please use getTable(tableName) with qualified table/view name instead.

    Annotations
    @throws( "database or table does not exist" )
    Since

    2.1.0

  21. abstract def getTable(tableName: String): Table

    Get the table or view with the specified name. This table can be a temporary view or a table/view. This throws an AnalysisException when no Table can be found.

    tableName

    is either a qualified or unqualified name that designates a table/view. It follows the same resolution rule as SQL: search for temp views first, then tables/views in the current database (namespace).

    Annotations
    @throws( "table does not exist" )
    Since

    2.1.0
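
Because the get* methods above throw an AnalysisException when nothing matches, callers typically either check with the corresponding *Exists method first or handle the exception. The following sketch shows the second option for getTable; the LookupSketch object and the table names are hypothetical.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Hypothetical wrapper object used only for this illustration.
object LookupSketch {
  // Returns a short description of the table or view, or a
  // "not found" message when getTable throws an AnalysisException.
  def describe(spark: SparkSession, name: String): String =
    Try(spark.catalog.getTable(name)) match {
      case Success(t) => s"${t.name} (type=${t.tableType}, temporary=${t.isTemporary})"
      case Failure(_: AnalysisException) => s"$name not found"
      case Failure(other) => throw other // anything else is unexpected here
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("lookup-sketch")
      .getOrCreate()
    try {
      spark.range(1).createOrReplaceTempView("lookup_demo")
      println(describe(spark, "lookup_demo"))
      println(describe(spark, "no_such_table_sketch"))
    } finally spark.stop()
  }
}
```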

  22. abstract def isCached(tableName: String): Boolean

    Returns true if the table is currently cached in-memory.

    tableName

    is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.

    Since

    2.0.0

  23. abstract def listCatalogs(): Dataset[CatalogMetadata]

    Returns a list of catalogs available in this session.

    Since

    3.4.0

  24. abstract def listColumns(dbName: String, tableName: String): Dataset[Column]

    Returns a list of columns for the given table/view in the specified database under the Hive Metastore.

    To list columns for table/view in other catalogs, please use listColumns(tableName) with qualified table/view name instead.

    dbName

    is an unqualified name that designates a database.

    tableName

    is an unqualified name that designates a table/view.

    Annotations
    @throws( "database or table does not exist" )
    Since

    2.0.0

  25. abstract def listColumns(tableName: String): Dataset[Column]

    Returns a list of columns for the given table/view or temporary view.

    tableName

    is either a qualified or unqualified name that designates a table/view. It follows the same resolution rule as SQL: search for temp views first, then tables/views in the current database (namespace).

    Annotations
    @throws( "table does not exist" )
    Since

    2.0.0

  26. abstract def listDatabases(): Dataset[Database]

    Returns a list of databases (namespaces) available within the current catalog.

    Since

    2.0.0

  27. abstract def listFunctions(dbName: String): Dataset[Function]

    Returns a list of functions registered in the specified database (namespace) (the name can be qualified with catalog). This includes all built-in and temporary functions.

    Annotations
    @throws( "database does not exist" )
    Since

    2.0.0

  28. abstract def listFunctions(): Dataset[Function]

    Returns a list of functions registered in the current database (namespace). This includes all temporary functions.

    Since

    2.0.0

  29. abstract def listTables(dbName: String): Dataset[Table]

    Returns a list of tables/views in the specified database (namespace) (the name can be qualified with catalog). This includes all temporary views.

    Annotations
    @throws( "database does not exist" )
    Since

    2.0.0

  30. abstract def listTables(): Dataset[Table]

    Returns a list of tables/views in the current database (namespace). This includes all temporary views.

    Since

    2.0.0
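
The list* methods above return Datasets, so their results can be collected or queried like any other Dataset. A minimal sketch, assuming a local session and a hypothetical temporary view named listing_demo; the ListingSketch object is also an assumption.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object used only for this illustration.
object ListingSketch {
  // Names of the temporary views that listTables() reports.
  def tempViewNames(spark: SparkSession): Seq[String] =
    spark.catalog.listTables().collect().filter(_.isTemporary).map(_.name).toSeq

  // Column names of a table or view, resolved the same way as in SQL.
  def columnNames(spark: SparkSession, tableName: String): Seq[String] =
    spark.catalog.listColumns(tableName).collect().map(_.name).toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("listing-sketch")
      .getOrCreate()
    try {
      spark.range(1).createOrReplaceTempView("listing_demo")
      println(tempViewNames(spark))
      println(columnNames(spark, "listing_demo"))
      spark.catalog.listDatabases().show()
    } finally spark.stop()
  }
}
```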

  31. abstract def recoverPartitions(tableName: String): Unit

    Recovers all the partitions in the directory of a table and updates the catalog. Only works with a partitioned table, and not a view.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.1.1

  32. abstract def refreshByPath(path: String): Unit

    Invalidates and refreshes all the cached data (and the associated metadata) for any Dataset that contains the given data source path. Path matching is by prefix, i.e. "/" would invalidate everything that is cached.

    Since

    2.0.0

  33. abstract def refreshTable(tableName: String): Unit

    Invalidates and refreshes all the cached data and metadata of the given table. For performance reasons, Spark SQL or the external data source library it uses might cache certain metadata about a table, such as the location of blocks. When those change outside of Spark SQL, users should call this function to invalidate the cache.

    If this table is cached as an InMemoryRelation, this drops the original cached version and caches the new version lazily.

    tableName

    is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.

    Since

    2.0.0
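
One situation where refreshTable applies is a table whose underlying files change after the table has been read. The sketch below appends files directly to the table's directory and then refreshes the table before reading it again. The table name refresh_demo, the temporary directory, and the RefreshSketch object are assumptions for illustration, and the append through the same session merely stands in for a change made outside of Spark SQL.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object used only for this illustration.
object RefreshSketch {
  // Creates a path-based table, changes the files under that path,
  // refreshes the table, and returns the row count seen afterwards.
  def countAfterAppend(spark: SparkSession): Long = {
    val dir = Files.createTempDirectory("refresh-sketch").resolve("data").toString
    spark.range(5).write.parquet(dir) // 5 initial rows
    spark.sql("DROP TABLE IF EXISTS refresh_demo")
    spark.catalog.createTable("refresh_demo", dir, "parquet")
    spark.table("refresh_demo").count() // first read; file metadata may now be cached

    // Add 5 more rows by writing to the directory, bypassing the table.
    spark.range(5, 10).write.mode("append").parquet(dir)

    spark.catalog.refreshTable("refresh_demo") // invalidate cached data and metadata
    val rows = spark.table("refresh_demo").count()
    spark.sql("DROP TABLE refresh_demo")
    rows
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("refresh-sketch")
      .getOrCreate()
    try println(countAfterAppend(spark))
    finally spark.stop()
  }
}
```

When only a path is known rather than a table name, refreshByPath(dir) could be used in the same position.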

  34. abstract def setCurrentCatalog(catalogName: String): Unit

    Sets the current catalog in this session.

    Since

    3.4.0

  35. abstract def setCurrentDatabase(dbName: String): Unit

    Sets the current database (namespace) in this session.

    Since

    2.0.0
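
A minimal sketch of switching the current database and restoring it afterwards, using setCurrentDatabase together with currentDatabase. The database name catalog_sketch_db, the temporary warehouse directory, and the CurrentDatabaseSketch object are hypothetical; the name passed in must not be the database that is current when the method is called, because the sketch drops it at the end.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object used only for this illustration.
object CurrentDatabaseSketch {
  // Switches to `db`, records what currentDatabase reports,
  // then restores the previous database and removes `db` again.
  def switchAndRestore(spark: SparkSession, db: String): String = {
    val previous = spark.catalog.currentDatabase
    spark.sql(s"CREATE DATABASE IF NOT EXISTS $db")
    spark.catalog.setCurrentDatabase(db)
    val seen = spark.catalog.currentDatabase
    spark.catalog.setCurrentDatabase(previous)
    spark.sql(s"DROP DATABASE IF EXISTS $db")
    seen
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a temporary warehouse directory, so nothing is left in the working directory.
    val warehouse = Files.createTempDirectory("current-db-sketch").toString
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("current-database-sketch")
      .config("spark.sql.warehouse.dir", warehouse)
      .getOrCreate()
    try {
      println(switchAndRestore(spark, "catalog_sketch_db"))
      println(spark.catalog.currentDatabase)
    } finally spark.stop()
  }
}
```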

  36. abstract def tableExists(dbName: String, tableName: String): Boolean

    Check if the table or view with the specified name exists in the specified database under the Hive Metastore.

    To check existence of table/view in other catalogs, please use tableExists(tableName) with qualified table/view name instead.

    dbName

    is an unqualified name that designates a database.

    tableName

    is an unqualified name that designates a table.

    Since

    2.1.0

  37. abstract def tableExists(tableName: String): Boolean

    Check if the table or view with the specified name exists. This can either be a temporary view or a table/view.

    tableName

    is either a qualified or unqualified name that designates a table/view. It follows the same resolution rule as SQL: search for temp views first, then tables/views in the current database (namespace).

    Since

    2.1.0

  38. abstract def uncacheTable(tableName: String): Unit

    Removes the specified table from the in-memory cache.

    tableName

    is either a qualified or unqualified name that designates a table/view. If no database identifier is provided, it refers to a temporary view or a table/view in the current database.

    Since

    2.0.0

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def createTable(tableName: String, source: String, schema: StructType, description: String, options: java.util.Map[String, String]): DataFrame

    Creates a table based on the dataset in a data source, a schema, and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    3.1.0

  7. def createTable(tableName: String, source: String, schema: StructType, options: java.util.Map[String, String]): DataFrame

    Creates a table based on the dataset in a data source, a schema, and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.2.0

  8. def createTable(tableName: String, source: String, description: String, options: java.util.Map[String, String]): DataFrame

    Creates a table based on the dataset in a data source and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    3.1.0

  9. def createTable(tableName: String, source: String, options: java.util.Map[String, String]): DataFrame

    Creates a table based on the dataset in a data source and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Since

    2.2.0

  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  17. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  18. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  20. def toString(): String
    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

Deprecated Value Members

  1. def createExternalTable(tableName: String, source: String, schema: StructType, options: Map[String, String]): DataFrame

    (Scala-specific) Creates a table from the given path based on a data source, a schema, and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.2.0) use createTable instead.

    Since

    2.0.0

  2. def createExternalTable(tableName: String, source: String, schema: StructType, options: java.util.Map[String, String]): DataFrame

    Creates a table from the given path based on a data source, a schema, and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.2.0) use createTable instead.

    Since

    2.0.0

  3. def createExternalTable(tableName: String, source: String, options: Map[String, String]): DataFrame

    (Scala-specific) Creates a table from the given path based on a data source and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.2.0) use createTable instead.

    Since

    2.0.0

  4. def createExternalTable(tableName: String, source: String, options: java.util.Map[String, String]): DataFrame

    Creates a table from the given path based on a data source and a set of options, then returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.2.0) use createTable instead.

    Since

    2.0.0

  5. def createExternalTable(tableName: String, path: String, source: String): DataFrame

    Creates a table from the given path based on a data source and returns the corresponding DataFrame.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.2.0) use createTable instead.

    Since

    2.0.0

  6. def createExternalTable(tableName: String, path: String): DataFrame

    Creates a table from the given path and returns the corresponding DataFrame. It will use the default data source configured by spark.sql.sources.default.

    tableName

    is either a qualified or unqualified name that designates a table. If no database identifier is provided, it refers to a table in the current database.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.2.0) use createTable instead.

    Since

    2.0.0
