Package org.apache.spark.sql
Class SQLContext
Object
  org.apache.spark.sql.SQLContext
- All Implemented Interfaces:
  Serializable, org.apache.spark.internal.Logging

The entry point for working with structured data (rows and columns) in Spark 1.x.
As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility.
- Since:
- 1.0.0
- See Also:
- SparkSession
Nested Class Summary
- class SQLContext.implicits$
  (Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames.
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging:
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
ConstructorDescriptionSQLContext
(JavaSparkContext sparkContext) Deprecated.Use SparkSession.builder instead.Deprecated.Use SparkSession.builder instead. -
Method Summary
- applySchema(JavaRDD<?> rdd, Class<?> beanClass)
  Deprecated. Use createDataFrame instead.
- applySchema(JavaRDD<Row> rowRDD, StructType schema)
  Deprecated. Use createDataFrame instead.
- applySchema(RDD<?> rdd, Class<?> beanClass)
  Deprecated. Use createDataFrame instead.
- applySchema(RDD<Row> rowRDD, StructType schema)
  Deprecated. Use createDataFrame instead.
- baseRelationToDataFrame(BaseRelation baseRelation)
- void cacheTable(String tableName)
  Caches the specified table in-memory.
- static void clearActive()
  Deprecated. Use SparkSession.clearActiveSession instead.
- void clearCache()
  Removes all cached tables from the in-memory cache.
- createDataFrame(List<?> data, Class<?> beanClass)
- createDataFrame(List<Row> rows, StructType schema)
- createDataFrame(JavaRDD<?> rdd, Class<?> beanClass)
- createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
- createDataFrame(RDD<?> rdd, Class<?> beanClass)
- createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
- createDataFrame(RDD<Row> rowRDD, StructType schema)
- createDataFrame(scala.collection.immutable.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
- <T> Dataset<T> createDataset(List<T> data, Encoder<T> evidence$5)
- <T> Dataset<T> createDataset(RDD<T> data, Encoder<T> evidence$4)
- <T> Dataset<T> createDataset(scala.collection.immutable.Seq<T> data, Encoder<T> evidence$3)
- createExternalTable(String tableName, String path)
  Deprecated. Use sparkSession.catalog.createTable instead.
- createExternalTable(String tableName, String path, String source)
  Deprecated. Use sparkSession.catalog.createTable instead.
- createExternalTable(String tableName, String source, StructType schema, Map<String, String> options)
  Deprecated. Use sparkSession.catalog.createTable instead.
- createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String, String> options)
  Deprecated. Use sparkSession.catalog.createTable instead.
- createExternalTable(String tableName, String source, scala.collection.immutable.Map<String, String> options)
  Deprecated. Use sparkSession.catalog.createTable instead.
- void dropTempTable(String tableName)
- emptyDataFrame()
  Returns a DataFrame with no rows or columns.
- experimental()
  :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
- getAllConfs()
  Return all the configuration properties that have been set (i.e. not the default).
- getConf(String key)
  Return the value of Spark SQL configuration property for the given key.
- getConf(String key, String defaultValue)
  Return the value of Spark SQL configuration property for the given key, or defaultValue if the key is not set.
- static SQLContext getOrCreate(SparkContext sparkContext)
  Deprecated. Use SparkSession.builder instead.
- implicits()
  Accessor for the nested Scala object.
- boolean isCached(String tableName)
  Returns true if the table is currently cached in-memory.
- jdbc(String url, String table)
  Deprecated. As of 1.4.0, replaced by read().jdbc().
- jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions)
  Deprecated. As of 1.4.0, replaced by read().jdbc().
- jdbc(String url, String table, String[] theParts)
  Deprecated. As of 1.4.0, replaced by read().jdbc().
- jsonFile(String path)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonFile(String path, double samplingRatio)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonFile(String path, StructType schema)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonRDD(JavaRDD<String> json)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonRDD(JavaRDD<String> json, double samplingRatio)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonRDD(JavaRDD<String> json, StructType schema)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonRDD(RDD<String> json)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonRDD(RDD<String> json, double samplingRatio)
  Deprecated. As of 1.4.0, replaced by read().json().
- jsonRDD(RDD<String> json, StructType schema)
  Deprecated. As of 1.4.0, replaced by read().json().
- listenerManager()
  An interface to register custom QueryExecutionListeners that listen for execution metrics.
- load(String path)
  Deprecated. As of 1.4.0, replaced by read().load(path).
- load(String path, String source)
  Deprecated. As of 1.4.0, replaced by read().format(source).load(path).
- load(String source, Map<String, String> options)
  Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
- load(String source, scala.collection.immutable.Map<String, String> options)
  Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
- load(String source, StructType schema, Map<String, String> options)
  Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
- load(String source, StructType schema, scala.collection.immutable.Map<String, String> options)
  Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
- newSession()
  Returns a SQLContext as a new session, with separated SQL configurations, temporary tables and registered functions, but sharing the same SparkContext, cached data and other things.
- parquetFile(String... paths)
  Deprecated. As of 1.4.0, replaced by read().parquet().
- parquetFile(scala.collection.immutable.Seq<String> paths)
  Deprecated. Use read.parquet() instead.
- range(long end)
- range(long start, long end)
- range(long start, long end, long step)
- range(long start, long end, long step, int numPartitions)
- read()
- readStream()
- static void setActive(SQLContext sqlContext)
  Deprecated. Use SparkSession.setActiveSession instead.
- void setConf(Properties props)
  Set Spark SQL configuration properties.
- void setConf(String key, String value)
  Set the given Spark SQL configuration property.
- sparkContext()
- sparkSession()
- sql(String sqlText)
- streams()
- table(String tableName)
- String[] tableNames()
- String[] tableNames(String databaseName)
- tables()
- tables(String databaseName)
- udf()
  A collection of methods for registering user-defined functions (UDF).
- void uncacheTable(String tableName)
  Removes the specified table from the in-memory cache.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
- SQLContext
  public SQLContext(SparkContext sc)
  Deprecated. Use SparkSession.builder instead. Since 2.0.0.
- SQLContext
  public SQLContext(JavaSparkContext sparkContext)
  Deprecated. Use SparkSession.builder instead. Since 2.0.0.
-
-
Method Details
-
getOrCreate
public static SQLContext getOrCreate(SparkContext sparkContext)
Deprecated. Use SparkSession.builder instead. Since 2.0.0.
Get the singleton SQLContext if it exists, or create a new one using the given SparkContext. This function can be used to create a singleton SQLContext object that can be shared across the JVM.
If there is an active SQLContext for the current thread, it will be returned instead of the global one.
- Parameters:
  sparkContext - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
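For illustration, a minimal Scala sketch of the singleton pattern this method supports (assuming an existing SparkContext named sc):

  // Obtain the shared SQLContext, creating it on the first call.
  val sqlContext = SQLContext.getOrCreate(sc)
  // Later calls from anywhere in the JVM return the same instance,
  // unless a thread-local context has been installed via setActive.
  val again = SQLContext.getOrCreate(sc)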
-
setActive
public static void setActive(SQLContext sqlContext)
Deprecated. Use SparkSession.setActiveSession instead. Since 2.0.0.
Changes the SQLContext that will be returned in this thread and its children when SQLContext.getOrCreate() is called. This can be used to ensure that a given thread receives a SQLContext with an isolated session, instead of the global (first created) context.
- Parameters:
  sqlContext - (undocumented)
- Since:
- 1.6.0
-
clearActive
public static void clearActive()
Deprecated. Use SparkSession.clearActiveSession instead. Since 2.0.0.
Clears the active SQLContext for the current thread. Subsequent calls to getOrCreate will return the first created context instead of a thread-local override.
- Since:
- 1.6.0
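A sketch of how setActive and clearActive pair up for per-thread isolation (sc is an assumed existing SparkContext):

  // Install an isolated session as this thread's active context.
  val isolated = SQLContext.getOrCreate(sc).newSession()
  SQLContext.setActive(isolated)
  // ... run work against the isolated session's conf and temp tables ...
  // Restore the default: getOrCreate returns the first created context again.
  SQLContext.clearActive()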
-
implicits
Accessor for the nested Scala object.
- Returns:
- (undocumented)
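Importing the members of this object enables, among other conversions, toDF on local Scala collections; a brief sketch (sqlContext is an assumed existing SQLContext):

  import sqlContext.implicits._
  // Convert a local Seq of tuples into a DataFrame with named columns.
  val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")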
-
parquetFile
Deprecated. As of 1.4.0, replaced by read().parquet().
Loads a Parquet file, returning the result as a DataFrame. This function returns an empty DataFrame if no paths are passed in.
- Parameters:
  paths - (undocumented)
- Returns:
- (undocumented)
-
sparkSession
-
sparkContext
-
newSession
Returns a SQLContext as a new session, with separated SQL configurations, temporary tables and registered functions, but sharing the same SparkContext, cached data and other things.
- Returns:
- (undocumented)
- Since:
- 1.6.0
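For example, a hedged sketch (base is an assumed existing SQLContext):

  val session = base.newSession()
  // Conf changes and temp tables are scoped to the new session...
  session.setConf("spark.sql.shuffle.partitions", "4")
  // ...while the underlying SparkContext (and cached data) stays shared.
  assert(session.sparkContext eq base.sparkContext)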
-
listenerManager
An interface to register custom QueryExecutionListeners that listen for execution metrics.
- Returns:
- (undocumented)
-
setConf
Set Spark SQL configuration properties.
- Parameters:
  props - (undocumented)
- Since:
- 1.0.0
-
setConf
Set the given Spark SQL configuration property.
- Parameters:
  key - (undocumented)
  value - (undocumented)
- Since:
- 1.0.0
-
getConf
Return the value of Spark SQL configuration property for the given key.
- Parameters:
  key - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.0.0
-
getConf
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.
- Parameters:
  key - (undocumented)
  defaultValue - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.0.0
-
getAllConfs
Return all the configuration properties that have been set (i.e. not the default). This creates a new copy of the config properties in the form of a Map.
- Returns:
- (undocumented)
- Since:
- 1.0.0
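Taken together, the configuration accessors behave as in this sketch (the unset key name is hypothetical):

  sqlContext.setConf("spark.sql.shuffle.partitions", "8")
  sqlContext.getConf("spark.sql.shuffle.partitions")          // returns "8"
  sqlContext.getConf("spark.sql.example.unset", "fallback")   // key not set, so returns "fallback"
  sqlContext.getAllConfs                                      // immutable Map of the explicitly set properties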
-
experimental
:: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
emptyDataFrame
Returns a DataFrame with no rows or columns.
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
A collection of methods for registering user-defined functions (UDF).
The following example registers a Scala closure as a UDF:
sqlContext.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1)
The following example registers a UDF in Java:
sqlContext.udf().register("myUDF", (Integer arg1, String arg2) -> arg2 + arg1, DataTypes.StringType);
- Returns:
- (undocumented)
- Since:
- 1.3.0
- Note:
- The user-defined functions must be deterministic. Due to optimization, duplicate invocations may be eliminated or the function may even be invoked more times than it is present in the query.
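Once registered, the function can be invoked by name from SQL; for instance, assuming a hypothetical temporary view people with columns age (int) and name (string):

  sqlContext.sql("SELECT myUDF(age, name) FROM people")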
-
isCached
Returns true if the table is currently cached in-memory.
- Parameters:
  tableName - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
cacheTable
Caches the specified table in-memory.
- Parameters:
  tableName - (undocumented)
- Since:
- 1.3.0
-
uncacheTable
Removes the specified table from the in-memory cache.
- Parameters:
  tableName - (undocumented)
- Since:
- 1.3.0
-
clearCache
public void clearCache()
Removes all cached tables from the in-memory cache.
- Since:
- 1.3.0
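The caching methods above combine as in this short sketch (the table name people is hypothetical):

  sqlContext.cacheTable("people")          // materialize the table in memory on first use
  assert(sqlContext.isCached("people"))
  sqlContext.uncacheTable("people")        // evict just this table
  sqlContext.clearCache()                  // or evict every cached table at once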
-
createDataFrame
-
createDataFrame
-
baseRelationToDataFrame
-
createDataFrame
-
createDataset
-
createDataset
-
createDataset
-
createDataFrame
-
createDataFrame
-
createDataFrame
-
createDataFrame
-
createDataFrame
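A hedged sketch of the two creation paths above, assuming existing sc: SparkContext and sqlContext: SQLContext. createDataFrame pairs an RDD of Rows with an explicit schema, while createDataset relies on an implicit Encoder:

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

  // createDataFrame from an RDD[Row] plus an explicit schema.
  val schema = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)))
  val rowRDD = sc.parallelize(Seq(Row("Alice", 29), Row("Bob", 31)))
  val df = sqlContext.createDataFrame(rowRDD, schema)

  // createDataset needs an Encoder, supplied here by the implicits import.
  import sqlContext.implicits._
  val ds = sqlContext.createDataset(Seq(1, 2, 3))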
-
read
-
readStream
-
createExternalTable
Deprecated. Use sparkSession.catalog.createTable instead. Since 2.2.0.
-
createExternalTable
Deprecated. Use sparkSession.catalog.createTable instead. Since 2.2.0.
-
createExternalTable
public Dataset<Row> createExternalTable(String tableName, String source, Map<String, String> options)
Deprecated. Use sparkSession.catalog.createTable instead. Since 2.2.0.
-
createExternalTable
public Dataset<Row> createExternalTable(String tableName, String source, scala.collection.immutable.Map<String, String> options)
Deprecated. Use sparkSession.catalog.createTable instead. Since 2.2.0.
-
createExternalTable
public Dataset<Row> createExternalTable(String tableName, String source, StructType schema, Map<String, String> options)
Deprecated. Use sparkSession.catalog.createTable instead. Since 2.2.0.
-
createExternalTable
public Dataset<Row> createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String, String> options)
Deprecated. Use sparkSession.catalog.createTable instead. Since 2.2.0.
-
dropTempTable
-
range
-
range
-
range
-
range
-
sql
-
table
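How range, sql and table fit together, as a brief sketch (the view name nums is hypothetical):

  val nums = sqlContext.range(0, 100, 10)   // single "id" column: 0, 10, ..., 90
  nums.createOrReplaceTempView("nums")
  val filtered = sqlContext.sql("SELECT id FROM nums WHERE id > 50")
  val whole = sqlContext.table("nums")      // the registered view as a DataFrame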
-
tables
-
tables
-
streams
-
tableNames
-
tableNames
-
applySchema
Deprecated. Use createDataFrame instead. Since 1.3.0.
-
applySchema
Deprecated. Use createDataFrame instead. Since 1.3.0.
-
applySchema
Deprecated. Use createDataFrame instead. Since 1.3.0.
-
applySchema
Deprecated. Use createDataFrame instead. Since 1.3.0.
-
parquetFile
Deprecated. Use read.parquet() instead. Since 1.4.0.
-
jsonFile
Deprecated. As of 1.4.0, replaced by read().json().
Loads a JSON file (one object per line), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
- Parameters:
  path - (undocumented)
- Returns:
- (undocumented)
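The non-deprecated equivalent goes through the DataFrameReader; a one-line sketch with a hypothetical path:

  val people = sqlContext.read.json("examples/src/main/resources/people.json")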
-
jsonFile
Deprecated. As of 1.4.0, replaced by read().json().
Loads a JSON file (one object per line) and applies the given schema, returning the result as a DataFrame.
- Parameters:
  path - (undocumented)
  schema - (undocumented)
- Returns:
- (undocumented)
-
jsonFile
Deprecated. As of 1.4.0, replaced by read().json().
- Parameters:
  path - (undocumented)
  samplingRatio - (undocumented)
- Returns:
- (undocumented)
-
jsonRDD
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
- Parameters:
  json - (undocumented)
- Returns:
- (undocumented)
-
jsonRDD
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
- Parameters:
  json - (undocumented)
- Returns:
- (undocumented)
-
jsonRDD
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
- Parameters:
  json - (undocumented)
  schema - (undocumented)
- Returns:
- (undocumented)
-
jsonRDD
Deprecated. As of 1.4.0, replaced by read().json().
Loads a JavaRDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
- Parameters:
  json - (undocumented)
  schema - (undocumented)
- Returns:
- (undocumented)
-
jsonRDD
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects (one object per record), inferring the schema, returning the result as a DataFrame.
- Parameters:
  json - (undocumented)
  samplingRatio - (undocumented)
- Returns:
- (undocumented)
-
jsonRDD
Deprecated. As of 1.4.0, replaced by read().json().
Loads a JavaRDD[String] storing JSON objects (one object per record), inferring the schema, returning the result as a DataFrame.
- Parameters:
  json - (undocumented)
  samplingRatio - (undocumented)
- Returns:
- (undocumented)
-
load
Deprecated. As of 1.4.0, replaced by read().load(path).
Returns the dataset stored at path as a DataFrame, using the default data source configured by spark.sql.sources.default.
- Parameters:
  path - (undocumented)
- Returns:
- (undocumented)
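For reference, hedged sketches of the replacement calls named above (paths and source names are hypothetical):

  val a = sqlContext.read.load("data/events")                  // default source, per spark.sql.sources.default
  val b = sqlContext.read.format("json").load("data/events")   // explicit source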
-
load
Deprecated. As of 1.4.0, replaced by read().format(source).load(path).
Returns the dataset stored at path as a DataFrame, using the given data source.
- Parameters:
  path - (undocumented)
  source - (undocumented)
- Returns:
- (undocumented)
-
load
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
(Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.
- Parameters:
  source - (undocumented)
  options - (undocumented)
- Returns:
- (undocumented)
-
load
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
(Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.
- Parameters:
  source - (undocumented)
  options - (undocumented)
- Returns:
- (undocumented)
-
load
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
(Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.
- Parameters:
  source - (undocumented)
  schema - (undocumented)
  options - (undocumented)
- Returns:
- (undocumented)
-
load
public Dataset<Row> load(String source, StructType schema, scala.collection.immutable.Map<String, String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
(Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.
- Parameters:
  source - (undocumented)
  schema - (undocumented)
  options - (undocumented)
- Returns:
- (undocumented)
-
jdbc
Deprecated. As of 1.4.0, replaced by read().jdbc().
Construct a DataFrame representing the database table accessible via JDBC URL url named table.
- Parameters:
  url - (undocumented)
  table - (undocumented)
- Returns:
- (undocumented)
-
jdbc
public Dataset<Row> jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions)
Deprecated. As of 1.4.0, replaced by read().jdbc().
Construct a DataFrame representing the database table accessible via JDBC URL url named table. Partitions of the table will be retrieved in parallel based on the parameters passed to this function.
- Parameters:
  columnName - the name of a column of integral type that will be used for partitioning.
  lowerBound - the minimum value of columnName used to decide partition stride
  upperBound - the maximum value of columnName used to decide partition stride
  numPartitions - the number of partitions. The range lowerBound to upperBound will be split evenly into this many partitions
  url - (undocumented)
  table - (undocumented)
- Returns:
- (undocumented)
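A sketch of the replacement read().jdbc() call with the same partitioning parameters (the URL, table and column names are hypothetical):

  val props = new java.util.Properties()
  val df = sqlContext.read.jdbc(
    "jdbc:postgresql://host:5432/db", "people",
    "id",    // integral partition column
    0L,      // lowerBound
    10000L,  // upperBound
    4,       // numPartitions
    props)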
-
jdbc
Deprecated. As of 1.4.0, replaced by read().jdbc().
Construct a DataFrame representing the database table accessible via JDBC URL url named table. The theParts parameter gives a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame.
- Parameters:
  url - (undocumented)
  table - (undocumented)
  theParts - (undocumented)
- Returns:
- (undocumented)
-