org.apache.spark.sql
Class SQLContext

Object
  extended by org.apache.spark.sql.SQLContext
All Implemented Interfaces:
java.io.Serializable, Logging
Direct Known Subclasses:
HiveContext, LocalSQLContext

public class SQLContext
extends Object
implements Logging, scala.Serializable

The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.

Since:
1.0.0
See Also:
Serialized Form

Nested Class Summary
 class SQLContext.implicits$
          :: Experimental :: (Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames.
 
Constructor Summary
SQLContext(JavaSparkContext sparkContext)
           
SQLContext(SparkContext sparkContext)
           
 
Method Summary
 DataFrame applySchema(JavaRDD<?> rdd, Class<?> beanClass)
           
 DataFrame applySchema(JavaRDD<Row> rowRDD, StructType schema)
           
 DataFrame applySchema(RDD<?> rdd, Class<?> beanClass)
           
 DataFrame applySchema(RDD<Row> rowRDD, StructType schema)
           
 DataFrame baseRelationToDataFrame(BaseRelation baseRelation)
           
 void cacheTable(String tableName)
          Caches the specified table in-memory.
 void clearCache()
          Removes all cached tables from the in-memory cache.
 DataFrame createDataFrame(JavaRDD<?> rdd, Class<?> beanClass)
           
 DataFrame createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
           
 DataFrame createDataFrame(RDD<?> rdd, Class<?> beanClass)
           
<A extends scala.Product>
DataFrame
createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$3)
           
 DataFrame createDataFrame(RDD<Row> rowRDD, StructType schema)
           
<A extends scala.Product>
DataFrame
createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$4)
           
 DataFrame createExternalTable(String tableName, String path)
           
 DataFrame createExternalTable(String tableName, String source, java.util.Map<String,String> options)
           
 DataFrame createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options)
           
 DataFrame createExternalTable(String tableName, String path, String source)
           
 DataFrame createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options)
           
 DataFrame createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options)
           
 void dropTempTable(String tableName)
           
 DataFrame emptyDataFrame()
          :: Experimental :: Returns a DataFrame with no rows or columns.
 ExperimentalMethods experimental()
          :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
 scala.collection.immutable.Map<String,String> getAllConfs()
          Return all the configuration properties that have been set (i.e. not the default).
 String getConf(String key)
          Return the value of Spark SQL configuration property for the given key.
 String getConf(String key, String defaultValue)
          Return the value of Spark SQL configuration property for the given key.
static SQLContext getOrCreate(SparkContext sparkContext)
          Get the singleton SQLContext if it exists or create a new one using the given SparkContext.
 SQLContext.implicits$ implicits()
          Accessor for nested Scala object
 boolean isCached(String tableName)
          Returns true if the table is currently cached in-memory.
 DataFrame jdbc(String url, String table)
          Deprecated. As of 1.4.0, replaced by read().jdbc().
 DataFrame jdbc(String url, String table, String[] theParts)
          Deprecated. As of 1.4.0, replaced by read().jdbc().
 DataFrame jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions)
          Deprecated. As of 1.4.0, replaced by read().jdbc().
 DataFrame jsonFile(String path)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonFile(String path, double samplingRatio)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonFile(String path, StructType schema)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonRDD(JavaRDD<String> json)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonRDD(JavaRDD<String> json, double samplingRatio)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonRDD(JavaRDD<String> json, StructType schema)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonRDD(RDD<String> json)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonRDD(RDD<String> json, double samplingRatio)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame jsonRDD(RDD<String> json, StructType schema)
          Deprecated. As of 1.4.0, replaced by read().json().
 DataFrame load(String path)
          Deprecated. As of 1.4.0, replaced by read().load(path).
 DataFrame load(String source, java.util.Map<String,String> options)
          Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
 DataFrame load(String source, scala.collection.immutable.Map<String,String> options)
          Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
 DataFrame load(String path, String source)
          Deprecated. As of 1.4.0, replaced by read().format(source).load(path).
 DataFrame load(String source, StructType schema, java.util.Map<String,String> options)
          Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
 DataFrame load(String source, StructType schema, scala.collection.immutable.Map<String,String> options)
          Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
 DataFrame parquetFile(scala.collection.Seq<String> paths)
           
 DataFrame parquetFile(String... paths)
          Deprecated. As of 1.4.0, replaced by read().parquet().
 DataFrame range(long end)
           
 DataFrame range(long start, long end)
           
 DataFrame range(long start, long end, long step, int numPartitions)
           
 DataFrameReader read()
           
 void setConf(java.util.Properties props)
          Set Spark SQL configuration properties.
 void setConf(String key, String value)
          Set the given Spark SQL configuration property.
 SparkContext sparkContext()
           
 DataFrame sql(String sqlText)
           
 DataFrame table(String tableName)
           
 String[] tableNames()
           
 String[] tableNames(String databaseName)
           
 DataFrame tables()
           
 DataFrame tables(String databaseName)
           
 UDFRegistration udf()
          A collection of methods for registering user-defined functions (UDF).
 void uncacheTable(String tableName)
          Removes the specified table from the in-memory cache.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

SQLContext

public SQLContext(SparkContext sparkContext)

SQLContext

public SQLContext(JavaSparkContext sparkContext)
Method Detail

getOrCreate

public static SQLContext getOrCreate(SparkContext sparkContext)
Get the singleton SQLContext if it exists or create a new one using the given SparkContext. This function can be used to create a singleton SQLContext object that can be shared across the JVM.

Parameters:
sparkContext - the SparkContext used to create the SQLContext if no singleton exists yet
Returns:
the singleton SQLContext

parquetFile

public DataFrame parquetFile(String... paths)
Deprecated. As of 1.4.0, replaced by read().parquet().

Loads a Parquet file, returning the result as a DataFrame. This function returns an empty DataFrame if no paths are passed in.

Parameters:
paths - the paths of the Parquet files to load
Returns:
a DataFrame with the loaded data, or an empty DataFrame if no paths are passed in

sparkContext

public SparkContext sparkContext()

setConf

public void setConf(java.util.Properties props)
Set Spark SQL configuration properties.

Parameters:
props - the Spark SQL configuration properties to set
Since:
1.0.0

setConf

public void setConf(String key,
                    String value)
Set the given Spark SQL configuration property.

Parameters:
key - the name of the configuration property
value - the value to assign to the property
Since:
1.0.0

getConf

public String getConf(String key)
Return the value of Spark SQL configuration property for the given key.

Parameters:
key - the name of the configuration property
Returns:
the value set for key
Since:
1.0.0

getConf

public String getConf(String key,
                      String defaultValue)
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.

Parameters:
key - the name of the configuration property
defaultValue - the value to return if key is not set
Returns:
the value set for key, or defaultValue if key is not set
Since:
1.0.0

getAllConfs

public scala.collection.immutable.Map<String,String> getAllConfs()
Return all the configuration properties that have been set (i.e. not the default). This creates a new copy of the config properties in the form of a Map.

Returns:
a new Map holding a copy of the configuration properties that have been set
Since:
1.0.0

experimental

public ExperimentalMethods experimental()
:: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.

Returns:
(undocumented)
Since:
1.3.0

emptyDataFrame

public DataFrame emptyDataFrame()
:: Experimental :: Returns a DataFrame with no rows or columns.

Returns:
(undocumented)
Since:
1.3.0

udf

public UDFRegistration udf()
A collection of methods for registering user-defined functions (UDF).

The following example registers a Scala closure as a UDF:


   sqlContext.udf.register("myUdf", (arg1: Int, arg2: String) => arg2 + arg1)
 

The following example registers a UDF in Java:


   sqlContext.udf().register("myUDF",
       new UDF2<Integer, String, String>() {
           @Override
           public String call(Integer arg1, String arg2) {
               return arg2 + arg1;
           }
      }, DataTypes.StringType);
 

Or, to use Java 8 lambda syntax:


   sqlContext.udf().register("myUDF",
       (Integer arg1, String arg2) -> arg2 + arg1,
       DataTypes.StringType);
 

Returns:
(undocumented)
Since:
1.3.0

isCached

public boolean isCached(String tableName)
Returns true if the table is currently cached in-memory.

Parameters:
tableName - the name of the table to check
Returns:
true if the table is currently cached in-memory, false otherwise
Since:
1.3.0

cacheTable

public void cacheTable(String tableName)
Caches the specified table in-memory.

Parameters:
tableName - the name of the table to cache
Since:
1.3.0

uncacheTable

public void uncacheTable(String tableName)
Removes the specified table from the in-memory cache.

Parameters:
tableName - the name of the table to remove from the in-memory cache
Since:
1.3.0

clearCache

public void clearCache()
Removes all cached tables from the in-memory cache.

Since:
1.3.0

implicits

public SQLContext.implicits$ implicits()
Accessor for nested Scala object

Returns:
(undocumented)

createDataFrame

public <A extends scala.Product> DataFrame createDataFrame(RDD<A> rdd,
                                                           scala.reflect.api.TypeTags.TypeTag<A> evidence$3)

createDataFrame

public <A extends scala.Product> DataFrame createDataFrame(scala.collection.Seq<A> data,
                                                           scala.reflect.api.TypeTags.TypeTag<A> evidence$4)

baseRelationToDataFrame

public DataFrame baseRelationToDataFrame(BaseRelation baseRelation)

createDataFrame

public DataFrame createDataFrame(RDD<Row> rowRDD,
                                 StructType schema)

createDataFrame

public DataFrame createDataFrame(JavaRDD<Row> rowRDD,
                                 StructType schema)

createDataFrame

public DataFrame createDataFrame(RDD<?> rdd,
                                 Class<?> beanClass)

createDataFrame

public DataFrame createDataFrame(JavaRDD<?> rdd,
                                 Class<?> beanClass)

read

public DataFrameReader read()

createExternalTable

public DataFrame createExternalTable(String tableName,
                                     String path)

createExternalTable

public DataFrame createExternalTable(String tableName,
                                     String path,
                                     String source)

createExternalTable

public DataFrame createExternalTable(String tableName,
                                     String source,
                                     java.util.Map<String,String> options)

createExternalTable

public DataFrame createExternalTable(String tableName,
                                     String source,
                                     scala.collection.immutable.Map<String,String> options)

createExternalTable

public DataFrame createExternalTable(String tableName,
                                     String source,
                                     StructType schema,
                                     java.util.Map<String,String> options)

createExternalTable

public DataFrame createExternalTable(String tableName,
                                     String source,
                                     StructType schema,
                                     scala.collection.immutable.Map<String,String> options)

dropTempTable

public void dropTempTable(String tableName)

range

public DataFrame range(long start,
                       long end)

range

public DataFrame range(long end)

range

public DataFrame range(long start,
                       long end,
                       long step,
                       int numPartitions)

sql

public DataFrame sql(String sqlText)

table

public DataFrame table(String tableName)

tables

public DataFrame tables()

tables

public DataFrame tables(String databaseName)

tableNames

public String[] tableNames()

tableNames

public String[] tableNames(String databaseName)

applySchema

public DataFrame applySchema(RDD<Row> rowRDD,
                             StructType schema)

applySchema

public DataFrame applySchema(JavaRDD<Row> rowRDD,
                             StructType schema)

applySchema

public DataFrame applySchema(RDD<?> rdd,
                             Class<?> beanClass)

applySchema

public DataFrame applySchema(JavaRDD<?> rdd,
                             Class<?> beanClass)

parquetFile

public DataFrame parquetFile(scala.collection.Seq<String> paths)

jsonFile

public DataFrame jsonFile(String path)
Deprecated. As of 1.4.0, replaced by read().json().

Loads a JSON file (one object per line), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.

Parameters:
path - (undocumented)
Returns:
(undocumented)

jsonFile

public DataFrame jsonFile(String path,
                          StructType schema)
Deprecated. As of 1.4.0, replaced by read().json().

Loads a JSON file (one object per line) and applies the given schema, returning the result as a DataFrame.

Parameters:
path - (undocumented)
schema - (undocumented)
Returns:
(undocumented)

jsonFile

public DataFrame jsonFile(String path,
                          double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().

Parameters:
path - (undocumented)
samplingRatio - (undocumented)
Returns:
(undocumented)

jsonRDD

public DataFrame jsonRDD(RDD<String> json)
Deprecated. As of 1.4.0, replaced by read().json().

Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.

Parameters:
json - (undocumented)
Returns:
(undocumented)

jsonRDD

public DataFrame jsonRDD(JavaRDD<String> json)
Deprecated. As of 1.4.0, replaced by read().json().

Loads a JavaRDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.

Parameters:
json - (undocumented)
Returns:
(undocumented)

jsonRDD

public DataFrame jsonRDD(RDD<String> json,
                         StructType schema)
Deprecated. As of 1.4.0, replaced by read().json().

Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.

Parameters:
json - (undocumented)
schema - (undocumented)
Returns:
(undocumented)

jsonRDD

public DataFrame jsonRDD(JavaRDD<String> json,
                         StructType schema)
Deprecated. As of 1.4.0, replaced by read().json().

Loads a JavaRDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.

Parameters:
json - (undocumented)
schema - (undocumented)
Returns:
(undocumented)

jsonRDD

public DataFrame jsonRDD(RDD<String> json,
                         double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().

Loads an RDD[String] storing JSON objects (one object per record), infers the schema, and returns the result as a DataFrame.

Parameters:
json - (undocumented)
samplingRatio - (undocumented)
Returns:
(undocumented)

jsonRDD

public DataFrame jsonRDD(JavaRDD<String> json,
                         double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().

Loads a JavaRDD[String] storing JSON objects (one object per record), infers the schema, and returns the result as a DataFrame.

Parameters:
json - (undocumented)
samplingRatio - (undocumented)
Returns:
(undocumented)

load

public DataFrame load(String path)
Deprecated. As of 1.4.0, replaced by read().load(path).

Returns the dataset stored at path as a DataFrame, using the default data source configured by spark.sql.sources.default.

Parameters:
path - (undocumented)
Returns:
(undocumented)

load

public DataFrame load(String path,
                      String source)
Deprecated. As of 1.4.0, replaced by read().format(source).load(path).

Returns the dataset stored at path as a DataFrame, using the given data source.

Parameters:
path - (undocumented)
source - (undocumented)
Returns:
(undocumented)

load

public DataFrame load(String source,
                      java.util.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().

(Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.

Parameters:
source - (undocumented)
options - (undocumented)
Returns:
(undocumented)

load

public DataFrame load(String source,
                      scala.collection.immutable.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().

(Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.

Parameters:
source - (undocumented)
options - (undocumented)
Returns:
(undocumented)

load

public DataFrame load(String source,
                      StructType schema,
                      java.util.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().

(Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.

Parameters:
source - (undocumented)
schema - (undocumented)
options - (undocumented)
Returns:
(undocumented)

load

public DataFrame load(String source,
                      StructType schema,
                      scala.collection.immutable.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().

(Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.

Parameters:
source - (undocumented)
schema - (undocumented)
options - (undocumented)
Returns:
(undocumented)

jdbc

public DataFrame jdbc(String url,
                      String table)
Deprecated. As of 1.4.0, replaced by read().jdbc().

Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url.

Parameters:
url - the JDBC URL of the database
table - the name of the table to read
Returns:
a DataFrame representing the table

jdbc

public DataFrame jdbc(String url,
                      String table,
                      String columnName,
                      long lowerBound,
                      long upperBound,
                      int numPartitions)
Deprecated. As of 1.4.0, replaced by read().jdbc().

Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url. Partitions of the table are retrieved in parallel based on the parameters passed to this function.

Parameters:
columnName - the name of a column of integral type that will be used for partitioning.
lowerBound - the minimum value of columnName used to decide partition stride
upperBound - the maximum value of columnName used to decide partition stride
numPartitions - the number of partitions. The range lowerBound-upperBound will be split evenly into this many partitions.
url - the JDBC URL of the database
table - the name of the table to read
Returns:
a DataFrame representing the table, split into numPartitions partitions
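
The way lowerBound, upperBound and numPartitions decide a partition stride can be pictured with a small standalone sketch. It is not Spark's implementation: the class PartitionStrideSketch and its predicates method are hypothetical names, and the sketch assumes that the first and last partitions are left open-ended, so the bounds only set the stride and do not filter rows out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Spark API.
class PartitionStrideSketch {

    /** Returns one WHERE-clause predicate per partition for an integral column. */
    public static List<String> predicates(String columnName, long lowerBound,
                                          long upperBound, int numPartitions) {
        List<String> result = new ArrayList<>();
        if (numPartitions <= 1) {
            // A single partition reads the whole table.
            result.add("1 = 1");
            return result;
        }
        // Divide before subtracting to avoid overflow on extreme bounds.
        long stride = upperBound / numPartitions - lowerBound / numPartitions;
        long current = lowerBound;
        for (int i = 0; i < numPartitions; i++) {
            // Assumption: the first partition has no lower limit and the last
            // has no upper limit, so rows outside the bounds are still read.
            String lower = (i == 0) ? null : columnName + " >= " + current;
            current += stride;
            String upper = (i == numPartitions - 1) ? null : columnName + " < " + current;
            if (lower == null) {
                result.add(upper);
            } else if (upper == null) {
                result.add(lower);
            } else {
                result.add(lower + " AND " + upper);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: id < 25, id >= 25 AND id < 50, id >= 50 AND id < 75, id >= 75
        for (String predicate : predicates("id", 0L, 100L, 4)) {
            System.out.println(predicate);
        }
    }
}
```

Under these assumptions, a column whose values are heavily skewed toward one end of the range would produce unevenly sized partitions, so the bounds are best chosen to match the actual spread of columnName.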

jdbc

public DataFrame jdbc(String url,
                      String table,
                      String[] theParts)
Deprecated. As of 1.4.0, replaced by read().jdbc().

Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url. The theParts parameter gives a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame.

Parameters:
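url - the JDBC URL of the database
table - the name of the table to read
theParts - the WHERE-clause expressions, one per partition
Returns:
a DataFrame with one partition per expression in theParts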
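
Because each expression in theParts defines one partition, the expressions generally need to be mutually exclusive and to cover every row between them; otherwise rows may be read twice or not at all. The standalone sketch below builds such an array for a string column. The class JdbcPredicateSketch, the method predicatesFor, and the column and values used are hypothetical, and the sketch assumes the listed values are distinct.

```java
// Hypothetical illustration only; not part of the Spark API.
class JdbcPredicateSketch {

    /**
     * Builds one equality predicate per listed value plus a catch-all,
     * so that every row matches exactly one predicate.
     */
    public static String[] predicatesFor(String column, String... values) {
        if (values.length == 0) {
            // Nothing to split on: a single predicate that matches every row.
            return new String[] {"1 = 1"};
        }
        String[] parts = new String[values.length + 1];
        StringBuilder listed = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            // Escape single quotes so the value is a valid SQL string literal.
            String literal = "'" + values[i].replace("'", "''") + "'";
            parts[i] = column + " = " + literal;
            if (i > 0) {
                listed.append(", ");
            }
            listed.append(literal);
        }
        // Catch-all partition for NULLs and for values not listed above.
        parts[values.length] =
            column + " IS NULL OR " + column + " NOT IN (" + listed + ")";
        return parts;
    }

    public static void main(String[] args) {
        for (String part : predicatesFor("region", "EU", "US")) {
            System.out.println(part);
        }
    }
}
```

An array built this way could then be passed as the theParts argument, giving one partition per listed value and one for everything else.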