public class SQLContext extends Object implements Logging, scala.Serializable
The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.
| Modifier and Type | Class and Description |
|---|---|
| class | SQLContext.implicits$ - :: Experimental :: (Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames. |
| Constructor and Description |
|---|
| SQLContext(JavaSparkContext sparkContext) |
| SQLContext(SparkContext sparkContext) |
| Modifier and Type | Method and Description |
|---|---|
| DataFrame | applySchema(JavaRDD<?> rdd, Class<?> beanClass) |
| DataFrame | applySchema(JavaRDD<Row> rowRDD, StructType schema) |
| DataFrame | applySchema(RDD<?> rdd, Class<?> beanClass) |
| DataFrame | applySchema(RDD<Row> rowRDD, StructType schema) |
| DataFrame | baseRelationToDataFrame(BaseRelation baseRelation) |
| void | cacheTable(String tableName) - Caches the specified table in-memory. |
| void | clearCache() - Removes all cached tables from the in-memory cache. |
| DataFrame | createDataFrame(JavaRDD<?> rdd, Class<?> beanClass) |
| DataFrame | createDataFrame(JavaRDD<Row> rowRDD, StructType schema) |
| DataFrame | createDataFrame(RDD<?> rdd, Class<?> beanClass) |
| <A extends scala.Product> DataFrame | createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$3) |
| DataFrame | createDataFrame(RDD<Row> rowRDD, StructType schema) |
| <A extends scala.Product> DataFrame | createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$4) |
| DataFrame | createExternalTable(String tableName, String path) |
| DataFrame | createExternalTable(String tableName, String source, java.util.Map<String,String> options) |
| DataFrame | createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options) |
| DataFrame | createExternalTable(String tableName, String path, String source) |
| DataFrame | createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options) |
| DataFrame | createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options) |
| void | dropTempTable(String tableName) |
| DataFrame | emptyDataFrame() - :: Experimental :: Returns a DataFrame with no rows or columns. |
| ExperimentalMethods | experimental() - :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality. |
| scala.collection.immutable.Map<String,String> | getAllConfs() - Return all the configuration properties that have been set (i.e. not the default). |
| String | getConf(String key) - Return the value of Spark SQL configuration property for the given key. |
| String | getConf(String key, String defaultValue) - Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue. |
| static SQLContext | getOrCreate(SparkContext sparkContext) - Get the singleton SQLContext if it exists or create a new one using the given SparkContext. |
| SQLContext.implicits$ | implicits() - Accessor for nested Scala object. |
| boolean | isCached(String tableName) - Returns true if the table is currently cached in-memory. |
| DataFrame | jdbc(String url, String table) - Deprecated. As of 1.4.0, replaced by read().jdbc(). |
| DataFrame | jdbc(String url, String table, String[] theParts) - Deprecated. As of 1.4.0, replaced by read().jdbc(). |
| DataFrame | jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions) - Deprecated. As of 1.4.0, replaced by read().jdbc(). |
| DataFrame | jsonFile(String path) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonFile(String path, double samplingRatio) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonFile(String path, StructType schema) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonRDD(JavaRDD<String> json) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonRDD(JavaRDD<String> json, double samplingRatio) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonRDD(JavaRDD<String> json, StructType schema) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonRDD(RDD<String> json) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonRDD(RDD<String> json, double samplingRatio) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | jsonRDD(RDD<String> json, StructType schema) - Deprecated. As of 1.4.0, replaced by read().json(). |
| DataFrame | load(String path) - Deprecated. As of 1.4.0, replaced by read().load(path). |
| DataFrame | load(String source, java.util.Map<String,String> options) - Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load(). |
| DataFrame | load(String source, scala.collection.immutable.Map<String,String> options) - Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load(). |
| DataFrame | load(String path, String source) - Deprecated. As of 1.4.0, replaced by read().format(source).load(path). |
| DataFrame | load(String source, StructType schema, java.util.Map<String,String> options) - Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load(). |
| DataFrame | load(String source, StructType schema, scala.collection.immutable.Map<String,String> options) - Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load(). |
| DataFrame | parquetFile(scala.collection.Seq<String> paths) |
| DataFrame | parquetFile(String... paths) - Deprecated. As of 1.4.0, replaced by read().parquet(). |
| DataFrame | range(long end) |
| DataFrame | range(long start, long end) |
| DataFrame | range(long start, long end, long step, int numPartitions) |
| DataFrameReader | read() |
| void | setConf(java.util.Properties props) - Set Spark SQL configuration properties. |
| void | setConf(String key, String value) - Set the given Spark SQL configuration property. |
| SparkContext | sparkContext() |
| DataFrame | sql(String sqlText) |
| DataFrame | table(String tableName) |
| String[] | tableNames() |
| String[] | tableNames(String databaseName) |
| DataFrame | tables() |
| DataFrame | tables(String databaseName) |
| UDFRegistration | udf() - A collection of methods for registering user-defined functions (UDF). |
| void | uncacheTable(String tableName) - Removes the specified table from the in-memory cache. |
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.Logging:
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public SQLContext(SparkContext sparkContext)
public SQLContext(JavaSparkContext sparkContext)
public static SQLContext getOrCreate(SparkContext sparkContext)
Get the singleton SQLContext if it exists or create a new one using the given SparkContext.
Parameters: sparkContext - (undocumented)

public DataFrame parquetFile(String... paths)
Deprecated. As of 1.4.0, replaced by read().parquet(). Loads a Parquet file, returning the result as a DataFrame. This function returns an empty DataFrame if no paths are passed in.
Parameters: paths - (undocumented)

public SparkContext sparkContext()
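getOrCreate above is a get-or-create singleton: the first call constructs the context from the given SparkContext, and every later call returns that same instance. A minimal plain-Java sketch of the pattern (the Context class and field names here are illustrative, not Spark internals):

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative stand-in for SparkContext/SQLContext; not a Spark class.
class Context {
    final String name;
    Context(String name) { this.name = name; }
}

public class GetOrCreateSketch {
    // Holds the JVM-wide singleton, in the spirit of SQLContext.getOrCreate.
    private static final AtomicReference<Context> instantiated = new AtomicReference<>();

    public static Context getOrCreate(String name) {
        // Install a new instance only if none exists; otherwise keep the old one.
        instantiated.compareAndSet(null, new Context(name));
        return instantiated.get();
    }

    public static void main(String[] args) {
        Context a = getOrCreate("first");
        Context b = getOrCreate("second"); // argument ignored: singleton already exists
        System.out.println(a == b);        // same instance
    }
}
```

The AtomicReference makes the create step race-safe without locking the common read path.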
public void setConf(java.util.Properties props)
Set Spark SQL configuration properties.
Parameters: props - (undocumented)

public void setConf(String key, String value)
Set the given Spark SQL configuration property.
Parameters: key - (undocumented), value - (undocumented)

public String getConf(String key)
Return the value of Spark SQL configuration property for the given key.
Parameters: key - (undocumented)

public String getConf(String key, String defaultValue)
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.
Parameters: key - (undocumented), defaultValue - (undocumented)

public scala.collection.immutable.Map<String,String> getAllConfs()
Return all the configuration properties that have been set (i.e. not the default).
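The conf accessors above behave like a string-keyed properties map: setConf stores a value, getConf(key, defaultValue) falls back when the key was never set, and getAllConfs returns only the explicitly set entries. A minimal sketch of those semantics over a plain map (not Spark's SQLConf implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfSketch {
    // Backing store for explicitly set properties only.
    private final Map<String, String> settings = new HashMap<>();

    public void setConf(String key, String value) {
        settings.put(key, value);
    }

    public String getConf(String key, String defaultValue) {
        // Fall back to the caller-supplied default when the key is unset.
        return settings.getOrDefault(key, defaultValue);
    }

    public Map<String, String> getAllConfs() {
        // Defensive copy: only properties that were explicitly set.
        return new HashMap<>(settings);
    }

    public static void main(String[] args) {
        ConfSketch conf = new ConfSketch();
        conf.setConf("spark.sql.shuffle.partitions", "10");
        System.out.println(conf.getConf("spark.sql.shuffle.partitions", "200")); // 10
        System.out.println(conf.getConf("some.unset.key", "fallback"));          // fallback
    }
}
```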
public ExperimentalMethods experimental()
public DataFrame emptyDataFrame()
:: Experimental :: Returns a DataFrame with no rows or columns.
public UDFRegistration udf()
A collection of methods for registering user-defined functions (UDF).

The following example registers a Scala closure as UDF:

    sqlContext.udf.register("myUdf", (arg1: Int, arg2: String) => arg2 + arg1)

The following example registers a UDF in Java:

    sqlContext.udf().register("myUDF",
        new UDF2<Integer, String, String>() {
            @Override
            public String call(Integer arg1, String arg2) {
                return arg2 + arg1;
            }
        }, DataTypes.StringType);

Or, to use Java 8 lambda syntax:

    sqlContext.udf().register("myUDF",
        (Integer arg1, String arg2) -> arg2 + arg1,
        DataTypes.StringType);
public boolean isCached(String tableName)
Returns true if the table is currently cached in-memory.
Parameters: tableName - (undocumented)

public void cacheTable(String tableName)
Caches the specified table in-memory.
Parameters: tableName - (undocumented)

public void uncacheTable(String tableName)
Removes the specified table from the in-memory cache.
Parameters: tableName - (undocumented)

public void clearCache()
Removes all cached tables from the in-memory cache.
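Taken together, cacheTable, isCached, uncacheTable, and clearCache manage a registry of in-memory tables. Ignoring the actual columnar storage, the bookkeeping contract can be sketched with a set of table names (a toy model, not Spark's CacheManager):

```java
import java.util.HashSet;
import java.util.Set;

public class TableCacheSketch {
    // Names of tables currently held in the in-memory cache.
    private final Set<String> cached = new HashSet<>();

    public void cacheTable(String tableName)   { cached.add(tableName); }
    public void uncacheTable(String tableName) { cached.remove(tableName); }
    public boolean isCached(String tableName)  { return cached.contains(tableName); }
    public void clearCache()                   { cached.clear(); }

    public static void main(String[] args) {
        TableCacheSketch cache = new TableCacheSketch();
        cache.cacheTable("people");
        System.out.println(cache.isCached("people")); // true
        cache.clearCache();
        System.out.println(cache.isCached("people")); // false
    }
}
```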
public SQLContext.implicits$ implicits()
public <A extends scala.Product> DataFrame createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$3)
public <A extends scala.Product> DataFrame createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$4)
public DataFrame baseRelationToDataFrame(BaseRelation baseRelation)
public DataFrame createDataFrame(RDD<Row> rowRDD, StructType schema)
public DataFrame createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
public DataFrameReader read()
public DataFrame createExternalTable(String tableName, String path)
public DataFrame createExternalTable(String tableName, String path, String source)
public DataFrame createExternalTable(String tableName, String source, java.util.Map<String,String> options)
public DataFrame createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options)
public DataFrame createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options)
public DataFrame createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options)
public void dropTempTable(String tableName)
public DataFrame range(long start, long end)
public DataFrame range(long end)
public DataFrame range(long start, long end, long step, int numPartitions)
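The three range overloads produce a single-column DataFrame of longs: range(end) counts from 0, and the values run from start (inclusive) up to end (exclusive) in increments of step; numPartitions only controls how the values are distributed. A plain-Java sketch of the value generation, ignoring partitioning (names here are illustrative, not Spark code):

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSketch {
    // Values from start (inclusive) to end (exclusive), stepping by step (> 0).
    public static List<Long> range(long start, long end, long step) {
        List<Long> out = new ArrayList<>();
        for (long v = start; v < end; v += step) {
            out.add(v);
        }
        return out;
    }

    // range(end) counts from 0 with step 1.
    public static List<Long> range(long end) {
        return range(0L, end, 1L);
    }

    public static void main(String[] args) {
        System.out.println(range(3));        // [0, 1, 2]
        System.out.println(range(2, 10, 3)); // [2, 5, 8]
    }
}
```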
public DataFrame sql(String sqlText)
public DataFrame table(String tableName)
public DataFrame tables()
public DataFrame tables(String databaseName)
public String[] tableNames()
public String[] tableNames(String databaseName)
public DataFrame applySchema(RDD<Row> rowRDD, StructType schema)
public DataFrame applySchema(JavaRDD<Row> rowRDD, StructType schema)
public DataFrame parquetFile(scala.collection.Seq<String> paths)
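The jsonFile / jsonRDD entries below note that, when no schema is given, the whole dataset is scanned once to infer one. The core idea can be sketched by unioning the field names of every record in a single pass (a toy model; Spark additionally infers and reconciles field types):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemaInferenceSketch {
    // One pass over every record, unioning field names into a single schema.
    public static Set<String> inferFields(List<? extends Map<String, ?>> records) {
        Set<String> fields = new LinkedHashSet<>();
        for (Map<String, ?> record : records) {
            fields.addAll(record.keySet());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Records with different field sets, as parsed JSON objects might have.
        Set<String> schema = inferFields(List.of(
            Map.of("name", "alice", "age", "30"),
            Map.of("name", "bob", "city", "nyc")));
        System.out.println(schema.size()); // 3: name, age, city
    }
}
```

This is why supplying an explicit schema (or a samplingRatio) avoids the full extra pass over the data.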
public DataFrame jsonFile(String path)
Deprecated. As of 1.4.0, replaced by read().json(). Loads a JSON file (one object per line), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
Parameters: path - (undocumented)

public DataFrame jsonFile(String path, StructType schema)
Deprecated. As of 1.4.0, replaced by read().json(). Loads a JSON file (one object per line) and applies the given schema, returning the result as a DataFrame.
Parameters: path - (undocumented), schema - (undocumented)

public DataFrame jsonFile(String path, double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().
Parameters: path - (undocumented), samplingRatio - (undocumented)

public DataFrame jsonRDD(RDD<String> json)
Deprecated. As of 1.4.0, replaced by read().json(). Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
Parameters: json - (undocumented)

public DataFrame jsonRDD(JavaRDD<String> json)
Deprecated. As of 1.4.0, replaced by read().json(). Loads a JavaRDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
Parameters: json - (undocumented)

public DataFrame jsonRDD(RDD<String> json, StructType schema)
Deprecated. As of 1.4.0, replaced by read().json(). Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
Parameters: json - (undocumented), schema - (undocumented)

public DataFrame jsonRDD(JavaRDD<String> json, StructType schema)
Deprecated. As of 1.4.0, replaced by read().json(). Loads a JavaRDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
Parameters: json - (undocumented), schema - (undocumented)

public DataFrame jsonRDD(RDD<String> json, double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json(). Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame.
Parameters: json - (undocumented), samplingRatio - (undocumented)

public DataFrame jsonRDD(JavaRDD<String> json, double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json(). Loads a JavaRDD[String] storing JSON objects (one object per record), returning the result as a DataFrame.
Parameters: json - (undocumented), samplingRatio - (undocumented)

public DataFrame load(String path)
Deprecated. As of 1.4.0, replaced by read().load(path).
Parameters: path - (undocumented)

public DataFrame load(String path, String source)
Deprecated. As of 1.4.0, replaced by read().format(source).load(path).
Parameters: path - (undocumented), source - (undocumented)

public DataFrame load(String source, java.util.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
Parameters: source - (undocumented), options - (undocumented)

public DataFrame load(String source, scala.collection.immutable.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
Parameters: source - (undocumented), options - (undocumented)

public DataFrame load(String source, StructType schema, java.util.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
Parameters: source - (undocumented), schema - (undocumented), options - (undocumented)

public DataFrame load(String source, StructType schema, scala.collection.immutable.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
Parameters: source - (undocumented), schema - (undocumented), options - (undocumented)

public DataFrame jdbc(String url, String table)
Deprecated. As of 1.4.0, replaced by read().jdbc(). Construct a DataFrame representing the database table accessible via JDBC URL url named table.
Parameters: url - (undocumented), table - (undocumented)

public DataFrame jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions)
Deprecated. As of 1.4.0, replaced by read().jdbc(). Construct a DataFrame representing the database table accessible via JDBC URL url named table. Partitions of the table will be retrieved in parallel based on the parameters passed to this function.
Parameters:
url - (undocumented)
table - (undocumented)
columnName - the name of a column of integral type that will be used for partitioning
lowerBound - the minimum value of columnName used to decide partition stride
upperBound - the maximum value of columnName used to decide partition stride
numPartitions - the number of partitions; the range minValue to maxValue will be split evenly into this many partitions

public DataFrame jdbc(String url, String table, String[] theParts)
Deprecated. As of 1.4.0, replaced by read().jdbc(). Construct a DataFrame representing the database table accessible via JDBC URL url named table. The theParts parameter gives a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame.
Parameters: url - (undocumented), table - (undocumented), theParts - (undocumented)
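For the partitioned jdbc variant, lowerBound, upperBound, and numPartitions describe how the partitioning column's value range is split into per-partition WHERE clauses of roughly even stride. A plain-Java sketch of that clause generation (modeled on the idea, not Spark's exact JDBCRelation code; note the outer partitions are left unbounded so rows outside the given bounds are still fetched):

```java
import java.util.ArrayList;
import java.util.List;

public class JdbcPartitionSketch {
    // Split the range of `column` into numPartitions WHERE clauses of even stride.
    public static List<String> wherePartitions(String column, long lowerBound,
                                               long upperBound, int numPartitions) {
        long stride = (upperBound - lowerBound) / numPartitions;
        List<String> clauses = new ArrayList<>();
        long current = lowerBound;
        for (int i = 0; i < numPartitions; i++) {
            // First partition is unbounded below and last unbounded above,
            // so no rows are dropped when values fall outside the bounds.
            String lower = (i == 0) ? null : column + " >= " + current;
            current += stride;
            String upper = (i == numPartitions - 1) ? null : column + " < " + current;
            if (lower == null && upper == null) clauses.add("TRUE"); // single partition
            else if (lower == null) clauses.add(upper);
            else if (upper == null) clauses.add(lower);
            else clauses.add(lower + " AND " + upper);
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (String where : wherePartitions("id", 0, 100, 4)) {
            System.out.println(where);
        }
        // id < 25
        // id >= 25 AND id < 50
        // id >= 50 AND id < 75
        // id >= 75
    }
}
```

Each generated clause becomes one partition's query predicate, which is why the column must be of integral type: the stride arithmetic only makes sense over an ordered numeric range.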