pyspark.sql.SparkSession

class pyspark.sql.SparkSession(sparkContext: pyspark.context.SparkContext, jsparkSession: Optional[py4j.java_gateway.JavaObject] = None, options: Dict[str, Any] = {})

The entry point to programming Spark with the Dataset and DataFrame API.

A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files. To create a SparkSession, use the following builder pattern:

Changed in version 3.4.0: Supports Spark Connect.

builder

Examples

Create a Spark session.

>>> spark = (
...     SparkSession.builder
...         .master("local")
...         .appName("Word Count")
...         .config("spark.some.config.option", "some-value")
...         .getOrCreate()
... )

Create a Spark session with Spark Connect.

>>> spark = (
...     SparkSession.builder
...         .remote("sc://localhost")
...         .appName("Word Count")
...         .config("spark.some.config.option", "some-value")
...         .getOrCreate()
... )  

Methods

createDataFrame(data[, schema, …])

Creates a DataFrame from an RDD, a list, a pandas.DataFrame or a numpy.ndarray.
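
A minimal sketch, assuming the spark session from the builder example above:

>>> df = spark.createDataFrame([("Alice", 1)], ["name", "age"])
>>> df.collect()
[Row(name='Alice', age=1)]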

getActiveSession()

Returns the active SparkSession for the current thread, as returned by the builder.
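
For example, immediately after getOrCreate() the session just returned is the active one on the current thread:

>>> SparkSession.getActiveSession() is spark
True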

newSession()

Returns a new SparkSession as a new session, with separate SQLConf, registered temporary views, and UDFs, but a shared SparkContext and table cache.
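
A short sketch showing that the sessions are distinct while the SparkContext is shared:

>>> other = spark.newSession()
>>> other is spark
False
>>> other.sparkContext is spark.sparkContext
True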

range(start[, end, step, numPartitions])

Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
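
For example:

>>> spark.range(1, 7, 2).collect()
[Row(id=1), Row(id=3), Row(id=5)]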

sql(sqlQuery[, args])

Returns a DataFrame representing the result of the given query.
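
For example (the optional args parameter for parameterized queries is omitted here):

>>> spark.sql("SELECT 1 + 1 AS two").collect()
[Row(two=2)]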

stop()

Stop the underlying SparkContext.
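
For example (a new session can be obtained again through the builder):

>>> spark.stop()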

table(tableName)

Returns the specified table as a DataFrame.
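
A short sketch using a temporary view:

>>> spark.range(3).createOrReplaceTempView("nums")
>>> spark.table("nums").count()
3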

Attributes

builder

catalog

Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.
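
For example, listing registered tables and views (the output assumes only the view created here exists):

>>> spark.range(1).createOrReplaceTempView("t")
>>> [t.name for t in spark.catalog.listTables()]
['t']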

conf

Runtime configuration interface for Spark.
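
For example:

>>> spark.conf.set("spark.sql.shuffle.partitions", "8")
>>> spark.conf.get("spark.sql.shuffle.partitions")
'8'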

read

Returns a DataFrameReader that can be used to read data in as a DataFrame.
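
A sketch with hypothetical input paths:

>>> df = spark.read.csv("/tmp/people.csv", header=True)  # hypothetical path
>>> df = spark.read.format("json").load("/tmp/people.json")  # hypothetical path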

readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.
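
A minimal sketch using the built-in rate source, which needs no external input:

>>> stream_df = spark.readStream.format("rate").load()
>>> stream_df.isStreaming
True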

sparkContext

Returns the underlying SparkContext.
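
For example (assuming the appName set in the builder example above; not available over Spark Connect):

>>> spark.sparkContext.appName
'Word Count'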

streams

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.
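
A brief sketch: start a query against the built-in rate source, check the manager, then stop it:

>>> query = spark.readStream.format("rate").load().writeStream.format("console").start()
>>> len(spark.streams.active) >= 1
True
>>> query.stop()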

udf

Returns a UDFRegistration for UDF registration.
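
For example, registering a hypothetical plus_one function for use in SQL:

>>> from pyspark.sql.types import IntegerType
>>> _ = spark.udf.register("plus_one", lambda x: x + 1, IntegerType())
>>> spark.sql("SELECT plus_one(41) AS answer").collect()
[Row(answer=42)]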

version

The version of Spark on which this application is running.
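
For example (the exact string depends on the installed release):

>>> spark.version
'3.4.0'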