Core Classes

SparkSession(sparkContext[, jsparkSession, …])

The entry point to programming Spark with the Dataset and DataFrame API.

Catalog(sparkSession)

User-facing catalog API, accessible through SparkSession.catalog.

DataFrame(jdf, sql_ctx)

A distributed collection of data grouped into named columns.

Column(jc)

A column in a DataFrame.

Observation([name])

Class to observe (named) metrics on a DataFrame.

Row

A row in a DataFrame.

GroupedData(jgd, df)

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().

PandasCogroupedOps(gd1, gd2)

A logical grouping of two GroupedData, created by GroupedData.cogroup().

DataFrameNaFunctions(df)

Functionality for working with missing data in a DataFrame.

DataFrameStatFunctions(df)

Statistical functions for use with a DataFrame.

Window

Utility functions for defining windows in DataFrames.

DataFrameReader(spark)

Interface used to load a DataFrame from external storage systems.

DataFrameWriter(df)

Interface used to write a DataFrame to external storage systems.

DataFrameWriterV2(df, table)

Interface used to write a DataFrame to external storage using the v2 API.

UDFRegistration(sparkSession)

Wrapper for user-defined function registration.

udf.UserDefinedFunction(func[, returnType, …])

User-defined function in Python.