Welcome to Spark Python API Docs!

Core classes:

pyspark.SparkContext

Main entry point for Spark functionality.

pyspark.RDD

A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.

pyspark.streaming.StreamingContext

Main entry point for Spark Streaming functionality.

pyspark.streaming.DStream

A Discretized Stream (DStream), the basic abstraction in Spark Streaming.

pyspark.sql.SparkSession

Main entry point for DataFrame and SQL functionality.

pyspark.sql.DataFrame

A distributed collection of data grouped into named columns.
