pyspark.streaming.StreamingContext#

class pyspark.streaming.StreamingContext(sparkContext, batchDuration=None, jssc=None)[source]#

Main entry point for Spark Streaming functionality. A StreamingContext represents the connection to a Spark cluster, and can be used to create DStream various input sources. It can be from an existing SparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTermination() allows the current thread to wait for the termination of the context by stop() or by an exception.

Deprecated since version Spark: 3.4.0 This is deprecated as of Spark 3.4.0. There are no longer updates to DStream and it’s a legacy project. There is a newer and easier to use streaming engine in Spark called Structured Streaming. You should use Spark Structured Streaming for your streaming applications.

Parameters
sparkContextSparkContext

SparkContext object.

batchDurationint, optional

the time interval (in seconds) at which streaming data will be divided into batches

Methods

addStreamingListener(streamingListener)

Add a [[org.apache.spark.streaming.scheduler.StreamingListener]] object for receiving system events related to streaming.

awaitTermination([timeout])

Wait for the execution to stop.

awaitTerminationOrTimeout(timeout)

Wait for the execution to stop.

binaryRecordsStream(directory, recordLength)

Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length.

checkpoint(directory)

Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.

getActive()

Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None.

getActiveOrCreate(checkpointPath, setupFunc)

Either return the active StreamingContext (i.e.

getOrCreate(checkpointPath, setupFunc)

Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.

queueStream(rdds[, oneAtATime, default])

Create an input stream from a queue of RDDs or list.

remember(duration)

Set each DStreams in this context to remember RDDs it generated in the last given duration.

socketTextStream(hostname, port[, storageLevel])

Create an input from TCP source hostname:port.

start()

Start the execution of the streams.

stop([stopSparkContext, stopGraceFully])

Stop the execution of the streams, with option of ensuring all received data has been processed.

textFileStream(directory)

Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files.

transform(dstreams, transformFunc)

Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.

union(*dstreams)

Create a unified DStream from multiple DStreams of the same type and same slide duration.

Attributes

sparkContext

Return SparkContext which is associated with this StreamingContext.