pyspark.streaming.StreamingContext
Main entry point for Spark Streaming functionality. A StreamingContext represents the connection to a Spark cluster and can be used to create DStreams from various input sources. It can be created from an existing SparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTermination() allows the current thread to wait for the termination of the context by stop() or by an exception.
Deprecated since version 3.4.0: DStream is deprecated as of Spark 3.4.0 and no longer receives updates; it is a legacy project. Spark provides a newer and easier-to-use streaming engine called Structured Streaming, which you should use for your streaming applications.
Parameters
sparkContext : SparkContext
    SparkContext object.
batchDuration : int, optional
    the time interval (in seconds) at which streaming data will be divided into batches
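As a brief illustration, here is a minimal sketch of building a context from an existing SparkContext with a one-second batch interval; the master URL and application name are placeholders.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Use at least two local threads: one for a receiver, one for processing.
    sc = SparkContext("local[2]", "StreamingExample")
    ssc = StreamingContext(sc, batchDuration=1)  # 1-second batches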
Methods
addStreamingListener(streamingListener)
Add a StreamingListener object for receiving system events related to streaming.
awaitTermination([timeout])
Wait for the execution to stop.
awaitTerminationOrTimeout(timeout)
Wait for the execution to stop. Return True if it stopped, throw any error reported during the execution, or return False if the waiting time elapsed before the method returned.
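For example, a bounded wait on a started context (continuing from the constructor sketch above) might look like this:

    # Block for at most 60 seconds; True means the context stopped in time,
    # False means the timeout elapsed first.
    stopped = ssc.awaitTerminationOrTimeout(60)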
binaryRecordsStream(directory, recordLength)
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length.
checkpoint(directory)
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.
getActive()
Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None.
getActiveOrCreate(checkpointPath, setupFunc)
Either return the active StreamingContext (i.e., currently started but not stopped), recreate a StreamingContext from checkpoint data, or create a new StreamingContext using the provided setupFunc function.
getOrCreate(checkpointPath, setupFunc)
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
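A common recovery pattern is sketched below, with a hypothetical checkpoint directory; the setup function runs only when no valid checkpoint data exists at the path.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT_DIR = "/tmp/streaming-checkpoint"  # hypothetical path

    def create_context():
        # Runs only when no checkpoint exists at CHECKPOINT_DIR.
        sc = SparkContext("local[2]", "CheckpointedApp")
        ssc = StreamingContext(sc, 5)
        ssc.checkpoint(CHECKPOINT_DIR)
        ssc.socketTextStream("localhost", 9999).count().pprint()
        return ssc

    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)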
queueStream(rdds[, oneAtATime, default])
Create an input stream from a queue of RDDs or lists.
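queueStream is handy for testing without an external source. A minimal sketch, with placeholder app name and data:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "QueueStreamExample")
    ssc = StreamingContext(sc, 1)

    # Each queued RDD becomes one batch (oneAtATime=True by default).
    rdd_queue = [sc.parallelize(range(i * 10, (i + 1) * 10)) for i in range(3)]
    ssc.queueStream(rdd_queue).map(lambda x: x * 2).pprint()

    ssc.start()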
remember(duration)
Set each DStream in this context to remember the RDDs it generated in the last given duration.
socketTextStream(hostname, port[, storageLevel])
Create an input stream from a TCP source hostname:port.
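The canonical network word count is sketched below; the host and port are placeholders (the socket could be fed by, e.g., nc -lk 9999).

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, 1)

    # Receives UTF-8, newline-delimited text from the socket.
    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()             # start receiving and processing
    ssc.awaitTermination()  # block until stop() or an error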
start()
Start the execution of the streams.
stop([stopSparkContext, stopGraceFully])
Stop the execution of the streams, with option of ensuring all received data has been processed.
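For instance, to shut down the streams gracefully while keeping the underlying SparkContext alive for further use:

    # Process all received data before stopping; leave the SparkContext running.
    ssc.stop(stopSparkContext=False, stopGraceFully=True)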
textFileStream(directory)
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files.
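A short sketch, assuming a hypothetical directory into which new files are moved after the context starts:

    # Only files added to the directory after start() are picked up.
    lines = ssc.textFileStream("/tmp/streaming-input")  # hypothetical directory
    lines.pprint()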
transform(dstreams, transformFunc)
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
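In each batch, the function receives the list of RDDs, one per input DStream and in the same order. A minimal sketch, assuming two existing DStreams stream1 and stream2:

    # Apply arbitrary RDD logic across both streams, batch by batch.
    merged = ssc.transform([stream1, stream2],
                           lambda rdds: rdds[0].union(rdds[1]))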
union(*dstreams)
Create a unified DStream from multiple DStreams of the same type and same slide duration.
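For example, assuming three DStreams of the same type and slide duration:

    unified = ssc.union(stream1, stream2, stream3)
    unified.pprint()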
Attributes
sparkContext
Return the SparkContext associated with this StreamingContext.
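For example, the underlying context can be used to create RDDs or broadcast variables alongside the streams:

    sc = ssc.sparkContext
    stopwords = sc.broadcast({"a", "the", "and"})  # hypothetical broadcast variable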