Package org.apache.spark.streaming
Class StreamingContext
java.lang.Object
org.apache.spark.streaming.StreamingContext
- All Implemented Interfaces:
org.apache.spark.internal.Logging
Deprecated.
This is deprecated as of Spark 3.4.0. There are no longer updates to DStream and it is a legacy project. There is a newer and easier-to-use streaming engine in Spark called Structured Streaming. You should use Spark Structured Streaming for your streaming applications.
Main entry point for Spark Streaming functionality. It provides methods used to create DStreams from various input sources. It can be created either by providing a Spark master URL and an appName, from an org.apache.spark.SparkConf configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext. The associated SparkContext can be accessed using context.sparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTermination() allows the current thread to wait for the termination of the context by stop() or by an exception.
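A minimal sketch of this lifecycle in Scala (the local[2] master, the socket source on localhost:9999, and the 5-second batch interval are all illustrative assumptions):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Build a context that batches incoming data every 5 seconds.
  val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
  val ssc = new StreamingContext(conf, Seconds(5))

  // Define the DStream pipeline before starting the context.
  val lines = ssc.socketTextStream("localhost", 9999)
  lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()

  ssc.start()             // start the computation
  ssc.awaitTermination()  // block until stop() is called or an error occurs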
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructor / Description
- StreamingContext(String path)
  Deprecated. Recreate a StreamingContext from a checkpoint file.
- StreamingContext(String master, String appName, Duration batchDuration, String sparkHome, scala.collection.immutable.Seq<String> jars, scala.collection.Map<String,String> environment)
  Deprecated. Create a StreamingContext by providing the details necessary for creating a new SparkContext.
- StreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf)
  Deprecated. Recreate a StreamingContext from a checkpoint file.
- StreamingContext(String path, SparkContext sparkContext)
  Deprecated. Recreate a StreamingContext from a checkpoint file using an existing SparkContext.
- StreamingContext(SparkConf conf, Duration batchDuration)
  Deprecated. Create a StreamingContext by providing the configuration necessary for a new SparkContext.
- StreamingContext(SparkContext sparkContext, Duration batchDuration)
  Deprecated. Create a StreamingContext using an existing SparkContext.
-
Method Summary
Modifier and Type / Method / Description
- void addStreamingListener(StreamingListener streamingListener)
  Deprecated. Add a StreamingListener object for receiving system events related to streaming.
- void awaitTermination()
  Deprecated. Wait for the execution to stop.
- boolean awaitTerminationOrTimeout(long timeout)
  Deprecated. Wait for the execution to stop.
- DStream<byte[]> binaryRecordsStream(String directory, int recordLength)
  Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files, assuming a fixed length per record, generating one byte array per record.
- void checkpoint(String directory)
  Deprecated. Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
- <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path,Object> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf, scala.reflect.ClassTag<K> evidence$10, scala.reflect.ClassTag<V> evidence$11, scala.reflect.ClassTag<F> evidence$12)
  Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
- <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path,Object> filter, boolean newFilesOnly, scala.reflect.ClassTag<K> evidence$7, scala.reflect.ClassTag<V> evidence$8, scala.reflect.ClassTag<F> evidence$9)
  Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
- <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.reflect.ClassTag<K> evidence$4, scala.reflect.ClassTag<V> evidence$5, scala.reflect.ClassTag<F> evidence$6)
  Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
- static scala.Option<StreamingContext> getActive()
  Deprecated. Get the currently active context, if there is one.
- static StreamingContext getActiveOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)
  Deprecated. Either get the currently active StreamingContext (that is, started but not stopped), OR recreate a StreamingContext from checkpoint data in the given path.
- static StreamingContext getActiveOrCreate(scala.Function0<StreamingContext> creatingFunc)
  Deprecated. Either return the "active" StreamingContext (that is, started but not stopped), or create a new StreamingContext that is created by the supplied creatingFunc.
- static StreamingContext getOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)
  Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- StreamingContextState getState()
  Deprecated. :: DeveloperApi :: Return the current state of the context.
- static scala.Option<String> jarOfClass(Class<?> cls)
  Deprecated. Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.
- static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
  Deprecated.
- static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
  Deprecated.
- static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
  Deprecated.
- <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, RDD<T> defaultRDD, scala.reflect.ClassTag<T> evidence$14)
  Deprecated. Create an input stream from a queue of RDDs.
- <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, scala.reflect.ClassTag<T> evidence$13)
  Deprecated. Create an input stream from a queue of RDDs.
- <T> ReceiverInputDStream<T> rawSocketStream(String hostname, int port, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$3)
  Deprecated. Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- <T> ReceiverInputDStream<T> receiverStream(Receiver<T> receiver, scala.reflect.ClassTag<T> evidence$1)
  Deprecated. Create an input stream with any arbitrary user implemented receiver.
- void remember(Duration duration)
  Deprecated. Set each DStream in this context to remember RDDs it generated in the last given duration.
- void removeStreamingListener(StreamingListener streamingListener)
  Deprecated.
- <T> ReceiverInputDStream<T> socketStream(String hostname, int port, scala.Function1<InputStream,scala.collection.Iterator<T>> converter, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$2)
  Deprecated. Creates an input stream from TCP source hostname:port.
- ReceiverInputDStream<String> socketTextStream(String hostname, int port, StorageLevel storageLevel)
  Deprecated. Creates an input stream from TCP source hostname:port.
- SparkContext sparkContext()
  Deprecated. Return the associated Spark context.
- void start()
  Deprecated. Start the execution of the streams.
- void stop(boolean stopSparkContext)
  Deprecated. Stop the execution of the streams immediately (does not wait for all received data to be processed).
- void stop(boolean stopSparkContext, boolean stopGracefully)
  Deprecated. Stop the execution of the streams, with option of ensuring all received data has been processed.
- DStream<String> textFileStream(String directory)
  Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).
- <T> DStream<T> transform(scala.collection.immutable.Seq<DStream<?>> dstreams, scala.Function2<scala.collection.immutable.Seq<RDD<?>>,Time,RDD<T>> transformFunc, scala.reflect.ClassTag<T> evidence$16)
  Deprecated. Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
- <T> DStream<T> union(scala.collection.immutable.Seq<DStream<T>> streams, scala.reflect.ClassTag<T> evidence$15)
  Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
StreamingContext
public StreamingContext(SparkContext sparkContext, Duration batchDuration)
Deprecated. Create a StreamingContext using an existing SparkContext.
- Parameters:
  sparkContext - existing SparkContext
  batchDuration - the time interval at which streaming data will be divided into batches
-
StreamingContext
public StreamingContext(SparkConf conf, Duration batchDuration)
Deprecated. Create a StreamingContext by providing the configuration necessary for a new SparkContext.
- Parameters:
  conf - a org.apache.spark.SparkConf object specifying Spark parameters
  batchDuration - the time interval at which streaming data will be divided into batches
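A short sketch of the two most common constructors (the master, app name, and batch interval are illustrative):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // From a SparkConf: the streaming context creates (and owns) a new SparkContext.
  val conf = new SparkConf().setMaster("local[2]").setAppName("FromConf")
  val ssc = new StreamingContext(conf, Seconds(1))

  // From an existing SparkContext: useful when batch and streaming code share one context.
  // val sc: SparkContext = ...  (an already-running context)
  // val ssc2 = new StreamingContext(sc, Seconds(1))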
-
StreamingContext
public StreamingContext(String master, String appName, Duration batchDuration, String sparkHome, scala.collection.immutable.Seq<String> jars, scala.collection.Map<String,String> environment)
Deprecated. Create a StreamingContext by providing the details necessary for creating a new SparkContext.
- Parameters:
  master - cluster URL to connect to (e.g. spark://host:port, local[4])
  appName - a name for your job, to display on the cluster web UI
  batchDuration - the time interval at which streaming data will be divided into batches
  sparkHome - (undocumented)
  jars - (undocumented)
  environment - (undocumented)
-
StreamingContext
public StreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf)
Deprecated. Recreate a StreamingContext from a checkpoint file.
- Parameters:
  path - Path to the directory that was specified as the checkpoint directory
  hadoopConf - Optional, configuration object if necessary for reading from HDFS compatible filesystems
-
StreamingContext
public StreamingContext(String path)
Deprecated. Recreate a StreamingContext from a checkpoint file.
- Parameters:
  path - Path to the directory that was specified as the checkpoint directory
-
StreamingContext
public StreamingContext(String path, SparkContext sparkContext)
Deprecated. Recreate a StreamingContext from a checkpoint file using an existing SparkContext.
- Parameters:
  path - Path to the directory that was specified as the checkpoint directory
  sparkContext - Existing SparkContext
-
-
Method Details
-
getActive
public static scala.Option<StreamingContext> getActive()
Deprecated. Get the currently active context, if there is one. Active means started but not stopped.
- Returns:
  - (undocumented)
-
getActiveOrCreate
public static StreamingContext getActiveOrCreate(scala.Function0<StreamingContext> creatingFunc)
Deprecated. Either return the "active" StreamingContext (that is, started but not stopped), or create a new StreamingContext that is created by the supplied creatingFunc.
- Parameters:
  creatingFunc - Function to create a new StreamingContext
- Returns:
  - (undocumented)
-
getActiveOrCreate
public static StreamingContext getActiveOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)
Deprecated. Either get the currently active StreamingContext (that is, started but not stopped), OR recreate a StreamingContext from checkpoint data in the given path. If checkpoint data does not exist in the provided path, then a new StreamingContext is created by calling the provided creatingFunc.
- Parameters:
  checkpointPath - Checkpoint directory used in an earlier StreamingContext program
  creatingFunc - Function to create a new StreamingContext
  hadoopConf - Optional Hadoop configuration if necessary for reading from the file system
  createOnError - Optional, whether to create a new StreamingContext if there is an error in reading checkpoint data. By default, an exception will be thrown on error.
- Returns:
  - (undocumented)
-
getOrCreate
public static StreamingContext getOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)
Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then the StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the StreamingContext will be created by calling the provided creatingFunc.
- Parameters:
  checkpointPath - Checkpoint directory used in an earlier StreamingContext program
  creatingFunc - Function to create a new StreamingContext
  hadoopConf - Optional Hadoop configuration if necessary for reading from the file system
  createOnError - Optional, whether to create a new StreamingContext if there is an error in reading checkpoint data. By default, an exception will be thrown on error.
- Returns:
  - (undocumented)
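A sketch of the usual driver-recovery pattern built on getOrCreate (the checkpoint directory and app name are illustrative):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val checkpointDir = "hdfs:///checkpoints/app"  // illustrative path

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("RecoverableApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)  // enable periodic checkpointing
    // ... define the DStream graph here, before the context is started ...
    ssc
  }

  // First run: createContext() builds a fresh context.
  // After a driver restart: the context is rebuilt from the checkpoint data.
  val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
  ssc.start()
  ssc.awaitTermination()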
-
jarOfClass
public static scala.Option<String> jarOfClass(Class<?> cls)
Deprecated. Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.
- Parameters:
  cls - (undocumented)
- Returns:
  - (undocumented)
-
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_() Deprecated. -
org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) Deprecated. -
LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc) Deprecated. -
sparkContext
public SparkContext sparkContext()
Deprecated. Return the associated Spark context.
- Returns:
  - (undocumented)
-
remember
public void remember(Duration duration)
Deprecated. Set each DStream in this context to remember RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration of time and release them for garbage collection. This method allows the developer to specify how long to remember the RDDs (if the developer wishes to query old data outside the DStream computation).
- Parameters:
  duration - Minimum duration that each DStream should remember its RDDs
-
checkpoint
public void checkpoint(String directory)
Deprecated. Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
- Parameters:
  directory - HDFS-compatible directory where the checkpoint data will be reliably stored. Note that this must be a fault-tolerant file system like HDFS.
-
receiverStream
public <T> ReceiverInputDStream<T> receiverStream(Receiver<T> receiver, scala.reflect.ClassTag<T> evidence$1)
Deprecated. Create an input stream with any arbitrary user implemented receiver. Find more details at https://spark.apache.org/docs/latest/streaming-custom-receivers.html
- Parameters:
  receiver - Custom implementation of Receiver
  evidence$1 - (undocumented)
- Returns:
  - (undocumented)
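A minimal sketch of a custom receiver (ConstantWordReceiver is hypothetical, and ssc is an existing StreamingContext):

  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.receiver.Receiver

  // Hypothetical receiver that emits a fixed word from a background thread.
  class ConstantWordReceiver
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {

    def onStart(): Unit = {
      new Thread("constant-word-source") {
        override def run(): Unit = {
          while (!isStopped()) {
            store("spark")    // hand one record to Spark
            Thread.sleep(100)
          }
        }
      }.start()
    }

    // Nothing to clean up: the thread above exits once isStopped() is true.
    def onStop(): Unit = {}
  }

  val words = ssc.receiverStream(new ConstantWordReceiver)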
-
socketTextStream
public ReceiverInputDStream<String> socketTextStream(String hostname, int port, StorageLevel storageLevel)
Deprecated. Creates an input stream from TCP source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded, \n-delimited lines.
- Parameters:
  hostname - Hostname to connect to for receiving data
  port - Port to connect to for receiving data
  storageLevel - Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
- Returns:
  - (undocumented)
- See Also:
  - socketStream
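For example, assuming an existing context ssc (a quick test source can be started with nc -lk 9999):

  import org.apache.spark.storage.StorageLevel

  // Lines arrive over TCP from localhost:9999; host and port are illustrative.
  val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)
  lines.print()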
-
socketStream
public <T> ReceiverInputDStream<T> socketStream(String hostname, int port, scala.Function1<InputStream,scala.collection.Iterator<T>> converter, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$2)
Deprecated. Creates an input stream from TCP source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as objects using the given converter.
- Parameters:
  hostname - Hostname to connect to for receiving data
  port - Port to connect to for receiving data
  converter - Function to convert the byte stream to objects
  storageLevel - Storage level to use for storing the received objects
  evidence$2 - (undocumented)
- Returns:
  - (undocumented)
-
rawSocketStream
public <T> ReceiverInputDStream<T> rawSocketStream(String hostname, int port, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$3)
Deprecated. Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.
- Parameters:
  hostname - Hostname to connect to for receiving data
  port - Port to connect to for receiving data
  storageLevel - Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
  evidence$3 - (undocumented)
- Returns:
  - (undocumented)
-
fileStream
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.reflect.ClassTag<K> evidence$4, scala.reflect.ClassTag<V> evidence$5, scala.reflect.ClassTag<F> evidence$6)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
- Parameters:
  directory - HDFS directory to monitor for new files
  evidence$4 - (undocumented)
  evidence$5 - (undocumented)
  evidence$6 - (undocumented)
- Returns:
  - (undocumented)
-
fileStream
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path,Object> filter, boolean newFilesOnly, scala.reflect.ClassTag<K> evidence$7, scala.reflect.ClassTag<V> evidence$8, scala.reflect.ClassTag<F> evidence$9)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system.
- Parameters:
  directory - HDFS directory to monitor for new files
  filter - Function to filter paths to process
  newFilesOnly - Should process only new files and ignore existing files in the directory
  evidence$7 - (undocumented)
  evidence$8 - (undocumented)
  evidence$9 - (undocumented)
- Returns:
  - (undocumented)
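A sketch of this filtered variant, assuming an existing context ssc and an illustrative directory:

  import org.apache.hadoop.fs.Path
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  // Only pick up newly arrived *.log files from the monitored directory.
  val logLines = ssc.fileStream[LongWritable, Text, TextInputFormat](
    "hdfs:///incoming/logs",
    (path: Path) => path.getName.endsWith(".log"),
    newFilesOnly = true
  ).map { case (_, text) => text.toString }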
-
fileStream
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path,Object> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf, scala.reflect.ClassTag<K> evidence$10, scala.reflect.ClassTag<V> evidence$11, scala.reflect.ClassTag<F> evidence$12)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
- Parameters:
  directory - HDFS directory to monitor for new files
  filter - Function to filter paths to process
  newFilesOnly - Should process only new files and ignore existing files in the directory
  conf - Hadoop configuration
  evidence$10 - (undocumented)
  evidence$11 - (undocumented)
  evidence$12 - (undocumented)
- Returns:
  - (undocumented)
-
textFileStream
public DStream<String> textFileStream(String directory)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored. The text files must be encoded as UTF-8.
- Parameters:
  directory - HDFS directory to monitor for new files
- Returns:
  - (undocumented)
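For example, assuming an existing context ssc (the directory is illustrative):

  // Each batch contains the lines of any text files that appeared in the directory.
  val lines = ssc.textFileStream("hdfs:///incoming/text")
  lines.count().print()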
-
binaryRecordsStream
public DStream<byte[]> binaryRecordsStream(String directory, int recordLength)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files, assuming a fixed length per record, generating one byte array per record. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
- Parameters:
  directory - HDFS directory to monitor for new files
  recordLength - length of each record in bytes
- Returns:
  - (undocumented)
- Note:
  - We ensure that the byte array for each record in the resulting RDDs of the DStream has the provided record length.
-
queueStream
public <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, scala.reflect.ClassTag<T> evidence$13)
Deprecated. Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
- Parameters:
  queue - Queue of RDDs. Modifications to this data structure must be synchronized.
  oneAtATime - Whether only one RDD should be consumed from the queue in every interval
  evidence$13 - (undocumented)
- Returns:
  - (undocumented)
- Note:
  - Arbitrary RDDs can be added to queueStream and there is no way to recover the data of those RDDs, so queueStream does not support checkpointing.
-
queueStream
public <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, RDD<T> defaultRDD, scala.reflect.ClassTag<T> evidence$14)
Deprecated. Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
- Parameters:
  queue - Queue of RDDs. Modifications to this data structure must be synchronized.
  oneAtATime - Whether only one RDD should be consumed from the queue in every interval
  defaultRDD - Default RDD is returned by the DStream when the queue is empty. Set as null if no RDD should be returned when empty.
  evidence$14 - (undocumented)
- Returns:
  - (undocumented)
- Note:
  - Arbitrary RDDs can be added to queueStream and there is no way to recover the data of those RDDs, so queueStream does not support checkpointing.
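A sketch of the common unit-testing use of queueStream, assuming an existing context ssc:

  import scala.collection.mutable.Queue
  import org.apache.spark.rdd.RDD

  // Handy for tests: feed pre-built RDDs through the streaming machinery.
  val rddQueue = new Queue[RDD[Int]]()
  val stream = ssc.queueStream(rddQueue, oneAtATime = true)
  stream.map(_ * 2).print()

  ssc.start()
  // Per the note above, modifications to the queue must be synchronized.
  rddQueue.synchronized {
    rddQueue += ssc.sparkContext.makeRDD(1 to 100, numSlices = 4)
  }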
-
union
public <T> DStream<T> union(scala.collection.immutable.Seq<DStream<T>> streams, scala.reflect.ClassTag<T> evidence$15)
Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration.
- Parameters:
  streams - (undocumented)
  evidence$15 - (undocumented)
- Returns:
  - (undocumented)
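For example, merging several streams of the same type into one (the ports are illustrative, and ssc is an existing context):

  val parts = Seq(9999, 9998).map(port => ssc.socketTextStream("localhost", port))
  val merged = ssc.union(parts)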
-
transform
public <T> DStream<T> transform(scala.collection.immutable.Seq<DStream<?>> dstreams, scala.Function2<scala.collection.immutable.Seq<RDD<?>>,Time,RDD<T>> transformFunc, scala.reflect.ClassTag<T> evidence$16)
Deprecated. Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
- Parameters:
  dstreams - (undocumented)
  transformFunc - (undocumented)
  evidence$16 - (undocumented)
- Returns:
  - (undocumented)
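A sketch that joins the per-batch RDDs of two keyed streams (events and lookups are assumed to be DStream[(String, String)] values defined elsewhere):

  import org.apache.spark.rdd.RDD
  import org.apache.spark.streaming.Time

  val joined = ssc.transform(
    Seq(events, lookups),
    (rdds: Seq[RDD[_]], batchTime: Time) => {
      // The function receives the batch's RDDs in the same order as the DStreams.
      val left  = rdds(0).asInstanceOf[RDD[(String, String)]]
      val right = rdds(1).asInstanceOf[RDD[(String, String)]]
      left.join(right)  // RDD[(String, (String, String))]
    }
  )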
-
addStreamingListener
public void addStreamingListener(StreamingListener streamingListener)
Deprecated. Add a StreamingListener object for receiving system events related to streaming.
- Parameters:
  streamingListener - (undocumented)
-
removeStreamingListener
public void removeStreamingListener(StreamingListener streamingListener)
Deprecated. -
getState
public StreamingContextState getState()
Deprecated. :: DeveloperApi :: Return the current state of the context. The context can be in three possible states:
- StreamingContextState.INITIALIZED - The context has been created, but not started yet. Input DStreams, transformations and output operations can be created on the context.
- StreamingContextState.ACTIVE - The context has been started, and not stopped. Input DStreams, transformations and output operations cannot be created on the context.
- StreamingContextState.STOPPED - The context has been stopped and cannot be used any more.
- Returns:
  - (undocumented)
-
start
public void start()
Deprecated. Start the execution of the streams.
- Throws:
  IllegalStateException - if the StreamingContext is already stopped.
-
awaitTermination
public void awaitTermination()
Deprecated. Wait for the execution to stop. Any exception that occurs during the execution will be thrown in this thread. -
awaitTerminationOrTimeout
public boolean awaitTerminationOrTimeout(long timeout)
Deprecated. Wait for the execution to stop. Any exception that occurs during the execution will be thrown in this thread.
- Parameters:
  timeout - time to wait in milliseconds
- Returns:
  true if it's stopped; or throw the reported error during the execution; or false if the waiting time elapsed before returning from the method.
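A sketch of the usual polling loop, where shutdownRequested() stands in for a hypothetical application-level stop flag:

  ssc.start()
  // Wake up every 10 seconds so the driver can react to an external stop signal.
  var terminated = false
  while (!terminated) {
    terminated = ssc.awaitTerminationOrTimeout(10000L)
    if (!terminated && shutdownRequested()) {  // shutdownRequested() is hypothetical
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
  }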
-
stop
public void stop(boolean stopSparkContext)
Deprecated. Stop the execution of the streams immediately (does not wait for all received data to be processed). By default, if stopSparkContext is not specified, the underlying SparkContext will also be stopped. This implicit behavior can be configured using the SparkConf configuration spark.streaming.stopSparkContextByDefault.
- Parameters:
  stopSparkContext - If true, stops the associated SparkContext. The underlying SparkContext will be stopped regardless of whether this StreamingContext has been started.
-
stop
public void stop(boolean stopSparkContext, boolean stopGracefully)
Deprecated. Stop the execution of the streams, with option of ensuring all received data has been processed.
- Parameters:
  stopSparkContext - if true, stops the associated SparkContext. The underlying SparkContext will be stopped regardless of whether this StreamingContext has been started.
  stopGracefully - if true, stops gracefully by waiting for the processing of all received data to be completed
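For example, to stop stream processing while letting in-flight batches finish and keeping the SparkContext alive for other work:

  ssc.stop(stopSparkContext = false, stopGracefully = true)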
-