public class JavaStreamingContext
extends Object
implements java.io.Closeable

A Java-friendly version of StreamingContext, which is the main entry point for Spark Streaming functionality. It provides methods to create JavaDStream and JavaPairDStream from input sources. The internal org.apache.spark.api.java.JavaSparkContext (see core Spark documentation) can be accessed using context.sparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTermination() allows the current thread to wait for the termination of a context by stop() or by an exception.
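A minimal lifecycle sketch of the sequence described above, assuming a local master and an illustrative socket source on localhost:9999 (the host, port, batch interval, and class name are not part of this API reference):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingLifecycle {
  public static void main(String[] args) throws InterruptedException {
    // Build the context from a SparkConf with a 2-second batch interval.
    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("lifecycle-demo");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

    // Create a DStream from a TCP text source and apply a transformation.
    JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
    JavaDStream<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
    words.print();

    jssc.start();              // start the streaming computation
    jssc.awaitTermination();   // block until stop() is called or an error occurs
  }
}
```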
| Constructor and Description |
|---|
| JavaStreamingContext(JavaSparkContext sparkContext, Duration batchDuration) Deprecated. Create a JavaStreamingContext using an existing JavaSparkContext. |
| JavaStreamingContext(SparkConf conf, Duration batchDuration) Deprecated. Create a JavaStreamingContext using a SparkConf configuration. |
| JavaStreamingContext(StreamingContext ssc) Deprecated. |
| JavaStreamingContext(String path) Deprecated. Recreate a JavaStreamingContext from a checkpoint file. |
| JavaStreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf) Deprecated. Recreate a JavaStreamingContext from a checkpoint file. |
| JavaStreamingContext(String master, String appName, Duration batchDuration) Deprecated. Create a StreamingContext. |
| JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String jarFile) Deprecated. Create a StreamingContext. |
| JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String[] jars) Deprecated. Create a StreamingContext. |
| JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String[] jars, java.util.Map<String,String> environment) Deprecated. Create a StreamingContext. |
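Where an application already holds a JavaSparkContext, the first constructor above can wrap it rather than creating a second context; a short sketch (the master URL, app name, and batch interval are illustrative):

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Reuse an existing JavaSparkContext instead of building a new one from SparkConf.
JavaSparkContext sc = new JavaSparkContext("local[2]", "reuse-demo");
JavaStreamingContext jssc = new JavaStreamingContext(sc, Durations.seconds(1));
```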
| Modifier and Type | Method and Description |
|---|---|
| void | addStreamingListener(StreamingListener streamingListener) Deprecated. Add a StreamingListener object for receiving system events related to streaming. |
| void | awaitTermination() Deprecated. Wait for the execution to stop. |
| boolean | awaitTerminationOrTimeout(long timeout) Deprecated. Wait for the execution to stop. |
| JavaDStream<byte[]> | binaryRecordsStream(String directory, int recordLength) Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files with fixed record lengths, yielding byte arrays. |
| void | checkpoint(String directory) Deprecated. Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. |
| void | close() Deprecated. |
| <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> JavaPairInputDStream<K,V> | fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass) Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. |
| <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> JavaPairInputDStream<K,V> | fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly) Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. |
| <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> JavaPairInputDStream<K,V> | fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf) Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. |
| static JavaStreamingContext | getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc) Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. |
| static JavaStreamingContext | getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf) Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. |
| static JavaStreamingContext | getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError) Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. |
| StreamingContextState | getState() Deprecated. :: DeveloperApi :: |
| static String[] | jarOfClass(Class<?> cls) Deprecated. Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext. |
| <T> JavaDStream<T> | queueStream(java.util.Queue<JavaRDD<T>> queue) Deprecated. Create an input stream from a queue of RDDs. |
| <T> JavaInputDStream<T> | queueStream(java.util.Queue<JavaRDD<T>> queue, boolean oneAtATime) Deprecated. Create an input stream from a queue of RDDs. |
| <T> JavaInputDStream<T> | queueStream(java.util.Queue<JavaRDD<T>> queue, boolean oneAtATime, JavaRDD<T> defaultRDD) Deprecated. Create an input stream from a queue of RDDs. |
| <T> JavaReceiverInputDStream<T> | rawSocketStream(String hostname, int port) Deprecated. Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. |
| <T> JavaReceiverInputDStream<T> | rawSocketStream(String hostname, int port, StorageLevel storageLevel) Deprecated. Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. |
| <T> JavaReceiverInputDStream<T> | receiverStream(Receiver<T> receiver) Deprecated. Create an input stream with any arbitrary user-implemented receiver. |
| void | remember(Duration duration) Deprecated. Sets each DStream in this context to remember the RDDs it generated in the last given duration. |
| <T> JavaReceiverInputDStream<T> | socketStream(String hostname, int port, Function<java.io.InputStream,Iterable<T>> converter, StorageLevel storageLevel) Deprecated. Create an input stream from network source hostname:port. |
| JavaReceiverInputDStream<String> | socketTextStream(String hostname, int port) Deprecated. Create an input stream from network source hostname:port. |
| JavaReceiverInputDStream<String> | socketTextStream(String hostname, int port, StorageLevel storageLevel) Deprecated. Create an input stream from network source hostname:port. |
| JavaSparkContext | sparkContext() Deprecated. The underlying SparkContext. |
| StreamingContext | ssc() Deprecated. |
| void | start() Deprecated. Start the execution of the streams. |
| void | stop() Deprecated. Stop the execution of the streams. |
| void | stop(boolean stopSparkContext) Deprecated. Stop the execution of the streams. |
| void | stop(boolean stopSparkContext, boolean stopGracefully) Deprecated. Stop the execution of the streams. |
| JavaDStream<String> | textFileStream(String directory) Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). |
| <T> JavaDStream<T> | transform(java.util.List<JavaDStream<?>> dstreams, Function2<java.util.List<JavaRDD<?>>,Time,JavaRDD<T>> transformFunc) Deprecated. Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. |
| <K,V> JavaPairDStream<K,V> | transformToPair(java.util.List<JavaDStream<?>> dstreams, Function2<java.util.List<JavaRDD<?>>,Time,JavaPairRDD<K,V>> transformFunc) Deprecated. Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. |
| <T> JavaDStream<T> | union(JavaDStream<T>... jdstreams) Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration. |
| <K,V> JavaPairDStream<K,V> | union(JavaPairDStream<K,V>... jdstreams) Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration. |
| <T> JavaDStream<T> | union(scala.collection.Seq<JavaDStream<T>> jdstreams) Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration. |
| <K,V> JavaPairDStream<K,V> | union(scala.collection.Seq<JavaPairDStream<K,V>> jdstreams) Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration. |
Constructor Detail

public JavaStreamingContext(StreamingContext ssc)
Deprecated.

public JavaStreamingContext(String master, String appName, Duration batchDuration)
Deprecated. Create a StreamingContext.
Parameters:
master - Name of the Spark Master
appName - Name to be used when registering with the scheduler
batchDuration - The time interval at which streaming data will be divided into batches
public JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String jarFile)
Deprecated. Create a StreamingContext.
Parameters:
master - Name of the Spark Master
appName - Name to be used when registering with the scheduler
batchDuration - The time interval at which streaming data will be divided into batches
sparkHome - The SPARK_HOME directory on the worker nodes
jarFile - JAR file containing job code, to ship to the cluster. This can be a path on the local file system or an HDFS, HTTP, HTTPS, or FTP URL.
public JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String[] jars)
Deprecated. Create a StreamingContext.
Parameters:
master - Name of the Spark Master
appName - Name to be used when registering with the scheduler
batchDuration - The time interval at which streaming data will be divided into batches
sparkHome - The SPARK_HOME directory on the worker nodes
jars - Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.
public JavaStreamingContext(String master, String appName, Duration batchDuration, String sparkHome, String[] jars, java.util.Map<String,String> environment)
Deprecated. Create a StreamingContext.
Parameters:
master - Name of the Spark Master
appName - Name to be used when registering with the scheduler
batchDuration - The time interval at which streaming data will be divided into batches
sparkHome - The SPARK_HOME directory on the worker nodes
jars - Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.
environment - Environment variables to set on worker nodes
public JavaStreamingContext(JavaSparkContext sparkContext, Duration batchDuration)
Deprecated. Create a JavaStreamingContext using an existing JavaSparkContext.
Parameters:
sparkContext - The underlying JavaSparkContext to use
batchDuration - The time interval at which streaming data will be divided into batches
public JavaStreamingContext(SparkConf conf, Duration batchDuration)
Deprecated. Create a JavaStreamingContext using a SparkConf configuration.
Parameters:
conf - A Spark application configuration
batchDuration - The time interval at which streaming data will be divided into batches
public JavaStreamingContext(String path)
Deprecated. Recreate a JavaStreamingContext from a checkpoint file.
Parameters:
path - Path to the directory that was specified as the checkpoint directory
public JavaStreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf)
Deprecated. Recreate a JavaStreamingContext from a checkpoint file.
Parameters:
path - Path to the directory that was specified as the checkpoint directory
hadoopConf - (undocumented)
Method Detail

public static JavaStreamingContext getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc)
Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then the StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the provided factory will be used to create a JavaStreamingContext.
Parameters:
checkpointPath - Checkpoint directory used in an earlier JavaStreamingContext program
creatingFunc - Function to create a new JavaStreamingContext
public static JavaStreamingContext getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf)
Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then the StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the provided factory will be used to create a JavaStreamingContext.
Parameters:
checkpointPath - Checkpoint directory used in an earlier StreamingContext program
creatingFunc - Function to create a new JavaStreamingContext
hadoopConf - Hadoop configuration if necessary for reading from any HDFS-compatible file system
public static JavaStreamingContext getOrCreate(String checkpointPath, Function0<JavaStreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError)
Deprecated. Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the provided checkpointPath, then the StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the provided factory will be used to create a JavaStreamingContext.
Parameters:
checkpointPath - Checkpoint directory used in an earlier StreamingContext program
creatingFunc - Function to create a new JavaStreamingContext
hadoopConf - Hadoop configuration if necessary for reading from any HDFS-compatible file system
createOnError - Whether to create a new JavaStreamingContext if there is an error in reading checkpoint data
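A hedged sketch of checkpoint-based recovery with getOrCreate; the checkpoint path, app name, and batch interval are illustrative, and the factory body is where an application would build its DStream graph:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CheckpointRecovery {
  private static final String CHECKPOINT_DIR = "hdfs:///tmp/streaming-checkpoint"; // illustrative path

  public static void main(String[] args) throws InterruptedException {
    // Factory invoked only when no checkpoint data exists under CHECKPOINT_DIR.
    Function0<JavaStreamingContext> factory = () -> {
      SparkConf conf = new SparkConf().setAppName("recoverable-app");
      JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
      jssc.checkpoint(CHECKPOINT_DIR); // enable checkpointing on the fresh context
      // ... set up DStreams here; on recovery the graph is restored from the checkpoint
      return jssc;
    };

    JavaStreamingContext context = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
    context.start();
    context.awaitTermination();
  }
}
```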
public static String[] jarOfClass(Class<?> cls)
Deprecated. Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.
Parameters:
cls - (undocumented)
public <T> JavaDStream<T> union(JavaDStream<T>... jdstreams)
Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration.
Parameters:
jdstreams - (undocumented)
public <K,V> JavaPairDStream<K,V> union(JavaPairDStream<K,V>... jdstreams)
Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration.
Parameters:
jdstreams - (undocumented)
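A small sketch of the varargs union, assuming jssc is an existing JavaStreamingContext and streamA, streamB, and streamC are illustrative JavaDStream<String> instances with the same slide duration:

```java
import org.apache.spark.streaming.api.java.JavaDStream;

// Merge three DStreams of the same type and slide duration into one.
JavaDStream<String> merged = jssc.union(streamA, streamB, streamC);
```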
public StreamingContext ssc()
Deprecated.

public JavaSparkContext sparkContext()
Deprecated. The underlying SparkContext.
public JavaReceiverInputDStream<String> socketTextStream(String hostname, int port, StorageLevel storageLevel)
Deprecated. Create an input stream from network source hostname:port.
Parameters:
hostname - Hostname to connect to for receiving data
port - Port to connect to for receiving data
storageLevel - Storage level to use for storing the received objects
public JavaReceiverInputDStream<String> socketTextStream(String hostname, int port)
Deprecated. Create an input stream from network source hostname:port.
Parameters:
hostname - Hostname to connect to for receiving data
port - Port to connect to for receiving data
public <T> JavaReceiverInputDStream<T> socketStream(String hostname, int port, Function<java.io.InputStream,Iterable<T>> converter, StorageLevel storageLevel)
Deprecated. Create an input stream from network source hostname:port.
Parameters:
hostname - Hostname to connect to for receiving data
port - Port to connect to for receiving data
converter - Function to convert the byte stream to objects
storageLevel - Storage level to use for storing the received objects
public JavaDStream<String> textFileStream(String directory)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).
Parameters:
directory - HDFS directory to monitor for new files
public JavaDStream<byte[]> binaryRecordsStream(String directory, int recordLength)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files with fixed record lengths, yielding byte arrays.
Parameters:
directory - HDFS directory to monitor for new files
recordLength - The length at which to split the records
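A one-line sketch, assuming jssc is an existing JavaStreamingContext; the monitored path and 512-byte record length are illustrative:

```java
import org.apache.spark.streaming.api.java.JavaDStream;

// Read fixed-length binary records from a monitored directory as byte arrays.
JavaDStream<byte[]> records = jssc.binaryRecordsStream("hdfs:///data/binary", 512);
```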
public <T> JavaReceiverInputDStream<T> rawSocketStream(String hostname, int port, StorageLevel storageLevel)
Deprecated. Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
Parameters:
hostname - Hostname to connect to for receiving data
port - Port to connect to for receiving data
storageLevel - Storage level to use for storing the received objects
public <T> JavaReceiverInputDStream<T> rawSocketStream(String hostname, int port)
Deprecated. Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
Parameters:
hostname - Hostname to connect to for receiving data
port - Port to connect to for receiving data
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> JavaPairInputDStream<K,V> fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Parameters:
directory - HDFS directory to monitor for new files
kClass - class of key for reading HDFS file
vClass - class of value for reading HDFS file
fClass - class of input format for reading HDFS file
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> JavaPairInputDStream<K,V> fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Parameters:
directory - HDFS directory to monitor for new files
kClass - class of key for reading HDFS file
vClass - class of value for reading HDFS file
fClass - class of input format for reading HDFS file
filter - Function to filter paths to process
newFilesOnly - Should process only new files and ignore existing files in the directory
public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>> JavaPairInputDStream<K,V> fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf)
Deprecated. Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Parameters:
directory - HDFS directory to monitor for new files
kClass - class of key for reading HDFS file
vClass - class of value for reading HDFS file
fClass - class of input format for reading HDFS file
filter - Function to filter paths to process
newFilesOnly - Should process only new files and ignore existing files in the directory
conf - Hadoop configuration
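A minimal sketch of the four-argument fileStream with explicit key, value, and input-format classes; jssc is an assumed existing JavaStreamingContext and the monitored directory is illustrative:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;

// Monitor a directory and read new files with the new-API TextInputFormat.
JavaPairInputDStream<LongWritable, Text> stream = jssc.fileStream(
    "hdfs:///data/incoming",   // monitored directory (illustrative)
    LongWritable.class,        // key class
    Text.class,                // value class
    TextInputFormat.class);    // input format class
```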
public <T> JavaDStream<T> queueStream(java.util.Queue<JavaRDD<T>> queue)
Deprecated. Create an input stream from a queue of RDDs.
Parameters:
queue - Queue of RDDs
Note: arbitrary RDDs can be added to queueStream, and there is no way to recover data of those RDDs, so queueStream doesn't support checkpointing.
public <T> JavaInputDStream<T> queueStream(java.util.Queue<JavaRDD<T>> queue, boolean oneAtATime)
Deprecated. Create an input stream from a queue of RDDs.
Parameters:
queue - Queue of RDDs
oneAtATime - Whether only one RDD should be consumed from the queue in every interval
Note: arbitrary RDDs can be added to queueStream, and there is no way to recover data of those RDDs, so queueStream doesn't support checkpointing.
public <T> JavaInputDStream<T> queueStream(java.util.Queue<JavaRDD<T>> queue, boolean oneAtATime, JavaRDD<T> defaultRDD)
Deprecated. Create an input stream from a queue of RDDs.
Parameters:
queue - Queue of RDDs
oneAtATime - Whether only one RDD should be consumed from the queue in every interval
defaultRDD - Default RDD is returned by the DStream when the queue is empty
Note: arbitrary RDDs can be added to queueStream, and there is no way to recover data of those RDDs, so queueStream doesn't support checkpointing.
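A small sketch of queueStream for local testing, assuming jssc is an existing JavaStreamingContext; the queue contents are illustrative:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaDStream;

// Pre-fill a queue with two RDDs built from in-memory data.
Queue<JavaRDD<Integer>> queue = new LinkedList<>();
queue.add(jssc.sparkContext().parallelize(Arrays.asList(1, 2, 3)));
queue.add(jssc.sparkContext().parallelize(Arrays.asList(4, 5, 6)));

// Each batch interval consumes one RDD from the queue (oneAtATime defaults to true).
JavaDStream<Integer> testStream = jssc.queueStream(queue);
testStream.print();
```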
public <T> JavaReceiverInputDStream<T> receiverStream(Receiver<T> receiver)
Deprecated. Create an input stream with any arbitrary user-implemented receiver.
Parameters:
receiver - Custom implementation of Receiver
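A minimal custom-receiver sketch under stated assumptions: the class name and the single stored message are illustrative, and a production receiver would read from a real source in a loop until isStopped() returns true:

```java
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Hypothetical receiver that pushes one greeting into Spark; a real receiver
// would poll an external system on a background thread.
public class GreetingReceiver extends Receiver<String> {
  public GreetingReceiver() {
    super(StorageLevel.MEMORY_ONLY());
  }

  @Override
  public void onStart() {
    // onStart() must not block; do the receiving work on a separate thread.
    new Thread(() -> store("hello from a custom receiver")).start();
  }

  @Override
  public void onStop() {
    // No resources to release in this sketch.
  }
}
```

It would then be attached with jssc.receiverStream(new GreetingReceiver()).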
public <T> JavaDStream<T> union(scala.collection.Seq<JavaDStream<T>> jdstreams)
Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration.
Parameters:
jdstreams - (undocumented)
public <K,V> JavaPairDStream<K,V> union(scala.collection.Seq<JavaPairDStream<K,V>> jdstreams)
Deprecated. Create a unified DStream from multiple DStreams of the same type and same slide duration.
Parameters:
jdstreams - (undocumented)
public <T> JavaDStream<T> transform(java.util.List<JavaDStream<?>> dstreams, Function2<java.util.List<JavaRDD<?>>,Time,JavaRDD<T>> transformFunc)
Deprecated. Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
Parameters:
dstreams - (undocumented)
transformFunc - (undocumented)
Note: to include a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().
public <K,V> JavaPairDStream<K,V> transformToPair(java.util.List<JavaDStream<?>> dstreams, Function2<java.util.List<JavaRDD<?>>,Time,JavaPairRDD<K,V>> transformFunc)
Deprecated. Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
Parameters:
dstreams - (undocumented)
transformFunc - (undocumented)
Note: to include a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().
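A hedged sketch of the multi-stream transform, assuming jssc is an existing JavaStreamingContext and stream1 and stream2 are illustrative JavaDStream<String> instances; per batch, the function receives the corresponding RDDs of all input streams:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaDStream;

// Collect the input streams; the element types are erased to JavaDStream<?>.
List<JavaDStream<?>> inputs = Arrays.<JavaDStream<?>>asList(stream1, stream2);

// Per batch, union the corresponding RDDs of both streams into one RDD.
JavaDStream<String> merged = jssc.<String>transform(inputs, (rdds, time) -> {
  @SuppressWarnings("unchecked")
  JavaRDD<String> first = (JavaRDD<String>) rdds.get(0);
  @SuppressWarnings("unchecked")
  JavaRDD<String> second = (JavaRDD<String>) rdds.get(1);
  return first.union(second);
});
```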
public void checkpoint(String directory)
Deprecated. Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.
Parameters:
directory - HDFS-compatible directory where the checkpoint data will be reliably stored
public void remember(Duration duration)
Deprecated. Sets each DStream in this context to remember the RDDs it generated in the last given duration.
Parameters:
duration - Minimum duration that each DStream should remember its RDDs
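A two-line sketch combining both settings, assuming jssc is an existing JavaStreamingContext; the checkpoint path and the ten-minute window are illustrative values:

```java
import org.apache.spark.streaming.Durations;

// Enable periodic checkpointing to a reliable directory and keep the last
// 10 minutes of generated RDDs available.
jssc.checkpoint("hdfs:///tmp/app-checkpoint");
jssc.remember(Durations.minutes(10));
```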
public void addStreamingListener(StreamingListener streamingListener)
Deprecated. Add a StreamingListener object for receiving system events related to streaming.
Parameters:
streamingListener - (undocumented)
public StreamingContextState getState()
Deprecated. :: DeveloperApi ::
Returns: The current state of the context. The context can be in three possible states - INITIALIZED, ACTIVE, and STOPPED.
public void start()
Deprecated. Start the execution of the streams.
public void awaitTermination() throws InterruptedException
Deprecated. Wait for the execution to stop.
Throws: InterruptedException
public boolean awaitTerminationOrTimeout(long timeout) throws InterruptedException
Deprecated. Wait for the execution to stop.
Parameters:
timeout - time to wait in milliseconds
Returns: true if it's stopped; or throws the reported error during the execution; or false if the waiting time elapsed before returning from the method
Throws: InterruptedException
public void stop()
Deprecated. Stop the execution of the streams.

public void stop(boolean stopSparkContext)
Deprecated. Stop the execution of the streams.
Parameters:
stopSparkContext - Stop the associated SparkContext or not
public void stop(boolean stopSparkContext, boolean stopGracefully)
Deprecated. Stop the execution of the streams.
Parameters:
stopSparkContext - Stop the associated SparkContext or not
stopGracefully - Stop gracefully by waiting for the processing of all received data to be completed
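A short sketch combining awaitTerminationOrTimeout with the two-flag stop, written inside a method that may throw InterruptedException; the 30-second timeout and the flag choices are illustrative:

```java
// Wait up to 30 seconds for termination; if the streams are still running,
// stop them gracefully while keeping the shared SparkContext alive.
if (!jssc.awaitTerminationOrTimeout(30_000)) {
  jssc.stop(false /* stopSparkContext */, true /* stopGracefully */);
}
```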
public void close()
Deprecated.
Specified by:
close in interface java.io.Closeable
Specified by:
close in interface AutoCloseable