org.apache.spark.streaming.api.java

JavaStreamingContext

class JavaStreamingContext extends Closeable

A Java-friendly version of org.apache.spark.streaming.StreamingContext, which is the main entry point for Spark Streaming functionality. It provides methods to create org.apache.spark.streaming.api.java.JavaDStream and org.apache.spark.streaming.api.java.JavaPairDStream instances from input sources. The internal org.apache.spark.api.java.JavaSparkContext (see core Spark documentation) can be accessed using context.sparkContext. After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively. context.awaitTermination() allows the current thread to wait for the termination of a context by stop() or by an exception.

Linear Supertypes
Closeable, AutoCloseable, AnyRef, Any

Instance Constructors

  1. new JavaStreamingContext(path: String, hadoopConf: Configuration)

    Re-creates a JavaStreamingContext from a checkpoint file.

    path

    Path to the directory that was specified as the checkpoint directory

  2. new JavaStreamingContext(path: String)

    Recreate a JavaStreamingContext from a checkpoint file.

    path

    Path to the directory that was specified as the checkpoint directory

  3. new JavaStreamingContext(conf: SparkConf, batchDuration: Duration)

    Create a JavaStreamingContext using a SparkConf configuration.

    conf

    A Spark application configuration

    batchDuration

    The time interval at which streaming data will be divided into batches

  4. new JavaStreamingContext(sparkContext: JavaSparkContext, batchDuration: Duration)

    Create a JavaStreamingContext using an existing JavaSparkContext.

    sparkContext

    The underlying JavaSparkContext to use

    batchDuration

    The time interval at which streaming data will be divided into batches

  5. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String, jars: Array[String], environment: Map[String, String])

    Create a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

    sparkHome

    The SPARK_HOME directory on the slave nodes

    jars

    Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.

    environment

    Environment variables to set on worker nodes

  6. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String, jars: Array[String])

    Create a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

    sparkHome

    The SPARK_HOME directory on the slave nodes

    jars

    Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.

  7. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String, jarFile: String)

    Create a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

    sparkHome

    The SPARK_HOME directory on the slave nodes

    jarFile

    JAR file containing job code, to ship to cluster. This can be a path on the local file system or an HDFS, HTTP, HTTPS, or FTP URL.

  8. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration)

    Create a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

  9. new JavaStreamingContext(ssc: StreamingContext)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def actorStream[T](props: Props, name: String): JavaReceiverInputDStream[T]

    Create an input stream with any arbitrary user implemented actor receiver. Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2.

    props

    Props object defining creation of the actor

    name

    Name of the actor

    Note

    Because the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. the parametrized type of the data received and of actorStream should be the same.

  7. def actorStream[T](props: Props, name: String, storageLevel: StorageLevel): JavaReceiverInputDStream[T]

    Create an input stream with any arbitrary user implemented actor receiver.

    props

    Props object defining creation of the actor

    name

    Name of the actor

    storageLevel

    Storage level to use for storing the received objects

    Note

    Because the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. the parametrized type of the data received and of actorStream should be the same.

  8. def actorStream[T](props: Props, name: String, storageLevel: StorageLevel, supervisorStrategy: SupervisorStrategy): JavaReceiverInputDStream[T]

    Create an input stream with any arbitrary user implemented actor receiver.

    props

    Props object defining creation of the actor

    name

    Name of the actor

    storageLevel

    Storage level to use for storing the received objects

    Note

    Because the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. the parametrized type of the data received and of actorStream should be the same.

  9. def addStreamingListener(streamingListener: StreamingListener): Unit

    Add a org.apache.spark.streaming.scheduler.StreamingListener object for receiving system events related to streaming.

  10. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  11. def awaitTermination(timeout: Long): Unit

    Wait for the execution to stop. Any exception that occurs during the execution will be thrown in this thread.

    timeout

    time to wait in milliseconds

  12. def awaitTermination(): Unit

    Wait for the execution to stop. Any exception that occurs during the execution will be thrown in this thread.

  13. def checkpoint(directory: String): Unit

    Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. The graph will be checkpointed every batch interval.

    directory

    HDFS-compatible directory where the checkpoint data will be reliably stored

  14. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  15. def close(): Unit

    Definition Classes
    JavaStreamingContext → Closeable → AutoCloseable
  16. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  18. def fileStream[K, V, F <: InputFormat[K, V]](directory: String): JavaPairInputDStream[K, V]

    Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.

    K

    Key type for reading HDFS file

    V

    Value type for reading HDFS file

    F

    Input format for reading HDFS file

    directory

    HDFS directory to monitor for new files
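
    The "moving" requirement above can be illustrated with plain Java on a local file system. The following is only a sketch under stated assumptions: the class name, the publish helper, and the staging/incoming directory names are hypothetical and are not part of the Spark API. It writes a file completely in a staging directory and then moves it into the monitored directory in a single step, so that a monitor never sees a partially written file.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    /** Hypothetical helper: stage a file, then move it into a monitored directory. */
    public class MoveIntoMonitoredDir {

        /** Writes content under stagingDir, then moves the finished file into monitoredDir. */
        public static Path publish(Path stagingDir, Path monitoredDir,
                                   String fileName, String content) throws IOException {
            Path staged = stagingDir.resolve(fileName);
            Files.write(staged, content.getBytes(StandardCharsets.UTF_8));
            // ATOMIC_MOVE throws instead of silently copying if the move cannot be
            // atomic, which is typically the case across different file systems.
            return Files.move(staged, monitoredDir.resolve(fileName),
                              StandardCopyOption.ATOMIC_MOVE);
        }

        public static void main(String[] args) throws IOException {
            // Both directories live under one temp directory, so they share a file system.
            Path base = Files.createTempDirectory("stream-demo");
            Path staging = Files.createDirectory(base.resolve("staging"));
            Path monitored = Files.createDirectory(base.resolve("incoming"));

            Path published = publish(staging, monitored, "part-0001.txt", "first line\nsecond line\n");
            System.out.println("published " + published.getFileName());
        }
    }
    ```

    Because file names starting with . are ignored, the published name here deliberately does not begin with a dot.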

  19. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  22. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. def queueStream[T](queue: Queue[JavaRDD[T]], oneAtATime: Boolean, defaultRDD: JavaRDD[T]): JavaInputDStream[T]

    Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

    NOTE: changes to the queue after the stream is created will not be recognized.

    T

    Type of objects in the RDD

    queue

    Queue of RDDs

    oneAtATime

    Whether only one RDD should be consumed from the queue in every interval

    defaultRDD

    RDD returned by the DStream when the queue is empty

  27. def queueStream[T](queue: Queue[JavaRDD[T]], oneAtATime: Boolean): JavaInputDStream[T]

    Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

    NOTE: changes to the queue after the stream is created will not be recognized.

    T

    Type of objects in the RDD

    queue

    Queue of RDDs

    oneAtATime

    Whether only one RDD should be consumed from the queue in every interval

  28. def queueStream[T](queue: Queue[JavaRDD[T]]): JavaDStream[T]

    Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

    NOTE: changes to the queue after the stream is created will not be recognized.

    T

    Type of objects in the RDD

    queue

    Queue of RDDs
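
    The batching rules described for the queueStream variants above (one RDD or all RDDs per batch, and a default RDD when the queue is empty) can be modelled in plain Java. This is a sketch of those rules as read from the descriptions, not Spark code: the class and method names are hypothetical, and each inner List stands in for a JavaRDD.

    ```java
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Queue;

    /** Plain-Java model of the queueStream batching rules; not part of Spark. */
    public class QueueStreamModel {

        /** Returns the elements one batch would contain; each inner list models an RDD. */
        public static <T> List<T> nextBatch(Queue<List<T>> queue, boolean oneAtATime,
                                            List<T> defaultRdd) {
            List<T> batch = new ArrayList<>();
            if (queue.isEmpty()) {
                // Empty queue: fall back to the default RDD if one was supplied.
                if (defaultRdd != null) {
                    batch.addAll(defaultRdd);
                }
                return batch;
            }
            if (oneAtATime) {
                // Consume exactly one queued RDD in this batch.
                batch.addAll(queue.remove());
            } else {
                // Consume every queued RDD in this batch.
                while (!queue.isEmpty()) {
                    batch.addAll(queue.remove());
                }
            }
            return batch;
        }

        public static void main(String[] args) {
            Queue<List<Integer>> queue = new ArrayDeque<>();
            queue.add(Arrays.asList(1, 2));
            queue.add(Arrays.asList(3));
            queue.add(Arrays.asList(4, 5));
            List<Integer> fallback = Arrays.asList(0);

            System.out.println(nextBatch(queue, true, fallback));   // [1, 2]
            System.out.println(nextBatch(queue, false, fallback));  // [3, 4, 5]
            System.out.println(nextBatch(queue, false, fallback));  // [0]
        }
    }
    ```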

  29. def rawSocketStream[T](hostname: String, port: Int): JavaReceiverInputDStream[T]

    Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.

    T

    Type of the objects in the received blocks

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

  30. def rawSocketStream[T](hostname: String, port: Int, storageLevel: StorageLevel): JavaReceiverInputDStream[T]

    Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.

    T

    Type of the objects in the received blocks

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

    storageLevel

    Storage level to use for storing the received objects

  31. def receiverStream[T](receiver: Receiver[T]): JavaReceiverInputDStream[T]

    Create an input stream with any arbitrary user implemented receiver. Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html

    receiver

    Custom implementation of Receiver

  32. def remember(duration: Duration): Unit

    Sets each DStream in this context to remember the RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration and then release them for garbage collection. This method allows the developer to specify how long to remember the RDDs (for example, if the developer wishes to query old data outside the DStream computation).

    duration

    Minimum duration that each DStream should remember its RDDs

  33. def socketStream[T](hostname: String, port: Int, converter: Function[InputStream, Iterable[T]], storageLevel: StorageLevel): JavaReceiverInputDStream[T]

    Create an input stream from network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as objects using the given converter.

    T

    Type of the objects received (after converting bytes to objects)

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

    converter

    Function to convert the byte stream to objects

    storageLevel

    Storage level to use for storing the received objects
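
    As an illustration of what a converter from a byte stream to objects might look like, the sketch below turns an InputStream into UTF-8 text lines. It is an assumption-laden example rather than the library's implementation: the class and method names are hypothetical, and java.util.function.Function is used as a stand-in for Spark's own Function interface so that the example runs without Spark on the classpath.

    ```java
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;

    /** Hypothetical converter: InputStream -> Iterable of text lines. */
    public class LineConverterSketch {

        /**
         * Reads the stream to its end as UTF-8 and returns one String per line.
         * BufferedReader.readLine splits on \n, \r, and \r\n.
         */
        public static Iterable<String> toLines(InputStream in) {
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return lines;
        }

        public static void main(String[] args) {
            // Stand-in for Spark's Function interface (assumption for this sketch).
            Function<InputStream, Iterable<String>> converter = LineConverterSketch::toLines;

            byte[] payload = "hello\nworld\n".getBytes(StandardCharsets.UTF_8);
            for (String line : converter.apply(new ByteArrayInputStream(payload))) {
                System.out.println(line);
            }
        }
    }
    ```

    This version reads eagerly until the stream ends, which suits a finite byte array. For a long-lived socket, a converter that returns a lazily evaluated Iterable would likely be a better fit.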

  34. def socketTextStream(hostname: String, port: Int): JavaReceiverInputDStream[String]

    Create an input stream from network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded, \n-delimited lines. Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2.

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

  35. def socketTextStream(hostname: String, port: Int, storageLevel: StorageLevel): JavaReceiverInputDStream[String]

    Create an input stream from network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded, \n-delimited lines.

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

    storageLevel

    Storage level to use for storing the received objects

  36. val sparkContext: JavaSparkContext

    The underlying SparkContext

  37. val ssc: StreamingContext

  38. def start(): Unit

    Start the execution of the streams.

  39. def stop(stopSparkContext: Boolean, stopGracefully: Boolean): Unit

    Stop the execution of the streams.

    stopSparkContext

    Stop the associated SparkContext or not

    stopGracefully

    Stop gracefully by waiting for the processing of all received data to be completed

  40. def stop(stopSparkContext: Boolean): Unit

    Stop the execution of the streams.

    stopSparkContext

    Stop the associated SparkContext or not

  41. def stop(): Unit

    Stop the execution of the streams. Will stop the associated JavaSparkContext as well.

  42. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  43. def textFileStream(directory: String): JavaDStream[String]

    Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.

    directory

    HDFS directory to monitor for new files

  44. def toString(): String

    Definition Classes
    AnyRef → Any
  45. def transform[T](dstreams: List[JavaDStream[_]], transformFunc: Function2[List[JavaRDD[_]], Time, JavaRDD[T]]): JavaDStream[T]

    Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. The order of the JavaRDDs in the transform function parameter will be the same as the order of corresponding DStreams in the list. Note that for adding a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using org.apache.spark.streaming.api.java.JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().

  46. def transformToPair[K, V](dstreams: List[JavaDStream[_]], transformFunc: Function2[List[JavaRDD[_]], Time, JavaPairRDD[K, V]]): JavaPairDStream[K, V]

    Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. The order of the JavaRDDs in the transform function parameter will be the same as the order of corresponding DStreams in the list. Note that for adding a JavaPairDStream in the list of JavaDStreams, convert it to a JavaDStream using org.apache.spark.streaming.api.java.JavaPairDStream.toJavaDStream(). In the transform function, convert the JavaRDD corresponding to that JavaDStream to a JavaPairRDD using org.apache.spark.api.java.JavaPairRDD.fromJavaRDD().

  47. def union[K, V](first: JavaPairDStream[K, V], rest: List[JavaPairDStream[K, V]]): JavaPairDStream[K, V]

    Create a unified DStream from multiple DStreams of the same type and same slide duration.

  48. def union[T](first: JavaDStream[T], rest: List[JavaDStream[T]]): JavaDStream[T]

    Create a unified DStream from multiple DStreams of the same type and same slide duration.

  49. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. val sc: JavaSparkContext

    Annotations
    @deprecated
    Deprecated

    (Since version 0.9.0) use sparkContext
