spark.streaming.api.java

JavaStreamingContext

class JavaStreamingContext extends AnyRef

A StreamingContext is the main entry point for Spark Streaming functionality. In addition to the basic information (such as cluster URL and job name) needed to internally create a SparkContext, it provides methods used to create DStreams from various input sources.
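
For example, a minimal driver program might look like the following sketch (the package names, local master URL, and the netcat source on port 9999 are assumptions for illustration):

  import spark.streaming.Duration;
  import spark.streaming.api.java.JavaDStream;
  import spark.streaming.api.java.JavaStreamingContext;

  public class StreamingExample {
    public static void main(String[] args) throws Exception {
      // Local master with two threads and a 1-second batch interval.
      JavaStreamingContext jssc =
          new JavaStreamingContext("local[2]", "StreamingExample", new Duration(1000));

      // One of the input sources described below.
      JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
      lines.print();            // register an output operation on the stream

      jssc.start();             // start receiving and processing data
      Thread.sleep(60 * 1000);  // let it run for a minute
      jssc.stop();              // stop the streaming computation
    }
  }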

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new JavaStreamingContext(path: String)

    Re-creates a StreamingContext from a checkpoint file.

    path

    Path either to the directory that was specified as the checkpoint directory, or to the checkpoint file 'graph' or 'graph.bk'.
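
    For example, a sketch of recovering a previously checkpointed context (the checkpoint path is an assumption; the recreated context resumes the previously defined DStream operations once started):

      JavaStreamingContext jssc =
          new JavaStreamingContext("hdfs://namenode:8020/checkpoints/my-app");
      jssc.start();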

  2. new JavaStreamingContext(sparkContext: JavaSparkContext, batchDuration: Duration)

    Creates a StreamingContext using an existing SparkContext.

    sparkContext

    The underlying JavaSparkContext to use

    batchDuration

    The time interval at which streaming data will be divided into batches
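
    A sketch of wrapping an existing JavaSparkContext (the master URL and application name passed to JavaSparkContext are assumptions):

      JavaSparkContext sc = new JavaSparkContext("local[2]", "MyApp");
      JavaStreamingContext jssc =
          new JavaStreamingContext(sc, new Duration(2000));   // 2-second batches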

  3. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String, jars: Array[String], environment: Map[String, String])

    Creates a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

    sparkHome

    The SPARK_HOME directory on the slave nodes

    jars

    Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.

    environment

    Environment variables to set on worker nodes
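
    A sketch of creating a context against a standalone cluster; every value shown is a placeholder, and the environment parameter is assumed to accept a java.util.Map:

      Map<String, String> env = new HashMap<String, String>();
      env.put("MY_ENV_VAR", "value");
      JavaStreamingContext jssc = new JavaStreamingContext(
          "spark://master:7077",                 // master
          "MyApp",                               // appName
          new Duration(1000),                    // 1-second batches
          "/opt/spark",                          // sparkHome on the worker nodes
          new String[] { "target/my-app.jar" },  // jars to ship to the cluster
          env);                                  // environment variables for workers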

  4. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String, jars: Array[String])

    Creates a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

    sparkHome

    The SPARK_HOME directory on the slave nodes

    jars

    Collection of JARs to send to the cluster. These can be paths on the local file system or HDFS, HTTP, HTTPS, or FTP URLs.

  5. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String, jarFile: String)

    Creates a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

    sparkHome

    The SPARK_HOME directory on the slave nodes

    jarFile

    JAR file containing job code, to ship to cluster. This can be a path on the local file system or an HDFS, HTTP, HTTPS, or FTP URL.

  6. new JavaStreamingContext(master: String, appName: String, batchDuration: Duration)

    Creates a StreamingContext.

    master

    Name of the Spark Master

    appName

    Name to be used when registering with the scheduler

    batchDuration

    The time interval at which streaming data will be divided into batches

  7. new JavaStreamingContext(ssc: StreamingContext)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def actorStream[T](props: Props, name: String): JavaDStream[T]

    Create an input stream with an arbitrary user-implemented actor receiver.

    props

    Props object defining creation of the actor

    name

    Name of the actor

    Note

    Since the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. that the parametrized type of the data received matches the type parameter of actorStream.

  7. def actorStream[T](props: Props, name: String, storageLevel: StorageLevel): JavaDStream[T]

    Create an input stream with an arbitrary user-implemented actor receiver.

    props

    Props object defining creation of the actor

    name

    Name of the actor

    storageLevel

    Storage level to use for storing the received objects

    Note

    Since the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. that the parametrized type of the data received matches the type parameter of actorStream.

  8. def actorStream[T](props: Props, name: String, storageLevel: StorageLevel, supervisorStrategy: SupervisorStrategy): JavaDStream[T]

    Create an input stream with an arbitrary user-implemented actor receiver.

    props

    Props object defining creation of the actor

    name

    Name of the actor

    storageLevel

    Storage level to use for storing the received objects

    Note

    Since the actor may exist outside the Spark framework, it is the user's responsibility to ensure type safety, i.e. that the parametrized type of the data received matches the type parameter of actorStream.

  9. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  10. def checkpoint(directory: String): Unit

    Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. The graph will be checkpointed every batch interval.

    directory

    HDFS-compatible directory where the checkpoint data will be reliably stored
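
    For example, assuming jssc is an existing JavaStreamingContext and the HDFS path is a placeholder:

      jssc.checkpoint("hdfs://namenode:8020/checkpoints/my-app");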

  11. def clone(): AnyRef

    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws()
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def fileStream[K, V, F <: InputFormat[K, V]](directory: String): JavaPairDStream[K, V]

    Creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. File names starting with . are ignored.

    K

    Key type for reading HDFS file

    V

    Value type for reading HDFS file

    F

    Input format for reading HDFS file

    directory

    HDFS directory to monitor for new files

  15. def finalize(): Unit

    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws()
  16. def flumeStream(hostname: String, port: Int): JavaDStream[SparkFlumeEvent]

    Creates an input stream from a Flume source.

    hostname

    Hostname of the slave machine to which the Flume data will be sent

    port

    Port of the slave machine to which the flume data will be sent
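
    A sketch, assuming jssc is an existing JavaStreamingContext and a Flume agent is configured to send events to port 41414 on this host:

      JavaDStream<SparkFlumeEvent> flumeEvents = jssc.flumeStream("localhost", 41414);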

  17. def flumeStream(hostname: String, port: Int, storageLevel: StorageLevel): JavaDStream[SparkFlumeEvent]

    Creates an input stream from a Flume source.

    hostname

    Hostname of the slave machine to which the Flume data will be sent

    port

    Port of the slave machine to which the flume data will be sent

    storageLevel

    Storage level to use for storing the received objects

  18. final def getClass(): java.lang.Class[_]

    Definition Classes
    AnyRef → Any
  19. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  20. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  21. def kafkaStream[T, D <: kafka.serializer.Decoder[_]](typeClass: Class[T], decoderClass: Class[D], kafkaParams: Map[String, String], topics: Map[String, Integer], storageLevel: StorageLevel): JavaDStream[T]

    Create an input stream that pulls messages from a Kafka broker.

    typeClass

    Type of RDD

    decoderClass

    Type of kafka decoder

    kafkaParams

    Map of Kafka configuration parameters. See: http://kafka.apache.org/configuration.html

    topics

    Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread.

    storageLevel

    RDD storage level. Defaults to memory-only

  22. def kafkaStream(zkQuorum: String, groupId: String, topics: Map[String, Integer], storageLevel: StorageLevel): JavaDStream[String]

    Create an input stream that pulls messages from a Kafka broker.

    zkQuorum

    ZooKeeper quorum (hostname:port,hostname:port,..).

    groupId

    The group id for this consumer.

    topics

    Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread.

    storageLevel

    RDD storage level. Defaults to memory-only

  23. def kafkaStream(zkQuorum: String, groupId: String, topics: Map[String, Integer]): JavaDStream[String]

    Create an input stream that pulls messages from a Kafka broker.

    zkQuorum

    ZooKeeper quorum (hostname:port,hostname:port,..).

    groupId

    The group id for this consumer.

    topics

    Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread.
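
    A sketch, assuming jssc is an existing JavaStreamingContext, the topics parameter accepts a java.util.Map, and the ZooKeeper address, group id, and topic name are placeholders:

      Map<String, Integer> topics = new HashMap<String, Integer>();
      topics.put("my-topic", 1);   // consume "my-topic" with one receiver thread
      JavaDStream<String> messages =
          jssc.kafkaStream("zk-host:2181", "my-consumer-group", topics);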

  24. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. def queueStream[T](queue: Queue[JavaRDD[T]], oneAtATime: Boolean, defaultRDD: JavaRDD[T]): JavaDStream[T]

    Creates an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

    NOTE: changes to the queue after the stream is created will not be recognized.

    T

    Type of objects in the RDD

    queue

    Queue of RDDs

    oneAtATime

    Whether only one RDD should be consumed from the queue in every interval

    defaultRDD

    Default RDD returned by the DStream when the queue is empty

  28. def queueStream[T](queue: Queue[JavaRDD[T]], oneAtATime: Boolean): JavaDStream[T]

    Creates an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

    NOTE: changes to the queue after the stream is created will not be recognized.

    T

    Type of objects in the RDD

    queue

    Queue of RDDs

    oneAtATime

    Whether only one RDD should be consumed from the queue in every interval

  29. def queueStream[T](queue: Queue[JavaRDD[T]]): JavaDStream[T]

    Creates an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.

    NOTE: changes to the queue after the stream is created will not be recognized.

    T

    Type of objects in the RDD

    queue

    Queue of RDDs
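
    A sketch, assuming jssc is an existing JavaStreamingContext, the queue parameter accepts a java.util.Queue, and the underlying JavaSparkContext is reachable as jssc.sc():

      Queue<JavaRDD<Integer>> queue = new LinkedList<JavaRDD<Integer>>();
      queue.add(jssc.sc().parallelize(Arrays.asList(1, 2, 3)));
      queue.add(jssc.sc().parallelize(Arrays.asList(4, 5, 6)));
      // In each batch, one or all of the queued RDDs is processed (see oneAtATime above).
      JavaDStream<Integer> numbers = jssc.queueStream(queue);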

  30. def rawSocketStream[T](hostname: String, port: Int): JavaDStream[T]

    Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.

    T

    Type of the objects in the received blocks

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

  31. def rawSocketStream[T](hostname: String, port: Int, storageLevel: StorageLevel): JavaDStream[T]

    Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.

    T

    Type of the objects in the received blocks

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

    storageLevel

    Storage level to use for storing the received objects

  32. def registerOutputStream(outputStream: spark.streaming.api.java.JavaDStreamLike[_, _, _]): Unit

    Registers an output stream that will be computed every interval

  33. def remember(duration: Duration): Unit

    Sets each DStream in this context to remember the RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration and release them for garbage collection. This method allows the developer to specify how long to remember the RDDs (if the developer wishes to query old data outside the DStream computation).

    duration

    Minimum duration that each DStream should remember its RDDs
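
    For example, to keep generated RDDs around for at least one minute (Duration values are in milliseconds):

      jssc.remember(new Duration(60 * 1000));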

  34. val sc: JavaSparkContext

    The underlying SparkContext

  35. def socketStream[T](hostname: String, port: Int, converter: Function[InputStream, Iterable[T]], storageLevel: StorageLevel): JavaDStream[T]

    Create an input stream from a network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as objects using the given converter.

    T

    Type of the objects received (after converting bytes to objects)

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

    converter

    Function to convert the byte stream to objects

    storageLevel

    Storage level to use for storing the received objects

  36. def socketTextStream(hostname: String, port: Int): JavaDStream[String]

    Create an input stream from a network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded \n delimited lines.

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data
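
    A sketch, assuming jssc is an existing JavaStreamingContext and something (e.g. nc -lk 9999) is writing text lines to port 9999 on this host:

      JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
      lines.print();   // print the first few elements of each batch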

  37. def socketTextStream(hostname: String, port: Int, storageLevel: StorageLevel): JavaDStream[String]

    Create an input stream from a network source hostname:port. Data is received using a TCP socket and the received bytes are interpreted as UTF-8 encoded \n delimited lines.

    hostname

    Hostname to connect to for receiving data

    port

    Port to connect to for receiving data

    storageLevel

    Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)

  38. val ssc: StreamingContext

  39. def start(): Unit

    Starts the execution of the streams.

  40. def stop(): Unit

    Stops the execution of the streams.

  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. def textFileStream(directory: String): JavaDStream[String]

    Creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). File names starting with . are ignored.

    directory

    HDFS directory to monitor for new files
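
    A sketch, assuming jssc is an existing JavaStreamingContext and the directory path is a placeholder:

      // Each new file that appears in this directory is read as a text file.
      JavaDStream<String> logLines =
          jssc.textFileStream("hdfs://namenode:8020/incoming-logs");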

  43. def toString(): String

    Definition Classes
    AnyRef → Any
  44. def twitterStream(): JavaDStream[Status]

    Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret to be set.
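
    A sketch, assuming jssc is an existing JavaStreamingContext; the credential values are placeholders:

      System.setProperty("twitter4j.oauth.consumerKey", "<consumerKey>");
      System.setProperty("twitter4j.oauth.consumerSecret", "<consumerSecret>");
      System.setProperty("twitter4j.oauth.accessToken", "<accessToken>");
      System.setProperty("twitter4j.oauth.accessTokenSecret", "<accessTokenSecret>");
      JavaDStream<Status> tweets = jssc.twitterStream();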

  45. def twitterStream(twitterAuth: Authorization): JavaDStream[Status]

    Create an input stream that returns tweets received from Twitter.

    twitterAuth

    Twitter4J Authorization

  46. def twitterStream(filters: Array[String]): JavaDStream[Status]

    Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret to be set.

    filters

    Set of filter strings to get only those tweets that match them

  47. def twitterStream(twitterAuth: Authorization, filters: Array[String]): JavaDStream[Status]

    Create an input stream that returns tweets received from Twitter.

    twitterAuth

    Twitter4J Authorization

    filters

    Set of filter strings to get only those tweets that match them

  48. def twitterStream(filters: Array[String], storageLevel: StorageLevel): JavaDStream[Status]

    Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret to be set.

    filters

    Set of filter strings to get only those tweets that match them

    storageLevel

    Storage level to use for storing the received objects

  49. def twitterStream(twitterAuth: Authorization, filters: Array[String], storageLevel: StorageLevel): JavaDStream[Status]

    Create an input stream that returns tweets received from Twitter.

    twitterAuth

    Twitter4J Authorization object

    filters

    Set of filter strings to get only those tweets that match them

    storageLevel

    Storage level to use for storing the received objects

  50. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws()
  51. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws()
  52. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws()
  53. def zeroMQStream[T](publisherUrl: String, subscribe: Subscribe, bytesToObjects: Function[Array[Array[Byte]], Iterable[T]]): JavaDStream[T]

    Create an input stream that receives messages pushed by a ZeroMQ publisher.

    publisherUrl

    URL of the remote ZeroMQ publisher

    subscribe

    Topic to subscribe to

    bytesToObjects

    A ZeroMQ stream publishes a sequence of frames for each topic, and each frame is a sequence of bytes. The converter (which might be a deserializer of bytes) translates from this sequence of byte sequences, where the outer sequence refers to a frame and the inner sequence to its payload.

  54. def zeroMQStream[T](publisherUrl: String, subscribe: Subscribe, bytesToObjects: Function[Array[Array[Byte]], Iterable[T]], storageLevel: StorageLevel): JavaDStream[T]

    Create an input stream that receives messages pushed by a ZeroMQ publisher.

    publisherUrl

    URL of the remote ZeroMQ publisher

    subscribe

    Topic to subscribe to

    bytesToObjects

    A ZeroMQ stream publishes a sequence of frames for each topic, and each frame is a sequence of bytes. The converter (which might be a deserializer of bytes) translates from this sequence of byte sequences, where the outer sequence refers to a frame and the inner sequence to its payload.

    storageLevel

    RDD storage level. Defaults to memory-only.

  55. def zeroMQStream[T](publisherUrl: String, subscribe: Subscribe, bytesToObjects: (Seq[Seq[Byte]]) ⇒ Iterator[T], storageLevel: StorageLevel, supervisorStrategy: SupervisorStrategy): JavaDStream[T]

    Create an input stream that receives messages pushed by a ZeroMQ publisher.

    publisherUrl

    URL of the remote ZeroMQ publisher

    subscribe

    Topic to subscribe to

    bytesToObjects

    A ZeroMQ stream publishes a sequence of frames for each topic, and each frame is a sequence of bytes. The converter (which might be a deserializer of bytes) translates from this sequence of byte sequences, where the outer sequence refers to a frame and the inner sequence to its payload.

    storageLevel

    Storage level to use for storing the received objects

Inherited from AnyRef

Inherited from Any