spark.streaming.StreamingContext

Instance Constructors

new StreamingContext(path: String)

Re-create a StreamingContext from a checkpoint file.
Re-create a StreamingContext from a checkpoint file.
path
Path either to the directory that was specified as the checkpoint directory, or to the checkpoint file 'graph' or 'graph.bk'.
new StreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String = null, jars: Seq[String] = Nil, environment: Map[String, String] = Map())

Create a StreamingContext by providing the details necessary for creating a new SparkContext.
Create a StreamingContext by providing the details necessary for creating a new SparkContext.
master
Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]).
appName
A name for your job, to display on the cluster web UI
batchDuration
The time interval at which streaming data will be divided into batches
new StreamingContext(sparkContext: SparkContext, batchDuration: Duration)

Create a StreamingContext using an existing SparkContext.
Create a StreamingContext using an existing SparkContext.
sparkContext
Existing SparkContext
batchDuration
The time interval at which streaming data will be divided into batches

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
def actorStream[T](props: Props, name: String, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2, supervisorStrategy: SupervisorStrategy = ...)(implicit arg0: ClassManifest[T]): DStream[T]

Create an input stream with any arbitrary user implemented actor receiver.
Create an input stream with any arbitrary user implemented actor receiver.
props
Props object defining creation of the actor
name
Name of the actor
storageLevel
RDD storage level. Defaults to memory-only.

Note
An important point to note: Since Actor may exist outside the spark framework, It is thus user's responsibility to ensure the type safety, i.e parametrized type of data received and actorStream should be same.
final def asInstanceOf[T0]: T0

Definition Classes
Any
def checkpoint(directory: String): Unit

Set the context to periodically checkpoint the DStream operations for master fault-tolerance.
Set the context to periodically checkpoint the DStream operations for master fault-tolerance. The graph will be checkpointed every batch interval.
directory
HDFS-compatible directory where the checkpoint data will be reliably stored
var checkpointDir: String

Attributes
protected[streaming]
var checkpointDuration: Duration

Attributes
protected[streaming]
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws()
val env: SparkEnv

Attributes
protected[streaming]
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def fileStream[K, V, F <: InputFormat[K, V]](directory: String, filter: (Path) ⇒ Boolean, newFilesOnly: Boolean)(implicit arg0: ClassManifest[K], arg1: ClassManifest[V], arg2: ClassManifest[F]): DStream[(K, V)]

Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
K
Key type for reading HDFS file
V
Value type for reading HDFS file
F
Input format for reading HDFS file
directory
HDFS directory to monitor for new file
filter
Function to filter paths to process
newFilesOnly
Should process only new files and ignore existing files in the directory
def fileStream[K, V, F <: InputFormat[K, V]](directory: String)(implicit arg0: ClassManifest[K], arg1: ClassManifest[V], arg2: ClassManifest[F]): DStream[(K, V)]

Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. File names starting with . are ignored.
K
Key type for reading HDFS file
V
Value type for reading HDFS file
F
Input format for reading HDFS file
directory
HDFS directory to monitor for new file
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws()
def flumeStream(hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2): DStream[SparkFlumeEvent]

Create a input stream from a Flume source.
Create a input stream from a Flume source.
hostname
Hostname of the slave machine to which the flume data will be sent
port
Port of the slave machine to which the flume data will be sent
storageLevel
Storage level to use for storing the received objects
final def getClass(): java.lang.Class[_]

Definition Classes
AnyRef → Any
def getNewNetworkStreamId(): Int

Attributes
protected[streaming]
val graph: DStreamGraph

Attributes
protected[streaming]
def hashCode(): Int

Definition Classes
AnyRef → Any
def initLogging(): Unit

Attributes
protected
Definition Classes
Logging
def initialCheckpoint: Checkpoint

Attributes
protected[streaming]
val isCheckpointPresent: Boolean

Attributes
protected[streaming]
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
var networkInputTracker: NetworkInputTracker

Attributes
protected[streaming]
def networkStream[T](receiver: NetworkReceiver[T])(implicit arg0: ClassManifest[T]): DStream[T]

Create an input stream with any arbitrary user implemented network receiver.
Create an input stream with any arbitrary user implemented network receiver.
receiver
Custom implementation of NetworkReceiver
val nextNetworkInputStreamId: AtomicInteger

Attributes
protected[streaming]
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def queueStream[T](queue: Queue[RDD[T]], oneAtATime: Boolean, defaultRDD: RDD[T])(implicit arg0: ClassManifest[T]): DStream[T]

Create an input stream from a queue of RDDs.
Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
T
Type of objects in the RDD
queue
Queue of RDDs
oneAtATime
Whether only one RDD should be consumed from the queue in every interval
defaultRDD
Default RDD is returned by the DStream when the queue is empty. Set as null if no RDD should be returned when empty
def queueStream[T](queue: Queue[RDD[T]], oneAtATime: Boolean)(implicit arg0: ClassManifest[T]): DStream[T]

Create an input stream from a queue of RDDs.
Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.
T
Type of objects in the RDD
queue
Queue of RDDs
oneAtATime
Whether only one RDD should be consumed from the queue in every interval
def rawSocketStream[T](hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2)(implicit arg0: ClassManifest[T]): DStream[T]

Create a input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them.
Create a input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.
T
Type of the objects in the received blocks
hostname
Hostname to connect to for receiving data
port
Port to connect to for receiving data
storageLevel
Storage level to use for storing the received objects
var receiverJobThread: Thread

Attributes
protected[streaming]
def registerInputStream(inputStream: spark.streaming.dstream.InputDStream[_]): Unit

Register an input stream that will be started (InputDStream.
Register an input stream that will be started (InputDStream.start() called) to get the input data.
def registerOutputStream(outputStream: spark.streaming.DStream[_]): Unit

Register an output stream that will be computed every interval
def remember(duration: Duration): Unit

Set each DStreams in this context to remember RDDs it generated in the last given duration.
Set each DStreams in this context to remember RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration of time and releases them for garbage collection. This method allows the developer to specify how to long to remember the RDDs ( if the developer wishes to query old data outside the DStream computation).
duration
Minimum duration that each DStream should remember its RDDs
val sc: SparkContext

Attributes
protected[streaming]
var scheduler: Scheduler

Attributes
protected[streaming]
def socketStream[T](hostname: String, port: Int, converter: (InputStream) ⇒ Iterator[T], storageLevel: StorageLevel)(implicit arg0: ClassManifest[T]): DStream[T]

Create a input stream from TCP source hostname:port.
Create a input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes it interepreted as object using the given converter.
T
Type of the objects received (after converting bytes to objects)
hostname
Hostname to connect to for receiving data
port
Port to connect to for receiving data
converter
Function to convert the byte stream to objects
storageLevel
Storage level to use for storing the received objects
def socketTextStream(hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2): DStream[String]

Create a input stream from TCP source hostname:port.
Create a input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes is interpreted as UTF8 encoded \n delimited lines.
hostname
Hostname to connect to for receiving data
port
Port to connect to for receiving data
storageLevel
Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
def sparkContext: SparkContext

Return the associated Spark context
def start(): Unit

Start the execution of the streams.
def stop(): Unit

Stop the execution of the streams.
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def textFileStream(directory: String): DStream[String]

Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).
Create a input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). File names starting with . are ignored.
directory
HDFS directory to monitor for new file
def toString(): String

Definition Classes
AnyRef → Any
def twitterStream(twitterAuth: Option[Authorization] = None, filters: Seq[String] = Nil, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2): DStream[Status]

Create a input stream that returns tweets received from Twitter.
Create a input stream that returns tweets received from Twitter.
twitterAuth
Twitter4J authentication, or None to use Twitter4J's default OAuth authorization; this uses the system properties twitter4j.oauth.consumerKey, .consumerSecret, .accessToken and .accessTokenSecret.
filters
Set of filter strings to get only those tweets that match them
storageLevel
Storage level to use for storing the received objects
def union[T](streams: Seq[DStream[T]])(implicit arg0: ClassManifest[T]): DStream[T]

Create a unified DStream from multiple DStreams of the same type and same interval
def validate(): Unit

Attributes
protected
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws()
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws()
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws()
def zeroMQStream[T](publisherUrl: String, subscribe: Subscribe, bytesToObjects: (Seq[Seq[Byte]]) ⇒ Iterator[T], storageLevel: StorageLevel, supervisorStrategy: SupervisorStrategy)(implicit arg0: ClassManifest[T]): DStream[T]

Create an input stream that receives messages pushed by a zeromq publisher.
Create an input stream that receives messages pushed by a zeromq publisher.
publisherUrl
Url of remote zeromq publisher
subscribe
topic to subscribe to
bytesToObjects
A zeroMQ stream publishes sequence of frames for each topic and each frame has sequence of byte thus it needs the converter (which might be deserializer of bytes) to translate from sequence of sequence of bytes, where sequence refer to a frame and sub sequence refer to its payload.
storageLevel
RDD storage level. Defaults to memory-only.

StreamingContext

class StreamingContext extends Logging

Instance Constructors

new StreamingContext(path: String)

new StreamingContext(master: String, appName: String, batchDuration: Duration, sparkHome: String = null, jars: Seq[String] = Nil, environment: Map[String, String] = Map())

new StreamingContext(sparkContext: SparkContext, batchDuration: Duration)

Value Members

final def !=(arg0: AnyRef): Boolean

final def !=(arg0: Any): Boolean

final def ##(): Int

final def ==(arg0: AnyRef): Boolean

final def ==(arg0: Any): Boolean

def actorStream[T](props: Props, name: String, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2, supervisorStrategy: SupervisorStrategy = ...)(implicit arg0: ClassManifest[T]): DStream[T]

final def asInstanceOf[T0]: T0

def checkpoint(directory: String): Unit

var checkpointDir: String

var checkpointDuration: Duration

def clone(): AnyRef

val env: SparkEnv

final def eq(arg0: AnyRef): Boolean

def equals(arg0: Any): Boolean

def fileStream[K, V, F <: InputFormat[K, V]](directory: String, filter: (Path) ⇒ Boolean, newFilesOnly: Boolean)(implicit arg0: ClassManifest[K], arg1: ClassManifest[V], arg2: ClassManifest[F]): DStream[(K, V)]

def fileStream[K, V, F <: InputFormat[K, V]](directory: String)(implicit arg0: ClassManifest[K], arg1: ClassManifest[V], arg2: ClassManifest[F]): DStream[(K, V)]

def finalize(): Unit

def flumeStream(hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2): DStream[SparkFlumeEvent]

final def getClass(): java.lang.Class[_]

def getNewNetworkStreamId(): Int

val graph: DStreamGraph

def hashCode(): Int

def initLogging(): Unit

def initialCheckpoint: Checkpoint

val isCheckpointPresent: Boolean

final def isInstanceOf[T0]: Boolean

def log: Logger

def logDebug(msg: ⇒ String, throwable: Throwable): Unit

def logDebug(msg: ⇒ String): Unit

def logError(msg: ⇒ String, throwable: Throwable): Unit

def logError(msg: ⇒ String): Unit

def logInfo(msg: ⇒ String, throwable: Throwable): Unit

def logInfo(msg: ⇒ String): Unit

def logTrace(msg: ⇒ String, throwable: Throwable): Unit

def logTrace(msg: ⇒ String): Unit

def logWarning(msg: ⇒ String, throwable: Throwable): Unit

def logWarning(msg: ⇒ String): Unit

final def ne(arg0: AnyRef): Boolean

var networkInputTracker: NetworkInputTracker

def networkStream[T](receiver: NetworkReceiver[T])(implicit arg0: ClassManifest[T]): DStream[T]

val nextNetworkInputStreamId: AtomicInteger

final def notify(): Unit

final def notifyAll(): Unit

def queueStream[T](queue: Queue[RDD[T]], oneAtATime: Boolean, defaultRDD: RDD[T])(implicit arg0: ClassManifest[T]): DStream[T]

def queueStream[T](queue: Queue[RDD[T]], oneAtATime: Boolean)(implicit arg0: ClassManifest[T]): DStream[T]

def rawSocketStream[T](hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2)(implicit arg0: ClassManifest[T]): DStream[T]

var receiverJobThread: Thread

def registerInputStream(inputStream: spark.streaming.dstream.InputDStream[_]): Unit

def registerOutputStream(outputStream: spark.streaming.DStream[_]): Unit

def remember(duration: Duration): Unit

val sc: SparkContext

var scheduler: Scheduler

def socketStream[T](hostname: String, port: Int, converter: (InputStream) ⇒ Iterator[T], storageLevel: StorageLevel)(implicit arg0: ClassManifest[T]): DStream[T]

def socketTextStream(hostname: String, port: Int, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2): DStream[String]

def sparkContext: SparkContext

def start(): Unit

def stop(): Unit

final def synchronized[T0](arg0: ⇒ T0): T0

def textFileStream(directory: String): DStream[String]

def toString(): String

def twitterStream(twitterAuth: Option[Authorization] = None, filters: Seq[String] = Nil, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2): DStream[Status]

def union[T](streams: Seq[DStream[T]])(implicit arg0: ClassManifest[T]): DStream[T]

def validate(): Unit

final def wait(): Unit

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

def zeroMQStream[T](publisherUrl: String, subscribe: Subscribe, bytesToObjects: (Seq[Seq[Byte]]) ⇒ Iterator[T], storageLevel: StorageLevel, supervisorStrategy: SupervisorStrategy)(implicit arg0: ClassManifest[T]): DStream[T]

Inherited from Logging

Inherited from AnyRef

Inherited from Any