Spark Streaming (Legacy)¶
Core Classes¶
|
Main entry point for Spark Streaming functionality. |
|
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see |
Streaming Management¶
Add a [[org.apache.spark.streaming.scheduler.StreamingListener]] object for receiving system events related to streaming. |
|
|
Wait for the execution to stop. |
Wait for the execution to stop. |
|
|
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. |
Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None. |
|
Either return the active StreamingContext (i.e. |
|
|
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. |
|
Set each DStreams in this context to remember RDDs it generated in the last given duration. |
Return SparkContext which is associated with this StreamingContext. |
|
Start the execution of the streams. |
|
|
Stop the execution of the streams, with option of ensuring all received data has been processed. |
|
Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. |
|
Create a unified DStream from multiple DStreams of the same type and same slide duration. |
Input and Output¶
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length. |
|
|
Create an input stream from a queue of RDDs or list. |
|
Create an input from TCP source hostname:port. |
|
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files. |
|
Print the first num elements of each RDD generated in this DStream. |
|
Save each RDD in this DStream as at text file, using string representation of elements. |
Transformations and Actions¶
Persist the RDDs of this DStream with the default storage level (MEMORY_ONLY). |
|
|
Enable periodic checkpointing of RDDs of this DStream |
|
Return a new DStream by applying ‘cogroup’ between RDDs of this DStream and other DStream. |
|
Return a new DStream by applying combineByKey to each RDD. |
Return the StreamingContext associated with this DStream |
|
Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream. |
|
Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream. |
|
|
Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream. |
|
Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream. |
Return a new DStream containing only the elements that satisfy predicate. |
|
|
Return a new DStream by applying a function to all elements of this DStream, and then flattening the results |
Return a new DStream by applying a flatmap function to the value of each key-value pairs in this DStream without changing the key. |
|
|
Apply a function to each RDD in this DStream. |
|
Return a new DStream by applying ‘full outer join’ between RDDs of this DStream and other DStream. |
Return a new DStream in which RDD is generated by applying glom() to RDD of this DStream. |
|
|
Return a new DStream by applying groupByKey on each RDD. |
|
Return a new DStream by applying groupByKey over a sliding window. |
|
Return a new DStream by applying ‘join’ between RDDs of this DStream and other DStream. |
|
Return a new DStream by applying ‘left outer join’ between RDDs of this DStream and other DStream. |
|
Return a new DStream by applying a function to each element of DStream. |
|
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDDs of this DStream. |
|
Return a new DStream in which each RDD is generated by applying mapPartitionsWithIndex() to each RDDs of this DStream. |
Return a new DStream by applying a map function to the value of each key-value pairs in this DStream without changing the key. |
|
|
Return a copy of the DStream in which each RDD are partitioned using the specified partitioner. |
|
Persist the RDDs of this DStream with the given storage level |
|
Return a new DStream in which each RDD has a single element generated by reducing each RDD of this DStream. |
|
Return a new DStream by applying reduceByKey to each RDD. |
|
Return a new DStream by applying incremental reduceByKey over a sliding window. |
|
Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream. |
|
Return a new DStream with an increased or decreased level of parallelism. |
|
Return a new DStream by applying ‘right outer join’ between RDDs of this DStream and other DStream. |
|
Return all the RDDs between ‘begin’ to ‘end’ (both included) |
|
Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream. |
|
Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream and ‘other’ DStream. |
|
Return a new DStream by unifying data of another DStream with this DStream. |
|
Return a new “state” DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key. |
|
Return a new DStream in which each RDD contains all the elements in seen in a sliding window of time over this DStream. |
Kinesis¶
|
Create an input stream that pulls messages from a Kinesis stream. |