Spark Structured Streaming abstracts away complex streaming concepts such as incremental processing, checkpointing, and watermarks so that you can build streaming applications and pipelines without learning any new concepts or tools.
Spark Structured Streaming provides the same structured APIs (DataFrames and Datasets) as Spark so that you don’t need to develop on or maintain two different technology stacks for batch and streaming. In addition, unified APIs make it easy to migrate your existing batch Spark jobs to streaming jobs.
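As a rough illustration of the unified APIs, here is a minimal PySpark sketch in which one transformation serves both a batch job and a streaming job. The helper names (`count_words`, `run_batch`, `start_streaming`) and the input directory are hypothetical and chosen only for this example; the sketch assumes a text source whose single column is named `value`, and a `SparkSession` passed in by the caller.

```python
# Hypothetical helpers for illustration; `spark` is assumed to be a SparkSession
# and `path` a directory of plain-text files.

def count_words(lines):
    """Count words in a DataFrame with a single string column named `value`.

    The same code accepts a batch DataFrame or a streaming DataFrame.
    """
    words = lines.selectExpr("explode(split(value, ' ')) AS word")
    return words.groupBy("word").count()


def run_batch(spark, path):
    """Batch: read the files currently under `path` once and return the counts."""
    return count_words(spark.read.text(path))


def start_streaming(spark, path):
    """Streaming: treat `path` as a directory that keeps receiving new files."""
    counts = count_words(spark.readStream.text(path))
    # "complete" output mode re-emits the full aggregated result after each trigger.
    return counts.writeStream.outputMode("complete").format("console").start()
```

Under these assumptions, the only differences between the two jobs are `read` versus `readStream` on the input side and `writeStream` with an output mode on the output side; `count_words` itself is unchanged. To try it, create a session with `SparkSession.builder.getOrCreate()` and pass it to either function.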
Spark Structured Streaming uses the same underlying architecture as Spark so that you can take advantage of all the performance and cost optimizations built into the Spark engine. With Spark Structured Streaming, you can build low-latency streaming applications and pipelines cost-effectively.
To get started with Spark Structured Streaming:
Spark Structured Streaming is developed as part of Apache Spark, so it is tested and updated with each Spark release.
If you have questions about the system, ask on the Spark mailing lists.
The Spark Structured Streaming developers welcome contributions. If you'd like to help out, read how to contribute to Spark, and send us a patch!