Apache Spark Documentation

Setup instructions, programming guides, and other documentation are available for each stable version of Spark below:

Documentation for preview releases:

The documentation linked above covers getting started with Spark, as well as the built-in components MLlib, Spark Streaming, and GraphX.

In addition, this page lists other resources for learning Spark.

Videos

See the Apache Spark YouTube Channel for videos from Spark events. Separate playlists cover different topics. Besides browsing the playlists, you can find direct links to videos below.

Screencast Tutorial Videos

Spark Summit Videos

Meetup Talk Videos

In addition to the videos listed below, you can view all slides from Bay Area meetups here.

Training Materials

  • Training materials and exercises from Spark Summit 2014 are available online. These include videos and slides of talks as well as exercises you can run on your laptop. Topics include Spark core, tuning and debugging, Spark SQL, Spark Streaming, GraphX and MLlib.
  • Spark Summit 2013 included a training session, with slides and videos available on the training day agenda. The session also included exercises that you can walk through on Amazon EC2.
  • The UC Berkeley AMPLab regularly hosts training camps on Spark and related projects. Slides, videos and EC2-based exercises from each of these are available online:
    • AMP Camp 4 (Strata Santa Clara, Feb 2014) — focus on BlinkDB, MLlib, GraphX, Tachyon
    • AMP Camp 3 (Berkeley, CA, Aug 2013)
    • AMP Camp 2 (Strata Santa Clara, Feb 2013)
    • AMP Camp 1 (Berkeley, CA, Aug 2012)

Hands-On Exercises

External Tutorials, Blog Posts, and Talks

Books

Examples

Research Papers

Spark was initially developed as a UC Berkeley research project, and much of the design is documented in papers. The research page lists some of the original motivation and direction.