
Spark Release 0.7.3

Spark 0.7.3 is a maintenance release with several bug fixes, performance fixes, and new features. You can download it as a source package (4 MB tar.gz) or get prebuilt packages for Hadoop 1 / CDH3 or for CDH 4 (61 MB tar.gz).

We recommend that all users update to this maintenance release.

The improvements in this release include:

  • New "add JARs" functionality in Spark shell: Users of spark-shell can now set the ADD_JARS environment variable to add a list of JARs to their classpath; these JARs will also be sent to workers.
  • Windows fixes: Spark standalone clusters now properly kill executors when a job ends or fails. In addition, adding JAR paths with backslashes will now work correctly.
  • Streaming API fixes: The Kafka and Twitter APIs for Spark Streaming have been updated. The Twitter change handles Twitter's disabling of username/password authentication, and the Kafka change allows receiving messages other than strings. Note that these are breaking API changes, as the Streaming API is still in alpha.
  • Python performance: Spark now spawns Python VMs faster when the JVM has a large heap, speeding up the Python API.
  • Mesos fixes: JARs added to your job will now be on the classpath when deserializing task results in Mesos.
  • Error reporting: Better error reporting for non-serializable exceptions and overly large task results.
  • Examples: Added an example of stateful stream processing with updateStateByKey.
  • Build: Spark Streaming no longer depends on the Twitter4J repo, which should allow it to build in China.
  • Bug fixes in foldByKey, streaming count, statistics methods, documentation, and web UI.
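The new example uses updateStateByKey, which keeps per-key state across streaming batches: for each key that has new values in a batch or existing state, an update function receives the batch's values and the previous state (if any) and returns the new state, and returning "no state" drops the key. Spark Streaming is not exposed through the Python API in this release, so the sketch below is not Spark code. It is a plain-Python model of those semantics, assuming a running word count as the update function; the helper names `running_count` and `update_state_by_key` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical update function: add this batch's counts to the previous total.
def running_count(new_values: List[int], state: Optional[int]) -> Optional[int]:
    return (state or 0) + sum(new_values)

# Hypothetical plain-Python model of updateStateByKey semantics (not Spark's API).
def update_state_by_key(
    batches: Iterable[List[Tuple[str, int]]],
    update: Callable[[List[int], Optional[int]], Optional[int]],
) -> List[Dict[str, int]]:
    """Apply `update` per key across batches; return the state after each batch."""
    state: Dict[str, int] = {}
    history: List[Dict[str, int]] = []
    for batch in batches:
        # Group this batch's values by key.
        grouped: Dict[str, List[int]] = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # Update every key that has new values or existing state.
        new_state: Dict[str, int] = {}
        for key in set(grouped) | set(state):
            result = update(grouped.get(key, []), state.get(key))
            if result is not None:  # returning None removes the key's state
                new_state[key] = result
        state = new_state
        history.append(dict(state))
    return history

batches = [[("spark", 1), ("mesos", 1)], [("spark", 1)], []]
print(update_state_by_key(batches, running_count)[-1])
```

Note that a key with existing state is still updated in a batch where it receives no new values (its update function sees an empty list), which is why the counts carry over through the final, empty batch.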

The following people contributed to this release:

  • Charles Reiss (Mesos)
  • Christoph Grothaus (Windows spawn fixes)
  • Christopher Nguyen (bug fixes)
  • James Phillpotts (Twitter input stream)
  • Jey Kottalam (Python performance)
  • Josh Rosen (usability)
  • Konstantin Boudnik (build)
  • Mark Hamstra (build)
  • Matei Zaharia (Windows, docs, ADD_JARS, Python, streaming)
  • Patrick Wendell (usability)
  • Tathagata Das (streaming fixes)
  • Jerry Shao (bug fixes)
  • S. Kumar (examples)
  • Sean McNamara (Kafka input streams, streaming fixes)
