Spark Release 0.7.2

Spark 0.7.2 is a maintenance release that contains multiple bug fixes and improvements. You can download it as a source package (4 MB tar.gz) or get prebuilt packages for Hadoop 1 / CDH3 or CDH4 (61 MB tar.gz).

We recommend that all users update to this maintenance release.

The fixes and improvements in this version include:

  • Scala version updated to 2.9.3.
  • Several improvements to Bagel, including performance fixes and a configurable storage level.
  • New API methods: subtractByKey, foldByKey, mapWith, filterWith, foreachPartition, and others.
  • A new metrics reporting interface, SparkListener, to collect information about each computation stage, such as task lengths and bytes shuffled.
  • Several new examples using the Java API, including K-means and computing pi.
  • Support for launching multiple worker instances per host in standalone mode.
  • Various bug fixes across the board.
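
As a rough illustration of two of the new API methods listed above, the sketch below models what subtractByKey and foldByKey compute, using plain Python lists of (key, value) pairs rather than Spark itself. The helper names subtract_by_key and fold_by_key and the sample data are hypothetical and exist only for this example. In Spark the zero value passed to foldByKey may be applied more than once, for instance once per partition, so it should be neutral for the combining function; this single-machine model applies it once per key.

```python
# Plain-Python model of the semantics only; these are NOT Spark API calls.

def subtract_by_key(left, right):
    """Keep (key, value) pairs from `left` whose key does not appear in `right`."""
    right_keys = {k for k, _ in right}
    return [(k, v) for k, v in left if k not in right_keys]


def fold_by_key(pairs, zero, func):
    """Combine the values of each key with `func`, starting from `zero`."""
    acc = {}
    for k, v in pairs:
        acc[k] = func(acc.get(k, zero), v)
    return acc


# Hypothetical sample data.
sales = [("apples", 3), ("pears", 5), ("apples", 4), ("plums", 1)]
discontinued = [("pears", None)]

# Drop every pair whose key also occurs in `discontinued`.
remaining = subtract_by_key(sales, discontinued)

# Sum the remaining values per key, with 0 as the neutral starting value.
totals = fold_by_key(remaining, 0, lambda a, b: a + b)

print(remaining)
print(totals)
```

Here remaining keeps the apples and plums entries, and totals holds one summed value per remaining key.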

The following people contributed to this release:

  • Jey Kottalam (Maven build, bug fixes, EC2 scripts, packaging the release)
  • Andrew Ash (bug fixes, docs)
  • Andrey Kouznetsov (bug fixes)
  • Andy Konwinski (docs)
  • Charles Reiss (bug fixes)
  • Christoph Grothaus (bug fixes)
  • Erik van Oosten (bug fixes)
  • Giovanni Delussu (bug fixes)
  • Hiral Patel (bug fixes)
  • Holden Karau (error reporting, EC2 scripts)
  • Imran Rashid (metrics reporting system)
  • Josh Rosen (EC2 scripts)
  • Mark Hamstra (new API methods, tests)
  • Mikhail Bautin (build)
  • Mosharaf Chowdhury (bug fixes)
  • Nick Pentreath (Bagel, examples)
  • Patrick Wendell (bug fixes)
  • Reynold Xin (bug fixes)
  • Stephen Haberman (bug fixes, tests, subtractByKey)
  • Kalpit Shah (build, multiple workers per host)
  • Mike Potts (run scripts)
  • Matei Zaharia (Bagel, bug fixes, build)

We thank everyone who helped with this release, and hope to see more contributions from you in the future!
