Spark Release 0.5.1

Spark 0.5.1 is a maintenance release that adds several important bug fixes and usability features. You can download it as a tar.gz file.

Maven Publishing

Spark is now available in Maven Central, making it easier to link into your programs without having to build it as a JAR. Use the following Maven identifiers to add it to a project:

  • groupId: org.spark-project
  • artifactId: spark-core_2.9.2
  • version: 0.5.1
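
For projects built with sbt rather than Maven, the same coordinates could be declared roughly as follows. This is only a sketch that assumes an sbt build definition; the identifiers are the ones listed above, and the `scalaVersion` setting simply mirrors the Scala version this release builds against.

```scala
// build.sbt (sketch): pull the Spark core artifact from Maven Central.
scalaVersion := "2.9.2"

// The artifactId already carries the Scala version suffix, so a single % is used.
libraryDependencies += "org.spark-project" % "spark-core_2.9.2" % "0.5.1"
```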

Scala 2.9.2

Spark now builds against Scala 2.9.2 by default.

Improved Accumulators

The new Accumulable class generalizes Accumulators to the case where the type being accumulated differs from the type of the elements being added (e.g. you wish to accumulate a collection, such as a Set, by adding individual elements). This interface is also more efficient because it avoids creating temporary objects. (Contributed by Imran Rashid.)
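
The idea can be illustrated without Spark itself. The sketch below is a hypothetical, simplified stand-in and not Spark's actual API: the names `SketchParam`, `SetParam`, and `AccumulableSketch` are invented for illustration. It shows an accumulated type (a mutable Set) that differs from the element type, with elements added in place rather than by building a temporary one-element collection for each addition.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the idea behind Accumulable (not Spark's API).
// R is the accumulated type, T is the type of each element being added.
trait SketchParam[R, T] {
  def zero: R                          // a fresh, empty accumulator
  def addElement(acc: R, elem: T): R   // add a single element
  def merge(left: R, right: R): R      // combine two partial accumulators
}

// Accumulates individual elements of type T into a mutable Set[T].
class SetParam[T] extends SketchParam[mutable.Set[T], T] {
  def zero: mutable.Set[T] = mutable.Set.empty[T]

  // Both methods mutate and return the left-hand set, so no temporary
  // collection is allocated per added element.
  def addElement(acc: mutable.Set[T], elem: T): mutable.Set[T] = { acc += elem; acc }
  def merge(left: mutable.Set[T], right: mutable.Set[T]): mutable.Set[T] = { left ++= right; left }
}

object AccumulableSketch {
  // Simulates per-partition accumulation followed by a merge of the partial results.
  def collectDistinct[T](partitions: Seq[Seq[T]]): Set[T] = {
    val param = new SetParam[T]
    val partials = partitions.map { part =>
      part.foldLeft(param.zero)((acc, elem) => param.addElement(acc, elem))
    }
    partials.foldLeft(param.zero)((acc, partial) => param.merge(acc, partial)).toSet
  }
}
```

For example, `AccumulableSketch.collectDistinct(Seq(Seq("a", "b", "a"), Seq("b", "c")))` yields a set containing a, b, and c.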

Bug Fixes

  • Spark's algorithm for estimating the sizes of objects (in order to manage memory correctly) has been improved to handle JVMs with 32- vs 64-bit pointers and to measure objects more accurately. (Contributed by Shivaram Venkataraman.)
  • Improved the algorithms for taking random samples from datasets to avoid biases that could occur with the previous implementations. (Suggested by Henry Milner.)
  • Improved load balancing across nodes in sort operations.
  • Fixed a shuffle bug that could cause reduce tasks to fail to receive a map task's full output.
  • Fixed a bug with locating custom KryoSerializers.
  • Reduced memory consumption of saveAsObjectFile when objects are large.
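
To illustrate why pointer width matters for the size estimation mentioned in the first fix above, here is a rough sketch. It is not Spark's estimator: the `SizeSketch` object and its methods are hypothetical, and the model assumes a two-word object header and 8-byte alignment while ignoring compressed pointers, field padding, and arrays. The `sun.arch.data.model` property is specific to HotSpot-style JVMs, so the sketch falls back to 64-bit when it is absent.

```scala
object SizeSketch {
  // Assumption: "sun.arch.data.model" is "32" or "64" on HotSpot-style JVMs;
  // when the property is missing, assume 64-bit pointers.
  def pointerSize: Int =
    if (System.getProperty("sun.arch.data.model", "64") == "32") 4 else 8

  // Rounds n up to the next multiple of 8 (a common JVM object alignment).
  def alignUp(n: Int): Int = (n + 7) / 8 * 8

  // Rough shallow size: a two-word header, the primitive field bytes,
  // and one pointer per reference field, rounded up to the alignment.
  def estimate(pointerBytes: Int, primitiveBytes: Int, referenceFields: Int): Int =
    alignUp(2 * pointerBytes + primitiveBytes + referenceFields * pointerBytes)
}
```

Under these assumptions, an object with one 4-byte int field and two reference fields is estimated at 24 bytes with 4-byte pointers and 40 bytes with 8-byte pointers, which is why an estimator that assumes a single pointer width can misjudge memory use.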

EC2 Improvements

Spark’s EC2 launch script now configures Spark’s memory limit automatically based on the machine’s available RAM.
