Spark News

Spark+AI Summit (June 22-25th, 2020, VIRTUAL) agenda posted

The agenda for Spark + AI Summit 2020 is now available! The summit kicks off on June 22nd. We’ve transformed this year’s Summit into a global event — totally virtual and open to everyone, free of charge. And Summit is now even bigger: extended to five days with 200+ sessions, 4x the training, and keynotes by visionaries and thought leaders. Join tens of thousands of engineers, scientists, developers, analysts and leaders as we shape the future of big data, analytics and AI. Check out the full schedule and register to attend!

Preview release of Spark 3.0

To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a Spark 3.0.0 preview2 release. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either the mailing lists or JIRA. The documentation is available at the link.

Preview release of Spark 3.0

To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a preview release of Spark 3.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either the mailing lists or JIRA.

Plan for dropping Python 2 support

As many of you already know, the Python core development team and many widely used Python packages, such as Pandas and NumPy, will drop Python 2 support on or before 2020/01/01. Apache Spark has supported both Python 2 and 3 since the Spark 1.4 release in 2015. However, maintaining Python 2/3 compatibility is an increasing burden, and it essentially limits the use of Python 3 features in Spark. Given that the end of life (EOL) of Python 2 is coming, we plan to eventually drop Python 2 support as well. The current plan is as follows:

Spark wins CloudSort Benchmark as the most efficient engine

We are proud to announce that Apache Spark won the 2016 CloudSort Benchmark (in both the Daytona and Indy categories). A joint team from Nanjing University, Alibaba Group, and Databricks Inc. entered the competition using NADSort, a distributed sorting program built on top of Spark, and set a new world record as the most cost-efficient way to sort 100TB of data.

Spark 2.0.2 released

We are happy to announce the availability of Apache Spark 2.0.2! This maintenance release includes fixes across several areas of Spark, as well as Kafka 0.10 and runtime metrics support for Structured Streaming.

Spark 1.6.3 released

We are happy to announce the availability of Spark 1.6.3! This maintenance release includes fixes across several areas of Spark.

Spark 1.6.2 released

We are happy to announce the availability of Spark 1.6.2! This maintenance release includes fixes across several areas of Spark.

Call for Presentations for Spark Summit EU is Open

The call for presentations is now open for Spark Summit EU! The event will take place on October 25-27 in Brussels. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, Spark ecosystem and research. Please submit by July 1 to be considered.

Preview release of Spark 2.0

To enable wide-scale community testing of the upcoming Spark 2.0 release, the Apache Spark team has posted a preview release of Spark 2.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 2.0. If you would like to test the release, simply download it, and send feedback using either the mailing lists or JIRA.

Spark 1.6.1 released

We are happy to announce the availability of Spark 1.6.1! This maintenance release includes fixes across several areas of Spark, including significant updates to the experimental Dataset API.

Submission is open for Spark Summit San Francisco

The call for presentations is now open for Spark Summit San Francisco! The event will take place on June 6-8 in San Francisco. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, business value, Spark ecosystem and research. Please submit by February 29th to be considered.

Spark 1.6.0 released

We are happy to announce the availability of Spark 1.6.0! Spark 1.6.0 is the seventh release on the API-compatible 1.X line. With this release the Spark community continues to grow, with contributions from 248 developers!

CFP for Spark Summit East 2016 is closing soon!

Call for presentations is closing soon for Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research. Please submit by November 22nd to be considered.

Spark 1.5.2 released

We are happy to announce the availability of Spark 1.5.2! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.

Submission is open for Spark Summit East 2016

Abstract submissions are now open for the 2nd Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research.

Spark 1.5.1 released

We are happy to announce the availability of Spark 1.5.1! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.

Spark 1.5.0 released

We are happy to announce the availability of Spark 1.5.0! Spark 1.5.0 is the sixth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 230 developers and more than 1,400 commits!

Spark 1.4.1 released

We are happy to announce the availability of Spark 1.4.1! This is a maintenance release that includes contributions from 85 developers. Spark 1.4.1 includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, Spark SQL, and MLlib.

Spark Summit 2015 Videos Posted

The videos and slides for Spark Summit 2015 are now all available online! The talks include technical roadmap discussions, deep dives on Spark components, and use cases built on top of Spark.

Spark 1.4.0 released

We are happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is the fifth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 210 developers and more than 1,000 commits!

Announcing Spark Summit Europe

Abstract submissions are now open for the first-ever Spark Summit Europe. The event will take place on October 27th to 29th in Amsterdam. Submissions are welcome across a variety of Spark-related topics, including use cases and ongoing development.

Spark 1.3.0 released

We are happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 174 developers and more than 1,000 commits!

Spark 1.2.1 released

We are happy to announce the availability of Spark 1.2.1! This is a maintenance release that includes contributions from 69 developers. Spark 1.2.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.

Spark 1.2.0 released

We are happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 172 developers and more than 1,000 commits!

Spark 1.1.1 released

We are happy to announce the availability of Spark 1.1.1! This is a maintenance release that includes contributions from 55 developers. Spark 1.1.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.

Submissions open for Spark Summit East 2015 in New York

After successful events in the past two years, the Spark Summit conference has expanded for 2015, offering both an event in New York on March 18-19 and one in San Francisco on June 15-17. The conference is a great chance to meet people from throughout the Spark community and see the latest news, tips and use cases.

Spark 1.1.0 released

We are happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 171 developers!

Spark 1.0.2 released

We are happy to announce the availability of Spark 1.0.2! This release includes contributions from 30 developers. Spark 1.0.2 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, and MLlib.

Spark 0.9.2 released

We are happy to announce the availability of Spark 0.9.2! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend that all 0.9.x users upgrade to this stable release. Contributions to this release came from 28 developers.

Spark Summit 2014 videos posted

The videos and slides for Spark Summit 2014 are now all available online. Watch them to see the latest news from the Spark community as well as use cases and applications built on top of Spark. In addition, training materials from the Summit, including hands-on exercises, are freely available as well.

Spark 1.0.1 released

We are happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark’s (alpha) SQL library, including support for JSON data and performance and stability fixes.

Spark 1.0.0 released

We are happy to announce the availability of Spark 1.0.0! Spark 1.0.0 is the first in the 1.X line of releases, providing API stability for Spark’s core interfaces. It is Spark’s largest release ever, with contributions from 117 developers. This release expands Spark’s standard libraries, introducing a new SQL package (Spark SQL) that lets users integrate SQL queries into existing Spark workflows. MLlib, Spark’s machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark’s core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements.

Spark Summit agenda posted

The agenda for the Spark Summit 2014 conference is now available online. With talks from more than 50 organizations, it will be the biggest Spark event yet, bringing the developer and user communities together. Join us in person or tune in online to learn about the latest happenings in Spark.

Spark 0.9.1 released

We are happy to announce the availability of Spark 0.9.1! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and improved parity of the Scala and Python APIs. We recommend that all 0.9.0 users upgrade to this stable release. Contributions to this release came from 37 developers.

Spark becomes top-level Apache project

The Apache Software Foundation announced today that Spark has graduated from the Apache Incubator to become a top-level Apache project, signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles. This is a major step for the community and we are very proud to share this news with users as we complete Spark’s move to Apache. Read more about Spark’s growth during the past year, and hear from contributors and users, in the ASF’s press release.

Spark 0.9.0 released

We are happy to announce the availability of Spark 0.9.0! Spark 0.9.0 is a major release and Spark’s largest release ever, with contributions from 83 developers. This release expands Spark’s standard libraries, introducing a new graph computation package (GraphX) and adding several new features to the machine learning and stream-processing packages. It also makes major improvements to the core engine, including external aggregations, a simplified H/A mode for long-lived applications, and hardened YARN support.

Spark 0.8.1 released

We’ve just posted Spark Release 0.8.1, a maintenance and performance release for the Scala 2.9 version of Spark. 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone scheduler, optimizations to the shuffle, and many other improvements. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.

Spark Summit 2013 is a Wrap

The Spark Summit 2013, held in early December 2013 in downtown San Francisco, was a success! Over 450 Spark developers and enthusiasts from 13 countries and more than 180 companies came to learn from project leaders and production users of Spark, Shark, Spark Streaming and related projects about use cases, recent developments, and the Spark community roadmap.

Announcing the first Spark Summit: December 2, 2013

We are excited to announce the first Spark Summit on Dec 2, 2013 in Downtown San Francisco. Come hear from key production users of Spark, Shark, Spark Streaming and related projects. Also find out where the development is going, and learn how to use the Spark stack in a variety of applications. The summit is being organized and sponsored by leading organizations in the Spark community.

Spark 0.8.0 released

We’re proud to announce the release of Apache Spark 0.8.0. Spark 0.8.0 is a major release that includes many new capabilities and usability improvements. It’s also our first release under the Apache Incubator. It is the largest Spark release yet, with contributions from 67 developers and 24 companies. Major new features include an expanded monitoring framework and UI, a machine learning library, and support for running Spark inside of YARN.

Spark user survey and "Powered By" page

As we continue developing Spark, we would love to get feedback from users and hear what you’d like us to work on next. We’ve decided that a good way to do that is a survey – we hope to run this at regular intervals. If you have a few minutes to participate, fill in the survey here. Your time is greatly appreciated.

Registration open for AMP Camp training camp in Berkeley

Want to learn how to use Spark, Shark, GraphX, and related technologies in person? The AMP Lab is hosting a two-day training workshop covering them on August 29th and 30th in Berkeley. The workshop will include tutorials, talks from users, and over four hours of hands-on exercises. Registration is now open on the AMP Camp website, for a price of $250 per person. We recommend signing up early because last year’s workshop sold out.

Spark mailing lists moving to Apache

As part of the Spark project's recent move to Apache, we are planning to migrate the mailing lists to Apache infrastructure this month, so that the existing Google groups will become read-only on September 1, 2013. To keep receiving updates about Spark or to participate in development discussions, please subscribe to the following lists:

Most users will probably want the User list, but individuals interested in contributing code to the project should also subscribe to the Dev list.

Spark 0.7.3 released

We’ve just posted Spark Release 0.7.3, a maintenance release that contains several fixes, including streaming API updates and new functionality for adding JARs to a spark-shell session. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.

Spark accepted into Apache Incubator

Spark was recently accepted into the Apache Incubator, which will serve as the long-term home for the project. While moving the source code and issue tracking to Apache will take some time, we are excited to be joining the community at Apache. Stay tuned on this site for updates on how the project hosting will change.

Spark 0.7.2 released

We’re happy to announce the release of Spark 0.7.2, a new maintenance release that includes several bug fixes and improvements, as well as new code examples and API features. We recommend that all users update to this release. Head over to the release notes to read about the new features, or download the release today.

Spark screencasts published

We have released the first two screencasts in a series of short hands-on video training courses we will be publishing to help new users get up and running with Spark in minutes.

Strata exercises now available online

At this year’s Strata conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now available online, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find slides from the Strata tutorials online, as well as videos from the AMP Camp workshop we held at Berkeley in August.

Spark 0.7.0 released

We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark Streaming. This release is the result of the largest group of contributors yet behind a Spark release – 31 contributors from inside and outside Berkeley. Head over to the release notes to read more about the new features, or download the release today.

Spark/Shark Tutorial for Amazon EMR

This weekend, Amazon posted an article and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the Amazon article for details. We’re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.

Spark 0.6.2 released

We recently released Spark 0.6.2, a new version of Spark. This is a maintenance release that includes several bug fixes and usability improvements (see the release notes). We recommend that all users upgrade to this release.

Video up from first Spark development meetup

On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look to see more development meetups on Spark and Shark in the future.

Spark in the news

Recently, we’ve seen quite a bit of coverage of Spark in the news. I wanted to list some of the more recent articles, for readers interested in learning more.

In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS on Tuesday morning, and a three-hour hands-on exercise session.

Spark 0.6.1 and 0.5.2 out

Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.