|Mosharaf Chowdhury||University of Michigan, Ann Arbor|
|Ankur Dave||UC Berkeley|
|Joseph Gonzalez||UC Berkeley|
|Mark Hamstra||ClearStory Data|
|Herman van Hovell||QuestTec B.V.|
|Haoyuan Li||Alluxio, UC Berkeley|
|Andrew Or||Princeton University|
|Kay Ousterhout||UC Berkeley|
|Charles Reiss||UC Berkeley|
|Sandy Ryza||Clover Health|
|Kousuke Saruta||NTT Data|
|Shivaram Venkataraman||UC Berkeley|
|Matei Zaharia||Databricks, Stanford|
To get started contributing to Spark, learn how to contribute – anyone can submit patches, documentation and examples to the project.
The PMC regularly adds new committers from the active contributors, based on their contributions to Spark. The qualifications for new committers include:
The type and level of contributions considered may vary by project area – for example, we greatly encourage contributors who want to work mainly on the documentation, or mainly on platform support for specific OSes, storage systems, etc.
All contributions should be reviewed before merging as described in
Contributing to Spark.
In particular, if you are working on an area of the codebase you are unfamiliar with, look at the
Git history for that code to see who reviewed patches before. You can do this using
git log --format=full <filename>, by examining the “Commit” field to see who committed each patch.
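As a rough illustration of the command above, the following sketch builds a throwaway repository and inspects the history of one file. The repository, the file name example.txt, and the identities are all hypothetical; in a real checkout you would run only the git log lines against the file you care about. The second git log call is an optional shortcut, not part of the documented process: it uses the %cn (committer name) placeholder to count committers directly.

```shell
# Scratch repository; all names, emails, and the file are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "v1" > example.txt
git add example.txt
# Author and committer differ, as they do when a committer merges a contributor's patch.
GIT_AUTHOR_NAME="Alice Author" GIT_AUTHOR_EMAIL="alice@example.com" \
GIT_COMMITTER_NAME="Carol Committer" GIT_COMMITTER_EMAIL="carol@example.com" \
git commit -q -m "Add example file"

# --format=full prints both an "Author:" and a "Commit:" line for each commit.
history=$(git log --format=full -- example.txt)
echo "$history"

# Optional shortcut: print only committer names (%cn) and count them.
committers=$(git log --format='%cn' -- example.txt | sort | uniq -c | sort -rn)
echo "$committers"
```

The "Commit:" line names the person who committed the patch, which is usually a good first guess for a reviewer.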
Changes pushed to the master branch on Apache cannot be removed; that is, we can’t force-push to it. So please push only real patches, never test commits or anything like that.
All merges should be done using the
script, which squashes the pull request’s changes into one commit. To use this script, you
will need to add a git remote called “apache” at https://git-wip-us.apache.org/repos/asf/spark.git,
as well as one called “apache-github” at
git://github.com/apache/spark. For the “apache” remote,
you can authenticate using your ASF username and password. Ask Patrick if you have trouble with
this or want help doing your first merge.
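The remote setup described above might look like the following sketch. The remote names and URLs come from the text; the scratch repository is only an assumption that makes the commands safe to try, and in practice the two git remote add lines would be run inside an existing Spark checkout.

```shell
# Scratch repository so the commands can be tried safely; in practice, run the
# two "git remote add" lines inside your existing Spark checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Remote names and URLs are taken from the text above.
git remote add apache https://git-wip-us.apache.org/repos/asf/spark.git
git remote add apache-github git://github.com/apache/spark

# List the configured remotes; this does not contact either server.
git remote -v
```

Adding a remote only records its URL, so no credentials are needed until you actually fetch from or push to it.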
The script is fairly self-explanatory and walks you through steps and options interactively.
If you want to amend a commit before merging – which should be done only for trivial touch-ups –
then simply let the script wait at the point where it asks you if you want to push to Apache.
Then, in a separate window, modify the code and push a commit. Run
git rebase -i HEAD~2 and
“squash” your new commit. When the editor then opens for the combined commit message, delete the message of your new commit.
You can verify the result is one change with
git log. Then resume the script in the other window.
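The squash step can be tried out in a throwaway repository, as in the sketch below. Everything here is hypothetical (the repository, file, commit messages, and identity). Because an interactive rebase cannot run unattended, the sketch swaps in a scripted editor through GIT_SEQUENCE_EDITOR and uses "fixup" rather than "squash"; fixup folds the commit in and discards its message, which gives the same end result as squashing and then deleting the new message by hand.

```shell
# Scratch repository; commit messages and identity are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"
echo "patch" >> file.txt
git commit -q -a -m "The squashed pull request"
echo "touch-up" >> file.txt
git commit -q -a -m "Trivial touch-up"

# Scripted stand-in for the interactive editor: mark the second todo line
# (the touch-up) as "fixup", i.e. squash it and discard its commit message.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -i HEAD~2

# The touch-up is now folded into the previous commit.
git log --oneline
```

When doing this by hand, you would instead change "pick" to "squash" on the last line of the todo list in your editor and then trim the combined message.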
Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script can’t do this automatically.
The trade-off when backporting is that you get to deliver the fix to people running older versions (great!), but you risk introducing new or even worse bugs in maintenance releases (bad!). The decision point is when you have a bug fix and it’s not clear whether it is worth backporting.
I think the following facets are important to consider:
- Backports are an extremely valuable service to the community and should be considered for any bug fix.
- Introducing a new bug in a maintenance release must be avoided at all costs. Over time, it would erode confidence in our release process.
- Distributions or advanced users can always backport risky patches on their own, if they see fit.
For me, the consequence of these is that we should backport in the following situations:
- Both the bug and the fix are well understood and isolated. Code being modified is well tested.
- The bug being addressed is high priority to the community.
- The backported fix does not vary widely from the master branch fix.
We tend to avoid backports in the converse situations:
- The bug or fix is not well understood. For instance, it relates to interactions between complex components or third-party libraries (e.g. Hadoop libraries). The code is not well tested outside of the immediate bug being fixed.
- The bug is not clearly a high priority for the community.
- The backported fix is widely different from the master branch fix.