This page tracks external software projects that supplement Apache Spark and add to its ecosystem. spark-packages.org is an external, community-managed list of third-party libraries, add-ons, and applications that work with Apache Spark. You can add a package as long as you have a GitHub repository.

Infrastructure Projects

Applications Using Spark

  • Apache Mahout - Previously built on Hadoop MapReduce, Mahout now uses Spark as its backend
  • Apache MRQL - A query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
  • BlinkDB - A massively parallel, approximate query engine built on top of Shark and Spark
  • Spindle - Spark/Parquet-based web analytics query engine
  • Spark Spatial - Spatial joins and processing for Spark
  • Thunderain - A framework for combining stream processing with historical data, in the style of the Lambda architecture
  • DF from Ayasdi - A Pandas-like data frame implementation for Spark
  • Oryx - Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
  • ADAM - A framework and CLI for loading, transforming, and analyzing genomic data using Apache Spark

Additional Language Bindings

C# / .NET