A datatype that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type may be different from the type of the elements being added.
Helper object defining how to accumulate values of a particular type.
A simpler value of Accumulable where the result type being accumulated is the same as the types of elements being merged.
A simpler version of AccumulableParam where the only datatype you can add in is the same type as the accumulated value.
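For example, a minimal accumulator sketch (assuming an existing SparkContext named sc; see SparkContext below):

    import spark.SparkContext._

    val sum = sc.accumulator(0)                        // Accumulator[Int]
    sc.parallelize(1 to 100).foreach(x => sum += x)    // tasks may only add to it
    println(sum.value)                                 // 5050; only the driver reads the value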
A set of functions used to aggregate data.
Base class for dependencies.
Extra functions available on RDDs of Doubles through an implicit conversion.
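A small sketch (again assuming sc; the extra methods appear through the implicit conversion imported from spark.SparkContext._):

    import spark.SparkContext._

    val xs = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    println(xs.mean())   // 2.5
    println(xs.sum())    // 10.0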
A Partitioner that implements hash-based partitioning using Java's Object.hashCode.
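A brief sketch of explicit partitioning (assuming sc and the pair-RDD implicit conversion):

    import spark.SparkContext._
    import spark.HashPartitioner

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    // Hash-partition into 4 partitions; equal keys always land in the same partition.
    val partitioned = pairs.partitionBy(new HashPartitioner(4))
    println(partitioned.reduceByKey(_ + _).collect().toSeq)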
A Spark serializer that uses Java's built-in serialization.
Interface implemented by clients to register their classes with Kryo when using Kryo serialization.
A Spark serializer that uses the Kryo 1.x library.
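A sketch of wiring up Kryo serialization; the Point class is an example application class, and the property keys follow the conventional names (confirm them for your Spark version):

    import com.esotericsoftware.kryo.Kryo
    import spark.KryoRegistrator

    case class Point(x: Double, y: Double)   // application class to register

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[Point])
      }
    }

    // Selected through system properties before the SparkContext is created.
    System.setProperty("spark.serializer", "spark.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyRegistrator")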
Utility trait for classes that want to log data.
Base class for dependencies where each partition of the parent RDD is used by at most one partition of the child RDD.
Represents a one-to-one dependency between partitions of the parent and child RDDs.
Extra functions available on RDDs of (key, value) pairs where the key is sortable through an implicit conversion.
Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
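For example (assuming sc; groupByKey and join become available through the implicit conversion):

    import spark.SparkContext._

    val visits = sc.parallelize(Seq(("index.html", "1.2.3.4"),
                                    ("index.html", "1.3.3.1"),
                                    ("about.html", "3.4.5.6")))
    val pageNames = sc.parallelize(Seq(("index.html", "Home"), ("about.html", "About")))

    visits.groupByKey().collect().foreach(println)     // one (url, all-visitor-ips) pair per key
    visits.join(pageNames).collect().foreach(println)  // (url, (visitor-ip, page-name)) pairs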
A partition of an RDD.
An object that defines how the elements in a key-value pair RDD are partitioned by key.
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
A Partitioner that partitions sortable records by range into roughly equal ranges.
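A sketch of sorting by key (assuming sc; sortByKey range-partitions the data so each partition holds a contiguous, sorted range of keys):

    import spark.SparkContext._

    val scores = sc.parallelize(Seq(3 -> "c", 1 -> "a", 2 -> "b"))
    scores.sortByKey().collect().foreach(println)   // (1,a) (2,b) (3,c)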
Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile, through an implicit conversion.
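For example (assuming sc; the output directory is a placeholder path):

    import spark.SparkContext._

    val counts = sc.parallelize(Seq(("apple", 2), ("pear", 5)))
    // Keys and values are converted to Hadoop Writables (here Text and IntWritable).
    counts.saveAsSequenceFile("/tmp/fruit-counts")   // placeholder output directory

    // Reading back; key and value types must be given explicitly.
    val reread = sc.sequenceFile[String, Int]("/tmp/fruit-counts")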
Represents a dependency on the output of a shuffle stage.
Main entry point for Spark functionality.
Holds all the runtime environment objects for a running Spark instance (either master or worker), including the serializer, Akka actor system, block manager, map output tracker, etc.
The SparkContext object contains a number of implicit conversions and parameters for use with various Spark features.
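A minimal driver-program sketch using the (master URL, job name) constructor, running in local mode:

    import spark.SparkContext
    import spark.SparkContext._

    // "local[2]" runs Spark in-process with two worker threads;
    // a cluster deployment would pass a spark:// or mesos:// master URL instead.
    val sc = new SparkContext("local[2]", "Example Job")
    val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0)
    println(evens.count())   // 500
    sc.stop()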
Core Spark functionality. SparkContext serves as the main entry point to Spark, while RDD is the data type representing a distributed collection, and provides most parallel operations. In addition, PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; DoubleRDDFunctions contains operations available only on RDDs of Doubles; and SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions when you import spark.SparkContext._.
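For instance, a short sketch of the implicit conversion at work (assuming sc):

    import spark.SparkContext._   // brings the RDD implicit conversions into scope

    val pairs = sc.parallelize(Seq((1, 2), (1, 3), (2, 1)))
    // reduceByKey is defined on PairRDDFunctions, not on RDD itself; the import
    // above makes it available on any RDD[(K, V)], such as this RDD[(Int, Int)].
    println(pairs.reduceByKey(_ + _).collect().toSeq)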