org.apache.spark
Class Partitioner

Object
  extended by org.apache.spark.Partitioner
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
HashPartitioner, RangePartitioner

public abstract class Partitioner
extends Object
implements scala.Serializable

An object that defines how the elements in a key-value pair RDD are partitioned by key. Maps each key to a partition ID, from 0 to numPartitions - 1.

See Also:
Serialized Form

Constructor Summary
Partitioner()
           
 
Method Summary
static Partitioner defaultPartitioner(RDD<?> rdd, scala.collection.Seq<RDD<?>> others)
          Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
abstract  int getPartition(Object key)
           
abstract  int numPartitions()
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Partitioner

public Partitioner()
Method Detail

defaultPartitioner

public static Partitioner defaultPartitioner(RDD<?> rdd,
                                             scala.collection.Seq<RDD<?>> others)
Choose a partitioner to use for a cogroup-like operation between a number of RDDs.

If any of the RDDs already has a partitioner, choose that one.

Otherwise, we use a default HashPartitioner. For the number of partitions, if spark.default.parallelism is set, then we'll use the value from SparkContext defaultParallelism, otherwise we'll use the max number of upstream partitions.

Unless spark.default.parallelism is set, the number of partitions will be the same as the number of partitions in the largest upstream RDD, as this should be least likely to cause out-of-memory errors.

We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD.

Parameters:
rdd - (undocumented)
others - (undocumented)
Returns:
(undocumented)

numPartitions

public abstract int numPartitions()

getPartition

public abstract int getPartition(Object key)