org.apache.spark.Partitioner

All Implemented Interfaces:: Serializable

Direct Known Subclasses:: HashPartitioner, RangePartitioner

public abstract class Partitioner extends Object implements Serializable

An object that defines how the elements in a key-value pair RDD are partitioned by key. Maps each key to a partition ID, from 0 to numPartitions - 1.

Note that, partitioner must be deterministic, i.e. it must return the same partition id given the same partition key.

See Also:

Serialized Form

Constructor Summary

Constructors

Constructor

Description

Partitioner()
Method Summary

Modifier and Type

Method

Description

static Partitioner

defaultPartitioner(RDD<?> rdd, scala.collection.immutable.Seq<RDD<?>> others)

Choose a partitioner to use for a cogroup-like operation between a number of RDDs.

abstract int

getPartition(Object key)

abstract int

numPartitions()

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- Partitioner
  
  public Partitioner()
Method Details
- defaultPartitioner
  
  public static Partitioner defaultPartitioner(RDD<?> rdd, scala.collection.immutable.Seq<RDD<?>> others)
  
  Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
  If spark.default.parallelism is set, we'll use the value of SparkContext defaultParallelism as the default partitions number, otherwise we'll use the max number of upstream partitions.
  When available, we choose the partitioner from rdds with maximum number of partitions. If this partitioner is eligible (number of partitions within an order of maximum number of partitions in rdds), or has partition number higher than or equal to default partitions number - we use this partitioner.
  Otherwise, we'll use a new HashPartitioner with the default partitions number.
  Unless spark.default.parallelism is set, the number of partitions will be the same as the number of partitions in the largest upstream RDD, as this should be least likely to cause out-of-memory errors.
  We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD.
  
  Parameters:
  
  rdd - (undocumented)
  
  others - (undocumented)
  
  Returns:
  
  (undocumented)
- numPartitions
  
  public abstract int numPartitions()
- getPartition
  
  public abstract int getPartition(Object key)

Class Partitioner

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Details

Partitioner

Method Details

defaultPartitioner

numPartitions

getPartition