org.apache.spark
Class RangePartitioner<K,V>

Object
  extended by org.apache.spark.Partitioner
      extended by org.apache.spark.RangePartitioner<K,V>
All Implemented Interfaces:
java.io.Serializable

public class RangePartitioner<K,V>
extends Partitioner

A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges are determined by sampling the content of the RDD passed in.

Note that the actual number of partitions created by the RangePartitioner may differ from the partitions parameter when the number of sampled records is less than the value of partitions.
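
As a rough illustration of what partitioning by range means, the following self-contained Java sketch maps an integer key to a partition by binary search over sorted upper bounds. The class RangeLookupSketch, its method bodies, and the int-only keys are assumptions made for this example; it does not use Spark and should not be read as Spark's implementation. With k bounds there are k + 1 partitions, so fewer bounds than requested means fewer partitions.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark.
class RangeLookupSketch {
    // Sorted, distinct upper bounds: k bounds define k + 1 ranges.
    private final int[] bounds;

    RangeLookupSketch(int[] sortedDistinctBounds) {
        this.bounds = sortedDistinctBounds.clone();
    }

    int numPartitions() {
        return bounds.length + 1;
    }

    // Index of the first bound >= key, or bounds.length if the key exceeds every bound.
    int getPartition(int key) {
        int pos = Arrays.binarySearch(bounds, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        RangeLookupSketch p = new RangeLookupSketch(new int[] {10, 20});
        System.out.println(p.numPartitions());   // 3
        System.out.println(p.getPartition(5));   // 0
        System.out.println(p.getPartition(10));  // 0
        System.out.println(p.getPartition(15));  // 1
        System.out.println(p.getPartition(99));  // 2
    }
}
```

In this sketch a key equal to a bound falls into that bound's partition; this is a choice made for the example.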

See Also:
Serialized Form

Constructor Summary
RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2)
           
 
Method Summary
static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates, int partitions, scala.math.Ordering<K> evidence$4, scala.reflect.ClassTag<K> evidence$5)
          Determines the bounds for range partitioning from candidates with weights indicating how many items each represents.
 boolean equals(Object other)
           
 int getPartition(Object key)
           
 int hashCode()
           
 int numPartitions()
           
static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$3)
          Sketches the input RDD via reservoir sampling on each partition.
 
Methods inherited from class org.apache.spark.Partitioner
defaultPartitioner
 
Methods inherited from class Object
getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RangePartitioner

public RangePartitioner(int partitions,
                        RDD<? extends scala.Product2<K,V>> rdd,
                        boolean ascending,
                        scala.math.Ordering<K> evidence$1,
                        scala.reflect.ClassTag<K> evidence$2)

Method Detail

sketch

public static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd,
                                                                                   int sampleSizePerPartition,
                                                                                   scala.reflect.ClassTag<K> evidence$3)
Sketches the input RDD via reservoir sampling on each partition.

Parameters:
rdd - the input RDD to sketch
sampleSizePerPartition - max sample size per partition
evidence$3 - implicit ClassTag for K (compiler-generated parameter for a Scala context bound)
Returns:
(total number of items, an array of (partitionId, number of items, sample))
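
The per-partition step can be pictured with the following self-contained Java sketch of classic reservoir sampling, which returns the number of items seen together with a sample of at most k of them. The class ReservoirSketch, the Result type, and the seed parameter are hypothetical names introduced for this example; the code does not use Spark and is not taken from Spark's source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of Spark.
class ReservoirSketch {
    // Result for one partition: how many items were seen and which were kept.
    static final class Result<T> {
        final long count;
        final List<T> sample;

        Result(long count, List<T> sample) {
            this.count = count;
            this.sample = sample;
        }
    }

    // Keeps at most k items (k >= 0); every item seen has the same chance of being kept.
    static <T> Result<T> sample(Iterator<T> items, int k, long seed) {
        Random rng = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Overwrite a random slot with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return new Result<>(seen, reservoir);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(i);
        }
        Result<Integer> r = sample(data.iterator(), 10, 42L);
        System.out.println(r.count);          // 100
        System.out.println(r.sample.size());  // 10
    }
}
```

Returning the item count alongside the sample lets a caller weight each sampled key by the number of items it stands for.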

determineBounds

public static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates,
                                         int partitions,
                                         scala.math.Ordering<K> evidence$4,
                                         scala.reflect.ClassTag<K> evidence$5)
Determines the bounds for range partitioning from candidates with weights indicating how many items each represents. A candidate's weight is usually 1 over the probability used to sample that candidate.

Parameters:
candidates - unordered candidates with weights
partitions - number of partitions
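evidence$4 - implicit Ordering for K (compiler-generated parameter for a Scala context bound)
evidence$5 - implicit ClassTag for K (compiler-generated parameter for a Scala context bound)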
Returns:
selected bounds
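
One way to picture bound selection from weighted candidates is the self-contained Java sketch below: sort the candidates by key, accumulate weights, and emit a bound each time the running total crosses the next multiple of (total weight / partitions). The class WeightedBoundsSketch, the Candidate type, the int keys, and the duplicate-skipping rule are assumptions made for this example; it does not use Spark and is not presented as Spark's exact algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Spark.
class WeightedBoundsSketch {
    // A sampled key and the number of items it represents.
    static final class Candidate {
        final int key;
        final double weight;

        Candidate(int key, double weight) {
            this.key = key;
            this.weight = weight;
        }
    }

    // Returns at most (partitions - 1) strictly increasing bounds.
    static int[] determineBounds(List<Candidate> candidates, int partitions) {
        List<Candidate> ordered = new ArrayList<>(candidates);
        ordered.sort((a, b) -> Integer.compare(a.key, b.key));

        double sumWeights = 0.0;
        for (Candidate c : ordered) {
            sumWeights += c.weight;
        }
        double step = sumWeights / partitions;  // target weight per partition

        double cumWeight = 0.0;
        double target = step;
        List<Integer> bounds = new ArrayList<>();
        for (Candidate c : ordered) {
            if (bounds.size() >= partitions - 1) {
                break;
            }
            cumWeight += c.weight;
            // Skip keys that would repeat the previous bound.
            boolean isNew = bounds.isEmpty() || c.key > bounds.get(bounds.size() - 1);
            if (cumWeight >= target && isNew) {
                bounds.add(c.key);
                target += step;
            }
        }
        return bounds.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        for (int k = 8; k >= 1; k--) {
            candidates.add(new Candidate(k, 1.0));  // unordered input, equal weights
        }
        System.out.println(Arrays.toString(determineBounds(candidates, 4)));  // [2, 4, 6]
    }
}
```

With only two candidates and four requested partitions, this sketch returns two bounds, which gives three ranges rather than four.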

numPartitions

public int numPartitions()
Specified by:
numPartitions in class Partitioner

getPartition

public int getPartition(Object key)
Specified by:
getPartition in class Partitioner

equals

public boolean equals(Object other)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object