Package org.apache.spark
Class RangePartitioner<K,V>
Object
org.apache.spark.Partitioner
org.apache.spark.RangePartitioner<K,V>
- All Implemented Interfaces:
Serializable
A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges are determined by sampling the content of the RDD passed in.
- Note:
- The actual number of partitions created by the RangePartitioner might not be the same as the partitions parameter, in the case where the number of sampled records is less than the value of partitions.
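To make the idea of range bounds concrete, here is a minimal sketch in plain Java. It is not Spark's implementation and does not use any Spark API: the class `RangeLookupSketch` and its methods are hypothetical names chosen for illustration, keys are plain `int` values in ascending order, and the bounds are assumed to be sorted and distinct. Under those assumptions, n bounds describe n + 1 ranges, which is one way to see why fewer usable sample points can lead to fewer partitions than requested.

```java
import java.util.Arrays;

class RangeLookupSketch {
    // Maps a key to a range index given sorted, distinct upper bounds.
    // Keys <= bounds[0] fall in range 0; keys greater than the last
    // bound fall in the final range.
    static int partitionFor(int[] sortedBounds, int key) {
        int pos = Arrays.binarySearch(sortedBounds, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos : -pos - 1;
    }

    // n bounds split the key space into n + 1 ranges.
    static int numRanges(int[] sortedBounds) {
        return sortedBounds.length + 1;
    }

    public static void main(String[] args) {
        int[] bounds = {10, 20, 30};
        for (int key : new int[] {5, 10, 11, 25, 31}) {
            System.out.println(key + " -> range " + partitionFor(bounds, key));
        }
        System.out.println("ranges: " + numRanges(bounds));
    }
}
```

A binary search is used here only because it is a convenient way to find the first bound that is not smaller than the key; a linear scan over a short bounds array would give the same answer.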
-
Constructor Summary
Constructor and Description
- RangePartitioner(int partitions, RDD<? extends scala.Product2<K, V>> rdd, boolean ascending, int samplePointsPerPartitionHint, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2)
- RangePartitioner(int partitions, RDD<? extends scala.Product2<K, V>> rdd, boolean ascending, scala.math.Ordering<K> evidence$3, scala.reflect.ClassTag<K> evidence$4)
-
Method Summary
Modifier and Type, Method, and Description
- static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K, Object>> candidates, int partitions, scala.math.Ordering<K> evidence$6, scala.reflect.ClassTag<K> evidence$7)
Determines the bounds for range partitioning from candidates with weights indicating how many items each represents.
- boolean equals(Object other)
- int getPartition(Object key)
- int hashCode()
- int numPartitions()
- int samplePointsPerPartitionHint()
- static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$5)
Sketches the input RDD via reservoir sampling on each partition.
Methods inherited from class org.apache.spark.Partitioner
defaultPartitioner
-
Constructor Details
-
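RangePartitioner
public RangePartitioner(int partitions, RDD<? extends scala.Product2<K, V>> rdd, boolean ascending, int samplePointsPerPartitionHint, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2)
-
RangePartitioner
public RangePartitioner(int partitions, RDD<? extends scala.Product2<K, V>> rdd, boolean ascending, scala.math.Ordering<K> evidence$3, scala.reflect.ClassTag<K> evidence$4)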
-
-
Method Details
-
sketch
public static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$5)
Sketches the input RDD via reservoir sampling on each partition.
- Parameters:
rdd - the input RDD to sketch
sampleSizePerPartition - max sample size per partition
evidence$5 - (undocumented)
- Returns:
- (total number of items, an array of (partitionId, number of items, sample))
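Reservoir sampling keeps a bounded, uniformly random sample of a stream whose length is not known in advance, which is why it suits sampling one partition at a time. The following is a minimal plain-Java sketch of the classic technique (often called Algorithm R), not Spark's code: `ReservoirSketch`, `Sampled`, and the explicit `seed` parameter are hypothetical names introduced for illustration. Like the return value described above, it reports both the sample and the number of items seen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class ReservoirSketch {
    // Result of sampling one partition: the sample and the item count.
    static final class Sampled<T> {
        final List<T> sample;
        final long count;

        Sampled(List<T> sample, long count) {
            this.sample = sample;
            this.count = count;
        }
    }

    // Keeps a uniform random sample of at most k items from the iterator.
    static <T> Sampled<T> reservoirSample(Iterator<T> items, int k, long seed) {
        Random rng = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Pick a slot in [0, seen); the new item replaces an old
                // one only if the slot lies inside the reservoir, which
                // happens with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return new Sampled<>(reservoir, seen);
    }

    public static void main(String[] args) {
        List<Integer> partition = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            partition.add(i);
        }
        Sampled<Integer> s = reservoirSample(partition.iterator(), 20, 42L);
        System.out.println("items seen: " + s.count + ", sample size: " + s.sample.size());
    }
}
```

When a partition holds fewer than k items, the sample is simply the whole partition, so the item count is what tells a caller how much each sampled record represents.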
-
determineBounds
public static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K, Object>> candidates, int partitions, scala.math.Ordering<K> evidence$6, scala.reflect.ClassTag<K> evidence$7)
Determines the bounds for range partitioning from candidates with weights indicating how many items each represents. Usually this is 1 over the probability used to sample this candidate.
- Parameters:
candidates - unordered candidates with weights
partitions - number of partitions
evidence$6 - (undocumented)
evidence$7 - (undocumented)
- Returns:
- selected bounds
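One way to pick bounds from weighted candidates is to sort them by key, walk the running total of weights, and emit a bound each time the total crosses the next multiple of (total weight / partitions). The plain-Java sketch below illustrates that idea under stated assumptions: it uses `int` keys, the names `WeightedBoundsSketch` and `Candidate` are hypothetical, and it is not claimed to match Spark's implementation in detail. Repeated keys are skipped so that bounds stay strictly increasing.

```java
import java.util.ArrayList;
import java.util.List;

class WeightedBoundsSketch {
    // A sampled key together with the number of items it stands for.
    static final class Candidate {
        final int key;
        final double weight;

        Candidate(int key, double weight) {
            this.key = key;
            this.weight = weight;
        }
    }

    // Returns at most (partitions - 1) strictly increasing bounds.
    static List<Integer> determineBounds(List<Candidate> candidates, int partitions) {
        List<Candidate> ordered = new ArrayList<>(candidates);
        ordered.sort((a, b) -> Integer.compare(a.key, b.key));

        double total = 0.0;
        for (Candidate c : ordered) {
            total += c.weight;
        }
        double step = total / partitions; // target weight per range
        double cumulative = 0.0;
        double target = step;

        List<Integer> bounds = new ArrayList<>();
        for (Candidate c : ordered) {
            if (bounds.size() >= partitions - 1) {
                break; // enough bounds for the requested partition count
            }
            cumulative += c.weight;
            if (cumulative >= target) {
                // Skip keys equal to the previous bound.
                if (bounds.isEmpty() || c.key > bounds.get(bounds.size() - 1)) {
                    bounds.add(c.key);
                    target += step;
                }
            }
        }
        return bounds;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        for (int key : new int[] {7, 3, 1, 8, 5, 2, 6, 4}) {
            candidates.add(new Candidate(key, 1.0));
        }
        System.out.println(determineBounds(candidates, 4)); // prints [2, 4, 6]
    }
}
```

If every candidate carries the same key, this sketch returns a single bound no matter how many partitions were requested, which mirrors the note that the resulting partition count can be smaller than the `partitions` parameter.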
-
samplePointsPerPartitionHint
public int samplePointsPerPartitionHint()
-
numPartitions
public int numPartitions()
- Specified by:
numPartitions in class Partitioner
-
getPartition
public int getPartition(Object key)
- Specified by:
getPartition in class Partitioner
-
equals
-
hashCode
public int hashCode()
-