Class RangePartitioner<K,V>

Object
org.apache.spark.Partitioner
org.apache.spark.RangePartitioner<K,V>
All Implemented Interfaces:
Serializable

public class RangePartitioner<K,V> extends Partitioner
A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges are determined by sampling the content of the RDD passed in.

Note:
The actual number of partitions created by the RangePartitioner may differ from the partitions parameter when the number of sampled records is less than the value of partitions.
  • Constructor Summary

    Constructors
    Constructor
    Description
    RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, int samplePointsPerPartitionHint, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2)
     
    RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, scala.math.Ordering<K> evidence$3, scala.reflect.ClassTag<K> evidence$4)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static <K> Object
    determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates, int partitions, scala.math.Ordering<K> evidence$6, scala.reflect.ClassTag<K> evidence$7)
    Determines the bounds for range partitioning from candidates with weights indicating how many items each represents.
    boolean
    equals(Object other)
     
    int
    getPartition(Object key)
     
    int
    hashCode()
     
    int
    numPartitions()
     
    int
    samplePointsPerPartitionHint()
     
    static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]>
    sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$5)
    Sketches the input RDD via reservoir sampling on each partition.

    Methods inherited from class org.apache.spark.Partitioner

    defaultPartitioner

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • RangePartitioner

      public RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, int samplePointsPerPartitionHint, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2)
    • RangePartitioner

      public RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, scala.math.Ordering<K> evidence$3, scala.reflect.ClassTag<K> evidence$4)
  • Method Details

    • sketch

      public static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$5)
      Sketches the input RDD via reservoir sampling on each partition.

      Parameters:
      rdd - the input RDD to sketch
      sampleSizePerPartition - max sample size per partition
      evidence$5 - (undocumented)
      Returns:
      (total number of items, an array of (partitionId, number of items, sample))
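The per-partition step described above can be illustrated with a minimal, Spark-free sketch of reservoir sampling. It is not Spark's implementation: the class `ReservoirSketch`, the holder `PartitionSketch`, and the explicit `seed` parameter are hypothetical names introduced only for this illustration. The holder mirrors the (partitionId, number of items, sample) triple listed under Returns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of the Spark API.
final class ReservoirSketch {

    /** One partition's result: its id, how many items it held, and the sample kept. */
    static final class PartitionSketch<K> {
        final int partitionId;
        final long numItems;
        final List<K> sample;

        PartitionSketch(int partitionId, long numItems, List<K> sample) {
            this.partitionId = partitionId;
            this.numItems = numItems;
            this.sample = sample;
        }
    }

    /** Keeps at most sampleSizePerPartition items, each with equal probability. */
    static <K> PartitionSketch<K> sketchPartition(
            int partitionId, Iterator<K> items, int sampleSizePerPartition, long seed) {
        Random rng = new Random(seed);
        List<K> reservoir = new ArrayList<>();
        long seen = 0;
        while (items.hasNext()) {
            K item = items.next();
            seen++;
            if (reservoir.size() < sampleSizePerPartition) {
                // Fill phase: the first k items are always kept.
                reservoir.add(item);
            } else {
                // Replace phase: pick a slot in [0, seen); keep the item only
                // if the slot falls inside the reservoir (probability k / seen).
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < sampleSizePerPartition) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return new PartitionSketch<>(partitionId, seen, reservoir);
    }
}
```

A partition holding fewer items than `sampleSizePerPartition` is returned in full, which is why the parameter is a maximum rather than an exact sample size.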
    • determineBounds

      public static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates, int partitions, scala.math.Ordering<K> evidence$6, scala.reflect.ClassTag<K> evidence$7)
      Determines the bounds for range partitioning from candidates with weights indicating how many items each represents. A candidate's weight is usually 1 over the probability used to sample that candidate.

      Parameters:
      candidates - unordered candidates with weights
      partitions - number of partitions
      evidence$6 - (undocumented)
      evidence$7 - (undocumented)
      Returns:
      selected bounds
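One plausible way to select bounds from weighted candidates is sketched below in plain Java. It is a simplified illustration under stated assumptions, not necessarily what Spark does internally: candidates are sorted by key, weights are accumulated, and a key becomes a bound each time the running weight crosses the next multiple of (total weight / partitions), skipping keys that do not exceed the previous bound. `WeightedBounds` and `Candidate` are hypothetical names, and a `Comparator` stands in for the `scala.math.Ordering` evidence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of the Spark API.
final class WeightedBounds {

    /** A sampled key together with the number of items it represents. */
    static final class Candidate<K> {
        final K key;
        final double weight;

        Candidate(K key, double weight) {
            this.key = key;
            this.weight = weight;
        }
    }

    /** Returns at most (partitions - 1) strictly increasing bounds. */
    static <K> List<K> determineBounds(
            List<Candidate<K>> candidates, int partitions, Comparator<K> ordering) {
        // Candidates arrive unordered, so sort a copy by key first.
        List<Candidate<K>> ordered = new ArrayList<>(candidates);
        ordered.sort((a, b) -> ordering.compare(a.key, b.key));

        double sumWeights = 0.0;
        for (Candidate<K> c : ordered) {
            sumWeights += c.weight;
        }
        double step = sumWeights / partitions; // target weight per partition
        double cumWeight = 0.0;
        double target = step;

        List<K> bounds = new ArrayList<>();
        K previous = null;
        for (int i = 0; i < ordered.size() && bounds.size() < partitions - 1; i++) {
            Candidate<K> c = ordered.get(i);
            cumWeight += c.weight;
            if (cumWeight >= target) {
                // Skip keys equal to the previous bound so bounds stay distinct.
                if (bounds.isEmpty() || ordering.compare(c.key, previous) > 0) {
                    bounds.add(c.key);
                    previous = c.key;
                    target += step;
                }
            }
        }
        return bounds;
    }
}
```

With eight candidates keyed 1 through 8, each of weight 1.0, and `partitions = 4`, this sketch returns the bounds 2, 4, 6. With only two candidates and `partitions = 5` it can return at most two bounds, so at most three ranges exist; this matches the class-level note that the resulting partition count may be lower than requested.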
    • samplePointsPerPartitionHint

      public int samplePointsPerPartitionHint()
    • numPartitions

      public int numPartitions()
      Specified by:
      numPartitions in class Partitioner
    • getPartition

      public int getPartition(Object key)
      Specified by:
      getPartition in class Partitioner
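To make the lookup concrete, here is a minimal sketch of how a key could be mapped to a partition once a sorted array of bounds is known. It is an illustration only: `RangeLookup` is a hypothetical class, keys are narrowed to `int` for brevity, and both the "k bounds give k + 1 partitions" rule and the mirroring of indices for descending order are assumptions made for this sketch rather than documented behavior.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the Spark API.
final class RangeLookup {
    private final int[] upperBounds; // sorted ascending, distinct
    private final boolean ascending;

    RangeLookup(int[] upperBounds, boolean ascending) {
        this.upperBounds = upperBounds.clone();
        this.ascending = ascending;
    }

    /** Assumption: k bounds split the key space into k + 1 ranges. */
    int numPartitions() {
        return upperBounds.length + 1;
    }

    /** Finds the first range whose upper bound is greater than or equal to key. */
    int getPartition(int key) {
        int idx = Arrays.binarySearch(upperBounds, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        int partition = idx >= 0 ? idx : -idx - 1;
        // Assumption: descending order mirrors the partition index.
        return ascending ? partition : upperBounds.length - partition;
    }
}
```

For bounds {10, 20, 30} in ascending order, keys 5 and 10 land in partition 0, key 11 in partition 1, and key 31 in partition 3. With `ascending` set to false the same keys map to partitions 3, 3, 2, and 0.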
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object