object Partitioner extends Serializable

Source
Partitioner.scala
Linear Supertypes
Serializable, Serializable, AnyRef, Any

Value Members

  1. def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner

    Choose a partitioner to use for a cogroup-like operation between a number of RDDs.

    If spark.default.parallelism is set, we'll use the value of SparkContext.defaultParallelism as the default number of partitions; otherwise, we'll use the maximum number of upstream partitions.

    When available, we choose the partitioner from the RDD with the maximum number of partitions. We use that partitioner if it is eligible (its number of partitions is within a single order of magnitude of the maximum number of partitions across the RDDs), or if its number of partitions is greater than or equal to the default number of partitions.

    Otherwise, we'll use a new HashPartitioner with the default number of partitions.

    Unless spark.default.parallelism is set, the number of partitions will be the same as the number of partitions in the largest upstream RDD, as this should be least likely to cause out-of-memory errors.
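
    As a rough illustration, the sketch below restates this selection logic in plain Scala. It is not the actual Spark implementation: the helper name chooseDefaultPartitioner is made up, and the eligibility test is simplified to a plain ratio check rather than Spark's exact rule.

      import org.apache.spark.{HashPartitioner, Partitioner, SparkContext}
      import org.apache.spark.rdd.RDD

      // Illustrative sketch of the selection described above (not Spark's source).
      def chooseDefaultPartitioner(sc: SparkContext, rdd: RDD[_], others: RDD[_]*): Partitioner = {
        val rdds = Seq(rdd) ++ others

        // RDDs that already carry a partitioner with at least one partition.
        val withPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
        val largestWithPartitioner =
          if (withPartitioner.nonEmpty) Some(withPartitioner.maxBy(_.partitions.length)) else None

        // Default number of partitions: spark.default.parallelism if set,
        // otherwise the maximum number of upstream partitions.
        val maxUpstream = rdds.map(_.partitions.length).max
        val defaultNumPartitions =
          if (sc.getConf.contains("spark.default.parallelism")) sc.defaultParallelism
          else maxUpstream

        // "Eligible": the existing partitioner's partition count is within an order
        // of magnitude of the maximum upstream count (simplified ratio check here).
        def eligible(r: RDD[_]): Boolean =
          maxUpstream.toDouble / r.partitioner.get.numPartitions < 10

        largestWithPartitioner match {
          case Some(r) if eligible(r) || r.partitioner.get.numPartitions >= defaultNumPartitions =>
            r.partitioner.get
          case _ =>
            new HashPartitioner(defaultNumPartitions)
        }
      }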

    We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD.
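
    For example, the hypothetical helper below passes two pair RDDs to defaultPartitioner and reuses the result for a cogroup; the names a, b, and cogroupWithDefault are illustrative only.

      import org.apache.spark.Partitioner
      import org.apache.spark.rdd.RDD

      // Choose a partitioner for two (hypothetical) pair RDDs and cogroup with it.
      def cogroupWithDefault(
          a: RDD[(Int, String)],
          b: RDD[(Int, Long)]): RDD[(Int, (Iterable[String], Iterable[Long]))] = {
        val part: Partitioner = Partitioner.defaultPartitioner(a, b) // one RDD plus varargs
        // Partitioner.defaultPartitioner()  // would not compile: at least one RDD is required
        a.cogroup(b, part)
      }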