Class CoGroupedRDD<K>

Object
org.apache.spark.rdd.RDD<scala.Tuple2<K,scala.collection.Iterable<?>[]>>
org.apache.spark.rdd.CoGroupedRDD<K>
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, scala.Serializable

public class CoGroupedRDD<K> extends RDD<scala.Tuple2<K,scala.collection.Iterable<?>[]>>
:: DeveloperApi :: An RDD that cogroups its parents. For each key k in parent RDDs, the resulting RDD contains a tuple with the list of values for that key.

param: rdds parent RDDs. param: part partitioner used to partition the shuffle output

See Also:
Note:
This is an internal API. We recommend users use RDD.cogroup(...) instead of instantiating this directly.
  • Constructor Details

    • CoGroupedRDD

      public CoGroupedRDD(scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds, Partitioner part, scala.reflect.ClassTag<K> evidence$1)
  • Method Details

    • clearDependencies

      public void clearDependencies()
    • compute

      public scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<?>[]>> compute(Partition s, TaskContext context)
      Description copied from class: RDD
      :: DeveloperApi :: Implemented by subclasses to compute a given partition.
      Specified by:
      compute in class RDD<scala.Tuple2<K,scala.collection.Iterable<?>[]>>
      Parameters:
      s - (undocumented)
      context - (undocumented)
      Returns:
      (undocumented)
    • getDependencies

      public scala.collection.Seq<Dependency<?>> getDependencies()
    • getPartitions

      public Partition[] getPartitions()
    • partitioner

      public scala.Some<Partitioner> partitioner()
      Description copied from class: RDD
      Optionally overridden by subclasses to specify how they are partitioned.
      Overrides:
      partitioner in class RDD<scala.Tuple2<K,scala.collection.Iterable<?>[]>>
    • rdds

      public scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds()
    • setSerializer

      public CoGroupedRDD<K> setSerializer(Serializer serializer)
      Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer)