Package org.apache.spark.rdd
Class CoGroupedRDD<K>
java.lang.Object
  org.apache.spark.rdd.RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>
    org.apache.spark.rdd.CoGroupedRDD<K>
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging
:: DeveloperApi ::
An RDD that cogroups its parents. For each key k in parent RDDs, the resulting RDD contains a
tuple with the list of values for that key.
Parameters:
rdds - parent RDDs
part - partitioner used to partition the shuffle output
- Note:
This is an internal API. We recommend that users use RDD.cogroup(...) instead of instantiating
this class directly; see the sketch below.
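A minimal sketch of the recommended path via RDD.cogroup, assuming a local SparkContext; the names (sc, scores, labels) and the sample data are illustrative, not part of this API:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative setup; any existing SparkContext works just as well.
val conf = new SparkConf().setAppName("cogroup-sketch").setMaster("local[*]")
val sc = new SparkContext(conf)

val scores = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val labels = sc.parallelize(Seq(("a", "x"), ("c", "y")))

// For each key present in either parent, cogroup yields one Iterable of
// values per parent RDD, e.g. ("a", (Seq(1, 3), Seq("x"))).
val grouped = scores.cogroup(labels)
grouped.collect().foreach { case (k, (is, ss)) =>
  println(s"$k -> ${is.toList} / ${ss.toList}")
}
```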
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructor | Description
CoGroupedRDD(scala.collection.immutable.Seq<RDD<? extends scala.Product2<K,?>>> rdds, Partitioner part, scala.reflect.ClassTag<K> evidence$1)
Method Summary
Modifier and Type | Method | Description
void | clearDependencies() |
scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>> | compute(Partition s, TaskContext context) | :: DeveloperApi :: Implemented by subclasses to compute a given partition.
scala.collection.immutable.Seq<Dependency<?>> | getDependencies() |
Partition[] | getPartitions() |
scala.Some<Partitioner> | partitioner() | Optionally overridden by subclasses to specify how they are partitioned.
scala.collection.immutable.Seq<RDD<? extends scala.Product2<K,?>>> | rdds() |
CoGroupedRDD<K> | setSerializer(Serializer serializer) | Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
Methods inherited from class org.apache.spark.rdd.RDD
aggregate, barrier, cache, cartesian, checkpoint, cleanShuffleDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, first, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getNumPartitions, getResourceProfile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithEvaluator, mapPartitionsWithIndex, max, min, name, numericRDDToDoubleRDDFunctions, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate, treeReduce, union, unpersist, withResources, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitionsWithEvaluator, zipWithIndex, zipWithUniqueId
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
CoGroupedRDD
public CoGroupedRDD(scala.collection.immutable.Seq<RDD<? extends scala.Product2<K, ?>>> rdds, Partitioner part, scala.reflect.ClassTag<K> evidence$1)
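For illustration only (this is the internal constructor the note above warns against using directly), a sketch of direct instantiation, reusing the hypothetical sc, scores, and labels from the earlier sketch; when called from Scala, the implicit ClassTag supplies the evidence$1 parameter:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.CoGroupedRDD

// Essentially what RDD.cogroup builds under the hood: one CoGroupedRDD
// over both parents, partitioned by the given partitioner.
val part = new HashPartitioner(4)
val cogrouped = new CoGroupedRDD[String](Seq(scores, labels), part)

// Each element pairs a key with one Iterable per parent RDD, in the
// order the parents were passed to the constructor.
cogrouped.collect().foreach { case (k, groups) =>
  println(s"$k -> ${groups.map(_.toList).mkString(", ")}")
}
```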
-
-
Method Details
-
clearDependencies
public void clearDependencies()
-
compute
public scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>> compute(Partition s, TaskContext context)
Description copied from class: RDD
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
-
getDependencies
public scala.collection.immutable.Seq<Dependency<?>> getDependencies()
-
getPartitions
public Partition[] getPartitions()
-
partitioner
public scala.Some<Partitioner> partitioner()
Description copied from class: RDD
Optionally overridden by subclasses to specify how they are partitioned.
- Overrides:
partitioner in class RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>
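Continuing the hypothetical constructor sketch above: since a CoGroupedRDD is partitioned by the partitioner passed at construction, partitioner() returns it wrapped in Some:

```scala
// The partitioner is exactly the one given to the constructor.
assert(cogrouped.partitioner == Some(part))
```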
-
rdds
public scala.collection.immutable.Seq<RDD<? extends scala.Product2<K,?>>> rdds()
-
setSerializer
public CoGroupedRDD<K> setSerializer(Serializer serializer)
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
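A sketch of overriding the shuffle serializer for this one RDD, assuming the cogrouped value and sc from the earlier sketches; KryoSerializer is one concrete Serializer that ships with Spark:

```scala
import org.apache.spark.serializer.KryoSerializer

// Use Kryo for this RDD's shuffle only; other shuffles keep the
// spark.serializer default. The call returns the RDD itself, so it chains.
val kryoCogrouped = cogrouped.setSerializer(new KryoSerializer(sc.getConf))
```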
-