Package org.apache.spark.rdd
Class CoGroupedRDD<K>
java.lang.Object
  org.apache.spark.rdd.RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>
    org.apache.spark.rdd.CoGroupedRDD<K>
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging
:: DeveloperApi ::
An RDD that cogroups its parents. For each key k in parent RDDs, the resulting RDD contains a
tuple with the list of values for that key.
Parameters:
rdds - parent RDDs
part - partitioner used to partition the shuffle output
- Note:
This is an internal API. We recommend that users use RDD.cogroup(...) instead of instantiating
this class directly; see the sketch below.
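A minimal sketch of the recommended path via RDD.cogroup, assuming a local SparkContext; the names (sc, scores, labels) and the sample data are illustrative, not part of this API:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative setup; any existing SparkContext works just as well.
val conf = new SparkConf().setAppName("cogroup-sketch").setMaster("local[*]")
val sc = new SparkContext(conf)

val scores = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val labels = sc.parallelize(Seq(("a", "x"), ("c", "y")))

// For each key present in either parent, cogroup yields one Iterable of
// values per parent RDD, e.g. ("a", (Seq(1, 3), Seq("x"))).
val grouped = scores.cogroup(labels)
grouped.collect().foreach { case (k, (is, ss)) =>
  println(s"$k -> ${is.toList} / ${ss.toList}")
}
```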
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructor | Description
CoGroupedRDD(scala.collection.immutable.Seq<RDD<? extends scala.Product2<K,?>>> rdds, Partitioner part, scala.reflect.ClassTag<K> evidence$1)
Method Summary
Modifier and Type | Method | Description
void | clearDependencies() |
scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>> | compute(Partition s, TaskContext context) | :: DeveloperApi :: Implemented by subclasses to compute a given partition.
scala.collection.immutable.Seq<Dependency<?>> | getDependencies() |
Partition[] | getPartitions() |
scala.Some<Partitioner> | partitioner() | Optionally overridden by subclasses to specify how they are partitioned.
scala.collection.immutable.Seq<RDD<? extends scala.Product2<K,?>>> | rdds() |
CoGroupedRDD<K> | setSerializer(Serializer serializer) | Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
Methods inherited from class org.apache.spark.rdd.RDD
aggregate, barrier, cache, cartesian, checkpoint, cleanShuffleDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, first, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getNumPartitions, getResourceProfile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithEvaluator, mapPartitionsWithIndex, max, min, name, numericRDDToDoubleRDDFunctions, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate, treeReduce, union, unpersist, withResources, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitionsWithEvaluator, zipWithIndex, zipWithUniqueId
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
CoGroupedRDD
public CoGroupedRDD(scala.collection.immutable.Seq<RDD<? extends scala.Product2<K, ?>>> rdds, Partitioner part, scala.reflect.ClassTag<K> evidence$1)
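For illustration only (this is the internal constructor the note above warns against using directly), a sketch of direct instantiation, reusing the hypothetical sc, scores, and labels from the earlier sketch; when called from Scala, the implicit ClassTag supplies the evidence$1 parameter:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.CoGroupedRDD

// Essentially what RDD.cogroup builds under the hood: one CoGroupedRDD
// over both parents, partitioned by the given partitioner.
val part = new HashPartitioner(4)
val cogrouped = new CoGroupedRDD[String](Seq(scores, labels), part)

// Each element pairs a key with one Iterable per parent RDD, in the
// order the parents were passed to the constructor.
cogrouped.collect().foreach { case (k, groups) =>
  println(s"$k -> ${groups.map(_.toList).mkString(", ")}")
}
```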
-
-
Method Details
-
clearDependencies
public void clearDependencies()
-
compute
public scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>> compute(Partition s, TaskContext context)
Description copied from class: RDD
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
-
getDependencies
public scala.collection.immutable.Seq<Dependency<?>> getDependencies()
-
getPartitions
public Partition[] getPartitions()
-
partitioner
public scala.Some<Partitioner> partitioner()
Description copied from class: RDD
Optionally overridden by subclasses to specify how they are partitioned.
- Overrides:
partitioner in class RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>
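Continuing the hypothetical constructor sketch above: since a CoGroupedRDD is partitioned by the partitioner passed at construction, partitioner() returns it wrapped in Some:

```scala
// The partitioner is exactly the one given to the constructor.
assert(cogrouped.partitioner == Some(part))
```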
-
rdds
public scala.collection.immutable.Seq<RDD<? extends scala.Product2<K,?>>> rdds()
-
setSerializer
public CoGroupedRDD<K> setSerializer(Serializer serializer)
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
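A sketch of overriding the shuffle serializer for this one RDD, assuming the cogrouped value and sc from the earlier sketches; KryoSerializer is one concrete Serializer that ships with Spark:

```scala
import org.apache.spark.serializer.KryoSerializer

// Use Kryo for this RDD's shuffle only; other shuffles keep the
// spark.serializer default. The call returns the RDD itself, so it chains.
val kryoCogrouped = cogrouped.setSerializer(new KryoSerializer(sc.getConf))
```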
-