Type Parameters:
K - the key class.
V - the value class.
C - the combiner class.

public class ShuffledRDD<K,V,C> extends RDD<scala.Tuple2<K,C>>

Constructor and Description |
---|
ShuffledRDD(RDD<? extends scala.Product2<K,V>> prev, Partitioner part, scala.reflect.ClassTag<K> evidence$1, scala.reflect.ClassTag<V> evidence$2, scala.reflect.ClassTag<C> evidence$3) |

Modifier and Type | Method and Description |
---|---|
void | clearDependencies() Clears the dependencies of this RDD. |
scala.collection.Iterator<scala.Tuple2<K,C>> | compute(Partition split, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
scala.collection.Seq<Dependency<?>> | getDependencies() Implemented by subclasses to return how this RDD depends on parent RDDs. |
Partition[] | getPartitions() Implemented by subclasses to return the set of partitions in this RDD. |
scala.Some<Partitioner> | partitioner() Optionally overridden by subclasses to specify how they are partitioned. |
RDD<? extends scala.Product2<K,V>> | prev() |
ShuffledRDD<K,V,C> | setAggregator(Aggregator<K,V,C> aggregator) Set aggregator for RDD's shuffle. |
ShuffledRDD<K,V,C> | setKeyOrdering(scala.math.Ordering<K> keyOrdering) Set key ordering for RDD's shuffle. |
ShuffledRDD<K,V,C> | setMapSideCombine(boolean mapSideCombine) Set mapSideCombine flag for RDD's shuffle. |
ShuffledRDD<K,V,C> | setSerializer(Serializer serializer) Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer). |

Methods inherited from class RDD
aggregate, barrier, cache, cartesian, checkpoint, cleanShuffleDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, first, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getNumPartitions, getResourceProfile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithIndex, max, min, name, numericRDDToDoubleRDDFunctions, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate, treeReduce, union, unpersist, withResources, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from interface org.apache.spark.internal.Logging
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public void clearDependencies()
Description copied from class: RDD
Clears the dependencies of this RDD. See UnionRDD for an example.
public scala.collection.Iterator<scala.Tuple2<K,C>> compute(Partition split, TaskContext context)
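Description copied from class: RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.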
public scala.collection.Seq<Dependency<?>> getDependencies()
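Description copied from class: RDD
Implemented by subclasses to return how this RDD depends on parent RDDs.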
public Partition[] getPartitions()
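Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.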
The partitions in this array must satisfy the following property:
rdd.partitions.zipWithIndex.forall { case (partition, index) => partition.index == index }
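
The property above says that each partition's index must equal its position in the returned array. The following is a minimal plain-Java sketch of that check, not Spark code: `SimplePartition` and `indicesMatchPositions` are hypothetical names introduced only for illustration, and the class depends on nothing beyond the JDK.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionIndexCheck {

    /** Hypothetical stand-in for a partition; only its index matters here. */
    static final class SimplePartition {
        final int index;

        SimplePartition(int index) {
            this.index = index;
        }
    }

    /** Returns true when each partition's index equals its position in the list. */
    static boolean indicesMatchPositions(List<SimplePartition> partitions) {
        for (int position = 0; position < partitions.size(); position++) {
            if (partitions.get(position).index != position) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<SimplePartition> ordered = Arrays.asList(
                new SimplePartition(0), new SimplePartition(1), new SimplePartition(2));
        List<SimplePartition> swapped = Arrays.asList(
                new SimplePartition(1), new SimplePartition(0));

        System.out.println(indicesMatchPositions(ordered)); // prints "true"
        System.out.println(indicesMatchPositions(swapped)); // prints "false"
    }
}
```

A `getPartitions()` implementation that builds its array in index order satisfies the property by construction.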
public scala.Some<Partitioner> partitioner()
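Description copied from class: RDD
Optionally overridden by subclasses to specify how they are partitioned.
Overrides: partitioner in class RDD<scala.Tuple2<K,C>>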
public ShuffledRDD<K,V,C> setAggregator(Aggregator<K,V,C> aggregator)
public ShuffledRDD<K,V,C> setKeyOrdering(scala.math.Ordering<K> keyOrdering)
public ShuffledRDD<K,V,C> setMapSideCombine(boolean mapSideCombine)
public ShuffledRDD<K,V,C> setSerializer(Serializer serializer)
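
To make the roles of the K, V, and C type parameters concrete, here is a plain-Java model of what combining values per key before and after a shuffle can look like. It is a sketch under stated assumptions, not Spark's implementation: it uses no Spark classes, and `CombineSketch`, `combinePartition`, `mergePartials`, `entry`, and `demo` are hypothetical names. The three functions are modeled on the `createCombiner`, `mergeValue`, and `mergeCombiners` functions that an `Aggregator<K,V,C>` carries.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class CombineSketch {

    /** Map-side step: fold one partition's (K, V) records into per-key combiners of type C. */
    static <K, V, C> Map<K, C> combinePartition(
            List<Map.Entry<K, V>> records,
            Function<V, C> createCombiner,
            BiFunction<C, V, C> mergeValue) {
        Map<K, C> combined = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : records) {
            K key = record.getKey();
            if (combined.containsKey(key)) {
                combined.put(key, mergeValue.apply(combined.get(key), record.getValue()));
            } else {
                combined.put(key, createCombiner.apply(record.getValue()));
            }
        }
        return combined;
    }

    /** Reduce-side step: merge the partial combiners produced for each input partition. */
    static <K, C> Map<K, C> mergePartials(List<Map<K, C>> partials, BinaryOperator<C> mergeCombiners) {
        Map<K, C> merged = new LinkedHashMap<>();
        for (Map<K, C> partial : partials) {
            for (Map.Entry<K, C> entry : partial.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), mergeCombiners);
            }
        }
        return merged;
    }

    /** Small helper for building (key, value) records. */
    static Map.Entry<String, Integer> entry(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    /** Sums Integer values (V) into Long totals (C) per String key (K) over two input partitions. */
    static Map<String, Long> demo() {
        Function<Integer, Long> createCombiner = v -> v.longValue();
        BiFunction<Long, Integer, Long> mergeValue = (c, v) -> c + v;
        BinaryOperator<Long> mergeCombiners = Long::sum;

        List<Map.Entry<String, Integer>> first = Arrays.asList(entry("a", 1), entry("b", 2), entry("a", 3));
        List<Map.Entry<String, Integer>> second = Arrays.asList(entry("a", 4), entry("b", 5));

        Map<String, Long> firstPartial = combinePartition(first, createCombiner, mergeValue);
        Map<String, Long> secondPartial = combinePartition(second, createCombiner, mergeValue);
        return mergePartials(Arrays.asList(firstPartial, secondPartial), mergeCombiners);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "{a=8, b=7}"
    }
}
```

In this model, combining inside each partition first means only one combiner per key per partition would need to cross the shuffle, which is the motivation usually given for enabling map-side combine.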