org.apache.spark.rdd
Class ShuffledRDD<K,V,C>

Object
  extended by org.apache.spark.rdd.RDD<scala.Tuple2<K,C>>
      extended by org.apache.spark.rdd.ShuffledRDD<K,V,C>
All Implemented Interfaces:
java.io.Serializable, Logging

public class ShuffledRDD<K,V,C>
extends RDD<scala.Tuple2<K,C>>

:: DeveloperApi :: The resulting RDD from a shuffle (e.g. repartitioning of data). param: prev the parent RDD. param: part the partitioner used to partition the RDD

See Also:
Serialized Form

Constructor Summary
ShuffledRDD(RDD<? extends scala.Product2<K,V>> prev, Partitioner part)
           
 
Method Summary
 void clearDependencies()
          Clears the dependencies of this RDD.
 scala.collection.Iterator<scala.Tuple2<K,C>> compute(Partition split, TaskContext context)
          :: DeveloperApi :: Implemented by subclasses to compute a given partition.
 scala.collection.Seq<Dependency<?>> getDependencies()
          Implemented by subclasses to return how this RDD depends on parent RDDs.
 Partition[] getPartitions()
          Implemented by subclasses to return the set of partitions in this RDD.
 scala.Some<Partitioner> partitioner()
          Optionally overridden by subclasses to specify how they are partitioned.
 Object prev()
           
 ShuffledRDD<K,V,C> setAggregator(Aggregator<K,V,C> aggregator)
          Set aggregator for RDD's shuffle.
 ShuffledRDD<K,V,C> setKeyOrdering(scala.math.Ordering<K> keyOrdering)
          Set key ordering for RDD's shuffle.
 ShuffledRDD<K,V,C> setMapSideCombine(boolean mapSideCombine)
          Set mapSideCombine flag for RDD's shuffle.
 ShuffledRDD<K,V,C> setSerializer(Serializer serializer)
          Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer)
 
Methods inherited from class org.apache.spark.rdd.RDD
aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

ShuffledRDD

public ShuffledRDD(RDD<? extends scala.Product2<K,V>> prev,
                   Partitioner part)
Method Detail

prev

public Object prev()

setSerializer

public ShuffledRDD<K,V,C> setSerializer(Serializer serializer)
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer)


setKeyOrdering

public ShuffledRDD<K,V,C> setKeyOrdering(scala.math.Ordering<K> keyOrdering)
Set key ordering for RDD's shuffle.


setAggregator

public ShuffledRDD<K,V,C> setAggregator(Aggregator<K,V,C> aggregator)
Set aggregator for RDD's shuffle.


setMapSideCombine

public ShuffledRDD<K,V,C> setMapSideCombine(boolean mapSideCombine)
Set mapSideCombine flag for RDD's shuffle.


getDependencies

public scala.collection.Seq<Dependency<?>> getDependencies()
Description copied from class: RDD
Implemented by subclasses to return how this RDD depends on parent RDDs. This method will only be called once, so it is safe to implement a time-consuming computation in it.

Returns:
(undocumented)

partitioner

public scala.Some<Partitioner> partitioner()
Description copied from class: RDD
Optionally overridden by subclasses to specify how they are partitioned.

Overrides:
partitioner in class RDD<scala.Tuple2<K,C>>

getPartitions

public Partition[] getPartitions()
Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD. This method will only be called once, so it is safe to implement a time-consuming computation in it.

Returns:
(undocumented)

compute

public scala.collection.Iterator<scala.Tuple2<K,C>> compute(Partition split,
                                                            TaskContext context)
Description copied from class: RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.

Specified by:
compute in class RDD<scala.Tuple2<K,C>>
Parameters:
split - (undocumented)
context - (undocumented)
Returns:
(undocumented)

clearDependencies

public void clearDependencies()
Description copied from class: RDD
Clears the dependencies of this RDD. This method must ensure that all references to the original parent RDDs is removed to enable the parent RDDs to be garbage collected. Subclasses of RDD may override this method for implementing their own cleaning logic. See UnionRDD for an example.