An RDD that cogroups its parents.
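As a rough illustration of what cogrouping means, the sketch below groups values by key across several parent datasets using plain Python lists. It is not the library's implementation: the function name `cogroup` and the sample data are assumptions made for this example, and partitioning and shuffling are ignored entirely.

```python
from collections import defaultdict


def cogroup(*parents):
    """Group values by key across several parents of (key, value) pairs.

    Hypothetical helper for illustration only. For every key, the result
    holds one list of values per parent, in parent order.
    """
    grouped = defaultdict(lambda: tuple([] for _ in parents))
    for index, parent in enumerate(parents):
        for key, value in parent:
            grouped[key][index].append(value)
    return dict(grouped)


left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", "x"), ("c", "y")]
result = cogroup(left, right)
```

A key that is missing from one parent still appears in the result, with an empty list in that parent's slot.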
Coalesce the partitions of a parent RDD (prev) into fewer partitions, so that each partition of this RDD computes one or more of the parent ones.
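The idea of coalescing can be sketched with plain Python lists standing in for partitions. This is a minimal sketch under stated assumptions, not the library's algorithm: the helper name `coalesce_groups` is hypothetical, parent partitions are assigned in contiguous, evenly sized runs, and data locality (which a real implementation may take into account) is ignored.

```python
def coalesce_groups(num_parent_partitions, num_partitions):
    """Assign parent partition indices to fewer output partitions.

    Hypothetical helper: splits the parent indices into contiguous runs.
    Assumes 1 <= num_partitions <= num_parent_partitions, so every output
    partition receives at least one parent partition.
    """
    groups = []
    for i in range(num_partitions):
        start = i * num_parent_partitions // num_partitions
        end = (i + 1) * num_parent_partitions // num_partitions
        groups.append(list(range(start, end)))
    return groups


# Five parent partitions coalesced into two.
parents = [[1, 2], [3], [4, 5], [6], [7]]
groups = coalesce_groups(len(parents), 2)

# Each coalesced partition computes the concatenation of its parents.
coalesced = [[x for idx in group for x in parents[idx]] for group in groups]
```

Here each output partition reads one or more whole parent partitions, so no element has to be redistributed by key.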
An RDD that reads a Hadoop dataset as specified by a JobConf (e.
An RDD used to prune RDD partitions so that we can avoid launching tasks on all partitions.
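Partition pruning can be pictured as filtering partitions by index before any work is scheduled. The sketch below uses plain Python and is only an illustration: `prune_partitions` and the filter function are assumed names, and a "task" is simulated by summing a partition.

```python
def prune_partitions(parent_partitions, partition_filter):
    """Keep only the parent partitions whose index passes the filter.

    Hypothetical helper: returns (parent_index, partition) pairs so the
    link back to the parent partition is preserved.
    """
    return [
        (index, partition)
        for index, partition in enumerate(parent_partitions)
        if partition_filter(index)
    ]


parents = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Suppose only partitions 1 and 3 can contain the data of interest.
kept = prune_partitions(parents, lambda index: index in (1, 3))

# One simulated task per surviving partition instead of one per parent.
task_results = [sum(partition) for _, partition in kept]
```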
An RDD that pipes the contents of each parent partition through an external command (printing them one per line) and returns the output as a collection of strings.
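The piping behaviour described above can be approximated in plain Python with `subprocess`. This is a sketch under assumptions, not the library's code: `pipe_partition` is a hypothetical helper, partitions are ordinary lists, and the external command is a small Python one-liner chosen so the example does not depend on any particular shell utility.

```python
import subprocess
import sys


def pipe_partition(elements, command):
    """Write elements to the command's stdin, one per line, and return
    the command's stdout as a list of strings (one per output line)."""
    stdin_text = "".join(f"{element}\n" for element in elements)
    completed = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the external command fails
    )
    return completed.stdout.splitlines()


# Illustrative external command: upper-case every input line.
UPPERCASE = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())",
]

partitions = [["a", "b"], ["c"]]
piped = [pipe_partition(partition, UPPERCASE) for partition in partitions]
```

The command is started once per partition, so its output stays grouped by the partition that produced it.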
Represents a dependency between the PartitionPruningRDD and its parent.
The resulting RDD from a shuffle (e.