Coalesce the partitions of a parent RDD (prev) into fewer partitions, so that each partition of this RDD computes one or more of the parent ones.
An RDD that reads a Hadoop dataset as specified by a JobConf (e.
An RDD that pipes the contents of each parent partition through an external command (printing them one per line) and returns the output as a collection of strings.
Repartition a key-value pair RDD.
The RDD that results from a shuffle followed by (hash-based) aggregation.
The resulting RDD from a shuffle (e.
A sort-based shuffle (that doesn't apply aggregation).
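
The ideas behind several of the entries above (coalescing, repartitioning key-value pairs, hash-based aggregation after a shuffle, and a sort-based shuffle without aggregation) can be illustrated with a minimal plain-Python sketch. This is not Spark code and does not reflect how Spark implements these RDDs: partitions are modelled as ordinary lists, all function names are hypothetical, keys are assumed to be non-negative integers, and `key % num_partitions` stands in for a hash partitioner.

```python
def coalesce(parent_partitions, num_partitions):
    """Merge parent partitions into at most num_partitions groups.

    Each output partition is the concatenation of one or more
    neighbouring parent partitions; records are not redistributed by key.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    n_parents = len(parent_partitions)
    groups = [[] for _ in range(min(num_partitions, n_parents))]
    for i, part in enumerate(parent_partitions):
        # Map parent index i proportionally onto the smaller index range.
        groups[i * len(groups) // n_parents].extend(part)
    return groups


def shuffle(partitions, num_partitions):
    """Repartition (key, value) pairs so equal keys share a partition.

    Assumption: integer keys, with key % num_partitions as the partitioner.
    """
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[key % num_partitions].append((key, value))
    return out


def shuffle_and_aggregate(partitions, num_partitions, combine):
    """Shuffle, then merge the values of each key in a hash table (dict)."""
    result = []
    for part in shuffle(partitions, num_partitions):
        table = {}
        for key, value in part:
            table[key] = combine(table[key], value) if key in table else value
        result.append(table)
    return result


def shuffle_and_sort(partitions, num_partitions):
    """Shuffle, then sort each output partition by key, without aggregating."""
    return [
        sorted(part, key=lambda kv: kv[0])
        for part in shuffle(partitions, num_partitions)
    ]


if __name__ == "__main__":
    pairs = [[(1, 10), (2, 20)], [(3, 30), (1, 5)]]
    print(coalesce([[1], [2], [3], [4]], 2))
    print(shuffle_and_aggregate(pairs, 2, lambda a, b: a + b))
    print(shuffle_and_sort(pairs, 2))
```

In this sketch, coalescing only concatenates whole parent partitions, while the two shuffle variants move individual records between partitions by key. The aggregating variant keeps one combined value per key, whereas the sort-based variant keeps every record and only orders them.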