pyspark.RDD.glom

RDD.glom()

Return an RDD created by gathering all elements within each partition into a single list, so the new RDD contains exactly one list per partition.

New in version 0.7.0.

Returns
RDD

a new RDD with one element per partition: a list holding all of that partition's elements

Examples

>>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>>> sorted(rdd.glom().collect())
[[1, 2], [3, 4]]
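The same behaviour can be sketched without a Spark cluster. This is a minimal pure-Python illustration, not PySpark code. `LocalRDD`, `map_partitions`, and `collect` are hypothetical stand-ins that model an RDD as an in-memory list of partitions. In real PySpark, roughly the same result comes from `rdd.mapPartitions(lambda it: [list(it)])`: one list per partition, and an empty partition still produces an empty list.

```python
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class LocalRDD(Generic[T]):
    """Hypothetical in-memory stand-in for an RDD: a list of partitions."""

    def __init__(self, partitions: List[List[T]]) -> None:
        self.partitions = partitions

    def map_partitions(self, f: Callable[[Iterator[T]], Iterable[U]]) -> "LocalRDD[U]":
        # Apply f once per partition (modelled on RDD.mapPartitions).
        return LocalRDD([list(f(iter(p))) for p in self.partitions])

    def glom(self) -> "LocalRDD[List[T]]":
        # Each partition yields exactly one element: a list of its contents.
        # An empty partition therefore becomes [[]], not [].
        def to_list(it: Iterator[T]) -> Iterator[List[T]]:
            yield list(it)

        return self.map_partitions(to_list)

    def collect(self) -> List[T]:
        # Concatenate the partitions in order (modelled on RDD.collect).
        return [x for p in self.partitions for x in p]


rdd = LocalRDD([[1, 2], [3, 4], []])
print(rdd.glom().collect())                    # [[1, 2], [3, 4], []]
print([len(p) for p in rdd.glom().collect()])  # [2, 2, 0]
```

The second `print` shows a common use of `glom`: inspecting how many elements each partition holds, for example to spot skewed or empty partitions. With a real RDD, the equivalent would be `rdd.glom().map(len).collect()`.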