pyspark.RDD.collectAsMap

RDD.collectAsMap()

Return the key-value pairs in this RDD to the driver as a dictionary.

Notes

This method should only be used if the resulting data is expected to be small, as all the data is loaded into the driver’s memory.
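One way to guard against collecting too much data is to cap the number of pairs fetched before building the dictionary. The following is a minimal sketch, not part of the PySpark API: the helper name `collect_as_map_bounded` and the `max_pairs` parameter are hypothetical, and the only RDD method it relies on is `take`.

```python
def collect_as_map_bounded(rdd, max_pairs=10_000):
    """Hypothetical helper: return the pairs of `rdd` as a dict,
    or raise if the RDD holds more than `max_pairs` pairs."""
    # take(n) returns at most n elements, so the driver receives
    # no more than max_pairs + 1 pairs even for a very large RDD.
    pairs = rdd.take(max_pairs + 1)
    if len(pairs) > max_pairs:
        raise ValueError(
            "RDD has more than %d pairs; refusing to collect" % max_pairs
        )
    return dict(pairs)
```

With a live SparkContext `sc`, a call might look like `collect_as_map_bounded(sc.parallelize([(1, 2), (3, 4)]), max_pairs=100)`. The appropriate limit depends on the driver's available memory and the size of each pair.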

Examples

>>> m = sc.parallelize([(1, 2), (3, 4)]).collectAsMap()
>>> m[1]
2
>>> m[3]
4
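Because the result is a dictionary, each key can appear only once. The sketch below uses plain Python rather than Spark to illustrate the effect, assuming the dictionary is built from the collected pairs in order, as `dict(pairs)` would build it. The `pairs` list is a stand-in for what `collect()` would return.

```python
# Stand-in for the list of pairs an RDD's collect() would return.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Building a dict from pairs keeps the last value seen for each key.
m = dict(pairs)

print(m["a"])  # 3
print(m["b"])  # 2
print(len(m))  # 2
```

If every value for a key must be kept, grouping the values per key before collecting (for example with `groupByKey`) may be a better fit.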