pyspark.RDD.countByValue

RDD.countByValue() → Dict[K, int]

Return the count of each unique value in this RDD as a dictionary of (value, count) pairs.

Examples

>>> sorted(sc.parallelize([1, 2, 1, 2, 2], 2).countByValue().items())
[(1, 2), (2, 3)]
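As an illustration of the semantics (not the distributed implementation), the result matches a plain `collections.Counter` over the same elements, since the counts are returned to the driver as an ordinary dictionary:

```python
from collections import Counter

data = [1, 2, 1, 2, 2]

# Locally, countByValue behaves like counting occurrences of each
# element and returning them as a dict of (value, count) pairs.
counts = dict(Counter(data))
print(sorted(counts.items()))  # [(1, 2), (2, 3)]
```

Because the full dictionary is materialized on the driver, this method is best suited to RDDs whose number of distinct values is small enough to fit in driver memory.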