pyspark.RDD.aggregate

RDD.aggregate(zeroValue, seqOp, combOp)
Aggregate the elements of each partition, and then the results for all the partitions, using the given combine functions and a neutral “zero value.”

Both functions, of the form op(t1, t2), are allowed to modify t1 and return it as the result value to avoid object allocation; however, they should not modify t2.

The first function (seqOp) can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into a U (seqOp) and one operation for merging two Us (combOp).

New in version 1.1.0.
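To make that contract concrete, here is a pure-Python model of what aggregate computes. This is a sketch only: PySpark runs seqOp on executors over real partitions and ships each one its own copy of zeroValue; the aggregate_model name and the explicit partitions argument are illustrative, not part of the API.

from copy import deepcopy
from functools import reduce

def aggregate_model(partitions, zero_value, seq_op, comb_op):
    # Fold each partition independently, each starting from its own copy
    # of zero_value: merging a T into a U via seq_op.
    partials = [reduce(seq_op, part, deepcopy(zero_value)) for part in partitions]
    # Then merge the per-partition results, again seeded with zero_value:
    # merging two Us via comb_op.
    return reduce(comb_op, partials, deepcopy(zero_value))

# Mirrors the doctest below, with [1, 2, 3, 4] split across two partitions.
seqOp = lambda x, y: (x[0] + y, x[1] + 1)
combOp = lambda x, y: (x[0] + y[0], x[1] + y[1])
print(aggregate_model([[1, 2], [3, 4]], (0, 0), seqOp, combOp))  # (10, 4)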
- Parameters
- zeroValue : U
- the initial value for the accumulated result of each partition
- seqOp : function
- a function used to accumulate results within a partition
- combOp : function
- an associative function used to combine the results from different partitions
 
- Returns
- U
- the aggregated result 
 
- Examples
>>> seqOp = (lambda x, y: (x[0] + y, x[1] + 1))
>>> combOp = (lambda x, y: (x[0] + y[0], x[1] + y[1]))
>>> sc.parallelize([1, 2, 3, 4]).aggregate((0, 0), seqOp, combOp)
(10, 4)
>>> sc.parallelize([]).aggregate((0, 0), seqOp, combOp)
(0, 0)
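Because both functions may modify and return their first argument, a mutable accumulator can avoid allocating a new tuple per element. A minimal end-to-end sketch (the local SparkContext setup and the list-based accumulator are illustrative choices, not part of the API above):

from pyspark import SparkContext

sc = SparkContext("local[2]", "aggregate-sketch")
rdd = sc.parallelize(range(1, 101), numSlices=4)

def seq_op(acc, x):
    # Mutating and returning the first argument is permitted by aggregate's
    # contract: acc[0] is the running sum, acc[1] the running count.
    acc[0] += x
    acc[1] += 1
    return acc

def comb_op(a, b):
    # Merge b's partition result into a; b itself is left unmodified.
    a[0] += b[0]
    a[1] += b[1]
    return a

# Each task deserializes its own copy of the [0, 0] zero value, so the
# in-place mutation above is safe.
total, count = rdd.aggregate([0, 0], seq_op, comb_op)
print(total / count)  # 50.5, the mean of 1..100
sc.stop()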