Package org.apache.spark.sql.expressions
Class Aggregator<IN,BUF,OUT>
Object
org.apache.spark.sql.expressions.Aggregator<IN,BUF,OUT>
- All Implemented Interfaces:
Serializable,org.apache.spark.sql.internal.UserDefinedFunctionLike
public abstract class Aggregator<IN,BUF,OUT>
extends Object
implements Serializable, org.apache.spark.sql.internal.UserDefinedFunctionLike
A base class for user-defined aggregations, which can be used in
Dataset operations to take
all of the elements of a group and reduce them to a single value.
For example, the following aggregator extracts an int from a specific class and adds them up:
case class Data(i: Int)
val customSummer = new Aggregator[Data, Int, Int] {
def zero: Int = 0
def reduce(b: Int, a: Data): Int = b + a.i
def merge(b1: Int, b2: Int): Int = b1 + b2
def finish(r: Int): Int = r
def bufferEncoder: Encoder[Int] = Encoders.scalaInt
def outputEncoder: Encoder[Int] = Encoders.scalaInt
}.toColumn()
val ds: Dataset[Data] = ...
val aggregated = ds.select(customSummer)
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
- Since:
- 1.6.0
- See Also:
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionSpecifies theEncoderfor the intermediate value type.abstract OUTTransform the output of the reduction.abstract BUFMerge two intermediate values.Specifies theEncoderfor the final output value type.abstract BUFCombine two values to produce a new value.toColumn()Returns thisAggregatoras aTypedColumnthat can be used inDatasetoperations.abstract BUFzero()A zero value for this aggregation.Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.spark.sql.internal.UserDefinedFunctionLike
name
-
Constructor Details
-
Aggregator
public Aggregator()
-
-
Method Details
-
bufferEncoder
Specifies theEncoderfor the intermediate value type.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
finish
Transform the output of the reduction.- Parameters:
reduction- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
merge
Merge two intermediate values.- Parameters:
b1- (undocumented)b2- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
outputEncoder
Specifies theEncoderfor the final output value type.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
reduce
Combine two values to produce a new value. For performance, the function may modifyband return it instead of constructing new object for b.- Parameters:
b- (undocumented)a- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
toColumn
Returns thisAggregatoras aTypedColumnthat can be used inDatasetoperations.- Returns:
- (undocumented)
- Since:
- 1.6.0
-
zero
A zero value for this aggregation. Should satisfy the property that any b + zero = b.- Returns:
- (undocumented)
- Since:
- 1.6.0
-