IN - The input type for the aggregation.
BUF - The type of the intermediate value of the reduction.
OUT - The type of the final output result.

public abstract class Aggregator<IN,BUF,OUT>
extends Object
implements scala.Serializable
A base class for user-defined aggregations, which can be used in Dataset operations to take
all of the elements of a group and reduce them to a single value.
For example, the following aggregator extracts an int
from each instance of a specific class and sums the extracted values:
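import org.apache.spark.sql.{Dataset, Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Data(i: Int)

val customSummer = new Aggregator[Data, Int, Int] {
  def zero: Int = 0                              // initial buffer value
  def reduce(b: Int, a: Data): Int = b + a.i     // fold one input into the buffer
  def merge(b1: Int, b2: Int): Int = b1 + b2     // combine two partial buffers
  def finish(r: Int): Int = r                    // the buffer is already the result
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}.toColumn                                       // toColumn is declared without parentheses

val ds: Dataset[Data] = ...
val aggregated = ds.select(customSummer)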
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
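
The example above uses Int for all three type parameters. The following is a minimal, Spark-free sketch of a case where IN, BUF and OUT differ: an average whose buffer holds a running sum and count. `MiniAggregator`, `Data`, `AVERAGE` and `aggregate` are hypothetical names introduced only for illustration; `MiniAggregator` mirrors the four non-encoder methods of this class but is not Spark's Aggregator, and the encoder methods are omitted.

```java
import java.util.Arrays;
import java.util.List;

public class AverageSketch {

    // Hypothetical stand-in for zero, reduce, merge and finish of Aggregator<IN,BUF,OUT>.
    // The encoder methods are left out because this sketch does not depend on Spark.
    interface MiniAggregator<IN, BUF, OUT> {
        BUF zero();
        BUF reduce(BUF b, IN a);
        BUF merge(BUF b1, BUF b2);
        OUT finish(BUF reduction);
    }

    // Hypothetical input record, analogous to Data in the example above.
    static final class Data {
        final int i;
        Data(int i) { this.i = i; }
    }

    // IN = Data, BUF = long[] holding {sum, count}, OUT = Double.
    static final MiniAggregator<Data, long[], Double> AVERAGE =
        new MiniAggregator<Data, long[], Double>() {
            @Override public long[] zero() {
                return new long[] {0L, 0L};   // a fresh buffer on every call
            }
            @Override public long[] reduce(long[] b, Data a) {
                b[0] += a.i;                  // running sum
                b[1] += 1;                    // running count
                return b;                     // buffer is updated in place and returned
            }
            @Override public long[] merge(long[] b1, long[] b2) {
                b1[0] += b2[0];
                b1[1] += b2[1];
                return b1;
            }
            @Override public Double finish(long[] reduction) {
                // An empty group has no average; returning NaN is this sketch's choice.
                return reduction[1] == 0 ? Double.NaN : (double) reduction[0] / reduction[1];
            }
        };

    // Folds one group sequentially: start from zero, reduce each element, then finish.
    static <IN, BUF, OUT> OUT aggregate(MiniAggregator<IN, BUF, OUT> agg, List<IN> group) {
        BUF buffer = agg.zero();
        for (IN element : group) {
            buffer = agg.reduce(buffer, element);
        }
        return agg.finish(buffer);
    }

    public static void main(String[] args) {
        List<Data> group = Arrays.asList(new Data(1), new Data(2), new Data(3), new Data(6));
        System.out.println(aggregate(AVERAGE, group)); // prints 3.0
    }
}
```

Keeping the sum and count in the buffer and dividing only in finish is what lets partial buffers be merged; an average of averages would generally give a different answer.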
Constructor and Description |
---|
Aggregator() |
Modifier and Type | Method and Description |
---|---|
abstract Encoder<BUF> | bufferEncoder(): Specifies the Encoder for the intermediate value type. |
abstract OUT | finish(BUF reduction): Transform the output of the reduction. |
abstract BUF | merge(BUF b1, BUF b2): Merge two intermediate values. |
abstract Encoder<OUT> | outputEncoder(): Specifies the Encoder for the final output value type. |
abstract BUF | reduce(BUF b, IN a): Combine two values to produce a new value. |
TypedColumn<IN,OUT> | toColumn(): Returns this Aggregator as a TypedColumn that can be used in Dataset operations. |
abstract BUF | zero(): A zero value for this aggregation. |
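
The methods summarized above can be read as one pipeline: start from zero, reduce each input into a buffer, merge partial buffers, then finish. The sketch below shows one plausible evaluation order over data split into partitions. It is an assumption made for illustration, not a description of how Spark actually schedules the work. `MiniAggregator`, `SUM` and `run` are hypothetical names; `MiniAggregator` is a Spark-free stand-in that omits the encoder methods.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionedSumSketch {

    // Hypothetical stand-in mirroring zero, reduce, merge and finish of Aggregator<IN,BUF,OUT>.
    abstract static class MiniAggregator<IN, BUF, OUT> {
        abstract BUF zero();
        abstract BUF reduce(BUF b, IN a);
        abstract BUF merge(BUF b1, BUF b2);
        abstract OUT finish(BUF reduction);
    }

    // Integer sum, with the same logic as the Scala example in the class description.
    static final MiniAggregator<Integer, Integer, Integer> SUM =
        new MiniAggregator<Integer, Integer, Integer>() {
            @Override Integer zero() { return 0; }
            @Override Integer reduce(Integer b, Integer a) { return b + a; }
            @Override Integer merge(Integer b1, Integer b2) { return b1 + b2; }
            @Override Integer finish(Integer reduction) { return reduction; }
        };

    // Reduces each partition to a partial buffer, merges the partial buffers,
    // and applies finish once to the merged buffer.
    static <IN, BUF, OUT> OUT run(MiniAggregator<IN, BUF, OUT> agg, List<List<IN>> partitions) {
        BUF merged = agg.zero();
        for (List<IN> partition : partitions) {
            BUF partial = agg.zero();
            for (IN a : partition) {
                partial = agg.reduce(partial, a);
            }
            merged = agg.merge(merged, partial);
        }
        return agg.finish(merged);
    }

    public static void main(String[] args) {
        List<List<Integer>> split =
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3), Arrays.<Integer>asList());
        List<List<Integer>> whole = Arrays.asList(Arrays.asList(1, 2, 3));
        System.out.println(run(SUM, split)); // prints 6
        System.out.println(run(SUM, whole)); // prints 6
    }
}
```

Both calls give the same result because merge is associative and zero acts as its identity here; an aggregator without those properties could produce results that depend on how the data happens to be split.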
public abstract Encoder<BUF> bufferEncoder()
Specifies the Encoder for the intermediate value type.

public abstract OUT finish(BUF reduction)
Transform the output of the reduction.
reduction - (undocumented)

public abstract BUF merge(BUF b1, BUF b2)
Merge two intermediate values.
b1 - (undocumented)
b2 - (undocumented)

public abstract Encoder<OUT> outputEncoder()
Specifies the Encoder for the final output value type.

public abstract BUF reduce(BUF b, IN a)
Combine two values to produce a new value. For performance, the function may modify b and
return it instead of constructing a new object for b.
b - (undocumented)
a - (undocumented)

public TypedColumn<IN,OUT> toColumn()
Returns this Aggregator as a TypedColumn that can be used in Dataset operations.

public abstract BUF zero()