package expressions
- Alphabetic
- Public
- All
Type Members
-
abstract
class
Aggregator[-IN, BUF, OUT] extends Serializable
A base class for user-defined aggregations, which can be used in
Dataset
operations to take all of the elements of a group and reduce them to a single value.A base class for user-defined aggregations, which can be used in
Dataset
operations to take all of the elements of a group and reduce them to a single value.For example, the following aggregator extracts an
int
from a specific class and adds them up:case class Data(i: Int) val customSummer = new Aggregator[Data, Int, Int] { def zero: Int = 0 def reduce(b: Int, a: Data): Int = b + a.i def merge(b1: Int, b2: Int): Int = b1 + b2 def finish(r: Int): Int = r def bufferEncoder: Encoder[Int] = Encoders.scalaInt def outputEncoder: Encoder[Int] = Encoders.scalaInt }.toColumn() val ds: Dataset[Data] = ... val aggregated = ds.select(customSummer)
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
- IN
The input type for the aggregation.
- BUF
The type of the intermediate value of the reduction.
- OUT
The type of the final output result.
- Since
1.6.0
-
abstract
class
MutableAggregationBuffer extends Row
A
Row
representing a mutable aggregation buffer.A
Row
representing a mutable aggregation buffer.This is not meant to be extended outside of Spark.
- Annotations
- @Stable()
- Since
1.5.0
-
sealed abstract
class
UserDefinedFunction extends AnyRef
A user-defined function.
A user-defined function. To create one, use the
udf
functions infunctions
.As an example:
// Define a UDF that returns true or false based on some numeric score. val predict = udf((score: Double) => score > 0.5) // Projects a column that adds a prediction column based on the score column. df.select( predict(df("score")) )
- Annotations
- @Stable()
- Since
1.3.0
-
class
Window extends AnyRef
Utility functions for defining window in DataFrames.
Utility functions for defining window in DataFrames.
// PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW Window.partitionBy("country").orderBy("date") .rowsBetween(Window.unboundedPreceding, Window.currentRow) // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
- Annotations
- @Stable()
- Since
1.4.0
-
class
WindowSpec extends AnyRef
A window specification that defines the partitioning, ordering, and frame boundaries.
A window specification that defines the partitioning, ordering, and frame boundaries.
Use the static methods in Window to create a WindowSpec.
- Annotations
- @Stable()
- Since
1.4.0
-
abstract
class
UserDefinedAggregateFunction extends Serializable
The base class for implementing user-defined aggregate functions (UDAF).
The base class for implementing user-defined aggregate functions (UDAF).
- Annotations
- @Stable() @deprecated
- Deprecated
(Since version 3.0.0)
- Since
1.5.0
Value Members
-
object
Window
Utility functions for defining window in DataFrames.
Utility functions for defining window in DataFrames.
// PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW Window.partitionBy("country").orderBy("date") .rowsBetween(Window.unboundedPreceding, Window.currentRow) // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
- Annotations
- @Stable()
- Since
1.4.0
- Note
When ordering is not defined, an unbounded window frame (rowFrame, unboundedPreceding, unboundedFollowing) is used by default. When ordering is defined, a growing window frame (rangeFrame, unboundedPreceding, currentRow) is used by default.