Packages

p

org.apache.spark.sql

expressions

package expressions

Ordering
  1. Alphabetic
Visibility
  1. Public
  2. All

Type Members

  1. abstract class Aggregator[-IN, BUF, OUT] extends Serializable

    A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

    A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

    For example, the following aggregator extracts an int from a specific class and adds them up:

    case class Data(i: Int)
    
    val customSummer =  new Aggregator[Data, Int, Int] {
      def zero: Int = 0
      def reduce(b: Int, a: Data): Int = b + a.i
      def merge(b1: Int, b2: Int): Int = b1 + b2
      def finish(r: Int): Int = r
      def bufferEncoder: Encoder[Int] = Encoders.scalaInt
      def outputEncoder: Encoder[Int] = Encoders.scalaInt
    }.toColumn()
    
    val ds: Dataset[Data] = ...
    val aggregated = ds.select(customSummer)

    Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird

    IN

    The input type for the aggregation.

    BUF

    The type of the intermediate value of the reduction.

    OUT

    The type of the final output result.

    Since

    1.6.0

  2. abstract class MutableAggregationBuffer extends Row

    A Row representing a mutable aggregation buffer.

    A Row representing a mutable aggregation buffer.

    This is not meant to be extended outside of Spark.

    Annotations
    @Stable()
    Since

    1.5.0

  3. sealed abstract class UserDefinedFunction extends AnyRef

    A user-defined function.

    A user-defined function. To create one, use the udf functions in functions.

    As an example:

    // Define a UDF that returns true or false based on some numeric score.
    val predict = udf((score: Double) => score > 0.5)
    
    // Projects a column that adds a prediction column based on the score column.
    df.select( predict(df("score")) )
    Annotations
    @Stable()
    Since

    1.3.0

  4. class Window extends AnyRef

    Utility functions for defining window in DataFrames.

    Utility functions for defining window in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    
    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
    Annotations
    @Stable()
    Since

    1.4.0

  5. class WindowSpec extends AnyRef

    A window specification that defines the partitioning, ordering, and frame boundaries.

    A window specification that defines the partitioning, ordering, and frame boundaries.

    Use the static methods in Window to create a WindowSpec.

    Annotations
    @Stable()
    Since

    1.4.0

  6. abstract class UserDefinedAggregateFunction extends Serializable

    The base class for implementing user-defined aggregate functions (UDAF).

    The base class for implementing user-defined aggregate functions (UDAF).

    Annotations
    @Stable() @deprecated
    Deprecated

    (Since version 3.0.0)

    Since

    1.5.0

Value Members

  1. object Window

    Utility functions for defining window in DataFrames.

    Utility functions for defining window in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    
    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
    Annotations
    @Stable()
    Since

    1.4.0

    Note

    When ordering is not defined, an unbounded window frame (rowFrame, unboundedPreceding, unboundedFollowing) is used by default. When ordering is defined, a growing window frame (rangeFrame, unboundedPreceding, currentRow) is used by default.

Ungrouped