Package

org.apache.spark.sql

expressions

Permalink

package expressions

Visibility
  1. Public
  2. All

Type Members

  1. abstract class Aggregator[-IN, BUF, OUT] extends Serializable

    Permalink

    :: Experimental :: A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

    :: Experimental :: A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

    For example, the following aggregator extracts an int from a specific class and adds them up:

    case class Data(i: Int)
    
    val customSummer =  new Aggregator[Data, Int, Int] {
      def zero: Int = 0
      def reduce(b: Int, a: Data): Int = b + a.i
      def merge(b1: Int, b2: Int): Int = b1 + b2
      def finish(r: Int): Int = r
    }.toColumn()
    
    val ds: Dataset[Data] = ...
    val aggregated = ds.select(customSummer)

    Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird

    IN

    The input type for the aggregation.

    BUF

    The type of the intermediate value of the reduction.

    OUT

    The type of the final output result.

    Annotations
    @Experimental() @Evolving()
    Since

    1.6.0

  2. abstract class MutableAggregationBuffer extends Row

    Permalink

    A Row representing a mutable aggregation buffer.

    A Row representing a mutable aggregation buffer.

    This is not meant to be extended outside of Spark.

    Annotations
    @Stable()
    Since

    1.5.0

  3. abstract class UserDefinedAggregateFunction extends Serializable

    Permalink

    The base class for implementing user-defined aggregate functions (UDAF).

    The base class for implementing user-defined aggregate functions (UDAF).

    Annotations
    @Stable()
    Since

    1.5.0

  4. case class UserDefinedFunction(f: AnyRef, dataType: DataType, inputTypes: Option[Seq[DataType]]) extends Product with Serializable

    Permalink

    A user-defined function.

    A user-defined function. To create one, use the udf functions in functions.

    As an example:

    // Defined a UDF that returns true or false based on some numeric score.
    val predict = udf((score: Double) => if (score > 0.5) true else false)
    
    // Projects a column that adds a prediction column based on the score column.
    df.select( predict(df("score")) )
    Annotations
    @Stable()
    Since

    1.3.0

    Note

    The user-defined functions must be deterministic. Due to optimization, duplicate invocations may be eliminated or the function may even be invoked more times than it is present in the query.

  5. class Window extends AnyRef

    Permalink

    Utility functions for defining window in DataFrames.

    Utility functions for defining window in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date").rowsBetween(Long.MinValue, 0)
    
    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
    Annotations
    @Stable()
    Since

    1.4.0

  6. class WindowSpec extends AnyRef

    Permalink

    A window specification that defines the partitioning, ordering, and frame boundaries.

    A window specification that defines the partitioning, ordering, and frame boundaries.

    Use the static methods in Window to create a WindowSpec.

    Annotations
    @Stable()
    Since

    1.4.0

Value Members

  1. object Window

    Permalink

    Utility functions for defining window in DataFrames.

    Utility functions for defining window in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    
    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
    Annotations
    @Stable()
    Since

    1.4.0

  2. package javalang

    Permalink
  3. package scalalang

    Permalink

Ungrouped