package expressions
Type Members
-   abstract class Aggregator[-IN, BUF, OUT] extends Serializable with UserDefinedFunctionLike
  A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.
  For example, the following aggregator extracts an int from a specific class and adds them up:

    case class Data(i: Int)

    val customSummer = new Aggregator[Data, Int, Int] {
      def zero: Int = 0
      def reduce(b: Int, a: Data): Int = b + a.i
      def merge(b1: Int, b2: Int): Int = b1 + b2
      def finish(r: Int): Int = r
      def bufferEncoder: Encoder[Int] = Encoders.scalaInt
      def outputEncoder: Encoder[Int] = Encoders.scalaInt
    }.toColumn

    val ds: Dataset[Data] = ...
    val aggregated = ds.select(customSummer)

  Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
  A grouped-aggregation sketch follows this entry.
- IN
- The input type for the aggregation. 
- BUF
- The type of the intermediate value of the reduction. 
- OUT
- The type of the final output result. 
 - Annotations
- @SerialVersionUID()
- Since
- 1.6.0 
 
-   abstract class MutableAggregationBuffer extends Row
  A Row representing a mutable aggregation buffer.
  This is not meant to be extended outside of Spark. A sketch of how a buffer is typically updated follows this entry.
- Annotations
- @Stable()
- Since
- 1.5.0 
 
-   sealed abstract class UserDefinedFunction extends UserDefinedFunctionLike
  A user-defined function. To create one, use the udf functions in functions. As an example:

    // Define a UDF that returns true or false based on some numeric score.
    val predict = udf((score: Double) => score > 0.5)

    // Projects a column that adds a prediction column based on the score column.
    df.select( predict(df("score")) )

  A naming and registration sketch follows this entry.
- Annotations
- @Stable()
- Since
- 1.3.0 
 
-    class Window extends AnyRef
  Utility functions for defining window in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)

  A sketch applying a window function over such a specification follows this entry.
- Annotations
- @Stable()
- Since
- 1.4.0 
 
-    class WindowSpec extends AnyRef
  A window specification that defines the partitioning, ordering, and frame boundaries. Use the static methods in Window to create a WindowSpec. A reuse sketch follows this entry.
- Annotations
- @Stable()
- Since
- 1.4.0 
 
Deprecated Type Members
-   abstract class UserDefinedAggregateFunction extends Serializable with UserDefinedFunctionLike
  The base class for implementing user-defined aggregate functions (UDAF). A migration sketch follows this entry.
- Annotations
- @Stable() @deprecated
- Deprecated
- (Since version 3.0.0) Aggregator[IN, BUF, OUT] should now be registered as a UDF via the functions.udaf(agg) method. 
- Since
- 1.5.0 
 
Value Members
-  object SparkUserDefinedFunction extends Serializable
-    object Window
  Utility functions for defining window in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
- Annotations
- @Stable()
- Since
- 1.4.0 
- Note
- When ordering is not defined, an unbounded window frame (rowFrame, unboundedPreceding, unboundedFollowing) is used by default. When ordering is defined, a growing window frame (rangeFrame, unboundedPreceding, currentRow) is used by default.
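  A minimal sketch, not part of the Scaladoc, illustrating the note above. It assumes a DataFrame df with country, date, and sales columns.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.sum

    // Ordering defined, frame left unspecified: the default growing frame
    // (rangeFrame, unboundedPreceding, currentRow) produces a running total per country.
    df.select(sum("sales").over(Window.partitionBy("country").orderBy("date")))

    // No ordering: the default unbounded frame sums the whole partition on every row.
    df.select(sum("sales").over(Window.partitionBy("country")))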