Spark 3.2.1 ScalaDoc - org.apache.spark.sql.functions

final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

def abs(e: Column): Column

Computes the absolute value of a numeric value.

Since: 1.3.0

def acos(columnName: String): Column

returns: inverse cosine of columnName, as if computed by java.lang.Math.acos

Since: 1.4.0

def acos(e: Column): Column

returns: inverse cosine of e in radians, as if computed by java.lang.Math.acos

Since: 1.4.0

def acosh(columnName: String): Column

returns: inverse hyperbolic cosine of columnName

Since: 3.1.0

def acosh(e: Column): Column

returns: inverse hyperbolic cosine of e

Since: 3.1.0

def add_months(startDate: Column, numMonths: Column): Column

Returns the date that is numMonths after startDate.

startDate: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
numMonths: A column of the number of months to add to startDate, can be negative to subtract months
returns: A date, or null if startDate was a string that could not be cast to a date

Since: 3.0.0

def add_months(startDate: Column, numMonths: Int): Column

Returns the date that is numMonths after startDate.

startDate: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
numMonths: The number of months to add to startDate, can be negative to subtract months
returns: A date, or null if startDate was a string that could not be cast to a date

Since: 1.5.0
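
The two add_months overloads can be sketched as follows. This is a minimal illustration, assuming a local SparkSession; the session settings, the sample values, and the column names (start, plus_one, minus_one) are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, col, lit}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("add-months-sketch").getOrCreate()
import spark.implicits._

// Made-up data: two parseable date strings and one that cannot be cast to a date.
val df = Seq("2020-01-31", "2021-03-15", "not a date").toDF("start")

val shifted = df.select(
  add_months(col("start"), 1).cast("string").as("plus_one"),        // Int overload
  add_months(col("start"), lit(-1)).cast("string").as("minus_one")  // Column overload
)

// An unparseable start value yields null, which Option(...) turns into None.
val rows = shifted.collect()
val plusOne = rows.map(r => Option(r.getString(0))).toSeq
val minusOne = rows.map(r => Option(r.getString(1))).toSeq
```

The results are cast to string only to make them easy to inspect; the functions themselves return date columns.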

def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column): Column

Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.

df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x))

expr: the input array column
initialValue: the initial value
merge: (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value

Since: 3.0.0

def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column): Column

Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.

df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10))

expr: the input array column
initialValue: the initial value
merge: (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
finish: combined_value => final_value, the lambda function to convert the combined value of all inputs to final result

Since: 3.0.0
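
A small sketch of both aggregate overloads, assuming a local SparkSession and a made-up integer array column named i. It sums each array, then shows the finish function applied to the final state.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{aggregate, col, lit}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("aggregate-sketch").getOrCreate()
import spark.implicits._

// Made-up data: one integer array per row.
val df = Seq(Seq(1, 2, 3), Seq(10, 20)).toDF("i")

val result = df.select(
  // merge only: fold the array into a running sum starting from 0
  aggregate(col("i"), lit(0), (acc, x) => acc + x).as("sum"),
  // merge plus finish: multiply the final sum by 10
  aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10).as("sum_times_ten")
)

val sums = result.collect().map(r => (r.getInt(0), r.getInt(1))).toSeq
```

The initial value and the result of merge need compatible types, which is why the sketch uses an integer literal with an integer array.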

def approx_count_distinct(columnName: String, rsd: Double): Column

Aggregate function: returns the approximate number of distinct items in a group.

rsd: maximum relative standard deviation allowed (default = 0.05)

Since: 2.1.0

def approx_count_distinct(e: Column, rsd: Double): Column

Aggregate function: returns the approximate number of distinct items in a group.

rsd: maximum relative standard deviation allowed (default = 0.05)

Since: 2.1.0

def approx_count_distinct(columnName: String): Column

Aggregate function: returns the approximate number of distinct items in a group.

Since: 2.1.0

def approx_count_distinct(e: Column): Column

Aggregate function: returns the approximate number of distinct items in a group.

Since: 2.1.0

def array(colName: String, colNames: String*): Column

Creates a new array column. The input columns must all have the same data type.

Annotations: @varargs()
Since: 1.4.0

def array(cols: Column*): Column

Creates a new array column. The input columns must all have the same data type.

Annotations: @varargs()
Since: 1.4.0

def array_contains(column: Column, value: Any): Column

Returns null if the array is null, true if the array contains value, and false otherwise.

Since: 1.5.0

def array_distinct(e: Column): Column

Removes duplicate values from the array.

Since: 2.4.0

def array_except(col1: Column, col2: Column): Column

Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined.

Since: 2.4.0

def array_intersect(col1: Column, col2: Column): Column

Returns an array of the elements in the intersection of the given two arrays, without duplicates.

Since: 2.4.0

def array_join(column: Column, delimiter: String): Column

Concatenates the elements of column using the delimiter.

Since: 2.4.0

def array_join(column: Column, delimiter: String, nullReplacement: String): Column

Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.

Since: 2.4.0
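
The effect of nullReplacement can be sketched as below, assuming a local SparkSession. The array literal and the column aliases are made up; without a replacement, Spark omits null elements from the joined string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_join, lit}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("array-join-sketch").getOrCreate()

// Made-up array ["a", null, "c"] built from literals.
val arr = array(lit("a"), lit(null).cast("string"), lit("c"))

val joined = spark.range(1).select(
  array_join(arr, "-").as("nulls_skipped"),
  array_join(arr, "-", "?").as("nulls_replaced")
).head()

val skipped = joined.getString(0)
val replaced = joined.getString(1)
```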

def array_max(e: Column): Column

Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

Since: 2.4.0

def array_min(e: Column): Column

Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

Since: 2.4.0

def array_position(column: Column, value: Any): Column

Locates the position of the first occurrence of the value in the given array and returns it as a long. Returns null if either of the arguments is null.

Since: 2.4.0
Note: The position is 1-based, not 0-based. Returns 0 if value is not found in the array.
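
The indexing convention in the note can be checked with a short sketch, assuming a local SparkSession; the column name letters and its contents are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_position, col}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("array-position-sketch").getOrCreate()
import spark.implicits._

// Made-up data: "b" occurs at positions 2 and 4 (1-based).
val df = Seq(Seq("a", "b", "c", "b")).toDF("letters")

val positions = df.select(
  array_position(col("letters"), "b").as("found"),
  array_position(col("letters"), "z").as("missing")
).head()

val found = positions.getLong(0)    // first occurrence, counted from 1
val missing = positions.getLong(1)  // 0 signals "not found"
```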

def array_remove(column: Column, element: Any): Column

Removes all elements equal to element from the given array.

Since: 2.4.0

def array_repeat(e: Column, count: Int): Column

Creates an array containing the left argument repeated the number of times given by the right argument.

Since: 2.4.0

def array_repeat(left: Column, right: Column): Column

Creates an array containing the left argument repeated the number of times given by the right argument.

Since: 2.4.0

def array_sort(e: Column): Column

Sorts the input array in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array.

Since: 2.4.0

def array_union(col1: Column, col2: Column): Column

Returns an array of the elements in the union of the given two arrays, without duplicates.

Since: 2.4.0

def arrays_overlap(a1: Column, a2: Column): Column

Returns true if a1 and a2 have at least one non-null element in common. If they have no common element, both arrays are non-empty, and either of them contains a null, it returns null. It returns false otherwise.

Since: 2.4.0
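
The three possible outcomes of arrays_overlap (true, null, false) can be sketched as follows, assuming a local SparkSession. The arrays are made-up literals and the aliases are only for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, arrays_overlap, lit}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("arrays-overlap-sketch").getOrCreate()

val overlap = spark.range(1).select(
  // share the element 2
  arrays_overlap(array(lit(1), lit(2)), array(lit(2), lit(3))).as("common"),
  // no common element, both non-empty, one contains a null
  arrays_overlap(array(lit(1), lit(null).cast("int")), array(lit(2))).as("unknown"),
  // no common element and no nulls
  arrays_overlap(array(lit(1)), array(lit(2))).as("disjoint")
).head()

val common = overlap.getBoolean(0)
val unknownIsNull = overlap.isNullAt(1)
val disjoint = overlap.getBoolean(2)
```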

def arrays_zip(e: Column*): Column

Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Annotations: @varargs()
Since: 2.4.0

final def asInstanceOf[T0]: T0

Definition Classes: Any

def asc(columnName: String): Column

Returns a sort expression based on ascending order of the column.

df.sort(asc("dept"), desc("age"))

Since: 1.3.0

def asc_nulls_first(columnName: String): Column

Returns a sort expression based on ascending order of the column, and null values appear before non-null values.

df.sort(asc_nulls_first("dept"), desc("age"))

Since: 2.1.0

def asc_nulls_last(columnName: String): Column

Returns a sort expression based on ascending order of the column, and null values appear after non-null values.

df.sort(asc_nulls_last("dept"), desc("age"))

Since: 2.1.0

def ascii(e: Column): Column

Computes the numeric value of the first character of the string column, and returns the result as an int column.

Since: 1.5.0

def asin(columnName: String): Column

returns: inverse sine of columnName, as if computed by java.lang.Math.asin

Since: 1.4.0

def asin(e: Column): Column

returns: inverse sine of e in radians, as if computed by java.lang.Math.asin

Since: 1.4.0

def asinh(columnName: String): Column

returns: inverse hyperbolic sine of columnName

Since: 3.1.0

def asinh(e: Column): Column

returns: inverse hyperbolic sine of e

Since: 3.1.0

def assert_true(c: Column, e: Column): Column

Returns null if the condition is true; throws an exception with the error message otherwise.

Since: 3.1.0

def assert_true(c: Column): Column

Returns null if the condition is true, and throws an exception otherwise.

Since: 3.1.0

def atan(columnName: String): Column

returns: inverse tangent of columnName, as if computed by java.lang.Math.atan

Since: 1.4.0

def atan(e: Column): Column

returns: inverse tangent of e, as if computed by java.lang.Math.atan

Since: 1.4.0

def atan2(yValue: Double, xName: String): Column

yValue: coordinate on y-axis
xName: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(yValue: Double, x: Column): Column

yValue: coordinate on y-axis
x: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(yName: String, xValue: Double): Column

yName: coordinate on y-axis
xValue: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(y: Column, xValue: Double): Column

y: coordinate on y-axis
xValue: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(yName: String, xName: String): Column

yName: coordinate on y-axis
xName: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(yName: String, x: Column): Column

yName: coordinate on y-axis
x: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(y: Column, xName: String): Column

y: coordinate on y-axis
xName: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atan2(y: Column, x: Column): Column

y: coordinate on y-axis
x: coordinate on x-axis
returns: the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2

Since: 1.4.0

def atanh(columnName: String): Column

returns: inverse hyperbolic tangent of columnName

Since: 3.1.0

def atanh(e: Column): Column

returns: inverse hyperbolic tangent of e

Since: 3.1.0

def avg(columnName: String): Column

Aggregate function: returns the average of the values in a group.

Since: 1.3.0

def avg(e: Column): Column

Aggregate function: returns the average of the values in a group.

Since: 1.3.0

def base64(e: Column): Column

Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.

Since: 1.5.0

def bin(columnName: String): Column

An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

Since: 1.5.0

def bin(e: Column): Column

An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

Since: 1.5.0

def bitwise_not(e: Column): Column

Computes bitwise NOT (~) of a number.

Since: 3.2.0

def broadcast[T](df: Dataset[T]): Dataset[T]

Marks a DataFrame as small enough for use in broadcast joins.

The following example marks the right DataFrame for broadcast hash join using joinKey.

// left and right are DataFrames
left.join(broadcast(right), "joinKey")

Since: 1.5.0

def bround(e: Column, scale: Int): Column

Rounds the value of e to scale decimal places using HALF_EVEN rounding mode if scale is greater than or equal to 0, or at the integral part if scale is less than 0.

Since: 2.0.0

def bround(e: Column): Column

Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.

Since: 2.0.0
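
HALF_EVEN rounding differs from the HALF_UP mode used by round only for values exactly halfway between two neighbours. The sketch below assumes a local SparkSession and a made-up double column x; the inputs are chosen so that each is exactly representable as a double.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bround, col, round}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("bround-sketch").getOrCreate()
import spark.implicits._

// Made-up data: values that sit exactly halfway between two integers.
val df = Seq(0.5, 1.5, 2.5, 3.5).toDF("x")

val rounded = df.select(
  bround(col("x")).as("half_even"),  // ties go to the nearest even integer
  round(col("x")).as("half_up")      // ties go away from zero
).collect()

val halfEven = rounded.map(_.getDouble(0)).toSeq
val halfUp = rounded.map(_.getDouble(1)).toSeq
```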

def bucket(numBuckets: Int, e: Column): Column

A transform for any type that partitions by a hash of the input column.

Since: 3.0.0

def bucket(numBuckets: Column, e: Column): Column

A transform for any type that partitions by a hash of the input column.

Since: 3.0.0

def call_udf(udfName: String, cols: Column*): Column

Calls a user-defined function. Example:

import org.apache.spark.sql._
import org.apache.spark.sql.functions.call_udf

// toDF and $"..." need the session implicits (pre-imported in spark-shell)
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val spark = df.sparkSession
spark.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", call_udf("simpleUDF", $"value"))

Annotations: @varargs()
Since: 3.2.0

def cbrt(columnName: String): Column

Computes the cube-root of the given column.

Since: 1.4.0

def cbrt(e: Column): Column

Computes the cube-root of the given value.

Since: 1.4.0

def ceil(columnName: String): Column

Computes the ceiling of the given column.

Since: 1.4.0

def ceil(e: Column): Column

Computes the ceiling of the given value.

Since: 1.4.0

def clone(): AnyRef

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

def coalesce(e: Column*): Column

Returns the first column that is not null, or null if all inputs are null.

For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.

Annotations: @varargs()
Since: 1.3.0
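
The coalesce(a, b, c) cases described above can be sketched as follows, assuming a local SparkSession. The column names a, b, c and the sample rows are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

// Made-up data: None becomes a SQL null in the resulting DataFrame.
val df = Seq[(Option[String], Option[String], Option[String])](
  (Some("a"), Some("b"), Some("c")),
  (None, Some("b"), Some("c")),
  (None, None, Some("c")),
  (None, None, None)
).toDF("a", "b", "c")

val firstNonNull = df
  .select(coalesce(col("a"), col("b"), col("c")).as("first_non_null"))
  .collect()
  .map(r => Option(r.getString(0)))  // null result becomes None
  .toSeq
```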

def col(colName: String): Column

Returns a Column based on the given column name.

Since: 1.3.0

def collect_list(columnName: String): Column

Aggregate function: returns a list of objects with duplicates.

Since: 1.6.0
Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

def collect_list(e: Column): Column

Aggregate function: returns a list of objects with duplicates.

Since: 1.6.0
Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

def collect_set(columnName: String): Column

Aggregate function: returns a set of objects with duplicate elements eliminated.

Since: 1.6.0
Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

def collect_set(e: Column): Column

Aggregate function: returns a set of objects with duplicate elements eliminated.

Since: 1.6.0
Note: The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

def column(colName: String): Column

Returns a Column based on the given column name. Alias of col.

Since: 1.3.0

def concat(exprs: Column*): Column

Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.

Annotations: @varargs()
Since: 1.5.0

def concat_ws(sep: String, exprs: Column*): Column

Concatenates multiple input string columns together into a single string column, using the given separator.

Annotations: @varargs()
Since: 1.5.0

def conv(num: Column, fromBase: Int, toBase: Int): Column

Converts a number in a string column from one base to another.

Since: 1.5.0

def corr(columnName1: String, columnName2: String): Column

Aggregate function: returns the Pearson Correlation Coefficient for two columns.

Since: 1.6.0

def corr(column1: Column, column2: Column): Column

Aggregate function: returns the Pearson Correlation Coefficient for two columns.

Since: 1.6.0

def cos(columnName: String): Column

columnName: angle in radians
returns: cosine of the angle, as if computed by java.lang.Math.cos

Since: 1.4.0

def cos(e: Column): Column

e: angle in radians
returns: cosine of the angle, as if computed by java.lang.Math.cos

Since: 1.4.0

def cosh(columnName: String): Column

columnName: hyperbolic angle
returns: hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh

Since: 1.4.0

def cosh(e: Column): Column

e: hyperbolic angle
returns: hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh

Since: 1.4.0

def count(columnName: String): TypedColumn[Any, Long]

Aggregate function: returns the number of items in a group.

Since: 1.3.0

def count(e: Column): Column

Aggregate function: returns the number of items in a group.

Since: 1.3.0

def countDistinct(columnName: String, columnNames: String*): Column

Aggregate function: returns the number of distinct items in a group.

An alias of count_distinct; using count_distinct directly is encouraged.

Annotations: @varargs()
Since: 1.3.0

def countDistinct(expr: Column, exprs: Column*): Column

Aggregate function: returns the number of distinct items in a group.

An alias of count_distinct; using count_distinct directly is encouraged.

Annotations: @varargs()
Since: 1.3.0

def count_distinct(expr: Column, exprs: Column*): Column

Aggregate function: returns the number of distinct items in a group.

Annotations: @varargs()
Since: 3.2.0

def covar_pop(columnName1: String, columnName2: String): Column

Aggregate function: returns the population covariance for two columns.

Since: 2.0.0

def covar_pop(column1: Column, column2: Column): Column

Aggregate function: returns the population covariance for two columns.

Since: 2.0.0

def covar_samp(columnName1: String, columnName2: String): Column

Aggregate function: returns the sample covariance for two columns.

Since: 2.0.0

def covar_samp(column1: Column, column2: Column): Column

Aggregate function: returns the sample covariance for two columns.

Since: 2.0.0

def crc32(e: Column): Column

Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.

Since: 1.5.0

def cume_dist(): Column

Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row.

N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N

Since: 1.6.0
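
The formula above can be checked on a tiny partition. This sketch assumes a local SparkSession; the grp and x columns and their values are made up. With N = 4 rows, tied values share the same result because the count includes every row up to and including the ties.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, cume_dist}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("cume-dist-sketch").getOrCreate()
import spark.implicits._

// Made-up data: a single partition "g" with a tie on the value 2.
val df = Seq(("g", 1), ("g", 2), ("g", 2), ("g", 3)).toDF("grp", "x")

val w = Window.partitionBy("grp").orderBy("x")

val dist = df
  .select(col("x"), cume_dist().over(w).as("cume_dist"))
  .orderBy("x")
  .collect()
  .map(_.getDouble(1))
  .toSeq
// x = 1 -> 1/4, x = 2 -> 3/4 (both tied rows), x = 3 -> 4/4
```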

def current_date(): Column

Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.

Since: 1.5.0

def current_timestamp(): Column

Returns the current timestamp at the start of query evaluation as a timestamp column. All calls of current_timestamp within the same query return the same value.

Since: 1.5.0

def date_add(start: Column, days: Column): Column

Returns the date that is days days after start.

start: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days: A column of the number of days to add to start, can be negative to subtract days
returns: A date, or null if start was a string that could not be cast to a date

Since: 3.0.0

def date_add(start: Column, days: Int): Column

Returns the date that is days days after start.

start: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days: The number of days to add to start, can be negative to subtract days
returns: A date, or null if start was a string that could not be cast to a date

Since: 1.5.0

def date_format(dateExpr: Column, format: String): Column

Converts a date/timestamp/string to a string in the date format given by the second argument.

See Datetime Patterns for valid date and time format patterns.

dateExpr: A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
format: A date format pattern; for example, dd.MM.yyyy would return a string like 18.03.1993
returns: A string, or null if dateExpr was a string that could not be cast to a timestamp

Since: 1.5.0
Exceptions thrown: IllegalArgumentException if the format pattern is invalid
Note: Use specialized functions like year whenever possible as they benefit from a specialized implementation.

def date_sub(start: Column, days: Column): Column

Returns the date that is days days before start.

start: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days: A column of the number of days to subtract from start, can be negative to add days
returns: A date, or null if start was a string that could not be cast to a date

Since: 3.0.0

def date_sub(start: Column, days: Int): Column

Returns the date that is days days before start.

start: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days: The number of days to subtract from start, can be negative to add days
returns: A date, or null if start was a string that could not be cast to a date

Since: 1.5.0

def date_trunc(format: String, timestamp: Column): Column

Returns timestamp truncated to the unit specified by the format.

For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00

timestamp: A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
returns: A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value

Since: 2.3.0
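
A minimal sketch of date_trunc, assuming a local SparkSession pinned to UTC so the printed values do not depend on the machine's time zone. The aliases and the invalid unit name not_a_unit are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{date_trunc, lit}

// Hypothetical local session, for illustration only; UTC keeps the output stable.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("date-trunc-sketch")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()

// A string literal that can be cast to a timestamp.
val ts = lit("2018-11-19 12:01:19")

val truncated = spark.range(1).select(
  date_trunc("year", ts).cast("string").as("year"),
  date_trunc("month", ts).cast("string").as("month"),
  date_trunc("not_a_unit", ts).cast("string").as("invalid")  // invalid format gives null
).head()

val toYear = truncated.getString(0)
val toMonth = truncated.getString(1)
val invalidIsNull = truncated.isNullAt(2)
```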

def datediff(end: Column, start: Column): Column

Returns the number of days from start to end.

Only considers the date part of the input. For example:

datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59")
// returns 1

end: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start: A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
returns: An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start

Since: 1.5.0
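
In the Scala API the arguments are columns, so string values are wrapped in lit. The sketch below assumes a local SparkSession; the aliases are made up. It shows that only the date part counts and that the result is negative when end is before start.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{datediff, lit}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("datediff-sketch").getOrCreate()

val diffs = spark.range(1).select(
  // one second apart, but on consecutive calendar days
  datediff(lit("2018-01-10 00:00:00"), lit("2018-01-09 23:59:59")).as("one_second_apart"),
  // end is before start
  datediff(lit("2018-01-09"), lit("2018-01-10")).as("end_before_start")
).head()

val oneSecondApart = diffs.getInt(0)
val endBeforeStart = diffs.getInt(1)
```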

def dayofmonth(e: Column): Column

Extracts the day of the month as an integer from a given date/timestamp/string.

returns: An integer, or null if the input was a string that could not be cast to a date

Since: 1.5.0

def dayofweek(e: Column): Column

Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday.

returns: An integer, or null if the input was a string that could not be cast to a date

Since: 2.3.0

def dayofyear(e: Column): Column

Extracts the day of the year as an integer from a given date/timestamp/string.

returns: An integer, or null if the input was a string that could not be cast to a date

Since: 1.5.0

def days(e: Column): Column

A transform for timestamps and dates to partition data into days.

Since: 3.0.0

def decode(value: Column, charset: String): Column

Decodes the first argument from a binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.

Since: 1.5.0

def degrees(columnName: String): Column

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

columnName: angle in radians
returns: angle in degrees, as if computed by java.lang.Math.toDegrees

Since: 2.1.0

def degrees(e: Column): Column

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

e: angle in radians
returns: angle in degrees, as if computed by java.lang.Math.toDegrees

Since: 2.1.0

def dense_rank(): Column

Window function: returns the rank of rows within a window partition, without any gaps.

The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and three people tied for second place, all three would be in second place and the next person would come in third. rank, by contrast, would leave a gap: the person who came in after the ties would register as coming in fifth.

This is equivalent to the DENSE_RANK function in SQL.

Since: 1.6.0
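
The competition example above, with three people tied for second place, can be sketched as follows. It assumes a local SparkSession; the contest, name, and score columns and their values are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank, rank}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("dense-rank-sketch").getOrCreate()
import spark.implicits._

// Made-up data: bob, cat and dan tie for second place.
val df = Seq(
  ("cup", "ann", 100), ("cup", "bob", 90), ("cup", "cat", 90),
  ("cup", "dan", 90), ("cup", "eve", 80)
).toDF("contest", "name", "score")

val w = Window.partitionBy("contest").orderBy(col("score").desc)

val ranked = df
  .select(col("name"), rank().over(w).as("rank"), dense_rank().over(w).as("dense_rank"))
  .orderBy("name")
  .collect()
  .map(r => (r.getString(0), r.getInt(1), r.getInt(2)))
  .toSeq
// eve follows the three-way tie: rank gives 5, dense_rank gives 3
```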

def desc(columnName: String): Column

Returns a sort expression based on the descending order of the column.

df.sort(asc("dept"), desc("age"))

Since: 1.3.0

def desc_nulls_first(columnName: String): Column

Returns a sort expression based on the descending order of the column, and null values appear before non-null values.

df.sort(asc("dept"), desc_nulls_first("age"))

Since: 2.1.0

def desc_nulls_last(columnName: String): Column

Returns a sort expression based on the descending order of the column, and null values appear after non-null values.

df.sort(asc("dept"), desc_nulls_last("age"))

Since: 2.1.0

def element_at(column: Column, value: Any): Column

Returns the element of the array at the index given by value if column is an array. Returns the value for the key given by value if column is a map.

Since: 2.4.0
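
A short sketch of element_at on an array and on a map, assuming a local SparkSession. The column names arr and m and their contents are made up. Array indices passed to element_at start at 1.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("element-at-sketch").getOrCreate()
import spark.implicits._

// Made-up data: one array column and one map column.
val df = Seq((Seq("a", "b", "c"), Map("k1" -> 1, "k2" -> 2))).toDF("arr", "m")

val picked = df.select(
  element_at(col("arr"), 1).as("first_element"),  // array lookup by 1-based index
  element_at(col("m"), "k2").as("value_for_k2")   // map lookup by key
).head()

val firstElement = picked.getString(0)
val valueForK2 = picked.getInt(1)
```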

def encode(value: Column, charset: String): Column

Encodes the first argument from a string into a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.

Since: 1.5.0

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def equals(arg0: Any): Boolean

Definition Classes: AnyRef → Any

def exists(column: Column, f: (Column) ⇒ Column): Column

Returns whether a predicate holds for one or more elements in the array.

df.select(exists(col("i"), _ % 2 === 0))

column: the input array column
f: col => predicate, the Boolean predicate to check the input column

Since: 3.0.0

def exp(columnName: String): Column

Computes the exponential of the given column.

Since: 1.4.0

def exp(e: Column): Column

Computes the exponential of the given value.

Since: 1.4.0

def explode(e: Column): Column

Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.

Since: 1.3.0

def explode_outer(e: Column): Column

Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike explode, if the array/map is null or empty, then null is produced.

Since: 2.2.0
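
The difference between explode and explode_outer shows up on rows whose array is empty. This sketch assumes a local SparkSession; the id and letters columns and their values are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, explode_outer}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// Made-up data: id 2 has an empty array.
val df = Seq((1, Seq("a", "b")), (2, Seq.empty[String])).toDF("id", "letters")

// explode drops the row with the empty array.
val exploded = df
  .select(col("id"), explode(col("letters")).as("letter"))
  .collect()
  .map(r => (r.getInt(0), Option(r.getString(1))))
  .toSeq

// explode_outer keeps it and produces a null element instead.
val explodedOuter = df
  .select(col("id"), explode_outer(col("letters")).as("letter"))
  .collect()
  .map(r => (r.getInt(0), Option(r.getString(1))))
  .toSeq
```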

def expm1(columnName: String): Column

Computes the exponential of the given column minus one.

Since: 1.4.0

def expm1(e: Column): Column

Computes the exponential of the given value minus one.

Since: 1.4.0

def expr(expr: String): Column

Parses the expression string into the column that it represents, similar to Dataset#selectExpr.

// get the number of words of each length
df.groupBy(expr("length(word)")).count()

def factorial(e: Column): Column

Computes the factorial of the given value.

Since: 1.5.0

def filter(column: Column, f: (Column, Column) ⇒ Column): Column

Returns an array of elements for which a predicate holds in a given array.

df.select(filter(col("s"), (x, i) => i % 2 === 0))

column: the input array column
f: (col, index) => predicate, the Boolean predicate to filter the input column given the index. Indices start at 0.

Since: 3.0.0

def filter(column: Column, f: (Column) ⇒ Column): Column

Returns an array of elements for which a predicate holds in a given array.

df.select(filter(col("s"), x => x % 2 === 0))

column: the input array column
f: col => predicate, the Boolean predicate to filter the input column

Since: 3.0.0
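
The two filter overloads select on different things: one on the element value, the other on the element together with its 0-based index. A minimal sketch, assuming a local SparkSession and a made-up integer array column s:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, filter}

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("filter-sketch").getOrCreate()
import spark.implicits._

// Made-up data: values chosen so the two predicates give different results.
val df = Seq(Seq(1, 2, 4, 7)).toDF("s")

val filtered = df.select(
  filter(col("s"), x => x % 2 === 0).as("even_values"),       // keeps 2 and 4
  filter(col("s"), (x, i) => i % 2 === 0).as("even_indices")  // keeps indices 0 and 2
).head()

val evenValues = filtered.getSeq[Int](0)
val evenIndices = filtered.getSeq[Int](1)
```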

def finalize(): Unit

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )

def first(columnName: String): Column

Aggregate function: returns the first value of a column in a group.

By default, the function returns the first value it sees. It returns the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

Since: 1.3.0
Note: The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.

def first(e: Column): Column

Aggregate function: returns the first value in a group.

By default, the function returns the first value it sees. It returns the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

Since: 1.3.0
Note: The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.

def first(columnName: String, ignoreNulls: Boolean): Column

Aggregate function: returns the first value of a column in a group.

By default, the function returns the first value it sees. It returns the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

Since: 2.0.0
Note: The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.

def first(e: Column, ignoreNulls: Boolean): Column

Aggregate function: returns the first value in a group.

By default, the function returns the first value it sees. It returns the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

Since: 2.0.0
Note: The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
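
The effect of ignoreNulls can be modelled without Spark. The sketch below is a plain-Scala illustration, not Spark's implementation: it treats a group as an ordered sequence and uses Option to stand in for nullable values. The object and method names are hypothetical.

```scala
object FirstSketch {
  // None stands in for SQL NULL; the order of `values` is the order in which rows are seen.
  def first[A](values: Seq[Option[A]], ignoreNulls: Boolean = false): Option[A] =
    if (ignoreNulls) values.flatten.headOption // first non-null value, or None if all are null
    else values.headOption.flatten             // first value as seen, even if it is null
}
```

Because the result depends entirely on the order of the input sequence, reordering the rows (as a shuffle may do in Spark) can change the answer.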

def flatten(e: Column): Column

Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.

Since: 2.4.0
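
The "only one level of nesting is removed" rule behaves like flatten on Scala collections. The following plain-Scala sketch uses ordinary Seq values as an analogy rather than Spark columns; the object name and sample data are hypothetical.

```scala
object FlattenSketch {
  // Three levels of nesting (hypothetical sample data).
  val nested: Seq[Seq[Seq[Int]]] = Seq(Seq(Seq(1, 2), Seq(3)), Seq(Seq(4)))

  // One call removes exactly one level; the inner sequences stay intact.
  val flattenedOnce: Seq[Seq[Int]] = nested.flatten

  // A second call is needed to reach a flat sequence.
  val flattenedTwice: Seq[Int] = flattenedOnce.flatten
}
```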

def floor(columnName: String): Column

Computes the floor of the given column.

Since: 1.4.0

def floor(e: Column): Column

Computes the floor of the given value.

Since: 1.4.0

def forall(column: Column, f: (Column) ⇒ Column): Column

Returns whether a predicate holds for every element in the array.

df.select(forall(col("i"), x => x % 2 === 0))

column: the input array column
f: col => predicate, the Boolean predicate to check the input column

Since: 3.0.0

def format_number(x: Column, d: Int): Column

Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.

If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.

Since: 1.5.0
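
The formatting rules above can be approximated on the plain JVM with java.text.DecimalFormat, whose default rounding mode is HALF_EVEN. This is a sketch under stated assumptions, not Spark's implementation: the object and method names are hypothetical, Locale.US is assumed for the separators, and None stands in for a null result.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

object FormatNumberSketch {
  // Approximates format_number(x, d); returns None where the documentation says null.
  def formatNumber(x: Double, d: Int): Option[String] =
    if (d < 0) None
    else {
      // d == 0 yields a pattern with no decimal point or fractional part.
      val pattern = if (d == 0) "#,##0" else "#,##0." + ("0" * d)
      val fmt = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US))
      // DecimalFormat rounds HALF_EVEN unless configured otherwise.
      Some(fmt.format(x))
    }
}
```

With HALF_EVEN rounding, a tie such as 2.5 at zero decimal places goes to the even neighbour 2, while 3.5 goes to 4.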

def format_string(format: String, arguments: Column*): Column

Formats the arguments in printf-style and returns the result as a string column.

Annotations: @varargs()
Since: 1.5.0

def from_csv(e: Column, schema: Column, options: java.util.Map[String, String]): Column

(Java-specific) Parses a column containing a CSV string into a StructType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing CSV data.
schema: the schema to use when parsing the CSV string
options: options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.

Since: 3.0.0

def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column

Parses a column containing a CSV string into a StructType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing CSV data.
schema: the schema to use when parsing the CSV string
options: options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.

Since: 3.0.0
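
As an illustration of the StructType overload of from_csv, the sketch below parses semicolon-separated strings into a two-field struct. It assumes Spark 3.x on the classpath and a local session; the object name, column names, schema, and sample rows are hypothetical, and "sep" is the CSV data source option for the field separator.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_csv}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object FromCsvSketch {
  // Hypothetical schema for the sample lines below.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType)
  ))

  def parse(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("1;alice", "2;bob").toDF("line")
    // Parse each line into a struct, then expand the struct fields into columns.
    df.select(from_csv(col("line"), schema, Map("sep" -> ";")).as("rec"))
      .select("rec.id", "rec.name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("from-csv-sketch").getOrCreate()
    parse(spark).show()
    spark.stop()
  }
}
```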

def from_json(e: Column, schema: Column, options: java.util.Map[String, String]): Column

(Java-specific) Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType of StructTypes with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.4.0

def from_json(e: Column, schema: Column): Column

(Scala-specific) Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType of StructTypes with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string

Since: 2.4.0

def from_json(e: Column, schema: String, options: Map[String, String]): Column

(Scala-specific) Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema as a DDL-formatted string.
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.3.0

def from_json(e: Column, schema: String, options: java.util.Map[String, String]): Column

(Java-specific) Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema as a DDL-formatted string.
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.1.0

def from_json(e: Column, schema: DataType): Column

Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string

Since: 2.2.0

def from_json(e: Column, schema: StructType): Column

Parses a column containing a JSON string into a StructType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string

Since: 2.1.0

def from_json(e: Column, schema: DataType, options: java.util.Map[String, String]): Column

(Java-specific) Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.2.0

def from_json(e: Column, schema: StructType, options: java.util.Map[String, String]): Column

(Java-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.1.0

def from_json(e: Column, schema: DataType, options: Map[String, String]): Column

(Scala-specific) Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.2.0

def from_json(e: Column, schema: StructType, options: Map[String, String]): Column

(Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null if the string cannot be parsed.

e: a string column containing JSON data.
schema: the schema to use when parsing the json string
options: options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.

Since: 2.1.0
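
The from_json overloads accept the schema in several forms. The sketch below shows two of the Scala-specific ones, a StructType and a DDL-formatted string, applied to the same input. It assumes Spark 3.x on the classpath and a local session; the object name, column names, schema, and sample record are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object FromJsonSketch {
  // Hypothetical schema for the sample record below.
  val schema: StructType = StructType(Seq(
    StructField("a", IntegerType),
    StructField("b", StringType)
  ))

  def parse(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("""{"a": 1, "b": "x"}""").toDF("json")
    df.select(
      // StructType overload with an empty Scala options map.
      from_json(col("json"), schema, Map.empty[String, String]).as("by_struct"),
      // DDL-string overload describing the same schema.
      from_json(col("json"), "a INT, b STRING", Map.empty[String, String]).as("by_ddl")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("from-json-sketch").getOrCreate()
    parse(spark).show()
    spark.stop()
  }
}
```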

def from_unixtime(ut: Column, f: String): Column

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.

See Datetime Patterns for valid date and time format patterns

ut: A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
f: A date time pattern that the input will be formatted to
returns: A string, or null if ut was a string that could not be cast to a long or f was an invalid date time pattern

Since: 1.5.0

def from_unixtime(ut: Column): Column

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.

ut: A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
returns: A string, or null if the input was a string that could not be cast to a long

Since: 1.5.0
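
The conversion that from_unixtime describes can be reproduced with java.time for a single value. The sketch below is a plain-JVM illustration, not Spark's implementation; the object and method names are hypothetical, and the time zone is passed explicitly so the result does not depend on the machine running it.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object FromUnixtimeSketch {
  // Formats seconds since 1970-01-01 00:00:00 UTC as a string in the given zone and pattern.
  def fromUnixtime(
      seconds: Long,
      pattern: String = "yyyy-MM-dd HH:mm:ss",
      zone: ZoneId = ZoneId.systemDefault()): String =
    DateTimeFormatter.ofPattern(pattern).format(Instant.ofEpochSecond(seconds).atZone(zone))
}
```

For example, with zone set to UTC, an input of 0 formats as 1970-01-01 00:00:00 and an input of -1 as 1969-12-31 23:59:59.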

def from_utc_timestamp(ts: Column, tz: Column): Column

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

Since: 2.4.0

def from_utc_timestamp(ts: Column, tz: String): Column

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

ts: A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
tz: A string giving the time zone ID that the input should be adjusted to. It should be either a region-based zone ID or a zone offset. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. 'UTC' and 'Z' are also supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous.
returns: A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value

Since: 1.5.0
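
The "interpret as UTC, render in the target zone" behaviour can be checked for a single value with java.time. This is a plain-JVM sketch, not Spark's implementation; the object and method names are hypothetical, and the input is assumed to be in yyyy-MM-dd HH:mm:ss form.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object FromUtcTimestampSketch {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Reads `ts` as a wall-clock time in UTC and renders the same instant in zone `tz`.
  def fromUtcTimestamp(ts: String, tz: String): String = {
    val utc = LocalDateTime.parse(ts, fmt).atZone(ZoneOffset.UTC)
    fmt.format(utc.withZoneSameInstant(ZoneId.of(tz)))
  }
}
```

Calling fromUtcTimestamp("2017-07-14 02:40:00", "GMT+1") gives "2017-07-14 03:40:00", matching the example in the description; a region ID such as 'America/Los_Angeles' or an offset such as '+01:00' can be passed the same way.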

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native()

def get_json_object(e: Column, path: String): Column

Extracts a JSON object from a JSON string based on the specified JSON path, and returns the extracted object as a JSON string. Returns null if the input JSON string is invalid.

Since: 1.6.0

def greatest(columnName: String, columnNames: String*): Column

Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

Annotations: @varargs()
Since: 1.5.0

def greatest(exprs: Column*): Column

Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

Annotations: @varargs()
Since: 1.5.0
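
The null-skipping rule of greatest can be modelled for a single row in plain Scala. The sketch below is an illustration rather than Spark's implementation: Option stands in for a nullable value, and the object and method names are hypothetical.

```scala
object GreatestSketch {
  // None stands in for SQL NULL. Nulls are skipped; the result is None only if every input is None.
  def greatest[A: Ordering](values: Option[A]*): Option[A] = {
    require(values.length >= 2, "greatest takes at least 2 parameters")
    val ord = implicitly[Ordering[A]]
    values.flatten.reduceOption((a, b) => ord.max(a, b))
  }
}
```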

def grouping(columnName: String): Column

Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not. Returns 1 for aggregated or 0 for not aggregated in the result set.

Since: 2.0.0

def grouping(e: Column): Column

Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not. Returns 1 for aggregated or 0 for not aggregated in the result set.

Since: 2.0.0

def grouping_id(colName: String, colNames: String*): Column

Aggregate function: returns the level of grouping, equal to

(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)

Since: 2.0.0
Note: The list of columns must match the grouping columns exactly.

def grouping_id(cols: Column*): Column

Aggregate function: returns the level of grouping, equal to

(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)

Since: 2.0.0
Note: The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).
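
The grouping_id formula packs the per-column grouping flags into one integer, with the first column in the most significant bit. The following plain-Scala sketch evaluates that formula for a given list of flags; the object and method names are hypothetical, and the flags are assumed to be the 0/1 results of grouping(c1) through grouping(cn).

```scala
object GroupingIdSketch {
  // flags(i) is the value of grouping(c_{i+1}): 1 if that column is aggregated, else 0.
  def groupingId(flags: Seq[Int]): Long = {
    val n = flags.length
    // Term for column c_{i+1} is grouping(c_{i+1}) << (n - 1 - i), with i zero-based.
    flags.zipWithIndex.map { case (g, i) => g.toLong << (n - 1 - i) }.sum
  }
}
```

For three grouping columns with flags (1, 0, 1), the result is (1 << 2) + (0 << 1) + 1 = 5.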

def hash(cols: Column*): Column

Calculates the hash code of given columns, and returns the result as an int column.

Annotations: @varargs()
Since: 2.0.0

def hashCode(): Int

Definition Classes: AnyRef → Any
Annotations: @native()

def hex(column: Column): Column

Computes hex value of the given column.

Since: 1.5.0

def hour(e: Column): Column

Extracts the hours as an integer from a given date/timestamp/string.

returns: An integer, or null if the input was a string that could not be cast to a date

Since: 1.5.0

def hours(e: Column): Column

A transform for timestamps to partition data into hours.

Since: 3.0.0

def hypot(l: Double, rightName: String): Column