public class functions
extends Object
Spark also includes more built-in functions that are less common and are not defined here. You can still access them (and all the functions defined here) by using the functions.expr() API and calling them through a SQL expression string. You can find the entire list of functions in the SQL API documentation of your Spark version; see also the latest list.
As an example, isnan is a function that is defined here. You can use isnan(col("myCol")) to invoke the isnan function. This way the programming language's compiler ensures isnan exists and is of the proper form. You can also use expr("isnan(myCol)") to invoke the same function. In this case, Spark itself will ensure isnan exists when it analyzes the query.
regr_count is an example of a function that is built-in but not defined here, because it is less commonly used. To invoke it, use expr("regr_count(yCol, xCol)").
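The two invocation styles described above can be sketched in Java as follows. This is a minimal sketch, not an official example: it assumes a local SparkSession, a Spark version whose SQL dialect provides regr_count, and a hypothetical two-row DataFrame with columns myCol, yCol and xCol that is built inline. The class name and helper method names are illustrative only.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.expr;
import static org.apache.spark.sql.functions.isnan;

public class ExprVersusTypedExample {

    // Hypothetical sample data: two rows, one of which holds NaN in myCol.
    static Dataset<Row> sampleData(SparkSession spark) {
        return spark.sql(
            "SELECT * FROM VALUES "
          + "(CAST('NaN' AS DOUBLE), 1.0D, 2.0D), "
          + "(CAST(3.5 AS DOUBLE), 4.0D, 5.0D) "
          + "AS t(myCol, yCol, xCol)");
    }

    // Typed call: the Java compiler verifies that isnan(Column) exists.
    static long countNaNTyped(Dataset<Row> df) {
        return df.filter(isnan(col("myCol"))).count();
    }

    // String expression: Spark resolves isnan when it analyzes the query.
    static long countNaNViaExpr(Dataset<Row> df) {
        return df.filter(expr("isnan(myCol)")).count();
    }

    // regr_count is reached through expr() as a SQL expression string.
    static long regrCount(Dataset<Row> df) {
        return df.agg(expr("regr_count(yCol, xCol)")).first().getLong(0);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("expr-versus-typed")
            .getOrCreate();
        try {
            Dataset<Row> df = sampleData(spark);
            System.out.println("typed isnan matches: " + countNaNTyped(df));
            System.out.println("expr isnan matches:  " + countNaNViaExpr(df));
            System.out.println("regr_count:          " + regrCount(df));
        } finally {
            spark.stop();
        }
    }
}
```

A misspelled name in the typed call is rejected by the compiler, whereas a misspelled name inside the expr string only fails once Spark analyzes the query.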
The function APIs here usually provide methods with a Column signature only, because a Column can support not only columns but also other types such as a native string. The other variants currently exist for historical reasons.
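As a rough illustration of why the Column signature is the general one, the sketch below calls acos in three ways: with a Column built from a name, with the String overload listed in the method summary, and with a literal wrapped by lit. It assumes a local SparkSession and a hypothetical single-row DataFrame with a column named ratio; the class and method names are illustrative only.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.acos;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

public class ColumnVariantExample {

    // Evaluates acos through the Column overload, the String overload,
    // and a literal Column, side by side in one row.
    static Dataset<Row> compare(SparkSession spark) {
        // Hypothetical input: one row with a double column named ratio.
        Dataset<Row> df = spark.sql("SELECT 0.5D AS ratio");
        return df.select(
            acos(col("ratio")).as("from_column"),   // Column signature
            acos("ratio").as("from_name"),          // String columnName variant
            acos(lit(0.5)).as("from_literal"));     // a Column can also wrap a literal
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("column-variant")
            .getOrCreate();
        try {
            compare(spark).show();
        } finally {
            spark.stop();
        }
    }
}
```

The String overload can only name an existing column, while the Column overload accepts any expression, which is why newer functions tend to offer the Column form alone.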
Constructor and Description |
---|
functions() |
Modifier and Type | Method and Description |
---|---|
static Column |
abs(Column e)
Computes the absolute value of a numeric value.
|
static Column |
acos(Column e) |
static Column |
acos(String columnName) |
static Column |
acosh(Column e) |
static Column |
acosh(String columnName) |
static Column |
add_months(Column startDate,
Column numMonths)
Returns the date that is
numMonths after startDate . |
static Column |
add_months(Column startDate,
int numMonths)
Returns the date that is
numMonths after startDate . |
static Column |
aes_decrypt(Column input,
Column key)
Returns a decrypted value of
input . |
static Column |
aes_decrypt(Column input,
Column key,
Column mode)
Returns a decrypted value of
input . |
static Column |
aes_decrypt(Column input,
Column key,
Column mode,
Column padding)
Returns a decrypted value of
input . |
static Column |
aes_decrypt(Column input,
Column key,
Column mode,
Column padding,
Column aad)
Returns a decrypted value of
input using AES in mode with padding . |
static Column |
aes_encrypt(Column input,
Column key)
Returns an encrypted value of
input . |
static Column |
aes_encrypt(Column input,
Column key,
Column mode)
Returns an encrypted value of
input . |
static Column |
aes_encrypt(Column input,
Column key,
Column mode,
Column padding)
Returns an encrypted value of
input . |
static Column |
aes_encrypt(Column input,
Column key,
Column mode,
Column padding,
Column iv)
Returns an encrypted value of
input . |
static Column |
aes_encrypt(Column input,
Column key,
Column mode,
Column padding,
Column iv,
Column aad)
Returns an encrypted value of
input using AES in given mode with the specified padding . |
static Column |
aggregate(Column expr,
Column initialValue,
scala.Function2<Column,Column,Column> merge)
Applies a binary operator to an initial state and all elements in the array,
and reduces this to a single state.
|
static Column |
aggregate(Column expr,
Column initialValue,
scala.Function2<Column,Column,Column> merge,
scala.Function1<Column,Column> finish)
Applies a binary operator to an initial state and all elements in the array,
and reduces this to a single state.
|
static Column |
any_value(Column e)
Aggregate function: returns some value of
e for a group of rows. |
static Column |
any_value(Column e,
Column ignoreNulls)
Aggregate function: returns some value of
e for a group of rows. |
static Column |
any(Column e)
Aggregate function: returns true if at least one value of
e is true. |
static Column |
approx_count_distinct(Column e)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_count_distinct(Column e,
double rsd)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_count_distinct(String columnName)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_count_distinct(String columnName,
double rsd)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_percentile(Column e,
Column percentage,
Column accuracy)
Aggregate function: returns the approximate
percentile of the numeric column col which
is the smallest value in the ordered col values (sorted from least to greatest) such that
no more than percentage of col values is less than the value or equal to that value. |
static Column |
approxCountDistinct(Column e)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
approxCountDistinct(Column e,
double rsd)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
approxCountDistinct(String columnName)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
approxCountDistinct(String columnName,
double rsd)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
array_agg(Column e)
Aggregate function: returns a list of objects with duplicates.
|
static Column |
array_append(Column column,
Object element)
Returns an ARRAY containing all elements from the source ARRAY as well as the new element.
|
static Column |
array_compact(Column column)
Removes all null elements from the given array.
|
static Column |
array_contains(Column column,
Object value)
Returns null if the array is null, true if the array contains
value , and false otherwise. |
static Column |
array_distinct(Column e)
Removes duplicate values from the array.
|
static Column |
array_except(Column col1,
Column col2)
Returns an array of the elements in the first array but not in the second array,
without duplicates.
|
static Column |
array_insert(Column arr,
Column pos,
Column value)
Adds an item into a given array at a specified position.
|
static Column |
array_intersect(Column col1,
Column col2)
Returns an array of the elements in the intersection of the given two arrays,
without duplicates.
|
static Column |
array_join(Column column,
String delimiter)
Concatenates the elements of
column using the delimiter . |
static Column |
array_join(Column column,
String delimiter,
String nullReplacement)
Concatenates the elements of
column using the delimiter . |
static Column |
array_max(Column e)
Returns the maximum value in the array.
|
static Column |
array_min(Column e)
Returns the minimum value in the array.
|
static Column |
array_position(Column column,
Object value)
Locates the position of the first occurrence of the value in the given array as long.
|
static Column |
array_prepend(Column column,
Object element)
Returns an array containing the new element followed by all elements of the source array.
|
static Column |
array_remove(Column column,
Object element)
Removes all elements equal to element from the given array.
|
static Column |
array_repeat(Column left,
Column right)
Creates an array containing the left argument repeated the number of times given by the
right argument.
|
static Column |
array_repeat(Column e,
int count)
Creates an array containing the left argument repeated the number of times given by the
right argument.
|
static Column |
array_size(Column e)
Returns the total number of elements in the array.
|
static Column |
array_sort(Column e)
Sorts the input array in ascending order.
|
static Column |
array_sort(Column e,
scala.Function2<Column,Column,Column> comparator)
Sorts the input array based on the given comparator function.
|
static Column |
array_union(Column col1,
Column col2)
Returns an array of the elements in the union of the given two arrays, without duplicates.
|
static Column |
array(Column... cols)
Creates a new array column.
|
static Column |
array(scala.collection.Seq<Column> cols)
Creates a new array column.
|
static Column |
array(String colName,
scala.collection.Seq<String> colNames)
Creates a new array column.
|
static Column |
array(String colName,
String... colNames)
Creates a new array column.
|
static Column |
arrays_overlap(Column a1,
Column a2)
Returns
true if a1 and a2 have at least one non-null element in common. |
static Column |
arrays_zip(Column... e)
Returns a merged array of structs in which the N-th struct contains all N-th values of input
arrays.
|
static Column |
arrays_zip(scala.collection.Seq<Column> e)
Returns a merged array of structs in which the N-th struct contains all N-th values of input
arrays.
|
static Column |
asc_nulls_first(String columnName)
Returns a sort expression based on ascending order of the column,
and null values appear before non-null values.
|
static Column |
asc_nulls_last(String columnName)
Returns a sort expression based on ascending order of the column,
and null values appear after non-null values.
|
static Column |
asc(String columnName)
Returns a sort expression based on ascending order of the column.
|
static Column |
ascii(Column e)
Computes the numeric value of the first character of the string column, and returns the
result as an int column.
|
static Column |
asin(Column e) |
static Column |
asin(String columnName) |
static Column |
asinh(Column e) |
static Column |
asinh(String columnName) |
static Column |
assert_true(Column c)
Returns null if the condition is true, and throws an exception otherwise.
|
static Column |
assert_true(Column c,
Column e)
Returns null if the condition is true; throws an exception with the error message otherwise.
|
static Column |
atan(Column e) |
static Column |
atan(String columnName) |
static Column |
atan2(Column y,
Column x) |
static Column |
atan2(Column y,
double xValue) |
static Column |
atan2(Column y,
String xName) |
static Column |
atan2(double yValue,
Column x) |
static Column |
atan2(double yValue,
String xName) |
static Column |
atan2(String yName,
Column x) |
static Column |
atan2(String yName,
double xValue) |
static Column |
atan2(String yName,
String xName) |
static Column |
atanh(Column e) |
static Column |
atanh(String columnName) |
static Column |
avg(Column e)
Aggregate function: returns the average of the values in a group.
|
static Column |
avg(String columnName)
Aggregate function: returns the average of the values in a group.
|
static Column |
base64(Column e)
Computes the BASE64 encoding of a binary column and returns it as a string column.
|
static Column |
bin(Column e)
An expression that returns the string representation of the binary value of the given long
column.
|
static Column |
bin(String columnName)
An expression that returns the string representation of the binary value of the given long
column.
|
static Column |
bit_and(Column e)
Aggregate function: returns the bitwise AND of all non-null input values, or null if none.
|
static Column |
bit_count(Column e)
Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer,
or NULL if the argument is NULL.
|
static Column |
bit_get(Column e,
Column pos)
Returns the value of the bit (0 or 1) at the specified position.
|
static Column |
bit_length(Column e)
Calculates the bit length for the specified string column.
|
static Column |
bit_or(Column e)
Aggregate function: returns the bitwise OR of all non-null input values, or null if none.
|
static Column |
bit_xor(Column e)
Aggregate function: returns the bitwise XOR of all non-null input values, or null if none.
|
static Column |
bitmap_bit_position(Column col)
Returns the bit position for the given input column.
|
static Column |
bitmap_bucket_number(Column col)
Returns the bucket number for the given input column.
|
static Column |
bitmap_construct_agg(Column col)
Returns a bitmap with the positions of the bits set from all the values from the input column.
|
static Column |
bitmap_count(Column col)
Returns the number of set bits in the input bitmap.
|
static Column |
bitmap_or_agg(Column col)
Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column.
|
static Column |
bitwise_not(Column e)
Computes bitwise NOT (~) of a number.
|
static Column |
bitwiseNOT(Column e)
Deprecated.
Use bitwise_not. Since 3.2.0.
|
static Column |
bool_and(Column e)
Aggregate function: returns true if all values of
e are true. |
static Column |
bool_or(Column e)
Aggregate function: returns true if at least one value of
e is true. |
static <T> Dataset<T> |
broadcast(Dataset<T> df)
Marks a DataFrame as small enough for use in broadcast joins.
|
static Column |
bround(Column e)
Returns the value of the column
e rounded to 0 decimal places with HALF_EVEN round mode. |
static Column |
bround(Column e,
int scale)
Rounds the value of
e to scale decimal places with HALF_EVEN round mode
if scale is greater than or equal to 0, or at the integral part when scale is less than 0. |
static Column |
btrim(Column str)
Removes the leading and trailing space characters from
str . |
static Column |
btrim(Column str,
Column trim)
Removes the leading and trailing
trim characters from str . |
static Column |
bucket(Column numBuckets,
Column e)
A transform for any type that partitions by a hash of the input column.
|
static Column |
bucket(int numBuckets,
Column e)
A transform for any type that partitions by a hash of the input column.
|
static Column |
call_function(String funcName,
Column... cols)
Call a SQL function.
|
static Column |
call_function(String funcName,
scala.collection.Seq<Column> cols)
Call a SQL function.
|
static Column |
call_udf(String udfName,
Column... cols)
Call a user-defined function.
|
static Column |
call_udf(String udfName,
scala.collection.Seq<Column> cols)
Call a user-defined function.
|
static Column |
callUDF(String udfName,
Column... cols)
Call a user-defined function.
|
static Column |
callUDF(String udfName,
scala.collection.Seq<Column> cols)
Deprecated.
Use call_udf. Since .
|
static Column |
cardinality(Column e)
Returns length of array or map.
|
static Column |
cbrt(Column e)
Computes the cube-root of the given value.
|
static Column |
cbrt(String columnName)
Computes the cube-root of the given column.
|
static Column |
ceil(Column e)
Computes the ceiling of the given value of
e to 0 decimal places. |
static Column |
ceil(Column e,
Column scale)
Computes the ceiling of the given value of
e to scale decimal places. |
static Column |
ceil(String columnName)
Computes the ceiling of the given column value
to 0 decimal places. |
static Column |
ceiling(Column e)
Computes the ceiling of the given value of
e to 0 decimal places. |
static Column |
ceiling(Column e,
Column scale)
Computes the ceiling of the given value of
e to scale decimal places. |
static Column |
char_length(Column str)
Returns the character length of string data or number of bytes of binary data.
|
static Column |
character_length(Column str)
Returns the character length of string data or number of bytes of binary data.
|
static Column |
chr(Column n)
Returns the ASCII character having the binary equivalent to
n . |
static Column |
coalesce(Column... e)
Returns the first column that is not null, or null if all inputs are null.
|
static Column |
coalesce(scala.collection.Seq<Column> e)
Returns the first column that is not null, or null if all inputs are null.
|
static Column |
col(String colName)
Returns a
Column based on the given column name. |
static Column |
collect_list(Column e)
Aggregate function: returns a list of objects with duplicates.
|
static Column |
collect_list(String columnName)
Aggregate function: returns a list of objects with duplicates.
|
static Column |
collect_set(Column e)
Aggregate function: returns a set of objects with duplicate elements eliminated.
|
static Column |
collect_set(String columnName)
Aggregate function: returns a set of objects with duplicate elements eliminated.
|
static Column |
column(String colName)
Returns a
Column based on the given column name. |
static Column |
concat_ws(String sep,
Column... exprs)
Concatenates multiple input string columns together into a single string column,
using the given separator.
|
static Column |
concat_ws(String sep,
scala.collection.Seq<Column> exprs)
Concatenates multiple input string columns together into a single string column,
using the given separator.
|
static Column |
concat(Column... exprs)
Concatenates multiple input columns together into a single column.
|
static Column |
concat(scala.collection.Seq<Column> exprs)
Concatenates multiple input columns together into a single column.
|
static Column |
contains(Column left,
Column right)
Returns a boolean.
|
static Column |
conv(Column num,
int fromBase,
int toBase)
Convert a number in a string column from one base to another.
|
static Column |
convert_timezone(Column targetTz,
Column sourceTs)
Converts the timestamp without time zone
sourceTs
from the current time zone to targetTz . |
static Column |
convert_timezone(Column sourceTz,
Column targetTz,
Column sourceTs)
Converts the timestamp without time zone
sourceTs
from the sourceTz time zone to targetTz . |
static Column |
corr(Column column1,
Column column2)
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
|
static Column |
corr(String columnName1,
String columnName2)
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
|
static Column |
cos(Column e) |
static Column |
cos(String columnName) |
static Column |
cosh(Column e) |
static Column |
cosh(String columnName) |
static Column |
cot(Column e) |
static Column |
count_distinct(Column expr,
Column... exprs)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
count_distinct(Column expr,
scala.collection.Seq<Column> exprs)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
count_if(Column e)
Aggregate function: returns the number of
TRUE values for the expression. |
static Column |
count_min_sketch(Column e,
Column eps,
Column confidence,
Column seed)
Returns a count-min sketch of a column with the given eps, confidence and seed.
|
static Column |
count(Column e)
Aggregate function: returns the number of items in a group.
|
static TypedColumn<Object,Object> |
count(String columnName)
Aggregate function: returns the number of items in a group.
|
static Column |
countDistinct(Column expr,
Column... exprs)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
countDistinct(Column expr,
scala.collection.Seq<Column> exprs)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
countDistinct(String columnName,
scala.collection.Seq<String> columnNames)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
countDistinct(String columnName,
String... columnNames)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
covar_pop(Column column1,
Column column2)
Aggregate function: returns the population covariance for two columns.
|
static Column |
covar_pop(String columnName1,
String columnName2)
Aggregate function: returns the population covariance for two columns.
|
static Column |
covar_samp(Column column1,
Column column2)
Aggregate function: returns the sample covariance for two columns.
|
static Column |
covar_samp(String columnName1,
String columnName2)
Aggregate function: returns the sample covariance for two columns.
|
static Column |
crc32(Column e)
Calculates the cyclic redundancy check value (CRC32) of a binary column and
returns the value as a bigint.
|
static Column |
csc(Column e) |
static Column |
cume_dist()
Window function: returns the cumulative distribution of values within a window partition,
i.e.
|
static Column |
curdate()
Returns the current date at the start of query evaluation as a date column.
|
static Column |
current_catalog()
Returns the current catalog.
|
static Column |
current_database()
Returns the current database.
|
static Column |
current_date()
Returns the current date at the start of query evaluation as a date column.
|
static Column |
current_schema()
Returns the current schema.
|
static Column |
current_timestamp()
Returns the current timestamp at the start of query evaluation as a timestamp column.
|
static Column |
current_timezone()
Returns the current session local timezone.
|
static Column |
current_user()
Returns the user name of current execution context.
|
static Column |
date_add(Column start,
Column days)
Returns the date that is
days days after start . |
static Column |
date_add(Column start,
int days)
Returns the date that is
days days after start . |
static Column |
date_diff(Column end,
Column start)
Returns the number of days from
start to end . |
static Column |
date_format(Column dateExpr,
String format)
Converts a date/timestamp/string to a value of string in the format specified by the date
format given by the second argument.
|
static Column |
date_from_unix_date(Column days)
Create date from the number of
days since 1970-01-01. |
static Column |
date_part(Column field,
Column source)
Extracts a part of the date/timestamp or interval source.
|
static Column |
date_sub(Column start,
Column days)
Returns the date that is
days days before start . |
static Column |
date_sub(Column start,
int days)
Returns the date that is
days days before start . |
static Column |
date_trunc(String format,
Column timestamp)
Returns timestamp truncated to the unit specified by the format.
|
static Column |
dateadd(Column start,
Column days)
Returns the date that is
days days after start . |
static Column |
datediff(Column end,
Column start)
Returns the number of days from
start to end . |
static Column |
datepart(Column field,
Column source)
Extracts a part of the date/timestamp or interval source.
|
static Column |
day(Column e)
Extracts the day of the month as an integer from a given date/timestamp/string.
|
static Column |
dayofmonth(Column e)
Extracts the day of the month as an integer from a given date/timestamp/string.
|
static Column |
dayofweek(Column e)
Extracts the day of the week as an integer from a given date/timestamp/string.
|
static Column |
dayofyear(Column e)
Extracts the day of the year as an integer from a given date/timestamp/string.
|
static Column |
days(Column e)
A transform for timestamps and dates to partition data into days.
|
static Column |
decode(Column value,
String charset)
Decodes the first argument from a binary into a string using the provided character set
(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
|
static Column |
degrees(Column e)
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
|
static Column |
degrees(String columnName)
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
|
static Column |
dense_rank()
Window function: returns the rank of rows within a window partition, without any gaps.
|
static Column |
desc_nulls_first(String columnName)
Returns a sort expression based on the descending order of the column,
and null values appear before non-null values.
|
static Column |
desc_nulls_last(String columnName)
Returns a sort expression based on the descending order of the column,
and null values appear after non-null values.
|
static Column |
desc(String columnName)
Returns a sort expression based on the descending order of the column.
|
static Column |
e()
Returns Euler's number.
|
static Column |
element_at(Column column,
Object value)
Returns element of array at given index in value if column is array.
|
static Column |
elt(Column... inputs)
Returns the
n -th input, e.g., returns input2 when n is 2. |
static Column |
elt(scala.collection.Seq<Column> inputs)
Returns the
n -th input, e.g., returns input2 when n is 2. |
static Column |
encode(Column value,
String charset)
Encodes the first argument from a string into a binary using the provided character set
(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
|
static Column |
endswith(Column str,
Column suffix)
Returns a boolean.
|
static Column |
equal_null(Column col1,
Column col2)
Returns the same result as the EQUAL(=) operator for non-null operands,
but returns true if both are null and false if one of them is null.
|
static Column |
every(Column e)
Aggregate function: returns true if all values of
e are true. |
static Column |
exists(Column column,
scala.Function1<Column,Column> f)
Returns whether a predicate holds for one or more elements in the array.
|
static Column |
exp(Column e)
Computes the exponential of the given value.
|
static Column |
exp(String columnName)
Computes the exponential of the given column.
|
static Column |
explode_outer(Column e)
Creates a new row for each element in the given array or map column.
|
static Column |
explode(Column e)
Creates a new row for each element in the given array or map column.
|
static Column |
expm1(Column e)
Computes the exponential of the given value minus one.
|
static Column |
expm1(String columnName)
Computes the exponential of the given column minus one.
|
static Column |
expr(String expr)
Parses the expression string into the column that it represents, similar to
Dataset.selectExpr(java.lang.String...) . |
static Column |
extract(Column field,
Column source)
Extracts a part of the date/timestamp or interval source.
|
static Column |
factorial(Column e)
Computes the factorial of the given value.
|
static Column |
filter(Column column,
scala.Function1<Column,Column> f)
Returns an array of elements for which a predicate holds in a given array.
|
static Column |
filter(Column column,
scala.Function2<Column,Column,Column> f)
Returns an array of elements for which a predicate holds in a given array.
|
static Column |
find_in_set(Column str,
Column strArray)
Returns the index (1-based) of the given string (
str ) in the comma-delimited
list (strArray ). |
static Column |
first_value(Column e)
Aggregate function: returns the first value in a group.
|
static Column |
first_value(Column e,
Column ignoreNulls)
Aggregate function: returns the first value in a group.
|
static Column |
first(Column e)
Aggregate function: returns the first value in a group.
|
static Column |
first(Column e,
boolean ignoreNulls)
Aggregate function: returns the first value in a group.
|
static Column |
first(String columnName)
Aggregate function: returns the first value of a column in a group.
|
static Column |
first(String columnName,
boolean ignoreNulls)
Aggregate function: returns the first value of a column in a group.
|
static Column |
flatten(Column e)
Creates a single array from an array of arrays.
|
static Column |
floor(Column e)
Computes the floor of the given value of
e to 0 decimal places. |
static Column |
floor(Column e,
Column scale)
Computes the floor of the given value of
e to scale decimal places. |
static Column |
floor(String columnName)
Computes the floor of the given column value to 0 decimal places.
|
static Column |
forall(Column column,
scala.Function1<Column,Column> f)
Returns whether a predicate holds for every element in the array.
|
static Column |
format_number(Column x,
int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places
with HALF_EVEN round mode, and returns the result as a string column.
|
static Column |
format_string(String format,
Column... arguments)
Formats the arguments in printf-style and returns the result as a string column.
|
static Column |
format_string(String format,
scala.collection.Seq<Column> arguments)
Formats the arguments in printf-style and returns the result as a string column.
|
static Column |
from_csv(Column e,
Column schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a CSV string into a
StructType
with the specified schema. |
static Column |
from_csv(Column e,
StructType schema,
scala.collection.immutable.Map<String,String> options)
Parses a column containing a CSV string into a
StructType with the specified schema. |
static Column |
from_json(Column e,
Column schema)
(Scala-specific) Parses a column containing a JSON string into a
MapType with StringType
as keys type, StructType or ArrayType of StructType s with the specified schema. |
static Column |
from_json(Column e,
Column schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a JSON string into a
MapType with StringType
as keys type, StructType or ArrayType of StructType s with the specified schema. |
static Column |
from_json(Column e,
DataType schema)
Parses a column containing a JSON string into a
MapType with StringType as keys type,
StructType or ArrayType with the specified schema. |
static Column |
from_json(Column e,
DataType schema,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Parses a column containing a JSON string into a
MapType with StringType
as keys type, StructType or ArrayType with the specified schema. |
static Column |
from_json(Column e,
DataType schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a JSON string into a
MapType with StringType
as keys type, StructType or ArrayType with the specified schema. |
static Column |
from_json(Column e,
String schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a JSON string into a
MapType with StringType
as keys type, StructType or ArrayType with the specified schema. |
static Column |
from_json(Column e,
String schema,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Parses a column containing a JSON string into a
MapType with StringType
as keys type, StructType or ArrayType with the specified schema. |
static Column |
from_json(Column e,
StructType schema)
Parses a column containing a JSON string into a
StructType with the specified schema. |
static Column |
from_json(Column e,
StructType schema,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Parses a column containing a JSON string into a
StructType with the
specified schema. |
static Column |
from_json(Column e,
StructType schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a JSON string into a
StructType with the
specified schema. |
static Column |
from_unixtime(Column ut)
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the
yyyy-MM-dd HH:mm:ss format.
|
static Column |
from_unixtime(Column ut,
String f)
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
|
static Column |
from_utc_timestamp(Column ts,
Column tz)
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
that time as a timestamp in the given time zone.
|
static Column |
from_utc_timestamp(Column ts,
String tz)
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
that time as a timestamp in the given time zone.
|
static Column |
get_json_object(Column e,
String path)
Extracts a JSON object from a JSON string based on the specified JSON path, and returns the
extracted JSON object as a JSON string.
|
static Column |
get(Column column,
Column index)
Returns element of array at given (0-based) index.
|
static Column |
getbit(Column e,
Column pos)
Returns the value of the bit (0 or 1) at the specified position.
|
static Column |
greatest(Column... exprs)
Returns the greatest value of the list of values, skipping null values.
|
static Column |
greatest(scala.collection.Seq<Column> exprs)
Returns the greatest value of the list of values, skipping null values.
|
static Column |
greatest(String columnName,
scala.collection.Seq<String> columnNames)
Returns the greatest value of the list of column names, skipping null values.
|
static Column |
greatest(String columnName,
String... columnNames)
Returns the greatest value of the list of column names, skipping null values.
|
static Column |
grouping_id(scala.collection.Seq<Column> cols)
Aggregate function: returns the level of grouping, equals to
|
static Column |
grouping_id(String colName,
scala.collection.Seq<String> colNames)
Aggregate function: returns the level of grouping.
|
static Column |
grouping(Column e)
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
or not, returns 1 for aggregated or 0 for not aggregated in the result set.
|
static Column |
grouping(String columnName)
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
or not, returns 1 for aggregated or 0 for not aggregated in the result set.
|
static Column |
hash(Column... cols)
Calculates the hash code of given columns, and returns the result as an int column.
|
static Column |
hash(scala.collection.Seq<Column> cols)
Calculates the hash code of given columns, and returns the result as an int column.
|
static Column |
hex(Column column)
Computes hex value of the given column.
|
static Column |
histogram_numeric(Column e,
Column nBins)
Aggregate function: computes a histogram on the numeric column e using nBins bins.
|
static Column |
hll_sketch_agg(Column e)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch configured with default lgConfigK value.
|
static Column |
hll_sketch_agg(Column e,
Column lgConfigK)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch configured with lgConfigK arg.
|
static Column |
hll_sketch_agg(Column e,
int lgConfigK)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch configured with lgConfigK arg.
|
static Column |
hll_sketch_agg(String columnName)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch configured with default lgConfigK value.
|
static Column |
hll_sketch_agg(String columnName,
int lgConfigK)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch configured with lgConfigK arg.
|
static Column |
hll_sketch_estimate(Column c)
Returns the estimated number of unique values given the binary representation
of a Datasketches HllSketch.
|
static Column |
hll_sketch_estimate(String columnName)
Returns the estimated number of unique values given the binary representation
of a Datasketches HllSketch.
|
static Column |
hll_union_agg(Column e)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch, generated by merging previously created Datasketches HllSketch instances
via a Datasketches Union instance.
|
static Column |
hll_union_agg(Column e,
boolean allowDifferentLgConfigK)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch, generated by merging previously created Datasketches HllSketch instances
via a Datasketches Union instance.
|
static Column |
hll_union_agg(Column e,
Column allowDifferentLgConfigK)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch, generated by merging previously created Datasketches HllSketch instances
via a Datasketches Union instance.
|
static Column |
hll_union_agg(String columnName)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch, generated by merging previously created Datasketches HllSketch instances
via a Datasketches Union instance.
|
static Column |
hll_union_agg(String columnName,
boolean allowDifferentLgConfigK)
Aggregate function: returns the updatable binary representation of the Datasketches
HllSketch, generated by merging previously created Datasketches HllSketch instances
via a Datasketches Union instance.
|
static Column |
hll_union(Column c1,
Column c2)
Merges two binary representations of Datasketches HllSketch objects, using a
Datasketches Union object.
|
static Column |
hll_union(Column c1,
Column c2,
boolean allowDifferentLgConfigK)
Merges two binary representations of Datasketches HllSketch objects, using a
Datasketches Union object.
|
static Column |
hll_union(String columnName1,
String columnName2)
Merges two binary representations of Datasketches HllSketch objects, using a
Datasketches Union object.
|
static Column |
hll_union(String columnName1,
String columnName2,
boolean allowDifferentLgConfigK)
Merges two binary representations of Datasketches HllSketch objects, using a
Datasketches Union object.
|
static Column |
hour(Column e)
Extracts the hours as an integer from a given date/timestamp/string.
|
static Column |
hours(Column e)
A transform for timestamps to partition data into hours.
|
static Column |
hypot(Column l,
Column r)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(Column l,
double r)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(Column l,
String rightName)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(double l,
Column r)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(double l,
String rightName)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(String leftName,
Column r)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(String leftName,
double r)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
hypot(String leftName,
String rightName)
Computes
sqrt(a^2 + b^2) without intermediate overflow or underflow. |
static Column |
ifnull(Column col1,
Column col2)
Returns
col2 if col1 is null, or col1 otherwise. |
static Column |
ilike(Column str,
Column pattern)
Returns true if str matches
pattern with escapeChar ('\') case-insensitively, null if any
arguments are null, false otherwise. |
static Column |
ilike(Column str,
Column pattern,
Column escapeChar)
Returns true if str matches
pattern with escapeChar case-insensitively, null if any
arguments are null, false otherwise. |
static Column |
initcap(Column e)
Returns a new string column by converting the first letter of each word to uppercase.
|
static Column |
inline_outer(Column e)
Creates a new row for each element in the given array of structs.
|
static Column |
inline(Column e)
Creates a new row for each element in the given array of structs.
|
static Column |
input_file_block_length()
Returns the length of the block being read, or -1 if not available.
|
static Column |
input_file_block_start()
Returns the start offset of the block being read, or -1 if not available.
|
static Column |
input_file_name()
Creates a string column for the file name of the current Spark task.
|
static Column |
instr(Column str,
String substring)
Locates the position of the first occurrence of substring in the given string column.
|
static Column |
isnan(Column e)
Returns true if and only if the column is NaN.
|
static Column |
isnotnull(Column col)
Returns true if
col is not null, or false otherwise. |
static Column |
isnull(Column e)
Returns true if and only if the column is null.
|
static Column |
java_method(scala.collection.Seq<Column> cols)
Calls a method with reflection.
|
static Column |
json_array_length(Column jsonArray)
Returns the number of elements in the outermost JSON array.
|
static Column |
json_object_keys(Column json)
Returns all the keys of the outermost JSON object as an array.
|
static Column |
json_tuple(Column json,
scala.collection.Seq<String> fields)
Creates a new row for a json column according to the given field names.
|
static Column |
json_tuple(Column json,
String... fields)
Creates a new row for a json column according to the given field names.
|
static Column |
kurtosis(Column e)
Aggregate function: returns the kurtosis of the values in a group.
|
static Column |
kurtosis(String columnName)
Aggregate function: returns the kurtosis of the values in a group.
|
static Column |
lag(Column e,
int offset)
Window function: returns the value that is
offset rows before the current row, and
null if there are fewer than offset rows before the current row. |
static Column |
lag(Column e,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows before the current row, and
defaultValue if there are fewer than offset rows before the current row. |
static Column |
lag(Column e,
int offset,
Object defaultValue,
boolean ignoreNulls)
Window function: returns the value that is
offset rows before the current row, and
defaultValue if there are fewer than offset rows before the current row. |
static Column |
lag(String columnName,
int offset)
Window function: returns the value that is
offset rows before the current row, and
null if there are fewer than offset rows before the current row. |
static Column |
lag(String columnName,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows before the current row, and
defaultValue if there are fewer than offset rows before the current row. |
static Column |
last_day(Column e)
Returns the last day of the month which the given date belongs to.
|
static Column |
last_value(Column e)
Aggregate function: returns the last value in a group.
|
static Column |
last_value(Column e,
Column ignoreNulls)
Aggregate function: returns the last value in a group.
|
static Column |
last(Column e)
Aggregate function: returns the last value in a group.
|
static Column |
last(Column e,
boolean ignoreNulls)
Aggregate function: returns the last value in a group.
|
static Column |
last(String columnName)
Aggregate function: returns the last value of the column in a group.
|
static Column |
last(String columnName,
boolean ignoreNulls)
Aggregate function: returns the last value of the column in a group.
|
static Column |
lcase(Column str)
Returns
str with all characters changed to lowercase. |
static Column |
lead(Column e,
int offset)
Window function: returns the value that is
offset rows after the current row, and
null if there are fewer than offset rows after the current row. |
static Column |
lead(Column e,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows after the current row, and
defaultValue if there are fewer than offset rows after the current row. |
static Column |
lead(Column e,
int offset,
Object defaultValue,
boolean ignoreNulls)
Window function: returns the value that is
offset rows after the current row, and
defaultValue if there are fewer than offset rows after the current row. |
static Column |
lead(String columnName,
int offset)
Window function: returns the value that is
offset rows after the current row, and
null if there are fewer than offset rows after the current row. |
static Column |
lead(String columnName,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows after the current row, and
defaultValue if there are fewer than offset rows after the current row. |
static Column |
least(Column... exprs)
Returns the least value of the list of values, skipping null values.
|
static Column |
least(scala.collection.Seq<Column> exprs)
Returns the least value of the list of values, skipping null values.
|
static Column |
least(String columnName,
scala.collection.Seq<String> columnNames)
Returns the least value of the list of column names, skipping null values.
|
static Column |
least(String columnName,
String... columnNames)
Returns the least value of the list of column names, skipping null values.
|
static Column |
left(Column str,
Column len)
Returns the leftmost
len (len can be string type) characters from the string str ,
if len is less than or equal to 0, the result is an empty string. |
static Column |
len(Column e)
Computes the character length of a given string or number of bytes of a binary string.
|
static Column |
length(Column e)
Computes the character length of a given string or number of bytes of a binary string.
|
static Column |
levenshtein(Column l,
Column r)
Computes the Levenshtein distance of the two given string columns.
|
static Column |
levenshtein(Column l,
Column r,
int threshold)
Computes the Levenshtein distance of the two given string columns if it's less than or
equal to a given threshold.
|
static Column |
like(Column str,
Column pattern)
Returns true if str matches
pattern with escapeChar ('\'), null if any arguments are null,
false otherwise. |
static Column |
like(Column str,
Column pattern,
Column escapeChar)
Returns true if str matches
pattern with escapeChar , null if any arguments are null,
false otherwise. |
static Column |
lit(Object literal)
Creates a
Column of literal value. |
static Column |
ln(Column e)
Computes the natural logarithm of the given value.
|
static Column |
localtimestamp()
Returns the current timestamp without time zone at the start of query evaluation
as a timestamp without time zone column.
|
static Column |
locate(String substr,
Column str)
Locate the position of the first occurrence of substr.
|
static Column |
locate(String substr,
Column str,
int pos)
Locate the position of the first occurrence of substr in a string column, after position pos.
|
static Column |
log(Column e)
Computes the natural logarithm of the given value.
|
static Column |
log(double base,
Column a)
Returns the first argument-base logarithm of the second argument.
|
static Column |
log(double base,
String columnName)
Returns the first argument-base logarithm of the second argument.
|
static Column |
log(String columnName)
Computes the natural logarithm of the given column.
|
static Column |
log10(Column e)
Computes the logarithm of the given value in base 10.
|
static Column |
log10(String columnName)
Computes the logarithm of the given value in base 10.
|
static Column |
log1p(Column e)
Computes the natural logarithm of the given value plus one.
|
static Column |
log1p(String columnName)
Computes the natural logarithm of the given column plus one.
|
static Column |
log2(Column expr)
Computes the logarithm of the given column in base 2.
|
static Column |
log2(String columnName)
Computes the logarithm of the given value in base 2.
|
static Column |
lower(Column e)
Converts a string column to lower case.
|
static Column |
lpad(Column str,
int len,
byte[] pad)
Left-pad the binary column with pad to a byte length of len.
|
static Column |
lpad(Column str,
int len,
String pad)
Left-pad the string column with pad to a length of len.
|
static Column |
ltrim(Column e)
Trim the spaces from left end for the specified string value.
|
static Column |
ltrim(Column e,
String trimString)
Trim the specified character string from left end for the specified string column.
|
static Column |
make_date(Column year,
Column month,
Column day) |
static Column |
make_dt_interval()
Make DayTimeIntervalType duration.
|
static Column |
make_dt_interval(Column days)
Make DayTimeIntervalType duration from days.
|
static Column |
make_dt_interval(Column days,
Column hours)
Make DayTimeIntervalType duration from days and hours.
|
static Column |
make_dt_interval(Column days,
Column hours,
Column mins)
Make DayTimeIntervalType duration from days, hours and mins.
|
static Column |
make_dt_interval(Column days,
Column hours,
Column mins,
Column secs)
Make DayTimeIntervalType duration from days, hours, mins and secs.
|
static Column |
make_interval()
Make interval.
|
static Column |
make_interval(Column years)
Make interval from years.
|
static Column |
make_interval(Column years,
Column months)
Make interval from years and months.
|
static Column |
make_interval(Column years,
Column months,
Column weeks)
Make interval from years, months and weeks.
|
static Column |
make_interval(Column years,
Column months,
Column weeks,
Column days)
Make interval from years, months, weeks and days.
|
static Column |
make_interval(Column years,
Column months,
Column weeks,
Column days,
Column hours)
Make interval from years, months, weeks, days and hours.
|
static Column |
make_interval(Column years,
Column months,
Column weeks,
Column days,
Column hours,
Column mins)
Make interval from years, months, weeks, days, hours and mins.
|
static Column |
make_interval(Column years,
Column months,
Column weeks,
Column days,
Column hours,
Column mins,
Column secs)
Make interval from years, months, weeks, days, hours, mins and secs.
|
static Column |
make_timestamp_ltz(Column years,
Column months,
Column days,
Column hours,
Column mins,
Column secs)
Create the current timestamp with local time zone from years, months, days, hours, mins and
secs fields.
|
static Column |
make_timestamp_ltz(Column years,
Column months,
Column days,
Column hours,
Column mins,
Column secs,
Column timezone)
Create the current timestamp with local time zone from years, months, days, hours, mins, secs
and timezone fields.
|
static Column |
make_timestamp_ntz(Column years,
Column months,
Column days,
Column hours,
Column mins,
Column secs)
Create local date-time from years, months, days, hours, mins, secs fields.
|
static Column |
make_timestamp(Column years,
Column months,
Column days,
Column hours,
Column mins,
Column secs)
Create timestamp from years, months, days, hours, mins and secs fields.
|
static Column |
make_timestamp(Column years,
Column months,
Column days,
Column hours,
Column mins,
Column secs,
Column timezone)
Create timestamp from years, months, days, hours, mins, secs and timezone fields.
|
static Column |
make_ym_interval()
Make year-month interval.
|
static Column |
make_ym_interval(Column years)
Make year-month interval from years.
|
static Column |
make_ym_interval(Column years,
Column months)
Make year-month interval from years, months.
|
static Column |
map_concat(Column... cols)
Returns the union of all the given maps.
|
static Column |
map_concat(scala.collection.Seq<Column> cols)
Returns the union of all the given maps.
|
static Column |
map_contains_key(Column column,
Object key)
Returns true if the map contains the key.
|
static Column |
map_entries(Column e)
Returns an unordered array of all entries in the given map.
|
static Column |
map_filter(Column expr,
scala.Function2<Column,Column,Column> f)
Returns a map whose key-value pairs satisfy a predicate.
|
static Column |
map_from_arrays(Column keys,
Column values)
Creates a new map column.
|
static Column |
map_from_entries(Column e)
Returns a map created from the given array of entries.
|
static Column |
map_keys(Column e)
Returns an unordered array containing the keys of the map.
|
static Column |
map_values(Column e)
Returns an unordered array containing the values of the map.
|
static Column |
map_zip_with(Column left,
Column right,
scala.Function3<Column,Column,Column,Column> f)
Merge two given maps, key-wise into a single map using a function.
|
static Column |
map(Column... cols)
Creates a new map column.
|
static Column |
map(scala.collection.Seq<Column> cols)
Creates a new map column.
|
static Column |
mask(Column input)
Masks the given string value.
|
static Column |
mask(Column input,
Column upperChar)
Masks the given string value.
|
static Column |
mask(Column input,
Column upperChar,
Column lowerChar)
Masks the given string value.
|
static Column |
mask(Column input,
Column upperChar,
Column lowerChar,
Column digitChar)
Masks the given string value.
|
static Column |
mask(Column input,
Column upperChar,
Column lowerChar,
Column digitChar,
Column otherChar)
Masks the given string value.
|
static Column |
max_by(Column e,
Column ord)
Aggregate function: returns the value associated with the maximum value of ord.
|
static Column |
max(Column e)
Aggregate function: returns the maximum value of the expression in a group.
|
static Column |
max(String columnName)
Aggregate function: returns the maximum value of the column in a group.
|
static Column |
md5(Column e)
Calculates the MD5 digest of a binary column and returns the value
as a 32 character hex string.
|
static Column |
mean(Column e)
Aggregate function: returns the average of the values in a group.
|
static Column |
mean(String columnName)
Aggregate function: returns the average of the values in a group.
|
static Column |
median(Column e)
Aggregate function: returns the median of the values in a group.
|
static Column |
min_by(Column e,
Column ord)
Aggregate function: returns the value associated with the minimum value of ord.
|
static Column |
min(Column e)
Aggregate function: returns the minimum value of the expression in a group.
|
static Column |
min(String columnName)
Aggregate function: returns the minimum value of the column in a group.
|
static Column |
minute(Column e)
Extracts the minutes as an integer from a given date/timestamp/string.
|
static Column |
mode(Column e)
Aggregate function: returns the most frequent value in a group.
|
static Column |
monotonically_increasing_id()
A column expression that generates monotonically increasing 64-bit integers.
|
static Column |
monotonicallyIncreasingId()
Deprecated.
Use monotonically_increasing_id(). Since 2.0.0.
|
static Column |
month(Column e)
Extracts the month as an integer from a given date/timestamp/string.
|
static Column |
months_between(Column end,
Column start)
Returns number of months between dates
start and end . |
static Column |
months_between(Column end,
Column start,
boolean roundOff)
Returns number of months between dates
end and start . |
static Column |
months(Column e)
A transform for timestamps and dates to partition data into months.
|
static Column |
named_struct(scala.collection.Seq<Column> cols)
Creates a struct with the given field names and values.
|
static Column |
nanvl(Column col1,
Column col2)
Returns col1 if it is not NaN, or col2 if col1 is NaN.
|
static Column |
negate(Column e)
Unary minus, i.e. negates the expression.
|
static Column |
negative(Column e)
Returns the negated value.
|
static Column |
next_day(Column date,
Column dayOfWeek)
Returns the first date which is later than the value of the
date column that is on the
specified day of the week. |
static Column |
next_day(Column date,
String dayOfWeek)
Returns the first date which is later than the value of the
date column that is on the
specified day of the week. |
static Column |
not(Column e)
Inversion of boolean expression, i.e. NOT.
|
static Column |
now()
Returns the current timestamp at the start of query evaluation.
|
static Column |
nth_value(Column e,
int offset)
Window function: returns the value that is the
offset th row of the window frame
(counting from 1), and null if the size of window frame is less than offset rows. |
static Column |
nth_value(Column e,
int offset,
boolean ignoreNulls)
Window function: returns the value that is the
offset th row of the window frame
(counting from 1), and null if the size of window frame is less than offset rows. |
static Column |
ntile(int n)
Window function: returns the ntile group id (from 1 to
n inclusive) in an ordered window
partition. |
static Column |
nullif(Column col1,
Column col2)
Returns null if
col1 equals col2, or col1 otherwise. |
static Column |
nvl(Column col1,
Column col2)
Returns
col2 if col1 is null, or col1 otherwise. |
static Column |
nvl2(Column col1,
Column col2,
Column col3)
Returns
col2 if col1 is not null, or col3 otherwise. |
static Column |
octet_length(Column e)
Calculates the byte length for the specified string column.
|
static Column |
overlay(Column src,
Column replace,
Column pos)
Overlay the specified portion of
src with replace ,
starting from byte position pos of src . |
static Column |
overlay(Column src,
Column replace,
Column pos,
Column len)
Overlay the specified portion of
src with replace ,
starting from byte position pos of src and proceeding for len bytes. |
static Column |
parse_url(Column url,
Column partToExtract)
Extracts a part from a URL.
|
static Column |
parse_url(Column url,
Column partToExtract,
Column key)
Extracts a part from a URL.
|
static Column |
percent_rank()
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
|
static Column |
percentile_approx(Column e,
Column percentage,
Column accuracy)
Aggregate function: returns the approximate
percentile of the numeric column col which
is the smallest value in the ordered col values (sorted from least to greatest) such that
no more than percentage of col values is less than the value or equal to that value. |
static Column |
percentile(Column e,
Column percentage)
Aggregate function: returns the exact percentile(s) of numeric column
expr at the
given percentage(s) with value range in [0.0, 1.0]. |
static Column |
percentile(Column e,
Column percentage,
Column frequency)
Aggregate function: returns the exact percentile(s) of numeric column
expr at the
given percentage(s) with value range in [0.0, 1.0]. |
static Column |
pi()
Returns Pi.
|
static Column |
pmod(Column dividend,
Column divisor)
Returns the positive value of dividend mod divisor.
|
static Column |
posexplode_outer(Column e)
Creates a new row for each element with position in the given array or map column.
|
static Column |
posexplode(Column e)
Creates a new row for each element with position in the given array or map column.
|
static Column |
position(Column substr,
Column str)
Returns the position of the first occurrence of
substr in str after position 1 . |
static Column |
position(Column substr,
Column str,
Column start)
Returns the position of the first occurrence of
substr in str after position start . |
static Column |
positive(Column e)
Returns the value.
|
static Column |
pow(Column l,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(Column l,
double r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(Column l,
String rightName)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(double l,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(double l,
String rightName)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(String leftName,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(String leftName,
double r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(String leftName,
String rightName)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
power(Column l,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
printf(Column format,
scala.collection.Seq<Column> arguments)
Formats the arguments in printf-style and returns the result as a string column.
|
static Column |
product(Column e)
Aggregate function: returns the product of all numerical elements in a group.
|
static Column |
quarter(Column e)
Extracts the quarter as an integer from a given date/timestamp/string.
|
static Column |
radians(Column e)
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
|
static Column |
radians(String columnName)
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
|
static Column |
raise_error(Column c)
Throws an exception with the provided error message.
|
static Column |
rand()
Generate a random column with independent and identically distributed (i.i.d.) samples
uniformly distributed in [0.0, 1.0).
|
static Column |
rand(long seed)
Generate a random column with independent and identically distributed (i.i.d.) samples
uniformly distributed in [0.0, 1.0).
|
static Column |
randn()
Generate a column with independent and identically distributed (i.i.d.) samples from
the standard normal distribution.
|
static Column |
randn(long seed)
Generate a column with independent and identically distributed (i.i.d.) samples from
the standard normal distribution.
|
static Column |
random()
Returns a random value with independent and identically distributed (i.i.d.) uniformly
distributed values in [0, 1).
|
static Column |
random(Column seed)
Returns a random value with independent and identically distributed (i.i.d.) uniformly
distributed values in [0, 1).
|
static Column |
rank()
Window function: returns the rank of rows within a window partition.
|
static Column |
reduce(Column expr,
Column initialValue,
scala.Function2<Column,Column,Column> merge)
Applies a binary operator to an initial state and all elements in the array,
and reduces this to a single state.
|
static Column |
reduce(Column expr,
Column initialValue,
scala.Function2<Column,Column,Column> merge,
scala.Function1<Column,Column> finish)
Applies a binary operator to an initial state and all elements in the array,
and reduces this to a single state.
|
static Column |
reflect(scala.collection.Seq<Column> cols)
Calls a method with reflection.
|
static Column |
regexp_count(Column str,
Column regexp)
Returns a count of the number of times that the regular expression pattern
regexp
is matched in the string str . |
static Column |
regexp_extract_all(Column str,
Column regexp)
Extract all strings in the
str that match the regexp expression and
corresponding to the first regex group index. |
static Column |
regexp_extract_all(Column str,
Column regexp,
Column idx)
Extract all strings in the
str that match the regexp expression and
corresponding to the regex group index. |
static Column |
regexp_extract(Column e,
String exp,
int groupIdx)
Extract a specific group matched by a Java regex, from the specified string column.
|
static Column |
regexp_instr(Column str,
Column regexp)
Searches a string for a regular expression and returns an integer that indicates
the beginning position of the matched substring.
|
static Column |
regexp_instr(Column str,
Column regexp,
Column idx)
Searches a string for a regular expression and returns an integer that indicates
the beginning position of the matched substring.
|
static Column |
regexp_like(Column str,
Column regexp)
Returns true if
str matches regexp , or false otherwise. |
static Column |
regexp_replace(Column e,
Column pattern,
Column replacement)
Replaces all substrings of the specified string value that match pattern with replacement.
|
static Column |
regexp_replace(Column e,
String pattern,
String replacement)
Replaces all substrings of the specified string value that match pattern with replacement.
|
static Column |
regexp_substr(Column str,
Column regexp)
Returns the substring that matches the regular expression
regexp within the string str . |
static Column |
regexp(Column str,
Column regexp)
Returns true if
str matches regexp , or false otherwise. |
static Column |
regr_avgx(Column y,
Column x)
Aggregate function: returns the average of the independent variable for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_avgy(Column y,
Column x)
Aggregate function: returns the average of the dependent variable for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_count(Column y,
Column x)
Aggregate function: returns the number of non-null number pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_intercept(Column y,
Column x)
Aggregate function: returns the intercept of the univariate linear regression line
for non-null pairs in a group, where
y is the dependent variable and
x is the independent variable. |
static Column |
regr_r2(Column y,
Column x)
Aggregate function: returns the coefficient of determination for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_slope(Column y,
Column x)
Aggregate function: returns the slope of the linear regression line for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_sxx(Column y,
Column x)
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_sxy(Column y,
Column x)
Aggregate function: returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
regr_syy(Column y,
Column x)
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs
in a group, where
y is the dependent variable and x is the independent variable. |
static Column |
repeat(Column str,
int n)
Repeats a string column n times, and returns it as a new string column.
|
static Column |
replace(Column src,
Column search)
Replaces all occurrences of
search in src with an empty string, i.e. removes them. |
static Column |
replace(Column src,
Column search,
Column replace)
Replaces all occurrences of
search with replace . |
static Column |
reverse(Column e)
Returns a reversed string or an array with reverse order of elements.
|
static Column |
right(Column str,
Column len)
Returns the rightmost
len characters from the string str (len can be string type).
If len is less than or equal to 0, the result is an empty string. |
static Column |
rint(Column e)
Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.
|
static Column |
rint(String columnName)
Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.
|
static Column |
rlike(Column str,
Column regexp)
Returns true if
str matches regexp , or false otherwise. |
static Column |
round(Column e)
Returns the value of the column
e rounded to 0 decimal places with HALF_UP round mode. |
static Column |
round(Column e,
int scale)
Rounds the value of
e to scale decimal places with HALF_UP round mode
if scale is greater than or equal to 0, or at the integral part when scale is less than 0. |
static Column |
row_number()
Window function: returns a sequential number starting at 1 within a window partition.
|
static Column |
rpad(Column str,
int len,
byte[] pad)
Right-pad the binary column with pad to a byte length of len.
|
static Column |
rpad(Column str,
int len,
String pad)
Right-pad the string column with pad to a length of len.
|
static Column |
rtrim(Column e)
Trim the spaces from the right end of the specified string value.
|
static Column |
rtrim(Column e,
String trimString)
Trim the specified character string from the right end of the specified string column.
|
static Column |
schema_of_csv(Column csv)
Parses a CSV string and infers its schema in DDL format.
|
static Column |
schema_of_csv(Column csv,
java.util.Map<String,String> options)
Parses a CSV string and infers its schema in DDL format using options.
|
static Column |
schema_of_csv(String csv)
Parses a CSV string and infers its schema in DDL format.
|
static Column |
schema_of_json(Column json)
Parses a JSON string and infers its schema in DDL format.
|
static Column |
schema_of_json(Column json,
java.util.Map<String,String> options)
Parses a JSON string and infers its schema in DDL format using options.
|
static Column |
schema_of_json(String json)
Parses a JSON string and infers its schema in DDL format.
|
static Column |
sec(Column e) |
static Column |
second(Column e)
Extracts the seconds as an integer from a given date/timestamp/string.
|
static Column |
sentences(Column string)
Splits a string into arrays of sentences, where each sentence is an array of words.
|
static Column |
sentences(Column string,
Column language,
Column country)
Splits a string into arrays of sentences, where each sentence is an array of words.
|
static Column |
sequence(Column start,
Column stop)
Generate a sequence of integers from start to stop,
incrementing by 1 if start is less than or equal to stop, otherwise -1.
|
static Column |
sequence(Column start,
Column stop,
Column step)
Generate a sequence of integers from start to stop, incrementing by step.
|
static Column |
session_window(Column timeColumn,
Column gapDuration)
Generates session window given a timestamp specifying column.
|
static Column |
session_window(Column timeColumn,
String gapDuration)
Generates session window given a timestamp specifying column.
|
static Column |
sha(Column col)
Returns a sha1 hash value as a hex string of the
col . |
static Column |
sha1(Column e)
Calculates the SHA-1 digest of a binary column and returns the value
as a 40 character hex string.
|
static Column |
sha2(Column e,
int numBits)
Calculates the SHA-2 family of hash functions of a binary column and
returns the value as a hex string.
|
static Column |
shiftleft(Column e,
int numBits)
Shift the given value numBits left.
|
static Column |
shiftLeft(Column e,
int numBits)
Deprecated.
Use shiftleft. Since 3.2.0.
|
static Column |
shiftright(Column e,
int numBits)
(Signed) shift the given value numBits right.
|
static Column |
shiftRight(Column e,
int numBits)
Deprecated.
Use shiftright. Since 3.2.0.
|
static Column |
shiftrightunsigned(Column e,
int numBits)
Unsigned shift the given value numBits right.
|
static Column |
shiftRightUnsigned(Column e,
int numBits)
Deprecated.
Use shiftrightunsigned. Since 3.2.0.
|
static Column |
shuffle(Column e)
Returns a random permutation of the given array.
|
static Column |
sign(Column e)
Computes the signum of the given value.
|
static Column |
signum(Column e)
Computes the signum of the given value.
|
static Column |
signum(String columnName)
Computes the signum of the given column.
|
static Column |
sin(Column e) |
static Column |
sin(String columnName) |
static Column |
sinh(Column e) |
static Column |
sinh(String columnName) |
static Column |
size(Column e)
Returns length of array or map.
|
static Column |
skewness(Column e)
Aggregate function: returns the skewness of the values in a group.
|
static Column |
skewness(String columnName)
Aggregate function: returns the skewness of the values in a group.
|
static Column |
slice(Column x,
Column start,
Column length)
Returns an array containing all the elements in
x from index start (or starting from the
end if start is negative) with the specified length . |
static Column |
slice(Column x,
int start,
int length)
Returns an array containing all the elements in
x from index start (or starting from the
end if start is negative) with the specified length . |
static Column |
some(Column e)
Aggregate function: returns true if at least one value of
e is true. |
static Column |
sort_array(Column e)
Sorts the input array for the given column in ascending order,
according to the natural ordering of the array elements.
|
static Column |
sort_array(Column e,
boolean asc)
Sorts the input array for the given column in ascending or descending order,
according to the natural ordering of the array elements.
|
static Column |
soundex(Column e)
Returns the soundex code for the specified expression.
|
static Column |
spark_partition_id()
Partition ID.
|
static Column |
split_part(Column str,
Column delimiter,
Column partNum)
Splits
str by delimiter and returns the requested part of the split (1-based). |
static Column |
split(Column str,
String pattern)
Splits str around matches of the given pattern.
|
static Column |
split(Column str,
String pattern,
int limit)
Splits str around matches of the given pattern.
|
static Column |
sqrt(Column e)
Computes the square root of the specified float value.
|
static Column |
sqrt(String colName)
Computes the square root of the specified float value.
|
static Column |
stack(scala.collection.Seq<Column> cols)
Separates
col1 , ..., colk into n rows. |
static Column |
startswith(Column str,
Column prefix)
Returns a boolean: true if str starts with prefix.
|
static Column |
std(Column e)
Aggregate function: alias for
stddev_samp . |
static Column |
stddev_pop(Column e)
Aggregate function: returns the population standard deviation of
the expression in a group.
|
static Column |
stddev_pop(String columnName)
Aggregate function: returns the population standard deviation of
the expression in a group.
|
static Column |
stddev_samp(Column e)
Aggregate function: returns the sample standard deviation of
the expression in a group.
|
static Column |
stddev_samp(String columnName)
Aggregate function: returns the sample standard deviation of
the expression in a group.
|
static Column |
stddev(Column e)
Aggregate function: alias for
stddev_samp . |
static Column |
stddev(String columnName)
Aggregate function: alias for
stddev_samp . |
static Column |
str_to_map(Column text)
Creates a map after splitting the text into key/value pairs using delimiters.
|
static Column |
str_to_map(Column text,
Column pairDelim)
Creates a map after splitting the text into key/value pairs using delimiters.
|
static Column |
str_to_map(Column text,
Column pairDelim,
Column keyValueDelim)
Creates a map after splitting the text into key/value pairs using delimiters.
|
static Column |
struct(Column... cols)
Creates a new struct column.
|
static Column |
struct(scala.collection.Seq<Column> cols)
Creates a new struct column.
|
static Column |
struct(String colName,
scala.collection.Seq<String> colNames)
Creates a new struct column that composes multiple input columns.
|
static Column |
struct(String colName,
String... colNames)
Creates a new struct column that composes multiple input columns.
|
static Column |
substr(Column str,
Column pos)
Returns the substring of
str that starts at pos ,
or the slice of byte array that starts at pos . |
static Column |
substr(Column str,
Column pos,
Column len)
Returns the substring of
str that starts at pos and is of length len ,
or the slice of byte array that starts at pos and is of length len . |
static Column |
substring_index(Column str,
String delim,
int count)
Returns the substring from string str before count occurrences of the delimiter delim.
|
static Column |
substring(Column str,
int pos,
int len)
Returns the substring that starts at
pos and is of length len when str is String type, or
the slice of the byte array that starts at byte position pos and is of length len
when str is Binary type. |
static Column |
sum_distinct(Column e)
Aggregate function: returns the sum of distinct values in the expression.
|
static Column |
sum(Column e)
Aggregate function: returns the sum of all values in the expression.
|
static Column |
sum(String columnName)
Aggregate function: returns the sum of all values in the given column.
|
static Column |
sumDistinct(Column e)
Deprecated.
Use sum_distinct. Since 3.2.0.
|
static Column |
sumDistinct(String columnName)
Deprecated.
Use sum_distinct. Since 3.2.0.
|
static Column |
tan(Column e) |
static Column |
tan(String columnName) |
static Column |
tanh(Column e) |
static Column |
tanh(String columnName) |
static Column |
timestamp_micros(Column e)
Creates timestamp from the number of microseconds since UTC epoch.
|
static Column |
timestamp_millis(Column e)
Creates timestamp from the number of milliseconds since UTC epoch.
|
static Column |
timestamp_seconds(Column e)
Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z)
to a timestamp.
|
static Column |
to_binary(Column e)
Converts the input
e to a binary value based on the default format "hex". |
static Column |
to_binary(Column e,
Column format)
Converts the input
e to a binary value based on the supplied format . |
static Column |
to_char(Column e,
Column format)
Convert
e to a string based on the format . |
static Column |
to_csv(Column e)
Converts a column containing a
StructType into a CSV string with the specified schema. |
static Column |
to_csv(Column e,
java.util.Map<String,String> options)
(Java-specific) Converts a column containing a
StructType into a CSV string with
the specified schema. |
static Column |
to_date(Column e)
Converts the column into
DateType by casting rules to DateType . |
static Column |
to_date(Column e,
String fmt)
Converts the column into a
DateType with a specified format |
static Column |
to_json(Column e)
Converts a column containing a
StructType , ArrayType or
a MapType into a JSON string with the specified schema. |
static Column |
to_json(Column e,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Converts a column containing a
StructType , ArrayType or
a MapType into a JSON string with the specified schema. |
static Column |
to_json(Column e,
java.util.Map<String,String> options)
(Java-specific) Converts a column containing a
StructType , ArrayType or
a MapType into a JSON string with the specified schema. |
static Column |
to_number(Column e,
Column format)
Convert string 'e' to a number based on the string format 'format'.
|
static Column |
to_timestamp_ltz(Column timestamp)
Parses the
timestamp expression with the default format to a timestamp with local time zone. |
static Column |
to_timestamp_ltz(Column timestamp,
Column format)
Parses the
timestamp expression with the format expression
to a timestamp with local time zone. |
static Column |
to_timestamp_ntz(Column timestamp)
Parses the
timestamp expression with the default format to a timestamp without time zone. |
static Column |
to_timestamp_ntz(Column timestamp,
Column format)
Parses the
timestamp expression with the format expression
to a timestamp without time zone. |
static Column |
to_timestamp(Column s)
Converts to a timestamp by casting rules to
TimestampType . |
static Column |
to_timestamp(Column s,
String fmt)
Converts time string with the given pattern to timestamp.
|
static Column |
to_unix_timestamp(Column e)
Returns the UNIX timestamp of the given time.
|
static Column |
to_unix_timestamp(Column e,
Column format)
Returns the UNIX timestamp of the given time.
|
static Column |
to_utc_timestamp(Column ts,
Column tz)
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time
zone, and renders that time as a timestamp in UTC.
|
static Column |
to_utc_timestamp(Column ts,
String tz)
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time
zone, and renders that time as a timestamp in UTC.
|
static Column |
to_varchar(Column e,
Column format)
Convert
e to a string based on the format . |
static Column |
toDegrees(Column e)
Deprecated.
Use degrees. Since 2.1.0.
|
static Column |
toDegrees(String columnName)
Deprecated.
Use degrees. Since 2.1.0.
|
static Column |
toRadians(Column e)
Deprecated.
Use radians. Since 2.1.0.
|
static Column |
toRadians(String columnName)
Deprecated.
Use radians. Since 2.1.0.
|
static Column |
transform_keys(Column expr,
scala.Function2<Column,Column,Column> f)
Applies a function to every key-value pair in a map and returns
a map with the results of those applications as the new keys for the pairs.
|
static Column |
transform_values(Column expr,
scala.Function2<Column,Column,Column> f)
Applies a function to every key-value pair in a map and returns
a map with the results of those applications as the new values for the pairs.
|
static Column |
transform(Column column,
scala.Function1<Column,Column> f)
Returns an array of elements after applying a transformation to each element
in the input array.
|
static Column |
transform(Column column,
scala.Function2<Column,Column,Column> f)
Returns an array of elements after applying a transformation to each element
in the input array.
|
static Column |
translate(Column src,
String matchingString,
String replaceString)
Translates any character in src that appears in matchingString to the corresponding character in replaceString.
|
static Column |
trim(Column e)
Trim the spaces from both ends for the specified string column.
|
static Column |
trim(Column e,
String trimString)
Trim the specified character from both ends for the specified string column.
|
static Column |
trunc(Column date,
String format)
Returns date truncated to the unit specified by the format.
|
static Column |
try_add(Column left,
Column right)
Returns the sum of
left and right and the result is null on overflow. |
static Column |
try_aes_decrypt(Column input,
Column key)
Returns a decrypted value of
input . |
static Column |
try_aes_decrypt(Column input,
Column key,
Column mode)
Returns a decrypted value of
input . |
static Column |
try_aes_decrypt(Column input,
Column key,
Column mode,
Column padding)
Returns a decrypted value of
input . |
static Column |
try_aes_decrypt(Column input,
Column key,
Column mode,
Column padding,
Column aad)
This is a special version of
aes_decrypt that performs the same operation, but returns a
NULL value instead of raising an error if the decryption cannot be performed. |
static Column |
try_avg(Column e)
Returns the mean calculated from values of a group and the result is null on overflow.
|
static Column |
try_divide(Column dividend,
Column divisor)
Returns
dividend / divisor . |
static Column |
try_element_at(Column column,
Column value)
(array, index) - Returns element of array at given (1-based) index.
|
static Column |
try_multiply(Column left,
Column right)
Returns
left * right and the result is null on overflow. |
static Column |
try_subtract(Column left,
Column right)
Returns
left - right and the result is null on overflow. |
static Column |
try_sum(Column e)
Returns the sum calculated from values of a group and the result is null on overflow.
|
static Column |
try_to_binary(Column e)
This is a special version of
to_binary that performs the same operation, but returns a NULL
value instead of raising an error if the conversion cannot be performed. |
static Column |
try_to_binary(Column e,
Column format)
This is a special version of
to_binary that performs the same operation, but returns a NULL
value instead of raising an error if the conversion cannot be performed. |
static Column |
try_to_number(Column e,
Column format)
Convert string
e to a number based on the string format format . |
static Column |
try_to_timestamp(Column s)
Parses the
s to a timestamp. |
static Column |
try_to_timestamp(Column s,
Column format)
Parses the
s with the format to a timestamp. |
static <T> Column |
typedlit(T literal,
scala.reflect.api.TypeTags.TypeTag<T> evidence$2)
Creates a
Column of literal value. |
static <T> Column |
typedLit(T literal,
scala.reflect.api.TypeTags.TypeTag<T> evidence$1)
Creates a
Column of literal value. |
static Column |
typeof(Column col)
Return DDL-formatted type string for the data type of the input.
|
static Column |
ucase(Column str)
Returns
str with all characters changed to uppercase. |
static <IN,BUF,OUT> UserDefinedFunction |
udaf(Aggregator<IN,BUF,OUT> agg,
Encoder<IN> inputEncoder)
Obtains a
UserDefinedFunction that wraps the given Aggregator
so that it may be used with untyped Data Frames. |
static <IN,BUF,OUT> UserDefinedFunction |
udaf(Aggregator<IN,BUF,OUT> agg,
scala.reflect.api.TypeTags.TypeTag<IN> evidence$3)
Obtains a
UserDefinedFunction that wraps the given Aggregator
so that it may be used with untyped Data Frames. |
static <RT> UserDefinedFunction |
udf(scala.Function0<RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$4)
Defines a Scala closure of 0 arguments as user-defined function (UDF).
|
static <RT,A1> UserDefinedFunction |
udf(scala.Function1<A1,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$5,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$6)
Defines a Scala closure of 1 argument as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10> UserDefinedFunction |
udf(scala.Function10<A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$59,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$60,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$61,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$62,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$63,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$64,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$65,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$66,
scala.reflect.api.TypeTags.TypeTag<A8> evidence$67,
scala.reflect.api.TypeTags.TypeTag<A9> evidence$68,
scala.reflect.api.TypeTags.TypeTag<A10> evidence$69)
Defines a Scala closure of 10 arguments as user-defined function (UDF).
|
static <RT,A1,A2> UserDefinedFunction |
udf(scala.Function2<A1,A2,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$7,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$8,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$9)
Defines a Scala closure of 2 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3> UserDefinedFunction |
udf(scala.Function3<A1,A2,A3,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$10,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$11,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$12,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$13)
Defines a Scala closure of 3 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4> UserDefinedFunction |
udf(scala.Function4<A1,A2,A3,A4,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$14,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$15,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$16,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$17,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$18)
Defines a Scala closure of 4 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5> UserDefinedFunction |
udf(scala.Function5<A1,A2,A3,A4,A5,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$19,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$20,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$21,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$22,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$23,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$24)
Defines a Scala closure of 5 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6> UserDefinedFunction |
udf(scala.Function6<A1,A2,A3,A4,A5,A6,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$25,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$26,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$27,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$28,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$29,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$30,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$31)
Defines a Scala closure of 6 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7> UserDefinedFunction |
udf(scala.Function7<A1,A2,A3,A4,A5,A6,A7,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$32,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$33,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$34,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$35,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$36,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$37,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$38,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$39)
Defines a Scala closure of 7 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7,A8> UserDefinedFunction |
udf(scala.Function8<A1,A2,A3,A4,A5,A6,A7,A8,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$40,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$41,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$42,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$43,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$44,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$45,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$46,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$47,
scala.reflect.api.TypeTags.TypeTag<A8> evidence$48)
Defines a Scala closure of 8 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7,A8,A9> UserDefinedFunction |
udf(scala.Function9<A1,A2,A3,A4,A5,A6,A7,A8,A9,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$49,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$50,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$51,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$52,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$53,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$54,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$55,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$56,
scala.reflect.api.TypeTags.TypeTag<A8> evidence$57,
scala.reflect.api.TypeTags.TypeTag<A9> evidence$58)
Defines a Scala closure of 9 arguments as user-defined function (UDF).
|
static UserDefinedFunction |
udf(Object f,
DataType dataType)
Deprecated.
Scala `udf` method with return type parameter is deprecated. Please use Scala `udf` method without return type parameter. Since 3.0.0.
|
static UserDefinedFunction |
udf(UDF0<?> f,
DataType returnType)
Defines a Java UDF0 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF1<?,?> f,
DataType returnType)
Defines a Java UDF1 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF10<?,?,?,?,?,?,?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF10 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF2<?,?,?> f,
DataType returnType)
Defines a Java UDF2 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF3<?,?,?,?> f,
DataType returnType)
Defines a Java UDF3 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF4<?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF4 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF5<?,?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF5 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF6<?,?,?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF6 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF7<?,?,?,?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF7 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF8<?,?,?,?,?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF8 instance as user-defined function (UDF).
|
static UserDefinedFunction |
udf(UDF9<?,?,?,?,?,?,?,?,?,?> f,
DataType returnType)
Defines a Java UDF9 instance as user-defined function (UDF).
|
static Column |
unbase64(Column e)
Decodes a BASE64 encoded string column and returns it as a binary column.
|
static Column |
unhex(Column column)
Inverse of hex.
|
static Column |
unix_date(Column e)
Returns the number of days since 1970-01-01.
|
static Column |
unix_micros(Column e)
Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
|
static Column |
unix_millis(Column e)
Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.
|
static Column |
unix_seconds(Column e)
Returns the number of seconds since 1970-01-01 00:00:00 UTC.
|
static Column |
unix_timestamp()
Returns the current Unix timestamp (in seconds) as a long.
|
static Column |
unix_timestamp(Column s)
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds),
using the default timezone and the default locale.
|
static Column |
unix_timestamp(Column s,
String p)
Converts time string with given pattern to Unix timestamp (in seconds).
|
static Column |
unwrap_udt(Column column)
Unwrap UDT data type column into its underlying type.
|
static Column |
upper(Column e)
Converts a string column to upper case.
|
static Column |
url_decode(Column str)
Decodes a
str in 'application/x-www-form-urlencoded' format
using a specific encoding scheme. |
static Column |
url_encode(Column str)
Translates a string into 'application/x-www-form-urlencoded' format
using a specific encoding scheme.
|
static Column |
user()
Returns the user name of current execution context.
|
static Column |
uuid()
Returns a universally unique identifier (UUID) string.
|
static Column |
var_pop(Column e)
Aggregate function: returns the population variance of the values in a group.
|
static Column |
var_pop(String columnName)
Aggregate function: returns the population variance of the values in a group.
|
static Column |
var_samp(Column e)
Aggregate function: returns the unbiased variance of the values in a group.
|
static Column |
var_samp(String columnName)
Aggregate function: returns the unbiased variance of the values in a group.
|
static Column |
variance(Column e)
Aggregate function: alias for
var_samp . |
static Column |
variance(String columnName)
Aggregate function: alias for
var_samp . |
static Column |
version()
Returns the Spark version.
|
static Column |
weekday(Column e)
Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
|
static Column |
weekofyear(Column e)
Extracts the week number as an integer from a given date/timestamp/string.
|
static Column |
when(Column condition,
Object value)
Evaluates a list of conditions and returns one of multiple possible result expressions.
|
static Column |
width_bucket(Column v,
Column min,
Column max,
Column numBucket)
Returns the bucket number into which the value of this expression would fall
after being evaluated.
|
static Column |
window_time(Column windowColumn)
Extracts the event time from the window column.
|
static Column |
window(Column timeColumn,
String windowDuration)
Generates tumbling time windows given a timestamp specifying column.
|
static Column |
window(Column timeColumn,
String windowDuration,
String slideDuration)
Bucketize rows into one or more time windows given a timestamp specifying column.
|
static Column |
window(Column timeColumn,
String windowDuration,
String slideDuration,
String startTime)
Bucketize rows into one or more time windows given a timestamp specifying column.
|
static Column |
xpath_boolean(Column x,
Column p)
Returns true if the XPath expression evaluates to true, or if a matching node is found.
|
static Column |
xpath_double(Column x,
Column p)
Returns a double value, the value zero if no match is found,
or NaN if a match is found but the value is non-numeric.
|
static Column |
xpath_float(Column x,
Column p)
Returns a float value, the value zero if no match is found,
or NaN if a match is found but the value is non-numeric.
|
static Column |
xpath_int(Column x,
Column p)
Returns an integer value, or the value zero if no match is found,
or a match is found but the value is non-numeric.
|
static Column |
xpath_long(Column x,
Column p)
Returns a long integer value, or the value zero if no match is found,
or a match is found but the value is non-numeric.
|
static Column |
xpath_number(Column x,
Column p)
Returns a double value, the value zero if no match is found,
or NaN if a match is found but the value is non-numeric.
|
static Column |
xpath_short(Column x,
Column p)
Returns a short integer value, or the value zero if no match is found,
or a match is found but the value is non-numeric.
|
static Column |
xpath_string(Column x,
Column p)
Returns the text contents of the first xml node that matches the XPath expression.
|
static Column |
xpath(Column x,
Column p)
Returns a string array of values within the nodes of xml that match the XPath expression.
|
static Column |
xxhash64(Column... cols)
Calculates the hash code of given columns using the 64-bit
variant of the xxHash algorithm, and returns the result as a long
column.
|
static Column |
xxhash64(scala.collection.Seq<Column> cols)
Calculates the hash code of given columns using the 64-bit
variant of the xxHash algorithm, and returns the result as a long
column.
|
static Column |
year(Column e)
Extracts the year as an integer from a given date/timestamp/string.
|
static Column |
years(Column e)
A transform for timestamps and dates to partition data into years.
|
static Column |
zip_with(Column left,
Column right,
scala.Function2<Column,Column,Column> f)
Merge two given arrays, element-wise, into a single array using a function.
|
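The regr_* aggregates listed in the summary above share one rule: only pairs in which both y and x are non-null contribute. The plain-Java sketch below has no Spark dependency, and the class RegrSketch and its members are hypothetical names chosen for illustration. It computes the documented quantities over such pairs, so the identities regr_sxx = REGR_COUNT(y, x) * VAR_POP(x) and regr_sxy = REGR_COUNT(y, x) * COVAR_POP(y, x) can be checked by hand. The slope and intercept formulas assume the usual least-squares definitions; this is not Spark's implementation.

```java
/** Illustrative only: plain-Java arithmetic behind the regr_* aggregates. */
final class RegrSketch {

    /** Values computed over the (y, x) pairs in which both sides are non-null. */
    static final class Result {
        final long count;   // regr_count
        final double avgX;  // regr_avgx
        final double avgY;  // regr_avgy
        final double sxx;   // regr_sxx = count * VAR_POP(x)
        final double syy;   // regr_syy = count * VAR_POP(y)
        final double sxy;   // regr_sxy = count * COVAR_POP(y, x)

        Result(long count, double avgX, double avgY, double sxx, double syy, double sxy) {
            this.count = count;
            this.avgX = avgX;
            this.avgY = avgY;
            this.sxx = sxx;
            this.syy = syy;
            this.sxy = sxy;
        }

        // Assumption: ordinary least-squares slope and intercept.
        double slope() { return sxy / sxx; }
        double intercept() { return avgY - slope() * avgX; }
    }

    /** Assumes y and x have the same length; null entries mark missing values. */
    static Result regr(Double[] y, Double[] x) {
        long n = 0;
        double sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < y.length; i++) {
            if (y[i] == null || x[i] == null) continue; // a null on either side drops the pair
            n++;
            sumX += x[i];
            sumY += y[i];
        }
        // With no non-null pairs these averages are NaN; that case is not modeled here.
        double avgX = sumX / n, avgY = sumY / n;
        double sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (int i = 0; i < y.length; i++) {
            if (y[i] == null || x[i] == null) continue;
            double dx = x[i] - avgX, dy = y[i] - avgY;
            sxx += dx * dx; // sum of squared deviations = count * population variance of x
            syy += dy * dy;
            sxy += dx * dy; // sum of cross deviations = count * population covariance
        }
        return new Result(n, avgX, avgY, sxx, syy, sxy);
    }

    public static void main(String[] args) {
        Double[] y = {1.0, 3.0, null, 5.0, 7.0};
        Double[] x = {1.0, 2.0, 3.0, null, 4.0};
        Result r = regr(y, x);
        System.out.println("count=" + r.count + " slope=" + r.slope() + " intercept=" + r.intercept());
    }
}
```

With the sample data, three pairs survive the null filter and lie exactly on y = 2x - 1, so the slope is about 2 and the intercept about -1, up to floating-point rounding.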
public static Column countDistinct(Column expr, Column... exprs)
An alias of count_distinct, and it is encouraged to use count_distinct directly.
expr - (undocumented)
exprs - (undocumented)
public static Column countDistinct(String columnName, String... columnNames)
An alias of count_distinct, and it is encouraged to use count_distinct directly.
columnName - (undocumented)
columnNames - (undocumented)
public static Column count_distinct(Column expr, Column... exprs)
expr - (undocumented)
exprs - (undocumented)
public static Column array(Column... cols)
cols - (undocumented)
public static Column array(String colName, String... colNames)
colName - (undocumented)
colNames - (undocumented)
public static Column map(Column... cols)
cols - (undocumented)
public static Column coalesce(Column... e)
Returns the first column that is not null, or null if all inputs are null.
For example, coalesce(a, b, c) will return a if a is not null,
or b if a is null and b is not null, or c if both a and b are null but c is not null.
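The first-non-null rule described for coalesce can be sketched in plain Java. This is an illustration of the semantics on ordinary values, not Spark code: CoalesceSketch and its generic coalesce helper are hypothetical names, and Spark itself evaluates the rule per row over Column expressions.

```java
import java.util.Arrays;
import java.util.Objects;

/** Illustrative only: the first-non-null rule applied to plain Java values. */
final class CoalesceSketch {

    /** Returns the first non-null argument, or null when every argument is null. */
    @SafeVarargs
    static <T> T coalesce(T... values) {
        return Arrays.stream(values)
                .filter(Objects::nonNull) // skip nulls, keep argument order
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        String a = null, b = "b", c = "c";
        System.out.println(coalesce(a, b, c)); // prints: b
    }
}
```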
e - (undocumented)
public static Column struct(Column... cols)
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression
that is named (i.e. aliased), its name would be retained as the StructField's name;
otherwise, the newly generated StructField's name would be auto-generated as
col with a suffix index + 1, i.e. col1, col2, col3, ...
cols
- (undocumented)public static Column struct(String colName, String... colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column greatest(Column... exprs)
exprs
- (undocumented)public static Column greatest(String columnName, String... columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column least(Column... exprs)
exprs
- (undocumented)public static Column least(String columnName, String... columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column hash(Column... cols)
cols
- (undocumented)public static Column xxhash64(Column... cols)
cols
- (undocumented)public static Column concat_ws(String sep, Column... exprs)
sep
- (undocumented)exprs
- (undocumented)public static Column format_string(String format, Column... arguments)
format
- (undocumented)arguments
- (undocumented)public static Column elt(Column... inputs)
Returns the n-th input, e.g., returns input2 when n is 2.
The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
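The index handling described above can be sketched in plain Java, without Spark. This is only an illustration: the class `EltSketch`, the `ansiEnabled` flag and the assumption that the index `n` is passed separately from the candidate values are all hypothetical stand-ins, not part of the Spark API.

```java
final class EltSketch {
    // Returns the n-th value (1-based). ansiEnabled is a hypothetical stand-in
    // for the spark.sql.ansi.enabled setting described in the text.
    static String elt(boolean ansiEnabled, int n, String... values) {
        if (n < 1 || n > values.length) {
            if (ansiEnabled) {
                throw new ArrayIndexOutOfBoundsException("invalid index: " + n);
            }
            return null; // non-ANSI mode: invalid index yields NULL
        }
        return values[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(elt(false, 2, "input1", "input2")); // input2
        System.out.println(elt(false, 3, "input1", "input2")); // null
    }
}
```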
inputs
- (undocumented)public static Column concat(Column... exprs)
exprs
- (undocumented)public static Column json_tuple(Column json, String... fields)
json
- (undocumented)fields
- (undocumented)public static Column arrays_zip(Column... e)
e
- (undocumented)public static Column map_concat(Column... cols)
cols
- (undocumented)public static Column callUDF(String udfName, Column... cols)
udfName
- (undocumented)cols
- (undocumented)public static Column call_udf(String udfName, Column... cols)
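// Assumes a spark-shell session, where spark.implicits._ (for toDF and $) is already in scope.
import org.apache.spark.sql._
import org.apache.spark.sql.functions.call_udf
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val spark = df.sparkSession
// Register the UDF under a name, then invoke it by that name.
spark.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", call_udf("simpleUDF", $"value"))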
udfName
- (undocumented)cols
- (undocumented)public static Column call_function(String funcName, Column... cols)
funcName - function name that follows the SQL identifier syntax (can be quoted, can be qualified)
cols - the expression parameters of the function
public static Column col(String colName)
Returns a Column based on the given column name.
colName - (undocumented)
public static Column column(String colName)
Returns a Column based on the given column name. Alias of col.
colName - (undocumented)
public static Column lit(Object literal)
Creates a Column of literal value.
The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is also converted into a Column. Otherwise, a new Column is created to represent the literal value.
literal
- (undocumented)public static <T> Column typedLit(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$1)
Creates a Column of literal value.
An alias of typedlit; using typedlit directly is encouraged.
literal - (undocumented)
evidence$1 - (undocumented)
public static <T> Column typedlit(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$2)
Creates a Column of literal value.
The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is also converted into a Column. Otherwise, a new Column is created to represent the literal value.
The difference between this function and lit is that this function can handle parameterized Scala types, e.g. List, Seq and Map.
literal - (undocumented)
evidence$2 - (undocumented)
Note: typedlit calls expensive Scala reflection APIs. lit is preferred if parameterized Scala types are not used.
public static Column asc(String columnName)
df.sort(asc("dept"), desc("age"))
columnName
- (undocumented)public static Column asc_nulls_first(String columnName)
df.sort(asc_nulls_first("dept"), desc("age"))
columnName
- (undocumented)public static Column asc_nulls_last(String columnName)
df.sort(asc_nulls_last("dept"), desc("age"))
columnName
- (undocumented)public static Column desc(String columnName)
df.sort(asc("dept"), desc("age"))
columnName
- (undocumented)public static Column desc_nulls_first(String columnName)
df.sort(asc("dept"), desc_nulls_first("age"))
columnName
- (undocumented)public static Column desc_nulls_last(String columnName)
df.sort(asc("dept"), desc_nulls_last("age"))
columnName
- (undocumented)public static Column approxCountDistinct(Column e)
e
- (undocumented)public static Column approxCountDistinct(String columnName)
columnName
- (undocumented)public static Column approxCountDistinct(Column e, double rsd)
e
- (undocumented)rsd
- (undocumented)public static Column approxCountDistinct(String columnName, double rsd)
columnName
- (undocumented)rsd
- (undocumented)public static Column approx_count_distinct(Column e)
e
- (undocumented)public static Column approx_count_distinct(String columnName)
columnName
- (undocumented)public static Column approx_count_distinct(Column e, double rsd)
rsd
- maximum relative standard deviation allowed (default = 0.05)
e
- (undocumented)public static Column approx_count_distinct(String columnName, double rsd)
rsd
- maximum relative standard deviation allowed (default = 0.05)
columnName
- (undocumented)public static Column avg(Column e)
e
- (undocumented)public static Column avg(String columnName)
columnName
- (undocumented)public static Column collect_list(Column e)
e
- (undocumented)public static Column collect_list(String columnName)
columnName
- (undocumented)public static Column collect_set(Column e)
e
- (undocumented)public static Column collect_set(String columnName)
columnName
- (undocumented)public static Column count_min_sketch(Column e, Column eps, Column confidence, Column seed)
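Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage.
Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.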
e
- (undocumented)eps
- (undocumented)confidence
- (undocumented)seed
- (undocumented)public static Column corr(Column column1, Column column2)
column1
- (undocumented)column2
- (undocumented)public static Column corr(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column count(Column e)
e
- (undocumented)public static TypedColumn<Object,Object> count(String columnName)
columnName
- (undocumented)public static Column countDistinct(Column expr, scala.collection.Seq<Column> exprs)
An alias of count_distinct; using count_distinct directly is encouraged.
expr
- (undocumented)exprs
- (undocumented)public static Column countDistinct(String columnName, scala.collection.Seq<String> columnNames)
An alias of count_distinct; using count_distinct directly is encouraged.
columnName
- (undocumented)columnNames
- (undocumented)public static Column count_distinct(Column expr, scala.collection.Seq<Column> exprs)
expr
- (undocumented)exprs
- (undocumented)public static Column covar_pop(Column column1, Column column2)
column1
- (undocumented)column2
- (undocumented)public static Column covar_pop(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column covar_samp(Column column1, Column column2)
column1
- (undocumented)column2
- (undocumented)public static Column covar_samp(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column first(Column e, boolean ignoreNulls)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)ignoreNulls
- (undocumented)public static Column first(String columnName, boolean ignoreNulls)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)ignoreNulls
- (undocumented)public static Column first(Column e)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)public static Column first(String columnName)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)public static Column first_value(Column e)
e
- (undocumented)public static Column first_value(Column e, Column ignoreNulls)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)ignoreNulls
- (undocumented)public static Column grouping(Column e)
e
- (undocumented)public static Column grouping(String columnName)
columnName
- (undocumented)public static Column grouping_id(scala.collection.Seq<Column> cols)
(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
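The shift-and-add formula above can be checked with a small plain-Java sketch. The class `GroupingIdSketch` and its `groupingBits` input are hypothetical: each array element stands for the 0/1 result of grouping(ci), listed in column order.

```java
final class GroupingIdSketch {
    // groupingBits[i] stands for grouping(c_{i+1}): 1 if that column is
    // aggregated in the current row, 0 otherwise.
    static long groupingId(int[] groupingBits) {
        long id = 0L;
        for (int bit : groupingBits) {
            // Shifting the running value left by one per column is equivalent
            // to shifting the first bit by (n-1), the second by (n-2), and so on.
            id = (id << 1) + bit;
        }
        return id;
    }

    public static void main(String[] args) {
        // n = 3: (0 << 2) + (1 << 1) + 1 = 3
        System.out.println(groupingId(new int[] {0, 1, 1}));
    }
}
```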
cols
- (undocumented)public static Column grouping_id(String colName, scala.collection.Seq<String> colNames)
(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
colName
- (undocumented)colNames
- (undocumented)public static Column hll_sketch_agg(Column e, Column lgConfigK)
e
- (undocumented)lgConfigK
- (undocumented)public static Column hll_sketch_agg(Column e, int lgConfigK)
e
- (undocumented)lgConfigK
- (undocumented)public static Column hll_sketch_agg(String columnName, int lgConfigK)
columnName
- (undocumented)lgConfigK
- (undocumented)public static Column hll_sketch_agg(Column e)
e
- (undocumented)public static Column hll_sketch_agg(String columnName)
columnName
- (undocumented)public static Column hll_union_agg(Column e, Column allowDifferentLgConfigK)
e
- (undocumented)allowDifferentLgConfigK
- (undocumented)public static Column hll_union_agg(Column e, boolean allowDifferentLgConfigK)
e
- (undocumented)allowDifferentLgConfigK
- (undocumented)public static Column hll_union_agg(String columnName, boolean allowDifferentLgConfigK)
columnName
- (undocumented)allowDifferentLgConfigK
- (undocumented)public static Column hll_union_agg(Column e)
e
- (undocumented)public static Column hll_union_agg(String columnName)
columnName
- (undocumented)public static Column kurtosis(Column e)
e
- (undocumented)public static Column kurtosis(String columnName)
columnName
- (undocumented)public static Column last(Column e, boolean ignoreNulls)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)ignoreNulls
- (undocumented)public static Column last(String columnName, boolean ignoreNulls)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)ignoreNulls
- (undocumented)public static Column last(Column e)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)public static Column last(String columnName)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)public static Column last_value(Column e)
e
- (undocumented)public static Column last_value(Column e, Column ignoreNulls)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)ignoreNulls
- (undocumented)public static Column mode(Column e)
e
- (undocumented)public static Column max(Column e)
e
- (undocumented)public static Column max(String columnName)
columnName
- (undocumented)public static Column max_by(Column e, Column ord)
e
- (undocumented)ord
- (undocumented)public static Column mean(Column e)
e
- (undocumented)public static Column mean(String columnName)
columnName
- (undocumented)public static Column median(Column e)
e
- (undocumented)public static Column min(Column e)
e
- (undocumented)public static Column min(String columnName)
columnName
- (undocumented)public static Column min_by(Column e, Column ord)
e
- (undocumented)ord
- (undocumented)public static Column percentile(Column e, Column percentage)
Returns the exact percentile(s) of numeric column e at the given percentage(s) with value range in [0.0, 1.0].
e
- (undocumented)percentage
- (undocumented)public static Column percentile(Column e, Column percentage, Column frequency)
Returns the exact percentile(s) of numeric column e at the given percentage(s) with value range in [0.0, 1.0].
e
- (undocumented)percentage
- (undocumented)frequency
- (undocumented)public static Column percentile_approx(Column e, Column percentage, Column accuracy)
Returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value.
If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.
The accuracy parameter is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation.
e
- (undocumented)percentage
- (undocumented)accuracy
- (undocumented)public static Column approx_percentile(Column e, Column percentage, Column accuracy)
Returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value.
If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.
The accuracy parameter is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation.
e
- (undocumented)percentage
- (undocumented)accuracy
- (undocumented)public static Column product(Column e)
e
- (undocumented)public static Column skewness(Column e)
e
- (undocumented)public static Column skewness(String columnName)
columnName
- (undocumented)public static Column std(Column e)
Alias for stddev_samp.
e
- (undocumented)public static Column stddev(Column e)
Alias for stddev_samp.
e
- (undocumented)public static Column stddev(String columnName)
Alias for stddev_samp.
columnName
- (undocumented)public static Column stddev_samp(Column e)
e
- (undocumented)public static Column stddev_samp(String columnName)
columnName
- (undocumented)public static Column stddev_pop(Column e)
e
- (undocumented)public static Column stddev_pop(String columnName)
columnName
- (undocumented)public static Column sum(Column e)
e
- (undocumented)public static Column sum(String columnName)
columnName
- (undocumented)public static Column sumDistinct(Column e)
e
- (undocumented)public static Column sumDistinct(String columnName)
columnName
- (undocumented)public static Column sum_distinct(Column e)
e
- (undocumented)public static Column variance(Column e)
Alias for var_samp.
e
- (undocumented)public static Column variance(String columnName)
Alias for var_samp.
columnName
- (undocumented)public static Column var_samp(Column e)
e
- (undocumented)public static Column var_samp(String columnName)
columnName
- (undocumented)public static Column var_pop(Column e)
e
- (undocumented)public static Column var_pop(String columnName)
columnName
- (undocumented)public static Column regr_avgx(Column y, Column x)
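Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_avgy(Column y, Column x)
Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_count(Column y, Column x)
Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_intercept(Column y, Column x)
Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_r2(Column y, Column x)
Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_slope(Column y, Column x)
Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_sxx(Column y, Column x)
Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_sxy(Column y, Column x)
Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
y - (undocumented)
x - (undocumented)
public static Column regr_syy(Column y, Column x)
Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.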
y
- (undocumented)x
- (undocumented)public static Column any_value(Column e)
Returns some value of e for a group of rows.
e - (undocumented)
public static Column any_value(Column e, Column ignoreNulls)
Returns some value of e for a group of rows. If ignoreNulls is true, returns only non-null values.
e - (undocumented)
ignoreNulls - (undocumented)
public static Column count_if(Column e)
Returns the number of TRUE values for the expression.
e - (undocumented)
public static Column histogram_numeric(Column e, Column nBins)
e - (undocumented)
nBins - (undocumented)
public static Column every(Column e)
Returns true if all values of e are true.
e - (undocumented)
public static Column bool_and(Column e)
Returns true if all values of e are true.
e - (undocumented)
public static Column some(Column e)
Returns true if at least one value of e is true.
e - (undocumented)
public static Column any(Column e)
Returns true if at least one value of e is true.
e - (undocumented)
public static Column bool_or(Column e)
Returns true if at least one value of e is true.
e
- (undocumented)public static Column bit_and(Column e)
e
- (undocumented)public static Column bit_or(Column e)
e
- (undocumented)public static Column bit_xor(Column e)
e
- (undocumented)public static Column cume_dist()
N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N
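As a rough illustration of the formula above, the following plain-Java sketch computes the cumulative distribution over one already-sorted partition. The class `CumeDistSketch` is hypothetical, and it assumes that rows tied with x count as "before (and including) x".

```java
final class CumeDistSketch {
    // sortedValues: the ordering column of one partition, already sorted ascending.
    static double[] cumeDist(int[] sortedValues) {
        int n = sortedValues.length; // N = total number of rows in the partition
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // Count rows up to and including the last row tied with row i.
            int upTo = i + 1;
            while (upTo < n && sortedValues[upTo] == sortedValues[i]) {
                upTo++;
            }
            out[i] = (double) upTo / n;
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [0.25, 0.75, 0.75, 1.0]
        System.out.println(java.util.Arrays.toString(cumeDist(new int[] {10, 20, 20, 30})));
    }
}
```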
public static Column dense_rank()
Returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, skips the tied positions, so the person who came next after the ties would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
public static Column lag(Column e, int offset)
Returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
e
- (undocumented)offset
- (undocumented)public static Column lag(String columnName, int offset)
Returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
columnName
- (undocumented)offset
- (undocumented)public static Column lag(String columnName, int offset, Object defaultValue)
Returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
columnName
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lag(Column e, int offset, Object defaultValue)
Returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lag(Column e, int offset, Object defaultValue, boolean ignoreNulls)
Returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. ignoreNulls determines whether null values of row are included in or eliminated from the calculation. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)ignoreNulls
- (undocumented)public static Column lead(String columnName, int offset)
Returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
columnName
- (undocumented)offset
- (undocumented)public static Column lead(Column e, int offset)
Returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
e
- (undocumented)offset
- (undocumented)public static Column lead(String columnName, int offset, Object defaultValue)
Returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
columnName
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lead(Column e, int offset, Object defaultValue)
Returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lead(Column e, int offset, Object defaultValue, boolean ignoreNulls)
Returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. ignoreNulls determines whether null values of row are included in or eliminated from the calculation. The default value of ignoreNulls is false. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
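The offset and default-value behavior of lag and lead can be sketched in plain Java over an in-memory partition. This is a simplified illustration, not Spark's implementation: the class `LagLeadSketch` and its array-based partition are hypothetical, and the ignoreNulls option is not modeled.

```java
final class LagLeadSketch {
    // rows: one window partition in its window ordering; current: index of the current row.
    static Integer lag(Integer[] rows, int current, int offset, Integer defaultValue) {
        int i = current - offset;
        return (i >= 0 && i < rows.length) ? rows[i] : defaultValue;
    }

    static Integer lead(Integer[] rows, int current, int offset, Integer defaultValue) {
        int i = current + offset;
        return (i >= 0 && i < rows.length) ? rows[i] : defaultValue;
    }

    public static void main(String[] args) {
        Integer[] rows = {10, 20, 30};
        System.out.println(lag(rows, 1, 1, null));  // 10: the previous row
        System.out.println(lag(rows, 0, 1, -1));    // -1: no previous row, default used
        System.out.println(lead(rows, 2, 1, null)); // null: no next row, no default
    }
}
```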
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)ignoreNulls
- (undocumented)public static Column nth_value(Column e, int offset, boolean ignoreNulls)
Returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
It returns the offset-th non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
This is equivalent to the nth_value function in SQL.
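A minimal plain-Java sketch of the 1-based lookup and the ignoreNulls behavior described above follows. The class `NthValueSketch` and the array standing in for the window frame are hypothetical.

```java
final class NthValueSketch {
    // frame: the values of the window frame in order; offset counts from 1.
    static Integer nthValue(Integer[] frame, int offset, boolean ignoreNulls) {
        int seen = 0;
        for (Integer v : frame) {
            if (ignoreNulls && v == null) {
                continue; // nulls do not count toward the offset
            }
            if (++seen == offset) {
                return v;
            }
        }
        return null; // frame has fewer than offset (non-null) rows
    }

    public static void main(String[] args) {
        Integer[] frame = {null, 5, null, 7};
        System.out.println(nthValue(frame, 2, false)); // 5
        System.out.println(nthValue(frame, 2, true));  // 7
    }
}
```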
e
- (undocumented)offset
- (undocumented)ignoreNulls
- (undocumented)public static Column nth_value(Column e, int offset)
Returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
This is equivalent to the nth_value function in SQL.
e
- (undocumented)offset
- (undocumented)public static Column ntile(int n)
Returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
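The bucket assignment can be sketched in plain Java as follows. The class `NtileSketch` is hypothetical, and the handling of row counts that do not divide evenly (earlier buckets receive one extra row) follows the usual SQL NTILE convention; treat that detail as an assumption rather than something stated in the text above.

```java
final class NtileSketch {
    // Returns the bucket id (1..n) for each of rowCount ordered rows.
    static int[] ntile(int rowCount, int n) {
        int[] out = new int[rowCount];
        int base = rowCount / n;   // minimum rows per bucket
        int extra = rowCount % n;  // assumption: the first `extra` buckets get one more row
        int row = 0;
        for (int bucket = 1; bucket <= n && row < rowCount; bucket++) {
            int size = base + (bucket <= extra ? 1 : 0);
            for (int k = 0; k < size; k++) {
                out[row++] = bucket;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [1, 1, 2, 2, 3, 3, 4, 4]
        System.out.println(java.util.Arrays.toString(ntile(8, 4)));
    }
}
```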
n
- (undocumented)public static Column percent_rank()
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
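The formula above is simple enough to check by hand; a plain-Java sketch is shown here for concreteness. The class `PercentRankSketch` is hypothetical, and returning 0.0 for a single-row partition is an assumption made to avoid dividing by zero.

```java
final class PercentRankSketch {
    // rank: the row's rank in its partition (1-based); partitionSize: number of rows.
    static double percentRank(int rank, int partitionSize) {
        if (partitionSize <= 1) {
            return 0.0; // assumption: a lone row has no rows to be ranked against
        }
        return (rank - 1) / (double) (partitionSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(percentRank(1, 5)); // 0.0
        System.out.println(percentRank(2, 5)); // 0.25
        System.out.println(percentRank(5, 5)); // 1.0
    }
}
```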
public static Column rank()
Returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, skips the tied positions, so the person who came next after the ties would register as coming in fifth.
This is equivalent to the RANK function in SQL.
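The competition example can be reproduced with a short plain-Java sketch that ranks an already-sorted list of scores both ways. The class `RankSketch` and the score values are hypothetical.

```java
final class RankSketch {
    // sorted: the ordering column of one partition, already in window order.
    static int[] rank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            // Ties share the rank of the first tied row; otherwise use the row position.
            out[i] = (i > 0 && sorted[i] == sorted[i - 1]) ? out[i - 1] : i + 1;
        }
        return out;
    }

    static int[] denseRank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                // Ties share a rank; a new value advances the rank by exactly one.
                out[i] = (sorted[i] == sorted[i - 1]) ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] scores = {100, 90, 90, 90, 80}; // three people tie for second place
        System.out.println(java.util.Arrays.toString(rank(scores)));      // [1, 2, 2, 2, 5]
        System.out.println(java.util.Arrays.toString(denseRank(scores))); // [1, 2, 2, 2, 3]
    }
}
```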
public static Column row_number()
public static Column array(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column array(String colName, scala.collection.Seq<String> colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column map(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column named_struct(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column map_from_arrays(Column keys, Column values)
keys
- (undocumented)values
- (undocumented)public static Column str_to_map(Column text, Column pairDelim, Column keyValueDelim)
Converts a string into a map after splitting the text into key/value pairs using delimiters. Both pairDelim and keyValueDelim are treated as regular expressions.
text
- (undocumented)pairDelim
- (undocumented)keyValueDelim
- (undocumented)public static Column str_to_map(Column text, Column pairDelim)
Converts a string into a map after splitting the text into key/value pairs using delimiters. The pairDelim is treated as a regular expression.
text
- (undocumented)pairDelim
- (undocumented)public static Column str_to_map(Column text)
text
- (undocumented)public static <T> Dataset<T> broadcast(Dataset<T> df)
The following example marks the right DataFrame for broadcast hash join using joinKey.
// left and right are DataFrames
left.join(broadcast(right), "joinKey")
df
- (undocumented)public static Column coalesce(scala.collection.Seq<Column> e)
Returns the first column that is not null, or null if all inputs are null.
For example, coalesce(a, b, c) returns a if a is not null, b if a is null and b is not null, or c if both a and b are null but c is not null.
e
- (undocumented)public static Column input_file_name()
public static Column isnan(Column e)
e
- (undocumented)public static Column isnull(Column e)
e
- (undocumented)public static Column monotonicallyIncreasingId()
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame
with two partitions, each with 3 records.
This expression would return the following IDs:
0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
public static Column monotonically_increasing_id()
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame
with two partitions, each with 3 records.
This expression would return the following IDs:
0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
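The bit layout described above (partition ID in the upper 31 bits, record number in the lower 33 bits) can be illustrated in plain Java. The class `MonotonicIdSketch` and its helper names are hypothetical; the sketch only mirrors the arithmetic stated in the text.

```java
final class MonotonicIdSketch {
    private static final int RECORD_BITS = 33;

    // Combines a partition ID and a per-partition record number into one 64-bit ID.
    static long id(int partitionId, long recordNumber) {
        return ((long) partitionId << RECORD_BITS) + recordNumber;
    }

    // Recovers the two components from a generated ID.
    static int partitionOf(long id) {
        return (int) (id >>> RECORD_BITS);
    }

    static long recordOf(long id) {
        return id & ((1L << RECORD_BITS) - 1);
    }

    public static void main(String[] args) {
        // Two partitions with three records each, as in the example above.
        for (int p = 0; p < 2; p++) {
            for (long r = 0; r < 3; r++) {
                System.out.println(id(p, r));
            }
        }
    }
}
```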
public static Column nanvl(Column col1, Column col2)
Both inputs should be floating point columns (DoubleType or FloatType).
col1
- (undocumented)col2
- (undocumented)public static Column negate(Column e)
// Select the amount column and negate all values.
// Scala:
df.select( -df("amount") )
// Java:
df.select( negate(df.col("amount")) );
e
- (undocumented)public static Column not(Column e)
// Scala: select rows that are not active (isActive === false)
df.filter( !df("isActive") )
// Java:
df.filter( not(df.col("isActive")) );
e
- (undocumented)public static Column rand(long seed)
seed
- (undocumented)public static Column rand()
public static Column randn(long seed)
seed
- (undocumented)public static Column randn()
public static Column spark_partition_id()
public static Column sqrt(Column e)
e
- (undocumented)public static Column sqrt(String colName)
colName
- (undocumented)public static Column try_add(Column left, Column right)
Returns the sum of left and right; the result is null on overflow. The acceptable input types are the same as for the + operator.
left
- (undocumented)right
- (undocumented)public static Column try_avg(Column e)
e
- (undocumented)public static Column try_divide(Column dividend, Column divisor)
Returns dividend / divisor. It always performs floating point division. Its result is always null if divisor is 0.
dividend
- (undocumented)divisor
- (undocumented)public static Column try_multiply(Column left, Column right)
Returns left * right; the result is null on overflow. The acceptable input types are the same as for the * operator.
left
- (undocumented)right
- (undocumented)public static Column try_subtract(Column left, Column right)
Returns left - right; the result is null on overflow. The acceptable input types are the same as for the - operator.
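The "null instead of an error" behavior of the try_ functions above can be approximated in plain Java. This is a sketch under simplifying assumptions: the class `TryArithmeticSketch` is hypothetical, and it handles only long and double operands, whereas the functions described above accept the same input types as the corresponding operators.

```java
final class TryArithmeticSketch {
    // Returns null instead of failing when the long result overflows.
    static Long tryAdd(long left, long right) {
        try {
            return Math.addExact(left, right);
        } catch (ArithmeticException overflow) {
            return null;
        }
    }

    static Long tryMultiply(long left, long right) {
        try {
            return Math.multiplyExact(left, right);
        } catch (ArithmeticException overflow) {
            return null;
        }
    }

    // Floating point division; a zero divisor yields null.
    static Double tryDivide(double dividend, double divisor) {
        if (divisor == 0.0) {
            return null;
        }
        return dividend / divisor;
    }

    public static void main(String[] args) {
        System.out.println(tryAdd(1L, 2L));             // 3
        System.out.println(tryAdd(Long.MAX_VALUE, 1L)); // null
        System.out.println(tryDivide(7.0, 2.0));        // 3.5
        System.out.println(tryDivide(1.0, 0.0));        // null
    }
}
```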
left
- (undocumented)right
- (undocumented)public static Column try_sum(Column e)
e
- (undocumented)public static Column struct(scala.collection.Seq<Column> cols)
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name; otherwise, the newly generated StructField's name is auto-generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
cols
- (undocumented)public static Column struct(String colName, scala.collection.Seq<String> colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column when(Column condition, Object value)
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2));
condition
- (undocumented)value
- (undocumented)public static Column bitwiseNOT(Column e)
e
- (undocumented)public static Column bitwise_not(Column e)
e
- (undocumented)public static Column bit_count(Column e)
e
- (undocumented)public static Column bit_get(Column e, Column pos)
e
- (undocumented)pos
- (undocumented)public static Column getbit(Column e, Column pos)
e
- (undocumented)pos
- (undocumented)public static Column expr(String expr)
Parses the expression string into the column that it represents, similar to Dataset.selectExpr(java.lang.String...).
// get the number of words of each length
df.groupBy(expr("length(word)")).count()
expr
- (undocumented)public static Column abs(Column e)
e - (undocumented)
public static Column acos(Column e)
e - (undocumented)
Returns the inverse cosine of e in radians, as if computed by java.lang.Math.acos.
public static Column acos(String columnName)
columnName - (undocumented)
Returns the inverse cosine of columnName, as if computed by java.lang.Math.acos.
public static Column acosh(Column e)
e - (undocumented)
Returns the inverse hyperbolic cosine of e.
public static Column acosh(String columnName)
columnName - (undocumented)
Returns the inverse hyperbolic cosine of columnName.
public static Column asin(Column e)
e - (undocumented)
Returns the inverse sine of e in radians, as if computed by java.lang.Math.asin.
public static Column asin(String columnName)
columnName - (undocumented)
Returns the inverse sine of columnName, as if computed by java.lang.Math.asin.
public static Column asinh(Column e)
e - (undocumented)
Returns the inverse hyperbolic sine of e.
public static Column asinh(String columnName)
columnName - (undocumented)
Returns the inverse hyperbolic sine of columnName.
public static Column atan(Column e)
e - (undocumented)
Returns the inverse tangent of e, as if computed by java.lang.Math.atan.
public static Column atan(String columnName)
columnName - (undocumented)
Returns the inverse tangent of columnName, as if computed by java.lang.Math.atan.
public static Column atan2(Column y, Column x)
y - coordinate on y-axis
x - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(Column y, String xName)
y - coordinate on y-axis
xName - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(String yName, Column x)
yName - coordinate on y-axis
x - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(String yName, String xName)
yName - coordinate on y-axis
xName - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(Column y, double xValue)
y - coordinate on y-axis
xValue - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(String yName, double xValue)
yName - coordinate on y-axis
xValue - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(double yValue, Column x)
yValue - coordinate on y-axis
x - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atan2(double yValue, String xName)
yValue - coordinate on y-axis
xName - coordinate on x-axis
Computed as if by java.lang.Math.atan2.
public static Column atanh(Column e)
e
- (undocumented)e
public static Column atanh(String columnName)
columnName
- (undocumented)columnName
public static Column bin(Column e)
e
- (undocumented)public static Column bin(String columnName)
columnName
- (undocumented)public static Column cbrt(Column e)
e
- (undocumented)public static Column cbrt(String columnName)
columnName
- (undocumented)public static Column ceil(Column e, Column scale)
Computes the ceiling of the given value of e to scale decimal places.
e
- (undocumented)scale
- (undocumented)public static Column ceil(Column e)
Computes the ceiling of the given value of e to 0 decimal places.
e
- (undocumented)public static Column ceil(String columnName)
Computes the ceiling of the given value of e to 0 decimal places.
columnName
- (undocumented)public static Column ceiling(Column e, Column scale)
Computes the ceiling of the given value of e to scale decimal places.
e
- (undocumented)scale
- (undocumented)public static Column ceiling(Column e)
Computes the ceiling of the given value of e to 0 decimal places.
e
- (undocumented)public static Column conv(Column num, int fromBase, int toBase)
num
- (undocumented)fromBase
- (undocumented)toBase
- (undocumented)public static Column cos(Column e)
e
- angle in radiansjava.lang.Math.cos
public static Column cos(String columnName)
columnName
- angle in radiansjava.lang.Math.cos
public static Column cosh(Column e)
e
- hyperbolic anglejava.lang.Math.cosh
public static Column cosh(String columnName)
columnName
- hyperbolic anglejava.lang.Math.cosh
public static Column cot(Column e)
e
- angle in radianspublic static Column csc(Column e)
e
- angle in radianspublic static Column e()
public static Column exp(Column e)
e
- (undocumented)public static Column exp(String columnName)
columnName
- (undocumented)public static Column expm1(Column e)
e
- (undocumented)public static Column expm1(String columnName)
columnName
- (undocumented)public static Column factorial(Column e)
e
- (undocumented)public static Column floor(Column e, Column scale)
Computes the floor of the given value of e to scale decimal places.
e
- (undocumented)scale
- (undocumented)public static Column floor(Column e)
Computes the floor of the given value of e to 0 decimal places.
e
- (undocumented)public static Column floor(String columnName)
columnName
- (undocumented)public static Column greatest(scala.collection.Seq<Column> exprs)
exprs
- (undocumented)public static Column greatest(String columnName, scala.collection.Seq<String> columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column hex(Column column)
column
- (undocumented)public static Column unhex(Column column)
column
- (undocumented)public static Column hypot(Column l, Column r)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
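The overflow guarantee can be illustrated with a plain-Java sketch that compares the naive formula with java.lang.Math.hypot. It assumes that hypot on double inputs behaves like Math.hypot, which this page does not state; the class and method names are illustrative only.

```java
final class HypotSketch {
    // Naive formula: each square can overflow to Infinity even when the final result fits in a double.
    static double naive(double a, double b) {
        return Math.sqrt(a * a + b * b);
    }

    // Math.hypot computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
    static double safe(double a, double b) {
        return Math.hypot(a, b);
    }

    public static void main(String[] args) {
        double a = 3e200, b = 4e200;
        System.out.println("naive: " + naive(a, b));
        System.out.println("hypot: " + safe(a, b));
    }
}
```

With inputs near 1e200 the squares exceed the double range, so the naive version yields Infinity while the hypot version stays finite.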
l
- (undocumented)r
- (undocumented)public static Column hypot(Column l, String rightName)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
l
- (undocumented)rightName
- (undocumented)public static Column hypot(String leftName, Column r)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
leftName
- (undocumented)r
- (undocumented)public static Column hypot(String leftName, String rightName)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
leftName
- (undocumented)rightName
- (undocumented)public static Column hypot(Column l, double r)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
l
- (undocumented)r
- (undocumented)public static Column hypot(String leftName, double r)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
leftName
- (undocumented)r
- (undocumented)public static Column hypot(double l, Column r)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
l
- (undocumented)r
- (undocumented)public static Column hypot(double l, String rightName)
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
l
- (undocumented)rightName
- (undocumented)public static Column least(scala.collection.Seq<Column> exprs)
exprs
- (undocumented)public static Column least(String columnName, scala.collection.Seq<String> columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column ln(Column e)
e
- (undocumented)public static Column log(Column e)
e
- (undocumented)public static Column log(String columnName)
columnName
- (undocumented)public static Column log(double base, Column a)
base
- (undocumented)a
- (undocumented)public static Column log(double base, String columnName)
base
- (undocumented)columnName
- (undocumented)public static Column log10(Column e)
e
- (undocumented)public static Column log10(String columnName)
columnName
- (undocumented)public static Column log1p(Column e)
e
- (undocumented)public static Column log1p(String columnName)
columnName
- (undocumented)public static Column log2(Column expr)
expr
- (undocumented)public static Column log2(String columnName)
columnName
- (undocumented)public static Column negative(Column e)
e
- (undocumented)public static Column pi()
public static Column positive(Column e)
e
- (undocumented)public static Column pow(Column l, Column r)
l
- (undocumented)r
- (undocumented)public static Column pow(Column l, String rightName)
l
- (undocumented)rightName
- (undocumented)public static Column pow(String leftName, Column r)
leftName
- (undocumented)r
- (undocumented)public static Column pow(String leftName, String rightName)
leftName
- (undocumented)rightName
- (undocumented)public static Column pow(Column l, double r)
l
- (undocumented)r
- (undocumented)public static Column pow(String leftName, double r)
leftName
- (undocumented)r
- (undocumented)public static Column pow(double l, Column r)
l
- (undocumented)r
- (undocumented)public static Column pow(double l, String rightName)
l
- (undocumented)rightName
- (undocumented)public static Column power(Column l, Column r)
l
- (undocumented)r
- (undocumented)public static Column pmod(Column dividend, Column divisor)
dividend
- (undocumented)divisor
- (undocumented)public static Column rint(Column e)
e
- (undocumented)public static Column rint(String columnName)
columnName
- (undocumented)public static Column round(Column e)
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
e
- (undocumented)public static Column round(Column e, int scale)
Rounds the value of e to scale decimal places with HALF_UP round mode if scale is greater than
or equal to 0, or at the integral part when scale is less than 0.
e
- (undocumented)scale
- (undocumented)public static Column bround(Column e)
Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.
e
- (undocumented)public static Column bround(Column e, int scale)
Rounds the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than
or equal to 0, or at the integral part when scale is less than 0.
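The difference between the HALF_UP mode of round and the HALF_EVEN mode of bround, including a negative scale, can be sketched in plain Java with java.math.BigDecimal. This is an illustration of the two rounding modes only, not Spark's implementation; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundingSketch {
    // HALF_UP: a tie moves away from zero (the mode documented for round).
    static BigDecimal halfUp(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP);
    }

    // HALF_EVEN: a tie moves to the neighbour whose last kept digit is even (the mode documented for bround).
    static BigDecimal halfEven(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        // A negative scale rounds at the integral part: scale -1 rounds to tens.
        System.out.println(halfUp("125", -1).toPlainString());
        System.out.println(halfEven("125", -1).toPlainString());
    }
}
```

The two modes differ only on exact ties such as 2.5 or 125 at scale -1; every other input rounds identically.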
e
- (undocumented)scale
- (undocumented)public static Column sec(Column e)
e
- angle in radianspublic static Column shiftLeft(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftleft(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftRight(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftright(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftRightUnsigned(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftrightunsigned(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column sign(Column e)
e
- (undocumented)public static Column signum(Column e)
e
- (undocumented)public static Column signum(String columnName)
columnName
- (undocumented)public static Column sin(Column e)
e
- angle in radiansjava.lang.Math.sin
public static Column sin(String columnName)
columnName
- angle in radiansjava.lang.Math.sin
public static Column sinh(Column e)
e
- hyperbolic anglejava.lang.Math.sinh
public static Column sinh(String columnName)
columnName
- hyperbolic anglejava.lang.Math.sinh
public static Column tan(Column e)
e
- angle in radiansjava.lang.Math.tan
public static Column tan(String columnName)
columnName
- angle in radiansjava.lang.Math.tan
public static Column tanh(Column e)
e
- hyperbolic anglejava.lang.Math.tanh
public static Column tanh(String columnName)
columnName
- hyperbolic anglejava.lang.Math.tanh
public static Column toDegrees(Column e)
e
- (undocumented)public static Column toDegrees(String columnName)
columnName
- (undocumented)public static Column degrees(Column e)
e
- angle in radiansjava.lang.Math.toDegrees
public static Column degrees(String columnName)
columnName
- angle in radiansjava.lang.Math.toDegrees
public static Column toRadians(Column e)
e
- (undocumented)public static Column toRadians(String columnName)
columnName
- (undocumented)public static Column radians(Column e)
e
- angle in degreesjava.lang.Math.toRadians
public static Column radians(String columnName)
columnName
- angle in degreesjava.lang.Math.toRadians
public static Column width_bucket(Column v, Column min, Column max, Column numBucket)
v - value to compute a bucket number in the histogram
min - minimum value of the histogram
max - maximum value of the histogram
numBucket - the number of buckets
public static Column current_catalog()
public static Column current_database()
public static Column current_schema()
public static Column current_user()
public static Column md5(Column e)
e
- (undocumented)public static Column sha1(Column e)
e
- (undocumented)public static Column sha2(Column e, int numBits)
e - column to compute SHA-2 on.
numBits - one of 224, 256, 384, or 512.
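The numBits argument selects the SHA-2 variant. A plain-Java sketch of that mapping, using java.security.MessageDigest rather than Spark, might look as follows. The class name, the hex encoding of the digest, and the rejection of any other numBits value are assumptions of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha2Sketch {
    // Maps numBits to the matching JDK algorithm name and hex-encodes the digest.
    static String sha2Hex(byte[] input, int numBits) {
        if (numBits != 224 && numBits != 256 && numBits != 384 && numBits != 512) {
            throw new IllegalArgumentException("numBits must be 224, 256, 384, or 512");
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-" + numBits);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(input)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha2Hex("abc".getBytes(StandardCharsets.UTF_8), 256));
    }
}
```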
public static Column crc32(Column e)
e
- (undocumented)public static Column hash(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column xxhash64(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column assert_true(Column c)
c
- (undocumented)public static Column assert_true(Column c, Column e)
c
- (undocumented)e
- (undocumented)public static Column raise_error(Column c)
c
- (undocumented)public static Column hll_sketch_estimate(Column c)
c
- (undocumented)public static Column hll_sketch_estimate(String columnName)
columnName
- (undocumented)public static Column hll_union(Column c1, Column c2)
c1
- (undocumented)c2
- (undocumented)public static Column hll_union(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column hll_union(Column c1, Column c2, boolean allowDifferentLgConfigK)
c1
- (undocumented)c2
- (undocumented)allowDifferentLgConfigK
- (undocumented)public static Column hll_union(String columnName1, String columnName2, boolean allowDifferentLgConfigK)
columnName1
- (undocumented)columnName2
- (undocumented)allowDifferentLgConfigK
- (undocumented)public static Column user()
public static Column uuid()
public static Column aes_encrypt(Column input, Column key, Column mode, Column padding, Column iv, Column aad)
Returns an encrypted value of input using AES in the given mode with the specified padding.
Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are
('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization
vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12
bytes for GCM. If not provided, a random vector will be generated and prepended to the
output. Optional additional authenticated data (AAD) is only supported for GCM. If provided
for encryption, the identical AAD value must be provided for decryption. The default mode is
GCM.
input - The binary value to encrypt.
key - The passphrase to use to encrypt the data.
mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB,
GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid
values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS
for CBC.
iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or
"". 16-byte array for CBC mode. 12-byte array for GCM mode.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any
free-form input and must be provided for both encryption and decryption.
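The GCM behaviour described above (a 12-byte random IV prepended to the output, and AAD that must match on decryption) can be sketched with the JDK's javax.crypto API. This is an illustration under stated assumptions, not Spark's implementation, and it is not claimed to be byte-compatible with aes_encrypt: the class name, the 128-bit tag length, and the exact output layout are assumptions.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class AesGcmSketch {
    private static final int IV_LEN = 12;     // GCM IV length stated in the text above
    private static final int TAG_BITS = 128;  // assumption: full-length authentication tag

    // Encrypts with AES/GCM/NoPadding and returns iv || ciphertext || tag.
    static byte[] encrypt(byte[] input, byte[] key, byte[] aad) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            if (aad != null) {
                cipher.updateAAD(aad);
            }
            byte[] body = cipher.doFinal(input);
            return ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the first 12 bytes; fails if the key or the AAD does not match.
    static byte[] decrypt(byte[] output, byte[] key, byte[] aad) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, output, 0, IV_LEN));
            if (aad != null) {
                cipher.updateAAD(aad);
            }
            return cipher.doFinal(output, IV_LEN, output.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The key passed in must be 16, 24 or 32 bytes long; any other length makes Cipher.init fail.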
public static Column aes_encrypt(Column input, Column key, Column mode, Column padding, Column iv)
Returns an encrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
iv - (undocumented)
See also: org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
public static Column aes_encrypt(Column input, Column key, Column mode, Column padding)
Returns an encrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
See also: org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
public static Column aes_encrypt(Column input, Column key, Column mode)
Returns an encrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
See also: org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
public static Column aes_encrypt(Column input, Column key)
Returns an encrypted value of input.
input - (undocumented)
key - (undocumented)
See also: org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
public static Column aes_decrypt(Column input, Column key, Column mode, Column padding, Column aad)
Returns a decrypted value of input using AES in mode with padding. Key lengths of 16,
24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB',
'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is
only supported for GCM. If provided for encryption, the identical AAD value must be provided
for decryption. The default mode is GCM.
input - The binary value to decrypt.
key - The passphrase to use to decrypt the data.
mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB,
GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid
values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS
for CBC.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any
free-form input and must be provided for both encryption and decryption.
public static Column aes_decrypt(Column input, Column key, Column mode, Column padding)
Returns a decrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
See also: org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
public static Column aes_decrypt(Column input, Column key, Column mode)
Returns a decrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
See also: org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
public static Column aes_decrypt(Column input, Column key)
Returns a decrypted value of input.
input - (undocumented)
key - (undocumented)
See also: org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
public static Column try_aes_decrypt(Column input, Column key, Column mode, Column padding, Column aad)
This is a special version of aes_decrypt that performs the same operation, but returns a
NULL value instead of raising an error if the decryption cannot be performed.
input - The binary value to decrypt.
key - The passphrase to use to decrypt the data.
mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB,
GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid
values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS
for CBC.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any
free-form input and must be provided for both encryption and decryption.
public static Column try_aes_decrypt(Column input, Column key, Column mode, Column padding)
Returns a decrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
See also: org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
public static Column try_aes_decrypt(Column input, Column key, Column mode)
Returns a decrypted value of input.
input - (undocumented)
key - (undocumented)
mode - (undocumented)
See also: org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
public static Column try_aes_decrypt(Column input, Column key)
Returns a decrypted value of input.
input - (undocumented)
key - (undocumented)
See also: org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
public static Column sha(Column col)
col
.
col
- (undocumented)public static Column input_file_block_length()
public static Column input_file_block_start()
public static Column reflect(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column java_method(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column version()
public static Column typeof(Column col)
col
- (undocumented)public static Column stack(scala.collection.Seq<Column> cols)
Separates col1, ..., colk into n rows. Uses column names col0, col1, etc. by default
unless specified otherwise.
cols
- (undocumented)public static Column random(Column seed)
seed
- (undocumented)public static Column random()
public static Column bitmap_bucket_number(Column col)
col
- (undocumented)public static Column bitmap_bit_position(Column col)
col
- (undocumented)public static Column bitmap_construct_agg(Column col)
col
- (undocumented)public static Column bitmap_count(Column col)
col
- (undocumented)public static Column bitmap_or_agg(Column col)
col
- (undocumented)public static Column ascii(Column e)
e
- (undocumented)public static Column base64(Column e)
e
- (undocumented)public static Column bit_length(Column e)
e
- (undocumented)public static Column concat_ws(String sep, scala.collection.Seq<Column> exprs)
sep
- (undocumented)exprs
- (undocumented)public static Column decode(Column value, String charset)
value
- (undocumented)charset
- (undocumented)public static Column encode(Column value, String charset)
value
- (undocumented)charset
- (undocumented)public static Column format_number(Column x, int d)
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
x
- (undocumented)d
- (undocumented)public static Column format_string(String format, scala.collection.Seq<Column> arguments)
format
- (undocumented)arguments
- (undocumented)public static Column initcap(Column e)
For example, "hello world" will become "Hello World".
e
- (undocumented)public static Column instr(Column str, String substring)
str
- (undocumented)substring
- (undocumented)public static Column length(Column e)
e
- (undocumented)public static Column len(Column e)
e
- (undocumented)public static Column lower(Column e)
e
- (undocumented)public static Column levenshtein(Column l, Column r, int threshold)
l
- (undocumented)r
- (undocumented)threshold
- (undocumented)public static Column levenshtein(Column l, Column r)
l
- (undocumented)r
- (undocumented)public static Column locate(String substr, Column str)
substr
- (undocumented)str
- (undocumented)public static Column locate(String substr, Column str, int pos)
substr
- (undocumented)str
- (undocumented)pos
- (undocumented)public static Column lpad(Column str, int len, String pad)
str
- (undocumented)len
- (undocumented)pad
- (undocumented)public static Column lpad(Column str, int len, byte[] pad)
str
- (undocumented)len
- (undocumented)pad
- (undocumented)public static Column ltrim(Column e)
e
- (undocumented)public static Column ltrim(Column e, String trimString)
e
- (undocumented)trimString
- (undocumented)public static Column octet_length(Column e)
e
- (undocumented)public static Column rlike(Column str, Column regexp)
Returns true if str matches regexp, or false otherwise.
str
- (undocumented)regexp
- (undocumented)public static Column regexp(Column str, Column regexp)
Returns true if str matches regexp, or false otherwise.
str
- (undocumented)regexp
- (undocumented)public static Column regexp_like(Column str, Column regexp)
Returns true if str matches regexp, or false otherwise.
str
- (undocumented)regexp
- (undocumented)public static Column regexp_count(Column str, Column regexp)
Returns a count of the number of times that the regular expression pattern regexp is matched
in the string str.
str
- (undocumented)regexp
- (undocumented)public static Column regexp_extract(Column e, String exp, int groupIdx)
e
- (undocumented)exp
- (undocumented)groupIdx
- (undocumented)public static Column regexp_extract_all(Column str, Column regexp)
Extracts all strings in str that match the regexp expression and correspond to the first
regex group index.
str
- (undocumented)regexp
- (undocumented)public static Column regexp_extract_all(Column str, Column regexp, Column idx)
Extracts all strings in str that match the regexp expression and correspond to the regex
group index.
str
- (undocumented)regexp
- (undocumented)idx
- (undocumented)public static Column regexp_replace(Column e, String pattern, String replacement)
e
- (undocumented)pattern
- (undocumented)replacement
- (undocumented)public static Column regexp_replace(Column e, Column pattern, Column replacement)
e
- (undocumented)pattern
- (undocumented)replacement
- (undocumented)public static Column regexp_substr(Column str, Column regexp)
Returns the substring that matches the regular expression regexp within the string str.
If the regular expression is not found, the result is null.
str
- (undocumented)regexp
- (undocumented)public static Column regexp_instr(Column str, Column regexp)
str
- (undocumented)regexp
- (undocumented)public static Column regexp_instr(Column str, Column regexp, Column idx)
str
- (undocumented)regexp
- (undocumented)idx
- (undocumented)public static Column unbase64(Column e)
e
- (undocumented)public static Column rpad(Column str, int len, String pad)
str
- (undocumented)len
- (undocumented)pad
- (undocumented)public static Column rpad(Column str, int len, byte[] pad)
str
- (undocumented)len
- (undocumented)pad
- (undocumented)public static Column repeat(Column str, int n)
str
- (undocumented)n
- (undocumented)public static Column rtrim(Column e)
e
- (undocumented)public static Column rtrim(Column e, String trimString)
e
- (undocumented)trimString
- (undocumented)public static Column soundex(Column e)
e
- (undocumented)public static Column split(Column str, String pattern)
str - a string expression to split
pattern - a string representing a regular expression. The regex string should be
a Java regular expression.
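Because pattern is a Java regular expression, a delimiter such as "." or "|" has to be escaped to be matched literally. The plain-Java sketch below shows the difference with String.split; it does not call Spark, the helper names are hypothetical, and the limit of -1 is chosen here only so that empty strings are never dropped.

```java
import java.util.regex.Pattern;

final class SplitPatternSketch {
    // Treats the second argument as a Java regular expression.
    static String[] splitRegex(String s, String regex) {
        return s.split(regex, -1);
    }

    // Treats the second argument as a literal delimiter by quoting regex metacharacters.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // "." as a regex matches every character, so no text survives between matches.
        System.out.println(String.join("|", splitRegex("a.b.c", ".")));
        // Quoted, "." matches only the literal dot.
        System.out.println(String.join("|", splitLiteral("a.b.c", ".")));
    }
}
```

When calling split with a literal delimiter, passing an escaped pattern such as "\\." has the same effect as the quoting helper.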
public static Column split(Column str, String pattern, int limit)
str - a string expression to split
pattern - a string representing a regular expression. The regex string should be
a Java regular expression.
limit - an integer expression which controls the number of times the regex is applied.
limit greater than 0: the resulting array's length will not be more than limit, and the
resulting array's last entry will contain all input beyond the last matched regex.
limit less than or equal to 0: regex will be applied as many times as
possible, and the resulting array can be of any size.
public static Column substring(Column str, int pos, int len)
Substring starts at pos and is of length len when str is String type, or
returns the slice of the byte array that starts at pos in byte and is of length len
when str is Binary type.
str
- (undocumented)pos
- (undocumented)len
- (undocumented)public static Column substring_index(Column str, String delim, int count)
str
- (undocumented)delim
- (undocumented)count
- (undocumented)public static Column overlay(Column src, Column replace, Column pos, Column len)
Overlays the specified portion of src with replace, starting from byte position pos of src
and proceeding for len bytes.
src
- (undocumented)replace
- (undocumented)pos
- (undocumented)len
- (undocumented)public static Column overlay(Column src, Column replace, Column pos)
Overlays the specified portion of src with replace, starting from byte position pos of src.
src
- (undocumented)replace
- (undocumented)pos
- (undocumented)public static Column sentences(Column string, Column language, Column country)
string
- (undocumented)language
- (undocumented)country
- (undocumented)public static Column sentences(Column string)
string
- (undocumented)public static Column translate(Column src, String matchingString, String replaceString)
matchingString
.
src
- (undocumented)matchingString
- (undocumented)replaceString
- (undocumented)public static Column trim(Column e)
e
- (undocumented)public static Column trim(Column e, String trimString)
e
- (undocumented)trimString
- (undocumented)public static Column upper(Column e)
e
- (undocumented)public static Column to_binary(Column e, Column format)
Converts the input e to a binary value based on the supplied format.
The format can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64".
By default, the binary format for conversion is "hex" if format is omitted.
The function returns NULL if at least one of the input parameters is NULL.
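The format handling described above can be sketched in plain Java. This is an illustration of the documented rules (case-insensitive format names, "hex" as the default, NULL in gives NULL out), not Spark's implementation; the class name and the rejection of odd-length hex input are simplifying assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

final class ToBinarySketch {
    // Decodes e according to format; returns null if either argument is null.
    static byte[] toBinary(String e, String format) {
        if (e == null || format == null) {
            return null;
        }
        switch (format.toLowerCase(Locale.ROOT)) {
            case "hex":
                return decodeHex(e);
            case "utf-8":
            case "utf8":
                return e.getBytes(StandardCharsets.UTF_8);
            case "base64":
                return Base64.getDecoder().decode(e);
            default:
                throw new IllegalArgumentException("unsupported format: " + format);
        }
    }

    // With no format given, "hex" is used.
    static byte[] toBinary(String e) {
        return toBinary(e, "hex");
    }

    private static byte[] decodeHex(String s) {
        if (s.length() % 2 != 0) {
            // Simplification of this sketch: only even-length hex strings are accepted.
            throw new IllegalArgumentException("hex input must have an even length");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```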
e
- (undocumented)format
- (undocumented)public static Column to_binary(Column e)
Converts the input e to a binary value based on the default format "hex".
The function returns NULL if at least one of the input parameters is NULL.
e
- (undocumented)public static Column to_char(Column e, Column format)
Converts e to a string based on the format.
Throws an exception if the conversion fails. The format can consist of the following
characters, case insensitive:
'0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format
string matches a sequence of digits in the input value, generating a result string of the
same length as the corresponding sequence in the format string. The result string is
left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of
the decimal value, starts with 0, and is before the decimal point. Otherwise, it is
padded with spaces.
'.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be
a 0 or 9 to the left and right of each grouping separator.
'$': Specifies the location of the $ currency sign. This character may only be specified
once.
'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at
the beginning or end of the format string). Note that 'S' prints '+' for positive values
but 'MI' prints a space.
'PR': Only allowed at the end of the format string; specifies that the result string will be
wrapped by angle brackets if the input value is negative.
e
- (undocumented)format
- (undocumented)public static Column to_varchar(Column e, Column format)
Converts e to a string based on the format.
Throws an exception if the conversion fails. The format can consist of the following
characters, case insensitive:
'0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format
string matches a sequence of digits in the input value, generating a result string of the
same length as the corresponding sequence in the format string. The result string is
left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of
the decimal value, starts with 0, and is before the decimal point. Otherwise, it is
padded with spaces.
'.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be
a 0 or 9 to the left and right of each grouping separator.
'$': Specifies the location of the $ currency sign. This character may only be specified
once.
'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at
the beginning or end of the format string). Note that 'S' prints '+' for positive values
but 'MI' prints a space.
'PR': Only allowed at the end of the format string; specifies that the result string will be
wrapped by angle brackets if the input value is negative.
e
- (undocumented)format
- (undocumented)public static Column to_number(Column e, Column format)
e
- (undocumented)format
- (undocumented)public static Column replace(Column src, Column search, Column replace)
Replaces all occurrences of search with replace.
src - A column of string to be replaced.
search - A column of string. If search is not found in src, src is returned unchanged.
replace - A column of string. If replace is not specified or is an empty string, nothing replaces
the string that is removed from src.
public static Column replace(Column src, Column search)
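Replaces all occurrences of search with replace. Because this overload takes no replace
argument, nothing replaces the occurrences of search that are removed from src.
src - A column of string to be replaced.
search - A column of string. If search is not found in src, src is returned unchanged.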
public static Column split_part(Column str, Column delimiter, Column partNum)
Splits str by delimiter and returns the requested part of the split (1-based).
If any input is null, returns null. If partNum is out of range of split parts,
returns an empty string. If partNum is 0, throws an error. If partNum is negative,
the parts are counted backward from the end of the string.
If the delimiter is an empty string, str is not split.
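The rules above can be collected into a small plain-Java sketch. It is not Spark's implementation: the class name is hypothetical, the delimiter is treated as a literal string rather than a regular expression (an assumption), and IllegalArgumentException stands in for whatever error Spark raises when partNum is 0.

```java
import java.util.regex.Pattern;

final class SplitPartSketch {
    static String splitPart(String str, String delimiter, Integer partNum) {
        // Any null input gives a null result.
        if (str == null || delimiter == null || partNum == null) {
            return null;
        }
        // partNum 0 is an error.
        if (partNum == 0) {
            throw new IllegalArgumentException("partNum must not be 0");
        }
        // An empty delimiter leaves str unsplit.
        String[] parts = delimiter.isEmpty()
                ? new String[] {str}
                : str.split(Pattern.quote(delimiter), -1);
        // Positive partNum counts from the start (1-based); negative counts from the end.
        int index = partNum > 0 ? partNum - 1 : parts.length + partNum;
        // Out-of-range parts give an empty string.
        return (index >= 0 && index < parts.length) ? parts[index] : "";
    }
}
```

For example, splitPart("11.12.13", ".", -1) and splitPart("11.12.13", ".", 3) both select the last part.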
str
- (undocumented)delimiter
- (undocumented)partNum
- (undocumented)public static Column substr(Column str, Column pos, Column len)
Returns the substring of str that starts at pos and is of length len,
or the slice of the byte array that starts at pos and is of length len.
str
- (undocumented)pos
- (undocumented)len
- (undocumented)public static Column substr(Column str, Column pos)
Returns the substring of str that starts at pos,
or the slice of the byte array that starts at pos.
str
- (undocumented)pos
- (undocumented)public static Column parse_url(Column url, Column partToExtract, Column key)
url
- (undocumented)partToExtract
- (undocumented)key
- (undocumented)public static Column parse_url(Column url, Column partToExtract)
url
- (undocumented)partToExtract
- (undocumented)public static Column printf(Column format, scala.collection.Seq<Column> arguments)
format
- (undocumented)arguments
- (undocumented)public static Column url_decode(Column str)
Decodes str in 'application/x-www-form-urlencoded' format
using a specific encoding scheme.
str
- (undocumented)public static Column url_encode(Column str)
str
- (undocumented)public static Column position(Column substr, Column str, Column start)
Returns the position of the first occurrence of substr in str after position start.
The given start and return value are 1-based.
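The 1-based convention can be sketched in plain Java by shifting the 0-based result of String.indexOf. The class name is hypothetical, and two behaviours are assumptions not stated above: the search includes position start itself, and 0 is returned when substr is not found or start is less than 1.

```java
final class PositionSketch {
    // Returns the 1-based index of substr in str, searching from the 1-based position start.
    static int position(String substr, String str, int start) {
        if (start < 1) {
            return 0; // assumption of this sketch
        }
        // indexOf is 0-based and returns -1 when nothing is found, so adding 1 maps "not found" to 0.
        return str.indexOf(substr, start - 1) + 1;
    }

    // Two-argument form: the search starts at position 1.
    static int position(String substr, String str) {
        return position(substr, str, 1);
    }
}
```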
substr
- (undocumented)str
- (undocumented)start
- (undocumented)public static Column position(Column substr, Column str)
Returns the position of the first occurrence of substr in str after position 1.
The return value is 1-based.
substr
- (undocumented)str
- (undocumented)public static Column endswith(Column str, Column suffix)
str
- (undocumented)suffix
- (undocumented)public static Column startswith(Column str, Column prefix)
str
- (undocumented)prefix
- (undocumented)public static Column btrim(Column str)
Removes the leading and trailing space characters from str.
str
- (undocumented)public static Column btrim(Column str, Column trim)
Removes the leading and trailing trim characters from str.
str
- (undocumented)trim
- (undocumented)public static Column try_to_binary(Column e, Column format)
This is a special version of to_binary that performs the same operation, but returns a NULL
value instead of raising an error if the conversion cannot be performed.
e
- (undocumented)format
- (undocumented)public static Column try_to_binary(Column e)
This is a special version of to_binary that performs the same operation, but returns a NULL
value instead of raising an error if the conversion cannot be performed.
e
- (undocumented)public static Column try_to_number(Column e, Column format)
Converts string e to a number based on the string format format. Returns NULL if the
string e does not match the expected format. The format follows the same semantics as the
to_number function.
e
- (undocumented)format
- (undocumented)public static Column char_length(Column str)
str
- (undocumented)public static Column character_length(Column str)
str
- (undocumented)public static Column chr(Column n)
Returns the ASCII character having the binary equivalent to n.
If n is larger than 256, the result is equivalent to chr(n % 256).
n
- (undocumented)public static Column contains(Column left, Column right)
left
- (undocumented)right
- (undocumented)public static Column elt(scala.collection.Seq<Column> inputs)
Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
inputs - (undocumented)
public static Column find_in_set(Column str, Column strArray)
Returns the index (1-based) of the given string (str) in the comma-delimited list (strArray). Returns 0 if the string was not found or if the given string (str) contains a comma.
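The lookup rule described for find_in_set can be modelled in plain Java without Spark. The sketch below only illustrates the documented semantics and is not Spark's implementation; the class FindInSetSketch and the method findInSet are hypothetical names, and null handling is left out.

```java
final class FindInSetSketch {
    // Returns the 1-based index of str in the comma-delimited list,
    // or 0 if str is absent or itself contains a comma.
    static int findInSet(String str, String strArray) {
        if (str.indexOf(',') >= 0) {
            return 0;
        }
        // Limit -1 keeps trailing empty items so positions are not shifted.
        String[] items = strArray.split(",", -1);
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(str)) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(findInSet("ab", "abc,b,ab,c,def")); // prints 3
        System.out.println(findInSet("a,b", "a,b"));           // prints 0
    }
}
```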
str - (undocumented)
strArray - (undocumented)
public static Column like(Column str, Column pattern, Column escapeChar)
Returns true if str matches pattern with escapeChar, null if any arguments are null, false otherwise.
str - (undocumented)
pattern - (undocumented)
escapeChar - (undocumented)
public static Column like(Column str, Column pattern)
Returns true if str matches pattern with escapeChar ('\'), null if any arguments are null, false otherwise.
str - (undocumented)
pattern - (undocumented)
public static Column ilike(Column str, Column pattern, Column escapeChar)
Returns true if str matches pattern with escapeChar case-insensitively, null if any arguments are null, false otherwise.
str - (undocumented)
pattern - (undocumented)
escapeChar - (undocumented)
public static Column ilike(Column str, Column pattern)
Returns true if str matches pattern with escapeChar ('\') case-insensitively, null if any arguments are null, false otherwise.
str - (undocumented)
pattern - (undocumented)
public static Column lcase(Column str)
Returns str with all characters changed to lowercase.
str - (undocumented)
public static Column ucase(Column str)
Returns str with all characters changed to uppercase.
str - (undocumented)
public static Column left(Column str, Column len)
Returns the leftmost len characters from the string str (len can be string type). If len is less than or equal to 0, the result is an empty string.
str - (undocumented)
len - (undocumented)
public static Column right(Column str, Column len)
Returns the rightmost len characters from the string str (len can be string type). If len is less than or equal to 0, the result is an empty string.
str - (undocumented)
len - (undocumented)
public static Column add_months(Column startDate, int numMonths)
Returns the date that is numMonths after startDate.
startDate - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
numMonths - The number of months to add to startDate, can be negative to subtract months
Returns a date, or null if startDate was a string that could not be cast to a date.
public static Column add_months(Column startDate, Column numMonths)
Returns the date that is numMonths after startDate.
startDate - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
numMonths - A column of the number of months to add to startDate, can be negative to subtract months
Returns a date, or null if startDate was a string that could not be cast to a date.
public static Column curdate()
public static Column current_date()
public static Column current_timezone()
public static Column current_timestamp()
public static Column now()
public static Column localtimestamp()
public static Column date_format(Column dateExpr, String format)
See Datetime Patterns for valid date and time format patterns.
dateExpr - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
format - A pattern; for example, dd.MM.yyyy would return a string like 18.03.1993
Returns a string, or null if dateExpr was a string that could not be cast to a timestamp.
Throws IllegalArgumentException - if the format pattern is invalid
Note: Use specialized functions like year whenever possible as they benefit from a specialized implementation.
public static Column date_add(Column start, int days)
Returns the date that is days days after start.
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - The number of days to add to start, can be negative to subtract days
Returns a date, or null if start was a string that could not be cast to a date.
public static Column date_add(Column start, Column days)
Returns the date that is days days after start.
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - A column of the number of days to add to start, can be negative to subtract days
Returns a date, or null if start was a string that could not be cast to a date.
public static Column dateadd(Column start, Column days)
Returns the date that is days days after start.
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - A column of the number of days to add to start, can be negative to subtract days
Returns a date, or null if start was a string that could not be cast to a date.
public static Column date_sub(Column start, int days)
Returns the date that is days days before start.
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - The number of days to subtract from start, can be negative to add days
Returns a date, or null if start was a string that could not be cast to a date.
public static Column date_sub(Column start, Column days)
Returns the date that is days days before start.
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - A column of the number of days to subtract from start, can be negative to add days
Returns a date, or null if start was a string that could not be cast to a date.
public static Column datediff(Column end, Column start)
Returns the number of days from start to end.
Only considers the date part of the input. For example:
datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59")
// returns 1
end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Returns an integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start.
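To see why two timestamps one second apart can still be one day apart, the date-only rule can be reproduced with java.time. This is a minimal sketch of the documented behaviour, not Spark code; DateDiffSketch and dateDiff are hypothetical names, and the input format is assumed to be yyyy-MM-dd HH:mm:ss.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

final class DateDiffSketch {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Days from start to end, looking only at the date part of each input.
    static long dateDiff(String end, String start) {
        LocalDate endDate = LocalDateTime.parse(end, FMT).toLocalDate();
        LocalDate startDate = LocalDateTime.parse(start, FMT).toLocalDate();
        return ChronoUnit.DAYS.between(startDate, endDate);
    }

    public static void main(String[] args) {
        // One second apart, but on different calendar days.
        System.out.println(dateDiff("2018-01-10 00:00:00", "2018-01-09 23:59:59")); // prints 1
        // Swapping the arguments gives a negative result.
        System.out.println(dateDiff("2018-01-09 23:59:59", "2018-01-10 00:00:00")); // prints -1
    }
}
```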
public static Column date_diff(Column end, Column start)
Returns the number of days from start to end.
Only considers the date part of the input. For example:
date_diff("2018-01-10 00:00:00", "2018-01-09 23:59:59")
// returns 1
end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Returns an integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start.
public static Column date_from_unix_date(Column days)
Creates a date from the number of days since 1970-01-01.
days - (undocumented)
public static Column year(Column e)
e - (undocumented)
public static Column quarter(Column e)
e - (undocumented)
public static Column month(Column e)
e - (undocumented)
public static Column dayofweek(Column e)
e - (undocumented)
public static Column dayofmonth(Column e)
e - (undocumented)
public static Column day(Column e)
e - (undocumented)
public static Column dayofyear(Column e)
e - (undocumented)
public static Column hour(Column e)
e - (undocumented)
public static Column extract(Column field, Column source)
field - selects which part of the source should be extracted.
source - a date/timestamp or interval column from where field should be extracted.
public static Column date_part(Column field, Column source)
field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function extract.
source - a date/timestamp or interval column from where field should be extracted.
public static Column datepart(Column field, Column source)
field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function EXTRACT.
source - a date/timestamp or interval column from where field should be extracted.
public static Column last_day(Column e)
e - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
public static Column minute(Column e)
e - (undocumented)
public static Column weekday(Column e)
e - (undocumented)
public static Column make_date(Column year, Column month, Column day)
year - (undocumented)
month - (undocumented)
day - (undocumented)
public static Column months_between(Column end, Column start)
Returns the number of months between dates start and end.
A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example:
months_between("2017-11-14", "2017-07-14") // returns 4.0
months_between("2017-01-01", "2017-01-10") // returns -0.29032258
months_between("2017-06-01", "2017-06-16 12:00:00") // returns -0.5
end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Returns a double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start.
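The 31-days-per-month rule is easier to follow as arithmetic. The following plain-Java sketch models it under stated assumptions: time zones are ignored, the fractional part is computed from whole seconds, and MonthsBetweenSketch and its methods are hypothetical names rather than Spark API.

```java
import java.time.LocalDateTime;

final class MonthsBetweenSketch {
    private static final double SECONDS_PER_31_DAYS = 31 * 86400.0;

    static boolean isLastDayOfMonth(LocalDateTime t) {
        return t.getDayOfMonth() == t.toLocalDate().lengthOfMonth();
    }

    static double monthsBetween(LocalDateTime end, LocalDateTime start, boolean roundOff) {
        int monthDiff = (end.getYear() - start.getYear()) * 12
            + (end.getMonthValue() - start.getMonthValue());
        // Same day of month, or both on the last day of their months: whole number.
        if (end.getDayOfMonth() == start.getDayOfMonth()
                || (isLastDayOfMonth(end) && isLastDayOfMonth(start))) {
            return monthDiff;
        }
        // Otherwise the remainder is a fraction of a 31-day month.
        long secondsDiff = (long) (end.getDayOfMonth() - start.getDayOfMonth()) * 86400L
            + end.toLocalTime().toSecondOfDay() - start.toLocalTime().toSecondOfDay();
        double diff = monthDiff + secondsDiff / SECONDS_PER_31_DAYS;
        return roundOff ? Math.round(diff * 1e8) / 1e8 : diff;
    }

    public static void main(String[] args) {
        System.out.println(monthsBetween(
            LocalDateTime.of(2017, 11, 14, 0, 0), LocalDateTime.of(2017, 7, 14, 0, 0), true));   // prints 4.0
        System.out.println(monthsBetween(
            LocalDateTime.of(2017, 6, 1, 0, 0), LocalDateTime.of(2017, 6, 16, 12, 0), true));    // prints -0.5
    }
}
```

With end = 2017-01-01 and start = 2017-01-10 the same arithmetic gives -9/31, which rounds to -0.29032258.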
public static Column months_between(Column end, Column start, boolean roundOff)
Returns the number of months between dates end and start. If roundOff is set to true, the result is rounded off to 8 digits; it is not rounded otherwise.
end - (undocumented)
start - (undocumented)
roundOff - (undocumented)
public static Column next_day(Column date, String dayOfWeek)
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
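The "strictly later" rule can be checked with java.time. This is a sketch of the documented behaviour only; NextDaySketch and nextDay are hypothetical names, and the name matching (any case-insensitive prefix of at least three letters of the English day name) is a simplifying assumption, not Spark's exact parsing.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;

final class NextDaySketch {
    // Returns the first date strictly after `date` that falls on dayOfWeek,
    // or null if dayOfWeek is not recognised.
    static LocalDate nextDay(LocalDate date, String dayOfWeek) {
        String key = dayOfWeek.trim().toUpperCase(Locale.ROOT);
        if (key.length() < 3) {
            return null;
        }
        for (DayOfWeek d : DayOfWeek.values()) {
            if (d.name().startsWith(key)) {
                // TemporalAdjusters.next never returns the input date itself.
                return date.with(TemporalAdjusters.next(d));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(nextDay(LocalDate.of(2015, 7, 27), "Sunday")); // prints 2015-08-02
        System.out.println(nextDay(LocalDate.of(2015, 8, 2), "sun"));     // prints 2015-08-09
    }
}
```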
date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
dayOfWeek - Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
Returns a date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value.
public static Column next_day(Column date, Column dayOfWeek)
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
dayOfWeek - A column of the day of week. Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
Returns a date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value.
public static Column second(Column e)
e - (undocumented)
public static Column weekofyear(Column e)
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601
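The ISO 8601 week rule has a visible consequence around New Year: the first days of January can belong to the last week of the previous year. A small plain-Java illustration follows, using the JDK's ISO week field; WeekOfYearSketch is a hypothetical name and this is not Spark code.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

final class WeekOfYearSketch {
    // ISO 8601 week number: weeks start on Monday, and week 1 is the
    // first week with more than 3 days in the new year.
    static int weekOfYear(LocalDate date) {
        return date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        // 2016-01-01 is a Friday, so its week has only 3 days in 2016.
        System.out.println(weekOfYear(LocalDate.of(2016, 1, 1))); // prints 53
        // The following Monday starts week 1.
        System.out.println(weekOfYear(LocalDate.of(2016, 1, 4))); // prints 1
    }
}
```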
e - (undocumented)
public static Column from_unixtime(Column ut)
ut - A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
public static Column from_unixtime(Column ut, String f)
See Datetime Patterns for valid date and time format patterns.
ut - A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
f - A date time pattern that the input will be formatted to
Returns a string, or null if ut was a string that could not be cast to a long or f was an invalid date time pattern.
public static Column unix_timestamp()
Returns the current Unix timestamp (in seconds) as a long.
Note: All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
public static Column unix_timestamp(Column s)
s - A date, timestamp or string. If a string, the data must be in the yyyy-MM-dd HH:mm:ss format
public static Column unix_timestamp(Column s, String p)
See Datetime Patterns for valid date and time format patterns.
s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
p - A date time pattern detailing the format of s when s is a string
Returns a long, or null if s was a string that could not be cast to a date or p was an invalid format.
public static Column to_timestamp(Column s)
Converts to a timestamp by casting rules to TimestampType.
s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
public static Column to_timestamp(Column s, String fmt)
See Datetime Patterns for valid date and time format patterns.
s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
fmt - A date time pattern detailing the format of s when s is a string
Returns a timestamp, or null if s was a string that could not be cast to a timestamp or fmt was an invalid format.
public static Column try_to_timestamp(Column s, Column format)
Parses s with the format to a timestamp. The function always returns null on an invalid input, with or without ANSI SQL mode enabled. The result data type is consistent with the value of configuration spark.sql.timestampType.
s - (undocumented)
format - (undocumented)
public static Column try_to_timestamp(Column s)
Parses s to a timestamp. The function always returns null on an invalid input, with or without ANSI SQL mode enabled. It follows casting rules to a timestamp. The result data type is consistent with the value of configuration spark.sql.timestampType.
s - (undocumented)
public static Column to_date(Column e)
Converts the column into DateType by casting rules to DateType.
e - (undocumented)
public static Column to_date(Column e, String fmt)
Converts the column into a DateType with a specified format.
See Datetime Patterns for valid date and time format patterns.
e - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
fmt - A date time pattern detailing the format of e when e is a string
Returns a date, or null if e was a string that could not be cast to a date or fmt was an invalid format.
public static Column unix_date(Column e)
e - (undocumented)
public static Column unix_micros(Column e)
e - (undocumented)
public static Column unix_millis(Column e)
e - (undocumented)
public static Column unix_seconds(Column e)
e - (undocumented)
public static Column trunc(Column date, String format)
For example, trunc("2018-11-19 12:01:19", "year") returns 2018-01-01
date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
format - 'year', 'yyyy', 'yy' to truncate by year, or 'month', 'mon', 'mm' to truncate by month. Other options are: 'week', 'quarter'
Returns a date, or null if date was a string that could not be cast to a date or format was an invalid value.
public static Column date_trunc(String format, Column timestamp)
For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00
format - 'year', 'yyyy', 'yy' to truncate by year, 'month', 'mon', 'mm' to truncate by month, 'day', 'dd' to truncate by day. Other options are: 'microsecond', 'millisecond', 'second', 'minute', 'hour', 'week', 'quarter'
timestamp - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
Returns a timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value.
public static Column from_utc_timestamp(Column ts, String tz)
ts - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
tz - A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous.
Returns a timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value.
public static Column from_utc_timestamp(Column ts, Column tz)
ts - (undocumented)
tz - (undocumented)
public static Column to_utc_timestamp(Column ts, String tz)
ts - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
tz - A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous.
Returns a timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value.
public static Column to_utc_timestamp(Column ts, Column tz)
ts - (undocumented)
tz - (undocumented)
public static Column window(Column timeColumn, String windowDuration, String slideDuration, String startTime)
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute", "10 seconds", "5 seconds"), $"stockId")
.agg(mean("price"))
The windows will look like:
09:00:05-09:01:05
09:00:15-09:01:15
09:00:25-09:01:25 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
slideDuration - A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
startTime - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
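How windowDuration, slideDuration and startTime interact is easiest to see by computing the windows that contain a single event. The sketch below models this in plain Java with all values in whole seconds; it is an illustration under that assumption, not Spark's implementation, and SlidingWindowSketch and windowsFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSketch {
    // Returns {start, end} pairs (end exclusive) of every window that contains ts,
    // latest window first. All arguments are in seconds.
    static List<long[]> windowsFor(long ts, long windowSec, long slideSec, long startOffsetSec) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is not after ts, aligned to startOffset + k * slide.
        long lastStart = ts - Math.floorMod(ts - startOffsetSec, slideSec);
        // Walk back one slide at a time while the window still covers ts.
        for (long start = lastStart; start + windowSec > ts; start -= slideSec) {
            windows.add(new long[] {start, start + windowSec});
        }
        return windows;
    }

    public static void main(String[] args) {
        // An event at 09:01:00, written as seconds since midnight (32460),
        // with a 1 minute window, 10 second slide and 5 second start offset.
        List<long[]> windows = windowsFor(32460L, 60L, 10L, 5L);
        // Six windows contain the event; the earliest starts at 32405 (09:00:05).
        for (long[] w : windows) {
            System.out.println(w[0] + "-" + w[1]);
        }
    }
}
```

Each event therefore lands in windowDuration / slideDuration overlapping windows, which is why the aggregation in the example above emits several rows per input row.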
public static Column window(Column timeColumn, String windowDuration, String slideDuration)
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute", "10 seconds"), $"stockId")
.agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:00:10-09:01:10
09:00:20-09:01:20 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
slideDuration - A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
public static Column window(Column timeColumn, String windowDuration)
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute"), $"stockId")
.agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:01:00-09:02:00
09:02:00-09:03:00 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.
public static Column window_time(Column windowColumn)
Extracts the event time from the window column.
The window column is of StructType { start: Timestamp, end: Timestamp } where start is inclusive and end is exclusive. Since event time can support microsecond precision, window_time(window) = window.end - 1 microsecond.
windowColumn - The window column (typically produced by window aggregation) of type StructType { start: Timestamp, end: Timestamp }
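Because the window end is exclusive, the largest event time a window can hold is one microsecond before it. A minimal plain-Java illustration of that relation follows; WindowTimeSketch and windowTime are hypothetical names, not Spark API.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class WindowTimeSketch {
    // window_time(window) = window.end - 1 microsecond
    static Instant windowTime(Instant windowEndExclusive) {
        return windowEndExclusive.minus(1, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        Instant end = Instant.parse("2023-01-01T09:01:00Z");
        // The result still falls inside [start, end).
        System.out.println(windowTime(end));
    }
}
```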
public static Column session_window(Column timeColumn, String gapDuration)
Generates session window given a timestamp specifying column.
A session window is a dynamic window: its length varies with the given inputs. The window ends at "the timestamp of the latest input of the session + gap duration", so when new inputs are bound to the current session window, its end time can be extended accordingly.
Windows can support microsecond precision. A gapDuration in the order of months is not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
gapDuration - A string specifying the timeout of the session, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.
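The way a session grows with each new input can be sketched for a static gap in plain Java. This is a simplified model, not Spark's implementation: SessionWindowSketch and sessions are hypothetical names, times are plain long values, and treating an event that arrives exactly at the current session end as the start of a new session is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SessionWindowSketch {
    // Groups event times into sessions {start, end}, where end is
    // "latest event time in the session + gap" and is exclusive.
    static List<long[]> sessions(long[] eventTimes, long gap) {
        long[] sorted = eventTimes.clone();
        Arrays.sort(sorted);
        List<long[]> result = new ArrayList<>();
        for (long ts : sorted) {
            long[] current = result.isEmpty() ? null : result.get(result.size() - 1);
            if (current != null && ts < current[1]) {
                // Event falls inside the open session: extend its end.
                current[1] = Math.max(current[1], ts + gap);
            } else {
                // Gap elapsed (assumption: boundary counts as elapsed): new session.
                result.add(new long[] {ts, ts + gap});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Events at 0, 5, 8 and 30 with a gap of 10.
        for (long[] s : sessions(new long[] {0, 5, 30, 8}, 10)) {
            System.out.println(s[0] + "-" + s[1]);
        }
        // Two sessions: 0-18 and 30-40.
    }
}
```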
public static Column session_window(Column timeColumn, Column gapDuration)
Generates session window given a timestamp specifying column.
A session window is a dynamic window: its length varies with the given inputs. For a static gap duration, the window ends at "the timestamp of the latest input of the session + gap duration", so when new inputs are bound to the current session window, its end time can be extended accordingly.
Besides a static gap duration value, users can also provide an expression to specify the gap duration dynamically based on the input row. With a dynamic gap duration, the closing of a session window no longer depends on the latest input. A session window's range is the union of all events' ranges, which are determined by event start time and the gap duration evaluated during query execution. Note that rows with a negative or zero gap duration will be filtered out from the aggregation.
Windows can support microsecond precision. A gapDuration in the order of months is not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
gapDuration - A column specifying the timeout of the session. It could be a static value, e.g. 10 minutes, 1 second, or an expression/UDF that specifies the gap duration dynamically based on the input row.
public static Column timestamp_seconds(Column e)
e - (undocumented)
public static Column timestamp_millis(Column e)
e - (undocumented)
public static Column timestamp_micros(Column e)
e - (undocumented)
public static Column to_timestamp_ltz(Column timestamp, Column format)
Parses the timestamp expression with the format expression to a timestamp with local time zone. Returns null with invalid input.
timestamp - (undocumented)
format - (undocumented)
public static Column to_timestamp_ltz(Column timestamp)
Parses the timestamp expression with the default format to a timestamp with local time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.
timestamp - (undocumented)
public static Column to_timestamp_ntz(Column timestamp, Column format)
Parses the timestamp expression with the format expression to a timestamp without time zone. Returns null with invalid input.
timestamp - (undocumented)
format - (undocumented)
public static Column to_timestamp_ntz(Column timestamp)
Parses the timestamp expression with the default format to a timestamp without time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.
timestamp - (undocumented)
public static Column to_unix_timestamp(Column e, Column format)
e - (undocumented)
format - (undocumented)
public static Column to_unix_timestamp(Column e)
e - (undocumented)
public static Column array_contains(Column column, Object value)
Returns null if the array is null, true if the array contains value, and false otherwise.
column - (undocumented)
value - (undocumented)
public static Column array_append(Column column, Object element)
column - (undocumented)
element - (undocumented)
public static Column arrays_overlap(Column a1, Column a2)
Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
a1 - (undocumented)
a2 - (undocumented)
public static Column slice(Column x, int start, int length)
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
x - the array column to be sliced
start - the starting index
length - the length of the slice
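The effect of a negative start is easiest to see on a small list. The plain-Java sketch below models the indexing under the assumption that positive indices are 1-based, as in Spark SQL's slice; SliceSketch and slice are hypothetical names, and rejecting start = 0 and negative lengths with an exception is a choice made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SliceSketch {
    static <T> List<T> slice(List<T> x, int start, int length) {
        if (start == 0) {
            throw new IllegalArgumentException("start is 1-based and must not be 0");
        }
        if (length < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        // Positive start counts from the front, negative start from the end.
        int from = start > 0 ? start - 1 : x.size() + start;
        if (from < 0 || from >= x.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min((long) x.size(), (long) from + length);
        return new ArrayList<>(x.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(slice(xs, 2, 2));  // prints [2, 3]
        System.out.println(slice(xs, -2, 2)); // prints [3, 4]
    }
}
```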
public static Column slice(Column x, Column start, Column length)
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
x - the array column to be sliced
start - the starting index
length - the length of the slice
public static Column array_join(Column column, String delimiter, String nullReplacement)
Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.
column - (undocumented)
delimiter - (undocumented)
nullReplacement - (undocumented)
public static Column array_join(Column column, String delimiter)
Concatenates the elements of column using the delimiter.
column - (undocumented)
delimiter - (undocumented)
public static Column concat(scala.collection.Seq<Column> exprs)
exprs - (undocumented)
public static Column array_position(Column column, Object value)
column - (undocumented)
value - (undocumented)
public static Column element_at(Column column, Object value)
column - (undocumented)
value - (undocumented)
public static Column try_element_at(Column column, Column value)
(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map.
column - (undocumented)
value - (undocumented)
public static Column get(Column column, Column index)
column - (undocumented)
index - (undocumented)
public static Column array_sort(Column e)
e - (undocumented)
public static Column array_sort(Column e, scala.Function2<Column,Column,Column> comparator)
e - (undocumented)
comparator - (undocumented)
public static Column array_remove(Column column, Object element)
column - (undocumented)
element - (undocumented)
public static Column array_compact(Column column)
column - (undocumented)
public static Column array_prepend(Column column, Object element)
column - (undocumented)
element - (undocumented)
public static Column array_distinct(Column e)
e - (undocumented)
public static Column array_intersect(Column col1, Column col2)
col1 - (undocumented)
col2 - (undocumented)
public static Column array_insert(Column arr, Column pos, Column value)
arr - (undocumented)
pos - (undocumented)
value - (undocumented)
public static Column array_union(Column col1, Column col2)
col1 - (undocumented)
col2 - (undocumented)
public static Column array_except(Column col1, Column col2)
col1 - (undocumented)
col2 - (undocumented)
public static Column transform(Column column, scala.Function1<Column,Column> f)
df.select(transform(col("i"), x => x + 1))
column - the input array column
f - col => transformed_col, the lambda function to transform the input column
public static Column transform(Column column, scala.Function2<Column,Column,Column> f)
df.select(transform(col("i"), (x, i) => x + i))
column - the input array column
f - (col, index) => transformed_col, the lambda function to transform the input column given the index. Indices start at 0.
public static Column exists(Column column, scala.Function1<Column,Column> f)
df.select(exists(col("i"), _ % 2 === 0))
column - the input array column
f - col => predicate, the Boolean predicate to check the input column
public static Column forall(Column column, scala.Function1<Column,Column> f)
df.select(forall(col("i"), x => x % 2 === 0))
column - the input array column
f - col => predicate, the Boolean predicate to check the input column
public static Column filter(Column column, scala.Function1<Column,Column> f)
df.select(filter(col("s"), x => x % 2 === 0))
column - the input array column
f - col => predicate, the Boolean predicate to filter the input column
public static Column filter(Column column, scala.Function2<Column,Column,Column> f)
df.select(filter(col("s"), (x, i) => i % 2 === 0))
column - the input array column
f - (col, index) => predicate, the Boolean predicate to filter the input column given the index. Indices start at 0.
public static Column aggregate(Column expr, Column initialValue, scala.Function2<Column,Column,Column> merge, scala.Function1<Column,Column> finish)
df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
expr - the input array column
initialValue - the initial value
merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
finish - combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
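The roles of initialValue, merge and finish correspond to an ordinary left fold followed by a final conversion. The plain-Java sketch below mirrors the Scala example above on a local list; it illustrates the semantics only, and AggregateSketch and its aggregate method are hypothetical names, not Spark API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

final class AggregateSketch {
    // Folds xs from left to right starting at initialValue, then applies finish once.
    static <T, A, R> R aggregate(List<T> xs, A initialValue,
                                 BiFunction<A, T, A> merge, Function<A, R> finish) {
        A acc = initialValue;
        for (T x : xs) {
            acc = merge.apply(acc, x);
        }
        return finish.apply(acc);
    }

    public static void main(String[] args) {
        // Same shape as aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10)
        Integer result = AggregateSketch.<Integer, Integer, Integer>aggregate(
            Arrays.asList(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
        System.out.println(result); // prints 60
    }
}
```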
public static Column aggregate(Column expr, Column initialValue, scala.Function2<Column,Column,Column> merge)
df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x))
expr - the input array column
initialValue - the initial value
merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
public static Column reduce(Column expr, Column initialValue, scala.Function2<Column,Column,Column> merge, scala.Function1<Column,Column> finish)
df.select(reduce(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
expr - the input array column
initialValue - the initial value
merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
finish - combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
public static Column reduce(Column expr, Column initialValue, scala.Function2<Column,Column,Column> merge)
df.select(reduce(col("i"), lit(0), (acc, x) => acc + x))
expr - the input array column
initialValue - the initial value
merge - (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
public static Column zip_with(Column left, Column right, scala.Function2<Column,Column,Column> f)
df.select(zip_with(df1("val1"), df1("val2"), (x, y) => x + y))
left - the left input array column
right - the right input array column
f - (lCol, rCol) => col, the lambda function to merge two input columns into one column
public static Column transform_keys(Column expr, scala.Function2<Column,Column,Column> f)
df.select(transform_keys(col("i"), (k, v) => k + v))
expr - the input map column
f - (key, value) => new_key, the lambda function to transform the key of input map column
public static Column transform_values(Column expr, scala.Function2<Column,Column,Column> f)
df.select(transform_values(col("i"), (k, v) => k + v))
expr - the input map column
f - (key, value) => new_value, the lambda function to transform the value of input map column
public static Column map_filter(Column expr, scala.Function2<Column,Column,Column> f)
df.select(map_filter(col("m"), (k, v) => k * 10 === v))
expr - the input map column
f - (key, value) => predicate, the Boolean predicate to filter the input map column
public static Column map_zip_with(Column left, Column right, scala.Function3<Column,Column,Column,Column> f)
df.select(map_zip_with(df("m1"), df("m2"), (k, v1, v2) => k === v1 + v2))
left - the left input map column
right - the right input map column
f - (key, value1, value2) => new_value, the lambda function to merge the map values
public static Column explode(Column e)
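Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.
e - (undocumented)
public static Column explode_outer(Column e)
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike explode, if the array/map is null or empty then null is produced.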