Class functions
You can call the functions defined here in two ways: `_FUNC_(...)` and `functions.expr("_FUNC_(...)")`.

As an example, `regr_count` is a function that is defined here. You can use `regr_count(col("yCol"), col("xCol"))` to invoke the `regr_count` function. This way the programming language's compiler ensures `regr_count` exists and is of the proper form. You can also use `expr("regr_count(yCol, xCol)")` to invoke the same function. In this case, Spark itself will ensure `regr_count` exists when it analyzes the query.

You can find the entire list of functions in the SQL API documentation of your Spark version; see also the latest list.

These function APIs usually provide only methods with a `Column` signature, because such a signature can support not only `Column` but also other types such as a native string. The other variants currently exist for historical reasons.
- Since:
- 1.3.0
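
The two invocation styles described above can be sketched as follows. This is a minimal illustration, not part of the original page: the object name `RegrCountExample`, the helper `regrCounts`, the sample rows, and the local-mode session settings are all hypothetical, and it assumes a Spark version whose `functions` class provides `regr_count`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, regr_count}

object RegrCountExample {
  // Hypothetical helper: computes regr_count twice, once through the typed
  // function and once through a SQL expression string, and returns both results.
  def regrCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical sample data; the third row has a null xCol value,
    // so only two (yCol, xCol) pairs are fully non-null.
    val df = Seq[(Option[Double], Option[Double])](
      (Some(1.0), Some(2.0)),
      (Some(2.0), Some(3.0)),
      (Some(3.0), None)
    ).toDF("yCol", "xCol")

    // Style 1: typed call. The compiler checks that regr_count exists
    // and that it is given two Column arguments.
    val typed = df.select(regr_count(col("yCol"), col("xCol"))).head().getLong(0)

    // Style 2: SQL expression string. Spark resolves the function name
    // only when it analyzes the query.
    val viaExpr = df.select(expr("regr_count(yCol, xCol)")).head().getLong(0)

    (typed, viaExpr)
  }

  def main(args: Array[String]): Unit = {
    // Local single-threaded session, chosen only to keep the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("regr-count-example")
      .getOrCreate()
    try println(regrCounts(spark))
    finally spark.stop()
  }
}
```

Both styles build the same expression, so the two results should agree. A misspelled function name fails at compile time in the first style, but only at query analysis in the second.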
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| `static Column` | | Computes the absolute value of a numeric value. |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | `add_months(Column startDate, int numMonths)` | Returns the date that is `numMonths` after `startDate`. |
| `static Column` | `add_months(Column startDate, Column numMonths)` | Returns the date that is `numMonths` after `startDate`. |
| `static Column` | `aes_decrypt(Column input, Column key)` | Returns a decrypted value of `input`. |
| `static Column` | `aes_decrypt(Column input, Column key, Column mode)` | Returns a decrypted value of `input`. |
| `static Column` | `aes_decrypt(Column input, Column key, Column mode, Column padding)` | Returns a decrypted value of `input`. |
| `static Column` | | Returns a decrypted value of `input` using AES in `mode` with `padding`. |
| `static Column` | `aes_encrypt(Column input, Column key)` | Returns an encrypted value of `input`. |
| `static Column` | `aes_encrypt(Column input, Column key, Column mode)` | Returns an encrypted value of `input`. |
| `static Column` | `aes_encrypt(Column input, Column key, Column mode, Column padding)` | Returns an encrypted value of `input`. |
| `static Column` | | Returns an encrypted value of `input`. |
| `static Column` | | Returns an encrypted value of `input` using AES in given `mode` with the specified `padding`. |
| `static Column` | | Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. |
| `static Column` | `aggregate(Column expr, Column initialValue, scala.Function2<Column, Column, Column> merge, scala.Function1<Column, Column> finish)` | Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. |
| `static Column` | | Aggregate function: returns true if at least one value of `e` is true. |
| `static Column` | | Aggregate function: returns some value of `e` for a group of rows. |
| `static Column` | | Aggregate function: returns some value of `e` for a group of rows. |
| `static Column` | `approx_count_distinct(String columnName)` | Aggregate function: returns the approximate number of distinct items in a group. |
| `static Column` | `approx_count_distinct(String columnName, double rsd)` | Aggregate function: returns the approximate number of distinct items in a group. |
| `static Column` | | Aggregate function: returns the approximate number of distinct items in a group. |
| `static Column` | `approx_count_distinct(Column e, double rsd)` | Aggregate function: returns the approximate number of distinct items in a group. |
| `static Column` | `approx_percentile(Column e, Column percentage, Column accuracy)` | Aggregate function: returns the approximate `percentile` of the numeric column `col` which is the smallest value in the ordered `col` values (sorted from least to greatest) such that no more than `percentage` of `col` values is less than the value or equal to that value. |
| `static Column` | `approxCountDistinct(String columnName)` | Deprecated. Use approx_count_distinct. |
| `static Column` | `approxCountDistinct(String columnName, double rsd)` | Deprecated. Use approx_count_distinct. |
| `static Column` | | Deprecated. Use approx_count_distinct. |
| `static Column` | `approxCountDistinct(Column e, double rsd)` | Deprecated. Use approx_count_distinct. |
| `static Column` | | Creates a new array column. |
| `static Column` | | Creates a new array column. |
| `static Column` | | Creates a new array column. |
| `static Column` | | Creates a new array column. |
| `static Column` | | Aggregate function: returns a list of objects with duplicates. |
| `static Column` | `array_append(Column column, Object element)` | Returns an ARRAY containing all elements from the source ARRAY as well as the new element. |
| `static Column` | `array_compact(Column column)` | Remove all null elements from the given array. |
| `static Column` | `array_contains(Column column, Object value)` | Returns null if the array is null, true if the array contains `value`, and false otherwise. |
| `static Column` | | Removes duplicate values from the array. |
| `static Column` | `array_except(Column col1, Column col2)` | Returns an array of the elements in the first array but not in the second array, without duplicates. |
| `static Column` | `array_insert(Column arr, Column pos, Column value)` | Adds an item into a given array at a specified position. |
| `static Column` | `array_intersect(Column col1, Column col2)` | Returns an array of the elements in the intersection of the given two arrays, without duplicates. |
| `static Column` | `array_join(Column column, String delimiter)` | Concatenates the elements of `column` using the `delimiter`. |
| `static Column` | `array_join(Column column, String delimiter, String nullReplacement)` | Concatenates the elements of `column` using the `delimiter`. |
| `static Column` | | Returns the maximum value in the array. |
| `static Column` | | Returns the minimum value in the array. |
| `static Column` | `array_position(Column column, Object value)` | Locates the position of the first occurrence of the value in the given array as long. |
| `static Column` | `array_prepend(Column column, Object element)` | Returns an array containing value as well as all elements from array. |
| `static Column` | `array_remove(Column column, Object element)` | Remove all elements that are equal to element from the given array. |
| `static Column` | `array_repeat(Column e, int count)` | Creates an array containing the left argument repeated the number of times given by the right argument. |
| `static Column` | `array_repeat(Column left, Column right)` | Creates an array containing the left argument repeated the number of times given by the right argument. |
| `static Column` | `array_size(Column e)` | Returns the total number of elements in the array. |
| `static Column` | `array_sort(Column e)` | Sorts the input array in ascending order. |
| `static Column` | `array_sort(Column e, scala.Function2<Column, Column, Column> comparator)` | Sorts the input array based on the given comparator function. |
| `static Column` | `array_union(Column col1, Column col2)` | Returns an array of the elements in the union of the given two arrays, without duplicates. |
| `static Column` | `arrays_overlap(Column a1, Column a2)` | Returns `true` if `a1` and `a2` have at least one non-null element in common. |
| `static Column` | `arrays_zip(Column... e)` | Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. |
| `static Column` | `arrays_zip(scala.collection.immutable.Seq<Column> e)` | Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. |
| `static Column` | | Returns a sort expression based on ascending order of the column. |
| `static Column` | `asc_nulls_first(String columnName)` | Returns a sort expression based on ascending order of the column, and null values return before non-null values. |
| `static Column` | `asc_nulls_last(String columnName)` | Returns a sort expression based on ascending order of the column, and null values appear after non-null values. |
| `static Column` | | Computes the numeric value of the first character of the string column, and returns the result as an int column. |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | Returns null if the condition is true, and throws an exception otherwise. |
| `static Column` | `assert_true(Column c, Column e)` | Returns null if the condition is true; throws an exception with the error message otherwise. |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | Aggregate function: returns the average of the values in a group. |
| `static Column` | | Aggregate function: returns the average of the values in a group. |
| `static Column` | | Computes the BASE64 encoding of a binary column and returns it as a string column. |
| `static Column` | | An expression that returns the string representation of the binary value of the given long column. |
| `static Column` | | An expression that returns the string representation of the binary value of the given long column. |
| `static Column` | | Aggregate function: returns the bitwise AND of all non-null input values, or null if none. |
| `static Column` | | Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL. |
| `static Column` | | Returns the value of the bit (0 or 1) at the specified position. |
| `static Column` | `bit_length(Column e)` | Calculates the bit length for the specified string column. |
| `static Column` | | Aggregate function: returns the bitwise OR of all non-null input values, or null if none. |
| `static Column` | | Aggregate function: returns the bitwise XOR of all non-null input values, or null if none. |
| `static Column` | | Returns the bucket number for the given input column. |
| `static Column` | | Returns the bit position for the given input column. |
| `static Column` | | Returns a bitmap with the positions of the bits set from all the values from the input column. |
| `static Column` | `bitmap_count(Column col)` | Returns the number of set bits in the input bitmap. |
| `static Column` | `bitmap_or_agg(Column col)` | Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column. |
| `static Column` | | Computes bitwise NOT (~) of a number. |
| `static Column` | `bitwiseNOT(Column e)` | Deprecated. Use bitwise_not. |
| `static Column` | | Aggregate function: returns true if all values of `e` are true. |
| `static Column` | | Aggregate function: returns true if at least one value of `e` is true. |
| | `broadcast(DS df)` | Marks a DataFrame as small enough for use in broadcast joins. |
| `static Column` | | Returns the value of the column `e` rounded to 0 decimal places with HALF_EVEN round mode. |
| `static Column` | | Round the value of `e` to `scale` decimal places with HALF_EVEN round mode if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0. |
| `static Column` | | Round the value of `e` to `scale` decimal places with HALF_EVEN round mode if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0. |
| `static Column` | | Removes the leading and trailing space characters from `str`. |
| `static Column` | | Remove the leading and trailing `trim` characters from `str`. |
| `static Column` | | (Java-specific) A transform for any type that partitions by a hash of the input column. |
| `static Column` | | (Java-specific) A transform for any type that partitions by a hash of the input column. |
| `static Column` | `call_function(String funcName, Column... cols)` | Call a SQL function. |
| `static Column` | `call_function(String funcName, scala.collection.immutable.Seq<Column> cols)` | Call a SQL function. |
| `static Column` | | Call a user-defined function. |
| `static Column` | | Call a user-defined function. |
| `static Column` | | Call a user-defined function. |
| `static Column` | | Deprecated. Use call_udf. |
| `static Column` | | Returns length of array or map. |
| `static Column` | | Computes the cube-root of the given column. |
| `static Column` | | Computes the cube-root of the given value. |
| `static Column` | | Computes the ceiling of the given value of `e` to 0 decimal places. |
| `static Column` | | Computes the ceiling of the given value of `e` to 0 decimal places. |
| `static Column` | | Computes the ceiling of the given value of `e` to `scale` decimal places. |
| `static Column` | | Computes the ceiling of the given value of `e` to 0 decimal places. |
| `static Column` | | Computes the ceiling of the given value of `e` to `scale` decimal places. |
| `static Column` | `char_length(Column str)` | Returns the character length of string data or number of bytes of binary data. |
| `static Column` | `character_length(Column str)` | Returns the character length of string data or number of bytes of binary data. |
| `static Column` | | Returns the ASCII character having the binary equivalent to `n`. |
| `static Column` | | Returns the first column that is not null, or null if all inputs are null. |
| `static Column` | | Returns the first column that is not null, or null if all inputs are null. |
| `static Column` | | Returns a `Column` based on the given column name. |
| `static Column` | | Marks a given column with specified collation. |
| `static Column` | | Returns the collation name of a given column. |
| `static Column` | `collect_list(String columnName)` | Aggregate function: returns a list of objects with duplicates. |
| `static Column` | | Aggregate function: returns a list of objects with duplicates. |
| `static Column` | `collect_set(String columnName)` | Aggregate function: returns a set of objects with duplicate elements eliminated. |
| `static Column` | | Aggregate function: returns a set of objects with duplicate elements eliminated. |
| `static Column` | | Returns a `Column` based on the given column name. |
| `static Column` | | Concatenates multiple input columns together into a single column. |
| `static Column` | | Concatenates multiple input columns together into a single column. |
| `static Column` | | Concatenates multiple input string columns together into a single string column, using the given separator. |
| `static Column` | | Concatenates multiple input string columns together into a single string column, using the given separator. |
| `static Column` | | Returns a boolean. |
| `static Column` | | Convert a number in a string column from one base to another. |
| `static Column` | `convert_timezone(Column targetTz, Column sourceTs)` | Converts the timestamp without time zone `sourceTs` from the current time zone to `targetTz`. |
| `static Column` | `convert_timezone(Column sourceTz, Column targetTz, Column sourceTs)` | Converts the timestamp without time zone `sourceTs` from the `sourceTz` time zone to `targetTz`. |
| `static Column` | | Aggregate function: returns the Pearson Correlation Coefficient for two columns. |
| `static Column` | | Aggregate function: returns the Pearson Correlation Coefficient for two columns. |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static Column` | | |
| `static TypedColumn<Object, Object>` | | Aggregate function: returns the number of items in a group. |
| `static Column` | | Aggregate function: returns the number of items in a group. |
| `static Column` | `count_distinct(Column expr, Column... exprs)` | Aggregate function: returns the number of distinct items in a group. |
| `static Column` | `count_distinct(Column expr, scala.collection.immutable.Seq<Column> exprs)` | Aggregate function: returns the number of distinct items in a group. |
| `static Column` | | Aggregate function: returns the number of `TRUE` values for the expression. |
| `static Column` | `count_min_sketch(Column e, Column eps, Column confidence, Column seed)` | Returns a count-min sketch of a column with the given eps, confidence and seed. |
| `static Column` | `countDistinct(String columnName, String... columnNames)` | Aggregate function: returns the number of distinct items in a group. |
| `static Column` | `countDistinct(String columnName, scala.collection.immutable.Seq<String> columnNames)` | Aggregate function: returns the number of distinct items in a group. |
| `static Column` | `countDistinct(Column expr, Column... exprs)` | Aggregate function: returns the number of distinct items in a group. |
| `static Column` | `countDistinct(Column expr, scala.collection.immutable.Seq<Column> exprs)` | Aggregate function: returns the number of distinct items in a group. |
| `static Column` | | Aggregate function: returns the population covariance for two columns. |
| `static Column` | | Aggregate function: returns the population covariance for two columns. |
| `static Column` | `covar_samp(String columnName1, String columnName2)` | Aggregate function: returns the sample covariance for two columns. |
| `static Column` | `covar_samp(Column column1, Column column2)` | Aggregate function: returns the sample covariance for two columns. |
| `static Column` | | Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint. |
| `static Column` | | |
| `static Column` | | Window function: returns the cumulative distribution of values within a window partition, i.e. |
| `static Column` | `curdate()` | Returns the current date at the start of query evaluation as a date column. |
| `static Column` | | Returns the current catalog. |
| `static Column` | | Returns the current database. |
| `static Column` | | Returns the current date at the start of query evaluation as a date column. |
| `static Column` | | Returns the current schema. |
| `static Column` | | Returns the current timestamp at the start of query evaluation as a timestamp column. |
| `static Column` | | Returns the current session local timezone. |
| `static Column` | | Returns the user name of current execution context. |
| `static Column` | | Returns the date that is `days` days after `start`. |
| `static Column` | | Returns the date that is `days` days after `start`. |
| `static Column` | | Returns the number of days from `start` to `end`. |
| `static Column` | `date_format(Column dateExpr, String format)` | Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument. |
| `static Column` | `date_from_unix_date(Column days)` | Create date from the number of `days` since 1970-01-01. |
| `static Column` | | Extracts a part of the date/timestamp or interval source. |
| `static Column` | | Returns the date that is `days` days before `start`. |
| `static Column` | | Returns the date that is `days` days before `start`. |
| `static Column` | `date_trunc(String format, Column timestamp)` | Returns timestamp truncated to the unit specified by the format. |
| `static Column` | | Returns the date that is `days` days after `start`. |
static Column
Returns the number of days fromstart
toend
.static Column
Extracts a part of the date/timestamp or interval source.static Column
Extracts the day of the month as an integer from a given date/timestamp/string.static Column
Extracts the three-letter abbreviated day name from a given date/timestamp/string.static Column
dayofmonth
(Column e) Extracts the day of the month as an integer from a given date/timestamp/string.static Column
Extracts the day of the week as an integer from a given date/timestamp/string.static Column
Extracts the day of the year as an integer from a given date/timestamp/string.static Column
(Java-specific) A transform for timestamps and dates to partition data into days.static Column
Computes the first argument into a string from a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').static Column
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.static Column
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.static Column
Window function: returns the rank of rows within a window partition, without any gaps.static Column
Returns a sort expression based on the descending order of the column.static Column
desc_nulls_first
(String columnName) Returns a sort expression based on the descending order of the column, and null values appear before non-null values.static Column
desc_nulls_last
(String columnName) Returns a sort expression based on the descending order of the column, and null values appear after non-null values.static Column
e()
Returns Euler's number.static Column
element_at
(Column column, Object value) Returns element of array at given index in value if column is array.static Column
Returns then
-th input, e.g., returnsinput2
whenn
is 2.static Column
Returns then
-th input, e.g., returnsinput2
whenn
is 2.static Column
Computes the first argument into a binary from a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').static Column
Returns a boolean.static Column
equal_null
(Column col1, Column col2) Returns same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of the them is null.static Column
Aggregate function: returns true if all values ofe
are true.static Column
Returns whether a predicate holds for one or more elements in the array.static Column
Computes the exponential of the given column.static Column
Computes the exponential of the given value.static Column
Creates a new row for each element in the given array or map column.static Column
Creates a new row for each element in the given array or map column.static Column
Computes the exponential of the given column minus one.static Column
Computes the exponential of the given value minus one.static Column
Parses the expression string into the column that it represents, similar toDataset.selectExpr(java.lang.String...)
.static Column
Extracts a part of the date/timestamp or interval source.static Column
Computes the factorial of the given value.static Column
Returns an array of elements for which a predicate holds in a given array.static Column
Returns an array of elements for which a predicate holds in a given array.static Column
find_in_set
(Column str, Column strArray) Returns the index (1-based) of the given string (str
) in the comma-delimited list (strArray
).static Column
Aggregate function: returns the first value of a column in a group.static Column
Aggregate function: returns the first value of a column in a group.static Column
Aggregate function: returns the first value in a group.static Column
Aggregate function: returns the first value in a group.static Column
Aggregate function: returns the first value in a group.static Column
first_value
(Column e, Column ignoreNulls) Aggregate function: returns the first value in a group.static Column
Creates a single array from an array of arrays.static Column
Computes the floor of the given column value to 0 decimal places.static Column
Computes the floor of the given value ofe
to 0 decimal places.static Column
Computes the floor of the given value ofe
toscale
decimal places.static Column
Returns whether a predicate holds for every element in the array.static Column
format_number
(Column x, int d) Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.static Column
format_string
(String format, Column... arguments) Formats the arguments in printf-style and returns the result as a string column.static Column
format_string
(String format, scala.collection.immutable.Seq<Column> arguments) Formats the arguments in printf-style and returns the result as a string column.static Column
(Java-specific) Parses a column containing a CSV string into aStructType
with the specified schema.static Column
from_csv
(Column e, StructType schema, scala.collection.immutable.Map<String, String> options) Parses a column containing a CSV string into aStructType
with the specified schema.static Column
(Java-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema.static Column
(Scala-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema.static Column
(Scala-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
ofStructType
s with the specified schema.static Column
(Java-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
ofStructType
s with the specified schema.static Column
Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema.static Column
(Java-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema.static Column
(Scala-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema.static Column
from_json
(Column e, StructType schema) Parses a column containing a JSON string into aStructType
with the specified schema.static Column
(Java-specific) Parses a column containing a JSON string into aStructType
with the specified schema.static Column
from_json
(Column e, StructType schema, scala.collection.immutable.Map<String, String> options) (Scala-specific) Parses a column containing a JSON string into aStructType
with the specified schema.static Column
from_unixtime
(Column ut) Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.static Column
from_unixtime
(Column ut, String f) Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.static Column
from_utc_timestamp
(Column ts, String tz) Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone.static Column
from_utc_timestamp
(Column ts, Column tz) Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone.static Column
(Java-specific) Parses a column containing a XML string into aStructType
with the specified schema.static Column
(Java-specific) Parses a column containing a XML string into aStructType
with the specified schema.static Column
(Java-specific) Parses a column containing a XML string into aStructType
with the specified schema.static Column
from_xml
(Column e, StructType schema) Parses a column containing a XML string into the data type corresponding to the specified schema.static Column
Parses a column containing a XML string into the data type corresponding to the specified schema.static Column
Returns element of array at given (0-based) index.static Column
get_json_object
(Column e, String path) Extracts json object from a json string based on json path specified, and returns json string of the extracted json object.static Column
Returns the value of the bit (0 or 1) at the specified position.static Column
Returns the greatest value of the list of column names, skipping null values.static Column
Returns the greatest value of the list of column names, skipping null values.static Column
Returns the greatest value of the list of values, skipping null values.static Column
Returns the greatest value of the list of values, skipping null values.static Column
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.static Column
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.static Column
grouping_id
(String colName, scala.collection.immutable.Seq<String> colNames) Aggregate function: returns the level of grouping, equals tostatic Column
grouping_id
(scala.collection.immutable.Seq<Column> cols) Aggregate function: returns the level of grouping, equals tostatic Column
Calculates the hash code of given columns, and returns the result as an int column.static Column
Calculates the hash code of given columns, and returns the result as an int column.static Column
Computes hex value of the given column.static Column
histogram_numeric
(Column e, Column nBins) Aggregate function: computes a histogram on numeric 'expr' using nb bins.static Column
hll_sketch_agg
(String columnName) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.static Column
hll_sketch_agg
(String columnName, int lgConfigK) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.static Column
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.static Column
hll_sketch_agg
(Column e, int lgConfigK) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.static Column
hll_sketch_agg
(Column e, Column lgConfigK) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.static Column
hll_sketch_estimate
(String columnName) Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.static Column
Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.static Column
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object.static Column
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object.static Column
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object.static Column
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object.static Column
hll_union_agg
(String columnName) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.static Column
hll_union_agg
(String columnName, boolean allowDifferentLgConfigK) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.static Column
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.static Column
hll_union_agg
(Column e, boolean allowDifferentLgConfigK) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.static Column
hll_union_agg
(Column e, Column allowDifferentLgConfigK) Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.static Column
Extracts the hours as an integer from a given date/timestamp/string.static Column
(Java-specific) A transform for timestamps to partition data into hours.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Computessqrt(a^2^ + b^2^)
without intermediate overflow or underflow.static Column
Returnscol2
ifcol1
is null, orcol1
otherwise.static Column
Returns true if str matchespattern
withescapeChar
('\') case-insensitively, null if any arguments are null, false otherwise.static Column
Returns true if str matchespattern
withescapeChar
case-insensitively, null if any arguments are null, false otherwise.static Column
Returns a new string column by converting the first letter of each word to uppercase.static Column
Creates a new row for each element in the given array of structs.static Column
Creates a new row for each element in the given array of structs.static Column
Returns the length of the block being read, or -1 if not available.static Column
Returns the start offset of the block being read, or -1 if not available.static Column
Creates a string column for the file name of the current Spark task.static Column
Locate the position of the first occurrence of substr column in the given string.static Column
Check if a variant value is a variant null.static Column
Return true iff the column is NaN.static Column
Returns true ifcol
is not null, or false otherwise.static Column
Return true iff the column is null.static Column
java_method
(scala.collection.immutable.Seq<Column> cols) Calls a method with reflection.static Column
Returns the number of elements in the outermost JSON array.static Column
Returns all the keys of the outermost JSON object as an array.static Column
json_tuple
(Column json, String... fields) Creates a new row for a json column according to the given field names.static Column
json_tuple
(Column json, scala.collection.immutable.Seq<String> fields) Creates a new row for a json column according to the given field names.static Column
Aggregate function: returns the kurtosis of the values in a group.static Column
Aggregate function: returns the kurtosis of the values in a group.static Column
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row.
static Column
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.
static Column
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row.
static Column
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.
static Column
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.
static Column
Aggregate function: returns the last value of the column in a group.static Column
Aggregate function: returns the last value of the column in a group.static Column
Aggregate function: returns the last value in a group.static Column
Aggregate function: returns the last value in a group.static Column
Returns the last day of the month which the given date belongs to.static Column
last_value
(Column e) Aggregate function: returns the last value in a group.static Column
last_value
(Column e, Column ignoreNulls) Aggregate function: returns the last value in a group.static Column
Returns str with all characters changed to lowercase.
static Column
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row.
static Column
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.
static Column
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row.
static Column
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.
static Column
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.
static Column
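The offset and default-value behaviour described for the lag and lead rows can be sketched over an ordinary list standing in for one ordered window partition. This is a plain-Java illustration under that assumption, not Spark code; LagLeadSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of lag/lead semantics over one ordered partition.
class LagLeadSketch {
    // shift > 0 looks back (lag); shift < 0 looks ahead (lead).
    // Rows that would fall outside the partition receive defaultValue.
    static <T> List<T> shifted(List<T> rows, int shift, T defaultValue) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            int j = i - shift;
            out.add(j >= 0 && j < rows.size() ? rows.get(j) : defaultValue);
        }
        return out;
    }

    static <T> List<T> lag(List<T> rows, int offset, T defaultValue) {
        return shifted(rows, offset, defaultValue);
    }

    static <T> List<T> lead(List<T> rows, int offset, T defaultValue) {
        return shifted(rows, -offset, defaultValue);
    }
}
```

Passing null as defaultValue mirrors the overloads that return null when fewer than offset rows are available.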
Returns the least value of the list of column names, skipping null values.static Column
Returns the least value of the list of column names, skipping null values.static Column
Returns the least value of the list of values, skipping null values.static Column
Returns the least value of the list of values, skipping null values.static Column
Returns the leftmost len (len can be string type) characters from the string str; if len is less than or equal to 0, the result is an empty string.
static Column
Computes the character length of a given string or number of bytes of a binary string.static Column
Computes the character length of a given string or number of bytes of a binary string.static Column
levenshtein
(Column l, Column r) Computes the Levenshtein distance of the two given string columns.static Column
levenshtein
(Column l, Column r, int threshold) Computes the Levenshtein distance of the two given string columns if it's less than or equal to a given threshold.static Column
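The two levenshtein overloads can be illustrated with a textbook dynamic-programming edit distance in plain Java. This is a sketch, not Spark's implementation; LevenshteinSketch is a hypothetical name, and returning -1 when the distance exceeds the threshold is an assumption that the summary row above does not state.

```java
// Plain-Java illustration of edit distance with an optional threshold.
class LevenshteinSketch {
    // Classic two-row dynamic programming over insert, delete and substitute.
    static int distance(String l, String r) {
        int[] prev = new int[r.length() + 1];
        for (int j = 0; j <= r.length(); j++) prev[j] = j;
        for (int i = 1; i <= l.length(); i++) {
            int[] cur = new int[r.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= r.length(); j++) {
                int cost = l.charAt(i - 1) == r.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[r.length()];
    }

    // Assumption: -1 signals that the distance is greater than the threshold.
    static int distance(String l, String r, int threshold) {
        int d = distance(l, r);
        return d <= threshold ? d : -1;
    }
}
```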
Returns true if str matches pattern with escapeChar ('\'), null if any arguments are null, false otherwise.
static Column
Returns true if str matches pattern with escapeChar, null if any arguments are null, false otherwise.
static Column
Creates a Column of literal value.
static Column
Computes the natural logarithm of the given value.static Column
Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column.static Column
Locate the position of the first occurrence of substr.static Column
Locate the position of the first occurrence of substr in a string column, after position pos.static Column
Returns the first argument-base logarithm of the second argument.static Column
Returns the first argument-base logarithm of the second argument.static Column
Computes the natural logarithm of the given column.static Column
Computes the natural logarithm of the given value.static Column
Computes the logarithm of the given value in base 10.static Column
Computes the logarithm of the given value in base 10.static Column
Computes the natural logarithm of the given column plus one.static Column
Computes the natural logarithm of the given value plus one.static Column
Computes the logarithm of the given value in base 2.static Column
Computes the logarithm of the given column in base 2.static Column
Converts a string column to lower case.static Column
Left-pad the binary column with pad to a byte length of len.static Column
Left-pad the string column with pad to a length of len.static Column
Trim the spaces from left end for the specified string value.static Column
Trim the specified character string from left end for the specified string column.static Column
static Column
Make DayTimeIntervalType duration.static Column
make_dt_interval
(Column days) Make DayTimeIntervalType duration from days.static Column
make_dt_interval
(Column days, Column hours) Make DayTimeIntervalType duration from days and hours.static Column
make_dt_interval
(Column days, Column hours, Column mins) Make DayTimeIntervalType duration from days, hours and mins.static Column
make_dt_interval
(Column days, Column hours, Column mins, Column secs) Make DayTimeIntervalType duration from days, hours, mins and secs.static Column
Make interval.static Column
make_interval
(Column years) Make interval from years.static Column
make_interval
(Column years, Column months) Make interval from years and months.static Column
make_interval
(Column years, Column months, Column weeks) Make interval from years, months and weeks.static Column
make_interval
(Column years, Column months, Column weeks, Column days) Make interval from years, months, weeks and days.static Column
Make interval from years, months, weeks, days and hours.static Column
Make interval from years, months, weeks, days, hours and mins.static Column
make_interval
(Column years, Column months, Column weeks, Column days, Column hours, Column mins, Column secs) Make interval from years, months, weeks, days, hours, mins and secs.static Column
Create timestamp from years, months, days, hours, mins and secs fields.static Column
make_timestamp
(Column years, Column months, Column days, Column hours, Column mins, Column secs, Column timezone) Create timestamp from years, months, days, hours, mins, secs and timezone fields.static Column
make_timestamp_ltz
(Column years, Column months, Column days, Column hours, Column mins, Column secs) Create the current timestamp with local time zone from years, months, days, hours, mins and secs fields.static Column
make_timestamp_ltz
(Column years, Column months, Column days, Column hours, Column mins, Column secs, Column timezone) Create the current timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields.static Column
make_timestamp_ntz
(Column years, Column months, Column days, Column hours, Column mins, Column secs) Create local date-time from years, months, days, hours, mins, secs fields.static Column
Make year-month interval.static Column
make_ym_interval
(Column years) Make year-month interval from years.static Column
make_ym_interval
(Column years, Column months) Make year-month interval from years, months.static Column
Creates a new map column.static Column
Creates a new map column.static Column
map_concat
(Column... cols) Returns the union of all the given maps.static Column
map_concat
(scala.collection.immutable.Seq<Column> cols) Returns the union of all the given maps.static Column
map_contains_key
(Column column, Object key) Returns true if the map contains the key.static Column
Returns an unordered array of all entries in the given map.static Column
map_filter
(Column expr, scala.Function2<Column, Column, Column> f) Returns a map whose key-value pairs satisfy a predicate.static Column
map_from_arrays
(Column keys, Column values) Creates a new map column.static Column
Returns a map created from the given array of entries.static Column
Returns an unordered array containing the keys of the map.static Column
map_values
(Column e) Returns an unordered array containing the values of the map.static Column
Merge two given maps, key-wise into a single map using a function.static Column
Masks the given string value.static Column
Masks the given string value.static Column
Masks the given string value.static Column
Masks the given string value.static Column
Masks the given string value.static Column
Aggregate function: returns the maximum value of the column in a group.static Column
Aggregate function: returns the maximum value of the expression in a group.static Column
Aggregate function: returns the value associated with the maximum value of ord.static Column
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.static Column
Aggregate function: returns the average of the values in a group.static Column
Aggregate function: returns the average of the values in a group.static Column
Aggregate function: returns the median of the values in a group.static Column
Aggregate function: returns the minimum value of the column in a group.static Column
Aggregate function: returns the minimum value of the expression in a group.static Column
Aggregate function: returns the value associated with the minimum value of ord.static Column
Extracts the minutes as an integer from a given date/timestamp/string.static Column
Aggregate function: returns the most frequent value in a group.static Column
Aggregate function: returns the most frequent value in a group.static Column
A column expression that generates monotonically increasing 64-bit integers.static Column
Deprecated.Use monotonically_increasing_id().static Column
Extracts the month as an integer from a given date/timestamp/string.static Column
Extracts the three-letter abbreviated month name from a given date/timestamp/string.static Column
(Java-specific) A transform for timestamps and dates to partition data into months.static Column
months_between
(Column end, Column start) Returns number of months between dates start and end.
static Column
months_between
(Column end, Column start, boolean roundOff) Returns number of months between dates end and start.
static Column
named_struct
(scala.collection.immutable.Seq<Column> cols) Creates a struct with the given field names and values.static Column
Returns col1 if it is not NaN, or col2 if col1 is NaN.static Column
Unary minus, i.e. negate the expression.
static Column
Returns the negated value.static Column
Returns the first date which is later than the value of the date column that is on the specified day of the week.
static Column
Returns the first date which is later than the value of the date column that is on the specified day of the week.
static Column
Inversion of boolean expression, i.e. NOT.
static Column
now()
Returns the current timestamp at the start of query evaluation.static Column
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
static Column
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
static Column
ntile
(int n) Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition.
static Column
Returns null if col1 equals col2, or col1 otherwise.
static Column
nullifzero
(Column col) Returns null if col is equal to zero, or col otherwise.
static Column
Returns col2 if col1 is null, or col1 otherwise.
static Column
Returns col2 if col1 is not null, or col3 otherwise.
static Column
Calculates the byte length for the specified string column.static Column
Overlay the specified portion of src with replace, starting from byte position pos of src.
static Column
Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
static Column
parse_json
(Column json) Parses a JSON string and constructs a Variant value.static Column
Extracts a part from a URL.static Column
Extracts a part from a URL.static Column
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
static Column
percentile
(Column e, Column percentage) Aggregate function: returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].
static Column
percentile
(Column e, Column percentage, Column frequency) Aggregate function: returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].
static Column
percentile_approx
(Column e, Column percentage, Column accuracy) Aggregate function: returns the approximate percentile of the numeric column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
static Column
pi()
Returns Pi.static Column
Returns the positive value of dividend mod divisor.static Column
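The difference between a positive modulus and Java's % operator can be shown in a few lines. This is a plain-Java sketch for int inputs and a positive divisor only, not Spark code; PmodSketch is a hypothetical name, and the formula ignores possible overflow when the operands are near Integer.MAX_VALUE.

```java
// Plain-Java illustration of a positive modulus for a positive divisor.
class PmodSketch {
    // Java's % keeps the sign of the dividend, so -7 % 3 is -1.
    // Adding the divisor and reducing again moves the result into [0, divisor).
    static int pmod(int dividend, int divisor) {
        return ((dividend % divisor) + divisor) % divisor;
    }
}
```

For a positive divisor this agrees with Math.floorMod from the Java standard library.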
posexplode
(Column e) Creates a new row for each element with position in the given array or map column.static Column
Creates a new row for each element with position in the given array or map column.static Column
Returns the position of the first occurrence of substr in str after position 1.
static Column
Returns the position of the first occurrence of substr in str after position start.
static Column
Returns the value.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Returns the value of the first argument raised to the power of the second argument.static Column
Formats the arguments in printf-style and returns the result as a string column.static Column
Aggregate function: returns the product of all numerical elements in a group.static Column
Extracts the quarter as an integer from a given date/timestamp/string.static Column
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.static Column
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.static Column
Throws an exception with the provided error message.static Column
rand()
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).static Column
rand
(long seed) Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).static Column
randn()
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.static Column
randn
(long seed) Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.static Column
random()
Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).static Column
Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).static Column
rank()
Window function: returns the rank of rows within a window partition.static Column
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.static Column
reduce
(Column expr, Column initialValue, scala.Function2<Column, Column, Column> merge, scala.Function1<Column, Column> finish) Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.static Column
Calls a method with reflection.static Column
Returns true if str matches regexp, or false otherwise.
static Column
regexp_count
(Column str, Column regexp) Returns a count of the number of times that the regular expression pattern regexp is matched in the string str.
static Column
regexp_extract
(Column e, String exp, int groupIdx) Extract a specific group matched by a Java regex, from the specified string column.static Column
regexp_extract_all
(Column str, Column regexp) Extract all strings in the str that match the regexp expression and corresponding to the first regex group index.
static Column
regexp_extract_all
(Column str, Column regexp, Column idx) Extract all strings in the str that match the regexp expression and corresponding to the regex group index.
static Column
regexp_instr
(Column str, Column regexp) Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring.static Column
regexp_instr
(Column str, Column regexp, Column idx) Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring.static Column
regexp_like
(Column str, Column regexp) Returns true if str matches regexp, or false otherwise.
static Column
regexp_replace
(Column e, String pattern, String replacement) Replace all substrings of the specified string value that match regexp with rep.static Column
regexp_replace
(Column e, Column pattern, Column replacement) Replace all substrings of the specified string value that match regexp with rep.static Column
regexp_substr
(Column str, Column regexp) Returns the substring that matches the regular expression regexp within the string str.
static Column
Aggregate function: returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
Aggregate function: returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
regr_count
(Column y, Column x) Aggregate function: returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
regr_intercept
(Column y, Column x) Aggregate function: returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
Aggregate function: returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
regr_slope
(Column y, Column x) Aggregate function: returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
Aggregate function: returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
static Column
Repeats a string column n times, and returns it as a new string column.static Column
Repeats a string column n times, and returns it as a new string column.static Column
Replaces all occurrences of search with replace.
static Column
Replaces all occurrences of search with replace.
static Column
Returns a reversed string or an array with reverse order of elements.
static Column
Returns the rightmost len (len can be string type) characters from the string str; if len is less than or equal to 0, the result is an empty string.
static Column
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.static Column
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.static Column
Returns true if str matches regexp, or false otherwise.
static Column
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
static Column
Round the value of e to scale decimal places with HALF_UP round mode if scale is greater than or equal to 0 or at integral part when scale is less than 0.
static Column
Round the value of e to scale decimal places with HALF_UP round mode if scale is greater than or equal to 0 or at integral part when scale is less than 0.
static Column
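HALF_UP rounding with a positive, zero or negative scale, as described in the round rows above, can be reproduced with java.math.BigDecimal. This is a plain-Java sketch of the rounding mode only, not Spark code; HalfUpRoundSketch is a hypothetical name, and taking the input as a decimal string is a choice made here to avoid binary floating-point surprises.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Plain-Java illustration of HALF_UP rounding at a given scale.
class HalfUpRoundSketch {
    // scale >= 0 keeps that many decimal places;
    // scale < 0 rounds in the integral part (tens, hundreds, ...).
    static BigDecimal round(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(round("2.5", 0));                    // tie rounds away from zero
        System.out.println(round("2.345", 2));                  // two decimal places
        System.out.println(round("1250", -2).toPlainString());  // nearest hundred
    }
}
```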
Window function: returns a sequential number starting at 1 within a window partition.static Column
Right-pad the binary column with pad to a byte length of len.static Column
Right-pad the string column with pad to a length of len.static Column
Trim the spaces from right end for the specified string value.static Column
Trim the specified character string from right end for the specified string column.static Column
schema_of_csv
(String csv) Parses a CSV string and infers its schema in DDL format.static Column
schema_of_csv
(Column csv) Parses a CSV string and infers its schema in DDL format.static Column
schema_of_csv
(Column csv, Map<String, String> options) Parses a CSV string and infers its schema in DDL format using options.static Column
schema_of_json
(String json) Parses a JSON string and infers its schema in DDL format.static Column
schema_of_json
(Column json) Parses a JSON string and infers its schema in DDL format.static Column
schema_of_json
(Column json, Map<String, String> options) Parses a JSON string and infers its schema in DDL format using options.static Column
Returns schema in the SQL format of a variant.static Column
Returns the merged schema in the SQL format of a variant column.static Column
schema_of_xml
(String xml) Parses an XML string and infers its schema in DDL format.
static Column
schema_of_xml
(Column xml) Parses an XML string and infers its schema in DDL format.
static Column
schema_of_xml
(Column xml, Map<String, String> options) Parses an XML string and infers its schema in DDL format using options.
static Column
static Column
Extracts the seconds as an integer from a given date/timestamp/string.static Column
Splits a string into arrays of sentences, where each sentence is an array of words.static Column
Splits a string into arrays of sentences, where each sentence is an array of words.static Column
Splits a string into arrays of sentences, where each sentence is an array of words.static Column
Generate a sequence of integers from start to stop, incrementing by 1 if start is less than or equal to stop, otherwise -1.static Column
Generate a sequence of integers from start to stop, incrementing by step.static Column
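The default step rule in the two sequence rows (1 when start is less than or equal to stop, otherwise -1) can be sketched in plain Java. This is not Spark code; SequenceSketch is a hypothetical name, the stop value is treated as inclusive, and a step whose sign points away from stop simply yields an empty list here, which may differ from what Spark does.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of an inclusive integer sequence.
class SequenceSketch {
    // Default step: +1 when counting up, -1 when counting down.
    static List<Long> sequence(long start, long stop) {
        return sequence(start, stop, start <= stop ? 1 : -1);
    }

    static List<Long> sequence(long start, long stop, long step) {
        if (step == 0) throw new IllegalArgumentException("step must be non-zero");
        List<Long> out = new ArrayList<>();
        for (long v = start; step > 0 ? v <= stop : v >= stop; v += step) {
            out.add(v);
        }
        return out;
    }
}
```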
Returns the user name of current execution context.static Column
session_window
(Column timeColumn, String gapDuration) Generates session window given a timestamp specifying column.static Column
session_window
(Column timeColumn, Column gapDuration) Generates session window given a timestamp specifying column.static Column
Returns a sha1 hash value as a hex string of the col.
static Column
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.static Column
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.static Column
Shift the given value numBits left.static Column
Deprecated.Use shiftleft.static Column
shiftright
(Column e, int numBits) (Signed) shift the given value numBits right.static Column
shiftRight
(Column e, int numBits) Deprecated.Use shiftright.static Column
shiftrightunsigned
(Column e, int numBits) Unsigned shift the given value numBits right.static Column
shiftRightUnsigned
(Column e, int numBits) Deprecated.Use shiftrightunsigned.static Column
Returns a random permutation of the given array.static Column
Computes the signum of the given value.static Column
Computes the signum of the given column.static Column
Computes the signum of the given value.static Column
static Column
static Column
static Column
static Column
Returns length of array or map.static Column
Aggregate function: returns the skewness of the values in a group.static Column
Aggregate function: returns the skewness of the values in a group.static Column
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
static Column
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
static Column
Aggregate function: returns true if at least one value of e is true.
static Column
sort_array
(Column e) Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements.static Column
sort_array
(Column e, boolean asc) Sorts the input array for the given column in ascending or descending order, according to the natural ordering of the array elements.static Column
Returns the soundex code for the specified expression.static Column
Partition ID.static Column
Splits str around matches of the given pattern.static Column
Splits str around matches of the given pattern.static Column
Splits str around matches of the given pattern.static Column
Splits str around matches of the given pattern.static Column
split_part
(Column str, Column delimiter, Column partNum) Splits str by delimiter and returns the requested part of the split (1-based).
static Column
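The 1-based indexing mentioned for split_part can be sketched in plain Java. This is not Spark code; SplitPartSketch is a hypothetical name, and the sketch covers only a non-empty delimiter and a positive, in-range partNum, because the summary row does not say how other inputs behave.

```java
import java.util.regex.Pattern;

// Plain-Java illustration of 1-based part selection after a literal split.
class SplitPartSketch {
    static String splitPart(String str, String delimiter, int partNum) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty parts.
        String[] parts = str.split(Pattern.quote(delimiter), -1);
        if (partNum < 1 || partNum > parts.length) {
            // Out-of-range handling is outside the scope of this sketch.
            throw new IllegalArgumentException("partNum out of range: " + partNum);
        }
        return parts[partNum - 1]; // convert the 1-based index to a 0-based one
    }
}
```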
Computes the square root of the specified float value.static Column
Computes the square root of the specified float value.static Column
Separates col1, ..., colk into n rows.
static Column
startswith
(Column str, Column prefix) Returns a boolean.
static Column
Aggregate function: alias for stddev_samp.
static Column
Aggregate function: alias for stddev_samp.
static Column
Aggregate function: alias for stddev_samp.
static Column
stddev_pop
(String columnName) Aggregate function: returns the population standard deviation of the expression in a group.static Column
stddev_pop
(Column e) Aggregate function: returns the population standard deviation of the expression in a group.static Column
stddev_samp
(String columnName) Aggregate function: returns the sample standard deviation of the expression in a group.static Column
Aggregate function: returns the sample standard deviation of the expression in a group.static Column
str_to_map
(Column text) Creates a map after splitting the text into key/value pairs using delimiters.static Column
str_to_map
(Column text, Column pairDelim) Creates a map after splitting the text into key/value pairs using delimiters.static Column
str_to_map
(Column text, Column pairDelim, Column keyValueDelim) Creates a map after splitting the text into key/value pairs using delimiters.static Column
Creates a new struct column that composes multiple input columns.static Column
Creates a new struct column that composes multiple input columns.static Column
Creates a new struct column.static Column
Creates a new struct column.static Column
Returns the substring of str that starts at pos, or the slice of byte array that starts at pos.
static Column
Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
static Column
Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type.
static Column
Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type.
static Column
substring_index
(Column str, String delim, int count) Returns the substring from string str before count occurrences of the delimiter delim.static Column
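The "before count occurrences of the delimiter" wording for substring_index can be sketched in plain Java. This is not Spark code; SubstringIndexSketch is a hypothetical name. Three behaviours are assumptions borrowed from the MySQL-style function of the same name rather than from this page: a negative count counts from the right, a zero count returns an empty string, and the whole string is returned when it has fewer delimiters than requested.

```java
// Plain-Java illustration of substring_index-style slicing.
class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) return ""; // assumption, see lead-in
        if (count > 0) {
            // Walk forward to the count-th delimiter and keep everything before it.
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) return str; // fewer than count delimiters
            }
            return str.substring(0, idx);
        }
        // Negative count: walk backward and keep everything after that delimiter.
        int idx = str.length();
        for (int i = 0; i < -count; i++) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) return str;
        }
        return str.substring(idx + delim.length());
    }
}
```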
Aggregate function: returns the sum of all values in the given column.static Column
Aggregate function: returns the sum of all values in the expression.static Column
Aggregate function: returns the sum of distinct values in the expression.static Column
sumDistinct
(String columnName) Deprecated.Use sum_distinct.static Column
Deprecated.Use sum_distinct.static Column
static Column
static Column
static Column
static Column
timestamp_add
(String unit, Column quantity, Column ts) Adds the specified number of units to the given timestamp.static Column
timestamp_diff
(String unit, Column start, Column end) Gets the difference between the timestamps in the specified units by truncating the fraction part.static Column
Creates timestamp from the number of microseconds since UTC epoch.static Column
Creates timestamp from the number of milliseconds since UTC epoch.static Column
Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.static Column
Converts the input e to a binary value based on the default format "hex".
static Column
Converts the input e to a binary value based on the supplied format.
static Column
Convert e to a string based on the format.
static Column
Converts a column containing a StructType into a CSV string with the specified schema.
static Column
(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema.
static Column
Converts the column into DateType by casting rules to DateType.
static Column
Converts the column into a DateType with a specified format.
static Column
Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema.
static Column
(Java-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema.
static Column
(Scala-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema.
static Column
Convert string 'e' to a number based on the string format 'format'.static Column
Converts to a timestamp by casting rules to TimestampType.
static Column
to_timestamp
(Column s, String fmt) Converts time string with the given pattern to timestamp.
static Column
to_timestamp_ltz
(Column timestamp) Parses the timestamp expression with the default format to a timestamp with local time zone.
static Column
to_timestamp_ltz
(Column timestamp, Column format) Parses the timestamp expression with the format expression to a timestamp with local time zone.
static Column
to_timestamp_ntz
(Column timestamp) Parses the timestamp expression with the default format to a timestamp without time zone.
static Column
to_timestamp_ntz
(Column timestamp, Column format) Parses the timestamp expression with the format expression to a timestamp without time zone.
static Column
to_unix_timestamp
(Column timeExp) Returns the UNIX timestamp of the given time.static Column
to_unix_timestamp
(Column timeExp, Column format) Returns the UNIX timestamp of the given time.static Column
to_utc_timestamp
(Column ts, String tz) Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC.static Column
to_utc_timestamp
(Column ts, Column tz) Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC.static Column
to_varchar
(Column e, Column format) Converts e to a string based on the format.
static Column
to_variant_object
(Column col) Converts a column containing nested inputs (array/map/struct) into a variant, where maps and structs are converted to variant objects, which are unordered unlike SQL structs.
static Column
Converts a column containing a StructType into an XML string with the specified schema.
static Column
(Java-specific) Converts a column containing a StructType into an XML string with the specified schema.
static Column
Deprecated.Use degrees.static Column
Deprecated.Use degrees.static Column
Deprecated.Use radians.static Column
Deprecated.Use radians.static Column
Returns an array of elements after applying a transformation to each element in the input array.static Column
Returns an array of elements after applying a transformation to each element in the input array.static Column
transform_keys
(Column expr, scala.Function2<Column, Column, Column> f) Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.static Column
transform_values
(Column expr, scala.Function2<Column, Column, Column> f) Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.static Column
Translate any character in the src by a character in replaceString.static Column
Trim the spaces from both ends for the specified string column.static Column
Trim the specified character from both ends for the specified string column.static Column
Returns date truncated to the unit specified by the format.static Column
Returns the sum of left and right and the result is null on overflow.
static Column
try_aes_decrypt
(Column input, Column key) Returns a decrypted value of input.
static Column
try_aes_decrypt
(Column input, Column key, Column mode) Returns a decrypted value of input.
static Column
try_aes_decrypt
(Column input, Column key, Column mode, Column padding) Returns a decrypted value of input.
static Column
This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.
static Column
Returns the mean calculated from values of a group and the result is null on overflow.static Column
try_divide
(Column left, Column right) Returns dividend / divisor.
static Column
try_element_at
(Column column, Column value) (array, index) - Returns element of array at given (1-based) index.
static Column
Returns the remainder of dividend / divisor.
static Column
try_multiply
(Column left, Column right) Returns left * right and the result is null on overflow.
static Column
try_parse_json
(Column json) Parses a JSON string and constructs a Variant value.static Column
try_reflect
(scala.collection.immutable.Seq<Column> cols) This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception.
static Column
try_subtract
(Column left, Column right) Returns left - right and the result is null on overflow.
static Column
Returns the sum calculated from values of a group and the result is null on overflow.static Column
This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
static Column
try_to_binary
(Column e, Column f) This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
static Column
try_to_number
(Column e, Column format) Converts string e to a number based on the string format format.
static Column
Parses s to a timestamp.
static Column
try_to_timestamp
(Column s, Column format) Parses s with the format to a timestamp.
static Column
try_url_decode
(Column str) This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed.
static Column
try_variant_get
(Column v, String path, String targetType) Extracts a sub-variant from v according to path, and then casts the sub-variant to targetType.
static <T> Column
typedlit
(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$2) Creates a Column of literal value.
static <T> Column
typedLit
(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$1) Creates a Column of literal value.
static Column
Return DDL-formatted type string for the data type of the input.static Column
Returns str with all characters changed to uppercase.
static <IN, BUF, OUT> UserDefinedFunction
udaf
(Aggregator<IN, BUF, OUT> agg, Encoder<IN> inputEncoder) Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped Data Frames.
static <IN, BUF, OUT> UserDefinedFunction
udaf
(Aggregator<IN, BUF, OUT> agg, scala.reflect.api.TypeTags.TypeTag<IN> evidence$3) Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped Data Frames.
static UserDefinedFunction
Deprecated.Scala `udf` method with return type parameter is deprecated.static UserDefinedFunction
Defines a Java UDF0 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF1 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF10 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF2 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF3 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF4 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF5 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF6 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF7 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF8 instance as user-defined function (UDF).static UserDefinedFunction
Defines a Java UDF9 instance as user-defined function (UDF).static <RT> UserDefinedFunction
udf
(scala.Function0<RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$4) Defines a Scala closure of 0 arguments as user-defined function (UDF).static <RT,
A1> UserDefinedFunction udf
(scala.Function1<A1, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$5, scala.reflect.api.TypeTags.TypeTag<A1> evidence$6) Defines a Scala closure of 1 argument as user-defined function (UDF).static <RT,
A1, A2, A3, A4, A5, A6, A7, A8, A9, A10>
UserDefinedFunctionudf
(scala.Function10<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$59, scala.reflect.api.TypeTags.TypeTag<A1> evidence$60, scala.reflect.api.TypeTags.TypeTag<A2> evidence$61, scala.reflect.api.TypeTags.TypeTag<A3> evidence$62, scala.reflect.api.TypeTags.TypeTag<A4> evidence$63, scala.reflect.api.TypeTags.TypeTag<A5> evidence$64, scala.reflect.api.TypeTags.TypeTag<A6> evidence$65, scala.reflect.api.TypeTags.TypeTag<A7> evidence$66, scala.reflect.api.TypeTags.TypeTag<A8> evidence$67, scala.reflect.api.TypeTags.TypeTag<A9> evidence$68, scala.reflect.api.TypeTags.TypeTag<A10> evidence$69) Defines a Scala closure of 10 arguments as user-defined function (UDF).static <RT,
A1, A2> UserDefinedFunction udf
(scala.Function2<A1, A2, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$7, scala.reflect.api.TypeTags.TypeTag<A1> evidence$8, scala.reflect.api.TypeTags.TypeTag<A2> evidence$9) Defines a Scala closure of 2 arguments as user-defined function (UDF).static <RT,
A1, A2, A3>
UserDefinedFunctionudf
(scala.Function3<A1, A2, A3, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$10, scala.reflect.api.TypeTags.TypeTag<A1> evidence$11, scala.reflect.api.TypeTags.TypeTag<A2> evidence$12, scala.reflect.api.TypeTags.TypeTag<A3> evidence$13) Defines a Scala closure of 3 arguments as user-defined function (UDF).static <RT,
A1, A2, A3, A4>
UserDefinedFunctionudf
(scala.Function4<A1, A2, A3, A4, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$14, scala.reflect.api.TypeTags.TypeTag<A1> evidence$15, scala.reflect.api.TypeTags.TypeTag<A2> evidence$16, scala.reflect.api.TypeTags.TypeTag<A3> evidence$17, scala.reflect.api.TypeTags.TypeTag<A4> evidence$18) Defines a Scala closure of 4 arguments as user-defined function (UDF).static <RT,
A1, A2, A3, A4, A5>
UserDefinedFunctionudf
(scala.Function5<A1, A2, A3, A4, A5, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$19, scala.reflect.api.TypeTags.TypeTag<A1> evidence$20, scala.reflect.api.TypeTags.TypeTag<A2> evidence$21, scala.reflect.api.TypeTags.TypeTag<A3> evidence$22, scala.reflect.api.TypeTags.TypeTag<A4> evidence$23, scala.reflect.api.TypeTags.TypeTag<A5> evidence$24) Defines a Scala closure of 5 arguments as user-defined function (UDF).static <RT,
A1, A2, A3, A4, A5, A6>
UserDefinedFunctionudf
(scala.Function6<A1, A2, A3, A4, A5, A6, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$25, scala.reflect.api.TypeTags.TypeTag<A1> evidence$26, scala.reflect.api.TypeTags.TypeTag<A2> evidence$27, scala.reflect.api.TypeTags.TypeTag<A3> evidence$28, scala.reflect.api.TypeTags.TypeTag<A4> evidence$29, scala.reflect.api.TypeTags.TypeTag<A5> evidence$30, scala.reflect.api.TypeTags.TypeTag<A6> evidence$31) Defines a Scala closure of 6 arguments as user-defined function (UDF).static <RT,
A1, A2, A3, A4, A5, A6, A7>
UserDefinedFunctionudf
(scala.Function7<A1, A2, A3, A4, A5, A6, A7, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$32, scala.reflect.api.TypeTags.TypeTag<A1> evidence$33, scala.reflect.api.TypeTags.TypeTag<A2> evidence$34, scala.reflect.api.TypeTags.TypeTag<A3> evidence$35, scala.reflect.api.TypeTags.TypeTag<A4> evidence$36, scala.reflect.api.TypeTags.TypeTag<A5> evidence$37, scala.reflect.api.TypeTags.TypeTag<A6> evidence$38, scala.reflect.api.TypeTags.TypeTag<A7> evidence$39) Defines a Scala closure of 7 arguments as user-defined function (UDF).static <RT,
A1, A2, A3, A4, A5, A6, A7, A8>
UserDefinedFunctionudf
(scala.Function8<A1, A2, A3, A4, A5, A6, A7, A8, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$40, scala.reflect.api.TypeTags.TypeTag<A1> evidence$41, scala.reflect.api.TypeTags.TypeTag<A2> evidence$42, scala.reflect.api.TypeTags.TypeTag<A3> evidence$43, scala.reflect.api.TypeTags.TypeTag<A4> evidence$44, scala.reflect.api.TypeTags.TypeTag<A5> evidence$45, scala.reflect.api.TypeTags.TypeTag<A6> evidence$46, scala.reflect.api.TypeTags.TypeTag<A7> evidence$47, scala.reflect.api.TypeTags.TypeTag<A8> evidence$48) Defines a Scala closure of 8 arguments as user-defined function (UDF).static <RT,
A1, A2, A3, A4, A5, A6, A7, A8, A9>
UserDefinedFunctionudf
(scala.Function9<A1, A2, A3, A4, A5, A6, A7, A8, A9, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$49, scala.reflect.api.TypeTags.TypeTag<A1> evidence$50, scala.reflect.api.TypeTags.TypeTag<A2> evidence$51, scala.reflect.api.TypeTags.TypeTag<A3> evidence$52, scala.reflect.api.TypeTags.TypeTag<A4> evidence$53, scala.reflect.api.TypeTags.TypeTag<A5> evidence$54, scala.reflect.api.TypeTags.TypeTag<A6> evidence$55, scala.reflect.api.TypeTags.TypeTag<A7> evidence$56, scala.reflect.api.TypeTags.TypeTag<A8> evidence$57, scala.reflect.api.TypeTags.TypeTag<A9> evidence$58) Defines a Scala closure of 9 arguments as user-defined function (UDF).static Column
Decodes a BASE64 encoded string column and returns it as a binary column.static Column
Inverse of hex.static Column
Returns the number of days since 1970-01-01.static Column
Returns the number of microseconds since 1970-01-01 00:00:00 UTC.static Column
Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.static Column
Returns the number of seconds since 1970-01-01 00:00:00 UTC.static Column
Returns the current Unix timestamp (in seconds) as a long.static Column
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.static Column
unix_timestamp
(Column s, String p) Converts time string with given pattern to Unix timestamp (in seconds).static Column
unwrap_udt
(Column column) Unwrap UDT data type column into its underlying type.static Column
Converts a string column to upper case.static Column
url_decode
(Column str) Decodes str in 'application/x-www-form-urlencoded' format using a specific encoding scheme.
static Column
url_encode
(Column str) Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme.
static Column
user()
Returns the user name of the current execution context.
static Column
uuid()
Returns a universally unique identifier (UUID) string.
static Column
Aggregate function: returns the population variance of the values in a group.static Column
Aggregate function: returns the population variance of the values in a group.static Column
Aggregate function: returns the unbiased variance of the values in a group.static Column
Aggregate function: returns the unbiased variance of the values in a group.static Column
Aggregate function: alias for var_samp.
static Column
Aggregate function: alias for var_samp.
static Column
variant_get
(Column v, String path, String targetType) Extracts a sub-variant from v according to path, and then casts the sub-variant to targetType.
static Column
version()
Returns the Spark version.static Column
Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).static Column
weekofyear
(Column e) Extracts the week number as an integer from a given date/timestamp/string.static Column
Evaluates a list of conditions and returns one of multiple possible result expressions.static Column
width_bucket
(Column v, Column min, Column max, Column numBucket) Returns the bucket number into which the value of this expression would fall after being evaluated.static Column
Generates tumbling time windows given a timestamp specifying column.static Column
Bucketize rows into one or more time windows given a timestamp specifying column.static Column
Bucketize rows into one or more time windows given a timestamp specifying column.static Column
window_time
(Column windowColumn) Extracts the event time from the window column.static Column
Returns a string array of values within the nodes of xml that match the XPath expression.static Column
xpath_boolean
(Column xml, Column path) Returns true if the XPath expression evaluates to true, or if a matching node is found.static Column
xpath_double
(Column xml, Column path) Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.static Column
xpath_float
(Column xml, Column path) Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.static Column
Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.static Column
xpath_long
(Column xml, Column path) Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.static Column
xpath_number
(Column xml, Column path) Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.static Column
xpath_short
(Column xml, Column path) Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.static Column
xpath_string
(Column xml, Column path) Returns the text contents of the first xml node that matches the XPath expression.static Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.static Column
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.static Column
Extracts the year as an integer from a given date/timestamp/string.static Column
(Java-specific) A transform for timestamps and dates to partition data into years.static Column
zeroifnull
(Column col) Returns zero if col is null, or col otherwise.
static Column
Merge two given arrays, element-wise, into a single array using a function.
-
Constructor Details
-
functions
public functions()
-
-
Method Details
-
countDistinct
Aggregate function: returns the number of distinct items in a group. An alias of count_distinct, and it is encouraged to use count_distinct directly.
- Parameters:
expr
- (undocumented)exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
countDistinct
Aggregate function: returns the number of distinct items in a group. An alias of count_distinct, and it is encouraged to use count_distinct directly.
- Parameters:
columnName
- (undocumented)columnNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
count_distinct
Aggregate function: returns the number of distinct items in a group.- Parameters:
expr
- (undocumented)exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
array
Creates a new array column. The input columns must all have the same data type.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
array
Creates a new array column. The input columns must all have the same data type.- Parameters:
colName
- (undocumented)colNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
map
Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0
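The pairing rule for map can be illustrated without Spark. The following is a plain-Java sketch, not Spark code; the names MapPairsSketch and fromPairs are hypothetical. It groups a flat argument list into (key, value) pairs and rejects null keys, as the description requires. Duplicate keys and data type checks are not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapPairsSketch {
    /** Groups a flat list (key1, value1, key2, value2, ...) into an insertion-ordered map. */
    static Map<Object, Object> fromPairs(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of arguments");
        }
        Map<Object, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            Object key = keysAndValues[i];
            if (key == null) {
                throw new IllegalArgumentException("map keys can't be null");
            }
            result.put(key, keysAndValues[i + 1]); // values are allowed to be null
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fromPairs("a", 1, "b", 2)); // prints {a=1, b=2}
    }
}
```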
-
coalesce
Returns the first column that is not null, or null if all inputs are null.For example,
coalesce(a, b, c)
will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
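As a minimal illustration of the rule above, the following plain-Java sketch returns the first non-null argument. It does not call Spark; CoalesceSketch and its coalesce method are hypothetical names used only to show the evaluation order.

```java
final class CoalesceSketch {
    /** Returns the first non-null argument, or null when every argument is null. */
    @SafeVarargs
    static <T> T coalesce(T... values) {
        for (T value : values) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String firstNonNull = coalesce(null, "b", "c");
        System.out.println(firstNonNull); // prints b
    }
}
```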
-
struct
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name; otherwise, the newly generated StructField's name is auto-generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
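The field-naming rule can be sketched in plain Java. This is not Spark code: StructFieldNamesSketch is a hypothetical name, a null entry stands for an input without a name of its own, and index is assumed to be the zero-based position of the input among all inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StructFieldNamesSketch {
    /** Keeps given names and generates "col" + (index + 1) for unnamed inputs. */
    static List<String> fieldNames(List<String> inputNames) {
        List<String> names = new ArrayList<>();
        for (int index = 0; index < inputNames.size(); index++) {
            String given = inputNames.get(index);
            names.add(given != null ? given : "col" + (index + 1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Second and fourth inputs are unnamed expressions in this example.
        System.out.println(fieldNames(Arrays.asList("id", null, "total", null)));
        // prints [id, col2, total, col4]
    }
}
```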
-
struct
Creates a new struct column that composes multiple input columns.- Parameters:
colName
- (undocumented)colNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
greatest
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
greatest
Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
columnName
- (undocumented)columnNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
least
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
least
Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
columnName
- (undocumented)columnNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
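The null handling shared by greatest and least can be sketched in plain Java. The code below does not call Spark; GreatestLeastSketch is a hypothetical name and Integer stands in for any comparable value type.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;

final class GreatestLeastSketch {
    /** Largest non-null value, or null when every value is null. */
    static Integer greatest(Integer... values) {
        return Arrays.stream(values)
                .filter(Objects::nonNull)            // skip null values
                .max(Comparator.naturalOrder())
                .orElse(null);                       // null iff all inputs were null
    }

    /** Smallest non-null value, or null when every value is null. */
    static Integer least(Integer... values) {
        return Arrays.stream(values)
                .filter(Objects::nonNull)
                .min(Comparator.naturalOrder())
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(greatest(3, null, 7)); // prints 7
        System.out.println(least(3, null, 7));    // prints 3
        System.out.println(greatest(null, null)); // prints null
    }
}
```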
-
hash
Calculates the hash code of given columns, and returns the result as an int column.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
xxhash64
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column. The hash computation uses an initial seed of 42.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
concat_ws
Concatenates multiple input string columns together into a single string column, using the given separator.- Parameters:
sep
- (undocumented)exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- Input strings which are null are skipped.
-
format_string
Formats the arguments in printf-style and returns the result as a string column.- Parameters:
format
- (undocumented)arguments
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
elt
Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
- Parameters:
inputs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
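The two behaviours described for elt can be modelled in plain Java. This sketch does not call Spark; EltSketch is a hypothetical name, the ansiEnabled flag stands in for spark.sql.ansi.enabled, and n is passed separately because the description does not spell out how it relates to the inputs parameter. Treating n below 1 as an invalid index is also an assumption.

```java
final class EltSketch {
    /** Returns the n-th input (1-based); out-of-range handling depends on ansiEnabled. */
    static String elt(boolean ansiEnabled, int n, String... inputs) {
        if (n < 1 || n > inputs.length) {
            if (ansiEnabled) {
                throw new ArrayIndexOutOfBoundsException("invalid index: " + n);
            }
            return null; // non-ANSI mode: out-of-range index yields NULL
        }
        return inputs[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(elt(false, 2, "input1", "input2", "input3")); // prints input2
        System.out.println(elt(false, 5, "input1", "input2", "input3")); // prints null
    }
}
```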
-
concat
Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.- Parameters:
exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- Returns null if any of the input columns are null.
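The notes on concat and concat_ws describe opposite null handling, which the following plain-Java sketch contrasts for string inputs. It does not call Spark; ConcatSketch is a hypothetical name, and the separator is assumed to be non-null.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

final class ConcatSketch {
    /** Models concat: the result is null as soon as any input is null. */
    static String concat(String... parts) {
        StringBuilder joined = new StringBuilder();
        for (String part : parts) {
            if (part == null) {
                return null;
            }
            joined.append(part);
        }
        return joined.toString();
    }

    /** Models concat_ws: null inputs are skipped, the rest are joined with sep. */
    static String concatWs(String sep, String... parts) {
        return Arrays.stream(parts)
                .filter(Objects::nonNull)
                .collect(Collectors.joining(sep));
    }

    public static void main(String[] args) {
        System.out.println(concat("a", null, "c"));        // prints null
        System.out.println(concatWs("-", "a", null, "c")); // prints a-c
    }
}
```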
-
json_tuple
Creates a new row for a json column according to the given field names.- Parameters:
json
- (undocumented)fields
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
arrays_zip
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
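The shape of the arrays_zip result can be shown with a plain-Java sketch in which each inner list plays the role of one struct. It does not call Spark and ArraysZipSketch is a hypothetical name. Padding shorter inputs with null is an assumption, since the description above does not cover inputs of unequal length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArraysZipSketch {
    /** The N-th row of the result holds the N-th element of every input list. */
    static List<List<Object>> zip(List<?>... arrays) {
        int longest = 0;
        for (List<?> array : arrays) {
            longest = Math.max(longest, array.size());
        }
        List<List<Object>> rows = new ArrayList<>();
        for (int n = 0; n < longest; n++) {
            List<Object> struct = new ArrayList<>();
            for (List<?> array : arrays) {
                // Assumption: missing elements of shorter inputs become null.
                struct.add(n < array.size() ? array.get(n) : null);
            }
            rows.add(struct);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(zip(Arrays.asList(1, 2, 3), Arrays.asList("a", "b", "c")));
        // prints [[1, a], [2, b], [3, c]]
    }
}
```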
-
map_concat
Returns the union of all the given maps.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
callUDF
Call a user-defined function.
- Parameters:
udfName
- (undocumented)cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
call_udf
Call a user-defined function. Example:
import org.apache.spark.sql._
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val spark = df.sparkSession
spark.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", call_udf("simpleUDF", $"value"))
- Parameters:
udfName
- (undocumented)cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
call_function
Call a SQL function.
- Parameters:
funcName
- function name that follows the SQL identifier syntax (can be quoted, can be qualified)
cols
- the expression parameters of function
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
col
Returns a Column based on the given column name.
- Parameters:
colName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
column
Returns a Column based on the given column name. Alias of col(java.lang.String).
- Parameters:
colName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
lit
Creates a Column of literal value. The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value.
- Parameters:
literal
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
typedLit
Creates a Column of literal value. An alias of typedlit, and it is encouraged to use typedlit directly.
- Parameters:
literal
- (undocumented)evidence$1
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.2.0
-
typedlit
Creates a Column of literal value. The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value. The difference between this function and lit(java.lang.Object) is that this function can handle parameterized Scala types, e.g. List, Seq and Map.
- Parameters:
literal
- (undocumented)evidence$2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
- Note:
- typedlit will call expensive Scala reflection APIs. lit is preferred if parameterized Scala types are not used.
-
asc
Returns a sort expression based on ascending order of the column.
df.sort(asc("dept"), desc("age"))
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
asc_nulls_first
Returns a sort expression based on ascending order of the column, and null values appear before non-null values.
df.sort(asc_nulls_first("dept"), desc("age"))
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
asc_nulls_last
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
df.sort(asc_nulls_last("dept"), desc("age"))
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
desc
Returns a sort expression based on the descending order of the column.
df.sort(asc("dept"), desc("age"))
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
desc_nulls_first
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
df.sort(asc("dept"), desc_nulls_first("age"))
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
desc_nulls_last
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
df.sort(asc("dept"), desc_nulls_last("age"))
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
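The four null-placement variants above correspond closely to comparators in the JDK. The sketch below is plain Java, not Spark code; NullOrderingSketch and its constants are hypothetical names chosen to mirror asc_nulls_first, asc_nulls_last, desc_nulls_first and desc_nulls_last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NullOrderingSketch {
    static final Comparator<String> ASC_NULLS_FIRST =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());
    static final Comparator<String> ASC_NULLS_LAST =
            Comparator.nullsLast(Comparator.<String>naturalOrder());
    static final Comparator<String> DESC_NULLS_FIRST =
            Comparator.nullsFirst(Comparator.<String>reverseOrder());
    static final Comparator<String> DESC_NULLS_LAST =
            Comparator.nullsLast(Comparator.<String>reverseOrder());

    /** Returns a sorted copy so the input list is left untouched. */
    static List<String> sorted(List<String> values, Comparator<String> order) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> depts = Arrays.asList("b", null, "a");
        System.out.println(sorted(depts, ASC_NULLS_FIRST)); // prints [null, a, b]
        System.out.println(sorted(depts, DESC_NULLS_LAST)); // prints [b, a, null]
    }
}
```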
-
approxCountDistinct
Deprecated. Use approx_count_distinct. Since 2.1.0.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
approxCountDistinct
Deprecated. Use approx_count_distinct. Since 2.1.0.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
approxCountDistinct
Deprecated. Use approx_count_distinct. Since 2.1.0.
- Parameters:
e
- (undocumented)rsd
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
approxCountDistinct
Deprecated. Use approx_count_distinct. Since 2.1.0.
- Parameters:
columnName
- (undocumented)rsd
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
approx_count_distinct
Aggregate function: returns the approximate number of distinct items in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
approx_count_distinct
Aggregate function: returns the approximate number of distinct items in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
approx_count_distinct
Aggregate function: returns the approximate number of distinct items in a group.- Parameters:
rsd
- maximum relative standard deviation allowed (default = 0.05)
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
approx_count_distinct
Aggregate function: returns the approximate number of distinct items in a group.- Parameters:
rsd
- maximum relative standard deviation allowed (default = 0.05)
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
avg
Aggregate function: returns the average of the values in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
avg
Aggregate function: returns the average of the values in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
collect_list
Aggregate function: returns a list of objects with duplicates.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
- Note:
- The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
-
collect_list
Aggregate function: returns a list of objects with duplicates.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
- Note:
- The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
-
collect_set
Aggregate function: returns a set of objects with duplicate elements eliminated.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
- Note:
- The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
-
collect_set
Aggregate function: returns a set of objects with duplicate elements eliminated.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
- Note:
- The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
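The difference between collect_list and collect_set comes down to whether duplicates are kept, which this plain-Java sketch shows. It does not call Spark and CollectSketch is a hypothetical name. The sketch keeps the encounter order of a local list, whereas the notes above state that Spark gives no such ordering guarantee after a shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CollectSketch {
    /** Models collect_list: every value of the group is kept, duplicates included. */
    static <T> List<T> collectList(List<T> group) {
        return new ArrayList<>(group);
    }

    /** Models collect_set: duplicate values are eliminated. */
    static <T> Set<T> collectSet(List<T> group) {
        return new LinkedHashSet<>(group);
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(1, 2, 2, 3);
        System.out.println(collectList(group)); // prints [1, 2, 2, 3]
        System.out.println(collectSet(group));  // prints [1, 2, 3]
    }
}
```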
-
count_min_sketch
Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for frequency estimation using sub-linear space.
- Parameters:
e
- (undocumented)eps
- (undocumented)confidence
- (undocumented)seed
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
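To make the roles of eps, confidence and seed concrete, here is a small self-contained count-min sketch in plain Java. It is a teaching sketch under stated assumptions, not Spark's CountMinSketch: the class name CountMinSketchDemo, the sizing formulas and the use of Objects.hash as the row hash are all illustrative choices.

```java
import java.util.Objects;

final class CountMinSketchDemo {
    private final int depth;   // number of hash rows, grows with confidence
    private final int width;   // counters per row, grows as eps shrinks
    private final int seed;
    private final long[][] counts;

    /** Expects 0 < eps and 0 < confidence < 1. */
    CountMinSketchDemo(double eps, double confidence, int seed) {
        // Assumed sizing rule for illustration only.
        this.width = (int) Math.ceil(2.0 / eps);
        this.depth = (int) Math.ceil(-Math.log(1.0 - confidence) / Math.log(2.0));
        this.seed = seed;
        this.counts = new long[depth][width];
    }

    /** Simplified per-row hash; real implementations use stronger hash families. */
    private int bucket(Object item, int row) {
        return Math.floorMod(Objects.hash(seed, row, item), width);
    }

    void add(Object item) {
        for (int row = 0; row < depth; row++) {
            counts[row][bucket(item, row)]++;
        }
    }

    /** Estimated frequency: the minimum counter over all rows, never below the true count. */
    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, counts[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketchDemo sketch = new CountMinSketchDemo(0.01, 0.95, 42);
        for (int i = 0; i < 3; i++) {
            sketch.add("a");
        }
        sketch.add("b");
        System.out.println(sketch.estimate("a")); // at least 3
    }
}
```

Collisions can only inflate a counter, so an estimate may exceed the true count but cannot fall below it.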
-
corr
Aggregate function: returns the Pearson Correlation Coefficient for two columns.- Parameters:
column1
- (undocumented)column2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
corr
Aggregate function: returns the Pearson Correlation Coefficient for two columns.- Parameters:
columnName1
- (undocumented)columnName2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
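For reference, the Pearson correlation coefficient that corr reports can be computed by hand as follows. This is a plain-Java sketch of the textbook formula, not Spark's implementation; PearsonSketch is a hypothetical name, and the arrays are assumed to have equal length with at least two elements and non-zero variance.

```java
final class PearsonSketch {
    /** r = sum((x - meanX)(y - meanY)) / sqrt(sum((x - meanX)^2) * sum((y - meanY)^2)) */
    static double corr(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // y is an exact linear function of x, so the coefficient is 1.0.
        System.out.println(corr(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
    }
}
```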
-
count
Aggregate function: returns the number of items in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
count
Aggregate function: returns the number of items in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
countDistinct
Aggregate function: returns the number of distinct items in a group. An alias of count_distinct, and it is encouraged to use count_distinct directly.
- Parameters:
expr
- (undocumented)exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
countDistinct
public static Column countDistinct(String columnName, scala.collection.immutable.Seq<String> columnNames)
Aggregate function: returns the number of distinct items in a group. An alias of count_distinct, and it is encouraged to use count_distinct directly.
- Parameters:
columnName
- (undocumented)columnNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
count_distinct
Aggregate function: returns the number of distinct items in a group.- Parameters:
expr
- (undocumented)exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
covar_pop
Aggregate function: returns the population covariance for two columns.- Parameters:
column1
- (undocumented)column2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
covar_pop
Aggregate function: returns the population covariance for two columns.- Parameters:
columnName1
- (undocumented)columnName2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
covar_samp
Aggregate function: returns the sample covariance for two columns.- Parameters:
column1
- (undocumented)column2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
covar_samp
Aggregate function: returns the sample covariance for two columns.- Parameters:
columnName1
- (undocumented)columnName2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
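The only difference between covar_pop and covar_samp is the divisor, as the following plain-Java sketch of the textbook definitions shows. It does not call Spark; CovarianceSketch is a hypothetical name, and the arrays are assumed to have equal length with at least two elements.

```java
final class CovarianceSketch {
    /** Sum of (x - meanX) * (y - meanY) over all pairs. */
    private static double sumOfProducts(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum;
    }

    /** Population covariance: divide by n. */
    static double covarPop(double[] x, double[] y) {
        return sumOfProducts(x, y) / x.length;
    }

    /** Sample covariance: divide by n - 1. */
    static double covarSamp(double[] x, double[] y) {
        return sumOfProducts(x, y) / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        System.out.println(covarPop(x, y));  // about 1.333
        System.out.println(covarSamp(x, y)); // prints 2.0
    }
}
```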
-
first
Aggregate function: returns the first value in a group. By default, the function returns the first value it sees. It returns the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Parameters:
e
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
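The effect of ignoreNulls can be modelled without Spark. The sketch below is plain Java over an in-memory list whose order stands in for the order in which rows reach the aggregate; FirstValueSketch and its method are hypothetical names, not part of the Spark API.

```java
import java.util.Arrays;
import java.util.List;

final class FirstValueSketch {
    // Returns the first element, or the first non-null element when
    // ignoreNulls is true. Returns null if nothing qualifies.
    static <T> T first(List<T> rows, boolean ignoreNulls) {
        for (T value : rows) {
            if (!ignoreNulls || value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(null, "b", "c");
        System.out.println(first(rows, false)); // the leading null is returned
        System.out.println(first(rows, true));  // the leading null is skipped
    }
}
```

Because the result depends on row order, the same data arriving in a different order after a shuffle can yield a different value.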
-
first
Aggregate function: returns the first value of a column in a group. By default, the function returns the first value it sees. When ignoreNulls is set to true, it returns the first non-null value it sees. If all values are null, then null is returned.
- Parameters:
columnName
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
first
Aggregate function: returns the first value in a group. By default, the function returns the first value it sees. When ignoreNulls is set to true, it returns the first non-null value it sees. If all values are null, then null is returned.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
first
Aggregate function: returns the first value of a column in a group. By default, the function returns the first value it sees. When ignoreNulls is set to true, it returns the first non-null value it sees. If all values are null, then null is returned.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
first_value
Aggregate function: returns the first value in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
first_value
Aggregate function: returns the first value in a group. By default, the function returns the first value it sees. When ignoreNulls is set to true, it returns the first non-null value it sees. If all values are null, then null is returned.
- Parameters:
e
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
grouping
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
grouping
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
grouping_id
Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
- Note:
- The list of columns should match the grouping columns exactly, or be empty (meaning all the grouping columns).
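The grouping level is a bit vector packed into one number: the first grouping column contributes the highest bit and the last contributes the lowest. A minimal plain-Java sketch of that arithmetic follows; GroupingIdSketch is a hypothetical name, and the input array stands in for the values grouping(c1) ... grouping(cn) of one output row.

```java
final class GroupingIdSketch {
    // grouping[i] is 1 when column c(i+1) is aggregated in the output row, else 0.
    static long groupingId(int[] grouping) {
        int n = grouping.length;
        long id = 0;
        for (int i = 0; i < n; i++) {
            // c1 is shifted by n-1, c2 by n-2, ..., cn by 0.
            id += ((long) grouping[i]) << (n - 1 - i);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(groupingId(new int[]{0, 0})); // 0: neither column aggregated
        System.out.println(groupingId(new int[]{0, 1})); // 1: only c2 aggregated
        System.out.println(groupingId(new int[]{1, 0})); // 2: only c1 aggregated
        System.out.println(groupingId(new int[]{1, 1})); // 3: both aggregated
    }
}
```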
-
grouping_id
Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
- Parameters:
colName
- (undocumented)colNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
- Note:
- The list of columns should match the grouping columns exactly.
-
hll_sketch_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.- Parameters:
e
- (undocumented)lgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_sketch_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.- Parameters:
e
- (undocumented)lgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_sketch_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.- Parameters:
columnName
- (undocumented)lgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_sketch_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_sketch_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.- Parameters:
e
- (undocumented)allowDifferentLgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.- Parameters:
e
- (undocumented)allowDifferentLgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.- Parameters:
columnName
- (undocumented)allowDifferentLgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union_agg
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
kurtosis
Aggregate function: returns the kurtosis of the values in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
kurtosis
Aggregate function: returns the kurtosis of the values in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
last
Aggregate function: returns the last value in a group. By default, the function returns the last value it sees. When ignoreNulls is set to true, it returns the last non-null value it sees. If all values are null, then null is returned.
- Parameters:
e
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
last
Aggregate function: returns the last value of the column in a group. By default, the function returns the last value it sees. When ignoreNulls is set to true, it returns the last non-null value it sees. If all values are null, then null is returned.
- Parameters:
columnName
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
last
Aggregate function: returns the last value in a group. By default, the function returns the last value it sees. When ignoreNulls is set to true, it returns the last non-null value it sees. If all values are null, then null is returned.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
last
Aggregate function: returns the last value of the column in a group. By default, the function returns the last value it sees. When ignoreNulls is set to true, it returns the last non-null value it sees. If all values are null, then null is returned.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
last_value
Aggregate function: returns the last value in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
last_value
Aggregate function: returns the last value in a group. By default, the function returns the last value it sees. When ignoreNulls is set to true, it returns the last non-null value it sees. If all values are null, then null is returned.
- Parameters:
e
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
- Note:
- The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
-
mode
Aggregate function: returns the most frequent value in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
mode
Aggregate function: returns the most frequent value in a group. When multiple values share the greatest frequency, any one of them is returned if deterministic is false or not specified, and the lowest value is returned if deterministic is true.
- Parameters:
e
- (undocumented)deterministic
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
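The tie-breaking rule controlled by deterministic can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Spark's implementation: values are non-null integers held in memory, and ModeSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ModeSketch {
    // Returns a most frequent value. With deterministic = true, ties are
    // broken by taking the lowest value; otherwise any tied value may win.
    static Integer mode(List<Integer> values, boolean deterministic) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        Integer best = null;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            boolean moreFrequent = c > bestCount;
            boolean tieBrokenLow = deterministic && c == bestCount && e.getKey() < best;
            if (moreFrequent || tieBrokenLow) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 3, 1, 2);
        System.out.println(mode(values, true));  // 1: lowest of the tied values 1 and 3
        System.out.println(mode(values, false)); // either 1 or 3
    }
}
```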
-
max
Aggregate function: returns the maximum value of the expression in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
max
Aggregate function: returns the maximum value of the column in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
max_by
Aggregate function: returns the value associated with the maximum value of ord.- Parameters:
e
- (undocumented)ord
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
- Note:
- The function is non-deterministic, so the output order can differ for those associated with the same values of e.
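As an illustration of "the value associated with the maximum value of ord", the following plain-Java sketch scans two parallel arrays. MaxBySketch is a hypothetical name; returning null for an empty input and keeping the first row seen on ties are choices of this sketch, not documented Spark behaviour.

```java
final class MaxBySketch {
    // Returns e[i] for the index i at which ord[i] is largest.
    static String maxBy(String[] e, int[] ord) {
        if (e.length == 0) {
            return null; // sketch's choice for an empty group
        }
        int best = 0;
        for (int i = 1; i < e.length; i++) {
            if (ord[i] > ord[best]) {
                best = i; // strictly greater: ties keep the earlier row
            }
        }
        return e[best];
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c"};
        int[] scores = {10, 30, 20};
        System.out.println(maxBy(names, scores)); // b
    }
}
```

Replacing the comparison with `<` gives the corresponding model for min_by.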
-
mean
Aggregate function: returns the average of the values in a group. Alias for avg.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
mean
Aggregate function: returns the average of the values in a group. Alias for avg.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
median
Aggregate function: returns the median of the values in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
min
Aggregate function: returns the minimum value of the expression in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
min
Aggregate function: returns the minimum value of the column in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
min_by
Aggregate function: returns the value associated with the minimum value of ord.- Parameters:
e
- (undocumented)ord
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
- Note:
- The function is non-deterministic, so the output order can differ for those associated with the same values of e.
-
percentile
Aggregate function: returns the exact percentile(s) of the numeric column expr at the given percentage(s), with values in the range [0.0, 1.0].
- Parameters:
e
- (undocumented)percentage
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
percentile
Aggregate function: returns the exact percentile(s) of the numeric column expr at the given percentage(s), with values in the range [0.0, 1.0].
- Parameters:
e
- (undocumented)percentage
- (undocumented)frequency
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
percentile_approx
Aggregate function: returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.
The accuracy parameter is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation.
- Parameters:
e
- (undocumented)percentage
- (undocumented)accuracy
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
-
approx_percentile
Aggregate function: returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.
The accuracy parameter is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation.
- Parameters:
e
- (undocumented)percentage
- (undocumented)accuracy
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
product
Aggregate function: returns the product of all numerical elements in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
skewness
Aggregate function: returns the skewness of the values in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
skewness
Aggregate function: returns the skewness of the values in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
std
Aggregate function: alias for stddev_samp.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
stddev
Aggregate function: alias for stddev_samp.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
stddev
Aggregate function: alias for stddev_samp.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
stddev_samp
Aggregate function: returns the sample standard deviation of the expression in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
stddev_samp
Aggregate function: returns the sample standard deviation of the expression in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
stddev_pop
Aggregate function: returns the population standard deviation of the expression in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
stddev_pop
Aggregate function: returns the population standard deviation of the expression in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
sum
Aggregate function: returns the sum of all values in the expression.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
sum
Aggregate function: returns the sum of all values in the given column.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
sumDistinct
Deprecated. Use sum_distinct. Since 3.2.0.
Aggregate function: returns the sum of distinct values in the expression.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
sumDistinct
Deprecated. Use sum_distinct. Since 3.2.0.
Aggregate function: returns the sum of distinct values in the expression.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
sum_distinct
Aggregate function: returns the sum of distinct values in the expression.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
variance
Aggregate function: alias for var_samp.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
variance
Aggregate function: alias for var_samp.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
var_samp
Aggregate function: returns the unbiased variance of the values in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
var_samp
Aggregate function: returns the unbiased variance of the values in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
var_pop
Aggregate function: returns the population variance of the values in a group.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
var_pop
Aggregate function: returns the population variance of the values in a group.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
regr_avgx
Aggregate function: returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_avgy
Aggregate function: returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_count
Aggregate function: returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_intercept
Aggregate function: returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_r2
Aggregate function: returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_slope
Aggregate function: returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_sxx
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
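The expression REGR_COUNT(y, x) * VAR_POP(x) is evaluated only over pairs in which both y and x are non-null. The plain-Java sketch below models that filtering and the arithmetic; RegrSxxSketch is a hypothetical name, and returning null when no pair qualifies is an assumption of this sketch.

```java
final class RegrSxxSketch {
    // Number of pairs where both y and x are non-null.
    static long regrCount(Double[] y, Double[] x) {
        long n = 0;
        for (int i = 0; i < x.length; i++) {
            if (y[i] != null && x[i] != null) n++;
        }
        return n;
    }

    // REGR_COUNT(y, x) * VAR_POP(x), with VAR_POP taken over the same pairs.
    static Double regrSxx(Double[] y, Double[] x) {
        long n = regrCount(y, x);
        if (n == 0) {
            return null; // sketch's choice when there are no non-null pairs
        }
        double mean = 0;
        for (int i = 0; i < x.length; i++) {
            if (y[i] != null && x[i] != null) mean += x[i];
        }
        mean /= n;
        double squares = 0;
        for (int i = 0; i < x.length; i++) {
            if (y[i] != null && x[i] != null) squares += (x[i] - mean) * (x[i] - mean);
        }
        double varPop = squares / n;
        return n * varPop;
    }

    public static void main(String[] args) {
        Double[] y = {1.0, 2.0, null, 4.0};
        Double[] x = {1.0, 3.0, 5.0, null};
        System.out.println(regrCount(y, x)); // 2: only the first two pairs qualify
        System.out.println(regrSxx(y, x));
    }
}
```

The product simplifies to the sum of squared deviations of x from its mean; regr_syy and regr_sxy follow the same pattern with VAR_POP(y) and COVAR_POP(y, x).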
-
regr_sxy
Aggregate function: returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regr_syy
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Parameters:
y
- (undocumented)x
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
any_value
Aggregate function: returns some value of e for a group of rows.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
any_value
Aggregate function: returns some value of e for a group of rows. If ignoreNulls is true, returns only non-null values.
- Parameters:
e
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
count_if
Aggregate function: returns the number of TRUE values for the expression.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
histogram_numeric
Aggregate function: computes a histogram on the numeric expression e using nBins bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of nBins is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function.
- Parameters:
e
- (undocumented)nBins
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
every
Aggregate function: returns true if all values of e are true.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bool_and
Aggregate function: returns true if all values of e are true.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
some
Aggregate function: returns true if at least one value of e is true.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
any
Aggregate function: returns true if at least one value of e is true.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bool_or
Aggregate function: returns true if at least one value of e is true.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bit_and
Aggregate function: returns the bitwise AND of all non-null input values, or null if none.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bit_or
Aggregate function: returns the bitwise OR of all non-null input values, or null if none.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bit_xor
Aggregate function: returns the bitwise XOR of all non-null input values, or null if none.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
cume_dist
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row.
N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N
- Returns:
- (undocumented)
- Since:
- 1.6.0
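The cumeDist formula can be checked by hand with a short plain-Java model. This is a sketch that assumes one partition whose ordering values are already sorted ascending; CumeDistSketch is a hypothetical name. Rows that tie with the current row count as "including x".

```java
import java.util.Arrays;

final class CumeDistSketch {
    // For each row: (number of values <= the row's value) / N.
    static double[] cumeDist(int[] sortedValues) {
        int n = sortedValues.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int atOrBefore = 0;
            for (int v : sortedValues) {
                if (v <= sortedValues[i]) atOrBefore++;
            }
            out[i] = (double) atOrBefore / n;
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [0.25, 0.75, 0.75, 1.0]: the two tied rows share one value.
        System.out.println(Arrays.toString(cumeDist(new int[]{10, 20, 20, 40})));
    }
}
```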
-
dense_rank
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, assigns sequential numbers, so the person who came in third place (after the ties) would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
- Returns:
- (undocumented)
- Since:
- 1.6.0
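The competition example (three people tied for second place) can be reproduced with a small plain-Java model. It is a sketch only: it assumes a single partition with scores already sorted in ranking order, and RankSketch is a hypothetical name.

```java
import java.util.Arrays;

final class RankSketch {
    // rank: tied rows share a rank, and the next rank skips ahead by the tie size.
    static int[] rank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            boolean tie = i > 0 && sorted[i] == sorted[i - 1];
            out[i] = tie ? out[i - 1] : i + 1;
        }
        return out;
    }

    // dense_rank: tied rows share a rank, and the next rank is always one higher.
    static int[] denseRank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                out[i] = sorted[i] == sorted[i - 1] ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] scores = {100, 90, 90, 90, 80}; // three-way tie for second place
        System.out.println(Arrays.toString(rank(scores)));      // [1, 2, 2, 2, 5]
        System.out.println(Arrays.toString(denseRank(scores))); // [1, 2, 2, 2, 3]
    }
}
```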
-
lag
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lag
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Parameters:
columnName
- (undocumented)offset
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lag
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Parameters:
columnName
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
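The offset and defaultValue semantics can be modelled on a plain array. The sketch below is plain Java rather than Spark code and assumes one partition whose rows are already in window order; LagLeadSketch is a hypothetical name, and passing null as defaultValue models the overloads that take no default.

```java
import java.util.Arrays;

final class LagLeadSketch {
    // For each row i, take the value at i - offset, or defaultValue if that
    // position falls outside the partition.
    static String[] lag(String[] rows, int offset, String defaultValue) {
        String[] out = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            int j = i - offset;
            out[i] = (j >= 0 && j < rows.length) ? rows[j] : defaultValue;
        }
        return out;
    }

    // lead looks forward instead of backward: the value at i + offset.
    static String[] lead(String[] rows, int offset, String defaultValue) {
        return lag(rows, -offset, defaultValue);
    }

    public static void main(String[] args) {
        String[] rows = {"a", "b", "c"};
        System.out.println(Arrays.toString(lag(rows, 1, "-")));  // [-, a, b]
        System.out.println(Arrays.toString(lead(rows, 1, "-"))); // [b, c, -]
    }
}
```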
-
lag
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lag
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. ignoreNulls determines whether null values of row are included in or eliminated from the calculation. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
lead
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Parameters:
columnName
- (undocumented)offset
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lead
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lead
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Parameters:
columnName
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lead
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
lead
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. ignoreNulls determines whether null values of row are included in or eliminated from the calculation. The default value of ignoreNulls is false. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
nth_value
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
It will return the offset-th non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
This is equivalent to the nth_value function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)ignoreNulls
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
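How ignoreNulls changes which row counts as the offset-th can be seen in a small plain-Java model. This is a sketch, not Spark's implementation: the list stands in for the rows of one window frame in order, offset is assumed to be at least 1, and NthValueSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

final class NthValueSketch {
    // Returns the offset-th value of the frame (counting from 1). When
    // ignoreNulls is true, null rows are skipped while counting.
    static <T> T nthValue(List<T> frame, int offset, boolean ignoreNulls) {
        int seen = 0;
        for (T value : frame) {
            if (ignoreNulls && value == null) {
                continue;
            }
            seen++;
            if (seen == offset) {
                return value;
            }
        }
        return null; // frame has fewer than offset qualifying rows
    }

    public static void main(String[] args) {
        List<String> frame = Arrays.asList("a", null, "c");
        System.out.println(nthValue(frame, 2, false)); // null: the second row itself is null
        System.out.println(nthValue(frame, 2, true));  // c: the second non-null value
    }
}
```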
-
nth_value
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
This is equivalent to the nth_value function in SQL.
- Parameters:
e
- (undocumented)offset
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
-
ntile
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
- Parameters:
n
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
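The quartile example can be modelled in plain Java as follows. NtileSketch is a hypothetical name. How leftover rows are spread when the row count is not a multiple of n is an assumption here (earlier groups receive one extra row, as is common for SQL NTILE); the text above only describes the evenly divisible case.

```java
import java.util.Arrays;

final class NtileSketch {
    // Assigns group ids 1..n to rowCount ordered rows, in order.
    static int[] ntile(int rowCount, int n) {
        int[] out = new int[rowCount];
        int base = rowCount / n;      // minimum rows per group
        int remainder = rowCount % n; // assumed: first `remainder` groups get one extra row
        int row = 0;
        for (int bucket = 1; bucket <= n && row < rowCount; bucket++) {
            int size = base + (bucket <= remainder ? 1 : 0);
            for (int k = 0; k < size; k++) {
                out[row++] = bucket;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ntile(8, 4))); // [1, 1, 2, 2, 3, 3, 4, 4]
    }
}
```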
-
percent_rank
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
- Returns:
- (undocumented)
- Since:
- 1.6.0
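The formula (rank - 1) / (number of rows - 1) can be worked through with a short plain-Java model. This is a sketch that assumes one partition with ordering values already sorted ascending; PercentRankSketch is a hypothetical name, and returning 0.0 for a single-row partition (where the formula would divide by zero) is an assumption of the sketch.

```java
import java.util.Arrays;

final class PercentRankSketch {
    // (rank of row in its partition - 1) / (number of rows in the partition - 1)
    static double[] percentRank(int[] sortedValues) {
        int n = sortedValues.length;
        double[] out = new double[n];
        int rank = 1;
        for (int i = 0; i < n; i++) {
            if (i > 0 && sortedValues[i] != sortedValues[i - 1]) {
                rank = i + 1; // a new value starts at its 1-based position; ties keep the old rank
            }
            out[i] = n > 1 ? (double) (rank - 1) / (n - 1) : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Ranks are 1, 2, 2, 4, 5, so this prints [0.0, 0.25, 0.25, 0.75, 1.0].
        System.out.println(Arrays.toString(percentRank(new int[]{10, 20, 20, 40, 50})));
    }
}
```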
-
rank
Window function: returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, assigns sequential numbers, so the person who came in third place (after the ties) would register as coming in fifth.
This is equivalent to the RANK function in SQL.
- Returns:
- (undocumented)
- Since:
- 1.4.0
-
row_number
Window function: returns a sequential number starting at 1 within a window partition.- Returns:
- (undocumented)
- Since:
- 1.6.0
-
array
Creates a new array column. The input columns must all have the same data type.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
array
Creates a new array column. The input columns must all have the same data type.- Parameters:
colName
- (undocumented)colNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
map
Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0
-
named_struct
Creates a struct with the given field names and values.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
map_from_arrays
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.- Parameters:
keys
- (undocumented)values
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4
-
str_to_map
Creates a map after splitting the text into key/value pairs using delimiters. Both pairDelim and keyValueDelim are treated as regular expressions.
- Parameters:
text
- (undocumented)pairDelim
- (undocumented)keyValueDelim
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
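Because both delimiters are regular expressions, the splitting can be pictured with `String.split`. This is a rough plain-Java sketch only; the class name is hypothetical, and mapping a pair without a key/value delimiter to a null value is an assumption of this sketch, not something the text above states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StrToMapSketch {
    static Map<String, String> strToMap(String text, String pairDelim, String keyValueDelim) {
        Map<String, String> out = new LinkedHashMap<>();
        // pairDelim is a regex, so "[;|]" splits on either ';' or '|'.
        for (String pair : text.split(pairDelim)) {
            // Limit 2: only the first key/value delimiter separates key from value.
            String[] kv = pair.split(keyValueDelim, 2);
            out.put(kv[0], kv.length > 1 ? kv[1] : null); // assumption: missing value -> null
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(strToMap("a:1,b:2", ",", ":"));       // {a=1, b=2}
        System.out.println(strToMap("a=1;b=2|c=3", "[;|]", "=")); // {a=1, b=2, c=3}
    }
}
```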
-
str_to_map
Creates a map after splitting the text into key/value pairs using delimiters. The pairDelim is treated as a regular expression.
- Parameters:
text
- (undocumented)pairDelim
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
str_to_map
Creates a map after splitting the text into key/value pairs using delimiters.- Parameters:
text
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
broadcast
Marks a DataFrame as small enough for use in broadcast joins.
The following example marks the right DataFrame for broadcast hash join using joinKey.
  // left and right are DataFrames
  left.join(broadcast(right), "joinKey")
- Parameters:
df
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
coalesce
Returns the first column that is not null, or null if all inputs are null.
For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
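The first-non-null rule can be written as a small plain-Java helper. This is only an illustration over ordinary values rather than Spark columns; the class name is hypothetical.

```java
final class CoalesceSketch {
    // Returns the first argument that is not null, or null if every argument is null.
    @SafeVarargs
    static <T> T coalesce(T... values) {
        for (T v : values) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(coalesce(null, "b", "c"));             // b
        System.out.println(coalesce((String) null, (String) null)); // null
    }
}
```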
-
input_file_name
Creates a string column for the file name of the current Spark task.- Returns:
- (undocumented)
- Since:
- 1.6.0
-
isnan
Returns true iff the column is NaN.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
isnull
Returns true iff the column is null.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
monotonicallyIncreasingId
Deprecated. Use monotonically_increasing_id(). Since 2.0.0.
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
- Returns:
- (undocumented)
- Since:
- 1.4.0
-
monotonically_increasing_id
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
- Returns:
- (undocumented)
- Since:
- 1.6.0
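The bit layout described above (partition ID in the upper bits, record number in the lower 33 bits) can be checked with plain arithmetic. The sketch below only reproduces the layout as documented; the class and method names are hypothetical and it does not call Spark.

```java
final class MonotonicIdSketch {
    // Partition ID goes above bit 33; the per-partition record number fills the lower 33 bits.
    static long id(int partitionId, long recordNumber) {
        return ((long) partitionId << 33) | recordNumber;
    }

    public static void main(String[] args) {
        // Two partitions with three records each, as in the example above.
        for (int p = 0; p < 2; p++) {
            for (long r = 0; r < 3; r++) {
                System.out.println(id(p, r));
            }
        }
    }
}
```

The IDs are unique and increasing but not consecutive, because the first record of partition 1 jumps to 1L << 33.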
-
nanvl
Returns col1 if it is not NaN, or col2 if col1 is NaN.
Both inputs should be floating point columns (DoubleType or FloatType).
- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
negate
Unary minus, i.e. negate the expression.
  // Select the amount column and negate all values.
  // Scala:
  df.select( -df("amount") )
  // Java:
  df.select( negate(df.col("amount")) );
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
not
Inversion of boolean expression, i.e. NOT.
  // Scala: select rows that are not active (isActive === false)
  df.filter( !df("isActive") )
  // Java:
  df.filter( not(df.col("isActive")) );
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
rand
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).- Parameters:
seed
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
- Note:
- The function is non-deterministic in the general case.
-
rand
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).- Returns:
- (undocumented)
- Since:
- 1.4.0
- Note:
- The function is non-deterministic in the general case.
-
randn
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.- Parameters:
seed
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
- Note:
- The function is non-deterministic in the general case.
-
randn
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.- Returns:
- (undocumented)
- Since:
- 1.4.0
- Note:
- The function is non-deterministic in the general case.
-
spark_partition_id
Partition ID.- Returns:
- (undocumented)
- Since:
- 1.6.0
- Note:
- This is non-deterministic because it depends on data partitioning and task scheduling.
-
sqrt
Computes the square root of the specified float value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
sqrt
Computes the square root of the specified float value.- Parameters:
colName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
try_add
Returns the sum of left and right; the result is null on overflow. The acceptable input types are the same as for the + operator.
- Parameters:
left
- (undocumented)right
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_avg
Returns the mean calculated from values of a group and the result is null on overflow.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_divide
Returns dividend / divisor. It always performs floating point division. Its result is always null if divisor is 0.
- Parameters:
left
- (undocumented)right
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
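The null-instead-of-error behaviour described for try_add and try_divide has a close plain-Java analogue. The sketch below assumes long inputs for addition and double inputs for division; the class name is hypothetical and it only mirrors the documented result, not Spark's type handling.

```java
final class TryArithmeticSketch {
    // Sum of two longs, or null when the result would overflow.
    static Long tryAdd(long left, long right) {
        try {
            return Math.addExact(left, right);
        } catch (ArithmeticException overflow) {
            return null;
        }
    }

    // Floating point quotient, or null when the divisor is 0.
    static Double tryDivide(double dividend, double divisor) {
        if (divisor == 0.0) {
            return null;
        }
        return dividend / divisor;
    }

    public static void main(String[] args) {
        System.out.println(tryAdd(1L, 2L));             // 3
        System.out.println(tryAdd(Long.MAX_VALUE, 1L)); // null
        System.out.println(tryDivide(7, 2));            // 3.5
        System.out.println(tryDivide(1, 0));            // null
    }
}
```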
-
try_mod
Returns the remainder of dividend / divisor. Its result is always null if divisor is 0.
- Parameters:
left
- (undocumented)right
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
try_multiply
Returns left * right; the result is null on overflow. The acceptable input types are the same as for the * operator.
- Parameters:
left
- (undocumented)right
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_subtract
Returns left - right; the result is null on overflow. The acceptable input types are the same as for the - operator.
- Parameters:
left
- (undocumented)right
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_sum
Returns the sum calculated from values of a group and the result is null on overflow.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
struct
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name; otherwise, the newly generated StructField's name is auto-generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
struct
Creates a new struct column that composes multiple input columns.- Parameters:
colName
- (undocumented)colNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
when
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
  // Example: encoding gender string column into integer.
  // Scala:
  people.select(when(people("gender") === "male", 0)
    .when(people("gender") === "female", 1)
    .otherwise(2))
  // Java:
  people.select(when(col("gender").equalTo("male"), 0)
    .when(col("gender").equalTo("female"), 1)
    .otherwise(2))
- Parameters:
condition
- (undocumented)value
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
bitwiseNOT
Deprecated. Use bitwise_not. Since 3.2.0.
Computes bitwise NOT (~) of a number.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
bitwise_not
Computes bitwise NOT (~) of a number.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
bit_count
Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bit_get
Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.- Parameters:
e
- (undocumented)pos
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
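The right-to-left, zero-based bit numbering used by bit_get (and the set-bit count returned by bit_count) can be illustrated on a plain long. This sketch assumes a 64-bit value and rejects positions above 63, which is an assumption of the sketch; the class name is hypothetical.

```java
final class BitGetSketch {
    // Value (0 or 1) of the bit at `pos`, counting from the rightmost bit at position 0.
    static int bitGet(long value, int pos) {
        if (pos < 0 || pos > 63) {
            throw new IllegalArgumentException("pos must be in [0, 63], got " + pos);
        }
        return (int) ((value >>> pos) & 1L);
    }

    // Number of bits set to 1 in the 64-bit value.
    static int bitCount(long value) {
        return Long.bitCount(value);
    }

    public static void main(String[] args) {
        long v = 11L; // binary 1011
        System.out.println(bitGet(v, 0)); // 1
        System.out.println(bitGet(v, 2)); // 0
        System.out.println(bitCount(v));  // 3
    }
}
```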
-
getbit
Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.- Parameters:
e
- (undocumented)pos
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
expr
Parses the expression string into the column that it represents, similar to Dataset.selectExpr(java.lang.String...).
  // get the number of words of each length
  df.groupBy(expr("length(word)")).count()
- Parameters:
expr
- (undocumented)- Returns:
- (undocumented)
-
abs
Computes the absolute value of a numeric value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
acos
- Parameters:
e
- (undocumented)- Returns:
- inverse cosine of e in radians, as if computed by java.lang.Math.acos
- Since:
- 1.4.0
-
acos
- Parameters:
columnName
- (undocumented)- Returns:
- inverse cosine of columnName, as if computed by java.lang.Math.acos
- Since:
- 1.4.0
-
acosh
- Parameters:
e
- (undocumented)- Returns:
- inverse hyperbolic cosine of
e
- Since:
- 3.1.0
-
acosh
- Parameters:
columnName
- (undocumented)- Returns:
- inverse hyperbolic cosine of
columnName
- Since:
- 3.1.0
-
asin
- Parameters:
e
- (undocumented)- Returns:
- inverse sine of e in radians, as if computed by java.lang.Math.asin
- Since:
- 1.4.0
-
asin
- Parameters:
columnName
- (undocumented)- Returns:
- inverse sine of columnName, as if computed by java.lang.Math.asin
- Since:
- 1.4.0
-
asinh
- Parameters:
e
- (undocumented)- Returns:
- inverse hyperbolic sine of
e
- Since:
- 3.1.0
-
asinh
- Parameters:
columnName
- (undocumented)- Returns:
- inverse hyperbolic sine of
columnName
- Since:
- 3.1.0
-
atan
- Parameters:
e
- (undocumented)- Returns:
- inverse tangent of e, as if computed by java.lang.Math.atan
- Since:
- 1.4.0
-
atan
- Parameters:
columnName
- (undocumented)- Returns:
- inverse tangent of columnName, as if computed by java.lang.Math.atan
- Since:
- 1.4.0
-
atan2
- Parameters:
y
- coordinate on y-axis
x
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
y
- coordinate on y-axis
xName
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
yName
- coordinate on y-axis
x
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
yName
- coordinate on y-axis
xName
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
y
- coordinate on y-axis
xValue
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
yName
- coordinate on y-axis
xValue
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
yValue
- coordinate on y-axis
x
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atan2
- Parameters:
yValue
- coordinate on y-axis
xName
- coordinate on x-axis
- Returns:
- the theta component of the point (r, theta) in polar coordinates that
corresponds to the point (x, y) in Cartesian coordinates, as if computed by
java.lang.Math.atan2
- Since:
- 1.4.0
-
atanh
- Parameters:
e
- (undocumented)- Returns:
- inverse hyperbolic tangent of
e
- Since:
- 3.1.0
-
atanh
- Parameters:
columnName
- (undocumented)- Returns:
- inverse hyperbolic tangent of
columnName
- Since:
- 3.1.0
-
bin
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
bin
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
cbrt
Computes the cube-root of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
cbrt
Computes the cube-root of the given column.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
ceil
Computes the ceiling of the given value of e to scale decimal places.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
ceil
Computes the ceiling of the given value of e to 0 decimal places.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
ceil
Computes the ceiling of the given column value to 0 decimal places.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
ceiling
Computes the ceiling of the given value of e to scale decimal places.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
ceiling
Computes the ceiling of the given value of e to 0 decimal places.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
conv
Convert a number in a string column from one base to another.- Parameters:
num
- (undocumented)fromBase
- (undocumented)toBase
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
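Base conversion of a number held in a string, as conv describes, can be sketched with the JDK's radix-aware parsing. This is a simplified illustration for values that fit in a signed long; the class name is hypothetical, and returning upper-case digits is a choice of this sketch. The bin helper shows the base-2 special case from the bin entry.

```java
final class ConvSketch {
    // Parse `num` as a base-`fromBase` integer and render it in base `toBase`.
    static String conv(String num, int fromBase, int toBase) {
        long value = Long.parseLong(num, fromBase);
        return Long.toString(value, toBase).toUpperCase();
    }

    // Binary string representation of a long, e.g. 12 -> "1100".
    static String bin(long value) {
        return Long.toBinaryString(value);
    }

    public static void main(String[] args) {
        System.out.println(conv("100", 2, 10)); // 4
        System.out.println(conv("255", 10, 16)); // FF
        System.out.println(bin(12));             // 1100
    }
}
```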
-
cos
- Parameters:
e
- angle in radians- Returns:
- cosine of the angle, as if computed by
java.lang.Math.cos
- Since:
- 1.4.0
-
cos
- Parameters:
columnName
- angle in radians- Returns:
- cosine of the angle, as if computed by
java.lang.Math.cos
- Since:
- 1.4.0
-
cosh
- Parameters:
e
- hyperbolic angle- Returns:
- hyperbolic cosine of the angle, as if computed by
java.lang.Math.cosh
- Since:
- 1.4.0
-
cosh
- Parameters:
columnName
- hyperbolic angle- Returns:
- hyperbolic cosine of the angle, as if computed by
java.lang.Math.cosh
- Since:
- 1.4.0
-
cot
- Parameters:
e
- angle in radians- Returns:
- cotangent of the angle
- Since:
- 3.3.0
-
csc
- Parameters:
e
- angle in radians- Returns:
- cosecant of the angle
- Since:
- 3.3.0
-
e
Returns Euler's number.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
exp
Computes the exponential of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
exp
Computes the exponential of the given column.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
expm1
Computes the exponential of the given value minus one.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
expm1
Computes the exponential of the given column minus one.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
factorial
Computes the factorial of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
floor
Computes the floor of the given value of e to scale decimal places.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
floor
Computes the floor of the given value of e to 0 decimal places.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
floor
Computes the floor of the given column value to 0 decimal places.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
greatest
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
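The null-skipping rule for greatest can be shown on boxed integers. This is a plain-Java sketch with a hypothetical class name; least would follow the same pattern with the comparison reversed.

```java
final class GreatestSketch {
    // Largest non-null argument, or null only when every argument is null.
    static Integer greatest(Integer... values) {
        Integer best = null;
        for (Integer v : values) {
            if (v != null && (best == null || v > best)) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(greatest(1, null, 3));                   // 3
        System.out.println(greatest((Integer) null, (Integer) null)); // null
    }
}
```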
-
greatest
public static Column greatest(String columnName, scala.collection.immutable.Seq<String> columnNames)
Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
- Parameters:
columnName
- (undocumented)columnNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
hex
Computes hex value of the given column.- Parameters:
column
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
unhex
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the byte representation of the number.
- Parameters:
column
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
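Why "without intermediate overflow" matters can be seen with the JDK's own Math.hypot, used here as a stand-in for the column function. The sketch compares it with the naive formula; the class and method names are hypothetical.

```java
final class HypotSketch {
    // Naive formula: squaring a large input can overflow to Infinity before the square root.
    static double naive(double a, double b) {
        return Math.sqrt(a * a + b * b);
    }

    // JDK implementation, which avoids the intermediate overflow.
    static double hypot(double a, double b) {
        return Math.hypot(a, b);
    }

    public static void main(String[] args) {
        System.out.println(naive(3e200, 4e200)); // Infinity
        System.out.println(hypot(3e200, 4e200)); // approximately 5e200
    }
}
```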
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
l
- (undocumented)rightName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
leftName
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
leftName
- (undocumented)rightName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
leftName
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
hypot
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Parameters:
l
- (undocumented)rightName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
least
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
least
Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.- Parameters:
columnName
- (undocumented)columnNames
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
ln
Computes the natural logarithm of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
log
Computes the natural logarithm of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log
Computes the natural logarithm of the given column.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log
Returns the first argument-base logarithm of the second argument.- Parameters:
base
- (undocumented)a
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log
Returns the first argument-base logarithm of the second argument.- Parameters:
base
- (undocumented)columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log10
Computes the logarithm of the given value in base 10.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log10
Computes the logarithm of the given value in base 10.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log1p
Computes the natural logarithm of the given value plus one.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log1p
Computes the natural logarithm of the given column plus one.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
log2
Computes the logarithm of the given column in base 2.- Parameters:
expr
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
log2
Computes the logarithm of the given value in base 2.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
negative
Returns the negated value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
pi
Returns Pi.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
positive
Returns the value unchanged (unary plus).
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
l
- (undocumented)rightName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
leftName
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
leftName
- (undocumented)rightName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
leftName
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
pow
Returns the value of the first argument raised to the power of the second argument.- Parameters:
l
- (undocumented)rightName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
power
Returns the value of the first argument raised to the power of the second argument.- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
pmod
Returns the positive value of dividend mod divisor.- Parameters:
dividend
- (undocumented)divisor
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
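The difference between a positive modulus and the ordinary remainder only shows up for negative dividends. The sketch below is one common way to compute it in plain Java, assuming a positive divisor; the class name is hypothetical, and a zero divisor would throw ArithmeticException here rather than follow Spark's behaviour.

```java
final class PmodSketch {
    // Remainder shifted into [0, divisor) when the plain remainder is negative.
    // Assumes divisor > 0.
    static int pmod(int dividend, int divisor) {
        int r = dividend % divisor;
        return r < 0 ? (r + divisor) % divisor : r;
    }

    public static void main(String[] args) {
        System.out.println(-7 % 3);      // -1 (Java remainder keeps the dividend's sign)
        System.out.println(pmod(-7, 3)); // 2
        System.out.println(pmod(7, 3));  // 1
    }
}
```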
-
rint
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
rint
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
round
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
round
Rounds the value of e to scale decimal places with HALF_UP round mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
round
Rounds the value of e to scale decimal places with HALF_UP round mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
bround
Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
bround
Rounds the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
bround
Rounds the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
- Parameters:
e
- (undocumented)scale
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
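The HALF_UP mode named by round and the HALF_EVEN mode named by bround differ only on exact ties, and a negative scale rounds within the integral part. The sketch below shows both with java.math.BigDecimal; it takes decimal strings to keep ties exact, and the class and method names are hypothetical rather than Spark's API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundingSketch {
    // HALF_UP: ties move away from zero.
    static BigDecimal round(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP);
    }

    // HALF_EVEN: ties move to the neighbour whose last kept digit is even.
    static BigDecimal bround(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(round("2.5", 0));                    // 3
        System.out.println(bround("2.5", 0));                   // 2
        System.out.println(round("1250", -2).toPlainString());  // 1300
        System.out.println(bround("1250", -2).toPlainString()); // 1200
    }
}
```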
-
sec
- Parameters:
e
- angle in radians- Returns:
- secant of the angle
- Since:
- 3.3.0
-
shiftLeft
Deprecated. Use shiftleft. Since 3.2.0.
Shift the given value numBits left. If the given value is a long value, this function will return a long value; otherwise it will return an integer value.
- Parameters:
e
- (undocumented)numBits
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
shiftleft
Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.- Parameters:
e
- (undocumented)numBits
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
shiftRight
Deprecated. Use shiftright. Since 3.2.0.
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
- Parameters:
e
- (undocumented)numBits
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
shiftright
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.- Parameters:
e
- (undocumented)numBits
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
shiftRightUnsigned
Deprecated. Use shiftrightunsigned. Since 3.2.0.
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
- Parameters:
e
- (undocumented)numBits
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
shiftrightunsigned
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.- Parameters:
e
- (undocumented)numBits
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
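The signed and unsigned right shifts differ only for negative inputs, and the result width follows the input width. The plain-Java operators below are used as an analogy for the three shift functions; the class name is hypothetical.

```java
final class ShiftSketch {
    static int shiftLeft(int value, int numBits) {
        return value << numBits;
    }

    // Signed shift: the sign bit is copied into the vacated high bits.
    static int shiftRight(int value, int numBits) {
        return value >> numBits;
    }

    // Unsigned shift: zeros are shifted into the high bits.
    static int shiftRightUnsigned(int value, int numBits) {
        return value >>> numBits;
    }

    // Same operation on a long input yields a long result.
    static long shiftRightUnsigned(long value, int numBits) {
        return value >>> numBits;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 3));            // 8
        System.out.println(shiftRight(-8, 1));          // -4
        System.out.println(shiftRightUnsigned(-8, 1));  // 2147483644
        System.out.println(shiftRightUnsigned(-8L, 1)); // 9223372036854775804
    }
}
```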
-
sign
Computes the signum of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
signum
Computes the signum of the given value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
signum
Computes the signum of the given column.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
sin
- Parameters:
e
- angle in radians- Returns:
- sine of the angle, as if computed by
java.lang.Math.sin
- Since:
- 1.4.0
-
sin
- Parameters:
columnName
- angle in radians- Returns:
- sine of the angle, as if computed by
java.lang.Math.sin
- Since:
- 1.4.0
-
sinh
- Parameters:
e
- hyperbolic angle- Returns:
- hyperbolic sine of the given value, as if computed by
java.lang.Math.sinh
- Since:
- 1.4.0
-
sinh
- Parameters:
columnName
- hyperbolic angle- Returns:
- hyperbolic sine of the given value, as if computed by
java.lang.Math.sinh
- Since:
- 1.4.0
-
tan
- Parameters:
e
- angle in radians- Returns:
- tangent of the given value, as if computed by
java.lang.Math.tan
- Since:
- 1.4.0
-
tan
- Parameters:
columnName
- angle in radians- Returns:
- tangent of the given value, as if computed by
java.lang.Math.tan
- Since:
- 1.4.0
-
tanh
- Parameters:
e
- hyperbolic angle- Returns:
- hyperbolic tangent of the given value, as if computed by
java.lang.Math.tanh
- Since:
- 1.4.0
-
tanh
- Parameters:
columnName
- hyperbolic angle- Returns:
- hyperbolic tangent of the given value, as if computed by
java.lang.Math.tanh
- Since:
- 1.4.0
-
toDegrees
Deprecated. Use degrees. Since 2.1.0.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
toDegrees
Deprecated. Use degrees. Since 2.1.0.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
degrees
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.- Parameters:
e
- angle in radians- Returns:
- angle in degrees, as if computed by
java.lang.Math.toDegrees
- Since:
- 2.1.0
-
degrees
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.- Parameters:
columnName
- angle in radians- Returns:
- angle in degrees, as if computed by
java.lang.Math.toDegrees
- Since:
- 2.1.0
-
toRadians
Deprecated. Use radians. Since 2.1.0.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
toRadians
Deprecated. Use radians. Since 2.1.0.
- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.4.0
-
radians
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.- Parameters:
e
- angle in degrees- Returns:
- angle in radians, as if computed by
java.lang.Math.toRadians
- Since:
- 2.1.0
-
radians
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.- Parameters:
columnName
- angle in degrees- Returns:
- angle in radians, as if computed by
java.lang.Math.toRadians
- Since:
- 2.1.0
-
width_bucket
Returns the bucket number into which the value of this expression would fall after being evaluated. Note that input arguments must follow conditions listed below; otherwise, the method will return null.- Parameters:
v
- value to compute a bucket number in the histogram
min
- minimum value of the histogram
max
- maximum value of the histogram
numBucket
- the number of buckets
- Returns:
- the bucket number into which the value would fall after being evaluated
- Since:
- 3.5.0
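An equi-width histogram bucket can be computed with a few lines of arithmetic. The sketch below follows the common SQL definition of width_bucket and handles only min < max; treating any other range or a non-positive numBucket as invalid (null) is an assumption of this sketch, since the exact conditions are not reproduced in this text. The class name is hypothetical.

```java
final class WidthBucketSketch {
    // Bucket 0 is "below min", buckets 1..numBucket cover [min, max), numBucket + 1 is "at or above max".
    static Long widthBucket(double v, double min, double max, long numBucket) {
        if (numBucket <= 0 || !(min < max)) {
            return null; // assumption: inputs outside this sketch's scope yield null
        }
        if (v < min) {
            return 0L;
        }
        if (v >= max) {
            return numBucket + 1;
        }
        return (long) Math.floor(numBucket * (v - min) / (max - min)) + 1;
    }

    public static void main(String[] args) {
        System.out.println(widthBucket(5.3, 0.2, 10.6, 5));  // 3
        System.out.println(widthBucket(0.1, 0.2, 10.6, 5));  // 0
        System.out.println(widthBucket(10.6, 0.2, 10.6, 5)); // 6
    }
}
```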
-
current_catalog
Returns the current catalog.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
current_database
Returns the current database.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
current_schema
Returns the current schema.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
current_user
Returns the user name of current execution context.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
md5
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
sha1
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
sha2
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.- Parameters:
e
- column to compute SHA-2 on.
numBits
- one of 224, 256, 384, or 512.
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
crc32
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
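The output shapes described for md5, sha1, sha2 and crc32 (32, 40 and 64 character hex strings, and a non-negative integer checksum) match what the JDK's own digest classes produce. The sketch below uses java.security.MessageDigest and java.util.zip.CRC32 on raw bytes; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

final class DigestSketch {
    // Lower-case hex digest of `data` for a JDK algorithm name such as "MD5", "SHA-1" or "SHA-256".
    static String hexDigest(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // CRC32 checksum as a non-negative long.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] abc = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("MD5", abc));              // 32 hex characters
        System.out.println(hexDigest("SHA-1", abc).length());   // 40
        System.out.println(hexDigest("SHA-256", abc).length()); // 64
        System.out.println(crc32("123456789".getBytes(StandardCharsets.UTF_8)));
    }
}
```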
-
hash
Calculates the hash code of given columns, and returns the result as an int column.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.0.0
-
xxhash64
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column. The hash computation uses an initial seed of 42.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
assert_true
Returns null if the condition is true, and throws an exception otherwise.- Parameters:
c
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
-
assert_true
Returns null if the condition is true; throws an exception with the error message otherwise.- Parameters:
c
- (undocumented)e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
-
raise_error
Throws an exception with the provided error message.- Parameters:
c
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
-
hll_sketch_estimate
Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.- Parameters:
c
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_sketch_estimate
Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.- Parameters:
columnName
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values.- Parameters:
c1
- (undocumented)c2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values.- Parameters:
columnName1
- (undocumented)columnName2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.- Parameters:
c1
- (undocumented)c2
- (undocumented)allowDifferentLgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
hll_union
public static Column hll_union(String columnName1, String columnName2, boolean allowDifferentLgConfigK) Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.- Parameters:
columnName1
- (undocumented)columnName2
- (undocumented)allowDifferentLgConfigK
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
user
Returns the user name of current execution context.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
session_user
Returns the user name of current execution context.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
uuid
Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
aes_encrypt
public static Column aes_encrypt(Column input, Column key, Column mode, Column padding, Column iv, Column aad)
Returns an encrypted value of input using AES in the given mode with the specified padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.
- Parameters:
input - The binary value to encrypt.
key - The passphrase to use to encrypt the data.
mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or "", a 16-byte array for CBC mode, or a 12-byte array for GCM mode.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
- Returns:
- (undocumented)
- Since:
- 3.5.0
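The GCM behaviour described above (a random 12-byte IV prepended to the output, and AAD that must match on decryption) can be sketched with the JDK's `javax.crypto` API. This is an illustration under stated assumptions, not Spark's implementation: the class `AesGcmSketch`, its method names, and the 128-bit authentication tag length are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: AES/GCM with a random IV prepended to the ciphertext.
final class AesGcmSketch {
    private static final int IV_LEN = 12;    // bytes, the GCM IV size given in the text
    private static final int TAG_BITS = 128; // assumption: full-length GCM tag

    static byte[] encrypt(byte[] input, byte[] key, byte[] aad) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            cipher.updateAAD(aad);
            byte[] body = cipher.doFinal(input);
            // Output layout: IV followed by ciphertext and tag.
            byte[] out = new byte[IV_LEN + body.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(body, 0, out, IV_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(byte[] data, byte[] key, byte[] aad) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            // The IV is read back from the first 12 bytes of the input.
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, data, 0, IV_LEN));
            cipher.updateAAD(aad);
            return cipher.doFinal(data, IV_LEN, data.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            // A mismatched key or AAD fails tag verification and lands here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII); // 16-byte key
        byte[] aad = "header".getBytes(StandardCharsets.US_ASCII);
        byte[] plain = "Spark".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(plain, key, aad);
        System.out.println(Arrays.equals(plain, decrypt(encrypted, key, aad)));
    }
}
```

Because the IV is random, encrypting the same input twice yields different outputs; only the round trip is deterministic.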
-
aes_encrypt
Returns an encrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
iv - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
-
aes_encrypt
Returns an encrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
-
aes_encrypt
Returns an encrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
-
aes_encrypt
Returns an encrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
-
aes_decrypt
Returns a decrypted value of input using AES in mode with padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.
- Parameters:
input - The binary value to decrypt.
key - The passphrase to use to decrypt the data.
mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
aes_decrypt
Returns a decrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
-
aes_decrypt
Returns a decrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
-
aes_decrypt
Returns a decrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
-
try_aes_decrypt
public static Column try_aes_decrypt(Column input, Column key, Column mode, Column padding, Column aad)
This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.
- Parameters:
input - The binary value to decrypt.
key - The passphrase to use to decrypt the data.
mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_aes_decrypt
Returns a decrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
padding - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
-
try_aes_decrypt
Returns a decrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
mode - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
-
try_aes_decrypt
Returns a decrypted value of input.
- Parameters:
input - (undocumented)
key - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
- See Also:
-
org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
-
sha
Returns the SHA-1 hash value of col as a hex string.
- Parameters:
col - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
input_file_block_length
Returns the length of the block being read, or -1 if not available.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
input_file_block_start
Returns the start offset of the block being read, or -1 if not available.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
reflect
Calls a method with reflection.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
java_method
Calls a method with reflection.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_reflect
This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception.
- Parameters:
cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
version
Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
typeof
Returns the DDL-formatted type string for the data type of the input.
- Parameters:
col - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
stack
Separates col1, ..., colk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.
- Parameters:
cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
random
Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).- Parameters:
seed
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
random
Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bitmap_bit_position
Returns the bit position for the given input column.
- Parameters:
col - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bitmap_bucket_number
Returns the bucket number for the given input column.
- Parameters:
col - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bitmap_construct_agg
Returns a bitmap with the positions of the bits set from all the values from the input column. The input column will most likely be bitmap_bit_position().- Parameters:
col
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bitmap_count
Returns the number of set bits in the input bitmap.- Parameters:
col
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bitmap_or_agg
Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column. The input column should be bitmaps created from bitmap_construct_agg().- Parameters:
col
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
ascii
Computes the numeric value of the first character of the string column, and returns the result as an int column.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
base64
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
bit_length
Calculates the bit length for the specified string column.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
concat_ws
Concatenates multiple input string columns together into a single string column, using the given separator.- Parameters:
sep
- (undocumented)exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- Input strings which are null are skipped.
-
decode
Decodes the first argument from a binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either argument is null, the result will also be null.
- Parameters:
value - (undocumented)
charset - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
encode
Encodes the first argument from a string into a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either argument is null, the result will also be null.
- Parameters:
value - (undocumented)
charset - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
format_number
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column. If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
- Parameters:
x - (undocumented)
d - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
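The formatting rules for format_number (grouping separators, d decimal places, HALF_EVEN rounding, null for negative d) can be approximated with the JDK's `DecimalFormat`. This is a sketch of the described behaviour, not Spark's implementation; the class `FormatNumberSketch` is hypothetical and a US locale is assumed for the separators.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Illustrative only: '#,###' style grouping with d fixed decimal places.
final class FormatNumberSketch {
    static String formatNumber(double x, int d) {
        if (d < 0) {
            return null; // the text states the result is null when d is negative
        }
        StringBuilder pattern = new StringBuilder("#,##0");
        if (d > 0) {
            pattern.append('.');
            for (int i = 0; i < d; i++) {
                pattern.append('0'); // always print exactly d fractional digits
            }
        }
        DecimalFormat df = new DecimalFormat(pattern.toString(),
                DecimalFormatSymbols.getInstance(Locale.US));
        df.setRoundingMode(RoundingMode.HALF_EVEN);
        return df.format(x);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(12332.123456, 4)); // 12,332.1235
        System.out.println(formatNumber(2.5, 0));          // 2 (ties round to the even neighbour)
        System.out.println(formatNumber(1.0, -1));         // null
    }
}
```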
-
format_string
Formats the arguments in printf-style and returns the result as a string column.- Parameters:
format
- (undocumented)arguments
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
initcap
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace. For example, "hello world" will become "Hello World".
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
instr
Locates the position of the first occurrence of the substring column in the given string. Returns null if either of the arguments is null.
- Parameters:
str - (undocumented)
substring - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
-
length
Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
len
Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
lower
Converts a string column to lower case.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
levenshtein
Computes the Levenshtein distance of the two given string columns if it's less than or equal to a given threshold.- Parameters:
l
- (undocumented)r
- (undocumented)threshold
- (undocumented)- Returns:
- result distance, or -1
- Since:
- 3.5.0
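The threshold variant can be read as: compute the edit distance, then return it only if it does not exceed the threshold, otherwise -1. The sketch below uses the classic dynamic-programming algorithm; it is not Spark's implementation, and the class `LevenshteinSketch` is hypothetical.

```java
// Illustrative only: two-row dynamic-programming Levenshtein distance.
final class LevenshteinSketch {
    static int distance(String l, String r) {
        int[] prev = new int[r.length() + 1];
        int[] curr = new int[r.length() + 1];
        for (int j = 0; j <= r.length(); j++) {
            prev[j] = j; // distance from the empty prefix of l
        }
        for (int i = 1; i <= l.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= r.length(); j++) {
                int cost = l.charAt(i - 1) == r.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[r.length()];
    }

    // Returns the distance if it is at most threshold, otherwise -1.
    static int distance(String l, String r, int threshold) {
        int d = distance(l, r);
        return d <= threshold ? d : -1;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));    // 3
        System.out.println(distance("kitten", "sitting", 2)); // -1
    }
}
```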
-
levenshtein
Computes the Levenshtein distance of the two given string columns.- Parameters:
l
- (undocumented)r
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
locate
Locate the position of the first occurrence of substr.- Parameters:
substr
- (undocumented)str
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
-
locate
Locate the position of the first occurrence of substr in a string column, after position pos.- Parameters:
substr
- (undocumented)str
- (undocumented)pos
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
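The 1-based convention shared by instr and locate can be sketched in terms of Java's 0-based `String.indexOf`. This is an illustration, not Spark's implementation; the class `LocateSketch` is hypothetical, null handling is omitted, and returning 0 for a pos below 1 is an assumption.

```java
// Illustrative only: 1-based positions, 0 when the substring is absent.
final class LocateSketch {
    static int instr(String str, String substr) {
        return str.indexOf(substr) + 1; // indexOf returns -1 when absent, giving 0
    }

    static int locate(String substr, String str, int pos) {
        if (pos < 1) {
            return 0; // assumption: positions below 1 never match
        }
        // Convert the 1-based start position to Java's 0-based index.
        return str.indexOf(substr, pos - 1) + 1;
    }

    public static void main(String[] args) {
        System.out.println(instr("SparkSQL", "SQL"));        // 6
        System.out.println(locate("bar", "foobarbar", 5));   // 7
        System.out.println(instr("SparkSQL", "xyz"));        // 0
    }
}
```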
-
lpad
Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.- Parameters:
str
- (undocumented)len
- (undocumented)pad
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
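The padding and truncation rule for lpad can be sketched as follows. This is not Spark's implementation: the class `PadSketch` is hypothetical, lengths are counted in Java `char` units, and the handling of an empty pad or a non-positive len is an assumption.

```java
// Illustrative only: left-pad to len characters, or truncate when the input is longer.
final class PadSketch {
    static String lpad(String str, int len, String pad) {
        if (len <= 0) {
            return ""; // assumption: non-positive target length yields an empty string
        }
        if (str.length() >= len) {
            return str.substring(0, len); // longer input is shortened to len characters
        }
        if (pad.isEmpty()) {
            return str; // assumption: nothing to pad with
        }
        int missing = len - str.length();
        StringBuilder sb = new StringBuilder(len);
        while (sb.length() < missing) {
            sb.append(pad); // repeat the pad until it covers the gap
        }
        sb.setLength(missing); // drop any overshoot from the last repetition
        return sb.append(str).toString();
    }

    public static void main(String[] args) {
        System.out.println(lpad("hi", 5, "??")); // ???hi
        System.out.println(lpad("hi", 1, "??")); // h
    }
}
```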
-
lpad
Left-pad the binary column with pad to a byte length of len. If the binary column is longer than len, the return value is shortened to len bytes.- Parameters:
str
- (undocumented)len
- (undocumented)pad
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
ltrim
Trim the spaces from left end for the specified string value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
ltrim
Trim the specified character string from left end for the specified string column.- Parameters:
e
- (undocumented)trimString
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.3.0
-
octet_length
Calculates the byte length for the specified string column.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
collate
Marks a given column with specified collation.- Parameters:
e
- (undocumented)collation
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
collation
Returns the collation name of a given column.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
rlike
Returns true if str matches regexp, or false otherwise.
- Parameters:
str - (undocumented)
regexp - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp
Returns true if str matches regexp, or false otherwise.
- Parameters:
str - (undocumented)
regexp - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_like
Returns true if str matches regexp, or false otherwise.
- Parameters:
str - (undocumented)
regexp - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_count
Returns a count of the number of times that the regular expression pattern regexp is matched in the string str.
- Parameters:
str - (undocumented)
regexp - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_extract
Extracts a specific group matched by a Java regex from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned. If the specified group index exceeds the group count of the regex, an IllegalArgumentException will be thrown.
- Parameters:
e - (undocumented)
exp - (undocumented)
groupIdx - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
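The three outcomes described for regexp_extract (matched group, empty string, IllegalArgumentException) map directly onto `java.util.regex`. The following is a sketch of that mapping rather than Spark's implementation; the class `RegexpExtractSketch` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: group extraction with the documented fallbacks.
final class RegexpExtractSketch {
    static String regexpExtract(String input, String regex, int groupIdx) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (groupIdx < 0 || groupIdx > m.groupCount()) {
            throw new IllegalArgumentException(
                    "group index " + groupIdx + " exceeds group count " + m.groupCount());
        }
        if (!m.find()) {
            return ""; // the regex did not match at all
        }
        String group = m.group(groupIdx);
        return group == null ? "" : group; // the group took no part in the match
    }

    public static void main(String[] args) {
        System.out.println(regexpExtract("100-200", "(\\d+)-(\\d+)", 1)); // 100
        System.out.println(regexpExtract("foo", "(\\d+)", 1).isEmpty());  // true
    }
}
```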
-
regexp_extract_all
Extracts all strings in str that match the regexp expression and correspond to the first regex group index.
- Parameters:
str - (undocumented)
regexp - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_extract_all
Extracts all strings in str that match the regexp expression and correspond to the regex group index idx.
- Parameters:
str - (undocumented)
regexp - (undocumented)
idx - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_replace
Replaces all substrings of the specified string value that match pattern with replacement.
- Parameters:
e - (undocumented)
pattern - (undocumented)
replacement - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
regexp_replace
Replaces all substrings of the specified string value that match pattern with replacement.
- Parameters:
e - (undocumented)
pattern - (undocumented)
replacement - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.1.0
-
regexp_substr
Returns the substring that matches the regular expression regexp within the string str. If the regular expression is not found, the result is null.
- Parameters:
str - (undocumented)
regexp - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_instr
Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.- Parameters:
str
- (undocumented)regexp
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
regexp_instr
Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.- Parameters:
str
- (undocumented)regexp
- (undocumented)idx
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
unbase64
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
rpad
Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.- Parameters:
str
- (undocumented)len
- (undocumented)pad
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
rpad
Right-pad the binary column with pad to a byte length of len. If the binary column is longer than len, the return value is shortened to len bytes.- Parameters:
str
- (undocumented)len
- (undocumented)pad
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
repeat
Repeats a string column n times, and returns it as a new string column.- Parameters:
str
- (undocumented)n
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
repeat
Repeats a string column n times, and returns it as a new string column.- Parameters:
str
- (undocumented)n
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
rtrim
Trim the spaces from right end for the specified string value.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
rtrim
Trim the specified character string from right end for the specified string column.- Parameters:
e
- (undocumented)trimString
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.3.0
-
soundex
Returns the soundex code for the specified expression.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
split
Splits str around matches of the given pattern.
- Parameters:
str - a string expression to split
pattern - a string representing a regular expression. The regex string should be a Java regular expression.
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
split
Splits str around matches of the given pattern.
- Parameters:
str - a string expression to split
pattern - a column of string representing a regular expression. The regex string should be a Java regular expression.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
split
Splits str around matches of the given pattern.
- Parameters:
str - a string expression to split
pattern - a string representing a regular expression. The regex string should be a Java regular expression.
limit - an integer expression which controls the number of times the regex is applied.
  - limit greater than 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
  - limit less than or equal to 0: regex will be applied as many times as possible, and the resulting array can be of any size.
- Returns:
- (undocumented)
- Since:
- 3.0.0
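The effect of limit can be tried with Java's own `String.split`, which the pattern syntax here follows. In this sketch any non-positive limit is mapped to -1 so that the regex is applied as many times as possible and trailing empty strings are kept; that mapping is an assumption made for the example, and the class `SplitSketch` is hypothetical.

```java
import java.util.Arrays;

// Illustrative only: split with a limit, treating limit <= 0 as "no limit".
final class SplitSketch {
    static String[] split(String str, String pattern, int limit) {
        // Java's limit 0 would drop trailing empty strings, so use -1 instead.
        return str.split(pattern, limit > 0 ? limit : -1);
    }

    public static void main(String[] args) {
        // limit 2: at most two entries, the last one holds the unsplit remainder
        System.out.println(Arrays.toString(split("oneAtwoBthreeC", "[ABC]", 2)));
        // limit -1: split at every match
        System.out.println(Arrays.toString(split("oneAtwoBthreeC", "[ABC]", -1)));
    }
}
```

With limit 2 the result is `[one, twoBthreeC]`; with limit -1 it has four entries, the last of which is an empty string.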
-
split
Splits str around matches of the given pattern.
- Parameters:
str - a string expression to split
pattern - a column of string representing a regular expression. The regex string should be a Java regular expression.
limit - a column of integer expression which controls the number of times the regex is applied.
  - limit greater than 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
  - limit less than or equal to 0: regex will be applied as many times as possible, and the resulting array can be of any size.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
substring
Returns the substring that starts at pos and is of length len when str is String type, or the slice of the byte array that starts at pos (in bytes) and is of length len when str is Binary type.
- Parameters:
str - (undocumented)
pos - (undocumented)
len - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- The position is not zero based, but 1 based index.
-
substring
Returns the substring that starts at pos and is of length len when str is String type, or the slice of the byte array that starts at pos (in bytes) and is of length len when str is Binary type.
- Parameters:
str - (undocumented)
pos - (undocumented)
len - (undocumented)
- Returns:
- (undocumented)
- Since:
- 4.0.0
- Note:
- The position is not zero based, but 1 based index.
-
substring_index
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
- Parameters:
str - (undocumented)
delim - (undocumented)
count - (undocumented)
- Returns:
- (undocumented)
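The counting rule for substring_index can be sketched with repeated `indexOf` and `lastIndexOf` calls. This is an illustration, not Spark's implementation: the class `SubstringIndexSketch` is hypothetical, and returning an empty string for a count of 0 or an empty delimiter is an assumption.

```java
// Illustrative only: substring before (or after) the count-th delimiter.
final class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) {
            return ""; // assumption: nothing to select
        }
        if (count > 0) {
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // fewer than count delimiters: whole string
                }
            }
            return str.substring(0, idx); // everything left of the count-th delimiter
        }
        int idx = str.length();
        for (int i = 0; i < -count; i++) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str; // fewer than |count| delimiters: whole string
            }
        }
        return str.substring(idx + delim.length()); // everything right of it
    }

    public static void main(String[] args) {
        System.out.println(substringIndex("www.apache.org", ".", 2));  // www.apache
        System.out.println(substringIndex("www.apache.org", ".", -2)); // apache.org
    }
}
```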
-
overlay
Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
- Parameters:
src - (undocumented)
replace - (undocumented)
pos - (undocumented)
len - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.0.0
-
overlay
Overlay the specified portion of src with replace, starting from byte position pos of src.
- Parameters:
src - (undocumented)
replace - (undocumented)
pos - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.0.0
-
sentences
Splits a string into arrays of sentences, where each sentence is an array of words.- Parameters:
string
- (undocumented)language
- (undocumented)country
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
sentences
Splits a string into arrays of sentences, where each sentence is an array of words. The default country ('') is used.
- Parameters:
string - (undocumented)
language - (undocumented)
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
sentences
Splits a string into arrays of sentences, where each sentence is an array of words. The default locale is used.- Parameters:
string
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.2.0
-
translate
Translates any character in src by the corresponding character in replaceString. The characters in replaceString correspond to the characters in matchingString. The translation happens whenever a character in src matches a character in matchingString.
- Parameters:
src - (undocumented)
matchingString - (undocumented)
replaceString - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
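The positional correspondence between matchingString and replaceString can be sketched with a small lookup table. This is not Spark's implementation; the class `TranslateSketch` is hypothetical, and two behaviours are assumptions made for the example: the first mapping wins when a character repeats in matchingString, and a matching character with no counterpart in replaceString is removed.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: character-by-character translation via a lookup table.
final class TranslateSketch {
    static String translate(String src, String matchingString, String replaceString) {
        Map<Character, Character> dict = new HashMap<>();
        for (int i = 0; i < matchingString.length(); i++) {
            char from = matchingString.charAt(i);
            if (!dict.containsKey(from)) {
                // A null value marks a character that has no replacement (assumed: delete it).
                dict.put(from, i < replaceString.length() ? replaceString.charAt(i) : null);
            }
        }
        StringBuilder sb = new StringBuilder(src.length());
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (!dict.containsKey(c)) {
                sb.append(c); // not listed in matchingString: keep as is
            } else if (dict.get(c) != null) {
                sb.append(dict.get(c).charValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("AaBbCc", "abc", "123")); // A1B2C3
    }
}
```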
-
trim
Trim the spaces from both ends for the specified string column.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
trim
Trim the specified character from both ends for the specified string column.- Parameters:
e
- (undocumented)trimString
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.3.0
-
upper
Converts a string column to upper case.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
to_binary
Converts the input e to a binary value based on the supplied format. The format can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64". By default, the binary format for conversion is "hex" if format is omitted. The function returns NULL if at least one of the input parameters is NULL.
- Parameters:
e - (undocumented)
f - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_binary
Converts the input e to a binary value based on the default format "hex". The function returns NULL if at least one of the input parameters is NULL.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_char
Convert e to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the $ currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative.
If e is a datetime, format shall be a valid datetime pattern, see Datetime Patterns.
If e is a binary, it is converted to a string in one of the formats:
- 'base64': a base 64 string.
- 'hex': a string in the hexadecimal format.
- 'utf-8': the input binary is decoded to UTF-8 string.
- Parameters:
e - (undocumented)
format - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_varchar
Convert e to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the $ currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative.
If e is a datetime, format shall be a valid datetime pattern, see Datetime Patterns.
If e is a binary, it is converted to a string in one of the formats:
- 'base64': a base 64 string.
- 'hex': a string in the hexadecimal format.
- 'utf-8': the input binary is decoded to UTF-8 string.
- Parameters:
e - (undocumented)
format - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_number
Convert string 'e' to a number based on the string format 'format'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number.
- '$': Specifies the location of the $ currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.
- 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets.
- Parameters:
e - (undocumented)
format - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
replace
Replaces all occurrences of search with replace.
- Parameters:
src - A column of string to be replaced
search - A column of string. If search is not found in src, src is returned unchanged.
replace - A column of string. If replace is not specified or is an empty string, nothing replaces the string that is removed from src.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
replace
Replaces all occurrences of search with replace.
- Parameters:
src - A column of string to be replaced
search - A column of string. If search is not found in src, src is returned unchanged.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
split_part
Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of the split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, str is not split.
- Parameters:
str - (undocumented)
delimiter - (undocumented)
partNum - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
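The split_part rules above can be modelled in plain Java. This is only a sketch of the documented behaviour, not Spark's implementation; the class and method names are hypothetical, and it assumes the delimiter is matched literally.

```java
import java.util.regex.Pattern;

final class SplitPartSketch {
    /** Plain-Java model of the split_part rules described above (not Spark code). */
    static String splitPart(String str, String delimiter, Integer partNum) {
        // Any null input yields null.
        if (str == null || delimiter == null || partNum == null) {
            return null;
        }
        // A part number of 0 is an error.
        if (partNum == 0) {
            throw new IllegalArgumentException("partNum must not be 0");
        }
        // An empty delimiter leaves the string unsplit; otherwise split on the
        // literal delimiter (assumption) and keep trailing empty parts.
        String[] parts = delimiter.isEmpty()
                ? new String[] {str}
                : str.split(Pattern.quote(delimiter), -1);
        // Positive part numbers count from the start, negative ones from the end.
        int index = partNum > 0 ? partNum - 1 : parts.length + partNum;
        // Out-of-range part numbers produce an empty string.
        return (index >= 0 && index < parts.length) ? parts[index] : "";
    }
}
```

For instance, under this model `splitPart("11.12.13", ".", -1)` selects the last part, and a part number of 5 falls out of range and gives an empty string.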
-
substr
Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
- Parameters:
str - (undocumented)
pos - (undocumented)
len - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
substr
Returns the substring of str that starts at pos, or the slice of byte array that starts at pos.
- Parameters:
str - (undocumented)
pos - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
parse_url
Extracts a part from a URL.
- Parameters:
url - (undocumented)
partToExtract - (undocumented)
key - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
parse_url
Extracts a part from a URL.
- Parameters:
url - (undocumented)
partToExtract - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
printf
Formats the arguments in printf-style and returns the result as a string column.
- Parameters:
format - (undocumented)
arguments - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
url_decode
Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_url_decode
This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
url_encode
Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
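The relationship between url_encode, url_decode and try_url_decode can be illustrated with the JDK's own form-encoding helpers. This is a sketch only: it assumes UTF-8 as the encoding scheme and Java 10 or later, and the class and method names are hypothetical rather than Spark APIs.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UrlCodecSketch {
    /** Encodes to application/x-www-form-urlencoded, assuming UTF-8. */
    static String urlEncode(String str) {
        return URLEncoder.encode(str, StandardCharsets.UTF_8);
    }

    /** Decodes; malformed escapes raise IllegalArgumentException. */
    static String urlDecode(String str) {
        return URLDecoder.decode(str, StandardCharsets.UTF_8);
    }

    /** Same as urlDecode, but returns null instead of raising an error. */
    static String tryUrlDecode(String str) {
        try {
            return urlDecode(str);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

The "try" variant differs only in how a malformed input is reported; valid inputs decode identically in both.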
-
position
Returns the position of the first occurrence of substr in str after position start. The given start and the return value are 1-based.
- Parameters:
substr - (undocumented)
str - (undocumented)
start - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
position
Returns the position of the first occurrence of substr in str after position 1. The return value is 1-based.
- Parameters:
substr - (undocumented)
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
endswith
Returns a boolean. The value is True if str ends with suffix. Returns NULL if either input expression is NULL. Otherwise, returns False. Both str and suffix must be of STRING or BINARY type.
- Parameters:
str - (undocumented)
suffix - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
startswith
Returns a boolean. The value is True if str starts with prefix. Returns NULL if either input expression is NULL. Otherwise, returns False. Both str and prefix must be of STRING or BINARY type.
- Parameters:
str - (undocumented)
prefix - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
btrim
Removes the leading and trailing space characters from str.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
btrim
Removes the leading and trailing trim characters from str.
- Parameters:
str - (undocumented)
trim - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
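A minimal plain-Java sketch of the two btrim variants follows. It assumes trim is treated as a set of characters (any of them is stripped from both ends), which the description above does not state explicitly; the names are hypothetical and this is not Spark's implementation.

```java
final class BtrimSketch {
    /** Strips every leading and trailing character that occurs in trim (assumed to be a character set). */
    static String btrim(String str, String trim) {
        int start = 0;
        int end = str.length();
        // Advance past leading characters found in the trim set.
        while (start < end && trim.indexOf(str.charAt(start)) >= 0) {
            start++;
        }
        // Step back over trailing characters found in the trim set.
        while (end > start && trim.indexOf(str.charAt(end - 1)) >= 0) {
            end--;
        }
        return str.substring(start, end);
    }

    /** Single-argument form: strips leading and trailing spaces. */
    static String btrim(String str) {
        return btrim(str, " ");
    }
}
```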
-
try_to_binary
This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
- Parameters:
e - (undocumented)
f - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_to_binary
This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_to_number
Converts string e to a number based on the string format format. Returns NULL if the string e does not match the expected format. The format follows the same semantics as the to_number function.
- Parameters:
e - (undocumented)
format - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
char_length
Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
character_length
Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
chr
Returns the ASCII character having the binary equivalent to n. If n is larger than 256, the result is equivalent to chr(n % 256).
- Parameters:
n - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
contains
Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type.
- Parameters:
left - (undocumented)
right - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
elt
Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
- Parameters:
inputs - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
find_in_set
Returns the index (1-based) of the given string (str) in the comma-delimited list (strArray). Returns 0 if the string was not found or if the given string (str) contains a comma.
- Parameters:
str - (undocumented)
strArray - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
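The lookup rule for find_in_set is small enough to model directly. The following plain-Java sketch (hypothetical names, not Spark code) follows the description above: a 1-based index on a match, and 0 when the string is absent or itself contains a comma.

```java
final class FindInSetSketch {
    /** Returns the 1-based index of str in the comma-delimited list, or 0. */
    static int findInSet(String str, String strArray) {
        // A search string containing a comma can never match a single list item.
        if (str.contains(",")) {
            return 0;
        }
        // Keep empty items so positions line up with the original list.
        String[] items = strArray.split(",", -1);
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(str)) {
                return i + 1;
            }
        }
        return 0;
    }
}
```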
-
like
Returns true if str matches pattern with escapeChar, null if any arguments are null, false otherwise.
- Parameters:
str - (undocumented)
pattern - (undocumented)
escapeChar - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
like
Returns true if str matches pattern with escapeChar ('\'), null if any arguments are null, false otherwise.
- Parameters:
str - (undocumented)
pattern - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
ilike
Returns true if str matches pattern with escapeChar case-insensitively, null if any arguments are null, false otherwise.
- Parameters:
str - (undocumented)
pattern - (undocumented)
escapeChar - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
ilike
Returns true if str matches pattern with escapeChar ('\') case-insensitively, null if any arguments are null, false otherwise.
- Parameters:
str - (undocumented)
pattern - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
lcase
Returns str with all characters changed to lowercase.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
ucase
Returns str with all characters changed to uppercase.
- Parameters:
str - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
left
Returns the leftmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
- Parameters:
str - (undocumented)
len - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
right
Returns the rightmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
- Parameters:
str - (undocumented)
len - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
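The left and right rules can be sketched in plain Java as below. The names are hypothetical and this is not Spark's implementation; returning the whole string when len exceeds its length is an assumption not stated above, and lengths are counted in Java chars here.

```java
final class LeftRightSketch {
    /** Leftmost len characters; empty string when len <= 0. */
    static String left(String str, int len) {
        if (len <= 0) {
            return "";
        }
        // Assumption: a len beyond the string length returns the whole string.
        return str.substring(0, Math.min(len, str.length()));
    }

    /** Rightmost len characters; empty string when len <= 0. */
    static String right(String str, int len) {
        if (len <= 0) {
            return "";
        }
        return str.substring(Math.max(0, str.length() - len));
    }
}
```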
-
add_months
Returns the date that is numMonths after startDate.
- Parameters:
startDate - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
numMonths - The number of months to add to startDate, can be negative to subtract months
- Returns:
- A date, or null if startDate was a string that could not be cast to a date
- Since:
- 1.5.0
-
add_months
Returns the date that is numMonths after startDate.
- Parameters:
startDate - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
numMonths - A column of the number of months to add to startDate, can be negative to subtract months
- Returns:
- A date, or null if startDate was a string that could not be cast to a date
- Since:
- 3.0.0
-
curdate
Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
current_date
Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
current_timezone
Returns the current session local timezone.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
current_timestamp
Returns the current timestamp at the start of query evaluation as a timestamp column. All calls of current_timestamp within the same query return the same value.
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
now
Returns the current timestamp at the start of query evaluation.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
localtimestamp
Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column. All calls of localtimestamp within the same query return the same value.
- Returns:
- (undocumented)
- Since:
- 3.3.0
-
date_format
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
See Datetime Patterns for valid date and time format patterns.
- Parameters:
dateExpr - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
format - A pattern dd.MM.yyyy would return a string like 18.03.1993
- Returns:
- A string, or null if dateExpr was a string that could not be cast to a timestamp
- Throws:
IllegalArgumentException - if the format pattern is invalid
- Since:
- 1.5.0
- Note:
- Use specialized functions like year(org.apache.spark.sql.Column) whenever possible as they benefit from a specialized implementation.
-
date_add
Returns the date that is days days after start.
- Parameters:
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - The number of days to add to start, can be negative to subtract days
- Returns:
- A date, or null if start was a string that could not be cast to a date
- Since:
- 1.5.0
-
date_add
Returns the date that is days days after start.
- Parameters:
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - A column of the number of days to add to start, can be negative to subtract days
- Returns:
- A date, or null if start was a string that could not be cast to a date
- Since:
- 3.0.0
-
dateadd
Returns the date that is days days after start.
- Parameters:
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - A column of the number of days to add to start, can be negative to subtract days
- Returns:
- A date, or null if start was a string that could not be cast to a date
- Since:
- 3.5.0
-
date_sub
Returns the date that is days days before start.
- Parameters:
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - The number of days to subtract from start, can be negative to add days
- Returns:
- A date, or null if start was a string that could not be cast to a date
- Since:
- 1.5.0
-
date_sub
Returns the date that is days days before start.
- Parameters:
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
days - A column of the number of days to subtract from start, can be negative to add days
- Returns:
- A date, or null if start was a string that could not be cast to a date
- Since:
- 3.0.0
-
datediff
Returns the number of days from start to end.
Only considers the date part of the input. For example:
datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59") // returns 1
- Parameters:
end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- Returns:
- An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
- Since:
- 1.5.0
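The "date part only" rule can be checked with java.time. The sketch below is a plain-Java model with hypothetical names, not Spark's implementation: it drops the time of day from both inputs before counting days, which is why timestamps only one second apart can still differ by a full day.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

final class DateDiffSketch {
    /** Days from start to end, ignoring the time of day on both sides. */
    static long dateDiff(LocalDateTime end, LocalDateTime start) {
        return ChronoUnit.DAYS.between(start.toLocalDate(), end.toLocalDate());
    }
}
```

Swapping the arguments flips the sign, matching the note that the result is negative if end is before start.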
-
date_diff
Returns the number of days from start to end.
Only considers the date part of the input. For example:
date_diff("2018-01-10 00:00:00", "2018-01-09 23:59:59") // returns 1
- Parameters:
end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- Returns:
- An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
- Since:
- 3.5.0
-
date_from_unix_date
Creates a date from the number of days since 1970-01-01.
- Parameters:
days - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
year
Extracts the year as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
quarter
Extracts the quarter as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
month
Extracts the month as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
dayofweek
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 2.3.0
-
dayofmonth
Extracts the day of the month as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
day
Extracts the day of the month as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 3.5.0
-
dayofyear
Extracts the day of the year as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
hour
Extracts the hours as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
extract
Extracts a part of the date/timestamp or interval source.
- Parameters:
field - selects which part of the source should be extracted.
source - a date/timestamp or interval column from where field should be extracted.
- Returns:
- a part of the date/timestamp or interval source
- Since:
- 3.5.0
-
date_part
Extracts a part of the date/timestamp or interval source.
- Parameters:
field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function extract.
source - a date/timestamp or interval column from where field should be extracted.
- Returns:
- a part of the date/timestamp or interval source
- Since:
- 3.5.0
-
datepart
Extracts a part of the date/timestamp or interval source.
- Parameters:
field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function EXTRACT.
source - a date/timestamp or interval column from where field should be extracted.
- Returns:
- a part of the date/timestamp or interval source
- Since:
- 3.5.0
-
last_day
Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.
- Parameters:
e - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- Returns:
- A date, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
minute
Extracts the minutes as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
-
weekday
Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
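Since dayofweek and weekday number the days differently, a small plain-Java sketch may help keep them apart. It maps java.time's DayOfWeek (1 = Monday through 7 = Sunday) onto the two numbering schemes described on this page; the class and method names are hypothetical and this is not Spark code.

```java
import java.time.LocalDate;

final class DayNumberingSketch {
    /** dayofweek numbering: 1 = Sunday, 2 = Monday, ..., 7 = Saturday. */
    static int dayOfWeek(LocalDate date) {
        return date.getDayOfWeek().getValue() % 7 + 1;
    }

    /** weekday numbering: 0 = Monday, 1 = Tuesday, ..., 6 = Sunday. */
    static int weekday(LocalDate date) {
        return date.getDayOfWeek().getValue() - 1;
    }
}
```

For a Monday such as 2015-07-27, the first scheme gives 2 and the second gives 0.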
-
make_date
- Parameters:
year - (undocumented)
month - (undocumented)
day - (undocumented)
- Returns:
- A date created from year, month and day fields.
- Since:
- 3.3.0
-
months_between
Returns the number of months between dates start and end.
A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example:
months_between("2017-11-14", "2017-07-14") // returns 4.0
months_between("2017-01-01", "2017-01-10") // returns -0.29032258
months_between("2017-06-01", "2017-06-16 12:00:00") // returns -0.5
- Parameters:
end - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
start - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- Returns:
- A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start
- Since:
- 1.5.0
-
months_between
Returns the number of months between dates end and start. If roundOff is set to true, the result is rounded off to 8 digits; it is not rounded otherwise.
- Parameters:
end - (undocumented)
start - (undocumented)
roundOff - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.4.0
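The months_between arithmetic can be modelled in plain Java as follows. This is a sketch with hypothetical names, not Spark's implementation. It assumes that the whole-number case ignores the time of day, and that the fractional case divides the day and time-of-day difference by a 31-day month.

```java
import java.time.LocalDateTime;

final class MonthsBetweenSketch {
    private static boolean isLastDayOfMonth(LocalDateTime t) {
        return t.getDayOfMonth() == t.toLocalDate().lengthOfMonth();
    }

    /** Months from start to end; negative if end is before start. */
    static double monthsBetween(LocalDateTime end, LocalDateTime start, boolean roundOff) {
        int wholeMonths = (end.getYear() - start.getYear()) * 12
                + (end.getMonthValue() - start.getMonthValue());
        // Same day of month, or both on the last day of their months: whole number.
        if (end.getDayOfMonth() == start.getDayOfMonth()
                || (isLastDayOfMonth(end) && isLastDayOfMonth(start))) {
            return wholeMonths;
        }
        // Otherwise the remainder is measured against a fixed 31-day month.
        double secondsPerMonth = 31 * 86_400.0;
        double secondsDiff = (end.getDayOfMonth() - start.getDayOfMonth()) * 86_400.0
                + (end.toLocalTime().toSecondOfDay() - start.toLocalTime().toSecondOfDay());
        double months = wholeMonths + secondsDiff / secondsPerMonth;
        // roundOff keeps 8 digits after the decimal point.
        return roundOff ? Math.round(months * 1e8) / 1e8 : months;
    }
}
```

With end 2017-06-01 and start 2017-06-16 12:00:00, the difference is -15.5 days, and -15.5 / 31 gives -0.5.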
-
next_day
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
- Parameters:
date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
dayOfWeek - Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
- Returns:
- A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
- Since:
- 1.5.0
-
next_day
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
- Parameters:
date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
dayOfWeek - A column of the day of week. Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
- Returns:
- A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
- Since:
- 3.2.0
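A plain-Java sketch of the next_day rule, using java.time's TemporalAdjusters.next, which always moves to a strictly later date. The names are hypothetical and this is not Spark's implementation; accepting both three-letter abbreviations and full day names is an assumption based on the example and parameter description above.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;

final class NextDaySketch {
    /** First date strictly after date that falls on dayOfWeek, or null if the name is invalid. */
    static LocalDate nextDay(LocalDate date, String dayOfWeek) {
        String key = dayOfWeek.trim().toUpperCase(Locale.ROOT);
        for (DayOfWeek candidate : DayOfWeek.values()) {
            String fullName = candidate.name();            // e.g. "SUNDAY"
            String shortName = fullName.substring(0, 3);   // e.g. "SUN"
            if (key.equals(fullName) || key.equals(shortName)) {
                return date.with(TemporalAdjusters.next(candidate));
            }
        }
        // Unrecognised day names yield null rather than an exception.
        return null;
    }
}
```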
-
second
Extracts the seconds as an integer from a given date/timestamp/string.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a timestamp
- Since:
- 1.5.0
-
weekofyear
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday and week 1 is the first week with more than 3 days, as defined by ISO 8601.
- Parameters:
e - (undocumented)
- Returns:
- An integer, or null if the input was a string that could not be cast to a date
- Since:
- 1.5.0
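The ISO 8601 week rule above is the same one implemented by java.time's IsoFields, so a plain-Java sketch (hypothetical names, not Spark code) can show its edge cases, such as early-January dates that still belong to the last week of the previous year.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

final class WeekOfYearSketch {
    /** ISO 8601 week number: weeks start on Monday, week 1 has more than 3 days in the year. */
    static int weekOfYear(LocalDate date) {
        return date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }
}
```

For example, 2016-01-01 is a Friday, so it falls in week 53 of the previous ISO year rather than week 1.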
-
from_unixtime
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
- Parameters:
ut - A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
- Returns:
- A string, or null if the input was a string that could not be cast to a long
- Since:
- 1.5.0
-
from_unixtime
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
See Datetime Patterns for valid date and time format patterns.
- Parameters:
ut - A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
f - A date time pattern that the input will be formatted to
- Returns:
- A string, or null if ut was a string that could not be cast to a long or f was an invalid date time pattern
- Since:
- 1.5.0
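The conversion performed by from_unixtime can be approximated with java.time. In this sketch the time zone is passed in explicitly instead of being taken from the system or session, so the result is reproducible. The names are hypothetical, this is not Spark's implementation, and Java's DateTimeFormatter patterns are only assumed to agree with Spark's Datetime Patterns for the simple letters used here.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class FromUnixtimeSketch {
    /** Formats epoch seconds in the given zone using the given pattern. */
    static String fromUnixtime(long seconds, String pattern, ZoneId zone) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(zone)
                .format(Instant.ofEpochSecond(seconds));
    }

    /** Default pattern variant: yyyy-MM-dd HH:mm:ss. */
    static String fromUnixtime(long seconds, ZoneId zone) {
        return fromUnixtime(seconds, "yyyy-MM-dd HH:mm:ss", zone);
    }
}
```

Negative inputs work the same way and land before the epoch.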
-
unix_timestamp
Returns the current Unix timestamp (in seconds) as a long.
- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
-
unix_timestamp
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.
- Parameters:
s - A date, timestamp or string. If a string, the data must be in the yyyy-MM-dd HH:mm:ss format
- Returns:
- A long, or null if the input was a string not of the correct format
- Since:
- 1.5.0
-
unix_timestamp
Converts time string with given pattern to Unix timestamp (in seconds).
See Datetime Patterns for valid date and time format patterns.
- Parameters:
s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
p - A date time pattern detailing the format of s when s is a string
- Returns:
- A long, or null if s was a string that could not be cast to a date or p was an invalid format
- Since:
- 1.5.0
-
to_timestamp
Converts to a timestamp by casting rules to TimestampType.
- Parameters:
s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- Returns:
- A timestamp, or null if the input was a string that could not be cast to a timestamp
- Since:
- 2.2.0
-
to_timestamp
Converts time string with the given pattern to timestamp.
See Datetime Patterns for valid date and time format patterns.
- Parameters:
s - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
fmt - A date time pattern detailing the format of s when s is a string
- Returns:
- A timestamp, or null if s was a string that could not be cast to a timestamp or fmt was an invalid format
- Since:
- 2.2.0
-
try_to_timestamp
Parses s with the format to a timestamp. The function always returns null on an invalid input, with or without ANSI SQL mode enabled. The result data type is consistent with the value of configuration spark.sql.timestampType.
- Parameters:
s - (undocumented)
format - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
try_to_timestamp
Parses s to a timestamp. The function always returns null on an invalid input, with or without ANSI SQL mode enabled. It follows casting rules to a timestamp. The result data type is consistent with the value of configuration spark.sql.timestampType.
- Parameters:
s - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_date
Converts the column into DateType by casting rules to DateType.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
to_date
Converts the column into a DateType with a specified format.
See Datetime Patterns for valid date and time format patterns.
- Parameters:
e - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
fmt - A date time pattern detailing the format of e when e is a string
- Returns:
- A date, or null if e was a string that could not be cast to a date or fmt was an invalid format
- Since:
- 2.2.0
-
unix_date
Returns the number of days since 1970-01-01.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
unix_micros
Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
unix_millis
Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
unix_seconds
Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
- Parameters:
e - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
trunc
Returns date truncated to the unit specified by the format.
For example, trunc("2018-11-19 12:01:19", "year") returns 2018-01-01
- Parameters:
date - A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
format - 'year', 'yyyy', 'yy' to truncate by year, or 'month', 'mon', 'mm' to truncate by month. Other options are: 'week', 'quarter'
- Returns:
- A date, or null if date was a string that could not be cast to a date or format was an invalid value
- Since:
- 1.5.0
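The truncation units accepted by trunc can be illustrated with a plain-Java sketch. The names are hypothetical and this is not Spark's implementation; in particular, truncating 'week' to the Monday of that week is an assumption that the description above does not spell out.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;

final class TruncSketch {
    /** Truncates date to the given unit, or returns null for an unknown unit. */
    static LocalDate trunc(LocalDate date, String format) {
        switch (format.toLowerCase(Locale.ROOT)) {
            case "year":
            case "yyyy":
            case "yy":
                return date.withDayOfYear(1);
            case "quarter": {
                // First month of the quarter: 1, 4, 7 or 10.
                int firstMonth = ((date.getMonthValue() - 1) / 3) * 3 + 1;
                return LocalDate.of(date.getYear(), firstMonth, 1);
            }
            case "month":
            case "mon":
            case "mm":
                return date.withDayOfMonth(1);
            case "week":
                // Assumption: weeks are truncated to their Monday.
                return date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            default:
                return null;
        }
    }
}
```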
-
date_trunc
Returns timestamp truncated to the unit specified by the format.
For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00
- Parameters:
format - 'year', 'yyyy', 'yy' to truncate by year, 'month', 'mon', 'mm' to truncate by month, 'day', 'dd' to truncate by day. Other options are: 'microsecond', 'millisecond', 'second', 'minute', 'hour', 'week', 'quarter'
timestamp - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- Returns:
- A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value
- Since:
- 2.3.0
-
from_utc_timestamp
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
- Parameters:
ts - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
tz - A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous.
- Returns:
- A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
- Since:
- 1.5.0
-
from_utc_timestamp
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
- Parameters:
ts - (undocumented)
tz - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.4.0
-
to_utc_timestamp
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
- Parameters:
ts - A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
tz - A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous.
- Returns:
- A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
- Since:
- 1.5.0
-
to_utc_timestamp
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
- Parameters:
ts - (undocumented)
tz - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.4.0
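The two directions of the UTC shift are easy to confuse, so here is a plain-Java sketch using java.time. The names are hypothetical and this is not Spark's implementation; it models the wall-clock timestamp as a LocalDateTime and uses the '(+|-)HH:mm' offset form recommended above.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class UtcShiftSketch {
    /** Reads ts as UTC wall-clock time and renders it in zone tz. */
    static LocalDateTime fromUtcTimestamp(LocalDateTime ts, String tz) {
        return ts.atZone(ZoneOffset.UTC)
                .withZoneSameInstant(ZoneId.of(tz))
                .toLocalDateTime();
    }

    /** Reads ts as wall-clock time in zone tz and renders it in UTC. */
    static LocalDateTime toUtcTimestamp(LocalDateTime ts, String tz) {
        return ts.atZone(ZoneId.of(tz))
                .withZoneSameInstant(ZoneOffset.UTC)
                .toLocalDateTime();
    }
}
```

With an offset of +01:00, the first function moves 02:40 forward to 03:40 and the second moves it back to 01:40, matching the examples above.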
-
window
public static Column window(Column timeColumn, String windowDuration, String slideDuration, String startTime)
Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute", "10 seconds", "5 seconds"), $"stockId")
  .agg(mean("price"))
The windows will look like:
09:00:05-09:01:05
09:00:15-09:01:15
09:00:25-09:01:25
...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- Parameters:
timeColumn - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
windowDuration - A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
slideDuration - A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
startTime - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
- Returns:
- (undocumented)
- Since:
- 2.0.0
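To make the bucketing rule concrete, here is a plain-Python sketch of which windows a single timestamp falls into. It models only the arithmetic described above (window starts on a grid of `slideDuration` steps shifted by `startTime`, inclusive start, exclusive end). The helper name `windows_for` is hypothetical, and this is not how Spark implements the function.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetimes are treated as UTC in this sketch

def windows_for(ts, window, slide, start_time=timedelta(0)):
    """Return every [start, end) window that contains ts, earliest first."""
    # Window starts lie on the grid EPOCH + start_time + k * slide.
    offset = (ts - EPOCH - start_time) % slide
    start = ts - offset                 # latest grid point that is <= ts
    result = []
    while start + window > ts:          # the window end is exclusive
        result.append((start, start + window))
        start -= slide
    return result[::-1]

# A one minute window sliding every 10 seconds, starting 5 seconds past the hour.
for w_start, w_end in windows_for(
    datetime(2024, 1, 1, 9, 0, 7),
    window=timedelta(minutes=1),
    slide=timedelta(seconds=10),
    start_time=timedelta(seconds=5),
):
    print(w_start.time(), "-", w_end.time())
```

With a slide equal to the window width the same helper yields exactly one (tumbling) window per timestamp.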
-
window
Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The windows start beginning at 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute window every 10 seconds:
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute", "10 seconds"), $"stockId")
  .agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:00:10-09:01:10
09:00:20-09:01:20 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- Parameters:
timeColumn
- The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.windowDuration
- A string specifying the width of the window, e.g.10 minutes
,1 second
. Checkorg.apache.spark.unsafe.types.CalendarInterval
for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example,1 day
always means 86,400,000 milliseconds, not a calendar day.slideDuration
- A string specifying the sliding interval of the window, e.g.1 minute
. A new window will be generated everyslideDuration
. Must be less than or equal to thewindowDuration
. Checkorg.apache.spark.unsafe.types.CalendarInterval
for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
window
Generates tumbling time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The windows start beginning at 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute tumbling window:
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute"), $"stockId")
  .agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:01:00-09:02:00
09:02:00-09:03:00 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- Parameters:
timeColumn
- The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.windowDuration
- A string specifying the width of the window, e.g.10 minutes
,1 second
. Checkorg.apache.spark.unsafe.types.CalendarInterval
for valid duration identifiers.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
window_time
Extracts the event time from the window column. The window column is of StructType { start: Timestamp, end: Timestamp } where start is inclusive and end is exclusive. Since event time can support microsecond precision, window_time(window) = window.end - 1 microsecond.
- Parameters:
windowColumn
- The window column (typically produced by window aggregation) of type StructType { start: Timestamp, end: Timestamp }- Returns:
- (undocumented)
- Since:
- 3.4.0
-
session_window
Generates session window given a timestamp specifying column. A session window is a dynamic window: its length varies according to the given inputs. The length of session window is defined as "the timestamp of latest input of the session + gap duration", so when the new inputs are bound to the current session window, the end time of session window can be expanded according to the new inputs.
Windows can support microsecond precision. A gapDuration on the order of months is not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- Parameters:
timeColumn
- The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.gapDuration
- A string specifying the timeout of the session, e.g.10 minutes
,1 second
. Checkorg.apache.spark.unsafe.types.CalendarInterval
for valid duration identifiers.- Returns:
- (undocumented)
- Since:
- 3.2.0
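The way a static gap duration extends a session can be sketched in plain Python. This models only the description above and is not Spark's implementation. The helper name `sessionize` is hypothetical, and treating an event that lands exactly on the current session end as the start of a new session is an assumption of this sketch.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap):
    """Group event times into [start, end) sessions using a fixed gap duration."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # The event falls inside the open session, so the end may be pushed out.
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap))
        else:
            # Otherwise a new session opens; its end is the event time plus the gap.
            sessions.append((ts, ts + gap))
    return sessions

events = [datetime(2024, 1, 1, 10, m) for m in (0, 3, 7, 30)]
for s_start, s_end in sessionize(events, timedelta(minutes=5)):
    print(s_start.time(), "-", s_end.time())
```

The first three events chain into one session that ends at 10:12 (the latest input, 10:07, plus the 5 minute gap), while the 10:30 event opens its own session.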
-
session_window
Generates session window given a timestamp specifying column. A session window is a dynamic window: its length varies according to the given inputs. For static gap duration, the length of session window is defined as "the timestamp of latest input of the session + gap duration", so when the new inputs are bound to the current session window, the end time of session window can be expanded according to the new inputs.
Besides a static gap duration value, users can also provide an expression to specify gap duration dynamically based on the input row. With dynamic gap duration, the closing of a session window does not depend on the latest input anymore. A session window's range is the union of all events' ranges which are determined by event start time and evaluated gap duration during the query execution. Note that the rows with negative or zero gap duration will be filtered out from the aggregation.
Windows can support microsecond precision. A gapDuration on the order of months is not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- Parameters:
timeColumn
- The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.gapDuration
- A column specifying the timeout of the session. It could be static value, e.g.10 minutes
,1 second
, or an expression/UDF that specifies gap duration dynamically based on the input row.- Returns:
- (undocumented)
- Since:
- 3.2.0
-
timestamp_seconds
Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.1.0
-
timestamp_millis
Creates timestamp from the number of milliseconds since UTC epoch.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
timestamp_micros
Creates timestamp from the number of microseconds since UTC epoch.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
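The three epoch conversions (timestamp_seconds, timestamp_millis, timestamp_micros) differ only in the unit of the input number. A plain-Python sketch of that relationship, using hypothetical helper names and not Spark itself:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def seconds_to_timestamp(n):
    return EPOCH + timedelta(seconds=n)

def millis_to_timestamp(n):
    return EPOCH + timedelta(milliseconds=n)

def micros_to_timestamp(n):
    return EPOCH + timedelta(microseconds=n)

# The same instant expressed in three different units.
print(seconds_to_timestamp(1230219000))
print(millis_to_timestamp(1230219000000))
print(micros_to_timestamp(1230219000000000))
```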
-
timestamp_diff
Gets the difference between the timestamps in the specified units by truncating the fraction part.- Parameters:
unit
- (undocumented)start
- (undocumented)end
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
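"Truncating the fraction part" means partial units are dropped rather than rounded. The following plain-Python sketch illustrates that reading for a few fixed-length units. The unit names in `UNIT_MICROS` and the helper name are assumptions for illustration, since the `unit` parameter is undocumented here, and calendar units such as months are deliberately left out.

```python
from datetime import datetime, timedelta

# Assumed unit names, limited to fixed-length units.
UNIT_MICROS = {"SECOND": 1_000_000, "MINUTE": 60_000_000, "HOUR": 3_600_000_000}

def timestamp_diff_model(unit, start, end):
    """Whole number of units between start and end, truncated toward zero."""
    delta = (end - start) // timedelta(microseconds=1)   # exact integer microseconds
    whole = abs(delta) // UNIT_MICROS[unit]
    return whole if delta >= 0 else -whole

start = datetime(2024, 1, 1, 10, 0, 0)
end = datetime(2024, 1, 1, 11, 59, 59)
print(timestamp_diff_model("HOUR", start, end))    # one full hour has elapsed
print(timestamp_diff_model("MINUTE", start, end))
```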
-
timestamp_add
Adds the specified number of units to the given timestamp.- Parameters:
unit
- (undocumented)quantity
- (undocumented)ts
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
to_timestamp_ltz
Parses the timestamp expression with the format expression to a timestamp with local time zone. Returns null with invalid input.
- Parameters:
timestamp
- (undocumented)format
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_timestamp_ltz
Parses the timestamp expression with the default format to a timestamp with local time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.
- Parameters:
timestamp
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_timestamp_ntz
Parses the timestamp expression with the format expression to a timestamp without time zone. Returns null with invalid input.
- Parameters:
timestamp
- (undocumented)format
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_timestamp_ntz
Parses the timestamp expression with the default format to a timestamp without time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.
- Parameters:
timestamp
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_unix_timestamp
Returns the UNIX timestamp of the given time.- Parameters:
timeExp
- (undocumented)format
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_unix_timestamp
Returns the UNIX timestamp of the given time.- Parameters:
timeExp
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
monthname
Extracts the three-letter abbreviated month name from a given date/timestamp/string.- Parameters:
timeExp
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
dayname
Extracts the three-letter abbreviated day name from a given date/timestamp/string.- Parameters:
timeExp
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
array_contains
Returns null if the array is null, true if the array contains value, and false otherwise.
- Parameters:
column
- (undocumented)value
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
array_append
Returns an ARRAY containing all elements from the source ARRAY as well as the new element. The new element/column is located at end of the ARRAY.- Parameters:
column
- (undocumented)element
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
arrays_overlap
Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
- Parameters:
a1
- (undocumented)a2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
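The three possible results of arrays_overlap follow a three-valued logic that is easy to misread. A plain-Python sketch of the rule as stated above, with `None` standing in for null (the helper name is hypothetical and null array arguments are not modelled):

```python
def arrays_overlap_model(a1, a2):
    """True, None (unknown) or False, following the rule in the description."""
    common = {x for x in a1 if x is not None} & {y for y in a2 if y is not None}
    if common:
        return True
    # No shared non-null element: a null in either non-empty array makes the answer unknown.
    if a1 and a2 and (None in a1 or None in a2):
        return None
    return False

print(arrays_overlap_model([1, 2], [2, 3]))
print(arrays_overlap_model([1, None], [3]))
print(arrays_overlap_model([1], [2]))
```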
-
slice
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
- Parameters:
x
- the array column to be slicedstart
- the starting indexlength
- the length of the slice- Returns:
- (undocumented)
- Since:
- 2.4.0
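The interaction of start and length can be sketched in plain Python. The sketch assumes, in line with Spark SQL's array functions generally, that start is 1-based and that a start of 0 or a negative length is rejected. Those points are not stated in this entry, so treat them as assumptions. The helper name is hypothetical.

```python
def slice_model(x, start, length):
    """Take `length` elements of x beginning at 1-based `start` (negative counts from the end)."""
    if start == 0:
        raise ValueError("start must not be 0 (assumed 1-based indexing)")
    if length < 0:
        raise ValueError("length must not be negative")
    begin = start - 1 if start > 0 else len(x) + start
    if begin < 0 or begin >= len(x):
        return []          # assumption: an out-of-range start yields an empty array
    return x[begin:begin + length]

print(slice_model([1, 2, 3, 4], 2, 2))
print(slice_model([1, 2, 3, 4], -2, 2))
```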
-
slice
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
- Parameters:
x
- the array column to be slicedstart
- the starting indexlength
- the length of the slice- Returns:
- (undocumented)
- Since:
- 3.1.0
-
array_join
Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.
- Parameters:
column
- (undocumented)delimiter
- (undocumented)nullReplacement
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_join
Concatenates the elements of column using the delimiter.
- Parameters:
column
- (undocumented)delimiter
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
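The two array_join variants differ only in how null elements are handled. The plain-Python sketch below assumes that nulls are skipped when no nullReplacement is given. That behaviour is not stated in the entry above, so it is labelled as an assumption here. The helper name is hypothetical.

```python
def array_join_model(column, delimiter, null_replacement=None):
    """Join string elements; nulls are replaced, or skipped if no replacement is given."""
    parts = []
    for element in column:
        if element is None:
            if null_replacement is None:
                continue               # assumption: nulls are dropped in the two-argument form
            parts.append(null_replacement)
        else:
            parts.append(element)
    return delimiter.join(parts)

print(array_join_model(["hello", None, "world"], " "))
print(array_join_model(["hello", None, "world"], " ", ","))
```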
-
concat
Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.- Parameters:
exprs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
- Note:
- Returns null if any of the input columns are null.
-
array_position
Locates the position of the first occurrence of the value in the given array as long. Returns null if either of the arguments is null.
- Parameters:
column
- (undocumented)value
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
- Note:
- The position is not zero based, but 1 based index. Returns 0 if value could not be found in array.
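A plain-Python sketch of the indexing convention described in the note, with `None` standing in for null (the helper name is hypothetical):

```python
def array_position_model(column, value):
    """1-based position of the first match, 0 if absent, None if an argument is null."""
    if column is None or value is None:
        return None
    for position, element in enumerate(column, start=1):
        if element == value:
            return position
    return 0

print(array_position_model([3, 2, 1], 1))   # 3: positions start at 1
print(array_position_model([3, 2, 1], 7))   # 0: not found
```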
-
element_at
Returns element of array at given index in value if column is array. Returns value for the given key in value if column is map.- Parameters:
column
- (undocumented)value
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
try_element_at
(array, index) - Returns element of array at given (1-based) index. If index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array.
(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map.
- Parameters:
column
- (undocumented)value
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
get
Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL.- Parameters:
column
- (undocumented)index
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
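try_element_at and get use different index bases, which is a common source of off-by-one errors. The plain-Python sketch below contrasts the two conventions for arrays only. The helper names are hypothetical, and returning null for a negative index in the `get` model is an assumption drawn from the "outside of the array boundaries" wording.

```python
def try_element_at_model(arr, index):
    """1-based access; negative indexes count from the end; None when out of range."""
    if index == 0:
        raise ValueError("index 0 is invalid for 1-based access")
    pos = index - 1 if index > 0 else len(arr) + index
    return arr[pos] if 0 <= pos < len(arr) else None

def get_model(arr, index):
    """0-based access; None for any index outside the array boundaries."""
    return arr[index] if 0 <= index < len(arr) else None

values = [10, 20, 30]
print(try_element_at_model(values, 1), get_model(values, 0))   # both print 10
print(try_element_at_model(values, -1))                        # last element
```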
-
array_sort
Sorts the input array in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
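The ordering rules for NaN and null can be modelled with a sort key in plain Python. This is an illustration of the stated ordering only, with `None` standing in for null and a hypothetical helper name.

```python
import math

def array_sort_model(e):
    """Ascending sort where NaN follows every other number and nulls come last."""
    def key(v):
        if v is None:
            return (2, 0.0)
        if isinstance(v, float) and math.isnan(v):
            return (1, 0.0)
        return (0, v)
    return sorted(e, key=key)

print(array_sort_model([3.0, None, float("nan"), 1.0]))
```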
-
array_sort
Sorts the input array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error.- Parameters:
e
- (undocumented)comparator
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
array_remove
Removes all elements equal to element from the given array.
- Parameters:
column
- (undocumented)element
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_compact
Removes all null elements from the given array.
- Parameters:
column
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
array_prepend
Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.- Parameters:
column
- (undocumented)element
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
array_distinct
Removes duplicate values from the array.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_intersect
Returns an array of the elements in the intersection of the given two arrays, without duplicates.- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_insert
Adds an item into a given array at a specified position.
- Parameters:
arr
- (undocumented)pos
- (undocumented)value
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
array_union
Returns an array of the elements in the union of the given two arrays, without duplicates.- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_except
Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined.
- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
transform
Returns an array of elements after applying a transformation to each element in the input array.df.select(transform(col("i"), x => x + 1))
- Parameters:
column
- the input array columnf
- col => transformed_col, the lambda function to transform the input column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
transform
Returns an array of elements after applying a transformation to each element in the input array.df.select(transform(col("i"), (x, i) => x + i))
- Parameters:
column
- the input array columnf
- (col, index) => transformed_col, the lambda function to transform the input column given the index. Indices start at 0.- Returns:
- (undocumented)
- Since:
- 3.0.0
-
exists
Returns whether a predicate holds for one or more elements in the array.df.select(exists(col("i"), _ % 2 === 0))
- Parameters:
column
- the input array columnf
- col => predicate, the Boolean predicate to check the input column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
forall
Returns whether a predicate holds for every element in the array.df.select(forall(col("i"), x => x % 2 === 0))
- Parameters:
column
- the input array columnf
- col => predicate, the Boolean predicate to check the input column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
filter
Returns an array of elements for which a predicate holds in a given array.df.select(filter(col("s"), x => x % 2 === 0))
- Parameters:
column
- the input array columnf
- col => predicate, the Boolean predicate to filter the input column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
filter
Returns an array of elements for which a predicate holds in a given array.df.select(filter(col("s"), (x, i) => i % 2 === 0))
- Parameters:
column
- the input array columnf
- (col, index) => predicate, the Boolean predicate to filter the input column given the index. Indices start at 0.- Returns:
- (undocumented)
- Since:
- 3.0.0
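The index-aware variants of transform and filter pass a 0-based position as the second lambda argument. A plain-Python sketch of that calling convention, with hypothetical helper names:

```python
def transform_with_index(column, f):
    """Apply f(element, index) to every element; indices start at 0."""
    return [f(x, i) for i, x in enumerate(column)]

def filter_with_index(column, f):
    """Keep the elements for which f(element, index) is true."""
    return [x for i, x in enumerate(column) if f(x, i)]

print(transform_with_index([10, 20, 30], lambda x, i: x + i))
print(filter_with_index([10, 20, 30], lambda x, i: i % 2 == 0))
```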
-
aggregate
public static Column aggregate(Column expr, Column initialValue, scala.Function2<Column, Column, Column> merge, scala.Function1<Column, Column> finish) Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
- Parameters:
expr
- the input array columninitialValue
- the initial valuemerge
- (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_valuefinish
- combined_value => final_value, the lambda function to convert the combined value of all inputs to final result- Returns:
- (undocumented)
- Since:
- 3.0.0
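The merge and finish steps correspond to a left fold followed by one final mapping. A plain-Python sketch of the same shape as the Scala example above (the helper name is hypothetical):

```python
from functools import reduce

def aggregate_model(expr, initial_value, merge, finish=lambda acc: acc):
    """Fold merge over the array starting from initial_value, then apply finish once."""
    return finish(reduce(merge, expr, initial_value))

# Mirrors aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10) for the array [1, 2, 3].
print(aggregate_model([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10))
```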
-
aggregate
public static Column aggregate(Column expr, Column initialValue, scala.Function2<Column, Column, Column> merge) Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x))
- Parameters:
expr
- the input array columninitialValue
- the initial valuemerge
- (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value- Returns:
- (undocumented)
- Since:
- 3.0.0
-
reduce
public static Column reduce(Column expr, Column initialValue, scala.Function2<Column, Column, Column> merge, scala.Function1<Column, Column> finish)
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
df.select(reduce(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
- Parameters:
expr
- the input array columninitialValue
- the initial valuemerge
- (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_valuefinish
- combined_value => final_value, the lambda function to convert the combined value of all inputs to final result- Returns:
- (undocumented)
- Since:
- 3.5.0
-
reduce
public static Column reduce(Column expr, Column initialValue, scala.Function2<Column, Column, Column> merge)
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.
df.select(reduce(col("i"), lit(0), (acc, x) => acc + x))
- Parameters:
expr
- the input array columninitialValue
- the initial valuemerge
- (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value- Returns:
- (undocumented)
- Since:
- 3.5.0
-
zip_with
Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.df.select(zip_with(df1("val1"), df1("val2"), (x, y) => x + y))
- Parameters:
left
- the left input array columnright
- the right input array columnf
- (lCol, rCol) => col, the lambda function to merge two input columns into one column- Returns:
- (undocumented)
- Since:
- 3.0.0
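The padding rule means the merge function can receive null for positions that exist in only one array. A plain-Python sketch with `None` standing in for null (the helper name is hypothetical):

```python
from itertools import zip_longest

def zip_with_model(left, right, f):
    """Pair elements by position, padding the shorter array with None, then apply f."""
    return [f(x, y) for x, y in zip_longest(left, right)]

def add_or_null(x, y):
    # Mimics SQL arithmetic, where adding null yields null.
    return None if x is None or y is None else x + y

print(zip_with_model([1, 2, 3], [10, 20], add_or_null))
```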
-
transform_keys
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.df.select(transform_keys(col("i"), (k, v) => k + v))
- Parameters:
expr
- the input map columnf
- (key, value) => new_key, the lambda function to transform the key of input map column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
transform_values
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.df.select(transform_values(col("i"), (k, v) => k + v))
- Parameters:
expr
- the input map columnf
- (key, value) => new_value, the lambda function to transform the value of input map column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
map_filter
Returns a map whose key-value pairs satisfy a predicate.df.select(map_filter(col("m"), (k, v) => k * 10 === v))
- Parameters:
expr
- the input map columnf
- (key, value) => predicate, the Boolean predicate to filter the input map column- Returns:
- (undocumented)
- Since:
- 3.0.0
-
map_zip_with
public static Column map_zip_with(Column left, Column right, scala.Function3<Column, Column, Column, Column> f) Merge two given maps, key-wise into a single map using a function.df.select(map_zip_with(df("m1"), df("m2"), (k, v1, v2) => k === v1 + v2))
- Parameters:
left
- the left input map columnright
- the right input map columnf
- (key, value1, value2) => new_value, the lambda function to merge the map values- Returns:
- (undocumented)
- Since:
- 3.0.0
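A key-wise merge has to decide what to pass for a key that appears in only one map. The plain-Python sketch below assumes the missing side is supplied as null, which is not spelled out in this entry. The helper name is hypothetical and `None` stands in for null.

```python
def map_zip_with_model(left, right, f):
    """Merge two maps key by key; a key missing on one side contributes None."""
    keys = list(left) + [k for k in right if k not in left]
    return {k: f(k, left.get(k), right.get(k)) for k in keys}

merged = map_zip_with_model(
    {"a": 1, "b": 2},
    {"b": 20, "c": 30},
    lambda k, v1, v2: (v1 or 0) + (v2 or 0),
)
print(merged)
```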
-
explode
Creates a new row for each element in the given array or map column. Uses the default column namecol
for elements in the array andkey
andvalue
for elements in the map unless specified otherwise.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
explode_outer
Creates a new row for each element in the given array or map column. Uses the default column namecol
for elements in the array andkey
andvalue
for elements in the map unless specified otherwise. Unlike explode, if the array/map is null or empty then null is produced.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.2.0
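The difference between explode and explode_outer only shows up for null or empty collections. A plain-Python sketch over `(id, array)` pairs, with hypothetical helper names and `None` standing in for null:

```python
def explode_model(rows):
    """One output row per array element; rows with a null or empty array disappear."""
    return [(key, item) for key, arr in rows for item in (arr or [])]

def explode_outer_model(rows):
    """Like explode_model, but a null or empty array still yields one row with None."""
    out = []
    for key, arr in rows:
        if arr:
            out.extend((key, item) for item in arr)
        else:
            out.append((key, None))
    return out

rows = [(1, ["a", "b"]), (2, []), (3, None)]
print(explode_model(rows))
print(explode_outer_model(rows))
```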
-
posexplode
Creates a new row for each element with position in the given array or map column. Uses the default column namepos
for position, andcol
for elements in the array andkey
andvalue
for elements in the map unless specified otherwise.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.1.0
-
posexplode_outer
Creates a new row for each element with position in the given array or map column. Uses the default column namepos
for position, andcol
for elements in the array andkey
andvalue
for elements in the map unless specified otherwise. Unlike posexplode, if the array/map is null or empty then the row (null, null) is produced.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.2.0
-
inline
Creates a new row for each element in the given array of structs.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
inline_outer
Creates a new row for each element in the given array of structs. Unlike inline, if the array is null or empty then null is produced for each nested column.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.4.0
-
get_json_object
Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid.- Parameters:
e
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
json_tuple
Creates a new row for a json column according to the given field names.- Parameters:
json
- (undocumented)fields
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
from_json
public static Column from_json(Column e, StructType schema, scala.collection.immutable.Map<String, String> options) (Scala-specific) Parses a column containing a JSON string into aStructType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.- Returns:
- (undocumented)
- Since:
- 2.1.0
-
from_json
public static Column from_json(Column e, DataType schema, scala.collection.immutable.Map<String, String> options) (Scala-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 2.2.0
-
from_json
(Java-specific) Parses a column containing a JSON string into aStructType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 2.1.0
-
from_json
(Java-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 2.2.0
-
from_json
Parses a column containing a JSON string into aStructType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json string- Returns:
- (undocumented)
- Since:
- 2.1.0
-
from_json
Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json string- Returns:
- (undocumented)
- Since:
- 2.2.0
-
from_json
(Java-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema as a DDL-formatted string.options
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 2.1.0
-
from_json
public static Column from_json(Column e, String schema, scala.collection.immutable.Map<String, String> options) (Scala-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema as a DDL-formatted string.options
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
from_json
(Scala-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
ofStructType
s with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json string- Returns:
- (undocumented)
- Since:
- 2.4.0
-
from_json
(Java-specific) Parses a column containing a JSON string into aMapType
withStringType
as keys type,StructType
orArrayType
ofStructType
s with the specified schema. Returnsnull
, in the case of an unparseable string.- Parameters:
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 2.4.0
-
try_parse_json
Parses a JSON string and constructs a Variant value. Returns null if the input string is not a valid JSON value.- Parameters:
json
- a string column that contains JSON data.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
parse_json
Parses a JSON string and constructs a Variant value.- Parameters:
json
- a string column that contains JSON data.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
to_variant_object
Converts a column containing nested inputs (array/map/struct) into a variant, where maps and structs are converted to variant objects, which are unordered unlike SQL structs. Input maps can only have string keys.
- Parameters:
col
- a column with a nested schema or column name.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
is_variant_null
Check if a variant value is a variant null. Returns true if and only if the input is a variant null and false otherwise (including in the case of SQL NULL).- Parameters:
v
- a variant column.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
variant_get
Extracts a sub-variant from v according to path, and then casts the sub-variant to targetType. Returns null if the path does not exist. Throws an exception if the cast fails.
- Parameters:
v
- a variant column.path
- the extraction path. A valid path should start with$
and is followed by zero or more segments like[123]
,.name
,['name']
, or["name"]
.targetType
- the target data type to cast into, in a DDL-formatted string.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
try_variant_get
Extracts a sub-variant from v according to path, and then casts the sub-variant to targetType. Returns null if the path does not exist or the cast fails.
- Parameters:
v
- a variant column.path
- the extraction path. A valid path should start with$
and is followed by zero or more segments like[123]
,.name
,['name']
, or["name"]
.targetType
- the target data type to cast into, in a DDL-formatted string.- Returns:
- (undocumented)
- Since:
- 4.0.0
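The path grammar described for variant_get and try_variant_get ($ followed by [123], .name, ['name'] or ["name"] segments) can be sketched as a small parser over ordinary Python lists and dicts. This is an illustration only: it does not use Spark's Variant type, it skips the cast to targetType, and the exact characters allowed in a `.name` segment are an assumption. Both helper names are hypothetical.

```python
import re

# One alternative per segment form: [123], .name, ["name"], ['name'].
_SEGMENT = re.compile(r'\[(\d+)\]|\.(\w+)|\["([^"]*)"\]' + r"|\['([^']*)'\]")

def parse_path(path):
    """Split an extraction path into a list of int indexes and str keys."""
    if not path.startswith("$"):
        raise ValueError("a path must start with $")
    segments, pos = [], 1
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"invalid path segment at offset {pos}")
        index, dotted, double, single = match.groups()
        if index is not None:
            segments.append(int(index))
        else:
            segments.append(next(g for g in (dotted, double, single) if g is not None))
        pos = match.end()
    return segments

def extract(value, path):
    """Follow the path through nested lists and dicts; None if it does not exist."""
    for segment in parse_path(path):
        if isinstance(segment, int) and isinstance(value, list) and segment < len(value):
            value = value[segment]
        elif isinstance(segment, str) and isinstance(value, dict) and segment in value:
            value = value[segment]
        else:
            return None
    return value

doc = {"a": [{"b": 1}, {"b": 2}]}
print(extract(doc, "$.a[1].b"))
```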
-
schema_of_variant
Returns schema in the SQL format of a variant.- Parameters:
v
- a variant column.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
schema_of_variant_agg
Returns the merged schema in the SQL format of a variant column.- Parameters:
v
- a variant column.- Returns:
- (undocumented)
- Since:
- 4.0.0
-
schema_of_json
Parses a JSON string and infers its schema in DDL format.- Parameters:
json
- a JSON string.- Returns:
- (undocumented)
- Since:
- 2.4.0
-
schema_of_json
Parses a JSON string and infers its schema in DDL format.- Parameters:
json
- a foldable string column containing a JSON string.- Returns:
- (undocumented)
- Since:
- 2.4.0
-
schema_of_json
Parses a JSON string and infers its schema in DDL format using options.- Parameters:
json
- a foldable string column containing JSON data.options
- options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Returns:
- a column with string literal containing schema in DDL format.
- Since:
- 3.0.0
-
json_array_length
Returns the number of elements in the outermost JSON array. NULL is returned in case of any other valid JSON string, NULL or an invalid JSON.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
json_object_keys
Returns all the keys of the outermost JSON object as an array. If a valid JSON object is given, all the keys of the outermost object will be returned as an array. If it is any other valid JSON string, an invalid JSON string or an empty string, the function returns null.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
to_json
(Scala-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct, an array or a map.
options - options to control how the struct column is converted into a JSON string. Accepts the same options as the JSON data source. See Data Source Option in the version you use. Additionally, the function supports the pretty option, which enables pretty JSON generation.
- Returns:
- (undocumented)
- Since:
- 2.1.0
-
to_json
(Java-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct, an array or a map.
options - options to control how the struct column is converted into a JSON string. Accepts the same options as the JSON data source. See Data Source Option in the version you use. Additionally, the function supports the pretty option, which enables pretty JSON generation.
- Returns:
- (undocumented)
- Since:
- 2.1.0
-
to_json
Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct, an array or a map.
- Returns:
- (undocumented)
- Since:
- 2.1.0
-
mask
Masks the given string value. The function replaces upper-case characters with 'X', lower-case characters with 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
- Parameters:
input - string value to mask. Supported types: STRING, VARCHAR, CHAR
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
mask
Masks the given string value. The function replaces upper-case characters with the specified character, lower-case characters with 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
- Parameters:
input - string value to mask. Supported types: STRING, VARCHAR, CHAR
upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
mask
Masks the given string value. The function replaces upper-case and lower-case characters with the characters specified respectively, and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
- Parameters:
input - string value to mask. Supported types: STRING, VARCHAR, CHAR
upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
lowerChar - character to replace lower-case characters with. Specify NULL to retain original character.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
mask
Masks the given string value. The function replaces upper-case characters, lower-case characters and numbers with the characters specified respectively. This can be useful for creating copies of tables with sensitive information removed.
- Parameters:
input - string value to mask. Supported types: STRING, VARCHAR, CHAR
upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
lowerChar - character to replace lower-case characters with. Specify NULL to retain original character.
digitChar - character to replace digit characters with. Specify NULL to retain original character.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
mask
public static Column mask(Column input, Column upperChar, Column lowerChar, Column digitChar, Column otherChar)
Masks the given string value. This can be useful for creating copies of tables with sensitive information removed.
- Parameters:
input - string value to mask. Supported types: STRING, VARCHAR, CHAR
upperChar - character to replace upper-case characters with. Specify NULL to retain original character.
lowerChar - character to replace lower-case characters with. Specify NULL to retain original character.
digitChar - character to replace digit characters with. Specify NULL to retain original character.
otherChar - character to replace all other characters with. Specify NULL to retain original character.
- Returns:
- (undocumented)
- Since:
- 3.5.0
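The replacement rules shared by the mask overloads can be modelled in a few lines. This is a minimal sketch in plain Java, not Spark's implementation: the class MaskSketch is hypothetical, a Java null stands in for a NULL replacement character, and the sketch classifies individual char values rather than full Unicode code points. It also assumes that the one-argument form leaves other characters unchanged.

```java
// Plain-Java model of the documented masking rules; not Spark's implementation.
final class MaskSketch {
    // A null replacement character means "retain the original character".
    static String mask(String input, Character upperChar, Character lowerChar,
                       Character digitChar, Character otherChar) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            Character replacement;
            if (Character.isUpperCase(c)) {
                replacement = upperChar;
            } else if (Character.isLowerCase(c)) {
                replacement = lowerChar;
            } else if (Character.isDigit(c)) {
                replacement = digitChar;
            } else {
                replacement = otherChar;
            }
            out.append(replacement == null ? c : replacement.charValue());
        }
        return out.toString();
    }

    // Assumed defaults for the one-argument form: 'X', 'x', 'n', other characters retained.
    static String mask(String input) {
        return mask(input, 'X', 'x', 'n', null);
    }
}
```

With these rules, MaskSketch.mask("AbCD123-@$#") yields "XxXXnnn-@$#".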
-
size
Returns length of array or map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
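The null-input rule for size depends on two settings, which is easy to misread. This is a minimal sketch in plain Java of that decision only, not Spark's implementation: the class SizeSketch and its boolean parameters are hypothetical stand-ins for spark.sql.ansi.enabled and spark.sql.legacy.sizeOfNull.

```java
import java.util.Collection;

// Plain-Java model of the null-input rule described above; not Spark's implementation.
final class SizeSketch {
    // ansiEnabled and legacySizeOfNull stand in for spark.sql.ansi.enabled
    // and spark.sql.legacy.sizeOfNull respectively.
    static Integer size(Collection<?> values, boolean ansiEnabled, boolean legacySizeOfNull) {
        if (values == null) {
            // -1 only when ANSI mode is off and the legacy flag is on; otherwise null.
            return (!ansiEnabled && legacySizeOfNull) ? Integer.valueOf(-1) : null;
        }
        return values.size();
    }
}
```

Only the combination (ansiEnabled = false, legacySizeOfNull = true) produces -1 for a null input; every other combination produces null.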
-
cardinality
Returns length of array or map. This is an alias of the size function. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.
- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
sort_array
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
sort_array
Sorts the input array for the given column in ascending or descending order, according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.- Parameters:
e
- (undocumented)asc
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
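The ordering rules for sort_array (nulls first when ascending, nulls last when descending, NaN above every other value) can be reproduced with standard comparators. This is a minimal sketch in plain Java for double elements, not Spark's implementation; the class SortArraySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the documented ordering; not Spark's implementation.
final class SortArraySketch {
    static List<Double> sortArray(List<Double> input, boolean asc) {
        // Double.compareTo orders NaN above every other value, matching the rule above.
        Comparator<Double> ascending = Comparator.nullsFirst(Comparator.<Double>naturalOrder());
        List<Double> out = new ArrayList<>(input);
        // Reversing the comparator also moves nulls from the front to the back.
        out.sort(asc ? ascending : ascending.reversed());
        return out;
    }
}
```

For the input [3.0, null, NaN, 1.0], the ascending result is [null, 1.0, 3.0, NaN] and the descending result is [NaN, 3.0, 1.0, null].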
-
array_min
Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_max
Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_size
Returns the total number of elements in the array. The function returns null for null input.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
array_agg
Aggregate function: returns a list of objects with duplicates.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
- Note:
- The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
-
shuffle
Returns a random permutation of the given array.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
- Note:
- The function is non-deterministic.
-
reverse
Returns a reversed string or an array with reverse order of elements.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.5.0
-
flatten
Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
sequence
Generate a sequence of integers from start to stop, incrementing by step.- Parameters:
start
- (undocumented)stop
- (undocumented)step
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
sequence
Generate a sequence of integers from start to stop, incrementing by 1 if start is less than or equal to stop, otherwise -1.- Parameters:
start
- (undocumented)stop
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
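The two sequence overloads differ only in how the step is chosen. This is a minimal sketch in plain Java, not Spark's implementation: the class SequenceSketch is hypothetical, and it assumes that stop is inclusive and that a step pointing away from stop is rejected. Overflow near the limits of long is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the two overloads; not Spark's implementation.
final class SequenceSketch {
    // Assumption: stop is inclusive and the step must point from start towards stop.
    static List<Long> sequence(long start, long stop, long step) {
        boolean ok = (step > 0 && start <= stop)
            || (step < 0 && start >= stop)
            || (step == 0 && start == stop);
        if (!ok) {
            throw new IllegalArgumentException(
                "Illegal sequence boundaries: " + start + " to " + stop + " by " + step);
        }
        List<Long> out = new ArrayList<>();
        if (step == 0) {
            out.add(start);
            return out;
        }
        for (long v = start; step > 0 ? v <= stop : v >= stop; v += step) {
            out.add(v);
        }
        return out;
    }

    // Two-argument form: step is 1 if start <= stop, otherwise -1.
    static List<Long> sequence(long start, long stop) {
        return sequence(start, stop, start <= stop ? 1 : -1);
    }
}
```

Under these assumptions, sequence(1, 5) gives [1, 2, 3, 4, 5], sequence(5, 1) gives [5, 4, 3, 2, 1], and sequence(1, 10, 3) gives [1, 4, 7, 10].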
-
array_repeat
Creates an array containing the left argument repeated the number of times given by the right argument.- Parameters:
left
- (undocumented)right
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
array_repeat
Creates an array containing the left argument repeated the number of times given by the right argument.- Parameters:
e
- (undocumented)count
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
map_contains_key
Returns true if the map contains the key.- Parameters:
column
- (undocumented)key
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.3.0
-
map_keys
Returns an unordered array containing the keys of the map.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.3.0
-
map_values
Returns an unordered array containing the values of the map.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.3.0
-
map_entries
Returns an unordered array of all entries in the given map.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
map_from_entries
Returns a map created from the given array of entries.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
arrays_zip
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
map_concat
Returns the union of all the given maps.- Parameters:
cols
- (undocumented)- Returns:
- (undocumented)
- Since:
- 2.4.0
-
from_csv
public static Column from_csv(Column e, StructType schema, scala.collection.immutable.Map<String, String> options)
Parses a column containing a CSV string into a StructType with the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing CSV data.
schema - the schema to use when parsing the CSV string
options - options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 3.0.0
-
from_csv
(Java-specific) Parses a column containing a CSV string into a StructType with the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing CSV data.
schema - the schema to use when parsing the CSV string
options - options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 3.0.0
-
schema_of_csv
Parses a CSV string and infers its schema in DDL format.- Parameters:
csv
- a CSV string.- Returns:
- (undocumented)
- Since:
- 3.0.0
-
schema_of_csv
Parses a CSV string and infers its schema in DDL format.- Parameters:
csv
- a foldable string column containing a CSV string.- Returns:
- (undocumented)
- Since:
- 3.0.0
-
schema_of_csv
Parses a CSV string and infers its schema in DDL format using options.
- Parameters:
csv - a foldable string column containing a CSV string.
options - options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Returns:
- a column with string literal containing schema in DDL format.
- Since:
- 3.0.0
-
to_csv
(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct.
options - options to control how the struct column is converted into a CSV string. It accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 3.0.0
-
to_csv
Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct.
- Returns:
- (undocumented)
- Since:
- 3.0.0
-
from_xml
Parses a column containing an XML string into the data type corresponding to the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing XML data.
schema - the schema to use when parsing the XML string
options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
from_xml
(Java-specific) Parses a column containing an XML string into a StructType with the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing XML data.
schema - the schema as a DDL-formatted string.
options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
from_xml
(Java-specific) Parses a column containing an XML string into a StructType with the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing XML data.
schema - the schema to use when parsing the XML string
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
from_xml
(Java-specific) Parses a column containing an XML string into a StructType with the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing XML data.
schema - the schema to use when parsing the XML string
options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
from_xml
Parses a column containing an XML string into the data type corresponding to the specified schema. Returns null in the case of an unparseable string.
- Parameters:
e - a string column containing XML data.
schema - the schema to use when parsing the XML string
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
schema_of_xml
Parses an XML string and infers its schema in DDL format.
- Parameters:
xml - an XML string.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
schema_of_xml
Parses an XML string and infers its schema in DDL format.
- Parameters:
xml - a foldable string column containing an XML string.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
schema_of_xml
Parses an XML string and infers its schema in DDL format using options.
- Parameters:
xml - a foldable string column containing XML data.
options - options to control how the XML is parsed. Accepts the same options as the XML data source. See Data Source Option in the version you use.
- Returns:
- a column with string literal containing schema in DDL format.
- Since:
- 4.0.0
-
to_xml
(Java-specific) Converts a column containing a StructType into an XML string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct.
options - options to control how the struct column is converted into an XML string. It accepts the same options as the XML data source. See Data Source Option in the version you use.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
to_xml
Converts a column containing a StructType into an XML string with the specified schema. Throws an exception in the case of an unsupported type.
- Parameters:
e - a column containing a struct.
- Returns:
- (undocumented)
- Since:
- 4.0.0
-
years
(Java-specific) A transform for timestamps and dates to partition data into years.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
months
(Java-specific) A transform for timestamps and dates to partition data into months.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
days
(Java-specific) A transform for timestamps and dates to partition data into days.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
xpath
Returns a string array of values within the nodes of xml that match the XPath expression.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_boolean
Returns true if the XPath expression evaluates to true, or if a matching node is found.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_double
Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_number
Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_float
Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_int
Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_long
Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_short
Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
xpath_string
Returns the text contents of the first xml node that matches the XPath expression.- Parameters:
xml
- (undocumented)path
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
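The xpath_string and xpath_double descriptions can be tried out with the JDK's own XPath engine. This is a minimal sketch in plain Java using javax.xml.xpath, not Spark's implementation: the class XPathSketch and its method names are hypothetical, and the sketch does not try to reproduce Spark's results when no node matches.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Evaluates XPath expressions with the JDK's javax.xml.xpath API; not Spark's implementation.
final class XPathSketch {
    private static XPath newXPath() {
        return XPathFactory.newInstance().newXPath();
    }

    private static InputSource source(String xml) {
        return new InputSource(new StringReader(xml));
    }

    // Text content of the first node that matches the expression.
    static String xpathString(String xml, String path) {
        try {
            return newXPath().evaluate(path, source(xml));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XML or XPath expression", e);
        }
    }

    // Numeric value of the expression; NaN when the matched value is not numeric.
    static double xpathDouble(String xml, String path) {
        try {
            return (Double) newXPath().evaluate(path, source(xml), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XML or XPath expression", e);
        }
    }
}
```

For example, xpathString("<a><b>b</b><c>cc</c></a>", "a/c") returns "cc", and xpathDouble("<a><b>1</b><b>2</b></a>", "sum(a/b)") returns 3.0.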
-
hours
(Java-specific) A transform for timestamps to partition data into hours.- Parameters:
e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
convert_timezone
Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz.
- Parameters:
sourceTz - the time zone for the input timestamp. If it is omitted, the current session time zone is used as the source time zone.
targetTz - the time zone to which the input timestamp should be converted.
sourceTs - a timestamp without time zone.
- Returns:
- (undocumented)
- Since:
- 3.5.0
-
convert_timezone
Converts the timestamp without time zone sourceTs from the current time zone to targetTz.
- Parameters:
targetTz - the time zone to which the input timestamp should be converted.
sourceTs - a timestamp without time zone.
- Returns:
- (undocumented)
- Since:
- 3.5.0
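The three-argument convert_timezone amounts to reading a wall-clock time in one zone and re-expressing the same instant in another. This is a minimal sketch in plain Java using java.time, not Spark's implementation; the class ConvertTimezoneSketch is hypothetical and takes zone IDs as strings.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Plain-Java model built on java.time; not Spark's implementation.
final class ConvertTimezoneSketch {
    // Interprets sourceTs as wall-clock time in sourceTz and returns
    // the wall-clock time of the same instant in targetTz.
    static LocalDateTime convertTimezone(String sourceTz, String targetTz, LocalDateTime sourceTs) {
        return sourceTs.atZone(ZoneId.of(sourceTz))
            .withZoneSameInstant(ZoneId.of(targetTz))
            .toLocalDateTime();
    }
}
```

For instance, midnight on 2021-12-06 in Europe/Brussels corresponds to 15:00 on 2021-12-05 in America/Los_Angeles.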
-
make_dt_interval
Make DayTimeIntervalType duration from days, hours, mins and secs.- Parameters:
days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_dt_interval
Make DayTimeIntervalType duration from days, hours and mins.- Parameters:
days
- (undocumented)hours
- (undocumented)mins
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_dt_interval
Make DayTimeIntervalType duration from days and hours.- Parameters:
days
- (undocumented)hours
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_dt_interval
Make DayTimeIntervalType duration from days.- Parameters:
days
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_dt_interval
Make DayTimeIntervalType duration.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
public static Column make_interval(Column years, Column months, Column weeks, Column days, Column hours, Column mins, Column secs)
Make interval from years, months, weeks, days, hours, mins and secs.
- Parameters:
years
- (undocumented)months
- (undocumented)weeks
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
public static Column make_interval(Column years, Column months, Column weeks, Column days, Column hours, Column mins)
Make interval from years, months, weeks, days, hours and mins.
- Parameters:
years
- (undocumented)months
- (undocumented)weeks
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
public static Column make_interval(Column years, Column months, Column weeks, Column days, Column hours)
Make interval from years, months, weeks, days and hours.
- Parameters:
years
- (undocumented)months
- (undocumented)weeks
- (undocumented)days
- (undocumented)hours
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
Make interval from years, months, weeks and days.- Parameters:
years
- (undocumented)months
- (undocumented)weeks
- (undocumented)days
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
Make interval from years, months and weeks.- Parameters:
years
- (undocumented)months
- (undocumented)weeks
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
Make interval from years and months.- Parameters:
years
- (undocumented)months
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
Make interval from years.- Parameters:
years
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_interval
Make interval.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_timestamp
public static Column make_timestamp(Column years, Column months, Column days, Column hours, Column mins, Column secs, Column timezone)
Create timestamp from years, months, days, hours, mins, secs and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Parameters:
years
- (undocumented)months
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)timezone
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_timestamp
public static Column make_timestamp(Column years, Column months, Column days, Column hours, Column mins, Column secs)
Create timestamp from years, months, days, hours, mins and secs fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Parameters:
years
- (undocumented)months
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_timestamp_ltz
public static Column make_timestamp_ltz(Column years, Column months, Column days, Column hours, Column mins, Column secs, Column timezone)
Create the current timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Parameters:
years
- (undocumented)months
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)timezone
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_timestamp_ltz
public static Column make_timestamp_ltz(Column years, Column months, Column days, Column hours, Column mins, Column secs)
Create the current timestamp with local time zone from years, months, days, hours, mins and secs fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Parameters:
years
- (undocumented)months
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_timestamp_ntz
public static Column make_timestamp_ntz(Column years, Column months, Column days, Column hours, Column mins, Column secs)
Create local date-time from years, months, days, hours, mins and secs fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Parameters:
years
- (undocumented)months
- (undocumented)days
- (undocumented)hours
- (undocumented)mins
- (undocumented)secs
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
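The make_timestamp family shares one rule: invalid field values yield NULL when spark.sql.ansi.enabled is false and raise an error otherwise. This is a minimal sketch in plain Java of that rule for make_timestamp_ntz, not Spark's implementation: the class MakeTimestampNtzSketch and the ansiEnabled parameter are hypothetical, only whole seconds are handled, and field validation follows java.time, which may differ from Spark's at the edges.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;

// Plain-Java model of the NULL-versus-error rule; not Spark's implementation.
final class MakeTimestampNtzSketch {
    // ansiEnabled stands in for spark.sql.ansi.enabled.
    static LocalDateTime makeTimestampNtz(int year, int month, int day,
                                          int hour, int min, int sec, boolean ansiEnabled) {
        try {
            return LocalDateTime.of(year, month, day, hour, min, sec);
        } catch (DateTimeException e) {
            if (ansiEnabled) {
                throw e; // ANSI mode: invalid fields are reported as an error
            }
            return null; // non-ANSI mode: invalid fields yield NULL
        }
    }
}
```

With ansiEnabled set to false, a month value of 13 produces null; with it set to true, the same call throws.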
-
make_ym_interval
Make year-month interval from years, months.- Parameters:
years
- (undocumented)months
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_ym_interval
Make year-month interval from years.- Parameters:
years
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
make_ym_interval
Make year-month interval.- Returns:
- (undocumented)
- Since:
- 3.5.0
-
bucket
(Java-specific) A transform for any type that partitions by a hash of the input column.- Parameters:
numBuckets
- (undocumented)e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
bucket
(Java-specific) A transform for any type that partitions by a hash of the input column.- Parameters:
numBuckets
- (undocumented)e
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.0.0
-
ifnull
Returns col2 if col1 is null, or col1 otherwise.
- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
isnotnull
Returns true if col is not null, or false otherwise.
- Parameters:
col
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
equal_null
Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, and false if one of them is null.
- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
nullif
Returns null if col1 equals col2, or col1 otherwise.
- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
nullifzero
Returns null if col is equal to zero, or col otherwise.
- Parameters:
col
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
-
nvl
Returns col2 if col1 is null, or col1 otherwise.
- Parameters:
col1
- (undocumented)col2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
nvl2
Returns col2 if col1 is not null, or col3 otherwise.
- Parameters:
col1
- (undocumented)col2
- (undocumented)col3
- (undocumented)- Returns:
- (undocumented)
- Since:
- 3.5.0
-
zeroifnull
Returns zero if col is null, or col otherwise.
- Parameters:
col
- (undocumented)- Returns:
- (undocumented)
- Since:
- 4.0.0
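The null-handling functions above (ifnull, nvl, nvl2, nullif, equal_null, nullifzero, zeroifnull) are easy to confuse, so a side-by-side model may help. This is a minimal sketch in plain Java, not Spark's implementation: the class NullFunctionsSketch is hypothetical, Java null plays the role of SQL NULL, and the zero-related helpers are shown for Integer only.

```java
import java.util.Objects;

// Plain-Java model of the null-handling helpers; Java null plays the role of SQL NULL.
final class NullFunctionsSketch {
    // col2 if col1 is null, otherwise col1.
    static <T> T ifnull(T col1, T col2) { return col1 == null ? col2 : col1; }

    // Same behaviour as ifnull.
    static <T> T nvl(T col1, T col2) { return ifnull(col1, col2); }

    // col2 if col1 is not null, otherwise col3.
    static <T> T nvl2(Object col1, T col2, T col3) { return col1 != null ? col2 : col3; }

    // null if col1 equals col2, otherwise col1.
    static <T> T nullif(T col1, T col2) { return Objects.equals(col1, col2) ? null : col1; }

    // Equality that treats two nulls as equal and a single null as unequal.
    static boolean equalNull(Object col1, Object col2) { return Objects.equals(col1, col2); }

    // null if col is zero, otherwise col.
    static Integer nullifzero(Integer col) { return (col != null && col == 0) ? null : col; }

    // zero if col is null, otherwise col.
    static Integer zeroifnull(Integer col) { return col == null ? Integer.valueOf(0) : col; }
}
```

Note that nvl2 selects between its second and third arguments and never returns the first.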
-
udaf
public static <IN, BUF, OUT> UserDefinedFunction udaf(Aggregator<IN, BUF, OUT> agg, scala.reflect.api.TypeTags.TypeTag<IN> evidence$3)
Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped Data Frames.

    val agg = // Aggregator[IN, BUF, OUT]

    // declare a UDF based on agg
    val aggUDF = udaf(agg)
    val aggData = df.agg(aggUDF($"colname"))

    // register agg as a named function
    spark.udf.register("myAggName", udaf(agg))

- Parameters:
agg
- the typed Aggregatorevidence$3
- (undocumented)- Returns:
- a UserDefinedFunction that can be used as an aggregating expression.
- Note:
- The input encoder is inferred from the input type IN.
-
udaf
public static <IN, BUF, OUT> UserDefinedFunction udaf(Aggregator<IN, BUF, OUT> agg, Encoder<IN> inputEncoder)
Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped Data Frames.

    Aggregator<IN, BUF, OUT> agg = ...; // custom Aggregator
    Encoder<IN> enc = ...; // input encoder

    // declare a UDF based on agg
    UserDefinedFunction aggUDF = udaf(agg, enc);
    Dataset<Row> aggData = df.agg(aggUDF.apply(col("colname")));

    // register agg as a named function
    spark.udf().register("myAggName", udaf(agg, enc));

- Parameters:
agg
- the typed AggregatorinputEncoder
- a specific input encoder to use- Returns:
- a UserDefinedFunction that can be used as an aggregating expression
- Note:
- This overloading takes an explicit input encoder, to support UDAF declarations in Java.
-
udf
public static <RT> UserDefinedFunction udf(scala.Function0<RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$4)
Defines a Scala closure of 0 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f
- (undocumented)evidence$4
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1> UserDefinedFunction udf(scala.Function1<A1, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$5, scala.reflect.api.TypeTags.TypeTag<A1> evidence$6)
Defines a Scala closure of 1 argument as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f
- (undocumented)evidence$5
- (undocumented)evidence$6
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2> UserDefinedFunction udf(scala.Function2<A1, A2, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$7, scala.reflect.api.TypeTags.TypeTag<A1> evidence$8, scala.reflect.api.TypeTags.TypeTag<A2> evidence$9)
Defines a Scala closure of 2 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f
- (undocumented)evidence$7
- (undocumented)evidence$8
- (undocumented)evidence$9
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3> UserDefinedFunction udf(scala.Function3<A1, A2, A3, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$10, scala.reflect.api.TypeTags.TypeTag<A1> evidence$11, scala.reflect.api.TypeTags.TypeTag<A2> evidence$12, scala.reflect.api.TypeTags.TypeTag<A3> evidence$13)
Defines a Scala closure of 3 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f
- (undocumented)evidence$10
- (undocumented)evidence$11
- (undocumented)evidence$12
- (undocumented)evidence$13
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4> UserDefinedFunction udf(scala.Function4<A1, A2, A3, A4, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$14, scala.reflect.api.TypeTags.TypeTag<A1> evidence$15, scala.reflect.api.TypeTags.TypeTag<A2> evidence$16, scala.reflect.api.TypeTags.TypeTag<A3> evidence$17, scala.reflect.api.TypeTags.TypeTag<A4> evidence$18)
Defines a Scala closure of 4 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f
- (undocumented)evidence$14
- (undocumented)evidence$15
- (undocumented)evidence$16
- (undocumented)evidence$17
- (undocumented)evidence$18
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4, A5> UserDefinedFunction udf(scala.Function5<A1, A2, A3, A4, A5, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$19, scala.reflect.api.TypeTags.TypeTag<A1> evidence$20, scala.reflect.api.TypeTags.TypeTag<A2> evidence$21, scala.reflect.api.TypeTags.TypeTag<A3> evidence$22, scala.reflect.api.TypeTags.TypeTag<A4> evidence$23, scala.reflect.api.TypeTags.TypeTag<A5> evidence$24)
Defines a Scala closure of 5 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f
- (undocumented)evidence$19
- (undocumented)evidence$20
- (undocumented)evidence$21
- (undocumented)evidence$22
- (undocumented)evidence$23
- (undocumented)evidence$24
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4, A5, A6> UserDefinedFunction udf(scala.Function6<A1, A2, A3, A4, A5, A6, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$25, scala.reflect.api.TypeTags.TypeTag<A1> evidence$26, scala.reflect.api.TypeTags.TypeTag<A2> evidence$27, scala.reflect.api.TypeTags.TypeTag<A3> evidence$28, scala.reflect.api.TypeTags.TypeTag<A4> evidence$29, scala.reflect.api.TypeTags.TypeTag<A5> evidence$30, scala.reflect.api.TypeTags.TypeTag<A6> evidence$31)
Defines a Scala closure of 6 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
evidence$25 - (undocumented)
evidence$26 - (undocumented)
evidence$27 - (undocumented)
evidence$28 - (undocumented)
evidence$29 - (undocumented)
evidence$30 - (undocumented)
evidence$31 - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4, A5, A6, A7> UserDefinedFunction udf(scala.Function7<A1, A2, A3, A4, A5, A6, A7, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$32, scala.reflect.api.TypeTags.TypeTag<A1> evidence$33, scala.reflect.api.TypeTags.TypeTag<A2> evidence$34, scala.reflect.api.TypeTags.TypeTag<A3> evidence$35, scala.reflect.api.TypeTags.TypeTag<A4> evidence$36, scala.reflect.api.TypeTags.TypeTag<A5> evidence$37, scala.reflect.api.TypeTags.TypeTag<A6> evidence$38, scala.reflect.api.TypeTags.TypeTag<A7> evidence$39)
Defines a Scala closure of 7 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
evidence$32 - (undocumented)
evidence$33 - (undocumented)
evidence$34 - (undocumented)
evidence$35 - (undocumented)
evidence$36 - (undocumented)
evidence$37 - (undocumented)
evidence$38 - (undocumented)
evidence$39 - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4, A5, A6, A7, A8> UserDefinedFunction udf(scala.Function8<A1, A2, A3, A4, A5, A6, A7, A8, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$40, scala.reflect.api.TypeTags.TypeTag<A1> evidence$41, scala.reflect.api.TypeTags.TypeTag<A2> evidence$42, scala.reflect.api.TypeTags.TypeTag<A3> evidence$43, scala.reflect.api.TypeTags.TypeTag<A4> evidence$44, scala.reflect.api.TypeTags.TypeTag<A5> evidence$45, scala.reflect.api.TypeTags.TypeTag<A6> evidence$46, scala.reflect.api.TypeTags.TypeTag<A7> evidence$47, scala.reflect.api.TypeTags.TypeTag<A8> evidence$48)
Defines a Scala closure of 8 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
evidence$40 - (undocumented)
evidence$41 - (undocumented)
evidence$42 - (undocumented)
evidence$43 - (undocumented)
evidence$44 - (undocumented)
evidence$45 - (undocumented)
evidence$46 - (undocumented)
evidence$47 - (undocumented)
evidence$48 - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4, A5, A6, A7, A8, A9> UserDefinedFunction udf(scala.Function9<A1, A2, A3, A4, A5, A6, A7, A8, A9, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$49, scala.reflect.api.TypeTags.TypeTag<A1> evidence$50, scala.reflect.api.TypeTags.TypeTag<A2> evidence$51, scala.reflect.api.TypeTags.TypeTag<A3> evidence$52, scala.reflect.api.TypeTags.TypeTag<A4> evidence$53, scala.reflect.api.TypeTags.TypeTag<A5> evidence$54, scala.reflect.api.TypeTags.TypeTag<A6> evidence$55, scala.reflect.api.TypeTags.TypeTag<A7> evidence$56, scala.reflect.api.TypeTags.TypeTag<A8> evidence$57, scala.reflect.api.TypeTags.TypeTag<A9> evidence$58)
Defines a Scala closure of 9 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
evidence$49 - (undocumented)
evidence$50 - (undocumented)
evidence$51 - (undocumented)
evidence$52 - (undocumented)
evidence$53 - (undocumented)
evidence$54 - (undocumented)
evidence$55 - (undocumented)
evidence$56 - (undocumented)
evidence$57 - (undocumented)
evidence$58 - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.0
-
udf
public static <RT, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10> UserDefinedFunction udf(scala.Function10<A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$59, scala.reflect.api.TypeTags.TypeTag<A1> evidence$60, scala.reflect.api.TypeTags.TypeTag<A2> evidence$61, scala.reflect.api.TypeTags.TypeTag<A3> evidence$62, scala.reflect.api.TypeTags.TypeTag<A4> evidence$63, scala.reflect.api.TypeTags.TypeTag<A5> evidence$64, scala.reflect.api.TypeTags.TypeTag<A6> evidence$65, scala.reflect.api.TypeTags.TypeTag<A7> evidence$66, scala.reflect.api.TypeTags.TypeTag<A8> evidence$67, scala.reflect.api.TypeTags.TypeTag<A9> evidence$68, scala.reflect.api.TypeTags.TypeTag<A10> evidence$69)
Defines a Scala closure of 10 arguments as a user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
evidence$59 - (undocumented)
evidence$60 - (undocumented)
evidence$61 - (undocumented)
evidence$62 - (undocumented)
evidence$63 - (undocumented)
evidence$64 - (undocumented)
evidence$65 - (undocumented)
evidence$66 - (undocumented)
evidence$67 - (undocumented)
evidence$68 - (undocumented)
evidence$69 - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.3.0
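The Scala closure variants above can be illustrated with a minimal sketch. It assumes Spark SQL is on the classpath and a local session is acceptable; the object name `ScalaUdfSketch`, the column names, and the sample data are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical sketch object; only udf(...) and asNondeterministic() come from the API described above.
object ScalaUdfSketch {
  // Three-argument closure: input and return types are inferred from the closure's signature.
  val sum3 = udf((a: Int, b: Int, c: Int) => a + b + c)

  // The returned UDF is deterministic by default; asNondeterministic() returns a nondeterministic one.
  val jitter = udf((a: Int) => a + scala.util.Random.nextInt(10)).asNondeterministic()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // Illustrative data only.
    val df = Seq(("id1", 1, 2, 3), ("id2", 4, 5, 6)).toDF("id", "a", "b", "c")

    df.select(
      col("id"),
      sum3(col("a"), col("b"), col("c")).as("total"),
      jitter(col("a")).as("jittered")
    ).show()

    spark.stop()
  }
}
```

Marking a UDF as nondeterministic matters when the closure depends on randomness or external state, since the optimizer may otherwise assume repeated invocations return the same value.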
-
udf
Defines a Java UDF0 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF1 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF2 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF3 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF4 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF5 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF6 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF7 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF8 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF9 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
-
udf
Defines a Java UDF10 instance as a user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Parameters:
f - (undocumented)
returnType - (undocumented)
- Returns:
- (undocumented)
- Since:
- 2.3.0
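For the Java UDF variants above, the following is a rough sketch of calling the `UDF1` overload from Scala with an explicit return type. It assumes Spark SQL is on the classpath; the object name `JavaUdfSketch` and the string-length logic are hypothetical. Because there is no automatic input type coercion, the closure handles null input itself.

```scala
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Hypothetical sketch object; the UDF1 interface and udf(f, returnType) are the API described above.
object JavaUdfSketch {
  // The output data type (IntegerType) must be supplied by the caller.
  val strLen = udf(
    new UDF1[String, Integer] {
      // Return null for null input instead of failing on s.length.
      override def call(s: String): Integer =
        if (s == null) null else Integer.valueOf(s.length)
    },
    IntegerType
  )
}
```

The resulting `UserDefinedFunction` is applied to columns in the same way as one built from a Scala closure, for example `JavaUdfSketch.strLen(col("name"))` for a hypothetical string column `name`.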
-
udf
Deprecated. Scala `udf` method with return type parameter is deprecated. Please use Scala `udf` method without return type parameter. Since 3.0.0.
Defines a deterministic user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
Note that, although the Scala closure can have primitive-type function arguments, it doesn't work well with null values. Because the Scala closure is passed in as Any type, there is no type information for the function arguments. Without the type information, Spark may blindly pass null to the Scala closure with a primitive-type argument, and the closure will see the default value of the Java type for the null argument. For example, with udf((x: Int) => x, IntegerType), the result is 0 for null input.
- Parameters:
f - A closure in Scala
dataType - The output data type of the UDF
- Returns:
- (undocumented)
- Since:
- 2.0.0
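The null-handling caveat for primitive-type arguments can be reproduced in plain Scala, without Spark. The sketch below is an illustration under that assumption, not Spark's actual invocation path; the names `NullPrimitiveSketch` and `callUntyped` are hypothetical.

```scala
// Hypothetical sketch: shows how a closure with a primitive-type argument
// sees the default value when it is invoked with null through an untyped reference.
object NullPrimitiveSketch {
  // A closure with a primitive-type (Int) argument.
  val identityInt: Int => Int = (x: Int) => x

  // Store it as Any, so the argument type information is no longer available statically.
  val erased: Any = identityInt

  // Invoke the closure without knowing its argument type.
  def callUntyped(f: Any, arg: Any): Any = {
    val g = f.asInstanceOf[Any => Any]
    g(arg)
  }

  def main(args: Array[String]): Unit = {
    println(callUntyped(erased, 5))    // a regular Int argument is passed through
    println(callUntyped(erased, null)) // null is unboxed to the Int default value
  }
}
```

When nulls are expected, declaring the argument as a boxed type such as `java.lang.Integer` or as `Option[Int]` lets the closure distinguish null from 0.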
-
callUDF
Deprecated. Use call_udf.
Call a user-defined function.
- Parameters:
udfName - (undocumented)
cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 1.5.0
-
call_udf
Call a user-defined function. Example:
  import org.apache.spark.sql._

  val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
  val spark = df.sparkSession
  spark.udf.register("simpleUDF", (v: Int) => v * v)
  df.select($"id", call_udf("simpleUDF", $"value"))
- Parameters:
udfName - (undocumented)
cols - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.2.0
-
call_function
Call a SQL function.
- Parameters:
funcName - function name that follows the SQL identifier syntax (can be quoted, can be qualified)
cols - the expression parameters of the function
- Returns:
- (undocumented)
- Since:
- 3.5.0
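As a minimal sketch of `call_function`, the example below invokes the built-in SQL function `upper` by name. It assumes Spark 3.5.0 or later on the classpath; the object name `CallFunctionSketch`, the helper `upperWords`, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{call_function, col}

// Hypothetical sketch object; call_function(funcName, cols*) is the API described above.
object CallFunctionSketch {
  // Applies the SQL function "upper" to an illustrative column and collects the result.
  def upperWords(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val df = Seq("spark", "sql").toDF("word")
    df.select(call_function("upper", col("word")).as("upper_word"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("call-function-sketch").getOrCreate()
    upperWords(spark).foreach(println)
    spark.stop()
  }
}
```

Resolving the function by name is useful when the name is only known at runtime, or when the function has no dedicated method in this class.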
-
unwrap_udt
Unwrap UDT data type column into its underlying type.
- Parameters:
column - (undocumented)
- Returns:
- (undocumented)
- Since:
- 3.4.0
-