pyspark.sql.functions.xxhash64

pyspark.sql.functions.xxhash64(*cols: ColumnOrName) → pyspark.sql.column.Column

Calculates the hash code of the given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long (signed 64-bit integer) column. The hash computation uses an initial seed of 42.
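The description above can be made concrete with a pure-Python sketch of the standard XXH64 algorithm. The sketch relies on assumptions taken from how Spark's hash expressions are commonly described, not from this page. First, string columns are hashed as their UTF-8 bytes. Second, the hash of each column becomes the seed for the next column. Third, null values leave the running hash unchanged. `xxh64`, `to_signed64`, and `spark_style_xxhash64` are hypothetical helpers, not part of PySpark. The sketch covers string inputs only, because other data types are hashed differently. Check its output against the doctest values before relying on it.

```python
MASK64 = (1 << 64) - 1

# Standard XXH64 prime constants.
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _round(acc: int, lane: int) -> int:
    """Mix one 8-byte little-endian lane into an accumulator."""
    acc = (acc + lane * P2) & MASK64
    acc = _rotl(acc, 31)
    return (acc * P1) & MASK64


def _merge_round(acc: int, val: int) -> int:
    """Fold one of the four stripe accumulators into the hash."""
    acc ^= _round(0, val)
    return (acc * P1 + P4) & MASK64


def xxh64(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 64-bit XXH64 digest of data."""
    seed &= MASK64  # accept negative (signed long) seeds
    n, i = len(data), 0
    if n >= 32:
        # Four parallel accumulators over 32-byte stripes.
        v1 = (seed + P1 + P2) & MASK64
        v2 = (seed + P2) & MASK64
        v3 = seed
        v4 = (seed - P1) & MASK64
        while i + 32 <= n:
            v1 = _round(v1, int.from_bytes(data[i:i + 8], "little"))
            v2 = _round(v2, int.from_bytes(data[i + 8:i + 16], "little"))
            v3 = _round(v3, int.from_bytes(data[i + 16:i + 24], "little"))
            v4 = _round(v4, int.from_bytes(data[i + 24:i + 32], "little"))
            i += 32
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK64
        for v in (v1, v2, v3, v4):
            h = _merge_round(h, v)
    else:
        h = (seed + P5) & MASK64
    h = (h + n) & MASK64
    # Remaining 8-byte lanes.
    while i + 8 <= n:
        h ^= _round(0, int.from_bytes(data[i:i + 8], "little"))
        h = (_rotl(h, 27) * P1 + P4) & MASK64
        i += 8
    # At most one remaining 4-byte word.
    if i + 4 <= n:
        h ^= (int.from_bytes(data[i:i + 4], "little") * P1) & MASK64
        h = (_rotl(h, 23) * P2 + P3) & MASK64
        i += 4
    # Trailing single bytes.
    while i < n:
        h ^= (data[i] * P5) & MASK64
        h = (_rotl(h, 11) * P1) & MASK64
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & MASK64
    h ^= h >> 29
    h = (h * P3) & MASK64
    h ^= h >> 32
    return h


def to_signed64(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed long."""
    return u - (1 << 64) if u >= (1 << 63) else u


def spark_style_xxhash64(*values, seed: int = 42) -> int:
    """Chain XXH64 over string values (assumed Spark behavior, see lead-in)."""
    h = seed
    for v in values:
        if v is None:
            continue  # assumption: a null leaves the running hash unchanged
        h = to_signed64(xxh64(v.encode("utf-8"), h))
    return h
```

For example, `spark_style_xxhash64('ABC', 'DEF')` is intended to correspond to `xxhash64('c1', 'c2')` on the example DataFrame. Because the running hash is reused as the seed, swapping the order of the columns generally changes the result.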

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : Column or str

one or more columns to compute on.

Returns
Column

hash value as a long column.

Examples

>>> from pyspark.sql.functions import xxhash64
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])

Hash for one column

>>> df.select(xxhash64('c1').alias('hash')).show()
+-------------------+
|               hash|
+-------------------+
|4105715581806190027|
+-------------------+

Hash for two or more columns

>>> df.select(xxhash64('c1', 'c2').alias('hash')).show()
+-------------------+
|               hash|
+-------------------+
|3233247871021311208|
+-------------------+