pyspark.sql.functions.hll_sketch_estimate

pyspark.sql.functions.hll_sketch_estimate(col: ColumnOrName) → pyspark.sql.column.Column

Returns the estimated number of unique values given the binary representation of a DataSketches HllSketch, such as the output of hll_sketch_agg.

New in version 3.5.0.

Parameters
col : Column or str

Column holding the binary representation of a DataSketches HllSketch.

Returns
Column

The estimated number of unique values for the HllSketch.

Examples

>>> from pyspark.sql.functions import hll_sketch_agg, hll_sketch_estimate
>>> df = spark.createDataFrame([1, 2, 2, 3], "INT")
>>> df = df.agg(hll_sketch_estimate(hll_sketch_agg("value")).alias("distinct_cnt"))
>>> df.show()
+------------+
|distinct_cnt|
+------------+
|           3|
+------------+