pyspark.sql.functions.theta_intersection_agg

pyspark.sql.functions.theta_intersection_agg(col)

Aggregate function: returns the compact binary representation of the DataSketches ThetaSketch that is the intersection of the Theta sketches in the input column.

New in version 4.1.0.

Parameters
col : Column or column name
Returns
Column

The binary representation of the intersected ThetaSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([1,2,2,3], "INT")
>>> df1 = df1.agg(sf.theta_sketch_agg("value").alias("sketch"))
>>> df2 = spark.createDataFrame([2,3,3,4], "INT")
>>> df2 = df2.agg(sf.theta_sketch_agg("value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.theta_sketch_estimate(sf.theta_intersection_agg("sketch"))).show()
+-----------------------------------------------------+
|theta_sketch_estimate(theta_intersection_agg(sketch))|
+-----------------------------------------------------+
|                                                    2|
+-----------------------------------------------------+
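
The result above can be cross-checked without Spark. The following is a minimal sketch that uses exact Python sets as stand-ins for the two Theta sketches; it illustrates the intersection semantics only and is not how the sketches are implemented. The names `group_a`, `group_b`, and `intersect_all` are hypothetical and are not part of the PySpark API. Real Theta sketches return approximate estimates, which happen to be exact for inputs this small.

```python
from functools import reduce

# Exact stand-ins for the two sketches built in the example above.
group_a = {1, 2, 2, 3}  # distinct values: {1, 2, 3}
group_b = {2, 3, 3, 4}  # distinct values: {2, 3, 4}


def intersect_all(sets):
    """Fold set intersection over all inputs, as an aggregate folds over rows."""
    return reduce(lambda acc, s: acc & s, sets)


common = intersect_all([group_a, group_b])
exact_estimate = len(common)
print(exact_estimate)  # 2
```

Only the values 2 and 3 appear in both inputs, so the exact count of 2 agrees with the estimate shown in the table.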