pyspark.sql.functions.theta_intersection_agg
- pyspark.sql.functions.theta_intersection_agg(col)
- Aggregate function: returns the compact binary representation of the DataSketches ThetaSketch that is the intersection of the Theta sketches in the input column.
- New in version 4.1.0.
- Parameters
- col: Column or column name
- The column containing the binary Theta sketches to intersect.
- Returns
- Column
- The binary representation of the intersected ThetaSketch. 
 
- See also
- Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([1,2,2,3], "INT")
>>> df1 = df1.agg(sf.theta_sketch_agg("value").alias("sketch"))
>>> df2 = spark.createDataFrame([2,3,3,4], "INT")
>>> df2 = df2.agg(sf.theta_sketch_agg("value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.theta_sketch_estimate(sf.theta_intersection_agg("sketch"))).show()
+-----------------------------------------------------+
|theta_sketch_estimate(theta_intersection_agg(sketch))|
+-----------------------------------------------------+
|                                                    2|
+-----------------------------------------------------+
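The estimate of 2 in the example corresponds to the distinct values shared by both inputs, {2, 3}. The following pure-Python toy model is a rough sketch of how a theta-style intersection could arrive at that number. It is not the DataSketches implementation and does not use Spark: the helpers `toy_sketch`, `toy_intersection`, and `toy_estimate`, the SHA-256 based hash, and the cap `k` are all hypothetical names and choices made for illustration only.

```python
import hashlib


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (illustrative hash, an assumption)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result is strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def toy_sketch(values, k=4096):
    """Return (theta, retained): at most k smallest distinct hashes, all below theta."""
    hashes = sorted({_hash01(v) for v in values})
    if len(hashes) <= k:
        # Nothing was discarded, so the sketch still covers the whole hash space.
        return 1.0, set(hashes)
    # The (k+1)-th smallest hash becomes the sampling threshold theta.
    return hashes[k], set(hashes[:k])


def toy_intersection(a, b):
    """Intersect two toy sketches: smallest theta, hashes retained by both."""
    theta = min(a[0], b[0])
    retained = {h for h in a[1] & b[1] if h < theta}
    return theta, retained


def toy_estimate(sketch):
    """Estimate the distinct count as retained entries scaled by 1 / theta."""
    theta, retained = sketch
    return len(retained) / theta


s1 = toy_sketch([1, 2, 2, 3])  # distinct values {1, 2, 3}
s2 = toy_sketch([2, 3, 3, 4])  # distinct values {2, 3, 4}
estimate = toy_estimate(toy_intersection(s1, s2))
print(estimate)  # 2.0, matching the shared values {2, 3}
```

With inputs this small nothing is sampled away, so theta stays at 1.0 and the estimate is exact. On larger inputs a sketch of this kind would return an approximation rather than the exact count.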