pyspark.sql.functions.theta_difference
- pyspark.sql.functions.theta_difference(col1, col2)
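- Returns the set difference of two binary representations of Datasketches ThetaSketch objects (elements in the first sketch but not in the second), computed with a Datasketches ANotB object.
- New in version 4.1.0.
- Parameters
- col1 : Column or column name
- The first ThetaSketch, whose elements are kept unless they also appear in col2.
- col2 : Column or column name
- The second ThetaSketch, whose elements are removed from col1.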
- Returns
- Column
- The binary representation of the difference ThetaSketch. 
 
- Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1,4),(2,4),(3,5),(4,5)], "struct<v1:int,v2:int>")
>>> # v1 = {1, 2, 3, 4} and v2 = {4, 5}, so v1 minus v2 = {1, 2, 3}
>>> df = df.agg(
...     sf.theta_sketch_agg("v1").alias("sketch1"),
...     sf.theta_sketch_agg("v2").alias("sketch2")
... )
>>> df.select(sf.theta_sketch_estimate(sf.theta_difference(df.sketch1, "sketch2"))).show()
+---------------------------------------------------------+
|theta_sketch_estimate(theta_difference(sketch1, sketch2))|
+---------------------------------------------------------+
|                                                        3|
+---------------------------------------------------------+