pyspark.sql.functions.tuple_difference_theta_double
- pyspark.sql.functions.tuple_difference_theta_double(col1, col2)
Subtracts a DataSketches ThetaSketch from a TupleSketch with double summaries. The result is a TupleSketch containing the entries whose keys are in the TupleSketch but not in the ThetaSketch.
New in version 4.2.0.
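- Parameters
col1 : Column or column name
The TupleSketch with double summaries to subtract from.
col2 : Column or column name
The ThetaSketch whose keys are removed from col1.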
- Returns
Column
The binary representation of the difference TupleSketch.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(5, 5.0, 4), (1, 1.0, 4), (2, 2.0, 5), (3, 3.0, 1)], ["key1", "v1", "key2"])  # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_double("key1", "v1").alias("sketch1"),
...     sf.theta_sketch_agg("key2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_double(sf.tuple_difference_theta_double(df.sketch1, "sketch2"))).show()  # noqa
+-----------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_difference_theta_double(sketch1, sketch2))|
+-----------------------------------------------------------------------------+
|                                                                          2.0|
+-----------------------------------------------------------------------------+
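To see why the estimate in the example is 2.0, the same rows can be pushed through an exact, non-sketch model of the difference. This is only an illustrative sketch in plain Python: the helper `tuple_minus_theta` is hypothetical and not part of PySpark, a dict stands in for the TupleSketch (key to double summary), and a set stands in for the ThetaSketch. Real sketches are approximate, so their estimates match exact counts only when the input is small, as it is here.

```python
def tuple_minus_theta(tuple_entries, theta_keys):
    """Hypothetical exact analogue of the difference: keep the entries
    whose key is absent from theta_keys, along with their summaries."""
    return {k: v for k, v in tuple_entries.items() if k not in theta_keys}


# Same rows as the example: (key1, v1, key2).
rows = [(5, 5.0, 4), (1, 1.0, 4), (2, 2.0, 5), (3, 3.0, 1)]

# key1 values are unique here, so a plain dict is enough to model the
# TupleSketch; no summary merging for repeated keys is modelled.
tuple_entries = {key1: v1 for key1, v1, _ in rows}

# Distinct key2 values model the ThetaSketch: {1, 4, 5}.
theta_keys = {key2 for _, _, key2 in rows}

difference = tuple_minus_theta(tuple_entries, theta_keys)
print(sorted(difference.items()))  # [(2, 2.0), (3, 3.0)]
print(float(len(difference)))      # 2.0
```

Keys 5 and 1 appear in both inputs and are dropped, leaving keys 2 and 3, which is consistent with the estimate of 2.0 shown above.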