pyspark.sql.functions.tuple_difference_theta_double

pyspark.sql.functions.tuple_difference_theta_double(col1, col2)

Subtracts a DataSketches ThetaSketch from a TupleSketch with double summaries, returning a TupleSketch of the elements that are in the TupleSketch but not in the ThetaSketch.

New in version 4.2.0.

Parameters
col1 : Column or column name

The TupleSketch column with double summaries

col2 : Column or column name

The ThetaSketch column

Returns
Column

The binary representation of the difference TupleSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(5, 5.0, 4), (1, 1.0, 4), (2, 2.0, 5), (3, 3.0, 1)], ["key1", "v1", "key2"])  # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_double("key1", "v1").alias("sketch1"),
...     sf.theta_sketch_agg("key2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_double(sf.tuple_difference_theta_double(df.sketch1, "sketch2"))).show()  # noqa
+-----------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_difference_theta_double(sketch1, sketch2))|
+-----------------------------------------------------------------------------+
|                                                                          2.0|
+-----------------------------------------------------------------------------+
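
The estimate of 2.0 above can be checked by hand with a plain-Python model of the set difference. This is only a conceptual sketch, not how the sketches are implemented: a dict stands in for the TupleSketch (key to double summary) and a set stands in for the ThetaSketch, so the result is exact rather than approximate. The names `tuple_model`, `theta_model`, and `difference` are hypothetical, and the model assumes that retained entries keep their summaries from the TupleSketch side.

```python
# Conceptual model only: real sketches are compact, probabilistic structures.
# Same rows as the example above: (key1, v1, key2).
rows = [(5, 5.0, 4), (1, 1.0, 4), (2, 2.0, 5), (3, 3.0, 1)]

# Stand-in for the TupleSketch: each key carries a double summary.
tuple_model = {key1: v1 for key1, v1, _ in rows}

# Stand-in for the ThetaSketch: distinct keys only, no summaries.
theta_model = {key2 for _, _, key2 in rows}

# Difference: keep entries whose key is absent from the theta side.
difference = {k: v for k, v in tuple_model.items() if k not in theta_model}

# The distinct-count estimate is the number of retained keys.
estimate = float(len(difference))

print(sorted(difference.items()))  # [(2, 2.0), (3, 3.0)]
print(estimate)                    # 2.0
```

Keys 5 and 1 appear in both columns and are removed, key 4 exists only on the theta side and has no effect, and keys 2 and 3 remain, which matches the estimate shown in the example output.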