pyspark.sql.functions.tuple_intersection_agg_integer
- pyspark.sql.functions.tuple_intersection_agg_integer(col, mode=None)
Aggregate function: returns the compact binary representation of the Datasketches TupleSketch that is the intersection of the integer TupleSketch objects in the input column.
New in version 4.2.0.
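The intersection semantics can be illustrated with a minimal pure-Python sketch, under stated assumptions. Each sketch is modeled as an exact dictionary from key to integer summary. Real Datasketches TupleSketches are approximate and may retain only a sample of keys, so this is not the Datasketches algorithm. Summaries of keys that appear in every input are added together, which is assumed to match the default sum mode shown in the example output. The helper intersect_sum is hypothetical and is not part of PySpark.

```python
from functools import reduce
from typing import Dict, List

# Hypothetical exact model of an integer TupleSketch: key -> integer summary.
# Real Datasketches sketches are approximate and may keep only a sample of keys.
Sketch = Dict[int, int]


def intersect_sum(sketches: List[Sketch]) -> Sketch:
    """Keep only keys present in every sketch, adding their summaries (assumed "sum" mode)."""

    def pairwise(acc: Sketch, nxt: Sketch) -> Sketch:
        # Set intersection of the keys; summaries of matching keys are summed.
        return {k: acc[k] + nxt[k] for k in acc.keys() & nxt.keys()}

    # reduce() needs at least one sketch; a single sketch is returned unchanged.
    return reduce(pairwise, sketches)


# Same data as the Spark example: each key appears once per DataFrame.
sketch1: Sketch = {1: 10, 2: 20, 3: 30}
sketch2: Sketch = {2: 40, 3: 50, 4: 60}

result = intersect_sum([sketch1, sketch2])
estimate = float(len(result))  # exact model: the estimate is the retained key count

print(sorted(result.items()))  # [(2, 60), (3, 80)]
print(estimate)  # 2.0
```

Under this model, a key missing from any one input drops out of the result, which is why key 1 (only in the first DataFrame) and key 4 (only in the second) do not contribute to the 2.0 estimate.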
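- Parameters
col : Column or column name
The column containing the integer TupleSketch binary representations to intersect.
mode : str, optional
The mode used to combine the integer summary values of keys common to all sketches. When omitted, it defaults to sum, as the column name in the example output shows.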
- Returns
Column
The binary representation of the intersected TupleSketch.
See also
Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([(1, 10), (2, 20), (3, 30)], ["key", "value"])
>>> df1 = df1.agg(sf.tuple_sketch_agg_integer("key", "value").alias("sketch"))
>>> df2 = spark.createDataFrame([(2, 40), (3, 50), (4, 60)], ["key", "value"])
>>> df2 = df2.agg(sf.tuple_sketch_agg_integer("key", "value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.tuple_sketch_estimate_integer(sf.tuple_intersection_agg_integer("sketch"))).show()
+--------------------------------------------------------------------------+
|tuple_sketch_estimate_integer(tuple_intersection_agg_integer(sketch, sum))|
+--------------------------------------------------------------------------+
|                                                                       2.0|
+--------------------------------------------------------------------------+