pyspark.sql.DataFrameStatFunctions.freqItems

DataFrameStatFunctions.freqItems(cols, support=None)

Finds frequent items for columns, possibly with false positives. Uses the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou and described in https://doi.org/10.1145/762471.762473. DataFrame.freqItems() and DataFrameStatFunctions.freqItems() are aliases.

New in version 1.4.0.

Parameters
cols : list or tuple

Names of the columns to calculate frequent items for as a list or tuple of strings.

support : float, optional

The minimum fraction of rows in which an item must appear to be considered ‘frequent’. Default is 1% (0.01). The support must be greater than 1e-4.
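As a rough illustration of how the two parameters fit together, the sketch below wraps the call in a small helper. The helper name frequent_items, the demo function, and the sample data are hypothetical and not part of PySpark; the only PySpark calls assumed are DataFrame.freqItems(), DataFrame.first(), and Row.asDict().

```python
def frequent_items(df, cols, support=0.01):
    """Return the frequent items of each column as a plain dict.

    `df` is expected to be a pyspark.sql.DataFrame. The result DataFrame
    of freqItems() has a single row, so first() is enough to read it.
    """
    row = df.freqItems(cols, support).first()
    # Keys are whatever column names Spark chose for the result; the
    # schema is not guaranteed to stay stable, so do not hard-code them.
    return row.asDict()


def demo():
    """Hypothetical local demo; requires PySpark and a Java runtime."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("freqItems-demo")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"]
    )
    # Ask for items that appear in at least 40% of the rows.
    print(frequent_items(df, ["c1", "c2"], support=0.4))
    spark.stop()
```

Calling demo() in an environment with PySpark installed prints one list of candidate items per requested column. Because the result may contain false positives, treat the lists as candidates rather than exact answers.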

Notes

This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
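To see why the result can contain false positives, it may help to look at a minimal pure-Python sketch of a counter-based frequent element algorithm in the style of the one cited above. This is an illustration under simplifying assumptions (single machine, one pass, a hypothetical function name frequent_candidates), not Spark's actual implementation.

```python
import math


def frequent_candidates(items, support):
    """One-pass candidate search using a bounded set of counters.

    Every item whose relative frequency is at least `support` is
    guaranteed to be returned; some less frequent items may be
    returned as well (false positives).
    """
    if not 0 < support <= 1:
        raise ValueError("support must be in (0, 1]")
    capacity = math.floor(1 / support)  # maximum number of counters kept
    counters = {}
    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return list(counters)


stream = ["a", "b", "a", "c", "a", "a", "b", "a", "b", "a"]
candidates = frequent_candidates(stream, support=0.5)
```

Here "a" makes up 60% of the stream and is guaranteed to be among the candidates, while "b" (30%) also survives as a false positive. A second pass that counts the candidates exactly would be needed to filter such items out.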