freqItems {SparkR} R Documentation

Finding frequent items for columns, possibly with false positives

Description

Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou and described in http://dx.doi.org/10.1145/762471.762473.
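To illustrate why false positives can occur, the following is a minimal base R sketch of a one-pass, counter-based frequent-items scan in the spirit of the algorithm cited above. It is not Spark's implementation: the helper name freq_items_sketch and the choice of ceiling(1 / support) counters are assumptions made for illustration only.

```r
# Hypothetical helper, not part of SparkR: a one-pass counter-based scan.
# Assumes `values` is an atomic vector of non-empty items; NAs are dropped.
freq_items_sketch <- function(values, support = 0.01) {
  stopifnot(support > 0, support <= 1)
  values <- as.character(values[!is.na(values)])
  max_counters <- ceiling(1 / support)  # assumed bound on tracked items
  counts <- integer(0)                  # named vector: item -> counter
  for (v in values) {
    if (v %in% names(counts)) {
      # Item is already tracked: bump its counter.
      counts[v] <- counts[v] + 1L
    } else if (length(counts) < max_counters) {
      # Room for a new counter: start tracking the item.
      counts[v] <- 1L
    } else {
      # All counters in use: decrement every counter, drop those at zero,
      # and discard the incoming item.
      counts <- counts - 1L
      counts <- counts[counts > 0L]
    }
  }
  if (length(counts) == 0L) character(0) else names(counts)
}

# "a" makes up 60% of the input and survives the scan.
freq_items_sketch(c(rep("a", 6), "b", "c", "d", "e"), support = 0.5)

# "b" makes up only 25% of the input but is still reported:
# a false positive, since the scan never verifies true counts.
freq_items_sketch(c("a", "a", "a", "b"), support = 0.5)
```

In this sketch, every item whose relative frequency is at least support is retained, but items below the threshold may also remain in the counters at the end of the pass, which is the source of the false positives.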

Usage

## S4 method for signature 'SparkDataFrame,character'
freqItems(x, cols, support = 0.01)

Arguments

x

A SparkDataFrame.

cols

A vector of column names in which to search for frequent items.

support

(Optional) The minimum frequency for an item to be considered frequent. Should be greater than 1e-4. Default support = 0.01.

Value

A local R data.frame containing the frequent items found in each column.

Note

freqItems since 1.6.0

See Also

Other stat functions: approxQuantile, approxQuantile,SparkDataFrame,character,numeric,numeric-method; corr, corr,Column-method, corr,SparkDataFrame-method; cov, cov,SparkDataFrame-method, cov,characterOrColumn-method, covar_samp, covar_samp,characterOrColumn,characterOrColumn-method; crosstab, crosstab,SparkDataFrame,character,character-method; sampleBy, sampleBy,SparkDataFrame,character,list,numeric-method

Examples

## Not run: 
df <- read.json("/path/to/file.json")
fi <- freqItems(df, c("title", "gender"))
## End(Not run)

[Package SparkR version 2.1.0 Index]