approxQuantile {SparkR} R Documentation

## Calculates the approximate quantiles of a numerical column of a SparkDataFrame

### Description

Calculates the approximate quantiles of a numerical column of a SparkDataFrame. The result of this algorithm has the following deterministic bound: if the SparkDataFrame has N elements and we request the quantile at probability 'p' up to error 'err', then the algorithm returns a sample 'x' from the SparkDataFrame such that the *exact* rank of 'x' is close to (p * N). More precisely, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first presented in [Space-efficient Online Computation of Quantile Summaries](http://dx.doi.org/10.1145/375663.375670) by Greenwald and Khanna.
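The rank bound can be checked by hand on a small local vector. The following is a minimal sketch in base R, not part of SparkR: the helper names `rank_window` and `within_bound` are hypothetical, and it assumes that rank(x) means the number of elements less than or equal to x.

```r
# Inclusive window of exact ranks permitted by the documented bound:
# floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
rank_window <- function(p, err, n) {
  c(lower = floor((p - err) * n), upper = ceiling((p + err) * n))
}

# TRUE when the exact rank of `x` within `values` falls inside the window.
# Assumption: rank(x) is the count of elements <= x.
within_bound <- function(x, values, p, err) {
  w <- rank_window(p, err, length(values))
  r <- sum(values <= x)
  r >= w[["lower"]] && r <= w[["upper"]]
}

values <- 1:1000

# For the median (p = 0.5) with err = 0.01, the window spans roughly
# ranks 490 to 510, so 505 is acceptable and 520 is not.
ok  <- within_bound(505, values, p = 0.5, err = 0.01)
bad <- within_bound(520, values, p = 0.5, err = 0.01)
print(c(ok = ok, bad = bad))
```

Under these assumptions, any value whose exact rank lies inside the window would be a valid answer for the requested quantile, which is why two runs with the same error target need not agree on the exact value returned.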

### Usage

```
## S4 method for signature 'SparkDataFrame,character,numeric,numeric'
approxQuantile(x, col, probabilities, relativeError)
```

### Arguments

- `x`: A SparkDataFrame.
- `col`: The name of the numerical column.
- `probabilities`: A list of quantile probabilities. Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
- `relativeError`: The relative target precision to achieve (>= 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
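The argument constraints above can be enforced before any Spark job is launched. The following is a sketch only: `checked_approx_quantile` is a hypothetical wrapper, not part of SparkR, and it assumes the SparkR package is installed and that `df` is an existing SparkDataFrame.

```r
# Hypothetical helper (not part of SparkR): validate the arguments as
# described above, then delegate to SparkR::approxQuantile().
checked_approx_quantile <- function(df, col, probabilities, relativeError) {
  stopifnot(
    is.character(col), length(col) == 1,
    is.numeric(probabilities),
    all(probabilities >= 0 & probabilities <= 1),  # each must lie in [0, 1]
    is.numeric(relativeError), length(relativeError) == 1,
    relativeError >= 0                             # 0 requests exact quantiles
  )
  SparkR::approxQuantile(df, col, probabilities, relativeError)
}

# Usage, assuming `df` is a SparkDataFrame with a numeric column "age":
# checked_approx_quantile(df, "age", c(0, 0.5, 1), 0.01)
```

Failing early on an out-of-range probability or a negative error keeps the mistake on the R side rather than surfacing it from the Spark backend.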

### Value

The approximate quantiles at the given probabilities.

### Note

approxQuantile since 2.0.0

### See Also

Other stat functions: `corr`, `corr,Column-method`, `corr,SparkDataFrame-method`; `cov`, `cov,SparkDataFrame-method`, `cov,characterOrColumn-method`, `covar_samp`, `covar_samp,characterOrColumn,characterOrColumn-method`; `crosstab`, `crosstab,SparkDataFrame,character,character-method`; `freqItems`, `freqItems,SparkDataFrame,character-method`; `sampleBy`, `sampleBy,SparkDataFrame,character,list,numeric-method`
``````## Not run: