
This function computes histogram statistics for a given SparkR Column. It returns the statistics rather than drawing a plot.

Usage

# S4 method for SparkDataFrame,characterOrColumn
histogram(df, col, nbins = 10)

Arguments

df

the SparkDataFrame containing the Column to build the histogram from.

col

the column to build the histogram from, given either as a character string naming the column or as a Column.

nbins

the number of bins (optional). Default value is 10.

Value

a data.frame with the histogram statistics, one row per bin, including the columns counts and centroids.
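To build intuition for what counts and centroids mean, the following base R sketch computes comparable statistics for an ordinary in-memory numeric vector. It assumes equal-width bins spanning the minimum and maximum of the data, with each centroid at the midpoint of its bin. `local_histogram` is a hypothetical helper, not part of SparkR, and SparkR's own implementation may round values or assign boundary values differently.

```r
# Hypothetical helper, not part of SparkR: approximates the statistics
# returned by histogram() for an in-memory numeric vector, assuming
# equal-width bins between the minimum and maximum values.
local_histogram <- function(x, nbins = 10) {
  stopifnot(nbins >= 2)
  nbins <- floor(nbins)
  x <- x[!is.na(x)]                  # ignore missing values
  lo <- min(x)
  hi <- max(x)
  stopifnot(hi > lo)                 # needs at least two distinct values
  binsize <- (hi - lo) / nbins
  # Zero-based bin index: a value on an interior boundary goes to the lower
  # bin; clamping keeps the minimum in bin 0 and guards the maximum against
  # floating-point overshoot past the last bin.
  bins <- ceiling((x - lo) / binsize) - 1
  bins <- pmin(pmax(bins, 0), nbins - 1)
  counts <- tabulate(bins + 1, nbins = nbins)
  data.frame(counts = counts,
             # Each centroid is the midpoint of its bin.
             centroids = lo + binsize * (seq_len(nbins) - 1) + binsize / 2)
}

# Usage on the local iris data with 12 bins, as in the Examples section
stats <- local_histogram(iris$Sepal.Length, nbins = 12)
```

Because every non-missing value lands in exactly one bin, the counts in the sketch always sum to the number of non-missing values.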

Note

histogram since 2.0.0

Examples

if (FALSE) {

# Create a SparkDataFrame from the Iris dataset
irisDF <- createDataFrame(iris)

# Compute histogram statistics
histStats <- histogram(irisDF, irisDF$Sepal_Length, nbins = 12)

# Once SparkR has computed the histogram statistics, the histogram can be
# rendered using the ggplot2 library:

library(ggplot2)
histPlot <- ggplot(histStats, aes(x = centroids, y = counts)) +
        geom_bar(stat = "identity") +
        xlab("Sepal_Length") + ylab("Frequency")
print(histPlot)
}
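When the data is small enough to hold in memory, a local cross-check with base R can help sanity-check the result. This sketch runs on the local iris data frame (where the column is named Sepal.Length rather than Sepal_Length) and uses 12 equal-width breaks to mirror nbins = 12. The counts and mids reported by hist() play the same roles as the counts and centroids columns, though small differences at bin boundaries are possible because the two implementations are independent.

```r
# Local cross-check on the in-memory iris data (not a SparkDataFrame).
x <- iris$Sepal.Length
nbins <- 12
# nbins equal-width bins need nbins + 1 break points from min to max.
breaks <- seq(min(x), max(x), length.out = nbins + 1)
h <- hist(x, breaks = breaks, plot = FALSE)
# Arrange the result like the statistics returned by histogram().
localStats <- data.frame(counts = h$counts, centroids = h$mids)
```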