histogram {SparkR}    R Documentation

Histogram

Description

This function computes a histogram for a given SparkR Column.

Usage

## S4 method for signature 'SparkDataFrame,characterOrColumn'
histogram(df, col, nbins = 10)

Arguments

df

the SparkDataFrame containing the Column to build the histogram from.

nbins

the number of bins (optional). Default value is 10.

col

the column to build the histogram from, given either as a Column or as a character column name.

Value

a data.frame with the histogram statistics, i.e., counts and centroids.
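
To illustrate what "counts and centroids" mean, the following base-R sketch computes equal-width bin counts and bin centroids locally on the built-in iris data. It is not SparkR's implementation: the variable names (lo, hi, width, idx) and the handling of values that fall exactly on a bin edge are assumptions made for this illustration, and the data.frame returned by SparkR may be laid out differently.

```r
# Local illustration only (assumed equal-width binning); no Spark session needed.
x <- iris$Sepal.Length
nbins <- 12

lo <- min(x)
hi <- max(x)
width <- (hi - lo) / nbins

# 0-based bin index; the maximum value is folded into the last bin.
idx <- pmin(floor((x - lo) / width), nbins - 1)

# Number of observations per bin, including empty bins.
counts <- tabulate(idx + 1, nbins = nbins)

# Midpoint of each bin.
centroids <- lo + (seq_len(nbins) - 0.5) * width

histStats <- data.frame(counts = counts, centroids = centroids)
print(histStats)
```

Each row describes one bin, so the counts column sums to the number of observations and the centroids column can be used directly as the x-axis of a bar chart.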

See Also

Other SparkDataFrame functions: SparkDataFrame-class, [[, agg, arrange, as.data.frame, attach, cache, collect, colnames, coltypes, columns, count, dapply, describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, except, explain, filter, first, group_by, head, insertInto, intersect, isLocal, join, limit, merge, mutate, ncol, persist, printSchema, registerTempTable, rename, repartition, sample, saveAsTable, selectExpr, select, showDF, show, str, take, unionAll, unpersist, withColumn, write.df, write.jdbc, write.json, write.parquet, write.text

Examples

## Not run: 
##D 
##D # Create a SparkDataFrame from the Iris dataset
##D irisDF <- createDataFrame(iris)
##D 
##D # Compute histogram statistics
##D histStats <- histogram(irisDF, irisDF$Sepal_Length, nbins = 12)
##D 
##D # Once SparkR has computed the histogram statistics, the histogram can be
##D # rendered using the ggplot2 library:
##D 
##D library(ggplot2)
##D plot <- ggplot(histStats, aes(x = centroids, y = counts)) +
##D         geom_bar(stat = "identity") +
##D         xlab("Sepal_Length") + ylab("Frequency")
##D print(plot)  # draw the chart
## End(Not run) 

[Package SparkR version 2.0.0 Index]