spark.kmeans {SparkR}    R Documentation

K-Means Clustering Model

Description

Fits a k-means clustering model on a SparkDataFrame, similarly to R's kmeans(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.kmeans(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.kmeans(data, formula, k = 2,
  maxIter = 20, initMode = c("k-means||", "random"))

## S4 method for signature 'KMeansModel'
summary(object, ...)

## S4 method for signature 'KMeansModel'
predict(object, newData)

## S4 method for signature 'KMeansModel,character'
write.ml(object, path, overwrite = FALSE)
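
The vector default shown for initMode in the signature above follows a common R idiom: a function lists its allowed values, and match.arg() picks the first one when the caller supplies nothing. The sketch below illustrates that idiom in plain R. It assumes spark.kmeans resolves initMode this way, which this page does not confirm, and resolve_init_mode is a hypothetical helper, not part of SparkR.

```r
# Hypothetical helper (not part of SparkR) illustrating the match.arg() idiom.
# Assumption: spark.kmeans resolves initMode in a similar way.
resolve_init_mode <- function(initMode = c("k-means||", "random")) {
  # With no argument supplied, match.arg() returns the first allowed value;
  # with an argument, it checks the value against the allowed set.
  match.arg(initMode)
}

default_mode <- resolve_init_mode()          # first choice in the vector
random_mode  <- resolve_init_mode("random")  # explicitly requested value

# A value outside the allowed set raises an error.
bad_mode <- try(resolve_init_mode("bogus"), silent = TRUE)

print(default_mode)
print(random_mode)
print(inherits(bad_mode, "try-error"))
```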

Arguments

data

SparkDataFrame for training

formula

A symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that because k-means is unsupervised, the response variable of formula is empty in spark.kmeans.

k

Number of centers

maxIter

Maximum number of iterations

initMode

The initialization algorithm chosen to fit the model

object

A fitted k-means model

path

The directory where the model is saved

overwrite

Whether to overwrite the output path if it already exists. The default is FALSE, which throws an exception if the output path exists.
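
The formula argument is described as having an empty response. In plain R, a formula with an empty response is written one-sided, with nothing to the left of '~'. The sketch below only inspects formula objects in base R and does not call SparkR. The column names are taken from the Examples section, and has_response is a hypothetical helper introduced for illustration. How spark.kmeans treats a left-hand side, if one is given, is not stated on this page.

```r
# Hypothetical helper: a two-sided formula has three parts (`~`, lhs, rhs),
# a one-sided formula has only two (`~`, rhs).
has_response <- function(f) length(f) == 3L

# One-sided formula: empty response, two feature columns.
one_sided <- ~ Sepal_Length + Sepal_Width

# Two-sided formula, in the form used in the Examples section.
two_sided <- Sepal_Length ~ Sepal_Width

print(has_response(one_sided))
print(has_response(two_sided))

# Variables referenced by the one-sided formula.
feature_names <- all.vars(one_sided)
print(feature_names)
```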

Value

spark.kmeans returns a fitted k-means model

summary returns a list containing the model's coefficients (the cluster centers), size (the number of data points in each cluster), and cluster

predict returns a SparkDataFrame containing the predicted cluster for each row, in a column named prediction
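
Since spark.kmeans is described as similar to R's kmeans(), a local base R fit can give a rough feel for the kind of values involved. The sketch below uses stats::kmeans on the local iris data frame, not SparkR, so it needs no Spark session. Its centers, size, and cluster components are offered only as a loose analogue of the summary values listed above. The seed and the choice of columns are arbitrary assumptions made for reproducibility.

```r
# Base R analogue (stats::kmeans), NOT the SparkR implementation.
set.seed(42)  # arbitrary seed so the random initial centers are reproducible

# Two numeric feature columns from the local iris data frame.
features <- iris[, c("Sepal.Length", "Sepal.Width")]

# Four clusters, at most 20 iterations (mirroring k = 4 and maxIter = 20).
fit <- kmeans(features, centers = 4, iter.max = 20)

print(fit$centers)        # one row per cluster center, one column per feature
print(fit$size)           # number of rows assigned to each cluster
print(head(fit$cluster))  # cluster index assigned to the first few rows
```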

Note

spark.kmeans since 2.0.0

summary(KMeansModel) since 2.0.0

predict(KMeansModel) since 2.0.0

write.ml(KMeansModel, character) since 2.0.0

See Also

predict, read.ml, write.ml

Examples

## Not run: 
##D sparkR.session()
##D data(iris)
##D df <- createDataFrame(iris)
##D model <- spark.kmeans(df, Sepal_Length ~ Sepal_Width, k = 4, initMode = "random")
##D summary(model)
##D 
##D # fitted values on training data
##D fitted <- predict(model, df)
##D head(select(fitted, "Sepal_Length", "prediction"))
##D 
##D # save the fitted model to the given path
##D path <- "path/to/model"
##D write.ml(model, path)
##D 
##D # read back the saved model and print its summary
##D savedModel <- read.ml(path)
##D summary(savedModel)
## End(Not run)

[Package SparkR version 2.0.0 Index]