spark.kmeans {SparkR} | R Documentation |
Fits a k-means clustering model against a Spark DataFrame, similarly to R's kmeans(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.
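For comparison, here is a minimal local sketch of the base R function that the description refers to. It uses stats::kmeans() on the built-in iris data and needs no Spark session. Treating centers as the counterpart of k and iter.max as the counterpart of maxIter is an assumption made for illustration, not something this page states.

```r
# Local analogue using base R's stats::kmeans(); no Spark session is needed.
set.seed(42)  # kmeans() picks random starting centers, so fix the seed

features <- iris[, c("Sepal.Length", "Sepal.Width")]

# centers is assumed to play the role of k, iter.max the role of maxIter
fit <- kmeans(features, centers = 2, iter.max = 20)

fit$centers        # one row per cluster center
fit$size           # number of points in each cluster
head(fit$cluster)  # 1-based cluster index for each input row
```

The fitted object here is an ordinary in-memory list, whereas spark.kmeans returns a model backed by Spark, so the two are not interchangeable.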
spark.kmeans(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.kmeans(data, formula, k = 2, maxIter = 20, initMode = c("k-means||", "random"))

## S4 method for signature 'KMeansModel'
summary(object, ...)

## S4 method for signature 'KMeansModel'
predict(object, newData)

## S4 method for signature 'KMeansModel,character'
write.ml(object, path, overwrite = FALSE)
data | A SparkDataFrame for training. |
formula | A symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of the formula is empty in spark.kmeans. |
k | Number of centers. |
maxIter | Maximum number of iterations. |
initMode | The initialization algorithm chosen to fit the model. |
object | A fitted k-means model. |
path | The directory where the model is saved. |
overwrite | Whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists. |
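As a rough illustration of the formula operators listed for the formula argument, the sketch below expands a few right-hand-side-only formulas with base R's terms() on the local iris data. The helper term_labels is hypothetical, and base R's formula handling is used only as a stand-in: Spark's own formula parser may differ in details.

```r
# Base R's terms() is used here only to illustrate the operators;
# Spark's formula handling may differ in details.

# Hypothetical helper: the expanded term labels of a formula against iris
term_labels <- function(f) attr(terms(f, data = iris), "term.labels")

term_labels(~ Sepal.Length + Sepal.Width)  # '+' adds terms
term_labels(~ . - Species)                 # '.' is all columns, '-' drops one
term_labels(~ Sepal.Length:Sepal.Width)    # ':' is an interaction term

# An empty left-hand side means the formula has no response variable
attr(terms(~ ., data = iris), "response")
```

Note that the local iris data frame uses dotted column names such as Sepal.Length, while the examples on this page use Sepal_Length after createDataFrame(iris).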
spark.kmeans returns a fitted k-means model.
summary returns the model's coefficients, size and cluster.
predict returns the predicted values based on a k-means model.
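To make the idea of a k-means prediction concrete, the following local sketch assigns new points to the nearest cluster center of a base R kmeans() fit. It assumes that prediction means nearest-center assignment by squared Euclidean distance. The helper nearest_center and the sample points are hypothetical, and the cluster numbering in SparkR's prediction column may differ from base R's 1-based indices.

```r
set.seed(1)  # kmeans() picks random starting centers

features <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])
fit <- kmeans(features, centers = 3, iter.max = 20)

# Hypothetical helper: for each row of new_data, return the index of the
# center with the smallest squared Euclidean distance.
nearest_center <- function(new_data, centers) {
  apply(new_data, 1, function(row) {
    # t(centers) has one column per center; subtract the point from each
    which.min(colSums((t(centers) - row)^2))
  })
}

# Two made-up points: (Sepal.Length, Sepal.Width)
new_points <- rbind(c(5.0, 3.4), c(6.5, 3.0))
pred <- nearest_center(new_points, fit$centers)
pred
```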
spark.kmeans since 2.0.0
summary(KMeansModel) since 2.0.0
predict(KMeansModel) since 2.0.0
write.ml(KMeansModel, character) since 2.0.0
## Not run:
sparkR.session()
data(iris)
df <- createDataFrame(iris)
model <- spark.kmeans(df, Sepal_Length ~ Sepal_Width, k = 4, initMode = "random")
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "Sepal_Length", "prediction"))

# save fitted model to input path
path <- "path/to/model"
write.ml(model, path)

# can also read back the saved model and print
savedModel <- read.ml(path)
summary(savedModel)
## End(Not run)