spark.bisectingKmeans {SparkR}

R Documentation

Bisecting K-Means Clustering Model

Description

Fits a bisecting k-means clustering model against a SparkDataFrame. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

fitted returns the fitted result of a bisecting k-means model. Note: a model that has been saved and loaded back does not support this method.

Usage

spark.bisectingKmeans(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.bisectingKmeans(data, formula,
  k = 4, maxIter = 20, seed = NULL, minDivisibleClusterSize = 1)

## S4 method for signature 'BisectingKMeansModel'
summary(object)

## S4 method for signature 'BisectingKMeansModel'
predict(object, newData)

## S4 method for signature 'BisectingKMeansModel'
fitted(object, method = c("centers", "classes"))

## S4 method for signature 'BisectingKMeansModel,character'
write.ml(object, path,
  overwrite = FALSE)

Arguments

`data`	a SparkDataFrame for training.
`formula`	a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of formula is empty in spark.bisectingKmeans.
`...`	additional argument(s) passed to the method.
`k`	the desired number of leaf clusters. Must be > 1. The actual number could be smaller if there are no divisible leaf clusters.
`maxIter`	the maximum number of iterations.
`seed`	the random seed.
`minDivisibleClusterSize`	the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) of a divisible cluster. This is an expert parameter; the default value should be good enough for most cases.
`object`	a fitted bisecting k-means model.
`newData`	a SparkDataFrame for testing.
`method`	type of fitted results, `"centers"` for cluster centers or `"classes"` for assigned classes.
`path`	the directory where the model is saved.
`overwrite`	whether to overwrite the output path if it already exists. The default is FALSE, which means an exception is thrown if the output path exists.
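
The two readings of `minDivisibleClusterSize` can be illustrated with a minimal base R sketch. The helper `is_divisible` is hypothetical and not part of SparkR; it assumes that the proportion is taken relative to the total number of points and that the threshold is inclusive, which is one plausible reading of the description above rather than Spark's confirmed internal logic.

```r
# Hypothetical helper (not a SparkR function): applies the documented rule
# for minDivisibleClusterSize to a single cluster.
# Assumptions: a value below 1.0 is a proportion of all points,
# and a cluster exactly at the threshold counts as divisible.
is_divisible <- function(cluster_size, total_points, min_divisible = 1) {
  threshold <- if (min_divisible >= 1) {
    min_divisible                  # interpreted as an absolute point count
  } else {
    min_divisible * total_points   # interpreted as a proportion of all points
  }
  cluster_size >= threshold
}

# Absolute count: a cluster of 5 points is too small when at least 10 are required.
is_divisible(5, 100, min_divisible = 10)

# Proportion: 0.1 of 100 points gives a threshold of 10 points.
is_divisible(20, 100, min_divisible = 0.1)
is_divisible(5, 100, min_divisible = 0.1)
```

With the default of 1, any non-empty cluster passes this check, which is consistent with the note that the default suits most cases.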

Value

spark.bisectingKmeans returns a fitted bisecting k-means model.

summary returns summary information of the fitted model, which is a list. The list includes the model's k (number of cluster centers), coefficients (model cluster centers), size (number of data points in each cluster), cluster (cluster centers of the transformed data; cluster is NULL if is.loaded is TRUE), and is.loaded (whether the model is loaded from a saved file).

predict returns the predicted values based on a bisecting k-means model.

fitted returns a SparkDataFrame containing fitted values.
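
The returned values described above could be inspected roughly as follows. This is a sketch that assumes a working local Spark installation and reuses the Titanic data and formula from the Examples section; the variable names are illustrative only.

```r
library(SparkR)
sparkR.session()

# Same data and formula as in the Examples section.
df <- createDataFrame(as.data.frame(Titanic))
model <- spark.bisectingKmeans(df, Class ~ Survived, k = 4)

# summary() returns a plain R list; its elements are accessed by name.
s <- summary(model)
s$k             # number of cluster centers (may be smaller than the requested k)
s$size          # number of data points in each cluster
s$coefficients  # cluster centers
s$is.loaded     # FALSE for a model fitted in this session

# fitted() returns a SparkDataFrame; "classes" requests the assigned classes.
classes <- fitted(model, "classes")
head(classes)
```

Because `fitted` is not supported on a model restored with read.ml, this only applies to a model fitted in the current session.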

Note

spark.bisectingKmeans since 2.2.0

summary(BisectingKMeansModel) since 2.2.0

predict(BisectingKMeansModel) since 2.2.0

fitted since 2.2.0

write.ml(BisectingKMeansModel, character) since 2.2.0

Examples

## Not run: 
sparkR.session()
titanic <- as.data.frame(Titanic)
df <- createDataFrame(titanic)
model <- spark.bisectingKmeans(df, Class ~ Survived, k = 4)
summary(model)

# get fitted result from a bisecting k-means model
fitted.model <- fitted(model, "centers")
showDF(fitted.model)

# predictions on the training data
predictions <- predict(model, df)
head(select(predictions, "Class", "prediction"))

# save the fitted model to the given path
path <- "path/to/model"
write.ml(model, path)

# read back the saved model and print its summary
savedModel <- read.ml(path)
summary(savedModel)
## End(Not run)

[Package SparkR version 2.4.3 Index]