spark.bisectingKmeans {SparkR} | R Documentation |

Fits a bisecting k-means clustering model against a SparkDataFrame.
Users can call `summary` to print a summary of the fitted model, `predict` to make predictions on new data, and `write.ml`/`read.ml` to save/load fitted models.

`fitted` gets the fitted result from a bisecting k-means model. Note: a model that has been saved and loaded back does not support this method.
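For intuition only, the sketch below implements the textbook form of bisecting k-means in base R. It starts with a single cluster holding every point and repeatedly splits one divisible leaf in two with `stats::kmeans` until there are `k` leaves or no leaf can be split. The function `bisect_kmeans`, its `min_size` argument, and the "split the largest divisible leaf" rule are hypothetical choices for illustration. Spark MLlib's own implementation differs in detail (for example, in how it chooses which clusters to split), so this is not what `spark.bisectingKmeans` runs.

```r
# Textbook bisecting k-means in base R (illustrative sketch, not Spark's code).
# `bisect_kmeans` and `min_size` are hypothetical names.
bisect_kmeans <- function(x, k, min_size = 2, seed = 1) {
  x <- as.matrix(x)
  set.seed(seed)                    # note: changes the global RNG state
  leaves <- list(seq_len(nrow(x)))  # each leaf is a vector of row indices
  while (length(leaves) < k) {
    sizes <- vapply(leaves, length, integer(1))
    # A leaf is divisible if it is large enough and its points are not all identical.
    distinct <- vapply(leaves,
                       function(idx) nrow(unique(x[idx, , drop = FALSE])) > 1,
                       logical(1))
    divisible <- which(sizes >= min_size & distinct)
    if (length(divisible) == 0) break  # stop early: fewer than k leaves
    target <- divisible[which.max(sizes[divisible])]  # largest divisible leaf
    idx <- leaves[[target]]
    halves <- stats::kmeans(x[idx, , drop = FALSE], centers = 2)$cluster
    leaves[[target]] <- idx[halves == 1]
    leaves[[length(leaves) + 1]] <- idx[halves == 2]
  }
  # Turn the list of leaves into one cluster label per row.
  assignment <- integer(nrow(x))
  for (i in seq_along(leaves)) assignment[leaves[[i]]] <- i
  assignment
}

# Only two distinct values, so at most two leaves can form even with k = 4.
x <- matrix(c(rep(0, 5), rep(10, 5)), ncol = 1)
assignment <- bisect_kmeans(x, k = 4)
table(assignment)
```

The toy data mirrors the note on `k`: the requested number of leaf clusters is an upper bound, and splitting stops once no leaf is divisible.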

`spark.bisectingKmeans(data, formula, ...)`

S4 method for signature 'SparkDataFrame,formula':
`spark.bisectingKmeans(data, formula, k = 4, maxIter = 20, seed = NULL, minDivisibleClusterSize = 1)`

S4 method for signature 'BisectingKMeansModel':
`summary(object)`

S4 method for signature 'BisectingKMeansModel':
`predict(object, newData)`

S4 method for signature 'BisectingKMeansModel':
`fitted(object, method = c("centers", "classes"))`

S4 method for signature 'BisectingKMeansModel,character':
`write.ml(object, path, overwrite = FALSE)`

| Argument | Description |
|---|---|
| `data` | a SparkDataFrame for training. |
| `formula` | a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', '-', '*', and '^'. Note that spark.bisectingKmeans does not use a response variable; only the terms on the right-hand side of the formula are used as features. |
| `...` | additional argument(s) passed to the method. |
| `k` | the desired number of leaf clusters. Must be greater than 1. The actual number could be smaller if there are no divisible leaf clusters. |
| `maxIter` | maximum number of iterations. |
| `seed` | the random seed. |
| `minDivisibleClusterSize` | the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) that a cluster must contain to be divisible. This is an expert parameter; the default value should be sufficient for most cases. |
| `object` | a fitted bisecting k-means model. |
| `newData` | a SparkDataFrame for testing. |
| `method` | type of fitted results: "centers" for cluster centers or "classes" for assigned classes. |
| `path` | the directory where the model is saved. |
| `overwrite` | whether to overwrite the output path if it already exists. The default, FALSE, throws an exception if the output path exists. |
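The `minDivisibleClusterSize` rule described above switches between an absolute count and a fraction of the training rows. The base R sketch below shows one way to read that rule; the helpers `min_divisible_points` and `is_divisible` are hypothetical (not SparkR functions), and rounding up with `ceiling` is an assumption. It covers only the size threshold, so a cluster may still be indivisible for other reasons.

```r
# Hypothetical helper: translate minDivisibleClusterSize into a minimum point count.
# Assumption: values >= 1 are a count, values < 1 are a fraction of the n
# training rows, and both are rounded up.
min_divisible_points <- function(minDivisibleClusterSize, n) {
  stopifnot(minDivisibleClusterSize > 0, n >= 1)
  if (minDivisibleClusterSize >= 1) {
    ceiling(minDivisibleClusterSize)
  } else {
    ceiling(minDivisibleClusterSize * n)
  }
}

# Size check only: a cluster passes if it has at least the minimum number of points.
is_divisible <- function(cluster_size, minDivisibleClusterSize, n) {
  cluster_size >= min_divisible_points(minDivisibleClusterSize, n)
}

min_divisible_points(1, n = 1000)     # default setting: returns 1
min_divisible_points(0.25, n = 1000)  # 25% of 1000 rows: returns 250
is_divisible(200, 0.25, n = 1000)     # returns FALSE
```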

`spark.bisectingKmeans` returns a fitted bisecting k-means model.

`summary` returns summary information of the fitted model as a list. The list includes the model's `k` (number of cluster centers), `coefficients` (model cluster centers), `size` (number of data points in each cluster), `cluster` (cluster centers of the transformed data; `cluster` is NULL if `is.loaded` is TRUE), and `is.loaded` (whether the model is loaded from a saved file).

`predict` returns the predicted values based on a bisecting k-means model.

`fitted` returns a SparkDataFrame containing fitted values.
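Because a bisecting k-means model is a hierarchy of clusters, our understanding is that Spark MLlib assigns a point by walking down the cluster tree from the root, moving at each split to the child whose center is closer; the result can then differ from simply picking the nearest leaf center. The base R sketch below illustrates that descent rule on a small hand-built tree. The tree layout, `make_leaf`, and `predict_leaf` are hypothetical and not part of SparkR; treat this as an illustration, not as the library's code.

```r
# Hypothetical cluster tree: internal nodes hold `center` and two `children`;
# leaves hold `center` and a `leaf` id.
# Descend from the root, moving to the child with the closer center
# (squared Euclidean distance), until a leaf is reached.
predict_leaf <- function(node, point) {
  while (is.null(node$leaf)) {
    d <- vapply(node$children,
                function(child) sum((child$center - point)^2),
                numeric(1))
    node <- node$children[[which.min(d)]]
  }
  node$leaf
}

make_leaf <- function(id, center) list(center = center, leaf = id)

tree <- list(
  center = 5,
  children = list(
    list(center = 0,  children = list(make_leaf(1, -3), make_leaf(2, 3))),
    list(center = 10, children = list(make_leaf(3, 6),  make_leaf(4, 14)))
  )
)

predict_leaf(tree, 12)   # right at the root, then leaf 4
predict_leaf(tree, 4.9)  # left at the root, so leaf 2, although leaf 3 (center 6) is closer
```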

spark.bisectingKmeans since 2.2.0

summary(BisectingKMeansModel) since 2.2.0

predict(BisectingKMeansModel) since 2.2.0

fitted since 2.2.0

write.ml(BisectingKMeansModel, character) since 2.2.0

```
## Not run:
library(SparkR)
sparkR.session()
t <- as.data.frame(Titanic)
df <- createDataFrame(t)
model <- spark.bisectingKmeans(df, Class ~ Survived, k = 4)
summary(model)

# get the fitted result from a bisecting k-means model
fitted.model <- fitted(model, "centers")
showDF(fitted.model)

# predicted values on the training data
predictions <- predict(model, df)
head(select(predictions, "Class", "prediction"))

# save the fitted model to a path
path <- "path/to/model"
write.ml(model, path)

# read the saved model back and print its summary
savedModel <- read.ml(path)
summary(savedModel)
## End(Not run)
```

[Package *SparkR* version 3.0.0 Index]