spark.gaussianMixture {SparkR} | R Documentation

Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to `mvnormalmixEM()` from the R package mixtools. Users can call `summary` to print a summary of the fitted model, `predict` to make predictions on new data, and `write.ml`/`read.ml` to save/load fitted models.

    spark.gaussianMixture(data, formula, ...)

    ## S4 method for signature 'GaussianMixtureModel,character'
    write.ml(object, path, overwrite = FALSE)

    ## S4 method for signature 'SparkDataFrame,formula'
    spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)

    ## S4 method for signature 'GaussianMixtureModel'
    summary(object)

    ## S4 method for signature 'GaussianMixtureModel'
    predict(object, newData)

| Argument | Description |
|---|---|
| `data` | a SparkDataFrame for training. |
| `formula` | a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable (the left-hand side of the formula) is empty in spark.gaussianMixture. |
| `...` | additional arguments passed to the method. |
| `object` | a fitted Gaussian mixture model. |
| `path` | the directory where the model is saved. |
| `overwrite` | whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists. |
| `k` | number of independent Gaussians in the mixture model. |
| `maxIter` | maximum number of iterations. |
| `tol` | the convergence tolerance. |
| `newData` | a SparkDataFrame for testing. |
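
As a minimal sketch of how `formula`, `k`, `maxIter`, and `tol` fit together, the snippet below fits two models on a small synthetic data set. The column names `x1`, `x2`, `x3` and the specific parameter values are hypothetical, chosen only for illustration; a running Spark installation is assumed.

```r
library(SparkR)
sparkR.session()

# Hypothetical local data with three numeric feature columns.
set.seed(1)
local_df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df <- createDataFrame(local_df)

# The left-hand side of the formula stays empty; "." selects every column.
m_all <- spark.gaussianMixture(df, ~ ., k = 2)

# Name the feature columns explicitly and override the iteration defaults.
m_sub <- spark.gaussianMixture(df, ~ x1 + x2, k = 3, maxIter = 200, tol = 1e-3)
```

A smaller `tol` or a larger `maxIter` generally lets the fit run longer before stopping; the values above are illustrative rather than recommended settings.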

`spark.gaussianMixture` returns a fitted multivariate Gaussian mixture model.

`summary` returns a summary of the fitted model, which is a list. The list includes the model's `lambda` (mixing weights), `mu` (component means), `sigma` (component covariance matrices), and `posterior` (posterior probabilities).

`predict` returns a SparkDataFrame containing predicted labels in a column named "prediction".
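
To make the relationship between the summary components concrete, the following base R sketch computes posterior probabilities and a hard component assignment from a set of mixture parameters. It does not call SparkR: the helper `dmvn` and the values of `lambda`, `mu`, `sigma`, and `x` are hypothetical, and the sketch assumes the standard Gaussian mixture definition rather than reproducing Spark's internal implementation.

```r
# Density of a multivariate normal evaluated at each row of x (base R only).
dmvn <- function(x, mean, sigma) {
  d <- ncol(x)
  centered <- sweep(x, 2, mean)  # subtract the mean from every row
  # Squared Mahalanobis distance of each row from the mean.
  maha <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(sigma))
}

# Hypothetical parameters for a two-component, two-dimensional mixture.
lambda <- c(0.4, 0.6)            # mixing weights, summing to 1
mu <- list(c(0, 0), c(3, 4))     # component means
sigma <- list(diag(2), diag(2))  # component covariance matrices

# Two hypothetical observations, one near each component mean.
x <- rbind(c(0.1, -0.2), c(2.9, 4.3))

# Weighted component densities: one row per observation, one column per component.
weighted <- sapply(seq_along(lambda), function(j) {
  lambda[j] * dmvn(x, mu[[j]], sigma[[j]])
})

# Posterior probability of each component for each observation.
posterior <- weighted / rowSums(weighted)

# Hard assignment: index (1-based here) of the most probable component.
component <- apply(posterior, 1, which.max)
```

Each row of `posterior` sums to 1, and the hard assignment simply picks the column with the largest posterior probability.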

write.ml(GaussianMixtureModel, character) since 2.1.0

spark.gaussianMixture since 2.1.0

summary(GaussianMixtureModel) since 2.1.0

predict(GaussianMixtureModel) since 2.1.0

mixtools: https://cran.r-project.org/package=mixtools

```
## Not run:
library(SparkR)
sparkR.session()

# simulate two clusters of bivariate normal points
library(mvtnorm)
set.seed(100)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)
df <- createDataFrame(as.data.frame(data))

# fit a two-component mixture; the formula has no response variable
model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "V1", "prediction"))

# save the fitted model to the given path
path <- "path/to/model"
write.ml(model, path)

# read back the saved model and print its summary
savedModel <- read.ml(path)
summary(savedModel)
## End(Not run)
```

Package *SparkR* version 2.1.0