spark.glm {SparkR}    R Documentation

Generalized Linear Models

Description

Fits a generalized linear model against a SparkDataFrame. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.glm(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.glm(data, formula, family = gaussian,
  tol = 1e-06, maxIter = 25)

## S4 method for signature 'GeneralizedLinearRegressionModel'
summary(object, ...)

## S3 method for class 'summary.GeneralizedLinearRegressionModel'
print(x, ...)

## S4 method for signature 'GeneralizedLinearRegressionModel'
predict(object, newData)

## S4 method for signature 'GeneralizedLinearRegressionModel,character'
write.ml(object, path,
  overwrite = FALSE)

Arguments

data

a SparkDataFrame for training.

formula

a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.

...

additional arguments passed to the method.

family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. Refer to the R family documentation at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html.

tol

positive convergence tolerance of iterations.

maxIter

integer giving the maximum number of IRLS (iteratively reweighted least squares) iterations.

object

a fitted generalized linear model.

x

a summary object of the fitted generalized linear model, as returned by the summary function.

newData

a SparkDataFrame for testing.

path

the directory where the model is saved.

overwrite

whether to overwrite the output path if it already exists. The default is FALSE, which means an exception is thrown if the output path exists.

Value

spark.glm returns a fitted generalized linear model.

summary returns a summary object of the fitted model: a list of components including at least the coefficients, null/residual deviance, null/residual degrees of freedom, AIC, and the number of iterations IRLS takes.

predict returns a SparkDataFrame containing predicted labels in a column named "prediction".

Note

spark.glm since 2.0.0

summary(GeneralizedLinearRegressionModel) since 2.0.0

print.summary.GeneralizedLinearRegressionModel since 2.0.0

predict(GeneralizedLinearRegressionModel) since 1.5.0

write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0

See Also

glm, read.ml

Examples

## Not run: 
##D sparkR.session()
##D data(iris)
##D df <- createDataFrame(iris)
##D model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
##D summary(model)
##D 
##D # fitted values on training data
##D fitted <- predict(model, df)
##D head(select(fitted, "Sepal_Length", "prediction"))
##D 
##D # save fitted model to input path
##D path <- "path/to/model"
##D write.ml(model, path)
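##D 
##D # A sketch, assuming a model may already have been saved at `path`:
##D # with the default overwrite = FALSE, write.ml throws an exception if the
##D # output path exists, so pass overwrite = TRUE to replace the saved model
##D write.ml(model, path, overwrite = TRUE)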
##D 
##D # can also read back the saved model and print
##D savedModel <- read.ml(path)
##D summary(savedModel)
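##D 
##D # A sketch of the other documented argument forms (the names tunedModel and
##D # tunedFitted and the tol/maxIter values are illustrative, not recommendations):
##D # family may also be a family function or a call to one, and the formula
##D # may use the supported operators such as '.' and '-'
##D tunedModel <- spark.glm(df, Sepal_Length ~ . - Species,
##D                         family = gaussian(link = "identity"),
##D                         tol = 1e-08, maxIter = 50)
##D summary(tunedModel)
##D 
##D # predictions are returned in the "prediction" column
##D tunedFitted <- predict(tunedModel, df)
##D head(select(tunedFitted, "Sepal_Length", "prediction"))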
## End(Not run)

[Package SparkR version 2.0.1 Index]