spark.glm {SparkR}  R Documentation 
Fits a generalized linear model against a SparkDataFrame.
Users can call summary to print a summary of the fitted model, predict to make
predictions on new data, and write.ml/read.ml to save/load fitted models.
spark.glm(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.glm(data, formula, family = gaussian, tol = 1e-06, maxIter = 25,
  weightCol = NULL, regParam = 0, var.power = 0, link.power = 1 - var.power)

## S4 method for signature 'GeneralizedLinearRegressionModel'
summary(object)

## S3 method for class 'summary.GeneralizedLinearRegressionModel'
print(x, ...)

## S4 method for signature 'GeneralizedLinearRegressionModel'
predict(object, newData)

## S4 method for signature 'GeneralizedLinearRegressionModel,character'
write.ml(object, path, overwrite = FALSE)
data 
a SparkDataFrame for training. 
formula 
a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.
... 
additional arguments passed to the method. 
family 
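a description of the error distribution and link function to be used in the model.
This can be a character string naming a family function, a family function, or
the result of a call to a family function. Refer to R family at
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html.
Currently these families are supported: binomial, gaussian, Gamma, poisson and tweedie.
Note that there are two ways to specify the tweedie family: set family = "tweedie"
and specify var.power and link.power; or, when package statmod is loaded, use the
family definition therein, i.e., tweedie(var.power, link.power).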
tol 
positive convergence tolerance of iterations. 
maxIter 
integer giving the maximal number of IRLS iterations. 
weightCol 
the weight column name. If this is not set or NULL, we treat all instance weights as 1.0.
regParam 
regularization parameter for L2 regularization. 
var.power 
the power in the variance function of the Tweedie distribution which provides the relationship between the variance and mean of the distribution. Only applicable to the Tweedie family. 
link.power 
the index in the power link function. Only applicable to the Tweedie family. 
object 
a fitted generalized linear model. 
x 
summary object of fitted generalized linear model returned by summary function.
newData 
a SparkDataFrame for testing. 
path 
the directory where the model is saved. 
overwrite 
whether to overwrite if the output path already exists. Default is FALSE, which means an exception is thrown if the output path exists.
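To make the formula operators concrete, the following minimal sketch builds a few formulas over the Titanic data used in the examples on this page and lists the terms that base R derives from them. It runs without a Spark session. The helper term_labels is a hypothetical name introduced for illustration, and the term expansion shown is that of base R's terms(); SparkR translates the formula on the Spark side, so details may differ.

```r
# Training data used by the examples on this page (base R only).
titanic <- as.data.frame(Titanic)

# '+' adds main effects.
f_main <- Freq ~ Sex + Age
# ':' adds an interaction between two terms.
f_interaction <- Freq ~ Sex + Age + Sex:Age
# '.' stands for every column except the response; '-' removes a term.
f_all_but_class <- Freq ~ . - Class

# Hypothetical helper: list the terms base R derives from a formula.
term_labels <- function(f, data) {
  attr(terms(f, data = data), "term.labels")
}

term_labels(f_main, titanic)
term_labels(f_interaction, titanic)
term_labels(f_all_but_class, titanic)
```

Any of these formula objects can be passed as the formula argument, for example spark.glm(df, f_main, family = "gaussian").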
spark.glm returns a fitted generalized linear model.
summary returns summary information of the fitted model, which is a list.
The list of components includes at least the coefficients (coefficients matrix,
which includes coefficients, standard error of coefficients, t value and p value),
null.deviance (null/residual degrees of freedom), aic (AIC) and iter (number of
iterations IRLS takes). If there are collinear columns in the data,
the coefficients matrix only provides coefficients.
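As a minimal sketch of working with the returned list, the helper below collects the components named above from a fitted model. glm_summary_parts is a hypothetical name, not part of SparkR. The sketch assumes the SparkR package is installed and that the model was returned by spark.glm in an active session.

```r
# Hypothetical helper, not part of SparkR: gather the documented summary
# components of a fitted model into a plain list.
glm_summary_parts <- function(model) {
  s <- SparkR::summary(model)
  list(
    coefficients  = s$coefficients,   # coefficients matrix
    null.deviance = s$null.deviance,
    aic           = s$aic,            # AIC
    iter          = s$iter            # number of IRLS iterations
  )
}

# Usage, assuming `model` is a model returned by spark.glm:
# parts <- glm_summary_parts(model)
# parts$coefficients
```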
predict returns a SparkDataFrame containing predicted labels in a column named "prediction".
spark.glm since 2.0.0
summary(GeneralizedLinearRegressionModel) since 2.0.0
print.summary.GeneralizedLinearRegressionModel since 2.0.0
predict(GeneralizedLinearRegressionModel) since 1.5.0
write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0
## Not run:
sparkR.session()
t <- as.data.frame(Titanic)
df <- createDataFrame(t)
model <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian")
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "Freq", "prediction"))

# save fitted model to input path
path <- "path/to/model"
write.ml(model, path)

# can also read back the saved model and print
savedModel <- read.ml(path)
summary(savedModel)

# fit tweedie model
model <- spark.glm(df, Freq ~ Sex + Age, family = "tweedie",
                   var.power = 1.2, link.power = 0)
summary(model)

# use the tweedie family from statmod
library(statmod)
model <- spark.glm(df, Freq ~ Sex + Age, family = tweedie(1.2, 0))
summary(model)
## End(Not run)
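The two tweedie calls above pass var.power = 1.2 and link.power = 0. The sketch below spells out what those numbers mean in plain base R, without a Spark session: the variance grows as the mean raised to var.power, the link is a power of the mean, and link.power = 0 denotes the log link. The function names are hypothetical helpers for illustration and are not part of SparkR or statmod.

```r
# Variance function of the Tweedie family: Var(Y) is proportional to mu^var.power.
tweedie_variance <- function(mu, var.power) mu^var.power

# Power link: eta = mu^link.power, where link.power = 0 denotes the log link.
power_link <- function(mu, link.power) {
  if (link.power == 0) log(mu) else mu^link.power
}

# Default from the usage signature: link.power = 1 - var.power.
default_link_power <- function(var.power) 1 - var.power

mu <- c(0.5, 1, 2)
tweedie_variance(mu, var.power = 1.2)  # variance factor at each mean
power_link(mu, link.power = 0)         # log link, as in the examples above
default_link_power(1.2)                # link.power used when none is given
```

Under this reading, omitting link.power in the first tweedie call would give a link power of 1 - 1.2 rather than the log link chosen in the example.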