Package org.apache.spark.ml.regression
Class GeneralizedLinearRegression
Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Estimator<M>
org.apache.spark.ml.Predictor<FeaturesType,Learner,M>
org.apache.spark.ml.regression.Regressor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
org.apache.spark.ml.regression.GeneralizedLinearRegression
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, HasAggregationDepth, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxIter, HasPredictionCol, HasRegParam, HasSolver, HasTol, HasWeightCol, PredictorParams, GeneralizedLinearRegressionBase, DefaultParamsWritable, Identifiable, MLWritable
public class GeneralizedLinearRegression
extends Regressor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
implements GeneralizedLinearRegressionBase, DefaultParamsWritable, org.apache.spark.internal.Logging
Fit a Generalized Linear Model (see Generalized linear model (Wikipedia)) specified by giving a symbolic description of the linear predictor (link function) and a description of the error distribution (family). It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family. Valid link functions for each family are listed below. The first link function of each family is the default one.
- "gaussian" : "identity", "log", "inverse"
- "binomial" : "logit", "probit", "cloglog"
- "poisson" : "log", "identity", "sqrt"
- "gamma" : "inverse", "identity", "log"
- "tweedie" : power link function specified through "linkPower". The default link power in the tweedie family is 1 - variancePower.
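The family and link combinations listed above can be sketched as a small lookup. This is a plain-Java illustration only, not Spark's own validation code; the class name FamilyLinks and its methods are hypothetical and exist purely for this example.

```java
/** Illustrative lookup of the family/link combinations listed above (not Spark code). */
final class FamilyLinks {
    private FamilyLinks() {}

    /** Links accepted for a family, default first. Empty for "tweedie", which uses linkPower. */
    static String[] supportedLinks(String family) {
        switch (family) {
            case "gaussian": return new String[] {"identity", "log", "inverse"};
            case "binomial": return new String[] {"logit", "probit", "cloglog"};
            case "poisson":  return new String[] {"log", "identity", "sqrt"};
            case "gamma":    return new String[] {"inverse", "identity", "log"};
            case "tweedie":  return new String[0];
            default: throw new IllegalArgumentException("Unknown family: " + family);
        }
    }

    /** The first listed link is the default for the family. */
    static String defaultLink(String family) {
        String[] links = supportedLinks(family);
        if (links.length == 0) {
            throw new IllegalArgumentException(family + " has no named link; use linkPower");
        }
        return links[0];
    }

    /** True when the named link is listed for the family. */
    static boolean isValid(String family, String link) {
        for (String candidate : supportedLinks(family)) {
            if (candidate.equals(link)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(defaultLink("gamma"));
        System.out.println(isValid("binomial", "sqrt"));
    }
}
```

A check of this kind mirrors what a caller would want to confirm before passing values to setFamily and setLink.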
Nested Class Summary
Modifier and Type, Class, Description
All fifteen nested classes are declared static class. The descriptions given are:
- Binomial exponential family distribution.
- Gamma exponential family distribution.
- Gaussian exponential family distribution.
- Poisson exponential family distribution.
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- final IntParam aggregationDepth(): Param for suggested depth for treeAggregate (>= 2).
- copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params.
- family(): Param for the name of family which is a description of the error distribution to be used in the model.
- final BooleanParam fitIntercept(): Param for whether to fit an intercept term.
- link(): Param for the name of link function which provides the relationship between the linear predictor and the mean of the distribution function.
- final DoubleParam linkPower(): Param for the index in the power link function.
- linkPredictionCol(): Param for link prediction (linear predictor) column name.
- static GeneralizedLinearRegression load(String path)
- final IntParam maxIter(): Param for maximum number of iterations (>= 0).
- offsetCol(): Param for offset column name.
- static MLReader<T> read()
- final DoubleParam regParam(): Param for regularization parameter (>= 0).
- setAggregationDepth(int value)
- setFamily(String value): Sets the value of param family().
- setFitIntercept(boolean value): Sets if we should fit the intercept.
- setLink(String value): Sets the value of param link().
- setLinkPower(double value): Sets the value of param linkPower().
- setLinkPredictionCol(String value): Sets the link prediction (linear predictor) column name.
- setMaxIter(int value): Sets the maximum number of iterations (applicable for solver "irls").
- setOffsetCol(String value): Sets the value of param offsetCol().
- setRegParam(double value): Sets the regularization parameter for L2 regularization.
- setSolver(String value): Sets the solver algorithm used for optimization.
- setTol(double value): Sets the convergence tolerance of iterations.
- setVariancePower(double value): Sets the value of param variancePower().
- setWeightCol(String value): Sets the value of param weightCol().
- solver(): The solver algorithm for optimization.
- final DoubleParam tol(): Param for the convergence tolerance for iterative algorithms (>= 0).
- uid(): An immutable unique ID for the object and its derivatives.
- final DoubleParam variancePower(): Param for the power in the variance function of the Tweedie distribution which provides the relationship between the variance and mean of the distribution.
- weightCol(): Param for weight column name.
Methods inherited from class org.apache.spark.ml.Predictor
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
Methods inherited from class org.apache.spark.ml.PipelineStage
params
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write
Methods inherited from interface org.apache.spark.ml.regression.GeneralizedLinearRegressionBase
getFamily, getLink, getLinkPower, getLinkPredictionCol, getOffsetCol, getVariancePower, hasLinkPredictionCol, hasOffsetCol, hasWeightCol, validateAndTransformSchema
Methods inherited from interface org.apache.spark.ml.param.shared.HasAggregationDepth
getAggregationDepth
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasFitIntercept
getFitIntercept
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRegParam
getRegParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
-
Constructor Details
-
GeneralizedLinearRegression
-
GeneralizedLinearRegression
public GeneralizedLinearRegression()
-
-
Method Details
-
load
-
read
-
family
Description copied from interface: GeneralizedLinearRegressionBase
Param for the name of family which is a description of the error distribution to be used in the model. Supported options: "gaussian", "binomial", "poisson", "gamma" and "tweedie". Default is "gaussian".
- Specified by: family in interface GeneralizedLinearRegressionBase
- Returns: (undocumented)
-
variancePower
Description copied from interface: GeneralizedLinearRegressionBase
Param for the power in the variance function of the Tweedie distribution which provides the relationship between the variance and mean of the distribution. Only applicable to the Tweedie family. (see Tweedie Distribution (Wikipedia)) Supported values: 0 and [1, Inf). Note that variance power 0, 1, or 2 corresponds to the Gaussian, Poisson or Gamma family, respectively.
- Specified by: variancePower in interface GeneralizedLinearRegressionBase
- Returns: (undocumented)
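As a rough illustration of the variance power, the sketch below assumes the standard Tweedie variance function V(mu) = mu^p, where p is the variance power. The class name TweedieVariance and its methods are hypothetical and are not part of the Spark API.

```java
/** Illustrative Tweedie variance function V(mu) = mu^p (an assumption of this sketch). */
final class TweedieVariance {
    private TweedieVariance() {}

    /** Supported values per the description above: 0 and [1, Inf). */
    static boolean isSupportedPower(double p) {
        return p == 0.0 || (p >= 1.0 && !Double.isInfinite(p));
    }

    /** Variance implied by mean mu under variance power p. */
    static double variance(double mu, double p) {
        if (!isSupportedPower(p)) {
            throw new IllegalArgumentException("Unsupported variance power: " + p);
        }
        return Math.pow(mu, p);
    }

    public static void main(String[] args) {
        double mu = 3.0;
        System.out.println(variance(mu, 0.0)); // constant variance, Gaussian-like
        System.out.println(variance(mu, 1.0)); // variance equal to the mean, Poisson-like
        System.out.println(variance(mu, 2.0)); // variance equal to the squared mean, Gamma-like
    }
}
```

Under this assumption, powers 0, 1 and 2 reproduce the constant, linear and quadratic mean-variance relationships of the Gaussian, Poisson and Gamma families.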
-
link
Description copied from interface: GeneralizedLinearRegressionBase
Param for the name of link function which provides the relationship between the linear predictor and the mean of the distribution function. Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt". This is used only when family is not "tweedie". The link function for the "tweedie" family must be specified through GeneralizedLinearRegressionBase.linkPower().
- Specified by: link in interface GeneralizedLinearRegressionBase
- Returns: (undocumented)
-
linkPower
Description copied from interface: GeneralizedLinearRegressionBase
Param for the index in the power link function. Only applicable to the Tweedie family. Note that link power 0, 1, -1 or 0.5 corresponds to the Log, Identity, Inverse or Sqrt link, respectively. When not set, this value defaults to 1 - GeneralizedLinearRegressionBase.variancePower(), which matches the R "statmod" package.
- Specified by: linkPower in interface GeneralizedLinearRegressionBase
- Returns: (undocumented)
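The correspondence between link power and the named links can be sketched in plain Java. The sketch assumes the usual power link definition, g(mu) = mu^q for q != 0 and g(mu) = log(mu) for q = 0. The class PowerLink and its methods are hypothetical names for illustration, not Spark code.

```java
/** Illustrative power link: g(mu) = mu^q, with q = 0 treated as the log link. */
final class PowerLink {
    private PowerLink() {}

    /** Default link power when none is set: 1 - variancePower. */
    static double defaultLinkPower(double variancePower) {
        return 1.0 - variancePower;
    }

    /** Maps a mean mu to the linear predictor scale. */
    static double link(double mu, double linkPower) {
        return linkPower == 0.0 ? Math.log(mu) : Math.pow(mu, linkPower);
    }

    /** Inverse of link: maps a linear predictor eta back to the mean scale. */
    static double unlink(double eta, double linkPower) {
        return linkPower == 0.0 ? Math.exp(eta) : Math.pow(eta, 1.0 / linkPower);
    }

    public static void main(String[] args) {
        double mu = 4.0;
        System.out.println(link(mu, 0.0));  // log link
        System.out.println(link(mu, 1.0));  // identity link
        System.out.println(link(mu, -1.0)); // inverse link
        System.out.println(link(mu, 0.5));  // sqrt link
    }
}
```

With this default, variance powers 0, 1 and 2 give link powers 1, 0 and -1, that is, the identity, log and inverse links that are the defaults of the Gaussian, Poisson and Gamma families.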
-
linkPredictionCol
Description copied from interface: GeneralizedLinearRegressionBase
Param for link prediction (linear predictor) column name. Default is not set, which means we do not output link prediction.
- Specified by: linkPredictionCol in interface GeneralizedLinearRegressionBase
- Returns: (undocumented)
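To make the difference between the link prediction and the ordinary prediction concrete, here is a minimal sketch for a binomial model with the logit link. It assumes the link prediction is the linear predictor eta and the prediction is the mean obtained by applying the inverse link. The class LinkPredictionSketch is a hypothetical name and does not reproduce Spark's implementation.

```java
/** Illustrative pair of outputs for a logit-link model: linear predictor and mean. */
final class LinkPredictionSketch {
    private LinkPredictionSketch() {}

    /** Returns {linkPrediction, prediction} for one feature vector. */
    static double[] predict(double[] x, double[] beta, double intercept) {
        if (x.length != beta.length) {
            throw new IllegalArgumentException("x and beta must have the same length");
        }
        double eta = intercept;
        for (int i = 0; i < x.length; i++) {
            eta += x[i] * beta[i];
        }
        // Inverse logit maps the linear predictor to a mean in (0, 1).
        double mu = 1.0 / (1.0 + Math.exp(-eta));
        return new double[] {eta, mu};
    }

    public static void main(String[] args) {
        double[] out = predict(new double[] {1.0, 2.0}, new double[] {0.5, 0.25}, 0.1);
        System.out.println("link prediction = " + out[0] + ", prediction = " + out[1]);
    }
}
```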
-
offsetCol
Description copied from interface: GeneralizedLinearRegressionBase
Param for offset column name. If this is not set or empty, we treat all instance offsets as 0.0. The feature specified as offset has a constant coefficient of 1.0.
- Specified by: offsetCol in interface GeneralizedLinearRegressionBase
- Returns: (undocumented)
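The role of the offset can be sketched as an extra term added to the linear predictor with its coefficient fixed at 1.0. The example below assumes a Poisson model with the log link, where an offset of log(exposure) scales the predicted mean by the exposure. OffsetPredictor and its methods are hypothetical names used only for this illustration.

```java
/** Illustrative linear predictor with an offset whose coefficient is fixed at 1.0. */
final class OffsetPredictor {
    private OffsetPredictor() {}

    /** eta = intercept + 1.0 * offset + sum_i x[i] * beta[i]. */
    static double linearPredictor(double[] x, double[] beta, double intercept, double offset) {
        if (x.length != beta.length) {
            throw new IllegalArgumentException("x and beta must have the same length");
        }
        double eta = intercept + offset; // the offset is never scaled by a fitted coefficient
        for (int i = 0; i < x.length; i++) {
            eta += x[i] * beta[i];
        }
        return eta;
    }

    /** Mean under the log link (assumed here): mu = exp(eta). */
    static double poissonMean(double[] x, double[] beta, double intercept, double offset) {
        return Math.exp(linearPredictor(x, beta, intercept, offset));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] beta = {0.5, 0.25};
        System.out.println(poissonMean(x, beta, 0.1, 0.0));
        System.out.println(poissonMean(x, beta, 0.1, Math.log(10.0)));
    }
}
```

In this sketch, an offset of log(10) multiplies the predicted mean by 10 relative to an offset of 0.0, which is the usual way to model rates with varying exposure.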
-
solver
Description copied from interface: GeneralizedLinearRegressionBase
The solver algorithm for optimization. Supported options: "irls" (iteratively reweighted least squares). Default: "irls"
- Specified by: solver in interface GeneralizedLinearRegressionBase
- Specified by: solver in interface HasSolver
- Returns: (undocumented)
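For readers unfamiliar with iteratively reweighted least squares, the following toy sketch fits an intercept-only Poisson model with the log link. It is a textbook-style illustration under simplifying assumptions (no features, no weights, no regularization, an absolute-change stopping rule) and is not Spark's implementation. IrlsSketch and its method are hypothetical names.

```java
/** Toy IRLS for an intercept-only Poisson model with log link (illustration only). */
final class IrlsSketch {
    private IrlsSketch() {}

    static double fitInterceptOnlyPoisson(double[] y, int maxIter, double tol) {
        if (y.length == 0) {
            throw new IllegalArgumentException("y must not be empty");
        }
        int n = y.length;
        double[] mu = new double[n];
        for (int i = 0; i < n; i++) {
            mu[i] = y[i] + 0.1; // start near the data while keeping means strictly positive
        }
        double beta = 0.0;
        for (int iter = 0; iter < maxIter; iter++) {
            double sumW = 0.0;
            double sumWz = 0.0;
            for (int i = 0; i < n; i++) {
                double eta = Math.log(mu[i]);
                double z = eta + (y[i] - mu[i]) / mu[i]; // working response
                double w = mu[i];                        // working weight for Poisson with log link
                sumW += w;
                sumWz += w * z;
            }
            // Weighted least squares with a single intercept reduces to a weighted mean of z.
            double next = sumWz / sumW;
            boolean converged = iter > 0 && Math.abs(next - beta) < tol;
            beta = next;
            for (int i = 0; i < n; i++) {
                mu[i] = Math.exp(beta);
            }
            if (converged) {
                break;
            }
        }
        return beta;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 6.0};
        // 25 and 1e-6 are the defaults documented for setMaxIter and setTol.
        System.out.println(fitInterceptOnlyPoisson(y, 25, 1e-6));
    }
}
```

For an intercept-only Poisson model the maximum likelihood estimate is the log of the sample mean, which gives a simple way to check that the iterations converge to the right value.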
-
aggregationDepth
Description copied from interface: HasAggregationDepth
Param for suggested depth for treeAggregate (>= 2).
- Specified by: aggregationDepth in interface HasAggregationDepth
- Returns: (undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by: weightCol in interface HasWeightCol
- Returns: (undocumented)
-
regParam
Description copied from interface: HasRegParam
Param for regularization parameter (>= 0).
- Specified by: regParam in interface HasRegParam
- Returns: (undocumented)
-
tol
Description copied from interface: HasTol
Param for the convergence tolerance for iterative algorithms (>= 0).
-
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
- Specified by: maxIter in interface HasMaxIter
- Returns: (undocumented)
-
fitIntercept
Description copied from interface: HasFitIntercept
Param for whether to fit an intercept term.
- Specified by: fitIntercept in interface HasFitIntercept
- Returns: (undocumented)
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by: uid in interface Identifiable
- Returns: (undocumented)
-
setFamily
Sets the value of param family(). Default is "gaussian".
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setVariancePower
Sets the value of param variancePower(). Used only when family is "tweedie". Default is 0.0, which corresponds to the "gaussian" family.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setLinkPower
Sets the value of param linkPower(). Used only when family is "tweedie".
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setLink
Sets the value of param link(). Used only when family is not "tweedie".
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setFitIntercept
Sets whether to fit the intercept. Default is true.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setMaxIter
Sets the maximum number of iterations (applicable for solver "irls"). Default is 25.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setTol
Sets the convergence tolerance of iterations. A smaller value leads to higher accuracy at the cost of more iterations. Default is 1E-6.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setRegParam
Sets the regularization parameter for L2 regularization. The regularization term is
$$ 0.5 \cdot \mathrm{regParam} \cdot \lVert \mathrm{coefficients} \rVert_2^2 $$
Default is 0.0.
- Parameters: value - (undocumented)
- Returns: (undocumented)
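The regularization term stated above is straightforward to compute by hand. The sketch below evaluates 0.5 * regParam * (sum of squared coefficients) in plain Java. RidgePenalty is a hypothetical helper for illustration, and whether the intercept is excluded from the coefficients is an assumption of this sketch (only the coefficient array is penalized).

```java
/** Illustrative L2 penalty: 0.5 * regParam * ||coefficients||_2^2. */
final class RidgePenalty {
    private RidgePenalty() {}

    static double penalty(double regParam, double[] coefficients) {
        if (regParam < 0.0) {
            throw new IllegalArgumentException("regParam must be >= 0");
        }
        double sumOfSquares = 0.0;
        for (double c : coefficients) {
            sumOfSquares += c * c;
        }
        return 0.5 * regParam * sumOfSquares;
    }

    public static void main(String[] args) {
        // Squared L2 norm of (3, 4) is 25, so the term is 0.5 * 0.1 * 25.
        System.out.println(penalty(0.1, new double[] {3.0, 4.0}));
    }
}
```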
-
setWeightCol
Sets the value of param weightCol(). If this is not set or empty, all instance weights are treated as 1.0. It is not set by default. In the Binomial family, weights correspond to the number of trials and should be integers. Non-integer weights are rounded to integers in the AIC calculation.
- Parameters: value - (undocumented)
- Returns: (undocumented)
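The remark about Binomial weights and AIC can be illustrated with a small sketch. It assumes that the label y is the observed proportion of successes, that the weight is the number of trials, and that both the trial count and the success count are rounded to integers before the binomial log-probability is evaluated. These are assumptions made for illustration, and BinomialAicSketch is a hypothetical class, not Spark's code.

```java
/** Illustrative per-row AIC contribution for a binomial model with weights as trial counts. */
final class BinomialAicSketch {
    private BinomialAicSketch() {}

    /** log of "n choose k", computed by summing logs (adequate for small n). */
    static double logChoose(int n, int k) {
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }

    /** Binomial log-probability of k successes in n trials, for p strictly between 0 and 1. */
    static double logPmf(int n, int k, double p) {
        return logChoose(n, k) + k * Math.log(p) + (n - k) * Math.log(1.0 - p);
    }

    /** -2 * log-likelihood for one row; y is assumed to be a proportion of successes. */
    static double aicTerm(double y, double mu, double weight) {
        int trials = (int) Math.round(weight);         // non-integer weights are rounded
        if (trials == 0) {
            return 0.0;
        }
        int successes = (int) Math.round(y * weight);  // implied success count, rounded
        return -2.0 * logPmf(trials, successes, mu);
    }

    public static void main(String[] args) {
        // A weight of 2.4 is treated as 2 trials; y * weight = 1.2 is treated as 1 success.
        System.out.println(aicTerm(0.5, 0.5, 2.4));
    }
}
```

Supplying integer weights avoids this rounding step entirely, which is why the description recommends them.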
-
setOffsetCol
Sets the value of param offsetCol(). If this is not set or empty, all instance offsets are treated as 0.0. It is not set by default.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setSolver
Sets the solver algorithm used for optimization. Currently only "irls" is supported, which is also the default solver.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setLinkPredictionCol
Sets the link prediction (linear predictor) column name.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setAggregationDepth
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by: copy in interface Params
- Specified by: copy in class Predictor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
- Parameters: extra - (undocumented)
- Returns: (undocumented)
-