Package org.apache.spark.ml.regression
Class GeneralizedLinearRegression
java.lang.Object
  org.apache.spark.ml.PipelineStage
    org.apache.spark.ml.Estimator<M>
      org.apache.spark.ml.Predictor<FeaturesType,Learner,M>
        org.apache.spark.ml.regression.Regressor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
          org.apache.spark.ml.regression.GeneralizedLinearRegression
- All Implemented Interfaces:
  Serializable, org.apache.spark.internal.Logging, Params, HasAggregationDepth, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxIter, HasPredictionCol, HasRegParam, HasSolver, HasTol, HasWeightCol, PredictorParams, GeneralizedLinearRegressionBase, DefaultParamsWritable, Identifiable, MLWritable
public class GeneralizedLinearRegression
extends Regressor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
implements GeneralizedLinearRegressionBase, DefaultParamsWritable, org.apache.spark.internal.Logging  
Fit a Generalized Linear Model (see Generalized linear model (Wikipedia)) specified by giving a symbolic description of the linear predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as families.
Valid link functions for each family are listed below. The first link function of each family is the default one.
  - "gaussian" : "identity", "log", "inverse"
  - "binomial" : "logit", "probit", "cloglog"
  - "poisson"  : "log", "identity", "sqrt"
  - "gamma"    : "inverse", "identity", "log"
  - "tweedie"  : power link function specified through "linkPower". The default link power in
  the tweedie family is 1 - variancePower.
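
To make the family/link table concrete, the following is a minimal Java sketch of the named link functions g(mu), their inverses, and the default link of each family. The class `GlmLinks` and its methods are hypothetical illustrations, not part of Spark's API. "probit" is left out because the JDK has no built-in normal quantile function, and "tweedie" is excluded because it is configured through the power link rather than a named link.

```java
/**
 * Hypothetical helper, not part of Spark: the named (non-Tweedie) link functions,
 * their inverses, and the default link of each family. "probit" is omitted
 * because the JDK provides no normal quantile function.
 */
final class GlmLinks {
    private GlmLinks() {}

    /** Default link per family: the first entry in each list above. */
    static String defaultLink(String family) {
        switch (family) {
            case "gaussian": return "identity";
            case "binomial": return "logit";
            case "poisson":  return "log";
            case "gamma":    return "inverse";
            default:
                // "tweedie" uses linkPower instead of a named link.
                throw new IllegalArgumentException("No named default link for: " + family);
        }
    }

    /** g(mu): maps the mean to the linear predictor eta. */
    static double link(String name, double mu) {
        switch (name) {
            case "identity": return mu;
            case "log":      return Math.log(mu);
            case "inverse":  return 1.0 / mu;
            case "logit":    return Math.log(mu / (1.0 - mu));
            case "cloglog":  return Math.log(-Math.log(1.0 - mu));
            case "sqrt":     return Math.sqrt(mu);
            default: throw new IllegalArgumentException("Link not covered by this sketch: " + name);
        }
    }

    /** g^-1(eta): maps the linear predictor back to the mean. */
    static double inverseLink(String name, double eta) {
        switch (name) {
            case "identity": return eta;
            case "log":      return Math.exp(eta);
            case "inverse":  return 1.0 / eta;
            case "logit":    return 1.0 / (1.0 + Math.exp(-eta));
            case "cloglog":  return 1.0 - Math.exp(-Math.exp(eta));
            case "sqrt":     return eta * eta;
            default: throw new IllegalArgumentException("Link not covered by this sketch: " + name);
        }
    }
}
```

For example, `GlmLinks.inverseLink("logit", 0.0)` returns 0.5, and `GlmLinks.defaultLink("poisson")` returns "log".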
Nested Class Summary
- static class: Binomial exponential family distribution.
- static class: Gamma exponential family distribution.
- static class: Gaussian exponential family distribution.
- static class: Poisson exponential family distribution.
- 11 further static classes (no description)
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
- GeneralizedLinearRegression()
Method Summary
- final IntParam aggregationDepth(): Param for suggested depth for treeAggregate (>= 2).
- copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params.
- long estimateModelSize(Dataset<?> dataset)
- family(): Param for the name of family which is a description of the error distribution to be used in the model.
- final BooleanParam fitIntercept(): Param for whether to fit an intercept term.
- link(): Param for the name of link function which provides the relationship between the linear predictor and the mean of the distribution function.
- final DoubleParam linkPower(): Param for the index in the power link function.
- linkPredictionCol(): Param for link prediction (linear predictor) column name.
- static GeneralizedLinearRegression load(String path)
- final IntParam maxIter(): Param for maximum number of iterations (>= 0).
- offsetCol(): Param for offset column name.
- static MLReader<T> read()
- final DoubleParam regParam(): Param for regularization parameter (>= 0).
- setAggregationDepth(int value)
- setFamily(String value): Sets the value of param family().
- setFitIntercept(boolean value): Sets if we should fit the intercept.
- setLink(String value): Sets the value of param link().
- setLinkPower(double value): Sets the value of param linkPower().
- setLinkPredictionCol(String value): Sets the link prediction (linear predictor) column name.
- setMaxIter(int value): Sets the maximum number of iterations (applicable for solver "irls").
- setOffsetCol(String value): Sets the value of param offsetCol().
- setRegParam(double value): Sets the regularization parameter for L2 regularization.
- setSolver(String value): Sets the solver algorithm used for optimization.
- setTol(double value): Sets the convergence tolerance of iterations.
- setVariancePower(double value): Sets the value of param variancePower().
- setWeightCol(String value): Sets the value of param weightCol().
- solver(): The solver algorithm for optimization.
- final DoubleParam tol(): Param for the convergence tolerance for iterative algorithms (>= 0).
- uid(): An immutable unique ID for the object and its derivatives.
- final DoubleParam variancePower(): Param for the power in the variance function of the Tweedie distribution which provides the relationship between the variance and mean of the distribution.
- weightCol(): Param for weight column name.
Methods inherited from class org.apache.spark.ml.Predictor: featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
Methods inherited from class org.apache.spark.ml.PipelineStage: params
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable: write
Methods inherited from interface org.apache.spark.ml.regression.GeneralizedLinearRegressionBase: getFamily, getLink, getLinkPower, getLinkPredictionCol, getOffsetCol, getVariancePower, hasLinkPredictionCol, hasOffsetCol, hasWeightCol, validateAndTransformSchema
Methods inherited from interface org.apache.spark.ml.param.shared.HasAggregationDepth: getAggregationDepth
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol: featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasFitIntercept: getFitIntercept
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol: getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter: getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol: getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRegParam: getRegParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol: getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable: toString
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Constructor Details
- GeneralizedLinearRegression
  public GeneralizedLinearRegression()
 
Method Details
- load
- read
- family
  Description copied from interface: GeneralizedLinearRegressionBase
  Param for the name of family which is a description of the error distribution to be used in the model. Supported options: "gaussian", "binomial", "poisson", "gamma" and "tweedie". Default is "gaussian".
  - Specified by: family in interface GeneralizedLinearRegressionBase
  - Returns: (undocumented)
 
- variancePower
  Description copied from interface: GeneralizedLinearRegressionBase
  Param for the power in the variance function of the Tweedie distribution which provides the relationship between the variance and mean of the distribution. Only applicable to the Tweedie family (see Tweedie Distribution (Wikipedia)). Supported values: 0 and [1, Inf). Note that variance power 0, 1, or 2 corresponds to the Gaussian, Poisson or Gamma family, respectively.
  - Specified by: variancePower in interface GeneralizedLinearRegressionBase
  - Returns: (undocumented)
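
The Tweedie variance function can be written as V(mu) = mu^p, where p is the variance power; p = 0, 1 and 2 give the constant, linear and quadratic variance of the Gaussian, Poisson and Gamma families. The sketch below encodes that formula together with the supported range stated above. `TweedieVariance` is a hypothetical helper written for illustration, not Spark's implementation.

```java
/**
 * Hypothetical helper, not Spark's API: the Tweedie variance function
 * V(mu) = mu^p, restricted to the supported values p = 0 or p in [1, Inf).
 */
final class TweedieVariance {
    private TweedieVariance() {}

    /** Rejects variance powers outside {0} and [1, Inf); NaN is rejected too. */
    static void checkVariancePower(double p) {
        boolean ok = p == 0.0 || (p >= 1.0 && !Double.isInfinite(p));
        if (!ok) {
            throw new IllegalArgumentException("variancePower must be 0 or in [1, Inf), got " + p);
        }
    }

    /** Variance of the response as a function of its mean mu. */
    static double variance(double mu, double p) {
        checkVariancePower(p);
        return Math.pow(mu, p); // p = 0 -> 1 (Gaussian), 1 -> mu (Poisson), 2 -> mu^2 (Gamma)
    }
}
```

For instance, with mu = 3.0 the sketch returns 1.0, 3.0 and 9.0 for p = 0, 1 and 2, and it throws for p = 0.5.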
 
- link
  Description copied from interface: GeneralizedLinearRegressionBase
  Param for the name of link function which provides the relationship between the linear predictor and the mean of the distribution function. Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt". This is used only when family is not "tweedie". The link function for the "tweedie" family must be specified through GeneralizedLinearRegressionBase.linkPower().
  - Specified by: link in interface GeneralizedLinearRegressionBase
  - Returns: (undocumented)
 
- linkPower
  Description copied from interface: GeneralizedLinearRegressionBase
  Param for the index in the power link function. Only applicable to the Tweedie family. Note that link power 0, 1, -1 or 0.5 corresponds to the Log, Identity, Inverse or Sqrt link, respectively. When not set, this value defaults to 1 - GeneralizedLinearRegressionBase.variancePower(), which matches the R "statmod" package.
  - Specified by: linkPower in interface GeneralizedLinearRegressionBase
  - Returns: (undocumented)
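
The power link maps the mean to the linear predictor as g(mu) = mu^linkPower, with linkPower = 0 standing for the log link. The sketch below shows that mapping, its inverse, and the 1 - variancePower default described above. `PowerLink` is a hypothetical class for illustration only; it is not part of Spark.

```java
/**
 * Hypothetical sketch, not Spark's API: the Tweedie power link
 * g(mu) = mu^linkPower, where linkPower = 0 means the log link.
 */
final class PowerLink {
    private PowerLink() {}

    /** Default link power when linkPower is not set: 1 - variancePower. */
    static double defaultLinkPower(double variancePower) {
        return 1.0 - variancePower;
    }

    /** g(mu): powers 1, 0.5, -1 and 0 reproduce identity, sqrt, inverse and log. */
    static double link(double mu, double linkPower) {
        return linkPower == 0.0 ? Math.log(mu) : Math.pow(mu, linkPower);
    }

    /** g^-1(eta): undoes the power (or log) transform. */
    static double inverseLink(double eta, double linkPower) {
        return linkPower == 0.0 ? Math.exp(eta) : Math.pow(eta, 1.0 / linkPower);
    }
}
```

With variancePower = 1 (Poisson-like variance), `defaultLinkPower` yields 0, i.e. the log link; with variancePower = 2 it yields -1, the inverse link.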
 
- linkPredictionCol
  Description copied from interface: GeneralizedLinearRegressionBase
  Param for link prediction (linear predictor) column name. Default is not set, which means we do not output link prediction.
  - Specified by: linkPredictionCol in interface GeneralizedLinearRegressionBase
  - Returns: (undocumented)
 
- offsetCol
  Description copied from interface: GeneralizedLinearRegressionBase
  Param for offset column name. If this is not set or empty, we treat all instance offsets as 0.0. The feature specified as offset has a constant coefficient of 1.0.
  - Specified by: offsetCol in interface GeneralizedLinearRegressionBase
  - Returns: (undocumented)
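
Because the offset has a fixed coefficient of 1.0, it is simply added to the linear predictor: eta = offset + intercept + sum of beta_j * x_j. That eta is the value a link prediction column holds, while the mean prediction is g^-1(eta). The sketch below assumes a log link for the mean; `OffsetExample` is a hypothetical class, and using log(exposure) as the offset is shown only as a common pattern for Poisson rate models.

```java
/**
 * Hypothetical sketch, not Spark code: how an offset enters the linear predictor.
 * A log link is assumed for the mean prediction.
 */
final class OffsetExample {
    private OffsetExample() {}

    /** eta = offset + intercept + sum_j beta_j * x_j (the link prediction). */
    static double linearPredictor(double[] x, double[] beta, double intercept, double offset) {
        if (x.length != beta.length) {
            throw new IllegalArgumentException("x and beta must have the same length");
        }
        double eta = offset + intercept; // the offset's coefficient is fixed at 1.0
        for (int j = 0; j < x.length; j++) {
            eta += beta[j] * x[j];
        }
        return eta;
    }

    /** Mean prediction under the assumed log link: mu = exp(eta). */
    static double predictLogLink(double[] x, double[] beta, double intercept, double offset) {
        return Math.exp(linearPredictor(x, beta, intercept, offset));
    }
}
```

For example, with no features, an intercept of log(0.2) (a rate of 0.2 per unit exposure) and an offset of log(10.0) (10 units of exposure), `predictLogLink` returns approximately 2.0 expected events.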
 
- solver
  Description copied from interface: GeneralizedLinearRegressionBase
  The solver algorithm for optimization. Supported options: "irls" (iteratively reweighted least squares). Default: "irls".
  - Specified by: solver in interface GeneralizedLinearRegressionBase
  - Specified by: solver in interface HasSolver
  - Returns: (undocumented)
 
- aggregationDepth
  Description copied from interface: HasAggregationDepth
  Param for suggested depth for treeAggregate (>= 2).
  - Specified by: aggregationDepth in interface HasAggregationDepth
  - Returns: (undocumented)
 
- weightCol
  Description copied from interface: HasWeightCol
  Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
  - Specified by: weightCol in interface HasWeightCol
  - Returns: (undocumented)
 
- regParam
  Description copied from interface: HasRegParam
  Param for regularization parameter (>= 0).
  - Specified by: regParam in interface HasRegParam
  - Returns: (undocumented)
 
- tol
  Description copied from interface: HasTol
  Param for the convergence tolerance for iterative algorithms (>= 0).

- maxIter
  Description copied from interface: HasMaxIter
  Param for maximum number of iterations (>= 0).
  - Specified by: maxIter in interface HasMaxIter
  - Returns: (undocumented)
 
- fitIntercept
  Description copied from interface: HasFitIntercept
  Param for whether to fit an intercept term.
  - Specified by: fitIntercept in interface HasFitIntercept
  - Returns: (undocumented)
 
- uid
  Description copied from interface: Identifiable
  An immutable unique ID for the object and its derivatives.
  - Specified by: uid in interface Identifiable
  - Returns: (undocumented)
 
- setFamily
  Sets the value of param family(). Default is "gaussian".
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setVariancePower
  Sets the value of param variancePower(). Used only when family is "tweedie". Default is 0.0, which corresponds to the "gaussian" family.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setLinkPower
  Sets the value of param linkPower(). Used only when family is "tweedie".
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setLink
  Sets the value of param link(). Used only when family is not "tweedie".
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setFitIntercept
  Sets if we should fit the intercept. Default is true.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setMaxIter
  Sets the maximum number of iterations (applicable for solver "irls"). Default is 25.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setTol
  Sets the convergence tolerance of iterations. A smaller value leads to higher accuracy at the cost of more iterations. Default is 1E-6.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setRegParam
  Sets the regularization parameter for L2 regularization. The regularization term is
  $$ 0.5 \cdot \mathrm{regParam} \cdot \lVert \mathrm{coefficients} \rVert_2^2 $$
  Default is 0.0.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
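
The regularization term can be checked numerically with a few lines of Java. This is a minimal sketch of the formula above, not Spark's code; the class `L2Penalty` is hypothetical, and it assumes that only the coefficient vector (no intercept) is passed in, so any intercept stays unpenalized.

```java
/**
 * Hypothetical sketch, not Spark's implementation: the L2 term
 * 0.5 * regParam * ||coefficients||_2^2. The intercept is assumed to be
 * excluded from the array, so it is not penalized here.
 */
final class L2Penalty {
    private L2Penalty() {}

    static double penalty(double[] coefficients, double regParam) {
        if (regParam < 0.0) {
            throw new IllegalArgumentException("regParam must be >= 0, got " + regParam);
        }
        double sumSq = 0.0; // squared L2 norm of the coefficients
        for (double c : coefficients) {
            sumSq += c * c;
        }
        return 0.5 * regParam * sumSq;
    }
}
```

With coefficients {3.0, 4.0} (squared norm 25) and regParam = 0.1, the term is 0.5 * 0.1 * 25 = 1.25; with the default regParam of 0.0 it vanishes.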
 
- setWeightCol
  Sets the value of param weightCol(). If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one. In the Binomial family, weights correspond to the number of trials and should be integers. Non-integer weights are rounded to integers in the AIC calculation.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
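
The rounding rule for Binomial weights can be illustrated with one instance's log-likelihood contribution, which is what an AIC of the form -2 * log-likelihood + 2 * (number of parameters) sums over. The sketch below treats the rounded weight as the number of trials and the rounded y * weight as the number of successes. `BinomialAicTerm` is a hypothetical class modelled loosely on the description above; Spark's internal code may differ in detail.

```java
/**
 * Hypothetical sketch: one instance's Binomial log-likelihood term, with the
 * weight treated as a number of trials and rounded to the nearest integer.
 */
final class BinomialAicTerm {
    private BinomialAicTerm() {}

    /** log C(n, k), accumulated term by term to avoid overflow. */
    static double logChoose(long n, long k) {
        double s = 0.0;
        for (long i = 1; i <= k; i++) {
            s += Math.log(n - k + i) - Math.log(i);
        }
        return s;
    }

    /**
     * y: observed success proportion in [0, 1]; mu: fitted probability;
     * weight: number of trials, possibly non-integer.
     */
    static double logLikelihood(double y, double mu, double weight) {
        long trials = Math.round(weight);
        if (trials == 0) {
            return 0.0; // an instance rounded to zero trials contributes nothing
        }
        long successes = Math.round(y * weight);
        long failures = trials - successes;
        double ll = logChoose(trials, successes);
        if (successes > 0) {
            ll += successes * Math.log(mu);
        }
        if (failures > 0) {
            ll += failures * Math.log(1.0 - mu);
        }
        return ll;
    }
}
```

For example, a weight of 2.4 with y = 0.5 becomes 2 trials and 1 success, so with mu = 0.5 the term is log(2 * 0.5 * 0.5) = log(0.5).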
 
- setOffsetCol
  Sets the value of param offsetCol(). If this is not set or empty, we treat all instance offsets as 0.0. Default is not set, so all instances have offset 0.0.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setSolver
  Sets the solver algorithm used for optimization. Currently only "irls" is supported, which is also the default solver.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
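
To show how maxIter and tol interact in an iteratively reweighted least squares (IRLS) solver, here is a deliberately simplified Java sketch: Poisson family, log link, an intercept plus one feature, and no weights, offsets or regularization. Each iteration forms the working response z = eta + (y - mu) / mu and working weight w = mu, solves the weighted least-squares problem, and stops when the largest coefficient change drops below tol. `PoissonIrls` is hypothetical, and this stopping rule is an assumption; Spark's IRLS handles the general case and may test convergence differently.

```java
/**
 * Hypothetical, simplified IRLS sketch (not Spark's implementation):
 * Poisson family, log link, intercept plus a single feature.
 */
final class PoissonIrls {
    private PoissonIrls() {}

    /** Returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y, int maxIter, double tol) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        double b0 = 0.0, b1 = 0.0; // start from eta = 0, i.e. mu = 1
        for (int iter = 0; iter < maxIter; iter++) {
            // Accumulate the 2x2 weighted normal equations.
            double sw = 0, swx = 0, swxx = 0, swz = 0, swxz = 0;
            for (int i = 0; i < x.length; i++) {
                double eta = b0 + b1 * x[i];
                double mu = Math.exp(eta);
                double z = eta + (y[i] - mu) / mu; // working response: eta + (y - mu) * d(eta)/d(mu)
                double w = mu;                     // working weight: (d(mu)/d(eta))^2 / V(mu) = mu^2 / mu
                sw += w;
                swx += w * x[i];
                swxx += w * x[i] * x[i];
                swz += w * z;
                swxz += w * x[i] * z;
            }
            // Solve [sw swx; swx swxx] [b0; b1] = [swz; swxz] by Cramer's rule.
            double det = sw * swxx - swx * swx;
            if (det == 0.0) {
                throw new IllegalStateException("Singular weighted least-squares system");
            }
            double nb0 = (swxx * swz - swx * swxz) / det;
            double nb1 = (sw * swxz - swx * swz) / det;
            double maxChange = Math.max(Math.abs(nb0 - b0), Math.abs(nb1 - b1));
            b0 = nb0;
            b1 = nb1;
            if (maxChange < tol) {
                break; // converged before reaching maxIter
            }
        }
        return new double[] {b0, b1};
    }
}
```

With x = {0, 0, 1, 1}, y = {1, 3, 4, 8} and the documented defaults maxIter = 25 and tol = 1E-6, `PoissonIrls.fit` should approach {ln 2, ln 3}: with one binary feature, the Poisson/log fit reproduces each group's mean count (2 and 6).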
 
- setLinkPredictionCol
  Sets the link prediction (linear predictor) column name.
  - Parameters: value - (undocumented)
  - Returns: (undocumented)
 
- setAggregationDepth
- copy
  Description copied from interface: Params
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
  - Specified by: copy in interface Params
  - Specified by: copy in class Predictor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
  - Parameters: extra - (undocumented)
  - Returns: (undocumented)
 
- estimateModelSize