Class GeneralizedLinearRegression

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, HasAggregationDepth, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxIter, HasPredictionCol, HasRegParam, HasSolver, HasTol, HasWeightCol, PredictorParams, GeneralizedLinearRegressionBase, DefaultParamsWritable, Identifiable, MLWritable

public class GeneralizedLinearRegression extends Regressor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel> implements GeneralizedLinearRegressionBase, DefaultParamsWritable, org.apache.spark.internal.Logging
Fit a Generalized Linear Model (see Generalized linear model (Wikipedia)) specified by giving a symbolic description of the linear predictor (link function) and a description of the error distribution (family). Supported families are "gaussian", "binomial", "poisson", "gamma" and "tweedie". The valid link functions for each family are listed below, and the first link function of each family is the default one.
- "gaussian" : "identity", "log", "inverse"
- "binomial" : "logit", "probit", "cloglog"
- "poisson" : "log", "identity", "sqrt"
- "gamma" : "inverse", "identity", "log"
- "tweedie" : power link function specified through "linkPower". The default link power in the tweedie family is 1 - variancePower.
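The family/link table above can be encoded as a small lookup, which makes the "first link is the default" rule and the Tweedie default link power concrete. This is a plain-Java sketch with no Spark dependency; the class and method names (`GlmLinkTable`, `defaultLink`, `isSupported`, `defaultTweedieLinkPower`) are hypothetical and are not part of Spark's API.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical lookup mirroring the family/link table (not a Spark class). */
final class GlmLinkTable {
    // For each non-Tweedie family, the supported links; the first entry is the default.
    static final Map<String, List<String>> LINKS = Map.of(
        "gaussian", List.of("identity", "log", "inverse"),
        "binomial", List.of("logit", "probit", "cloglog"),
        "poisson", List.of("log", "identity", "sqrt"),
        "gamma", List.of("inverse", "identity", "log"));

    /** Default link for a non-Tweedie family: the first entry of its list. */
    static String defaultLink(String family) {
        List<String> links = LINKS.get(family);
        if (links == null) {
            throw new IllegalArgumentException("no named links for family: " + family);
        }
        return links.get(0);
    }

    /** Whether the (family, link) pair appears in the table. */
    static boolean isSupported(String family, String link) {
        List<String> links = LINKS.get(family);
        return links != null && links.contains(link);
    }

    /** Tweedie uses a power link; when unset, its link power is 1 - variancePower. */
    static double defaultTweedieLinkPower(double variancePower) {
        return 1.0 - variancePower;
    }
}
```

For example, a Tweedie model with variancePower 1.5 would default to link power -0.5 under this rule.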
See Also:
  • Constructor Details

    • GeneralizedLinearRegression

      public GeneralizedLinearRegression(String uid)
    • GeneralizedLinearRegression

      public GeneralizedLinearRegression()
  • Method Details

    • load

      public static GeneralizedLinearRegression load(String path)
    • read

      public static MLReader<GeneralizedLinearRegression> read()
    • family

      public final Param<String> family()
      Description copied from interface: GeneralizedLinearRegressionBase
      Param for the name of family which is a description of the error distribution to be used in the model. Supported options: "gaussian", "binomial", "poisson", "gamma" and "tweedie". Default is "gaussian".

      Specified by:
      family in interface GeneralizedLinearRegressionBase
      Returns:
      (undocumented)
    • variancePower

      public final DoubleParam variancePower()
      Description copied from interface: GeneralizedLinearRegressionBase
      Param for the power in the variance function of the Tweedie distribution which provides the relationship between the variance and mean of the distribution. Only applicable to the Tweedie family. (see Tweedie Distribution (Wikipedia)) Supported values: 0 and [1, Inf). Note that variance power 0, 1, or 2 corresponds to the Gaussian, Poisson or Gamma family, respectively.

      Specified by:
      variancePower in interface GeneralizedLinearRegressionBase
      Returns:
      (undocumented)
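The Tweedie variance function is V(mu) = mu^p, where p is variancePower (the dispersion factor is left out here). The hypothetical helper below (`TweedieVariance`, not a Spark class) applies the documented support rule (p = 0 or p >= 1) and shows why p = 0, 1 and 2 reduce to the Gaussian, Poisson and Gamma variance functions.

```java
/** Hypothetical helper for the Tweedie variance function V(mu) = mu^p. */
final class TweedieVariance {
    /** Supported variance powers: exactly 0, or any value in [1, Inf). */
    static boolean isSupportedPower(double p) {
        return p == 0.0 || p >= 1.0;
    }

    /** V(mu) = mu^p: constant for p = 0 (Gaussian), mu for p = 1 (Poisson), mu^2 for p = 2 (Gamma). */
    static double variance(double p, double mu) {
        if (!isSupportedPower(p)) {
            throw new IllegalArgumentException("unsupported variance power: " + p);
        }
        return Math.pow(mu, p);
    }
}
```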
    • link

      public final Param<String> link()
      Description copied from interface: GeneralizedLinearRegressionBase
      Param for the name of link function which provides the relationship between the linear predictor and the mean of the distribution function. Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt". This is used only when family is not "tweedie". The link function for the "tweedie" family must be specified through GeneralizedLinearRegressionBase.linkPower().

      Specified by:
      link in interface GeneralizedLinearRegressionBase
      Returns:
      (undocumented)
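Each named link g maps the mean mu to the linear predictor eta = g(mu), and fitting and prediction also need its inverse mu = g^{-1}(eta). The hypothetical enum below (`NamedLink`, not a Spark type) pairs each link with its inverse using the standard textbook formulas. "probit" is left out only because the JDK has no inverse normal CDF.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical pairing of a link g(mu) with its inverse g^{-1}(eta). */
enum NamedLink {
    IDENTITY(mu -> mu, eta -> eta),
    LOG(Math::log, Math::exp),
    INVERSE(mu -> 1.0 / mu, eta -> 1.0 / eta),
    // logit: log(mu / (1 - mu)); inverse is the logistic function.
    LOGIT(mu -> Math.log(mu / (1.0 - mu)), eta -> 1.0 / (1.0 + Math.exp(-eta))),
    // cloglog: log(-log(1 - mu)); inverse is 1 - exp(-exp(eta)).
    CLOGLOG(mu -> Math.log(-Math.log1p(-mu)), eta -> -Math.expm1(-Math.exp(eta))),
    SQRT(Math::sqrt, eta -> eta * eta);

    final DoubleUnaryOperator link;
    final DoubleUnaryOperator inverse;

    NamedLink(DoubleUnaryOperator link, DoubleUnaryOperator inverse) {
        this.link = link;
        this.inverse = inverse;
    }
}
```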
    • linkPower

      public final DoubleParam linkPower()
      Description copied from interface: GeneralizedLinearRegressionBase
      Param for the index in the power link function. Only applicable to the Tweedie family. Note that link power 0, 1, -1 or 0.5 corresponds to the Log, Identity, Inverse or Sqrt link, respectively. When not set, this value defaults to 1 - GeneralizedLinearRegressionBase.variancePower(), which matches the R "statmod" package.

      Specified by:
      linkPower in interface GeneralizedLinearRegressionBase
      Returns:
      (undocumented)
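The power link with index p is g(mu) = mu^p, with p = 0 read as the log link, which is why link powers 0, 1, -1 and 0.5 reproduce the Log, Identity, Inverse and Sqrt links. The hypothetical helper below (`PowerLink`, not a Spark class) implements the link and its inverse under that convention.

```java
/** Hypothetical power link: g(mu) = mu^p, with p = 0 meaning log(mu). */
final class PowerLink {
    static double link(double p, double mu) {
        return p == 0.0 ? Math.log(mu) : Math.pow(mu, p);
    }

    /** Inverse link: exp(eta) for p = 0, otherwise eta^(1/p). */
    static double inverse(double p, double eta) {
        return p == 0.0 ? Math.exp(eta) : Math.pow(eta, 1.0 / p);
    }
}
```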
    • linkPredictionCol

      public final Param<String> linkPredictionCol()
      Description copied from interface: GeneralizedLinearRegressionBase
      Param for link prediction (linear predictor) column name. Default is not set, which means we do not output link prediction.

      Specified by:
      linkPredictionCol in interface GeneralizedLinearRegressionBase
      Returns:
      (undocumented)
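The link prediction is the linear predictor eta = intercept + coefficients . features, while the ordinary prediction is the mean mu = g^{-1}(eta). The hypothetical helper below (`LinkPredictionSketch`, not a Spark class) computes both for a log link, assuming no offset.

```java
/** Hypothetical: linear predictor (link prediction) and mean prediction for a log link. */
final class LinkPredictionSketch {
    /** eta = intercept + sum_j coefficients[j] * features[j] */
    static double linkPrediction(double intercept, double[] coefficients, double[] features) {
        double eta = intercept;
        for (int j = 0; j < coefficients.length; j++) {
            eta += coefficients[j] * features[j];
        }
        return eta;
    }

    /** mu = g^{-1}(eta); for the log link the inverse is exp. */
    static double predictionLogLink(double eta) {
        return Math.exp(eta);
    }
}
```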
    • offsetCol

      public final Param<String> offsetCol()
      Description copied from interface: GeneralizedLinearRegressionBase
      Param for offset column name. If this is not set or empty, we treat all instance offsets as 0.0. The feature specified as offset has a constant coefficient of 1.0.

      Specified by:
      offsetCol in interface GeneralizedLinearRegressionBase
      Returns:
      (undocumented)
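Because the offset enters the linear predictor with a fixed coefficient of 1.0, a common use is a Poisson rate model with offset = log(exposure), so that mu = exposure * exp(intercept + coefficient * x). The hypothetical helper below (`OffsetSketch`, not a Spark class, single feature assumed) illustrates that doubling the exposure doubles the expected count.

```java
/** Hypothetical Poisson rate model with offset = log(exposure). */
final class OffsetSketch {
    /** eta = offset + intercept + coefficient * x; the offset's coefficient is fixed at 1.0. */
    static double eta(double offset, double intercept, double coefficient, double x) {
        return 1.0 * offset + intercept + coefficient * x;
    }

    /** Log link: mu = exp(eta) = exposure * exp(intercept + coefficient * x). */
    static double expectedCount(double exposure, double intercept, double coefficient, double x) {
        return Math.exp(eta(Math.log(exposure), intercept, coefficient, x));
    }
}
```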
    • solver

      public final Param<String> solver()
      Description copied from interface: GeneralizedLinearRegressionBase
      The solver algorithm for optimization. Supported options: "irls" (iteratively reweighted least squares). Default: "irls".

      Specified by:
      solver in interface GeneralizedLinearRegressionBase
      Specified by:
      solver in interface HasSolver
      Returns:
      (undocumented)
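To make "iteratively reweighted least squares" concrete, the sketch below fits a Poisson GLM with log link on one feature plus an intercept. Each iteration forms the working response z = eta + (y - mu) * g'(mu) and working weight w = 1 / (V(mu) * g'(mu)^2), which is simply mu for Poisson/log, then solves a weighted least-squares problem. This is a hypothetical illustration (`PoissonIrlsSketch`), not Spark's implementation; the starting point mu = y + 0.5 and the absolute-change stopping rule are assumptions.

```java
/** Hypothetical IRLS sketch: Poisson family, log link, one feature plus intercept. */
final class PoissonIrlsSketch {

    /** Returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y, int maxIter, double tol) {
        int n = x.length;
        double[] mu = new double[n];
        // Assumed start: mu = y + 0.5 keeps log(mu) finite when y = 0.
        for (int i = 0; i < n; i++) mu[i] = y[i] + 0.5;
        double b0 = Double.NaN, b1 = Double.NaN;
        for (int iter = 0; iter < maxIter; iter++) {
            double s0 = 0, s1 = 0, s2 = 0, t0 = 0, t1 = 0;
            for (int i = 0; i < n; i++) {
                double eta = Math.log(mu[i]);
                // Working response: g'(mu) = 1 / mu for the log link.
                double z = eta + (y[i] - mu[i]) / mu[i];
                // Working weight: 1 / (V(mu) * g'(mu)^2) = mu for Poisson/log.
                double w = mu[i];
                s0 += w; s1 += w * x[i]; s2 += w * x[i] * x[i];
                t0 += w * z; t1 += w * x[i] * z;
            }
            // Solve the 2x2 weighted normal equations [s0 s1; s1 s2] b = [t0; t1].
            double det = s0 * s2 - s1 * s1;
            double nb0 = (s2 * t0 - s1 * t1) / det;
            double nb1 = (s0 * t1 - s1 * t0) / det;
            boolean converged = iter > 0
                && Math.max(Math.abs(nb0 - b0), Math.abs(nb1 - b1)) < tol;
            b0 = nb0;
            b1 = nb1;
            if (converged) break;
            // Recompute the means from the updated coefficients.
            for (int i = 0; i < n; i++) mu[i] = Math.exp(b0 + b1 * x[i]);
        }
        return new double[] {b0, b1};
    }
}
```

maxIter and tol play the same roles as the maxIter() and tol() params: an iteration cap and a convergence threshold on successive estimates.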
    • aggregationDepth

      public final IntParam aggregationDepth()
      Description copied from interface: HasAggregationDepth
      Param for suggested depth for treeAggregate (>= 2).
      Specified by:
      aggregationDepth in interface HasAggregationDepth
      Returns:
      (undocumented)
    • weightCol

      public final Param<String> weightCol()
      Description copied from interface: HasWeightCol
      Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
      Specified by:
      weightCol in interface HasWeightCol
      Returns:
      (undocumented)
    • regParam

      public final DoubleParam regParam()
      Description copied from interface: HasRegParam
      Param for regularization parameter (>= 0).
      Specified by:
      regParam in interface HasRegParam
      Returns:
      (undocumented)
    • tol

      public final DoubleParam tol()
      Description copied from interface: HasTol
      Param for the convergence tolerance for iterative algorithms (>= 0).
      Specified by:
      tol in interface HasTol
      Returns:
      (undocumented)
    • maxIter

      public final IntParam maxIter()
      Description copied from interface: HasMaxIter
      Param for maximum number of iterations (>= 0).
      Specified by:
      maxIter in interface HasMaxIter
      Returns:
      (undocumented)
    • fitIntercept

      public final BooleanParam fitIntercept()
      Description copied from interface: HasFitIntercept
      Param for whether to fit an intercept term.
      Specified by:
      fitIntercept in interface HasFitIntercept
      Returns:
      (undocumented)
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      (undocumented)
    • setFamily

      public GeneralizedLinearRegression setFamily(String value)
      Sets the value of param family(). Default is "gaussian".

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setVariancePower

      public GeneralizedLinearRegression setVariancePower(double value)
      Sets the value of param variancePower(). Used only when family is "tweedie". Default is 0.0, which corresponds to the "gaussian" family.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setLinkPower

      public GeneralizedLinearRegression setLinkPower(double value)
      Sets the value of param linkPower(). Used only when family is "tweedie".

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setLink

      public GeneralizedLinearRegression setLink(String value)
      Sets the value of param link(). Used only when family is not "tweedie".

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setFitIntercept

      public GeneralizedLinearRegression setFitIntercept(boolean value)
      Sets whether to fit an intercept term. Default is true.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setMaxIter

      public GeneralizedLinearRegression setMaxIter(int value)
      Sets the maximum number of iterations (applicable for solver "irls"). Default is 25.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setTol

      public GeneralizedLinearRegression setTol(double value)
      Sets the convergence tolerance of iterations. A smaller value leads to higher accuracy at the cost of more iterations. Default is 1E-6.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setRegParam

      public GeneralizedLinearRegression setRegParam(double value)
      Sets the regularization parameter for L2 regularization. The regularization term is
      $$ \frac{1}{2} \cdot \mathrm{regParam} \cdot \lVert \mathrm{coefficients} \rVert_2^2 $$
      Default is 0.0.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
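The regularization term can be evaluated directly. The factor 0.5 is the usual convention that makes the penalty's gradient simply regParam * coefficients. The hypothetical helper below (`L2Penalty`, not a Spark class) takes only the coefficient vector, since the documented term is written over coefficients, so the intercept is assumed not to be passed in.

```java
/** Hypothetical helper for the L2 penalty 0.5 * regParam * ||coefficients||_2^2. */
final class L2Penalty {
    /** Penalty value; the intercept is assumed to be excluded from coefficients. */
    static double penalty(double regParam, double[] coefficients) {
        if (regParam < 0) throw new IllegalArgumentException("regParam must be >= 0");
        double sumSq = 0.0;
        for (double c : coefficients) sumSq += c * c;
        return 0.5 * regParam * sumSq;
    }

    /** Gradient of the penalty: regParam * coefficients (the 0.5 cancels the 2 from the square). */
    static double[] gradient(double regParam, double[] coefficients) {
        double[] g = new double[coefficients.length];
        for (int j = 0; j < coefficients.length; j++) g[j] = regParam * coefficients[j];
        return g;
    }
}
```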
    • setWeightCol

      public GeneralizedLinearRegression setWeightCol(String value)
      Sets the value of param weightCol(). If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one. In the Binomial family, weights correspond to the number of trials and should be integers. Non-integer weights are rounded to integers in the AIC calculation.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
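To illustrate what rounding a Binomial weight in the AIC calculation means, the sketch below computes one instance's contribution -2 * log P(successes | trials, mu), with the weight rounded to a whole number of trials. This is an assumption about the general shape of the calculation, not Spark's code. `BinomialAicSketch` is hypothetical; y is taken to be the observed success proportion and mu the fitted probability, strictly inside (0, 1).

```java
/** Hypothetical per-instance Binomial AIC term with weights rounded to integer trials. */
final class BinomialAicSketch {
    /** log(n choose k) as a sum of logs; adequate for small n. */
    static double logChoose(int n, int k) {
        double s = 0.0;
        for (int i = 1; i <= k; i++) s += Math.log(n - k + i) - Math.log(i);
        return s;
    }

    /** -2 * log-likelihood of round(y * weight) successes in round(weight) trials. */
    static double aicTerm(double y, double mu, double weight) {
        int trials = (int) Math.round(weight);
        if (trials == 0) return 0.0; // a weight that rounds to zero contributes nothing
        int successes = (int) Math.round(y * weight);
        double logPmf = logChoose(trials, successes)
            + successes * Math.log(mu)
            + (trials - successes) * Math.log1p(-mu);
        return -2.0 * logPmf;
    }
}
```

A weight of 2.4 is treated as 2 trials here, so it yields the same term as a weight of 2.0.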
    • setOffsetCol

      public GeneralizedLinearRegression setOffsetCol(String value)
      Sets the value of param offsetCol(). If this is not set or empty, we treat all instance offsets as 0.0. Default is not set, so all instances have offset 0.0.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setSolver

      public GeneralizedLinearRegression setSolver(String value)
      Sets the solver algorithm used for optimization. Currently only "irls" is supported, which is also the default solver.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setLinkPredictionCol

      public GeneralizedLinearRegression setLinkPredictionCol(String value)
      Sets the link prediction (linear predictor) column name.

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
    • setAggregationDepth

      public GeneralizedLinearRegression setAggregationDepth(int value)
    • copy

      public GeneralizedLinearRegression copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class Predictor<Vector,GeneralizedLinearRegression,GeneralizedLinearRegressionModel>
      Parameters:
      extra - (undocumented)
      Returns:
      (undocumented)