Interface GeneralizedLinearRegressionBase

All Superinterfaces:
HasAggregationDepth, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxIter, HasPredictionCol, HasRegParam, HasSolver, HasTol, HasWeightCol, Identifiable, org.apache.spark.internal.Logging, Params, PredictorParams, Serializable, scala.Serializable
All Known Implementing Classes:
GeneralizedLinearRegression, GeneralizedLinearRegressionModel

public interface GeneralizedLinearRegressionBase extends PredictorParams, HasFitIntercept, HasMaxIter, HasTol, HasRegParam, HasWeightCol, HasSolver, HasAggregationDepth, org.apache.spark.internal.Logging
Params for Generalized Linear Regression.
  • Method Details

    • family

      Param<String> family()
      Param for the name of the family, which describes the error distribution used in the model. Supported options: "gaussian", "binomial", "poisson", "gamma" and "tweedie". Default is "gaussian".

    • getFamily

      String getFamily()
    • getLink

      String getLink()
    • getLinkPower

      double getLinkPower()
    • getLinkPredictionCol

      String getLinkPredictionCol()
    • getOffsetCol

      String getOffsetCol()
    • getVariancePower

      double getVariancePower()
    • hasLinkPredictionCol

      boolean hasLinkPredictionCol()
      Checks whether we should output link prediction.
    • hasOffsetCol

      boolean hasOffsetCol()
      Checks whether offset column is set and nonempty.
    • hasWeightCol

      boolean hasWeightCol()
      Checks whether weight column is set and nonempty.
    • link

      Param<String> link()
      Param for the name of the link function, which relates the linear predictor to the mean of the distribution function. Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt". This is used only when family is not "tweedie". For the "tweedie" family, the link function must be specified through linkPower().

    • linkPower

      DoubleParam linkPower()
      Param for the index in the power link function. Only applicable to the Tweedie family. Note that link power 0, 1, -1 or 0.5 corresponds to the Log, Identity, Inverse or Sqrt link, respectively. When not set, this value defaults to 1 - variancePower(), which matches the R "statmod" package.
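
The correspondence between link power values and named links can be checked with a small standalone sketch. It assumes the usual power-link definition (eta = mu^linkPower, with the log link as the special case linkPower = 0) and is not Spark's implementation. The class and method names (PowerLinkSketch, link, unlink, defaultLinkPower) are hypothetical and exist only for illustration.

```java
public final class PowerLinkSketch {
    private PowerLinkSketch() {}

    /** Default link power when none is set, as described above: 1 - variancePower. */
    public static double defaultLinkPower(double variancePower) {
        return 1.0 - variancePower;
    }

    /** Power link (assumed definition): log(mu) when linkPower is 0, otherwise mu^linkPower. */
    public static double link(double mu, double linkPower) {
        return linkPower == 0.0 ? Math.log(mu) : Math.pow(mu, linkPower);
    }

    /** Inverse of the power link: exp(eta) when linkPower is 0, otherwise eta^(1/linkPower). */
    public static double unlink(double eta, double linkPower) {
        return linkPower == 0.0 ? Math.exp(eta) : Math.pow(eta, 1.0 / linkPower);
    }

    public static void main(String[] args) {
        double mu = 4.0;
        System.out.println("log      (power 0):   " + link(mu, 0.0));
        System.out.println("identity (power 1):   " + link(mu, 1.0));
        System.out.println("inverse  (power -1):  " + link(mu, -1.0));
        System.out.println("sqrt     (power 0.5): " + link(mu, 0.5));
        // With variancePower = 1.5 and no explicit link power, the default is 1 - 1.5.
        System.out.println("default link power:   " + defaultLinkPower(1.5));
    }
}
```

Under this definition, a Tweedie model with variancePower 1 (Poisson-like) and no explicit link power gets link power 0, the log link.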

    • linkPredictionCol

      Param<String> linkPredictionCol()
      Param for link prediction (linear predictor) column name. Default is not set, which means we do not output link prediction.

    • offsetCol

      Param<String> offsetCol()
      Param for offset column name. If this is not set or empty, we treat all instance offsets as 0.0. The feature specified as offset has a constant coefficient of 1.0.
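
To make the "constant coefficient of 1.0" concrete, the following standalone sketch computes a linear predictor with an offset term. It is an illustration of the description above, not Spark's code. The class OffsetSketch and its linearPredictor method are hypothetical names.

```java
public final class OffsetSketch {
    private OffsetSketch() {}

    /**
     * Linear predictor with an offset:
     * eta = intercept + sum_j coefficients[j] * features[j] + 1.0 * offset.
     * The offset is added as-is; no coefficient is estimated for it.
     */
    public static double linearPredictor(
            double[] coefficients, double intercept, double[] features, double offset) {
        if (coefficients.length != features.length) {
            throw new IllegalArgumentException("coefficients and features must have equal length");
        }
        double eta = intercept;
        for (int j = 0; j < features.length; j++) {
            eta += coefficients[j] * features[j];
        }
        return eta + offset;
    }

    public static void main(String[] args) {
        double[] coefficients = {0.5, -1.0};
        double[] features = {2.0, 1.0};
        // An unset or empty offset column behaves like an offset of 0.0 for every row.
        System.out.println(linearPredictor(coefficients, 0.25, features, 0.0));
        // A nonzero offset shifts the linear predictor by exactly that amount.
        System.out.println(linearPredictor(coefficients, 0.25, features, 3.0));
    }
}
```

A common use of an offset is the logarithm of an exposure term in a Poisson model with a log link, so that the model describes a rate rather than a raw count.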

    • solver

      Param<String> solver()
      The solver algorithm for optimization. Supported options: "irls" (iteratively reweighted least squares). Default: "irls".

      Specified by:
      solver in interface HasSolver
    • validateAndTransformSchema

      StructType validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType)
      Description copied from interface: PredictorParams
      Validates and transforms the input schema with the provided param map.

      Specified by:
      validateAndTransformSchema in interface PredictorParams
      Parameters:
      schema - input schema
      fitting - whether this is called during fitting
      featuresDataType - SQL DataType for FeaturesType. E.g., VectorUDT for vector features.
      Returns:
      output schema
    • variancePower

      DoubleParam variancePower()
      Param for the power in the variance function of the Tweedie distribution, which relates the variance to the mean of the distribution. Only applicable to the Tweedie family (see Tweedie Distribution (Wikipedia)). Supported values: 0 and [1, Inf). Note that variance power 0, 1, or 2 corresponds to the Gaussian, Poisson or Gamma family, respectively.
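
The supported range and the family correspondence can be illustrated with a short standalone sketch. It assumes the standard Tweedie variance function V(mu) = mu^variancePower and is not taken from Spark. The class TweedieVarianceSketch and its methods are hypothetical names.

```java
public final class TweedieVarianceSketch {
    private TweedieVarianceSketch() {}

    /** Supported values per the description above: exactly 0, or any finite value in [1, Inf). */
    public static boolean isSupportedVariancePower(double variancePower) {
        return variancePower == 0.0
            || (variancePower >= 1.0 && variancePower < Double.POSITIVE_INFINITY);
    }

    /** Tweedie variance function (assumed standard form): V(mu) = mu^variancePower. */
    public static double variance(double mu, double variancePower) {
        if (!isSupportedVariancePower(variancePower)) {
            throw new IllegalArgumentException("unsupported variance power: " + variancePower);
        }
        return Math.pow(mu, variancePower);
    }

    public static void main(String[] args) {
        double mu = 3.0;
        System.out.println("power 0 (Gaussian, constant variance): " + variance(mu, 0.0));
        System.out.println("power 1 (Poisson, variance = mu):      " + variance(mu, 1.0));
        System.out.println("power 2 (Gamma, variance = mu^2):      " + variance(mu, 2.0));
        System.out.println("0.5 supported? " + isSupportedVariancePower(0.5));
    }
}
```

Values strictly between 0 and 1 are rejected in the sketch because they fall outside the documented range of 0 and [1, Inf).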