Class LogisticRegression
- All Implemented Interfaces:
- Serializable, org.apache.spark.internal.Logging, ClassifierParams, LogisticRegressionParams, ProbabilisticClassifierParams, Params, HasAggregationDepth, HasElasticNetParam, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasRegParam, HasStandardization, HasThreshold, HasThresholds, HasTol, HasWeightCol, PredictorParams, DefaultParamsWritable, Identifiable, MLWritable
This class supports fitting a traditional logistic regression model with LBFGS/OWLQN and a bound (box) constrained logistic regression model with LBFGSB.
Since 3.1.0, it supports stacking instances into blocks and using GEMV/GEMM for better performance. If param maxBlockSizeInMB is left at its default of 0.0, the block size will be 1.0 MB.
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Constructor Summary
- 
Method Summary
- aggregationDepth() (final IntParam): Param for suggested depth for treeAggregate (>= 2).
- copy: Creates a copy of this instance with the same UID and some extra params.
- elasticNetParam() (final DoubleParam): Param for the ElasticNet mixing parameter, in range [0, 1].
- family(): Param for the name of family which is a description of the label distribution to be used in the model.
- fitIntercept() (final BooleanParam): Param for whether to fit an intercept term.
- getThreshold() (double): Get threshold for binary classification.
- getThresholds() (double[]): Get thresholds for binary or multiclass classification.
- load (static LogisticRegression)
- lowerBoundsOnCoefficients(): The lower bounds on coefficients if fitting under bound constrained optimization.
- lowerBoundsOnIntercepts(): The lower bounds on intercepts if fitting under bound constrained optimization.
- maxBlockSizeInMB() (final DoubleParam): Param for maximum memory in MB for stacking input data into blocks.
- maxIter() (final IntParam): Param for maximum number of iterations (>= 0).
- read() (static MLReader<T>)
- regParam() (final DoubleParam): Param for regularization parameter (>= 0).
- setAggregationDepth(int value): Suggested depth for treeAggregate (greater than or equal to 2).
- setElasticNetParam(double value): Set the ElasticNet mixing parameter.
- setFamily: Sets the value of param family().
- setFitIntercept(boolean value): Whether to fit an intercept term.
- setInitialModel
- setLowerBoundsOnCoefficients: Set the lower bounds on coefficients if fitting under bound constrained optimization.
- setLowerBoundsOnIntercepts(Vector value): Set the lower bounds on intercepts if fitting under bound constrained optimization.
- setMaxBlockSizeInMB(double value): Sets the value of param maxBlockSizeInMB().
- setMaxIter(int value): Set the maximum number of iterations.
- setRegParam(double value): Set the regularization parameter.
- setStandardization(boolean value): Whether to standardize the training features before fitting the model.
- setThreshold(double value): Set threshold in binary classification, in range [0, 1].
- setThresholds(double[] value): Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class.
- setTol(double value): Set the convergence tolerance of iterations.
- setUpperBoundsOnCoefficients: Set the upper bounds on coefficients if fitting under bound constrained optimization.
- setUpperBoundsOnIntercepts(Vector value): Set the upper bounds on intercepts if fitting under bound constrained optimization.
- setWeightCol(String value): Sets the value of param weightCol().
- standardization() (final BooleanParam): Param for whether to standardize the training features before fitting the model.
- threshold(): Param for threshold in binary classification prediction, in range [0, 1].
- tol() (final DoubleParam): Param for the convergence tolerance for iterative algorithms (>= 0).
- uid(): An immutable unique ID for the object and its derivatives.
- upperBoundsOnCoefficients(): The upper bounds on coefficients if fitting under bound constrained optimization.
- upperBoundsOnIntercepts(): The upper bounds on intercepts if fitting under bound constrained optimization.
- weightCol(): Param for weight column name.

Methods inherited from class org.apache.spark.ml.classification.ProbabilisticClassifier: probabilityCol, setProbabilityCol, thresholds
Methods inherited from class org.apache.spark.ml.classification.Classifier: rawPredictionCol, setRawPredictionCol
Methods inherited from class org.apache.spark.ml.Predictor: featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
Methods inherited from class org.apache.spark.ml.PipelineStage: params
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable: write
Methods inherited from interface org.apache.spark.ml.param.shared.HasAggregationDepth: getAggregationDepth
Methods inherited from interface org.apache.spark.ml.param.shared.HasElasticNetParam: getElasticNetParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol: featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasFitIntercept: getFitIntercept
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol: getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxBlockSizeInMB: getMaxBlockSizeInMB
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter: getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol: getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasProbabilityCol: getProbabilityCol, probabilityCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRawPredictionCol: getRawPredictionCol, rawPredictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRegParam: getRegParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasStandardization: getStandardization
Methods inherited from interface org.apache.spark.ml.param.shared.HasThresholds: thresholds
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol: getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable: toString
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.classification.LogisticRegressionParams: checkThresholdConsistency, getFamily, getLowerBoundsOnCoefficients, getLowerBoundsOnIntercepts, getUpperBoundsOnCoefficients, getUpperBoundsOnIntercepts, usingBoundConstrainedOptimization, validateAndTransformSchema
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
- 
Constructor Details
- 
LogisticRegression
- 
LogisticRegression
public LogisticRegression()
 
- 
- 
Method Details
-
load
- 
read
- 
family
Description copied from interface: LogisticRegressionParams
Param for the name of family which is a description of the label distribution to be used in the model. Supported options:
- "auto": Automatically select the family based on the number of classes: if numClasses == 1 || numClasses == 2, set to "binomial"; else, set to "multinomial".
- "binomial": Binary logistic regression with pivoting.
- "multinomial": Multinomial logistic (softmax) regression without pivoting.
Default is "auto".
- Specified by:
- family in interface LogisticRegressionParams
- Returns:
- (undocumented)
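The "auto" rule above can be illustrated with a small standalone sketch. The class `FamilySketch` and the method `resolveFamily` are hypothetical names introduced here for illustration; they are not part of the Spark API, and the sketch only mirrors the selection rule as described.

```java
// Hypothetical helper (not Spark API): mirrors the documented "auto" rule.
class FamilySketch {
    // Resolve the family param to a concrete family name for a given class count.
    public static String resolveFamily(String family, int numClasses) {
        switch (family) {
            case "auto":
                // One or two classes -> binomial, otherwise multinomial.
                return (numClasses == 1 || numClasses == 2) ? "binomial" : "multinomial";
            case "binomial":
            case "multinomial":
                return family;
            default:
                throw new IllegalArgumentException("Unsupported family: " + family);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveFamily("auto", 2)); // binomial
        System.out.println(resolveFamily("auto", 5)); // multinomial
    }
}
```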
 
- 
lowerBoundsOnCoefficients
Description copied from interface: LogisticRegressionParams
The lower bounds on coefficients if fitting under bound constrained optimization. The bound matrix must be compatible with the shape (1, number of features) for binomial regression, or (number of classes, number of features) for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by:
- lowerBoundsOnCoefficients in interface LogisticRegressionParams
- Returns:
- (undocumented)
 
- 
upperBoundsOnCoefficients
Description copied from interface: LogisticRegressionParams
The upper bounds on coefficients if fitting under bound constrained optimization. The bound matrix must be compatible with the shape (1, number of features) for binomial regression, or (number of classes, number of features) for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by:
- upperBoundsOnCoefficients in interface LogisticRegressionParams
- Returns:
- (undocumented)
 
- 
lowerBoundsOnIntercepts
Description copied from interface: LogisticRegressionParams
The lower bounds on intercepts if fitting under bound constrained optimization. The bounds vector size must be equal to 1 for binomial regression, or the number of classes for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by:
- lowerBoundsOnIntercepts in interface LogisticRegressionParams
- Returns:
- (undocumented)
 
- 
upperBoundsOnIntercepts
Description copied from interface: LogisticRegressionParams
The upper bounds on intercepts if fitting under bound constrained optimization. The bounds vector size must be equal to 1 for binomial regression, or the number of classes for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by:
- upperBoundsOnIntercepts in interface LogisticRegressionParams
- Returns:
- (undocumented)
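The shape requirements stated for the four bound params above can be sketched as simple checks. `BoundsShapeSketch` and its methods are hypothetical names used only for illustration; the actual validation inside Spark may differ in detail, so treat this as a sketch of the documented rule rather than the library's logic.

```java
// Hypothetical shape checks (not Spark API) for the documented bound rules.
class BoundsShapeSketch {
    // Coefficient bounds: (1, numFeatures) for binomial,
    // (numClasses, numFeatures) for multinomial.
    public static boolean coefficientBoundsOk(int rows, int cols, boolean multinomial,
                                              int numClasses, int numFeatures) {
        int expectedRows = multinomial ? numClasses : 1;
        return rows == expectedRows && cols == numFeatures;
    }

    // Intercept bounds: size 1 for binomial, numClasses for multinomial.
    public static boolean interceptBoundsOk(int size, boolean multinomial, int numClasses) {
        return size == (multinomial ? numClasses : 1);
    }

    public static void main(String[] args) {
        System.out.println(coefficientBoundsOk(1, 10, false, 2, 10)); // true
        System.out.println(coefficientBoundsOk(2, 10, false, 2, 10)); // false
        System.out.println(interceptBoundsOk(3, true, 3));            // true
    }
}
```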
 
- 
maxBlockSizeInMB
Description copied from interface: HasMaxBlockSizeInMB
Param for the maximum memory in MB for stacking input data into blocks. Data is stacked within partitions. If the block size exceeds the remaining data size in a partition, it is adjusted to the data size. The default of 0.0 means an optimal value is chosen, depending on the specific algorithm. Must be >= 0.
- Specified by:
- maxBlockSizeInMB in interface HasMaxBlockSizeInMB
- Returns:
- (undocumented)
 
- 
aggregationDepth
Description copied from interface: HasAggregationDepth
Param for suggested depth for treeAggregate (>= 2).
- Specified by:
- aggregationDepth in interface HasAggregationDepth
- Returns:
- (undocumented)
 
- 
threshold
Description copied from interface: HasThreshold
Param for threshold in binary classification prediction, in range [0, 1].
- Specified by:
- threshold in interface HasThreshold
- Returns:
- (undocumented)
 
- 
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by:
- weightCol in interface HasWeightCol
- Returns:
- (undocumented)
 
- 
standardization
Description copied from interface: HasStandardization
Param for whether to standardize the training features before fitting the model.
- Specified by:
- standardization in interface HasStandardization
- Returns:
- (undocumented)
 
- 
tol
Description copied from interface: HasTol
Param for the convergence tolerance for iterative algorithms (>= 0).
- 
fitIntercept
Description copied from interface: HasFitIntercept
Param for whether to fit an intercept term.
- Specified by:
- fitIntercept in interface HasFitIntercept
- Returns:
- (undocumented)
 
- 
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
- Specified by:
- maxIter in interface HasMaxIter
- Returns:
- (undocumented)
 
- 
elasticNetParam
Description copied from interface: HasElasticNetParam
Param for the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
- Specified by:
- elasticNetParam in interface HasElasticNetParam
- Returns:
- (undocumented)
 
- 
regParam
Description copied from interface: HasRegParam
Param for regularization parameter (>= 0).
- Specified by:
- regParam in interface HasRegParam
- Returns:
- (undocumented)
 
- 
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by:
- uid in interface Identifiable
- Returns:
- (undocumented)
 
- 
setRegParam
Set the regularization parameter. Default is 0.0.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setElasticNetParam
Set the ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty. For alpha in (0,1), the penalty is a combination of L1 and L2. Default is 0.0, which is an L2 penalty.
Note: Fitting under bound constrained optimization only supports L2 regularization, so an exception is thrown if this param is set to a non-zero value.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
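To make the mixing between L1 and L2 concrete, the sketch below computes an elastic net penalty for a coefficient vector. It assumes the common convention penalty = regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2); this page does not state the exact scaling, so treat the formula as an assumption. `ElasticNetSketch` and `penalty` are hypothetical names, not Spark API.

```java
// Hypothetical illustration (not Spark API) of an elastic net penalty.
// Assumed form: regParam * (alpha * L1 + (1 - alpha) / 2 * squared L2).
class ElasticNetSketch {
    public static double penalty(double[] w, double regParam, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1], got " + alpha);
        }
        double l1 = 0.0;    // sum of absolute values
        double l2sq = 0.0;  // sum of squares
        for (double v : w) {
            l1 += Math.abs(v);
            l2sq += v * v;
        }
        return regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2sq);
    }

    public static void main(String[] args) {
        double[] w = {3.0, -4.0};
        System.out.println(penalty(w, 0.1, 0.0)); // pure L2 term
        System.out.println(penalty(w, 0.1, 1.0)); // pure L1 term
        System.out.println(penalty(w, 0.1, 0.5)); // mixture of both
    }
}
```

With alpha = 0 only the squared L2 term contributes, and with alpha = 1 only the L1 term does, which matches the two endpoints described above.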
 
- 
setMaxIter
Set the maximum number of iterations. Default is 100.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setTol
Set the convergence tolerance of iterations. A smaller value leads to higher accuracy at the cost of more iterations. Default is 1E-6.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setFitIntercept
Whether to fit an intercept term. Default is true.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setFamily
Sets the value of param family(). Default is "auto".
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setStandardization
Whether to standardize the training features before fitting the model. The coefficients of models are always returned on the original scale, so standardization is transparent to users. Note that with or without standardization, the models should always converge to the same solution when no regularization is applied. In R's GLMNET package, the default behavior is true as well. Default is true.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setThreshold
Description copied from interface: LogisticRegressionParams
Set threshold in binary classification, in range [0, 1].
If the estimated probability of class label 1 is greater than threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 more often; a low threshold encourages the model to predict 1 more often.
Note: Calling this with threshold p is equivalent to calling setThresholds(Array(1-p, p)). When setThreshold() is called, any user-set value for thresholds will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Default is 0.5.
- Specified by:
- setThreshold in interface LogisticRegressionParams
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
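The binary decision rule described for setThreshold can be sketched in a few lines. `BinaryThresholdSketch` and `predict` are hypothetical names for illustration only; the sketch shows the documented rule (predict 1 only when the class-1 probability is strictly greater than the threshold), not Spark's internal code.

```java
// Hypothetical illustration (not Spark API) of the binary threshold rule.
class BinaryThresholdSketch {
    // Predict 1 if the estimated probability of label 1 exceeds the threshold, else 0.
    public static int predict(double probabilityOfOne, double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1], got " + threshold);
        }
        return probabilityOfOne > threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double p = 0.6;
        System.out.println(predict(p, 0.5)); // 1: default threshold
        System.out.println(predict(p, 0.8)); // 0: a high threshold favors label 0
        System.out.println(predict(p, 0.2)); // 1: a low threshold favors label 1
    }
}
```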
 
- 
getThreshold
public double getThreshold()
Description copied from interface: LogisticRegressionParams
Get threshold for binary classification.
If thresholds is set with length 2 (i.e., binary classification), this returns the equivalent threshold: 1 / (1 + thresholds(0) / thresholds(1)). Otherwise, returns threshold if set, or its default value if unset.
Throws IllegalArgumentException if thresholds is set to an array of length other than 2.
- Specified by:
- getThreshold in interface HasThreshold
- Specified by:
- getThreshold in interface LogisticRegressionParams
- Returns:
- (undocumented)
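The conversion from a length-2 thresholds array to a single threshold can be checked with a short standalone sketch. `GetThresholdSketch` and `equivalentThreshold` are hypothetical names; only the formula 1 / (1 + thresholds(0) / thresholds(1)) and the length-2 requirement come from the description above.

```java
// Hypothetical illustration (not Spark API) of the documented conversion.
class GetThresholdSketch {
    // Equivalent binary threshold for a two-element thresholds array.
    public static double equivalentThreshold(double[] thresholds) {
        if (thresholds.length != 2) {
            throw new IllegalArgumentException(
                "thresholds must have length 2, got " + thresholds.length);
        }
        return 1.0 / (1.0 + thresholds[0] / thresholds[1]);
    }

    public static void main(String[] args) {
        System.out.println(equivalentThreshold(new double[] {0.5, 0.5}));   // 0.5
        System.out.println(equivalentThreshold(new double[] {0.25, 0.75})); // close to 0.75
    }
}
```

For thresholds of the form (1-p, p) the formula simplifies to p, which is consistent with the equivalence stated for setThreshold.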
 
- 
setWeightCol
Sets the value of param weightCol(). If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setThresholds
Description copied from interface: LogisticRegressionParams
Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values greater than 0, excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class's threshold.
Note: When setThresholds() is called, any user-set value for threshold will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
- Specified by:
- setThresholds in interface LogisticRegressionParams
- Overrides:
- setThresholds in class ProbabilisticClassifier<Vector, LogisticRegression, LogisticRegressionModel>
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
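The p/t selection rule can be sketched as an argmax over scaled probabilities. `ThresholdsSketch` and `predict` are hypothetical names, and breaking ties in favor of the lowest class index is an assumption of this sketch; the description above only states that the class with the largest p/t is predicted.

```java
// Hypothetical illustration (not Spark API) of the p/t prediction rule.
class ThresholdsSketch {
    // Return the index of the class with the largest probability / threshold ratio.
    // Assumption: ties are broken in favor of the lowest index.
    public static int predict(double[] probabilities, double[] thresholds) {
        if (probabilities.length != thresholds.length) {
            throw new IllegalArgumentException("thresholds length must equal number of classes");
        }
        int best = 0;
        double bestScore = probabilities[0] / thresholds[0];
        for (int i = 1; i < probabilities.length; i++) {
            double score = probabilities[i] / thresholds[i];
            if (score > bestScore) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.3, 0.2};
        // Equal thresholds: the most probable class wins.
        System.out.println(predict(probs, new double[] {1.0, 1.0, 1.0})); // 0
        // A small threshold on class 1 boosts its score: 0.3 / 0.2 = 1.5.
        System.out.println(predict(probs, new double[] {0.8, 0.2, 0.5})); // 1
    }
}
```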
 
- 
getThresholds
public double[] getThresholds()
Description copied from interface: LogisticRegressionParams
Get thresholds for binary or multiclass classification.
If thresholds is set, return its value. Otherwise, if threshold is set, return the equivalent thresholds for binary classification: (1-threshold, threshold). If neither is set, throw an exception.
- Specified by:
- getThresholds in interface HasThresholds
- Specified by:
- getThresholds in interface LogisticRegressionParams
- Returns:
- (undocumented)
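The fallback order described for getThresholds can be sketched as follows. `GetThresholdsSketch` and `resolve` are hypothetical names, `null` stands in for "not set", and the use of IllegalStateException is an assumption since the description does not name the exception type.

```java
// Hypothetical illustration (not Spark API) of the documented fallback order.
class GetThresholdsSketch {
    // thresholds == null or threshold == null models an unset param.
    public static double[] resolve(double[] thresholds, Double threshold) {
        if (thresholds != null) {
            return thresholds.clone();  // thresholds wins when set
        }
        if (threshold != null) {
            // Equivalent binary thresholds: (1 - threshold, threshold).
            return new double[] {1.0 - threshold, threshold};
        }
        // Exception type is an assumption; the source only says an exception is thrown.
        throw new IllegalStateException("neither thresholds nor threshold is set");
    }

    public static void main(String[] args) {
        double[] t = resolve(null, 0.25);
        System.out.println(t[0] + ", " + t[1]); // 0.75, 0.25
    }
}
```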
 
- 
setAggregationDepth
Suggested depth for treeAggregate (greater than or equal to 2). If the dimensions of features or the number of partitions are large, this param can be set to a larger value. Default is 2.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setLowerBoundsOnCoefficients
Set the lower bounds on coefficients if fitting under bound constrained optimization.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setUpperBoundsOnCoefficients
Set the upper bounds on coefficients if fitting under bound constrained optimization.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setLowerBoundsOnIntercepts
Set the lower bounds on intercepts if fitting under bound constrained optimization.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setUpperBoundsOnIntercepts
Set the upper bounds on intercepts if fitting under bound constrained optimization.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
 
- 
setMaxBlockSizeInMB
Sets the value of param maxBlockSizeInMB(). Default is 0.0, in which case 1.0 MB is chosen.
- Parameters:
- value - (undocumented)
- Returns:
- (undocumented)
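The default handling described for setMaxBlockSizeInMB amounts to a small mapping from the param value to the block size actually used. `BlockSizeSketch` and `effectiveBlockSizeInMB` are hypothetical names; the sketch only restates the documented behavior (0.0 selects 1.0 MB, negative values are invalid) and says nothing about how Spark stacks the blocks.

```java
// Hypothetical illustration (not Spark API) of the documented default handling.
class BlockSizeSketch {
    // Map the param value to the block size used for this class: 0.0 means 1.0 MB.
    public static double effectiveBlockSizeInMB(double maxBlockSizeInMB) {
        if (maxBlockSizeInMB < 0.0) {
            throw new IllegalArgumentException("maxBlockSizeInMB must be >= 0, got " + maxBlockSizeInMB);
        }
        return maxBlockSizeInMB == 0.0 ? 1.0 : maxBlockSizeInMB;
    }

    public static void main(String[] args) {
        System.out.println(effectiveBlockSizeInMB(0.0)); // 1.0
        System.out.println(effectiveBlockSizeInMB(4.0)); // 4.0
    }
}
```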
 
- 
setInitialModel
- 
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by:
- copy in interface Params
- Specified by:
- copy in class Predictor<Vector, LogisticRegression, LogisticRegressionModel>
- Parameters:
- extra - (undocumented)
- Returns:
- (undocumented)
 
 
-