Class LogisticRegression
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, ClassifierParams, LogisticRegressionParams, ProbabilisticClassifierParams, Params, HasAggregationDepth, HasElasticNetParam, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasRegParam, HasStandardization, HasThreshold, HasThresholds, HasTol, HasWeightCol, PredictorParams, DefaultParamsWritable, Identifiable, MLWritable
This class supports fitting a traditional logistic regression model with LBFGS/OWLQN and a bound (box) constrained logistic regression model with LBFGSB.
Since 3.1.0, it supports stacking instances into blocks and using GEMV/GEMM for better performance. If param maxBlockSizeInMB is left at its default of 0.0, a block size of 1.0 MB is used.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
-
Method Summary
| Modifier and Type | Method | Description |
| final IntParam | aggregationDepth() | Param for suggested depth for treeAggregate (>= 2). |
| | copy | Creates a copy of this instance with the same UID and some extra params. |
| final DoubleParam | elasticNetParam() | Param for the ElasticNet mixing parameter, in range [0, 1]. |
| | family() | Param for the name of family which is a description of the label distribution to be used in the model. |
| final BooleanParam | fitIntercept() | Param for whether to fit an intercept term. |
| double | getThreshold() | Get threshold for binary classification. |
| double[] | getThresholds() | Get thresholds for binary or multiclass classification. |
| static LogisticRegression | load | |
| | lowerBoundsOnCoefficients() | The lower bounds on coefficients if fitting under bound constrained optimization. |
| | lowerBoundsOnIntercepts() | The lower bounds on intercepts if fitting under bound constrained optimization. |
| final DoubleParam | maxBlockSizeInMB() | Param for Maximum memory in MB for stacking input data into blocks. |
| final IntParam | maxIter() | Param for maximum number of iterations (>= 0). |
| static MLReader<T> | read() | |
| final DoubleParam | regParam() | Param for regularization parameter (>= 0). |
| | setAggregationDepth(int value) | Suggested depth for treeAggregate (greater than or equal to 2). |
| | setElasticNetParam(double value) | Set the ElasticNet mixing parameter. |
| | setFamily | Sets the value of param family(). |
| | setFitIntercept(boolean value) | Whether to fit an intercept term. |
| | setLowerBoundsOnCoefficients | Set the lower bounds on coefficients if fitting under bound constrained optimization. |
| | setLowerBoundsOnIntercepts(Vector value) | Set the lower bounds on intercepts if fitting under bound constrained optimization. |
| | setMaxBlockSizeInMB(double value) | Sets the value of param maxBlockSizeInMB(). |
| | setMaxIter(int value) | Set the maximum number of iterations. |
| | setRegParam(double value) | Set the regularization parameter. |
| | setStandardization(boolean value) | Whether to standardize the training features before fitting the model. |
| | setThreshold(double value) | Set threshold in binary classification, in range [0, 1]. |
| | setThresholds(double[] value) | Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. |
| | setTol(double value) | Set the convergence tolerance of iterations. |
| | setUpperBoundsOnCoefficients | Set the upper bounds on coefficients if fitting under bound constrained optimization. |
| | setUpperBoundsOnIntercepts(Vector value) | Set the upper bounds on intercepts if fitting under bound constrained optimization. |
| | setWeightCol(String value) | Sets the value of param weightCol(). |
| final BooleanParam | standardization() | Param for whether to standardize the training features before fitting the model. |
| | threshold() | Param for threshold in binary classification prediction, in range [0, 1]. |
| final DoubleParam | tol() | Param for the convergence tolerance for iterative algorithms (>= 0). |
| | uid() | An immutable unique ID for the object and its derivatives. |
| | upperBoundsOnCoefficients() | The upper bounds on coefficients if fitting under bound constrained optimization. |
| | upperBoundsOnIntercepts() | The upper bounds on intercepts if fitting under bound constrained optimization. |
| | weightCol() | Param for weight column name. |
Methods inherited from class org.apache.spark.ml.classification.ProbabilisticClassifier
probabilityCol, setProbabilityCol, thresholds
Methods inherited from class org.apache.spark.ml.classification.Classifier
rawPredictionCol, setRawPredictionCol
Methods inherited from class org.apache.spark.ml.Predictor
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
Methods inherited from class org.apache.spark.ml.PipelineStage
params
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write
Methods inherited from interface org.apache.spark.ml.param.shared.HasAggregationDepth
getAggregationDepth
Methods inherited from interface org.apache.spark.ml.param.shared.HasElasticNetParam
getElasticNetParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasFitIntercept
getFitIntercept
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxBlockSizeInMB
getMaxBlockSizeInMB
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasProbabilityCol
getProbabilityCol, probabilityCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRawPredictionCol
getRawPredictionCol, rawPredictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRegParam
getRegParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasStandardization
getStandardization
Methods inherited from interface org.apache.spark.ml.param.shared.HasThresholds
thresholds
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.classification.LogisticRegressionParams
checkThresholdConsistency, getFamily, getLowerBoundsOnCoefficients, getLowerBoundsOnIntercepts, getUpperBoundsOnCoefficients, getUpperBoundsOnIntercepts, usingBoundConstrainedOptimization, validateAndTransformSchema
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
-
Constructor Details
-
LogisticRegression
-
LogisticRegression
public LogisticRegression()
-
-
Method Details
-
load
-
read
-
family
Description copied from interface: LogisticRegressionParams
Param for the name of the family, which describes the label distribution to be used in the model. Supported options:
- "auto": Automatically select the family based on the number of classes. If numClasses == 1 || numClasses == 2, set to "binomial"; otherwise, set to "multinomial".
- "binomial": Binary logistic regression with pivoting.
- "multinomial": Multinomial logistic (softmax) regression without pivoting.
Default is "auto".
- Specified by: family in interface LogisticRegressionParams
- Returns: (undocumented)
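The "auto" rule described for family can be sketched in plain Java. This is only an illustration of the documented selection logic, not Spark's implementation; the class name FamilySketch and the method resolveFamily are hypothetical, and the sketch does not check whether an explicitly chosen family is valid for the given number of classes.

```java
/** Illustrative sketch of the documented "auto" family rule; not Spark code. */
final class FamilySketch {
    private FamilySketch() {}

    /** Returns the effective family for a family setting and a class count. */
    static String resolveFamily(String family, int numClasses) {
        switch (family) {
            case "auto":
                // One or two classes resolve to binomial; anything else to multinomial.
                return (numClasses == 1 || numClasses == 2) ? "binomial" : "multinomial";
            case "binomial":
            case "multinomial":
                // An explicit choice is returned unchanged in this sketch.
                return family;
            default:
                throw new IllegalArgumentException("Unsupported family: " + family);
        }
    }
}
```

For instance, FamilySketch.resolveFamily("auto", 3) yields "multinomial" under this rule.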
-
lowerBoundsOnCoefficients
Description copied from interface: LogisticRegressionParams
The lower bounds on coefficients if fitting under bound constrained optimization. The bound matrix must be compatible with the shape (1, number of features) for binomial regression, or (number of classes, number of features) for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by: lowerBoundsOnCoefficients in interface LogisticRegressionParams
- Returns: (undocumented)
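The shape requirement on the coefficient bound matrix can be illustrated with a small check. This is a minimal sketch that uses a double[][] as a stand-in for the Matrix type; the class BoundShapeSketch, its methods, and the exception type are hypothetical and are not part of the Spark API.

```java
/** Illustrative shape check for coefficient bounds; not Spark code. */
final class BoundShapeSketch {
    private BoundShapeSketch() {}

    /** Expected (rows, columns) of the bound matrix as described above. */
    static int[] expectedShape(boolean multinomial, int numClasses, int numFeatures) {
        // Binomial regression uses a single row; multinomial uses one row per class.
        int rows = multinomial ? numClasses : 1;
        return new int[] {rows, numFeatures};
    }

    /** Throws if the bounds do not match the expected shape. */
    static void requireCompatible(double[][] bounds, boolean multinomial,
                                  int numClasses, int numFeatures) {
        int[] shape = expectedShape(multinomial, numClasses, numFeatures);
        boolean ok = bounds.length == shape[0];
        for (double[] row : bounds) {
            ok = ok && row.length == shape[1];
        }
        if (!ok) {
            throw new IllegalArgumentException(
                "Expected bounds of shape (" + shape[0] + ", " + shape[1] + ")");
        }
    }
}
```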
-
upperBoundsOnCoefficients
Description copied from interface: LogisticRegressionParams
The upper bounds on coefficients if fitting under bound constrained optimization. The bound matrix must be compatible with the shape (1, number of features) for binomial regression, or (number of classes, number of features) for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by: upperBoundsOnCoefficients in interface LogisticRegressionParams
- Returns: (undocumented)
-
lowerBoundsOnIntercepts
Description copied from interface: LogisticRegressionParams
The lower bounds on intercepts if fitting under bound constrained optimization. The bound vector size must be equal to 1 for binomial regression, or the number of classes for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by: lowerBoundsOnIntercepts in interface LogisticRegressionParams
- Returns: (undocumented)
-
upperBoundsOnIntercepts
Description copied from interface: LogisticRegressionParams
The upper bounds on intercepts if fitting under bound constrained optimization. The bound vector size must be equal to 1 for binomial regression, or the number of classes for multinomial regression. Otherwise, an exception is thrown. Default is none.
- Specified by: upperBoundsOnIntercepts in interface LogisticRegressionParams
- Returns: (undocumented)
-
maxBlockSizeInMB
Description copied from interface: HasMaxBlockSizeInMB
Param for the maximum memory in MB for stacking input data into blocks. Data is stacked within partitions. If the value exceeds the remaining data size in a partition, it is adjusted to the data size. The default of 0.0 means an optimal value is chosen, depending on the specific algorithm. Must be >= 0.
- Specified by: maxBlockSizeInMB in interface HasMaxBlockSizeInMB
- Returns: (undocumented)
-
aggregationDepth
Description copied from interface: HasAggregationDepth
Param for suggested depth for treeAggregate (>= 2).
- Specified by: aggregationDepth in interface HasAggregationDepth
- Returns: (undocumented)
-
threshold
Description copied from interface: HasThreshold
Param for threshold in binary classification prediction, in range [0, 1].
- Specified by: threshold in interface HasThreshold
- Returns: (undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by: weightCol in interface HasWeightCol
- Returns: (undocumented)
-
standardization
Description copied from interface: HasStandardization
Param for whether to standardize the training features before fitting the model.
- Specified by: standardization in interface HasStandardization
- Returns: (undocumented)
-
tol
Description copied from interface: HasTol
Param for the convergence tolerance for iterative algorithms (>= 0).
-
fitIntercept
Description copied from interface: HasFitIntercept
Param for whether to fit an intercept term.
- Specified by: fitIntercept in interface HasFitIntercept
- Returns: (undocumented)
-
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
- Specified by: maxIter in interface HasMaxIter
- Returns: (undocumented)
-
elasticNetParam
Description copied from interface: HasElasticNetParam
Param for the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
- Specified by: elasticNetParam in interface HasElasticNetParam
- Returns: (undocumented)
-
regParam
Description copied from interface: HasRegParam
Param for regularization parameter (>= 0).
- Specified by: regParam in interface HasRegParam
- Returns: (undocumented)
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by: uid in interface Identifiable
- Returns: (undocumented)
-
setRegParam
Set the regularization parameter. Default is 0.0.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setElasticNetParam
Set the ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty. For alpha in (0,1), the penalty is a combination of L1 and L2. Default is 0.0, which is an L2 penalty.
Note: Fitting under bound constrained optimization only supports L2 regularization, so an exception is thrown if this param is set to a non-zero value.
- Parameters: value - (undocumented)
- Returns: (undocumented)
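How alpha blends the two penalties can be shown with a short computation. This sketch assumes the common elastic net form regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2); the exact scaling Spark applies internally is not stated on this page, so treat the constants as an assumption. The class ElasticNetSketch and its method are hypothetical names.

```java
/** Illustrative elastic net penalty under an assumed standard scaling; not Spark code. */
final class ElasticNetSketch {
    private ElasticNetSketch() {}

    /** Computes regParam * (alpha * L1 + (1 - alpha) / 2 * squared L2) for weights w. */
    static double penalty(double[] w, double regParam, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        double l1 = 0.0;
        double l2Squared = 0.0;
        for (double v : w) {
            l1 += Math.abs(v);
            l2Squared += v * v;
        }
        // alpha = 0 keeps only the L2 term; alpha = 1 keeps only the L1 term.
        return regParam * (alpha * l1 + (1.0 - alpha) * 0.5 * l2Squared);
    }
}
```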
-
setMaxIter
Set the maximum number of iterations. Default is 100.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setTol
Set the convergence tolerance of iterations. A smaller value leads to higher accuracy at the cost of more iterations. Default is 1E-6.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setFitIntercept
Whether to fit an intercept term. Default is true.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setFamily
Sets the value of param family(). Default is "auto".
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setStandardization
Whether to standardize the training features before fitting the model. The coefficients of the model are always returned on the original scale, so standardization is transparent to users. Note that with or without standardization, the model should always converge to the same solution when no regularization is applied. In R's GLMNET package, the default behavior is true as well. Default is true.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setThreshold
Description copied from interface: LogisticRegressionParams
Set threshold in binary classification, in range [0, 1].
If the estimated probability of class label 1 is greater than threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 more often; a low threshold encourages the model to predict 1 more often.
Note: Calling this with threshold p is equivalent to calling setThresholds(Array(1-p, p)). When setThreshold() is called, any user-set value for thresholds will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Default is 0.5.
- Specified by: setThreshold in interface LogisticRegressionParams
- Parameters: value - (undocumented)
- Returns: (undocumented)
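The binary decision rule stated for setThreshold can be written out directly. This is a minimal sketch of the documented comparison only; BinaryThresholdSketch and predict are hypothetical names and do not exist in Spark.

```java
/** Illustrative binary decision rule from the description above; not Spark code. */
final class BinaryThresholdSketch {
    private BinaryThresholdSketch() {}

    /** Predicts 1 only when the probability of class 1 is strictly greater than the threshold. */
    static int predict(double probabilityOfClass1, double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        return probabilityOfClass1 > threshold ? 1 : 0;
    }
}
```

With the same probability of 0.6, a threshold of 0.5 gives a prediction of 1 while a threshold of 0.7 gives 0, which is the "high threshold predicts 0 more often" behavior described above.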
-
getThreshold
public double getThreshold()
Description copied from interface: LogisticRegressionParams
Get threshold for binary classification.
If thresholds is set with length 2 (i.e., binary classification), this returns the equivalent threshold: 1 / (1 + thresholds(0) / thresholds(1)). Otherwise, returns threshold if set, or its default value if unset.
- Specified by: getThreshold in interface HasThreshold
- Specified by: getThreshold in interface LogisticRegressionParams
- Returns: (undocumented)
- Throws: IllegalArgumentException - if thresholds is set to an array of length other than 2.
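The conversion from a two-element thresholds array to a single binary threshold follows the formula 1 / (1 + thresholds(0) / thresholds(1)) given above. The sketch below only evaluates that formula; ThresholdConversionSketch and equivalentThreshold are hypothetical names.

```java
/** Illustrative evaluation of the documented conversion formula; not Spark code. */
final class ThresholdConversionSketch {
    private ThresholdConversionSketch() {}

    /** Returns 1 / (1 + thresholds[0] / thresholds[1]) for a length-2 array. */
    static double equivalentThreshold(double[] thresholds) {
        if (thresholds.length != 2) {
            throw new IllegalArgumentException(
                "thresholds must have length 2, got " + thresholds.length);
        }
        return 1.0 / (1.0 + thresholds[0] / thresholds[1]);
    }
}
```

For thresholds of (0.3, 0.7) the result is approximately 0.7, consistent with the note that threshold p corresponds to thresholds (1-p, p).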
-
setWeightCol
Sets the value of param weightCol(). If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setThresholds
Description copied from interface: LogisticRegressionParams
Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values greater than 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class's threshold.
Note: When setThresholds() is called, any user-set value for threshold will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
- Specified by: setThresholds in interface LogisticRegressionParams
- Overrides: setThresholds in class ProbabilisticClassifier<Vector, LogisticRegression, LogisticRegressionModel>
- Parameters: value - (undocumented)
- Returns: (undocumented)
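The "largest p/t wins" rule can be sketched as a scaled argmax. Two details here are assumptions made for the sketch and are not stated on this page: a threshold of 0 is treated as an infinite score, and ties go to the lowest class index. ThresholdedArgmaxSketch and predict are hypothetical names.

```java
/** Illustrative p/t argmax from the description above; not Spark code. */
final class ThresholdedArgmaxSketch {
    private ThresholdedArgmaxSketch() {}

    /** Returns the index of the class with the largest probability-to-threshold ratio. */
    static int predict(double[] probabilities, double[] thresholds) {
        if (probabilities.length == 0 || probabilities.length != thresholds.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probabilities.length; i++) {
            // Assumption: a zero threshold makes that class win outright.
            double score = thresholds[i] == 0.0
                ? Double.POSITIVE_INFINITY
                : probabilities[i] / thresholds[i];
            // Strict comparison keeps the lowest index on ties (a choice of this sketch).
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

Lowering one class's threshold raises its ratio, so a less probable class can be predicted over a more probable one.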
-
getThresholds
public double[] getThresholds()
Description copied from interface: LogisticRegressionParams
Get thresholds for binary or multiclass classification.
If thresholds is set, return its value. Otherwise, if threshold is set, return the equivalent thresholds for binary classification: (1-threshold, threshold). If neither is set, throw an exception.
- Specified by: getThresholds in interface HasThresholds
- Specified by: getThresholds in interface LogisticRegressionParams
- Returns: (undocumented)
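The fallback order described for getThresholds can be sketched with nullable inputs standing in for unset params. This is an illustration only: ThresholdsResolutionSketch and resolve are hypothetical names, null is used here to mean "not set", and the exception type is an assumption since the page does not name it.

```java
/** Illustrative fallback order for thresholds vs. threshold; not Spark code. */
final class ThresholdsResolutionSketch {
    private ThresholdsResolutionSketch() {}

    /**
     * @param thresholds per-class thresholds, or null if unset
     * @param threshold  binary threshold, or null if unset
     */
    static double[] resolve(double[] thresholds, Double threshold) {
        if (thresholds != null) {
            // An explicit thresholds array takes precedence.
            return thresholds.clone();
        }
        if (threshold != null) {
            // Equivalent binary thresholds: (1 - threshold, threshold).
            return new double[] {1.0 - threshold, threshold};
        }
        // Assumed exception type; the page only says an exception is thrown.
        throw new IllegalStateException("Neither thresholds nor threshold is set");
    }
}
```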
-
setAggregationDepth
Suggested depth for treeAggregate (greater than or equal to 2). If the feature dimension or the number of partitions is large, this param can be increased. Default is 2.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setLowerBoundsOnCoefficients
Set the lower bounds on coefficients if fitting under bound constrained optimization.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setUpperBoundsOnCoefficients
Set the upper bounds on coefficients if fitting under bound constrained optimization.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setLowerBoundsOnIntercepts
Set the lower bounds on intercepts if fitting under bound constrained optimization.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setUpperBoundsOnIntercepts
Set the upper bounds on intercepts if fitting under bound constrained optimization.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setMaxBlockSizeInMB
Sets the value of param maxBlockSizeInMB(). Default is 0.0, in which case 1.0 MB will be chosen.
- Parameters: value - (undocumented)
- Returns: (undocumented)
-
setInitialModel
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by: copy in interface Params
- Specified by: copy in class Predictor<Vector, LogisticRegression, LogisticRegressionModel>
- Parameters: extra - (undocumented)
- Returns: (undocumented)
-