public class LogisticRegression extends ProbabilisticClassifier<Vector,LogisticRegression,LogisticRegressionModel> implements LogisticRegressionParams, DefaultParamsWritable, org.apache.spark.internal.Logging
This class supports fitting a traditional logistic regression model with LBFGS/OWLQN, and a bound (box) constrained logistic regression model with LBFGSB.
Since 3.1.0, it supports stacking instances into blocks and using GEMV/GEMM for better performance. The param maxBlockSizeInMB defaults to 0.0, in which case a block size of 1.0 MB is used.
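To make the blocking idea concrete, the following is a minimal sketch in plain Java of why stacking instances helps: once rows are packed into one dense block, all margins can be computed in a single GEMV-style pass. This is not Spark's implementation; the class `BlockMarginSketch` and its methods are hypothetical names introduced for illustration, and `effectiveBlockSizeInMB` only restates the default rule described above.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not part of the Spark API.
final class BlockMarginSketch {

    /** Restates the documented default: 0.0 means a 1.0 MB block is used. */
    static double effectiveBlockSizeInMB(double maxBlockSizeInMB) {
        return maxBlockSizeInMB == 0.0 ? 1.0 : maxBlockSizeInMB;
    }

    /** Stacks feature rows into one dense row-major block (numRows x numFeatures). */
    static double[] stack(double[][] rows, int numFeatures) {
        double[] block = new double[rows.length * numFeatures];
        for (int i = 0; i < rows.length; i++) {
            System.arraycopy(rows[i], 0, block, i * numFeatures, numFeatures);
        }
        return block;
    }

    /** GEMV-style pass: margins = block * coefficients + intercept. */
    static double[] margins(double[] block, int numFeatures,
                            double[] coefficients, double intercept) {
        int numRows = block.length / numFeatures;
        double[] out = new double[numRows];
        for (int i = 0; i < numRows; i++) {
            double sum = intercept;
            for (int j = 0; j < numFeatures; j++) {
                sum += block[i * numFeatures + j] * coefficients[j];
            }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 2.0}, {3.0, 4.0}, {0.0, -1.0}};
        double[] block = stack(rows, 2);
        double[] m = margins(block, 2, new double[]{0.5, -1.0}, 0.25);
        System.out.println(Arrays.toString(m));
    }
}
```

In a real implementation the inner loops would be delegated to a BLAS routine; the sketch only shows the data layout that makes that possible.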
Constructor and Description |
---|
LogisticRegression() |
LogisticRegression(String uid) |
Modifier and Type | Method and Description |
---|---|
IntParam | aggregationDepth(): Param for suggested depth for treeAggregate (>= 2). |
LogisticRegression | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
DoubleParam | elasticNetParam(): Param for the ElasticNet mixing parameter, in range [0, 1]. |
Param<String> | family(): Param for the name of family which is a description of the label distribution to be used in the model. |
BooleanParam | fitIntercept(): Param for whether to fit an intercept term. |
double | getThreshold(): Get threshold for binary classification. |
double[] | getThresholds(): Get thresholds for binary or multiclass classification. |
static LogisticRegression | load(String path) |
Param<Matrix> | lowerBoundsOnCoefficients(): The lower bounds on coefficients if fitting under bound constrained optimization. |
Param<Vector> | lowerBoundsOnIntercepts(): The lower bounds on intercepts if fitting under bound constrained optimization. |
DoubleParam | maxBlockSizeInMB(): Param for maximum memory in MB for stacking input data into blocks. |
IntParam | maxIter(): Param for maximum number of iterations (>= 0). |
static MLReader<T> | read() |
DoubleParam | regParam(): Param for regularization parameter (>= 0). |
LogisticRegression | setAggregationDepth(int value): Suggested depth for treeAggregate (greater than or equal to 2). |
LogisticRegression | setElasticNetParam(double value): Set the ElasticNet mixing parameter. |
LogisticRegression | setFamily(String value): Sets the value of param family. |
LogisticRegression | setFitIntercept(boolean value): Whether to fit an intercept term. |
LogisticRegression | setInitialModel(LogisticRegressionModel model) |
LogisticRegression | setLowerBoundsOnCoefficients(Matrix value): Set the lower bounds on coefficients if fitting under bound constrained optimization. |
LogisticRegression | setLowerBoundsOnIntercepts(Vector value): Set the lower bounds on intercepts if fitting under bound constrained optimization. |
LogisticRegression | setMaxBlockSizeInMB(double value): Sets the value of param maxBlockSizeInMB. |
LogisticRegression | setMaxIter(int value): Set the maximum number of iterations. |
LogisticRegression | setRegParam(double value): Set the regularization parameter. |
LogisticRegression | setStandardization(boolean value): Whether to standardize the training features before fitting the model. |
LogisticRegression | setThreshold(double value): Set threshold in binary classification, in range [0, 1]. |
LogisticRegression | setThresholds(double[] value): Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. |
LogisticRegression | setTol(double value): Set the convergence tolerance of iterations. |
LogisticRegression | setUpperBoundsOnCoefficients(Matrix value): Set the upper bounds on coefficients if fitting under bound constrained optimization. |
LogisticRegression | setUpperBoundsOnIntercepts(Vector value): Set the upper bounds on intercepts if fitting under bound constrained optimization. |
LogisticRegression | setWeightCol(String value): Sets the value of param weightCol. |
BooleanParam | standardization(): Param for whether to standardize the training features before fitting the model. |
DoubleParam | threshold(): Param for threshold in binary classification prediction, in range [0, 1]. |
DoubleParam | tol(): Param for the convergence tolerance for iterative algorithms (>= 0). |
String | uid(): An immutable unique ID for the object and its derivatives. |
Param<Matrix> | upperBoundsOnCoefficients(): The upper bounds on coefficients if fitting under bound constrained optimization. |
Param<Vector> | upperBoundsOnIntercepts(): The upper bounds on intercepts if fitting under bound constrained optimization. |
Param<String> | weightCol(): Param for weight column name. |
probabilityCol, setProbabilityCol, thresholds
rawPredictionCol, setRawPredictionCol
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
params
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
checkThresholdConsistency, getFamily, getLowerBoundsOnCoefficients, getLowerBoundsOnIntercepts, getUpperBoundsOnCoefficients, getUpperBoundsOnIntercepts, usingBoundConstrainedOptimization, validateAndTransformSchema
getLabelCol, labelCol
featuresCol, getFeaturesCol
getPredictionCol, predictionCol
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
toString
getRawPredictionCol, rawPredictionCol
getProbabilityCol, probabilityCol
thresholds
getRegParam
getElasticNetParam
getMaxIter
getFitIntercept
getStandardization
getWeightCol
getAggregationDepth
getMaxBlockSizeInMB
write
save
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public LogisticRegression(String uid)
public LogisticRegression()
public static LogisticRegression load(String path)
public static MLReader<T> read()
public final Param<String> family()
Specified by: family in interface LogisticRegressionParams
public Param<Matrix> lowerBoundsOnCoefficients()
Specified by: lowerBoundsOnCoefficients in interface LogisticRegressionParams
public Param<Matrix> upperBoundsOnCoefficients()
Specified by: upperBoundsOnCoefficients in interface LogisticRegressionParams
public Param<Vector> lowerBoundsOnIntercepts()
Specified by: lowerBoundsOnIntercepts in interface LogisticRegressionParams
public Param<Vector> upperBoundsOnIntercepts()
Specified by: upperBoundsOnIntercepts in interface LogisticRegressionParams
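The four bound params above describe a box constraint: each coefficient and intercept has to stay between its lower and upper bound. A minimal sketch of that semantics in plain Java follows. It is not how LBFGSB enforces the bounds and it does not use Spark's Matrix or Vector types; `BoxConstraintSketch`, `withinBounds`, and `clamp` are hypothetical names, and plain `double[][]` arrays stand in for the bound matrices. An infinite bound is used here to mean "unbounded on that side", which is an assumption of the sketch.

```java
// Illustrative only: hypothetical names, not part of the Spark API.
final class BoxConstraintSketch {

    /** True if every value satisfies lower <= value <= upper, element-wise. */
    static boolean withinBounds(double[][] values, double[][] lower, double[][] upper) {
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < values[i].length; j++) {
                if (values[i][j] < lower[i][j] || values[i][j] > upper[i][j]) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Returns a copy of values with each element clipped into [lower, upper]. */
    static double[][] clamp(double[][] values, double[][] lower, double[][] upper) {
        double[][] out = new double[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = new double[values[i].length];
            for (int j = 0; j < values[i].length; j++) {
                out[i][j] = Math.max(lower[i][j], Math.min(upper[i][j], values[i][j]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] coefficients = {{-1.0, 5.0}};
        double[][] lower = {{0.0, Double.NEGATIVE_INFINITY}}; // first coefficient must be >= 0
        double[][] upper = {{Double.POSITIVE_INFINITY, 2.0}}; // second coefficient must be <= 2
        System.out.println(withinBounds(coefficients, lower, upper));
        System.out.println(withinBounds(clamp(coefficients, lower, upper), lower, upper));
    }
}
```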
public final DoubleParam maxBlockSizeInMB()
Specified by: maxBlockSizeInMB in interface HasMaxBlockSizeInMB
public final IntParam aggregationDepth()
Specified by: aggregationDepth in interface HasAggregationDepth
public DoubleParam threshold()
Specified by: threshold in interface HasThreshold
public final Param<String> weightCol()
Specified by: weightCol in interface HasWeightCol
public final BooleanParam standardization()
Specified by: standardization in interface HasStandardization
public final DoubleParam tol()
Specified by: tol in interface HasTol
public final BooleanParam fitIntercept()
Specified by: fitIntercept in interface HasFitIntercept
public final IntParam maxIter()
Specified by: maxIter in interface HasMaxIter
public final DoubleParam elasticNetParam()
Specified by: elasticNetParam in interface HasElasticNetParam
public final DoubleParam regParam()
Specified by: regParam in interface HasRegParam
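As a rough illustration of how regParam and elasticNetParam interact, the sketch below computes the usual elastic-net penalty, regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), where alpha is elasticNetParam. The formula is the standard elastic-net definition rather than something stated on this page, so treat it as an assumption; `ElasticNetPenaltySketch` and `penalty` are hypothetical names, not Spark API.

```java
// Illustrative only: hypothetical names, not part of the Spark API.
final class ElasticNetPenaltySketch {

    /**
     * Standard elastic-net penalty (assumed form):
     * regParam * (alpha * L1(w) + (1 - alpha) / 2 * L2(w)^2), with alpha = elasticNetParam.
     */
    static double penalty(double[] w, double regParam, double elasticNetParam) {
        if (regParam < 0.0) {
            throw new IllegalArgumentException("regParam must be >= 0");
        }
        if (elasticNetParam < 0.0 || elasticNetParam > 1.0) {
            throw new IllegalArgumentException("elasticNetParam must be in [0, 1]");
        }
        double l1 = 0.0;
        double l2Squared = 0.0;
        for (double v : w) {
            l1 += Math.abs(v);
            l2Squared += v * v;
        }
        return regParam * (elasticNetParam * l1 + (1.0 - elasticNetParam) / 2.0 * l2Squared);
    }

    public static void main(String[] args) {
        double[] w = {3.0, -4.0};
        System.out.println(penalty(w, 0.1, 0.0)); // pure L2 term
        System.out.println(penalty(w, 0.1, 1.0)); // pure L1 term
        System.out.println(penalty(w, 0.1, 0.5)); // even mix of both
    }
}
```

With elasticNetParam at 0 only the L2 term remains, which is the only setting compatible with bound constrained fitting according to the note on setElasticNetParam.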
public String uid()
Specified by: uid in interface Identifiable
public LogisticRegression setRegParam(double value)
Parameters: value - (undocumented)
public LogisticRegression setElasticNetParam(double value)
Note: Fitting under bound constrained optimization only supports L2 regularization, so an exception is thrown if this param is set to a non-zero value.
Parameters: value - (undocumented)
public LogisticRegression setMaxIter(int value)
Parameters: value - (undocumented)
public LogisticRegression setTol(double value)
Parameters: value - (undocumented)
public LogisticRegression setFitIntercept(boolean value)
Parameters: value - (undocumented)
public LogisticRegression setFamily(String value)
Sets the value of param family. Default is "auto".
Parameters: value - (undocumented)
public LogisticRegression setStandardization(boolean value)
Parameters: value - (undocumented)
public LogisticRegression setThreshold(double value)
Description copied from interface: LogisticRegressionParams
If the estimated probability of class label 1 is greater than threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 more often; a low threshold encourages the model to predict 1 more often.
Note: Calling this with threshold p is equivalent to calling setThresholds(Array(1-p, p)). When setThreshold() is called, any user-set value for thresholds will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Default is 0.5.
Specified by: setThreshold in interface LogisticRegressionParams
Parameters: value - (undocumented)
public double getThreshold()
Description copied from interface: LogisticRegressionParams
If thresholds is set with length 2 (i.e., binary classification), this returns the equivalent threshold: 1 / (1 + thresholds(0) / thresholds(1)). Otherwise, returns threshold if set, or its default value if unset.
Throws: IllegalArgumentException - if thresholds is set to an array of length other than 2.
Specified by: getThreshold in interface LogisticRegressionParams
Specified by: getThreshold in interface HasThreshold
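The equivalence between threshold and thresholds described for setThreshold and getThreshold can be checked with a few lines of plain Java. This is a minimal sketch of the arithmetic only, not Spark code: `ThresholdSketch` and its methods are hypothetical names, while the formulas (1-p, p) and 1 / (1 + thresholds(0) / thresholds(1)) and the "greater than threshold" prediction rule are taken from the descriptions above.

```java
// Illustrative only: hypothetical names, not part of the Spark API.
final class ThresholdSketch {

    /** Binary threshold p expressed as a thresholds array: (1 - p, p). */
    static double[] toThresholds(double p) {
        return new double[]{1.0 - p, p};
    }

    /** Inverse mapping: 1 / (1 + thresholds(0) / thresholds(1)). */
    static double toThreshold(double[] thresholds) {
        if (thresholds.length != 2) {
            throw new IllegalArgumentException(
                "thresholds must have length 2, got " + thresholds.length);
        }
        return 1.0 / (1.0 + thresholds[0] / thresholds[1]);
    }

    /** Predict 1 if the estimated probability of label 1 exceeds the threshold, else 0. */
    static int predictBinary(double probabilityOfOne, double threshold) {
        return probabilityOfOne > threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double p = 0.25;
        double[] thresholds = toThresholds(p);
        // Converting back recovers p (up to floating-point rounding in general).
        System.out.println(toThreshold(thresholds));
        System.out.println(predictBinary(0.3, p));  // 0.3 > 0.25, so label 1
        System.out.println(predictBinary(0.2, p));  // 0.2 <= 0.25, so label 0
    }
}
```

The round trip works because (1 - p) / p + 1 = 1 / p, so the inverse formula returns p for any p in (0, 1].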
public LogisticRegression setWeightCol(String value)
Sets the value of param weightCol. If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one.
Parameters: value - (undocumented)
public LogisticRegression setThresholds(double[] value)
Description copied from interface: LogisticRegressionParams
Note: When setThresholds() is called, any user-set value for threshold will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Specified by: setThresholds in interface LogisticRegressionParams
Overrides: setThresholds in class ProbabilisticClassifier<Vector,LogisticRegression,LogisticRegressionModel>
Parameters: value - (undocumented)
public double[] getThresholds()
Description copied from interface: LogisticRegressionParams
If thresholds is set, return its value. Otherwise, if threshold is set, return the equivalent thresholds for binary classification: (1-threshold, threshold). If neither is set, throw an exception.
Specified by: getThresholds in interface LogisticRegressionParams
Specified by: getThresholds in interface HasThresholds
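The resolution order described for getThresholds can be sketched as follows. This is plain Java written for illustration, not Spark's code: `ThresholdsResolutionSketch` and its methods are hypothetical, `null` stands in for "not set", and the exception type is an assumption because the description above does not name one. The `predict` helper additionally assumes the rule from Spark's `thresholds` param documentation that the class with the largest p/t is predicted; that rule is not stated on this page.

```java
// Illustrative only: hypothetical names, not part of the Spark API.
final class ThresholdsResolutionSketch {

    /**
     * Mirrors the documented order: thresholds if set, else (1 - threshold, threshold),
     * else an exception. A null argument stands for "not set".
     */
    static double[] resolveThresholds(double[] thresholds, Double threshold) {
        if (thresholds != null) {
            return thresholds.clone();
        }
        if (threshold != null) {
            return new double[]{1.0 - threshold, threshold};
        }
        // Exception type is an assumption; the source only says "throw an exception".
        throw new IllegalStateException("Neither thresholds nor threshold is set");
    }

    /** Assumed rule: predict the class whose probability / threshold ratio is largest. */
    static int predict(double[] probabilities, double[] thresholds) {
        if (probabilities.length != thresholds.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probabilities.length; i++) {
            double score = probabilities[i] / thresholds[i];
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] t = resolveThresholds(null, 0.7);
        System.out.println(predict(new double[]{0.35, 0.65}, t)); // 0.65 is below 0.7
        System.out.println(predict(new double[]{0.2, 0.8}, t));   // 0.8 is above 0.7
    }
}
```

For two classes with thresholds (1-p, p), comparing the two ratios reduces to checking whether the probability of label 1 exceeds p, which matches the binary rule given for setThreshold.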
public LogisticRegression setAggregationDepth(int value)
Parameters: value - (undocumented)
public LogisticRegression setLowerBoundsOnCoefficients(Matrix value)
Parameters: value - (undocumented)
public LogisticRegression setUpperBoundsOnCoefficients(Matrix value)
Parameters: value - (undocumented)
public LogisticRegression setLowerBoundsOnIntercepts(Vector value)
Parameters: value - (undocumented)
public LogisticRegression setUpperBoundsOnIntercepts(Vector value)
Parameters: value - (undocumented)
public LogisticRegression setMaxBlockSizeInMB(double value)
Sets the value of param maxBlockSizeInMB. Default is 0.0, in which case 1.0 MB will be chosen.
Parameters: value - (undocumented)
public LogisticRegression setInitialModel(LogisticRegressionModel model)
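public LogisticRegression copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params (see defaultCopy()).
Specified by: copy in interface Params
Specified by: copy in class Predictor<Vector,LogisticRegression,LogisticRegressionModel>
Parameters: extra - (undocumented)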