public class LogisticRegression extends ProbabilisticClassifier<Vector,LogisticRegression,LogisticRegressionModel> implements LogisticRegressionParams, DefaultParamsWritable, org.apache.spark.internal.Logging
This class supports fitting a traditional logistic regression model by LBFGS/OWLQN and a bound (box) constrained logistic regression model by LBFGSB.
Since 3.1.0, it supports stacking instances into blocks and using GEMV/GEMM for better performance. If param maxBlockSizeInMB is left at its default of 0.0, a block size of 1.0 MB is used.
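As a rough illustration of block stacking, the sketch below groups consecutive dense rows into blocks that fit a memory budget, treating 0.0 as "use 1.0 MB". It is a simplified assumption, not Spark's implementation: the class name, the helper names, and the size estimate of 8 bytes per double are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of stacking rows into size-capped blocks (not Spark's internal code).
final class BlockStackingSketch {

    static final double DEFAULT_BLOCK_MB = 1.0;

    // A value of 0.0 means "not chosen by the user", so fall back to 1.0 MB.
    static long effectiveBlockBytes(double maxBlockSizeInMB) {
        double mb = (maxBlockSizeInMB == 0.0) ? DEFAULT_BLOCK_MB : maxBlockSizeInMB;
        return (long) Math.ceil(mb * 1024 * 1024);
    }

    // Greedily groups consecutive rows so that each block's estimated size stays within the budget.
    static List<List<double[]>> stack(List<double[]> rows, double maxBlockSizeInMB) {
        long budget = effectiveBlockBytes(maxBlockSizeInMB);
        List<List<double[]>> blocks = new ArrayList<>();
        List<double[]> current = new ArrayList<>();
        long used = 0;
        for (double[] row : rows) {
            long rowBytes = 8L * row.length; // assumption: dense doubles, object overhead ignored
            if (!current.isEmpty() && used + rowBytes > budget) {
                blocks.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(row);
            used += rowBytes;
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            rows.add(new double[65536]); // 512 KiB per row under the estimate above
        }
        List<List<double[]>> blocks = stack(rows, 0.0);
        System.out.println("blocks: " + blocks.size());
    }
}
```

Once rows are stacked this way, a per-block matrix-vector or matrix-matrix product can replace many per-row dot products, which is where GEMV/GEMM come in.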
| Constructor and Description |
|---|
| LogisticRegression() |
| LogisticRegression(String uid) |
| Modifier and Type | Method | Description |
|---|---|---|
| IntParam | aggregationDepth() | Param for suggested depth for treeAggregate (>= 2). |
| LogisticRegression | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params. |
| DoubleParam | elasticNetParam() | Param for the ElasticNet mixing parameter, in range [0, 1]. |
| Param<String> | family() | Param for the name of family which is a description of the label distribution to be used in the model. |
| BooleanParam | fitIntercept() | Param for whether to fit an intercept term. |
| double | getThreshold() | Get threshold for binary classification. |
| double[] | getThresholds() | Get thresholds for binary or multiclass classification. |
| static LogisticRegression | load(String path) | |
| Param<Matrix> | lowerBoundsOnCoefficients() | The lower bounds on coefficients if fitting under bound constrained optimization. |
| Param<Vector> | lowerBoundsOnIntercepts() | The lower bounds on intercepts if fitting under bound constrained optimization. |
| DoubleParam | maxBlockSizeInMB() | Param for maximum memory in MB for stacking input data into blocks. |
| IntParam | maxIter() | Param for maximum number of iterations (>= 0). |
| static MLReader<T> | read() | |
| DoubleParam | regParam() | Param for regularization parameter (>= 0). |
| LogisticRegression | setAggregationDepth(int value) | Suggested depth for treeAggregate (greater than or equal to 2). |
| LogisticRegression | setElasticNetParam(double value) | Set the ElasticNet mixing parameter. |
| LogisticRegression | setFamily(String value) | Sets the value of param family. |
| LogisticRegression | setFitIntercept(boolean value) | Whether to fit an intercept term. |
| LogisticRegression | setInitialModel(LogisticRegressionModel model) | |
| LogisticRegression | setLowerBoundsOnCoefficients(Matrix value) | Set the lower bounds on coefficients if fitting under bound constrained optimization. |
| LogisticRegression | setLowerBoundsOnIntercepts(Vector value) | Set the lower bounds on intercepts if fitting under bound constrained optimization. |
| LogisticRegression | setMaxBlockSizeInMB(double value) | Sets the value of param maxBlockSizeInMB. |
| LogisticRegression | setMaxIter(int value) | Set the maximum number of iterations. |
| LogisticRegression | setRegParam(double value) | Set the regularization parameter. |
| LogisticRegression | setStandardization(boolean value) | Whether to standardize the training features before fitting the model. |
| LogisticRegression | setThreshold(double value) | Set threshold in binary classification, in range [0, 1]. |
| LogisticRegression | setThresholds(double[] value) | Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. |
| LogisticRegression | setTol(double value) | Set the convergence tolerance of iterations. |
| LogisticRegression | setUpperBoundsOnCoefficients(Matrix value) | Set the upper bounds on coefficients if fitting under bound constrained optimization. |
| LogisticRegression | setUpperBoundsOnIntercepts(Vector value) | Set the upper bounds on intercepts if fitting under bound constrained optimization. |
| LogisticRegression | setWeightCol(String value) | Sets the value of param weightCol. |
| BooleanParam | standardization() | Param for whether to standardize the training features before fitting the model. |
| DoubleParam | threshold() | Param for threshold in binary classification prediction, in range [0, 1]. |
| DoubleParam | tol() | Param for the convergence tolerance for iterative algorithms (>= 0). |
| String | uid() | An immutable unique ID for the object and its derivatives. |
| Param<Matrix> | upperBoundsOnCoefficients() | The upper bounds on coefficients if fitting under bound constrained optimization. |
| Param<Vector> | upperBoundsOnIntercepts() | The upper bounds on intercepts if fitting under bound constrained optimization. |
| Param<String> | weightCol() | Param for weight column name. |
Inherited methods:
probabilityCol, setProbabilityCol, thresholds
rawPredictionCol, setRawPredictionCol
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
params
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
checkThresholdConsistency, getFamily, getLowerBoundsOnCoefficients, getLowerBoundsOnIntercepts, getUpperBoundsOnCoefficients, getUpperBoundsOnIntercepts, usingBoundConstrainedOptimization, validateAndTransformSchema
getLabelCol, labelCol
featuresCol, getFeaturesCol
getPredictionCol, predictionCol
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
toString
getRawPredictionCol, rawPredictionCol
getProbabilityCol, probabilityCol
thresholds
getRegParam
getElasticNetParam
getMaxIter
getFitIntercept
getStandardization
getWeightCol
getAggregationDepth
getMaxBlockSizeInMB
write
save
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

public LogisticRegression(String uid)
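public LogisticRegression()

public static LogisticRegression load(String path)

public static MLReader<T> read()

public final Param<String> family()
Param for the name of family which is a description of the label distribution to be used in the model.
Specified by: family in interface LogisticRegressionParams

public Param<Matrix> lowerBoundsOnCoefficients()
The lower bounds on coefficients if fitting under bound constrained optimization.
Specified by: lowerBoundsOnCoefficients in interface LogisticRegressionParams

public Param<Matrix> upperBoundsOnCoefficients()
The upper bounds on coefficients if fitting under bound constrained optimization.
Specified by: upperBoundsOnCoefficients in interface LogisticRegressionParams

public Param<Vector> lowerBoundsOnIntercepts()
The lower bounds on intercepts if fitting under bound constrained optimization.
Specified by: lowerBoundsOnIntercepts in interface LogisticRegressionParams

public Param<Vector> upperBoundsOnIntercepts()
The upper bounds on intercepts if fitting under bound constrained optimization.
Specified by: upperBoundsOnIntercepts in interface LogisticRegressionParams

public final DoubleParam maxBlockSizeInMB()
Param for maximum memory in MB for stacking input data into blocks.
Specified by: maxBlockSizeInMB in interface HasMaxBlockSizeInMB

public final IntParam aggregationDepth()
Param for suggested depth for treeAggregate (>= 2).
Specified by: aggregationDepth in interface HasAggregationDepth

public DoubleParam threshold()
Param for threshold in binary classification prediction, in range [0, 1].
Specified by: threshold in interface HasThreshold

public final Param<String> weightCol()
Param for weight column name.
Specified by: weightCol in interface HasWeightCol

public final BooleanParam standardization()
Param for whether to standardize the training features before fitting the model.
Specified by: standardization in interface HasStandardization

public final DoubleParam tol()
Param for the convergence tolerance for iterative algorithms (>= 0).
Specified by: tol in interface HasTol

public final BooleanParam fitIntercept()
Param for whether to fit an intercept term.
Specified by: fitIntercept in interface HasFitIntercept

public final IntParam maxIter()
Param for maximum number of iterations (>= 0).
Specified by: maxIter in interface HasMaxIter

public final DoubleParam elasticNetParam()
Param for the ElasticNet mixing parameter, in range [0, 1].
Specified by: elasticNetParam in interface HasElasticNetParam

public final DoubleParam regParam()
Param for regularization parameter (>= 0).
Specified by: regParam in interface HasRegParam

public String uid()
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable

public LogisticRegression setRegParam(double value)
Set the regularization parameter.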
Parameters: value - (undocumented)

public LogisticRegression setElasticNetParam(double value)
Set the ElasticNet mixing parameter.
Note: Fitting under bound constrained optimization supports only L2 regularization, so an exception is thrown if this param is set to a non-zero value.
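To make the L2-only restriction concrete, the sketch below computes an elastic-net penalty and rejects a non-zero mixing value when bounds are in use. It assumes the usual elastic-net form, in which a mixing value of 0 gives a pure L2 penalty and 1 gives a pure L1 penalty. The class, the method names, and the choice of IllegalArgumentException are hypothetical and do not reflect Spark's internals.

```java
// Sketch of the elastic-net mixing rule and an L2-only guard (hypothetical helpers).
final class ElasticNetBoundsSketch {

    // Assumed penalty form: regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    static double penalty(double[] coefficients, double regParam, double alpha) {
        double l1 = 0.0;
        double l2Squared = 0.0;
        for (double w : coefficients) {
            l1 += Math.abs(w);
            l2Squared += w * w;
        }
        return regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared);
    }

    // Bound constrained fitting is L2-only, so any non-zero mixing value is rejected.
    static void requireL2Only(boolean usingBounds, double alpha) {
        if (usingBounds && alpha != 0.0) {
            throw new IllegalArgumentException(
                "Bound constrained optimization supports only L2 regularization, got elasticNetParam = " + alpha);
        }
    }

    public static void main(String[] args) {
        double[] w = {3.0, -4.0};
        System.out.println("alpha = 0 (pure L2): " + penalty(w, 0.1, 0.0));
        System.out.println("alpha = 1 (pure L1): " + penalty(w, 0.1, 1.0));
        requireL2Only(true, 0.0); // accepted
        try {
            requireL2Only(true, 0.5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```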
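Parameters: value - (undocumented)

public LogisticRegression setMaxIter(int value)
Set the maximum number of iterations.
Parameters: value - (undocumented)

public LogisticRegression setTol(double value)
Set the convergence tolerance of iterations.
Parameters: value - (undocumented)

public LogisticRegression setFitIntercept(boolean value)
Whether to fit an intercept term.
Parameters: value - (undocumented)

public LogisticRegression setFamily(String value)
Sets the value of param family. Default is "auto".
Parameters: value - (undocumented)

public LogisticRegression setStandardization(boolean value)
Whether to standardize the training features before fitting the model.
Parameters: value - (undocumented)

public LogisticRegression setThreshold(double value)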
Set threshold in binary classification, in range [0, 1].
If the estimated probability of class label 1 is greater than threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 more often; a low threshold encourages the model to predict 1 more often.
Note: Calling this with threshold p is equivalent to calling setThresholds(Array(1-p, p)). When setThreshold() is called, any user-set value for thresholds will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Default is 0.5.
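The equivalence between a single threshold p and the pair (1-p, p) can be checked with a small standalone sketch. It assumes the multiclass rule documented for ProbabilisticClassifier, where the class with the largest probability-to-threshold ratio wins. The class and method names are hypothetical, and thresholds of exactly 0 or 1 are not handled here.

```java
// Standalone sketch of the two prediction rules (not Spark's internal code).
final class BinaryThresholdSketch {

    // Predict 1 when P(label = 1) is strictly greater than the threshold, else 0.
    static int predictWithThreshold(double probabilityOfOne, double threshold) {
        return probabilityOfOne > threshold ? 1 : 0;
    }

    // Assumed multiclass rule: pick the class with the largest p / t (first class wins ties).
    static int predictWithThresholds(double[] probabilities, double[] thresholds) {
        int best = 0;
        double bestScaled = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probabilities.length; i++) {
            double scaled = probabilities[i] / thresholds[i];
            if (scaled > bestScaled) {
                bestScaled = scaled;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double p = 0.7; // hypothetical threshold
        double[] thresholds = {1.0 - p, p};
        for (double prob : new double[] {0.2, 0.65, 0.75, 0.9}) {
            int viaThreshold = predictWithThreshold(prob, p);
            int viaThresholds = predictWithThresholds(new double[] {1.0 - prob, prob}, thresholds);
            System.out.println(prob + " -> " + viaThreshold + " / " + viaThresholds);
        }
    }
}
```

Both rules agree because prob / p > (1 - prob) / (1 - p) rearranges to prob > p whenever 0 < p < 1.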
Specified by: setThreshold in interface LogisticRegressionParams
Parameters: value - (undocumented)

public double getThreshold()
Get threshold for binary classification.
If thresholds is set with length 2 (i.e., binary classification), this returns the equivalent threshold: 1 / (1 + thresholds(0) / thresholds(1)). Otherwise, returns threshold if set, or its default value if unset.
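The conversion formula can be tried out in isolation. The sketch below is a standalone illustration, not Spark's code, and the class and method names are hypothetical. Algebraically, 1 / (1 + t0 / t1) equals t1 / (t0 + t1), so the two thresholds do not need to sum to 1.

```java
// Sketch of the thresholds-to-threshold conversion for the binary case.
final class EquivalentThresholdSketch {

    // Returns 1 / (1 + thresholds[0] / thresholds[1]); only defined for arrays of length 2.
    static double fromThresholds(double[] thresholds) {
        if (thresholds.length != 2) {
            throw new IllegalArgumentException(
                "expected 2 thresholds for binary classification, got " + thresholds.length);
        }
        return 1.0 / (1.0 + thresholds[0] / thresholds[1]);
    }

    public static void main(String[] args) {
        System.out.println(fromThresholds(new double[] {0.3, 0.7}));
        System.out.println(fromThresholds(new double[] {1.0, 1.0}));
        System.out.println(fromThresholds(new double[] {2.0, 6.0}));
    }
}
```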
Throws: IllegalArgumentException - if thresholds is set to an array of length other than 2.
Specified by: getThreshold in interface LogisticRegressionParams
Specified by: getThreshold in interface HasThreshold

public LogisticRegression setWeightCol(String value)
Sets the value of param weightCol. If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one.
Parameters: value - (undocumented)

public LogisticRegression setThresholds(double[] value)
Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class.
Note: When setThresholds() is called, any user-set value for threshold will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Specified by: setThresholds in interface LogisticRegressionParams
Overrides: setThresholds in class ProbabilisticClassifier<Vector,LogisticRegression,LogisticRegressionModel>
Parameters: value - (undocumented)

public double[] getThresholds()
Get thresholds for binary or multiclass classification.
If thresholds is set, return its value. Otherwise, if threshold is set, return the equivalent thresholds for binary classification: (1-threshold, threshold). If neither is set, throw an exception.
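The resolution order described here can be sketched as a small standalone function. This is an illustration only: null stands in for "not set", the names are hypothetical, and IllegalStateException is a placeholder because the text does not say which exception type is thrown.

```java
// Sketch of the getThresholds resolution order (hypothetical names, not Spark's code).
final class ThresholdsResolutionSketch {

    // thresholds and threshold are null when the corresponding param is not set.
    static double[] resolve(double[] thresholds, Double threshold) {
        if (thresholds != null) {
            return thresholds.clone(); // an explicitly set array takes precedence
        }
        if (threshold != null) {
            return new double[] {1.0 - threshold, threshold}; // binary equivalent
        }
        // Placeholder exception type; the documentation only says an exception is thrown.
        throw new IllegalStateException("neither threshold nor thresholds is set");
    }

    public static void main(String[] args) {
        double[] fromThreshold = resolve(null, 0.25);
        System.out.println(fromThreshold[0] + ", " + fromThreshold[1]);
        double[] explicit = resolve(new double[] {0.2, 0.3, 0.5}, 0.25);
        System.out.println(explicit.length + " thresholds returned unchanged");
    }
}
```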
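Specified by: getThresholds in interface LogisticRegressionParams
Specified by: getThresholds in interface HasThresholds

public LogisticRegression setAggregationDepth(int value)
Suggested depth for treeAggregate (greater than or equal to 2).
Parameters: value - (undocumented)

public LogisticRegression setLowerBoundsOnCoefficients(Matrix value)
Set the lower bounds on coefficients if fitting under bound constrained optimization.
Parameters: value - (undocumented)

public LogisticRegression setUpperBoundsOnCoefficients(Matrix value)
Set the upper bounds on coefficients if fitting under bound constrained optimization.
Parameters: value - (undocumented)

public LogisticRegression setLowerBoundsOnIntercepts(Vector value)
Set the lower bounds on intercepts if fitting under bound constrained optimization.
Parameters: value - (undocumented)

public LogisticRegression setUpperBoundsOnIntercepts(Vector value)
Set the upper bounds on intercepts if fitting under bound constrained optimization.
Parameters: value - (undocumented)

public LogisticRegression setMaxBlockSizeInMB(double value)
Sets the value of param maxBlockSizeInMB. Default is 0.0, in which case 1.0 MB will be chosen.
Parameters: value - (undocumented)

public LogisticRegression setInitialModel(LogisticRegressionModel model)

public LogisticRegression copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Predictor<Vector,LogisticRegression,LogisticRegressionModel>
Parameters: extra - (undocumented)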