Class LogisticRegressionModel
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, ClassifierParams, LogisticRegressionParams, ProbabilisticClassifierParams, Params, HasAggregationDepth, HasElasticNetParam, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasRegParam, HasStandardization, HasThreshold, HasThresholds, HasTol, HasWeightCol, PredictorParams, HasTrainingSummary<LogisticRegressionTrainingSummary>, Identifiable, MLWritable
Model produced by LogisticRegression.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Method Summary
Modifier and Type | Method | Description
final IntParam | aggregationDepth() | Param for suggested depth for treeAggregate (>= 2).
BinaryLogisticRegressionTrainingSummary | binarySummary() | Gets summary of model on training set.
Matrix | coefficientMatrix() |
Vector | coefficients() | A vector of model coefficients for "binomial" logistic regression.
LogisticRegressionModel | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params.
final DoubleParam | elasticNetParam() | Param for the ElasticNet mixing parameter, in range [0, 1].
LogisticRegressionSummary | evaluate(Dataset<?> dataset) | Evaluates the model on a test dataset.
final Param<String> | family() | Param for the name of the family, which describes the label distribution to be used in the model.
final BooleanParam | fitIntercept() | Param for whether to fit an intercept term.
double | getThreshold() | Get threshold for binary classification.
double[] | getThresholds() | Get thresholds for binary or multiclass classification.
double | intercept() | The model intercept for "binomial" logistic regression.
Vector | interceptVector() |
static LogisticRegressionModel | load(String path) |
Param<Matrix> | lowerBoundsOnCoefficients() | The lower bounds on coefficients if fitting under bound-constrained optimization.
Param<Vector> | lowerBoundsOnIntercepts() | The lower bounds on intercepts if fitting under bound-constrained optimization.
final DoubleParam | maxBlockSizeInMB() | Param for maximum memory in MB for stacking input data into blocks.
final IntParam | maxIter() | Param for maximum number of iterations (>= 0).
int | numClasses() | Number of classes (values which the label can take).
int | numFeatures() | Returns the number of features the model was trained on.
double | predict(Vector features) | Predict label for the given feature vector.
Vector | predictRaw(Vector features) | Raw prediction for each possible label.
static MLReader<LogisticRegressionModel> | read() |
final DoubleParam | regParam() | Param for regularization parameter (>= 0).
LogisticRegressionModel | setThreshold(double value) | Set threshold in binary classification, in range [0, 1].
LogisticRegressionModel | setThresholds(double[] value) | Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class.
final BooleanParam | standardization() | Param for whether to standardize the training features before fitting the model.
LogisticRegressionTrainingSummary | summary() | Gets summary of model on training set.
DoubleParam | threshold() | Param for threshold in binary classification prediction, in range [0, 1].
final DoubleParam | tol() | Param for the convergence tolerance for iterative algorithms (>= 0).
String | toString() |
String | uid() | An immutable unique ID for the object and its derivatives.
Param<Matrix> | upperBoundsOnCoefficients() | The upper bounds on coefficients if fitting under bound-constrained optimization.
Param<Vector> | upperBoundsOnIntercepts() | The upper bounds on intercepts if fitting under bound-constrained optimization.
final Param<String> | weightCol() | Param for weight column name.
MLWriter | write() | Returns an MLWriter instance for this ML instance.
Methods inherited from class org.apache.spark.ml.classification.ProbabilisticClassificationModel
normalizeToProbabilitiesInPlace, predictProbability, probabilityCol, setProbabilityCol, thresholds, transform, transformSchema
Methods inherited from class org.apache.spark.ml.classification.ClassificationModel
rawPredictionCol, setRawPredictionCol, transformImpl
Methods inherited from class org.apache.spark.ml.PredictionModel
featuresCol, labelCol, predictionCol, setFeaturesCol, setPredictionCol
Methods inherited from class org.apache.spark.ml.Transformer
transform, transform, transform
Methods inherited from class org.apache.spark.ml.PipelineStage
params
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.param.shared.HasAggregationDepth
getAggregationDepth
Methods inherited from interface org.apache.spark.ml.param.shared.HasElasticNetParam
getElasticNetParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasFitIntercept
getFitIntercept
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxBlockSizeInMB
getMaxBlockSizeInMB
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasProbabilityCol
getProbabilityCol, probabilityCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRawPredictionCol
getRawPredictionCol, rawPredictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRegParam
getRegParam
Methods inherited from interface org.apache.spark.ml.param.shared.HasStandardization
getStandardization
Methods inherited from interface org.apache.spark.ml.param.shared.HasThresholds
thresholds
Methods inherited from interface org.apache.spark.ml.util.HasTrainingSummary
hasSummary, setSummary
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.classification.LogisticRegressionParams
checkThresholdConsistency, getFamily, getLowerBoundsOnCoefficients, getLowerBoundsOnIntercepts, getUpperBoundsOnCoefficients, getUpperBoundsOnIntercepts, usingBoundConstrainedOptimization, validateAndTransformSchema
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
-
Method Details
-
read
-
load
-
family
Description copied from interface: LogisticRegressionParams
Param for the name of the family, which describes the label distribution to be used in the model. Supported options:
  - "auto": Automatically select the family based on the number of classes: if numClasses == 1 || numClasses == 2, set to "binomial"; otherwise, set to "multinomial".
  - "binomial": Binary logistic regression with pivoting.
  - "multinomial": Multinomial logistic (softmax) regression without pivoting.
Default is "auto".
- Specified by:
family in interface LogisticRegressionParams
- Returns:
- (undocumented)
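The "auto" rule above is a small decision on the number of classes. The following Java sketch restates it; `FamilyResolver` and `resolveFamily` are hypothetical names, not Spark API, and the sketch deliberately does not reproduce any further checks Spark may apply when "binomial" or "multinomial" is chosen explicitly.

```java
// Hypothetical helper restating the documented "auto" family rule.
// FamilyResolver is illustrative only; it is not part of Spark.
class FamilyResolver {

    static String resolveFamily(String family, int numClasses) {
        switch (family) {
            case "auto":
                // One or two label values -> binary model; otherwise softmax model.
                return (numClasses == 1 || numClasses == 2) ? "binomial" : "multinomial";
            case "binomial":
            case "multinomial":
                // Explicit choices are returned unchanged (no extra validation here).
                return family;
            default:
                throw new IllegalArgumentException("Unsupported family: " + family);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveFamily("auto", 2)); // binomial
        System.out.println(resolveFamily("auto", 3)); // multinomial
    }
}
```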
-
lowerBoundsOnCoefficients
Description copied from interface: LogisticRegressionParams
The lower bounds on coefficients if fitting under bound-constrained optimization. The bound matrix must be compatible with the shape (1, number of features) for binomial regression, or (number of classes, number of features) for multinomial regression; otherwise, an exception is thrown. Default is none.
- Specified by:
lowerBoundsOnCoefficients in interface LogisticRegressionParams
- Returns:
- (undocumented)
-
upperBoundsOnCoefficients
Description copied from interface: LogisticRegressionParams
The upper bounds on coefficients if fitting under bound-constrained optimization. The bound matrix must be compatible with the shape (1, number of features) for binomial regression, or (number of classes, number of features) for multinomial regression; otherwise, an exception is thrown. Default is none.
- Specified by:
upperBoundsOnCoefficients in interface LogisticRegressionParams
- Returns:
- (undocumented)
-
lowerBoundsOnIntercepts
Description copied from interface: LogisticRegressionParams
The lower bounds on intercepts if fitting under bound-constrained optimization. The bound vector size must be equal to 1 for binomial regression, or to the number of classes for multinomial regression; otherwise, an exception is thrown. Default is none.
- Specified by:
lowerBoundsOnIntercepts in interface LogisticRegressionParams
- Returns:
- (undocumented)
-
upperBoundsOnIntercepts
Description copied from interface: LogisticRegressionParams
The upper bounds on intercepts if fitting under bound-constrained optimization. The bound vector size must be equal to 1 for binomial regression, or to the number of classes for multinomial regression; otherwise, an exception is thrown. Default is none.
- Specified by:
upperBoundsOnIntercepts in interface LogisticRegressionParams
- Returns:
- (undocumented)
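The shape rules for the four bound parameters can be summarized as a check on plain arrays. This sketch assumes a coefficient bound matrix is represented as a row-major `double[][]` (one row per class) and an intercept bound vector as a `double[]`, and it reads "compatible with the shape" as exact equality, which is an assumption. `BoundShapes` and its methods are hypothetical and do not reproduce Spark's own validation code.

```java
// Hypothetical shape checks following the documented bound rules:
// binomial -> 1 row / 1 intercept bound; multinomial -> numClasses rows / numClasses bounds.
class BoundShapes {

    // Rows in a coefficient bound matrix, which is also the size of an intercept bound vector.
    static int expectedRows(boolean multinomial, int numClasses) {
        return multinomial ? numClasses : 1;
    }

    // Throws if bounds does not have shape (expectedRows, numFeatures).
    static void checkCoefficientBounds(double[][] bounds, boolean multinomial,
                                       int numClasses, int numFeatures) {
        int rows = expectedRows(multinomial, numClasses);
        if (bounds.length != rows) {
            throw new IllegalArgumentException(
                "Expected " + rows + " rows of coefficient bounds, got " + bounds.length);
        }
        for (double[] row : bounds) {
            if (row.length != numFeatures) {
                throw new IllegalArgumentException(
                    "Expected " + numFeatures + " columns of coefficient bounds, got " + row.length);
            }
        }
    }

    // Throws if the intercept bound vector has the wrong size.
    static void checkInterceptBounds(double[] bounds, boolean multinomial, int numClasses) {
        int size = expectedRows(multinomial, numClasses);
        if (bounds.length != size) {
            throw new IllegalArgumentException(
                "Expected " + size + " intercept bounds, got " + bounds.length);
        }
    }

    public static void main(String[] args) {
        // Binomial model with 4 features: a 1 x 4 matrix and a single intercept bound.
        checkCoefficientBounds(new double[1][4], false, 2, 4);
        checkInterceptBounds(new double[] {-1.0}, false, 2);
        // Multinomial model with 3 classes and 4 features: a 3 x 4 matrix and 3 intercept bounds.
        checkCoefficientBounds(new double[3][4], true, 3, 4);
        checkInterceptBounds(new double[3], true, 3);
        System.out.println("all shapes ok");
    }
}
```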
-
maxBlockSizeInMB
Description copied from interface: HasMaxBlockSizeInMB
Param for the maximum memory in MB for stacking input data into blocks. Data is stacked within partitions. If this value exceeds the remaining data size in a partition, it is adjusted to that data size. The default, 0.0, means that an optimal value is chosen, depending on the specific algorithm. Must be >= 0.
- Specified by:
maxBlockSizeInMB in interface HasMaxBlockSizeInMB
- Returns:
- (undocumented)
-
aggregationDepth
Description copied from interface: HasAggregationDepth
Param for suggested depth for treeAggregate (>= 2).
- Specified by:
aggregationDepth in interface HasAggregationDepth
- Returns:
- (undocumented)
-
threshold
Description copied from interface: HasThreshold
Param for threshold in binary classification prediction, in range [0, 1].
- Specified by:
threshold in interface HasThreshold
- Returns:
- (undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by:
weightCol in interface HasWeightCol
- Returns:
- (undocumented)
-
standardization
Description copied from interface: HasStandardization
Param for whether to standardize the training features before fitting the model.
- Specified by:
standardization in interface HasStandardization
- Returns:
- (undocumented)
-
tol
Description copied from interface: HasTol
Param for the convergence tolerance for iterative algorithms (>= 0).
-
fitIntercept
Description copied from interface: HasFitIntercept
Param for whether to fit an intercept term.
- Specified by:
fitIntercept in interface HasFitIntercept
- Returns:
- (undocumented)
-
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
- Specified by:
maxIter in interface HasMaxIter
- Returns:
- (undocumented)
-
elasticNetParam
Description copied from interface: HasElasticNetParam
Param for the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
- Specified by:
elasticNetParam in interface HasElasticNetParam
- Returns:
- (undocumented)
-
regParam
Description copied from interface: HasRegParam
Param for regularization parameter (>= 0).
- Specified by:
regParam in interface HasRegParam
- Returns:
- (undocumented)
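Taken together, regParam and elasticNetParam (alpha) define the regularization term. Assuming the glmnet-style formulation described in Spark's linear-methods guide, the penalty on a coefficient vector w is regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), so alpha = 0 gives a pure L2 penalty and alpha = 1 a pure L1 penalty, matching the elasticNetParam description. The Java sketch below only evaluates that expression; `ElasticNetPenalty` is a hypothetical name and this is not Spark's optimizer code.

```java
// Evaluates an elastic-net penalty of the assumed form
// regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
// Illustrative only; not Spark's optimizer code.
class ElasticNetPenalty {

    static double penalty(double[] w, double regParam, double alpha) {
        if (regParam < 0) {
            throw new IllegalArgumentException("regParam must be >= 0");
        }
        if (alpha < 0 || alpha > 1) {
            throw new IllegalArgumentException("elasticNetParam must be in [0, 1]");
        }
        double l1 = 0.0;   // sum of |w_i|
        double l2sq = 0.0; // sum of w_i^2
        for (double wi : w) {
            l1 += Math.abs(wi);
            l2sq += wi * wi;
        }
        return regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2sq);
    }

    public static void main(String[] args) {
        double[] w = {1.0, -2.0}; // ||w||_1 = 3, ||w||_2^2 = 5
        System.out.println(penalty(w, 0.1, 1.0)); // pure L1: about 0.3
        System.out.println(penalty(w, 0.1, 0.0)); // pure L2: about 0.25
    }
}
```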
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by:
uid in interface Identifiable
- Returns:
- (undocumented)
-
coefficientMatrix
-
interceptVector
-
numClasses
public int numClasses()
Description copied from class: ClassificationModel
Number of classes (values which the label can take).
- Specified by:
numClasses in class ClassificationModel<Vector,LogisticRegressionModel>
-
coefficients
A vector of model coefficients for "binomial" logistic regression. If this model was trained using the "multinomial" family, an exception is thrown.
- Returns:
- Vector
-
intercept
public double intercept()
The model intercept for "binomial" logistic regression. If this model was fit with the "multinomial" family, an exception is thrown.
- Returns:
- Double
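For a binomial model, coefficients and intercept combine into a linear margin, and the estimated probability of label 1 is the logistic (sigmoid) function of that margin; the prediction rule documented for setThreshold then predicts 1 only when this probability is greater than the threshold. The Java sketch below shows that standard computation under the assumption of dense `double[]` features; `BinomialScorer` is a hypothetical helper, not Spark's implementation.

```java
// Sketch of binomial logistic regression scoring from a coefficient vector and
// an intercept. BinomialScorer is hypothetical; it is not Spark's implementation.
class BinomialScorer {

    // Linear margin: intercept + sum_j coefficients[j] * x[j].
    static double margin(double[] coefficients, double intercept, double[] x) {
        if (coefficients.length != x.length) {
            throw new IllegalArgumentException("coefficients and features differ in length");
        }
        double m = intercept;
        for (int j = 0; j < x.length; j++) {
            m += coefficients[j] * x[j];
        }
        return m;
    }

    // Logistic (sigmoid) function: estimated probability of label 1.
    static double probabilityOfOne(double margin) {
        return 1.0 / (1.0 + Math.exp(-margin));
    }

    // Predict 1.0 if P(label = 1) is greater than the threshold, else 0.0.
    static double predict(double[] coefficients, double intercept, double[] x, double threshold) {
        double p = probabilityOfOne(margin(coefficients, intercept, x));
        return p > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] coefficients = {0.5, -0.25};
        double[] x = {2.0, 4.0};
        double m = margin(coefficients, 0.1, x);                // about 0.1
        System.out.println(probabilityOfOne(m));                // about 0.525
        System.out.println(predict(coefficients, 0.1, x, 0.5)); // 1.0
    }
}
```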
-
setThreshold
Description copied from interface: LogisticRegressionParams
Set threshold in binary classification, in range [0, 1].
If the estimated probability of class label 1 is greater than threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 more often; a low threshold encourages the model to predict 1 more often.
Note: Calling this with threshold p is equivalent to calling setThresholds(Array(1-p, p)). When setThreshold() is called, any user-set value for thresholds will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
Default is 0.5.
- Specified by:
setThreshold in interface LogisticRegressionParams
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
-
getThreshold
public double getThreshold()
Description copied from interface: LogisticRegressionParams
Get threshold for binary classification.
If thresholds is set with length 2 (i.e., binary classification), this returns the equivalent threshold:
    1 / (1 + thresholds(0) / thresholds(1))
Otherwise, returns threshold if set, or its default value if unset.
- Throws:
IllegalArgumentException if thresholds is set to an array of length other than 2.
- Specified by:
getThreshold in interface HasThreshold
- Specified by:
getThreshold in interface LogisticRegressionParams
- Returns:
- (undocumented)
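The two documented conversions, threshold p to thresholds (1-p, p) and a length-2 thresholds array back to 1 / (1 + thresholds(0) / thresholds(1)), are inverses of each other: for 0 < p <= 1, 1 / (1 + (1-p)/p) = p. A small Java check of this round trip follows; `ThresholdConversion` and its methods are hypothetical helper names, not Spark API.

```java
// Round trip between the single binary threshold and the two-element thresholds
// array, using the formulas documented for setThreshold and getThreshold.
// ThresholdConversion is a hypothetical helper, not Spark API.
class ThresholdConversion {

    // setThreshold(p) is documented as equivalent to setThresholds(Array(1-p, p)).
    static double[] toThresholds(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        return new double[] {1.0 - p, p};
    }

    // getThreshold's formula for a length-2 thresholds array.
    static double toThreshold(double[] thresholds) {
        if (thresholds.length != 2) {
            throw new IllegalArgumentException("thresholds must have length 2");
        }
        return 1.0 / (1.0 + thresholds[0] / thresholds[1]);
    }

    public static void main(String[] args) {
        double[] t = toThresholds(0.25);
        System.out.println(t[0] + ", " + t[1]); // 0.75, 0.25
        System.out.println(toThreshold(t));     // 0.25
    }
}
```

With IEEE arithmetic the formula also maps (1, 0) to 0.0 and (0, 1) to 1.0, so the endpoints of [0, 1] survive the round trip in this sketch.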
-
setThresholds
Description copied from interface: LogisticRegressionParams
Set thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values greater than 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class's threshold.
Note: When setThresholds() is called, any user-set value for threshold will be cleared. If both threshold and thresholds are set in a ParamMap, then they must be equivalent.
- Specified by:
setThresholds in interface LogisticRegressionParams
- Overrides:
setThresholds in class ProbabilisticClassificationModel<Vector,LogisticRegressionModel>
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
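The p/t rule can be sketched as an argmax over probabilities scaled by their thresholds. `ThresholdedArgmax` is a hypothetical helper; treating a zero threshold as an infinitely large score (so that class is chosen) and breaking ties in favor of the lower class index are assumptions of this sketch, not statements from the documentation.

```java
// Sketch of the documented rule: predict the class with the largest p / t.
// ThresholdedArgmax is hypothetical; zero-threshold handling and tie-breaking
// (lowest index wins) are assumptions of this sketch.
class ThresholdedArgmax {

    static int predict(double[] probability, double[] thresholds) {
        if (probability.length != thresholds.length) {
            throw new IllegalArgumentException("thresholds must have one entry per class");
        }
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < probability.length; k++) {
            // Assumed: a zero threshold makes that class's score +infinity.
            double score = thresholds[k] == 0.0
                ? Double.POSITIVE_INFINITY
                : probability[k] / thresholds[k];
            if (score > bestScore) { // strict '>' keeps the lowest index on ties
                bestScore = score;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] p = {0.2, 0.5, 0.3};
        System.out.println(predict(p, new double[] {1.0, 1.0, 1.0})); // 1
        System.out.println(predict(p, new double[] {0.1, 1.0, 1.0})); // 0
    }
}
```

In the binary case with thresholds (1-t, t) and 0 < t < 1, class 1 wins exactly when p1/t > (1-p1)/(1-t), which simplifies to p1 > t; this is consistent with the single-threshold rule described for setThreshold.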
-
getThresholds
public double[] getThresholds()
Description copied from interface: LogisticRegressionParams
Get thresholds for binary or multiclass classification.
If thresholds is set, return its value. Otherwise, if threshold is set, return the equivalent thresholds for binary classification: (1-threshold, threshold). If neither is set, throw an exception.
- Specified by:
getThresholds in interface HasThresholds
- Specified by:
getThresholds in interface LogisticRegressionParams
- Returns:
- (undocumented)
-
numFeatures
public int numFeatures()
Description copied from class: PredictionModel
Returns the number of features the model was trained on. If unknown, returns -1.
- Overrides:
numFeatures in class PredictionModel<Vector,LogisticRegressionModel>
-
summary
Gets summary of model on training set. An exception is thrown if hasSummary is false.
- Specified by:
summary in interface HasTrainingSummary<LogisticRegressionTrainingSummary>
- Returns:
- (undocumented)
-
binarySummary
Gets summary of model on training set. An exception is thrown if hasSummary is false or if it is a multiclass model.
- Returns:
- (undocumented)
-
evaluate
Evaluates the model on a test dataset.
- Parameters:
dataset - Test dataset to evaluate the model on.
- Returns:
- (undocumented)
-
predict
Predict label for the given feature vector. The behavior of this can be adjusted using thresholds.
- Overrides:
predict in class ClassificationModel<Vector,LogisticRegressionModel>
- Parameters:
features - (undocumented)
- Returns:
- (undocumented)
-
predictRaw
Description copied from class: ClassificationModel
Raw prediction for each possible label. The meaning of a "raw" prediction may vary between algorithms, but it intuitively gives a measure of confidence in each possible label (where larger = more confident). This internal method is used to implement transform() and output ClassificationModel.rawPredictionCol().
- Specified by:
predictRaw in class ClassificationModel<Vector,LogisticRegressionModel>
- Parameters:
features - (undocumented)
- Returns:
- vector where element i is the raw prediction for label i. This raw prediction may be any real number, where a larger value indicates greater confidence for that label.
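For a multinomial model, a common way to obtain per-class raw predictions is one linear margin per class, built from a row of coefficientMatrix and the matching entry of interceptVector, followed by a softmax that turns the margins into probabilities. The sketch below assumes that layout (one row per class, dense `double[]` features); `MultinomialScorer` is a hypothetical helper and is not presented as Spark's exact code.

```java
// Sketch of multinomial (softmax) scoring: one linear margin per class, then
// softmax. Assumes coefficientMatrix has one row per class. Hypothetical helper.
class MultinomialScorer {

    // raw[k] = interceptVector[k] + sum_j coefficientMatrix[k][j] * x[j]
    static double[] rawPrediction(double[][] coefficientMatrix, double[] interceptVector, double[] x) {
        if (coefficientMatrix.length != interceptVector.length) {
            throw new IllegalArgumentException("one intercept per class is required");
        }
        double[] raw = new double[coefficientMatrix.length];
        for (int k = 0; k < coefficientMatrix.length; k++) {
            double[] row = coefficientMatrix[k];
            if (row.length != x.length) {
                throw new IllegalArgumentException("row " + k + " does not match the feature count");
            }
            double m = interceptVector[k];
            for (int j = 0; j < x.length; j++) {
                m += row[j] * x[j];
            }
            raw[k] = m;
        }
        return raw;
    }

    // Softmax with the maximum subtracted first to avoid overflow in Math.exp.
    static double[] softmax(double[] raw) {
        double max = Double.NEGATIVE_INFINITY;
        for (double r : raw) {
            max = Math.max(max, r);
        }
        double[] prob = new double[raw.length];
        double sum = 0.0;
        for (int k = 0; k < raw.length; k++) {
            prob[k] = Math.exp(raw[k] - max);
            sum += prob[k];
        }
        for (int k = 0; k < raw.length; k++) {
            prob[k] /= sum;
        }
        return prob;
    }

    public static void main(String[] args) {
        double[][] coefficientMatrix = {{1.0, 0.0}, {0.0, 1.0}, {0.0, 0.0}};
        double[] interceptVector = {0.0, 0.0, 0.0};
        double[] raw = rawPrediction(coefficientMatrix, interceptVector, new double[] {2.0, 1.0});
        System.out.println(java.util.Arrays.toString(raw)); // [2.0, 1.0, 0.0]
        System.out.println(java.util.Arrays.toString(softmax(raw)));
    }
}
```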
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by:
copy in interface Params
- Specified by:
copy in class Model<LogisticRegressionModel>
- Parameters:
extra - (undocumented)
- Returns:
- (undocumented)
-
write
Returns an MLWriter instance for this ML instance.
For LogisticRegressionModel, this does NOT currently save the training summary(). An option to save summary() may be added in the future.
This also does not currently save Model.parent().
- Specified by:
write in interface MLWritable
- Returns:
- (undocumented)
-
toString
- Specified by:
toString in interface Identifiable
- Overrides:
toString in class Object
-