Class NaiveBayes
Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Estimator<M>
org.apache.spark.ml.Predictor<FeaturesType,E,M>
org.apache.spark.ml.classification.Classifier<FeaturesType,E,M>
org.apache.spark.ml.classification.ProbabilisticClassifier<Vector,NaiveBayes,NaiveBayesModel>
org.apache.spark.ml.classification.NaiveBayes
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, ClassifierParams, NaiveBayesParams, ProbabilisticClassifierParams, Params, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasThresholds, HasWeightCol, PredictorParams, DefaultParamsWritable, Identifiable, MLWritable
public class NaiveBayes
extends ProbabilisticClassifier<Vector,NaiveBayes,NaiveBayesModel>
implements NaiveBayesParams, DefaultParamsWritable
Naive Bayes classifiers.
It supports Multinomial NB, which can handle finitely supported discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every feature vector binary (0/1), it can also be used as Bernoulli NB. The input feature values for Multinomial NB and Bernoulli NB must be nonnegative.
Since 3.0.0, it supports Complement NB, which is an adaptation of Multinomial NB. Specifically, Complement NB uses statistics from the complement of each class to compute the model's coefficients. The inventors of Complement NB show empirically that the parameter estimates for CNB are more stable than those for Multinomial NB. Like Multinomial NB, the input feature values for Complement NB must be nonnegative.
Since 3.0.0, it also supports Gaussian NB, which can handle continuous data.
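The difference between the Multinomial and Complement estimates can be illustrated with a small, self-contained sketch. This is plain Java with no Spark dependency, and it is not Spark's implementation: the class `NaiveBayesCountsSketch`, its method names, and the toy counts are hypothetical. It assumes additive smoothing of per-class feature counts and, for the complement variant, simply swaps in the counts from all other classes.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical names, not Spark's internals. */
final class NaiveBayesCountsSketch {

    /**
     * Smoothed multinomial estimate per class c and feature i:
     * log((N_ci + lambda) / (N_c + lambda * n)), where n is the number of features.
     */
    static double[][] multinomialLogTheta(double[][] classFeatureCounts, double lambda) {
        int numClasses = classFeatureCounts.length;
        int numFeatures = classFeatureCounts[0].length;
        double[][] logTheta = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            double total = Arrays.stream(classFeatureCounts[c]).sum();
            double logDenominator = Math.log(total + lambda * numFeatures);
            for (int i = 0; i < numFeatures; i++) {
                logTheta[c][i] = Math.log(classFeatureCounts[c][i] + lambda) - logDenominator;
            }
        }
        return logTheta;
    }

    /** Same estimate, but class c is described by the counts of every class except c. */
    static double[][] complementLogTheta(double[][] classFeatureCounts, double lambda) {
        int numClasses = classFeatureCounts.length;
        int numFeatures = classFeatureCounts[0].length;
        // Column totals over all classes.
        double[] featureTotals = new double[numFeatures];
        for (double[] row : classFeatureCounts) {
            for (int i = 0; i < numFeatures; i++) {
                featureTotals[i] += row[i];
            }
        }
        // Complement counts: total minus the class's own counts.
        double[][] complementCounts = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            for (int i = 0; i < numFeatures; i++) {
                complementCounts[c][i] = featureTotals[i] - classFeatureCounts[c][i];
            }
        }
        return multinomialLogTheta(complementCounts, lambda);
    }

    public static void main(String[] args) {
        // Toy data: 2 classes, 3 features (for example, term counts summed per class).
        double[][] counts = {{3, 0, 1}, {1, 2, 1}};
        System.out.println(Arrays.deepToString(multinomialLogTheta(counts, 1.0)));
        System.out.println(Arrays.deepToString(complementLogTheta(counts, 1.0)));
    }
}
```

With only two classes, the complement estimate for one class equals the multinomial estimate for the other; the two diverge once there are three or more classes.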
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
-
Method Summary
Modifier and Type, Method: Description
- copy: Creates a copy of this instance with the same UID and some extra params.
- static NaiveBayes load
- modelType: The model type which is a string (case-sensitive).
- static MLReader<T> read()
- setModelType(String value): Set the model type using a string (case-sensitive).
- setSmoothing(double value): Set the smoothing parameter.
- setWeightCol(String value): Sets the value of param weightCol().
- final DoubleParam smoothing: The smoothing parameter.
- uid(): An immutable unique ID for the object and its derivatives.
- weightCol: Param for weight column name.
Methods inherited from class org.apache.spark.ml.classification.ProbabilisticClassifier
probabilityCol, setProbabilityCol, setThresholds, thresholds
Methods inherited from class org.apache.spark.ml.classification.Classifier
rawPredictionCol, setRawPredictionCol
Methods inherited from class org.apache.spark.ml.Predictor
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
Methods inherited from class org.apache.spark.ml.PipelineStage
params
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasProbabilityCol
getProbabilityCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRawPredictionCol
getRawPredictionCol, rawPredictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasThresholds
getThresholds
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.classification.NaiveBayesParams
getModelType, getSmoothing
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.classification.ProbabilisticClassifierParams
validateAndTransformSchema
-
Constructor Details
-
NaiveBayes
-
NaiveBayes
public NaiveBayes()
-
-
Method Details
-
load
-
read
-
smoothing
Description copied from interface: NaiveBayesParams
The smoothing parameter (default = 1.0).
- Specified by:
smoothing in interface NaiveBayesParams
- Returns:
- (undocumented)
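As a hedged illustration of what the smoothing parameter does in the count-based model types, the usual additive-smoothing estimate can be written as follows. The symbols are introduced here for illustration and do not appear in the original documentation: \(N_{ci}\) is the (weighted) total of feature \(i\) over training instances of class \(c\), \(n\) is the number of features, and \(\lambda\) is the smoothing value.

```latex
\[
\hat{\theta}_{ci} = \frac{N_{ci} + \lambda}{N_c + \lambda n},
\qquad
N_c = \sum_{i=1}^{n} N_{ci}
\]
% Worked example with the default \lambda = 1.0, n = 3, N_c = 4, N_{ci} = 0:
\[
\hat{\theta}_{ci} = \frac{0 + 1}{4 + 1 \cdot 3} = \frac{1}{7}
\]
```

Under this assumption, a feature never observed with a class still receives a nonzero probability, so its logarithm stays finite.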
-
modelType
Description copied from interface: NaiveBayesParams
The model type, which is a string (case-sensitive). Supported options: "multinomial", "complement", "bernoulli", "gaussian" (default = "multinomial").
- Specified by:
modelType in interface NaiveBayesParams
- Returns:
- (undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by:
weightCol in interface HasWeightCol
- Returns:
- (undocumented)
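To make the effect of instance weights concrete, here is a minimal sketch in plain Java, with no Spark dependency. It is not Spark's code: `WeightedPriorSketch`, `classPriors`, and the smoothed-prior form used are assumptions for illustration. A `null` weight array stands in for an unset weight column, so every instance counts as 1.0.

```java
/** Illustrative sketch only; hypothetical names, not Spark's internals. */
final class WeightedPriorSketch {

    /**
     * Smoothed class priors from labels and optional instance weights:
     * (W_c + lambda) / (W + lambda * numClasses), where W_c is the weight
     * total of class c and W is the overall weight total.
     */
    static double[] classPriors(int[] labels, double[] weights, int numClasses, double lambda) {
        double[] weightPerClass = new double[numClasses];
        double totalWeight = 0.0;
        for (int j = 0; j < labels.length; j++) {
            // Unset weights (null) are treated as 1.0 for every instance.
            double w = (weights == null) ? 1.0 : weights[j];
            weightPerClass[labels[j]] += w;
            totalWeight += w;
        }
        double[] priors = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            priors[c] = (weightPerClass[c] + lambda) / (totalWeight + lambda * numClasses);
        }
        return priors;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1};
        // Without weights, class 0 dominates; up-weighting the last instance shifts the prior.
        System.out.println(java.util.Arrays.toString(classPriors(labels, null, 2, 1.0)));
        System.out.println(java.util.Arrays.toString(
                classPriors(labels, new double[] {1.0, 1.0, 4.0}, 2, 1.0)));
    }
}
```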
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by:
uid in interface Identifiable
- Returns:
- (undocumented)
-
setSmoothing
Set the smoothing parameter. Default is 1.0.
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
-
setModelType
Set the model type using a string (case-sensitive). Supported options: "multinomial", "complement", "bernoulli", and "gaussian". Default is "multinomial".
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
-
setWeightCol
Sets the value of param weightCol(). If this is not set or is empty, all instance weights are treated as 1.0. By default it is not set, so every instance has weight 1.0.
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by:
copy in interface Params
- Specified by:
copy in class Predictor<Vector,NaiveBayes,NaiveBayesModel>
- Parameters:
extra - (undocumented)
- Returns:
- (undocumented)
-