org.apache.spark.mllib.classification.NaiveBayes

All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging

public class NaiveBayes extends Object implements Serializable, org.apache.spark.internal.Logging

Trains a Naive Bayes model given an RDD of (label, features) pairs.

This is the Multinomial NB, which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB. The input feature values must be nonnegative.
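To make the role of the smoothing parameter concrete, here is a minimal plain-Java sketch of the textbook multinomial Naive Bayes estimate with additive smoothing. It does not use Spark and is not taken from Spark's source. The class `MultinomialSketch` and its method `logTheta` are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;

public class MultinomialSketch {
    /**
     * Smoothed per-feature log-probabilities for one class:
     * log((count_j + lambda) / (total + lambda * numFeatures)).
     * Hypothetical helper using the textbook formula; not Spark's code.
     */
    public static double[] logTheta(double[] featureCounts, double lambda) {
        // Mirror the documented requirement that feature values be nonnegative.
        for (double c : featureCounts) {
            if (c < 0.0) {
                throw new IllegalArgumentException("feature values must be nonnegative");
            }
        }
        double total = Arrays.stream(featureCounts).sum();
        double logDenom = Math.log(total + lambda * featureCounts.length);
        double[] out = new double[featureCounts.length];
        for (int j = 0; j < out.length; j++) {
            out[j] = Math.log(featureCounts[j] + lambda) - logDenom;
        }
        return out;
    }

    public static void main(String[] args) {
        // Summed term counts for one class; the last term never occurred.
        double[] counts = {3.0, 1.0, 0.0};
        for (double t : logTheta(counts, 1.0)) {
            System.out.println(Math.exp(t));
        }
    }
}
```

With a smoothing value of 1.0, the term that never occurred in the class receives probability 1/7 instead of 0, so a single unseen term cannot drive a document's class score to negative infinity.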

See Also:

Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| NaiveBayes() | |
| NaiveBayes(double lambda) | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| double | getLambda() | Get the smoothing parameter. |
| String | getModelType() | Get the model type. |
| NaiveBayesModel | run(RDD<LabeledPoint> data) | Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries. |
| NaiveBayes | setLambda(double lambda) | Set the smoothing parameter. |
| NaiveBayes | setModelType(String modelType) | Set the model type using a string (case-sensitive). |
| static NaiveBayesModel | train(RDD<LabeledPoint> input) | Trains a Naive Bayes model given an RDD of (label, features) pairs. |
| static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda) | Trains a Naive Bayes model given an RDD of (label, features) pairs. |
| static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda, String modelType) | Trains a Naive Bayes model given an RDD of (label, features) pairs. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext

Constructor Details
- NaiveBayes
  
  public NaiveBayes(double lambda)
- NaiveBayes
  
  public NaiveBayes()

Method Details
- train
  
  public static NaiveBayesModel train(RDD<LabeledPoint> input)
  
  Trains a Naive Bayes model given an RDD of (label, features) pairs.
  This is the default Multinomial NB, which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification.
  This version of the method uses a default smoothing parameter of 1.0.
  
  Parameters:
  
  input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
  
  Returns:
  
  The trained NaiveBayesModel.
- train
  
  public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda)
  
  Trains a Naive Bayes model given an RDD of (label, features) pairs.
  This is the default Multinomial NB, which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification.
  
  Parameters:
  
  input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
  
  lambda - The smoothing parameter
  
  Returns:
  
  The trained NaiveBayesModel.
- train
  
  public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda, String modelType)
  
  Trains a Naive Bayes model given an RDD of (label, features) pairs.
  The model type can be set to either Multinomial NB or Bernoulli NB. The Multinomial NB can handle discrete count data and is selected by setting the model type to "multinomial". For example, it can be used with word counts or TF-IDF vectors of documents. The Bernoulli model fits presence or absence (0-1) counts. By making every vector a 0-1 vector and setting the model type to "bernoulli", the model fits and predicts as Bernoulli NB.
  
  Parameters:
  
  input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
  
  lambda - The smoothing parameter
  
  modelType - The type of NB model to fit, from the enumeration NaiveBayesModels; can be "multinomial" or "bernoulli".
  
  Returns:
  
  The trained NaiveBayesModel.
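The Bernoulli variant described above can be illustrated with a small plain-Java sketch: count vectors are first turned into 0-1 vectors, then a smoothed presence probability is estimated per feature. This uses the textbook Bernoulli estimate and is not Spark's implementation. `BernoulliSketch`, `binarize`, and `presenceProb` are hypothetical names.

```java
import java.util.Arrays;

public class BernoulliSketch {
    /** Maps every positive entry to 1.0 and every zero entry to 0.0. */
    public static double[] binarize(double[] features) {
        double[] out = new double[features.length];
        for (int j = 0; j < features.length; j++) {
            out[j] = features[j] > 0.0 ? 1.0 : 0.0;
        }
        return out;
    }

    /**
     * Smoothed P(feature j present | class) over the 0-1 vectors of one class:
     * (docsWithFeature_j + lambda) / (numDocs + 2 * lambda).
     * Hypothetical helper using the textbook formula; not Spark's code.
     */
    public static double[] presenceProb(double[][] zeroOneDocs, double lambda) {
        if (zeroOneDocs.length == 0) {
            throw new IllegalArgumentException("need at least one document");
        }
        int d = zeroOneDocs[0].length;
        double[] prob = new double[d];
        // Count, per feature, how many documents contain it.
        for (double[] doc : zeroOneDocs) {
            for (int j = 0; j < d; j++) {
                prob[j] += doc[j];
            }
        }
        double denom = zeroOneDocs.length + 2.0 * lambda;
        for (int j = 0; j < d; j++) {
            prob[j] = (prob[j] + lambda) / denom;
        }
        return prob;
    }

    public static void main(String[] args) {
        // Word counts for three documents of the same class.
        double[][] counts = {{2.0, 0.0, 5.0}, {1.0, 0.0, 0.0}, {4.0, 3.0, 0.0}};
        double[][] docs = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            docs[i] = binarize(counts[i]);
        }
        System.out.println(Arrays.toString(presenceProb(docs, 1.0)));
    }
}
```

The denominator adds `2 * lambda` because each feature has two outcomes (present or absent), whereas the multinomial estimate smooths over all features at once.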
- setLambda
  
  public NaiveBayes setLambda(double lambda)
  
  Set the smoothing parameter. Default: 1.0.
- getLambda
  
  public double getLambda()
  
  Get the smoothing parameter.
- setModelType
  
  public NaiveBayes setModelType(String modelType)
  
  Set the model type using a string (case-sensitive). Supported options: "multinomial" (default) and "bernoulli".
  
  Parameters:
  
  modelType - The model type, either "multinomial" or "bernoulli".
  
  Returns:
  
  This NaiveBayes instance.
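Because the model type string is case-sensitive, a value such as "Bernoulli" is not one of the supported options. The following plain-Java sketch shows one way to check a string before passing it to `setModelType`. `ModelTypeCheck` and `requireSupported` are hypothetical, and the use of `IllegalArgumentException` is an assumption; this page does not state how Spark reports an unsupported value.

```java
import java.util.Arrays;
import java.util.List;

public class ModelTypeCheck {
    // The two strings this page lists as supported.
    static final List<String> SUPPORTED = Arrays.asList("multinomial", "bernoulli");

    /** Returns modelType unchanged if it is supported; otherwise throws. Hypothetical helper. */
    public static String requireSupported(String modelType) {
        // List.contains uses String.equals, so the comparison is case-sensitive.
        if (!SUPPORTED.contains(modelType)) {
            throw new IllegalArgumentException("Unknown model type: " + modelType);
        }
        return modelType;
    }

    public static void main(String[] args) {
        System.out.println(requireSupported("bernoulli"));
        try {
            requireSupported("Bernoulli"); // differs only in case
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```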
- getModelType
  
  public String getModelType()
  
  Get the model type.
- run
  
  public NaiveBayesModel run(RDD<LabeledPoint> data)
  
  Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
  
  Parameters:
  
  data - RDD of LabeledPoint.
  
  Returns:
  
  The trained NaiveBayesModel.
