org.apache.spark.mllib.classification
Class NaiveBayes

Object
  extended by org.apache.spark.mllib.classification.NaiveBayes
All Implemented Interfaces:
java.io.Serializable, Logging

public class NaiveBayes
extends Object
implements scala.Serializable, Logging

See Also:
Serialized Form

Constructor Summary
NaiveBayes()
           
NaiveBayes(double lambda)
           
 
Method Summary
static String Bernoulli()
          String name for Bernoulli model type.
 double getLambda()
          Get the smoothing parameter.
 String getModelType()
          Get the model type.
static String Multinomial()
          String name for multinomial model type.
 NaiveBayesModel run(RDD<LabeledPoint> data)
          Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
 NaiveBayes setLambda(double lambda)
          Set the smoothing parameter.
 NaiveBayes setModelType(String modelType)
          Set the model type using a string (case-sensitive).
static scala.collection.immutable.Set<String> supportedModelTypes()
           
static NaiveBayesModel train(RDD<LabeledPoint> input)
          Trains a Naive Bayes model given an RDD of (label, features) pairs.
static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda)
          Trains a Naive Bayes model given an RDD of (label, features) pairs.
static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda, String modelType)
          Trains a Naive Bayes model given an RDD of (label, features) pairs.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

NaiveBayes

public NaiveBayes(double lambda)

NaiveBayes

public NaiveBayes()

Method Detail

Multinomial

public static String Multinomial()
String name for multinomial model type.


Bernoulli

public static String Bernoulli()
String name for Bernoulli model type.


supportedModelTypes

public static scala.collection.immutable.Set<String> supportedModelTypes()

train

public static NaiveBayesModel train(RDD<LabeledPoint> input)
Trains a Naive Bayes model given an RDD of (label, features) pairs.

This is the default Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification.

This version of the method uses a default smoothing parameter of 1.0.

Parameters:
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
Returns:
The trained NaiveBayesModel.
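
To illustrate what fitting a multinomial NB on count vectors involves, here is a minimal stand-alone sketch in plain Java. It is not Spark's implementation and uses no Spark classes: the class name MultinomialNbSketch, its fields, and the array-based inputs are all hypothetical, and the additive smoothing form is an assumption, since this page does not state the formula.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch of multinomial NB; not Spark's implementation. */
final class MultinomialNbSketch {
    final double[] logPrior;    // log P(class)
    final double[][] logTheta;  // log P(feature | class)

    MultinomialNbSketch(double[] logPrior, double[][] logTheta) {
        this.logPrior = logPrior;
        this.logTheta = logTheta;
    }

    /**
     * labels[i] must lie in 0..numClasses-1; features[i] is a non-negative
     * count (or frequency) vector. lambda is the additive smoothing parameter.
     */
    static MultinomialNbSketch train(int[] labels, double[][] features,
                                     int numClasses, double lambda) {
        int numFeatures = features[0].length;
        double[] docsPerClass = new double[numClasses];
        double[][] countSums = new double[numClasses][numFeatures];
        // Aggregate per-class example counts and per-class feature totals.
        for (int i = 0; i < labels.length; i++) {
            docsPerClass[labels[i]] += 1.0;
            for (int j = 0; j < numFeatures; j++) {
                countSums[labels[i]][j] += features[i][j];
            }
        }
        double[] logPrior = new double[numClasses];
        double[][] logTheta = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            // Smoothed class prior (assumed form).
            logPrior[c] = Math.log((docsPerClass[c] + lambda)
                    / (labels.length + lambda * numClasses));
            double total = Arrays.stream(countSums[c]).sum();
            for (int j = 0; j < numFeatures; j++) {
                // Smoothed per-class feature probability (assumed form).
                logTheta[c][j] = Math.log((countSums[c][j] + lambda)
                        / (total + lambda * numFeatures));
            }
        }
        return new MultinomialNbSketch(logPrior, logTheta);
    }

    /** Returns the class with the highest log-posterior score for x. */
    int predict(double[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++) {
                score += x[j] * logTheta[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Calling MultinomialNbSketch.train(labels, features, numClasses, 1.0) mirrors the default smoothing value of 1.0 that this overload uses.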

train

public static NaiveBayesModel train(RDD<LabeledPoint> input,
                                    double lambda)
Trains a Naive Bayes model given an RDD of (label, features) pairs.

This is the default Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification.

Parameters:
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
lambda - The smoothing parameter
Returns:
The trained NaiveBayesModel.
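
This page does not spell out how lambda enters the estimates. Under the usual additive (Laplace/Lidstone) smoothing for multinomial NB, which is assumed here and not taken from this page, the smoothed estimates would read:

```latex
\hat{\theta}_{c,j} = \frac{N_{c,j} + \lambda}{N_c + \lambda D},
\qquad
\hat{\pi}_c = \frac{n_c + \lambda}{n + \lambda K}
```

Here N_{c,j} is the summed value of feature j over training examples with label c, N_c is the sum of N_{c,j} over all j, D is the number of features, n_c is the number of training examples with label c, n is the total number of examples, and K is the number of classes. All of these symbols are introduced for illustration only. With lambda = 1.0 this reduces to Laplace (add-one) smoothing, and a larger lambda pulls the estimates toward uniform.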

train

public static NaiveBayesModel train(RDD<LabeledPoint> input,
                                    double lambda,
                                    String modelType)
Trains a Naive Bayes model given an RDD of (label, features) pairs.

The model type can be set to either Multinomial NB (http://tinyurl.com/lsdw6p) or Bernoulli NB (http://tinyurl.com/p7c96j6). Multinomial NB handles discrete count data and is selected by setting the model type to "multinomial". For example, it can be used with word counts or TF-IDF vectors of documents. The Bernoulli model fits presence or absence (0-1) data. When every vector is a 0-1 vector and the model type is set to "bernoulli", the model fits and predicts as Bernoulli NB.

Parameters:
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector (a 0-1 vector for the Bernoulli model).
lambda - The smoothing parameter
modelType - The type of NB model to fit, given as a case-sensitive string: "multinomial" or "bernoulli"
Returns:
The trained NaiveBayesModel.
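
The Bernoulli model expects 0-1 vectors, so count data has to be binarized before training. A minimal sketch of that preprocessing step in plain Java follows. The class BinarizeSketch and the method toPresenceVector are hypothetical names, not part of Spark, and treating any positive value as "present" is an assumption.

```java
/** Hypothetical helper; not part of the Spark API. */
final class BinarizeSketch {
    /** Maps every positive entry to 1.0 and every other entry to 0.0. */
    static double[] toPresenceVector(double[] counts) {
        double[] out = new double[counts.length];
        for (int j = 0; j < counts.length; j++) {
            out[j] = counts[j] > 0.0 ? 1.0 : 0.0;
        }
        return out;
    }
}
```

Each feature vector would be passed through such a step before being wrapped in a LabeledPoint and trained with the model type set to "bernoulli".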

setLambda

public NaiveBayes setLambda(double lambda)
Set the smoothing parameter. Default: 1.0.


getLambda

public double getLambda()
Get the smoothing parameter.


setModelType

public NaiveBayes setModelType(String modelType)
Set the model type using a string (case-sensitive). Supported options: "multinomial" (default) and "bernoulli".

Parameters:
modelType - The model type name, either "multinomial" or "bernoulli".
Returns:
This NaiveBayes instance, so that setter calls can be chained.
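
Because the comparison is case-sensitive, a string such as "Bernoulli" does not match "bernoulli". The sketch below shows one way such a check could look in plain Java. The class ModelTypeCheck and the method requireSupported are hypothetical, and rejecting an unsupported string with an IllegalArgumentException is an assumption, since this page does not say how invalid values are handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical validation helper; not part of the Spark API. */
final class ModelTypeCheck {
    // The two model type names listed on this page.
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("multinomial", "bernoulli")));

    /** Returns modelType unchanged if it is supported, otherwise throws. */
    static String requireSupported(String modelType) {
        // Set.contains uses String.equals, so the match is case-sensitive.
        if (!SUPPORTED.contains(modelType)) {
            throw new IllegalArgumentException(
                    "Unsupported model type: " + modelType);
        }
        return modelType;
    }
}
```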

getModelType

public String getModelType()
Get the model type.


run

public NaiveBayesModel run(RDD<LabeledPoint> data)
Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.

Parameters:
data - RDD of LabeledPoint.
Returns:
The trained NaiveBayesModel.