
class NaiveBayes extends Serializable with Logging

Trains a Naive Bayes model given an RDD of (label, features) pairs.

This is the Multinomial NB (see here) which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB (see here). The input feature values must be nonnegative.
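
The model described above can be sketched in plain Scala, without Spark, so that it runs without a cluster. This is a minimal sketch, not MLlib's implementation. It assumes dense `Array[Double]` feature vectors and additive smoothing controlled by a `lambda` argument that plays the role of the smoothing parameter. The names `MultinomialNBSketch`, `Model`, `train`, and `predict` are hypothetical. The sketch requires `lambda > 0` so that every logarithm stays finite.

```scala
// Minimal multinomial Naive Bayes sketch with additive smoothing.
// Hypothetical names throughout; this is NOT Spark MLlib's implementation.
object MultinomialNBSketch {

  // Parameters are kept as logarithms so that products of many probabilities become sums.
  final case class Model(
      labels: Array[Double],
      logPi: Array[Double],          // log prior for each label
      logTheta: Array[Array[Double]] // log P(feature j | label c); each row sums to 1 after exp
  ) {
    // Predict the label that maximizes log(pi_c) + sum_j x_j * log(theta_cj).
    def predict(x: Array[Double]): Double = {
      val scores = labels.indices.map { c =>
        logPi(c) + x.indices.map(j => x(j) * logTheta(c)(j)).sum
      }
      labels(scores.indices.maxBy(i => scores(i)))
    }
  }

  // Train from (label, features) pairs. This sketch requires lambda > 0 so every log is finite.
  def train(data: Seq[(Double, Array[Double])], lambda: Double = 1.0): Model = {
    require(data.nonEmpty, "training data must not be empty")
    require(lambda > 0.0, s"this sketch requires lambda > 0 but got $lambda")
    val numFeatures = data.head._2.length
    data.foreach { case (_, x) =>
      require(x.length == numFeatures, "all feature vectors must have the same length")
      require(x.forall(_ >= 0.0), "feature values must be nonnegative")
    }
    val byLabel = data.groupBy(_._1)
    val labels = byLabel.keys.toArray.sorted
    // pi_c = (N_c + lambda) / (N + K * lambda), where K is the number of labels
    val logPiDenom = math.log(data.size + labels.length * lambda)
    val logPi = labels.map(l => math.log(byLabel(l).size + lambda) - logPiDenom)
    // theta_cj = (T_cj + lambda) / (sum_j T_cj + n * lambda), where T_cj totals feature j in label c
    val logTheta = labels.map { l =>
      val totals = Array.fill(numFeatures)(0.0)
      byLabel(l).foreach { case (_, x) => x.indices.foreach(j => totals(j) += x(j)) }
      val logDenom = math.log(totals.sum + numFeatures * lambda)
      totals.map(t => math.log(t + lambda) - logDenom)
    }
    Model(labels, logPi, logTheta)
  }
}
```

Each feature value acts as a count, such as a term frequency or TF-IDF weight, and multiplies that feature's log-probability. This is why negative values make no sense for this model and are rejected.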

Annotations
@Since( "0.9.0" )
Source
NaiveBayes.scala
Linear Supertypes
Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new NaiveBayes()
    Annotations
    @Since( "0.9.0" )
  2. new NaiveBayes(lambda: Double)
    Annotations
    @Since( "1.4.0" )

Value Members

  1. def getLambda: Double

    Get the smoothing parameter.

    Annotations
    @Since( "1.4.0" )
  2. def getModelType: String

    Get the model type.

    Annotations
    @Since( "1.4.0" )
  3. def run(data: RDD[LabeledPoint]): NaiveBayesModel

    Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.

    data: RDD of org.apache.spark.mllib.regression.LabeledPoint.

    Annotations
    @Since( "0.9.0" )
  4. def setLambda(lambda: Double): NaiveBayes

    Set the smoothing parameter. Default: 1.0.

    Annotations
    @Since( "0.9.0" )
  5. def setModelType(modelType: String): NaiveBayes

    Set the model type using a string (case-sensitive). Supported options: "multinomial" (default) and "bernoulli".

    Annotations
    @Since( "1.4.0" )
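
Choosing the "bernoulli" model type changes how a feature vector is scored. The plain Scala sketch below shows the difference for a single label. It assumes dense 0-1 arrays and a strictly positive smoothing value. In the Bernoulli model, absent features (x_j = 0) contribute log(1 - theta_j). In the multinomial model, they contribute nothing. `BernoulliNBSketch` and all of its methods are hypothetical. This includes `isSupportedModelType`, which stands in for the case-sensitive model-type check. None of this is Spark's API.

```scala
// Hypothetical helpers, NOT part of Spark: a minimal Bernoulli Naive Bayes sketch for one label.
object BernoulliNBSketch {

  // Stand-in for the case-sensitive model-type check: "Bernoulli" is rejected, "bernoulli" is accepted.
  private val SupportedModelTypes = Set("multinomial", "bernoulli")
  def isSupportedModelType(modelType: String): Boolean = SupportedModelTypes.contains(modelType)

  // Map nonnegative feature values to 0-1 indicators (feature present or absent).
  def binarize(x: Array[Double]): Array[Double] = x.map(v => if (v > 0.0) 1.0 else 0.0)

  // Smoothed probability that feature j equals 1 within one label:
  // theta_j = (number of examples with x_j = 1 + lambda) / (number of examples + 2 * lambda).
  def estimateTheta(examples: Seq[Array[Double]], lambda: Double): Array[Double] = {
    require(examples.nonEmpty, "need at least one example")
    require(lambda > 0.0, "this sketch needs lambda > 0 so that 0 < theta_j < 1")
    val n = examples.head.length
    Array.tabulate(n) { j =>
      (examples.count(e => e(j) == 1.0) + lambda) / (examples.size + 2.0 * lambda)
    }
  }

  // Bernoulli log-likelihood: sum_j [x_j * log(theta_j) + (1 - x_j) * log(1 - theta_j)].
  def logLikelihood(x: Array[Double], theta: Array[Double]): Double = {
    require(x.length == theta.length, "dimension mismatch")
    require(x.forall(v => v == 0.0 || v == 1.0), "expected a 0-1 vector")
    x.indices.map { j =>
      if (x(j) == 1.0) math.log(theta(j)) else math.log(1.0 - theta(j))
    }.sum
  }
}
```

Binarizing the input first, as with `binarize` above, is the "0-1 vector" preparation mentioned in the class description.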