Package pyspark :: Package mllib :: Module classification :: Class NaiveBayes

Class NaiveBayes

object --+
         |
        NaiveBayes

Instance Methods

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __init__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Methods
 
train(cls, data, lambda_=1.0)
Train a Naive Bayes model given an RDD of (label, features) vectors.
Properties

Inherited from object: __class__

Method Details

train(cls, data, lambda_=1.0)
Class Method

Train a Naive Bayes model given an RDD of (label, features) vectors.

This is the Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB (http://tinyurl.com/p7c96j6).
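The 0-1 conversion mentioned above can be sketched in plain NumPy. This is only an illustration under assumptions: the array names are hypothetical, and the rows use the label-first layout that train expects (label in column 0, features in the remaining columns). In practice the same mapping would be applied to each element of the RDD before calling train.

```python
import numpy as np

# Hypothetical count data: column 0 is the label, the rest are feature counts.
counts = np.array([[0.0, 3.0, 0.0, 1.0],
                   [1.0, 0.0, 2.0, 5.0]])

# Turn every feature into a 0-1 indicator while leaving the label untouched.
binary = counts.copy()
binary[:, 1:] = (counts[:, 1:] > 0).astype(float)
```

Only the feature columns are thresholded; binarizing the label column as well would corrupt any label greater than 1.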

Parameters:
  • data - RDD of NumPy vectors, one per element, where the first coordinate is the label and the rest is the feature vector (e.g. a count vector).
  • lambda_ - The additive (Laplace) smoothing parameter. Defaults to 1.0.
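
To make the data layout and the role of lambda_ concrete, here is a minimal local sketch of textbook multinomial Naive Bayes with additive smoothing, written in plain NumPy. It is not Spark's implementation and does not use an RDD. The helper names train_multinomial_nb and predict are hypothetical, and the smoothing formulas are the standard ones, assumed here for illustration.

```python
import numpy as np

def train_multinomial_nb(rows, lambda_=1.0):
    """Hypothetical helper: fit multinomial NB on rows laid out as
    [label, f1, ..., fD], i.e. the label is the first coordinate."""
    rows = np.asarray(rows, dtype=float)
    labels, features = rows[:, 0], rows[:, 1:]
    classes = np.unique(labels)
    n_docs, n_features = features.shape
    log_prior = np.empty(len(classes))
    log_theta = np.empty((len(classes), n_features))
    for i, c in enumerate(classes):
        class_rows = features[labels == c]
        # Smoothed class prior: (n_c + lambda) / (N + C * lambda)
        log_prior[i] = np.log((len(class_rows) + lambda_)
                              / (n_docs + len(classes) * lambda_))
        # Smoothed feature likelihoods: (count_cj + lambda) / (count_c + D * lambda)
        counts = class_rows.sum(axis=0)
        log_theta[i] = np.log((counts + lambda_)
                              / (counts.sum() + n_features * lambda_))
    return classes, log_prior, log_theta

def predict(model, x):
    """Return the label with the highest posterior log-score for feature vector x."""
    classes, log_prior, log_theta = model
    scores = log_prior + log_theta.dot(np.asarray(x, dtype=float))
    return classes[np.argmax(scores)]

# Toy data: first coordinate is the label, the rest are count features.
data = [
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 3.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 3.0],
    [1.0, 0.0, 1.0, 4.0],
]
model = train_multinomial_nb(data, lambda_=1.0)
label = predict(model, [2.0, 0.0, 0.0])
```

With lambda_ greater than zero, a feature that never occurs in a class still gets a nonzero probability, so a single unseen term cannot drive a class score to negative infinity.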