IDF (Spark 1.4.1 JavaDoc)

org.apache.spark.mllib.feature
Class IDF

Object
  org.apache.spark.mllib.feature.IDF

public class IDF
extends Object

:: Experimental :: Inverse document frequency (IDF). The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.

This implementation supports filtering out terms that appear in fewer than a minimum number of documents (controlled by the parameter minDocFreq). For terms that appear in fewer than minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.

param: minDocFreq minimum number of documents in which a term must appear; terms below this threshold are filtered out
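
To make the formula and the minDocFreq cutoff concrete, here is a minimal plain-Java sketch that does not depend on Spark. The class name `IdfFormulaSketch` and the method `idf` are hypothetical names chosen for illustration, and the sketch assumes that `log` denotes the natural logarithm.

```java
// Hypothetical illustration only; this is not part of the Spark API.
class IdfFormulaSketch {

    /**
     * Evaluates idf = log((m + 1) / (d(t) + 1)) for a single term.
     *
     * @param numDocs    m, the total number of documents
     * @param docFreq    d(t), the number of documents that contain the term
     * @param minDocFreq terms with docFreq below this value get an IDF of 0
     */
    static double idf(long numDocs, long docFreq, int minDocFreq) {
        if (docFreq < minDocFreq) {
            return 0.0; // filtered term: its TF-IDF will also be 0
        }
        // Assumption: natural logarithm.
        return Math.log((numDocs + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        long m = 4;
        System.out.println(idf(m, 1, 0)); // rare term, no filtering
        System.out.println(idf(m, 4, 0)); // term in every document: log(5 / 5)
        System.out.println(idf(m, 1, 2)); // below minDocFreq, so filtered to 0
    }
}
```

A term that occurs in every document gets an IDF of 0 from the formula itself, while a term below the cutoff gets 0 from the filter; the two cases are indistinguishable in the resulting values.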

Nested Class Summary
`static class`	`IDF.DocumentFrequencyAggregator` Document frequency aggregator.

Constructor Summary
`IDF()`
`IDF(int minDocFreq)`

Method Summary
`IDFModel`	`fit(JavaRDD<Vector> dataset)` Computes the inverse document frequency.
`IDFModel`	`fit(RDD<Vector> dataset)` Computes the inverse document frequency.
`int`	`minDocFreq()`

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

IDF

public IDF(int minDocFreq)

IDF

public IDF()

Method Detail

minDocFreq

public int minDocFreq()

fit

public IDFModel fit(RDD<Vector> dataset)

Computes the inverse document frequency.

Parameters: dataset - an RDD of term frequency vectors
Returns: (undocumented)

fit

public IDFModel fit(JavaRDD<Vector> dataset)

Computes the inverse document frequency.

Parameters: dataset - a JavaRDD of term frequency vectors
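
As a rough picture of what fitting over a set of term frequency vectors involves, the following plain-Java sketch counts document frequencies over dense arrays and then applies the formula from the class description. It is not Spark's implementation, which works on distributed RDDs: the class `IdfFitSketch`, its methods, and the `double[][]` input layout are all hypothetical, and treating a term as present when its frequency is greater than 0 is an assumption.

```java
import java.util.Arrays;

// Hypothetical single-machine illustration; not the Spark implementation.
class IdfFitSketch {

    /** Counts, per term index, the documents whose term frequency is positive. */
    static long[] documentFrequencies(double[][] termFrequencies, int numTerms) {
        long[] df = new long[numTerms];
        for (double[] doc : termFrequencies) {
            for (int t = 0; t < numTerms; t++) {
                if (doc[t] > 0.0) { // assumption: positive frequency means "contains term"
                    df[t]++;
                }
            }
        }
        return df;
    }

    /** Returns one IDF value per term, with 0 for terms below minDocFreq. */
    static double[] fit(double[][] termFrequencies, int numTerms, int minDocFreq) {
        long m = termFrequencies.length; // total number of documents
        long[] df = documentFrequencies(termFrequencies, numTerms);
        double[] idf = new double[numTerms];
        for (int t = 0; t < numTerms; t++) {
            idf[t] = df[t] >= minDocFreq
                    ? Math.log((m + 1.0) / (df[t] + 1.0))
                    : 0.0;
        }
        return idf;
    }

    public static void main(String[] args) {
        // Three documents over a vocabulary of three terms.
        double[][] docs = {
            {1.0, 0.0, 2.0},
            {0.0, 0.0, 1.0},
            {3.0, 1.0, 1.0}
        };
        System.out.println(Arrays.toString(fit(docs, 3, 2)));
    }
}
```

In this sample the second term appears in only one document, so with a minDocFreq of 2 it is filtered to 0; the third term appears in all three documents and gets 0 from the formula.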