Package org.apache.spark.mllib.feature
Class IDF
Object
org.apache.spark.mllib.feature.IDF
Inverse document frequency (IDF).
 The standard formulation is used: 
idf = log((m + 1) / (d(t) + 1)), where m is the total
 number of documents and d(t) is the number of documents that contain term t.
 
 This implementation supports filtering out terms that do not appear in a minimum number
 of documents (controlled by the variable minDocFreq). For terms that appear in fewer than
 minDocFreq documents, the IDF is set to 0, resulting in TF-IDFs of 0.
 The document frequency is also 0 for such terms.
 
param: minDocFreq minimum number of documents in which a term must appear to avoid being filtered out
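
The formula and the minDocFreq cutoff described above can be illustrated with a small, self-contained sketch. This is plain Java and not Spark's implementation; the class name `IdfFormulaSketch` and the method `idf` are hypothetical names chosen for illustration, and the sketch assumes the natural logarithm.

```java
// Illustrative sketch only: hypothetical helper, not part of the Spark API.
final class IdfFormulaSketch {

    /**
     * Returns log((m + 1) / (d(t) + 1)) for a single term, or 0 when the
     * term appears in fewer than minDocFreq documents.
     *
     * @param numDocs    m, the total number of documents
     * @param docFreq    d(t), the number of documents containing the term
     * @param minDocFreq minimum document frequency required to keep the term
     */
    static double idf(long numDocs, long docFreq, int minDocFreq) {
        if (docFreq < minDocFreq) {
            return 0.0; // filtered term: IDF is 0, so any TF-IDF is 0 too
        }
        return Math.log((numDocs + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        // Term found in 4 of 9 documents, no filtering: log(10 / 5).
        System.out.println(idf(9, 4, 0));
        // Term found in 1 of 9 documents with minDocFreq = 2: filtered.
        System.out.println(idf(9, 1, 2));
    }
}
```

Note that a term appearing in every document gets log((m + 1) / (m + 1)) = 0, so it contributes nothing to TF-IDF even without filtering.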
Nested Class Summary
Nested Classes
Modifier and Type: static class
Description: Document frequency aggregator.
Constructor Details
IDF
public IDF(int minDocFreq)
IDF
public IDF()
Method Details
minDocFreq
public int minDocFreq()
fit
Computes the inverse document frequency.
Parameters:
 dataset - an RDD of term frequency vectors
Returns:
 (undocumented)
fit
Computes the inverse document frequency.
Parameters:
 dataset - a JavaRDD of term frequency vectors
Returns:
 (undocumented)
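
To make the computation behind fit concrete, the following sketch shows one way to derive an IDF value per term from a collection of term frequency vectors. It is an assumption-laden illustration in plain Java, not the Spark code path: it uses an in-memory `List<double[]>` instead of an RDD or JavaRDD, returns a bare `double[]` rather than whatever object the real method returns, and assumes a document "contains" a term when its term frequency is greater than 0. The names `IdfFitSketch` and `fit` are hypothetical.

```java
import java.util.List;

// Illustrative sketch only: in-memory stand-in for the distributed computation.
final class IdfFitSketch {

    /**
     * Counts, for each term index, how many documents have a positive term
     * frequency, then applies idf = log((m + 1) / (d(t) + 1)). Terms seen in
     * fewer than minDocFreq documents keep an IDF of 0.
     */
    static double[] fit(List<double[]> termFrequencies, int minDocFreq) {
        if (termFrequencies.isEmpty()) {
            throw new IllegalArgumentException("at least one document is required");
        }
        int numTerms = termFrequencies.get(0).length;
        long[] docFreq = new long[numTerms];

        // Pass over the documents: accumulate document frequencies d(t).
        for (double[] tf : termFrequencies) {
            if (tf.length != numTerms) {
                throw new IllegalArgumentException("all vectors must have the same length");
            }
            for (int j = 0; j < numTerms; j++) {
                if (tf[j] > 0.0) {
                    docFreq[j]++;
                }
            }
        }

        long m = termFrequencies.size();
        double[] idf = new double[numTerms]; // entries default to 0.0
        for (int j = 0; j < numTerms; j++) {
            if (docFreq[j] >= minDocFreq) {
                idf[j] = Math.log((m + 1.0) / (docFreq[j] + 1.0));
            }
        }
        return idf;
    }
}
```

Multiplying each term frequency vector element-wise by the returned array would then give TF-IDF vectors, with filtered terms contributing 0.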