Package org.apache.spark.mllib.feature
Class IDF
Object
org.apache.spark.mllib.feature.IDF
Inverse document frequency (IDF).
 The standard formulation is used:
idf = log((m + 1) / (d(t) + 1)), where log is the natural logarithm, m is the total
 number of documents, and d(t) is the number of documents that contain term t.
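 As a worked illustration of the formula above, the following plain-Java sketch evaluates idf for a few document counts. It does not use Spark: the class name IdfFormulaSketch and the method idf are hypothetical names chosen for this example, and the natural logarithm is assumed.

```java
// Hypothetical helper for illustration only; not part of the Spark API.
final class IdfFormulaSketch {

    // idf = log((m + 1) / (d(t) + 1)), using the natural logarithm (an assumption of this sketch).
    static double idf(long m, long docFreq) {
        return Math.log((m + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        long m = 9; // total number of documents

        // A term that appears in every document: log(10 / 10).
        System.out.println(idf(m, 9));
        // A term that appears in 4 of the 9 documents: log(10 / 5).
        System.out.println(idf(m, 4));
        // A term that appears in no document: log(10 / 1), the largest value for this m.
        System.out.println(idf(m, 0));
    }
}
```

 The +1 in the numerator and denominator keeps the ratio finite when d(t) is 0, and a term that occurs in every document receives a weight of exactly 0.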
 
 This implementation supports filtering out terms that do not appear in a minimum number
 of documents (controlled by the variable minDocFreq). For terms that appear in fewer than
 minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.
 The document frequency is also 0 for such terms.
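 The filtering rule can be sketched without Spark as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class MinDocFreqSketch, its methods, and the dense double[][] representation of term frequency vectors are all hypothetical, and a term is assumed to count as present in a document when its term frequency is greater than 0.

```java
import java.util.Arrays;

// Hypothetical, Spark-free sketch of the minDocFreq rule; not the library's implementation.
final class MinDocFreqSketch {

    // Counts, for each term index, how many documents have a positive term frequency.
    static long[] documentFrequencies(double[][] termFrequencies, int numTerms) {
        long[] df = new long[numTerms];
        for (double[] doc : termFrequencies) {
            for (int j = 0; j < numTerms; j++) {
                if (doc[j] > 0.0) {
                    df[j]++;
                }
            }
        }
        return df;
    }

    // Applies idf = log((m + 1) / (d(t) + 1)); terms in fewer than minDocFreq documents get 0.
    static double[] idf(double[][] termFrequencies, int numTerms, int minDocFreq) {
        long m = termFrequencies.length;
        long[] df = documentFrequencies(termFrequencies, numTerms);
        double[] idf = new double[numTerms];
        for (int j = 0; j < numTerms; j++) {
            idf[j] = df[j] >= minDocFreq ? Math.log((m + 1.0) / (df[j] + 1.0)) : 0.0;
        }
        return idf;
    }

    public static void main(String[] args) {
        // Four documents over a three-term vocabulary (rows are documents).
        double[][] tf = {
            {1.0, 2.0, 1.0},
            {3.0, 0.0, 1.0},
            {0.0, 0.0, 4.0},
            {0.0, 0.0, 0.0}
        };
        // Term 1 appears in only one document, so with minDocFreq = 2 its IDF is 0.
        System.out.println(Arrays.toString(idf(tf, 3, 2)));
    }
}
```

 Multiplying each term frequency by the corresponding entry of the returned array gives the TF-IDF value, which is why a filtered term contributes 0 to every document.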
 
param: minDocFreq minimum number of documents in which a term must appear; terms that appear in fewer documents are filtered out
Nested Class Summary
Nested Classes
static class: Document frequency aggregator.
Constructor Details
IDF
public IDF(int minDocFreq)

IDF
public IDF()

Method Details
minDocFreq
public int minDocFreq()

fit
Computes the inverse document frequency.
Parameters:
dataset - an RDD of term frequency vectors
Returns:
(undocumented)

fit
Computes the inverse document frequency.
Parameters:
dataset - a JavaRDD of term frequency vectors
Returns:
(undocumented)