public class IDF
extends Object
idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.
This implementation supports filtering out terms that do not appear in a minimum number of documents (controlled by the variable minDocFreq). For terms that appear in fewer than minDocFreq documents, the IDF is set to 0, so the resulting TF-IDF values are 0. The document frequency is 0 as well for such terms.
param: minDocFreq minimum number of documents in which a term must appear to avoid being filtered out
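The formula and the minDocFreq filter described above can be illustrated with a small, dependency-free sketch. This is not Spark's implementation: the class IdfSketch, its method names, and the dense double[][] document representation are hypothetical and chosen only for illustration. It assumes a term counts as present in a document when its value is greater than 0.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: plain-Java illustration of the IDF formula.
final class IdfSketch {

    // Counts, for each term index, how many documents hold a value > 0 for it.
    static long[] documentFrequencies(double[][] docs, int numTerms) {
        long[] df = new long[numTerms];
        for (double[] doc : docs) {
            for (int t = 0; t < numTerms; t++) {
                if (doc[t] > 0.0) {
                    df[t]++;
                }
            }
        }
        return df;
    }

    // idf = log((m + 1) / (d(t) + 1)); terms below minDocFreq get an IDF of 0.
    static double idf(long m, long docFreq, int minDocFreq) {
        if (docFreq < minDocFreq) {
            return 0.0;
        }
        return Math.log((m + 1.0) / (docFreq + 1.0));
    }

    // Computes one IDF value per term for a small in-memory corpus.
    static double[] idfVector(double[][] docs, int numTerms, int minDocFreq) {
        long[] df = documentFrequencies(docs, numTerms);
        double[] result = new double[numTerms];
        for (int t = 0; t < numTerms; t++) {
            result[t] = idf(docs.length, df[t], minDocFreq);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four documents over three terms; each row holds term frequencies.
        double[][] docs = {
            {1, 0, 2},
            {3, 0, 1},
            {0, 1, 0},
            {2, 0, 0}
        };
        // Term 1 appears in only one document, so minDocFreq = 2 zeroes its IDF.
        System.out.println(Arrays.toString(idfVector(docs, 3, 2)));
    }
}
```

In this toy corpus, term 0 appears in three documents and term 2 in two, so both keep the logarithmic weight, while term 1 falls below the threshold and receives an IDF of 0.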
Modifier and Type | Class and Description |
---|---|
static class | IDF.DocumentFrequencyAggregator: Document frequency aggregator. |
Modifier and Type | Method and Description |
---|---|
IDFModel | fit(JavaRDD<Vector> dataset): Computes the inverse document frequency. |
IDFModel | fit(RDD<Vector> dataset): Computes the inverse document frequency. |
int | minDocFreq() |