Inverse document frequency (IDF).
The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total
number of documents and d(t) is the number of documents that contain term t.
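As a rough illustration of the formula, the sketch below evaluates it in plain Python. It does not use the library described here: the function name idf and its parameter names are hypothetical, and the natural logarithm is an assumption, since the text only says "log".

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Smoothed IDF, log((m + 1) / (d(t) + 1)); natural log is assumed."""
    return math.log((num_docs + 1) / (doc_freq + 1))

# A term that occurs in every one of 4 documents: log(5 / 5).
common = idf(num_docs=4, doc_freq=4)

# A term that occurs in only 1 of 4 documents: log(5 / 2).
rare = idf(num_docs=4, doc_freq=1)

# A term that occurs in no document still has a finite weight: log(5 / 1).
unseen = idf(num_docs=4, doc_freq=0)
```

The +1 in both numerator and denominator keeps the weight finite when d(t) is 0, and a term that occurs in every document gets a weight of exactly 0.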
This implementation supports filtering out terms that appear in fewer than a minimum number
of documents (controlled by the variable minDocFreq). For terms that appear in fewer than
minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.
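The filtering rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's code: idf_vector, doc_freqs, and the sample numbers are hypothetical, and the natural logarithm is assumed.

```python
import math

def idf_vector(doc_freqs, num_docs, min_doc_freq=0):
    """Return one IDF weight per term; terms seen in fewer than
    min_doc_freq documents get a weight of 0.0."""
    return [
        math.log((num_docs + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for df in doc_freqs
    ]

# Hypothetical corpus of 4 documents; three terms occur in 4, 2 and 1 documents.
doc_freqs = [4, 2, 1]
weights = idf_vector(doc_freqs, num_docs=4, min_doc_freq=2)

# TF-IDF for one document: element-wise product of term frequencies and weights.
term_freqs = [3.0, 1.0, 5.0]
tfidf = [tf * w for tf, w in zip(term_freqs, weights)]
```

With min_doc_freq=2, the third term (seen in one document) is zeroed out by the filter, while the first term is zero only because it occurs in every document.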
minimum number of documents in which a term
must appear to avoid being filtered out
Computes the inverse document frequency.
a JavaRDD of term frequency vectors
an RDD of term frequency vectors