IDFModel

class pyspark.mllib.feature.IDFModel(java_model)
Represents an IDF model that can transform term frequency vectors.
New in version 1.2.0.
Methods
call(name, *a)    Call method of java_model.
docFreq()         Returns the document frequency.
idf()             Returns the current IDF vector.
numDocs()         Returns number of documents evaluated to compute idf.
transform(x)      Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation

call(name, *a)
Call method of java_model.

transform(x)
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, the terms which occur in fewer than minDocFreq documents will have an entry of 0.
New in version 1.2.0.
 Parameters:
    x : pyspark.mllib.linalg.Vector or pyspark.RDD
        a term frequency vector or an RDD of term frequency vectors
 Returns:
    pyspark.mllib.linalg.Vector or pyspark.RDD
        a TF-IDF vector or an RDD of TF-IDF vectors
Notes
In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.
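
The effect of minDocFreq on the output of transform can be illustrated without a Spark cluster. The following is a plain-Python sketch, not Spark code: the helper names fit_idf and transform_tf are hypothetical, the vectors are ordinary lists, and the smoothed formula log((numDocs + 1) / (docFreq + 1)) is an assumption taken from the general Spark MLlib description of IDF rather than from this page.

```python
import math


def fit_idf(docs, min_doc_freq=0):
    """Return one IDF weight per term for a list of dense TF vectors.

    Hypothetical helper for illustration only; it is not part of pyspark.
    """
    num_docs = len(docs)
    num_terms = len(docs[0])
    # Document frequency: in how many documents each term occurs.
    doc_freq = [sum(1 for d in docs if d[j] > 0) for j in range(num_terms)]
    # Assumed smoothed formula; terms seen in fewer than min_doc_freq
    # documents get a weight of 0, as described for transform above.
    return [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


def transform_tf(idf, tf):
    """Scale a single TF vector element-wise by the IDF weights."""
    return [w * f for w, f in zip(idf, tf)]


# Three toy documents over a vocabulary of four terms.
docs = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
]

idf = fit_idf(docs, min_doc_freq=2)
tfidf = [transform_tf(idf, d) for d in docs]
```

In this sketch the third term occurs in only one document, so with min_doc_freq=2 its entry is 0 in every transformed vector, while the fourth term (present in two documents) keeps a positive weight.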
