IDFModel

class pyspark.mllib.feature.IDFModel(java_model)

Represents an IDF model that can transform term frequency vectors.

New in version 1.2.0.

Methods

call(name, *a)

Calls the java_model method called name, passing the remaining arguments to it.

docFreq()

Returns the document frequency of each term, that is, the number of documents in which the term appears.

idf()

Returns the current IDF vector.

numDocs()

Returns the number of documents evaluated to compute the IDF.

transform(x)

Transforms term frequency (TF) vectors to TF-IDF vectors.

Methods Documentation

call(name, *a)

Calls the java_model method called name, passing the remaining arguments to it.

docFreq()

Returns the document frequency of each term, that is, the number of documents in which the term appears.

New in version 3.0.0.

idf()

Returns the current IDF vector.

New in version 1.4.0.

numDocs()

Returns the number of documents evaluated to compute the IDF.

New in version 3.0.0.

transform(x)

Transforms term frequency (TF) vectors to TF-IDF vectors.

If minDocFreq was set for the IDF calculation, terms that occur in fewer than minDocFreq documents have an entry of 0.

New in version 1.2.0.

Parameters:
x : pyspark.mllib.linalg.Vector or pyspark.RDD

an RDD of term frequency vectors or a term frequency vector

Returns:
pyspark.mllib.linalg.Vector or pyspark.RDD

an RDD of TF-IDF vectors or a TF-IDF vector
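
The following pure-Python sketch illustrates what transform computes, without requiring Spark. It assumes the smoothed formula idf(t) = log((m + 1) / (d(t) + 1)) described in the Spark MLlib feature extraction guide, where m is the number of documents and d(t) is the document frequency of term t. The helper names fit_idf and apply_idf are hypothetical and are not part of the PySpark API.

```python
import math


def fit_idf(tf_vectors, min_doc_freq=0):
    """Compute per-term IDF weights from a list of dense TF vectors.

    Assumes idf(t) = log((m + 1) / (d(t) + 1)); terms whose document
    frequency is below min_doc_freq get a weight of 0.
    """
    num_docs = len(tf_vectors)
    num_terms = len(tf_vectors[0])
    # d(t): number of documents in which term t has a non-zero frequency.
    doc_freq = [
        sum(1 for vec in tf_vectors if vec[j] > 0) for j in range(num_terms)
    ]
    idf = [
        math.log((num_docs + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]
    return idf, doc_freq, num_docs


def apply_idf(tf_vector, idf):
    """Scale each term frequency by the IDF weight of its term."""
    return [tf * weight for tf, weight in zip(tf_vector, idf)]


# Three documents over a vocabulary of four terms.
docs = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
]
idf, doc_freq, num_docs = fit_idf(docs)
tfidf = apply_idf(docs[1], idf)

# With min_doc_freq=2, terms 0 and 2 (seen in 0 and 1 documents) are zeroed.
idf_filtered, _, _ = fit_idf(docs, min_doc_freq=2)
```

Here term 1 appears in every document, so its weight is log(4 / 4) = 0 and it contributes nothing to any TF-IDF vector.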

Notes

In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.
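
As a minimal sketch of the note above, the helper below fits a model and then passes the whole RDD to transform, rather than calling transform inside map. The function names tfidf_for_corpus and demo, the local[1] master, and the sample vectors are assumptions made for illustration. The PySpark imports sit inside the functions so that the definitions load even where Spark is not installed.

```python
def tfidf_for_corpus(sc, term_frequencies, min_doc_freq=0):
    """Fit an IDF model and apply it to an entire RDD of TF vectors."""
    from pyspark.mllib.feature import IDF

    tf_rdd = sc.parallelize(term_frequencies)
    model = IDF(minDocFreq=min_doc_freq).fit(tf_rdd)

    # Supported: hand the RDD itself to transform.
    tfidf_rdd = model.transform(tf_rdd)

    # Not supported in Python, per the note above:
    #   tf_rdd.map(lambda v: model.transform(v))
    return model, tfidf_rdd


def demo():
    """Run the sketch against a local Spark installation."""
    from pyspark import SparkContext
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext("local[1]", "idf-model-sketch")
    try:
        docs = [
            Vectors.sparse(4, [1, 3], [1.0, 2.0]),
            Vectors.dense([0.0, 1.0, 2.0, 3.0]),
            Vectors.sparse(4, [1], [1.0]),
        ]
        model, tfidf_rdd = tfidf_for_corpus(sc, docs)
        print(model.idf())       # IDF weight per term
        print(model.numDocs())   # documents seen while fitting (3.0.0+)
        print(model.docFreq())   # per-term document frequency (3.0.0+)
        print(tfidf_rdd.collect())
        # A single vector can also be transformed on the driver.
        print(model.transform(docs[1]))
    finally:
        sc.stop()
```

Calling demo() requires a working PySpark installation; numDocs and docFreq are only available from version 3.0.0 onward.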