IDFModel

class pyspark.mllib.feature.IDFModel(java_model: py4j.java_gateway.JavaObject)[source]

Represents an IDF model that can transform term frequency vectors.

New in version 1.2.0.

Methods

call(name, *a)

Calls a method of java_model.

docFreq()

Returns the document frequency of each term.

idf()

Returns the current IDF vector.

numDocs()

Returns the number of documents evaluated to compute the IDF.

transform(x)

Transforms term frequency (TF) vectors to TF-IDF vectors.

Methods Documentation

call(name: str, *a: Any) → Any

Calls the java_model method called name, passing a as its arguments.

docFreq() → List[int][source]

Returns the document frequency of each term, that is, the number of documents in which the term appears.

New in version 3.0.0.

idf() → pyspark.mllib.linalg.Vector[source]

Returns the current IDF vector.

New in version 1.4.0.

numDocs() → int[source]

Returns the number of documents evaluated to compute the IDF.

New in version 3.0.0.
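
The three accessors above are related. The following is a minimal pure-Python sketch of that relationship, not the library's implementation. It assumes the smoothed formula given in the pyspark.mllib.feature.IDF documentation, idf = log((m + 1) / (d(t) + 1)), where m corresponds to numDocs() and d(t) to one entry of docFreq(). The helper name idf_from_doc_freq and its min_doc_freq parameter are hypothetical and only mirror the minDocFreq setting described under transform.

```python
import math
from typing import List


def idf_from_doc_freq(doc_freq: List[int], num_docs: int, min_doc_freq: int = 0) -> List[float]:
    """Hypothetical helper: recompute IDF weights from per-term document counts.

    Assumes idf = log((m + 1) / (d(t) + 1)); terms seen in fewer than
    min_doc_freq documents get a weight of 0.
    """
    weights = []
    for df in doc_freq:
        if df >= min_doc_freq:
            weights.append(math.log((num_docs + 1.0) / (df + 1.0)))
        else:
            # Term is too rare to be kept, so its IDF entry is zeroed.
            weights.append(0.0)
    return weights


# Three documents; term 0 appears in two of them, terms 1 and 2 in one each.
weights = idf_from_doc_freq([2, 1, 1], num_docs=3, min_doc_freq=2)
```

Under these assumptions, only the first term keeps a non-zero weight, because the other two fall below the minimum document frequency.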

transform(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]][source]

Transforms term frequency (TF) vectors to TF-IDF vectors.

If minDocFreq was set for the IDF calculation, the terms which occur in fewer than minDocFreq documents will have an entry of 0.

New in version 1.2.0.

Parameters
x : pyspark.mllib.linalg.Vector or pyspark.RDD

an RDD of term frequency vectors or a term frequency vector

Returns
pyspark.mllib.linalg.Vector or pyspark.RDD

an RDD of TF-IDF vectors or a TF-IDF vector

Notes

In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.