HashingTF

class pyspark.mllib.feature.HashingTF(numFeatures=1048576)

Maps a sequence of terms to their term frequencies using the hashing trick.

New in version 1.2.0.
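
The hashing trick can be illustrated in plain Python. The sketch below is not the library's implementation: it assumes Python's built-in `hash()` as the hash function, and `hashed_term_frequencies` is a hypothetical helper name. Each term is mapped to an index by taking its hash modulo the number of features, and the terms landing on each index are counted.

```python
from collections import Counter


def hashed_term_frequencies(terms, num_features):
    """Return {index: count}, where index = hash(term) % num_features.

    Hypothetical helper for illustration; hash() is an assumed hash function.
    """
    counts = Counter(hash(term) % num_features for term in terms)
    return {index: float(count) for index, count in counts.items()}


doc = "a a b b c d".split(" ")
tf = hashed_term_frequencies(doc, 100)
```

Because distinct terms can hash to the same index, `tf` may hold fewer entries than there are distinct terms, but its counts always sum to the document length.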

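Parameters

numFeatures: Number of features, which is the size of the resulting term frequency vectors (default: 1048576).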

Notes

The terms must be hashable (they cannot be dict/set/list…).
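
As a quick check of which values qualify as terms, the following sketch tests hashability with Python's built-in `hash()`. The helper name `is_hashable` and the sample values are illustrative assumptions, not part of the API.

```python
def is_hashable(term):
    """Return True if Python can hash the value, False otherwise."""
    try:
        hash(term)
    except TypeError:
        return False
    return True


# Strings, numbers, and tuples of hashable items are hashable;
# lists, dicts, and sets are not.
candidates = ["spark", 42, ("new", "york"), ["new", "york"], {"k": 1}, {"a"}]
results = [is_hashable(term) for term in candidates]
```

A multi-word term stored as a list, such as `["new", "york"]`, can be converted with `tuple(...)` before it is used as a term.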

Examples

>>> htf = HashingTF(100)
>>> doc = "a a b b c d".split(" ")
>>> htf.transform(doc)
SparseVector(100, {...})

Methods

`indexOf`(term)	Returns the index of the input term.
`setBinary`(value)	If True, the term frequency vector is binary: every non-zero term count is set to 1 (default: False).
`transform`(document)	Transforms the input document (a list of terms) to a term frequency vector, or transforms an RDD of documents to an RDD of term frequency vectors.

Methods Documentation

indexOf(term): Returns the index of the input term.

New in version 1.2.0.
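
A consequence of mapping terms to a fixed number of indices is that distinct terms can collide. The sketch below assumes the index is the term's hash modulo the number of features, using Python's built-in `hash()`; `index_of` is a hypothetical stand-in, not the library method. With four distinct terms and only three indices, at least two terms must share an index.

```python
def index_of(term, num_features):
    """Hypothetical stand-in: map a term to an index in [0, num_features)."""
    return hash(term) % num_features


terms = ["a", "b", "c", "d"]
indices = [index_of(term, 3) for term in terms]

# Four terms cannot occupy three indices without at least one collision.
has_collision = len(set(indices)) < len(terms)
```

Which terms collide can differ between Python processes, because string hashes are randomized per process unless `PYTHONHASHSEED` is set. A larger number of features makes collisions less likely.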

setBinary(value): If True, the term frequency vector is binary: every non-zero term count is set to 1 (default: False).

New in version 2.0.0.
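
The difference between counted and binary term frequencies can be sketched in plain Python. This is an illustration under the assumption that indices come from `hash(term) % num_features`; `term_frequencies` and its `binary` argument are hypothetical names, not the library's code.

```python
def term_frequencies(terms, num_features, binary=False):
    """Return {index: value}; value is a count, or 1.0 when binary is True."""
    freq = {}
    for term in terms:
        index = hash(term) % num_features
        # In binary mode every occupied index is set to 1.0 instead of counted.
        freq[index] = 1.0 if binary else freq.get(index, 0.0) + 1.0
    return freq


doc = "a a b b c d".split(" ")
counts = term_frequencies(doc, 100)
flags = term_frequencies(doc, 100, binary=True)
```

Both results occupy the same indices; only the values differ. Binary vectors record whether a term occurs rather than how often.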

transform(document): Transforms the input document (a list of terms) to a term frequency vector, or transforms an RDD of documents to an RDD of term frequency vectors.

New in version 1.2.0.
