public class HashingTF
extends Object
implements scala.Serializable
param: numFeatures - number of features (default: 2^20)
Modifier and Type | Method and Description |
---|---|
int | indexOf(Object term): Returns the index of the input term. |
int | numFeatures() |
HashingTF | setBinary(boolean value): If true, the term frequency vector will be binary, so that all non-zero term counts are set to 1 (default: false). |
HashingTF | setHashAlgorithm(String value): Sets the hash algorithm used when mapping a term to an integer. |
Vector | transform(scala.collection.Iterable<?> document): Transforms the input document into a sparse term frequency vector. |
Vector | transform(Iterable<?> document): Transforms the input document into a sparse term frequency vector (Java version). |
<D extends Iterable<?>> JavaRDD<Vector> | transform(JavaRDD<D> dataset): Transforms each document in the input dataset into a term frequency vector (Java version). |
<D extends scala.collection.Iterable<?>> RDD<Vector> | transform(RDD<D> dataset): Transforms each document in the input dataset into a term frequency vector. |
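
The term-to-index mapping behind indexOf can be sketched in plain Java. This is only an illustration of the general idea, not Spark's implementation: the class name TermIndexSketch is hypothetical, and the sketch assumes the term's hashCode() as a stand-in for whichever algorithm is selected with setHashAlgorithm, so the indices it produces need not match those returned by HashingTF.

```java
// Illustrative sketch only; TermIndexSketch is a hypothetical name, not part of Spark.
final class TermIndexSketch {

    // Maps a term to a bucket in the range [0, numFeatures).
    // Assumption: Object.hashCode() stands in for the configured hash algorithm.
    static int indexOf(Object term, int numFeatures) {
        int hash = (term == null) ? 0 : term.hashCode();
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numFeatures);
    }

    public static void main(String[] args) {
        int numFeatures = 1 << 20; // 2^20, the documented default
        System.out.println(indexOf("spark", numFeatures));
        // A negative hash code still yields a valid index.
        System.out.println(indexOf(-7, 16));
    }
}
```

Because many terms share a fixed number of buckets, distinct terms can collide on the same index; a larger numFeatures makes collisions less likely at the cost of a wider vector.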
public int numFeatures()

public HashingTF setBinary(boolean value)
value - (undocumented)

public HashingTF setHashAlgorithm(String value)
value - (undocumented)

public int indexOf(Object term)
term - (undocumented)

public Vector transform(scala.collection.Iterable<?> document)
document - (undocumented)

public Vector transform(Iterable<?> document)
document - (undocumented)

public <D extends scala.collection.Iterable<?>> RDD<Vector> transform(RDD<D> dataset)
dataset - (undocumented)
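
To make the behavior of transform and setBinary more concrete, here is a minimal sketch of a hashed term frequency computation in plain Java. It is a sketch under stated assumptions rather than the library's code: TermFrequencySketch is a hypothetical class, a sorted map stands in for the sparse Vector, and hashCode() stands in for the configured hash algorithm.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; TermFrequencySketch is a hypothetical name, not part of Spark.
final class TermFrequencySketch {

    // Returns a sparse term frequency "vector" as a map from index to value.
    // Assumption: hashCode() stands in for the configured hash algorithm.
    static Map<Integer, Double> transform(Iterable<?> document, int numFeatures, boolean binary) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (Object term : document) {
            int hash = (term == null) ? 0 : term.hashCode();
            int index = Math.floorMod(hash, numFeatures);
            if (binary) {
                tf.put(index, 1.0);               // presence only: non-zero counts become 1
            } else {
                tf.merge(index, 1.0, Double::sum); // accumulate the count for this bucket
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> document = Arrays.asList("a", "b", "a");
        System.out.println(transform(document, 16, false)); // prints {1=2.0, 2=1.0}
        System.out.println(transform(document, 16, true));  // prints {1=1.0, 2=1.0}
    }
}
```

Only buckets that receive at least one term are stored, which is why the result stays small even when numFeatures is large.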