org.apache.spark.mllib.feature
Class HashingTF

Object
  extended by org.apache.spark.mllib.feature.HashingTF
All Implemented Interfaces:
java.io.Serializable

public class HashingTF
extends Object
implements scala.Serializable

:: Experimental :: Maps a sequence of terms to their term frequencies using the hashing trick.

param: numFeatures number of features (default: 2^20)
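
The effect of numFeatures can be illustrated with a minimal, Spark-free sketch. It assumes that a term's index is the non-negative remainder of its hashCode() modulo numFeatures, which is one common way to implement the hashing trick; this page does not state the actual hash function. The class BucketCountSketch and its methods are hypothetical names used only for illustration and are not part of MLlib.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BucketCountSketch {
    // Assumed index rule (not taken from this page):
    // non-negative remainder of the term's hash code.
    static int bucketOf(Object term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    // Counts how many distinct indexes a vocabulary occupies.
    static int distinctBuckets(List<String> vocabulary, int numFeatures) {
        Set<Integer> buckets = new HashSet<>();
        for (String term : vocabulary) {
            buckets.add(bucketOf(term, numFeatures));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("a", "b", "c", "d", "e");
        // Five terms cannot all get their own index when only 4 exist.
        System.out.println(distinctBuckets(vocabulary, 4));
        // With the default-sized feature space, collisions become unlikely.
        System.out.println(distinctBuckets(vocabulary, 1 << 20));
    }
}
```

A small numFeatures forces unrelated terms to share an index, which merges their counts. A larger value lowers the chance of collisions at the cost of higher-dimensional vectors.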

See Also:
Serialized Form

Constructor Summary
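HashingTF()
          Creates a HashingTF with the default number of features (2^20).
HashingTF(int numFeatures)
          Creates a HashingTF with the given number of features.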
 
Method Summary
 int indexOf(Object term)
          Returns the index of the input term.
 int numFeatures()
          Returns the number of features.
 Vector transform(Iterable<?> document)
          Transforms the input document into a sparse term frequency vector (Java version).
 Vector transform(scala.collection.Iterable<Object> document)
          Transforms the input document into a sparse term frequency vector.
<D extends Iterable<?>>
JavaRDD<Vector>
transform(JavaRDD<D> dataset)
          Transforms each document in the input dataset into a sparse term frequency vector (Java version).
<D extends scala.collection.Iterable<Object>>
RDD<Vector>
transform(RDD<D> dataset)
          Transforms each document in the input dataset into a sparse term frequency vector.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HashingTF

public HashingTF(int numFeatures)

HashingTF

public HashingTF()

Method Detail

numFeatures

public int numFeatures()
Returns the number of features, which is the size of the vectors produced by transform.

indexOf

public int indexOf(Object term)
Returns the index of the input term.

Parameters:
term - the term to hash
Returns:
the index of the term, in the range [0, numFeatures)
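
As a rough illustration of how a term can be mapped to an index, the sketch below takes the term's hashCode() and reduces it to the range [0, numFeatures) with a non-negative modulo. This is an assumption about the mapping, not a statement of the library's implementation; IndexOfSketch and nonNegativeMod are hypothetical names.

```java
final class IndexOfSketch {
    // Java's % can return a negative remainder, so shift it into [0, mod).
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;
        return raw < 0 ? raw + mod : raw;
    }

    // Assumed rule: index = hash code of the term, reduced modulo numFeatures.
    static int indexOf(Object term, int numFeatures) {
        return nonNegativeMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        int numFeatures = 1 << 20;
        System.out.println(indexOf("spark", numFeatures));
        // "Aa" and "BB" have the same String.hashCode(), so under this rule
        // they always share an index, whatever numFeatures is.
        System.out.println(indexOf("Aa", numFeatures) == indexOf("BB", numFeatures));
    }
}
```

The mapping is deterministic and needs no vocabulary, but distinct terms can collide on the same index.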

transform

public Vector transform(scala.collection.Iterable<Object> document)
Transforms the input document into a sparse term frequency vector.

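Parameters:
document - the terms of a single document
Returns:
a sparse vector of size numFeatures in which the entry at each index holds the number of terms in the document that map to that index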

transform

public Vector transform(Iterable<?> document)
Transforms the input document into a sparse term frequency vector (Java version).

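Parameters:
document - the terms of a single document
Returns:
a sparse vector of size numFeatures in which the entry at each index holds the number of terms in the document that map to that index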
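
The single-document transform can be sketched without Spark as follows. The sketch assumes the same hash-and-modulo index rule as a typical hashing-trick implementation and uses a sorted map from index to count as a stand-in for a sparse vector; TermFrequencySketch is a hypothetical name, and the real method returns an MLlib Vector.

```java
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

final class TermFrequencySketch {
    // Returns index -> count for the non-zero entries only (a sparse view).
    static SortedMap<Integer, Double> transform(Iterable<?> document, int numFeatures) {
        SortedMap<Integer, Double> frequencies = new TreeMap<>();
        for (Object term : document) {
            // Assumed index rule: non-negative remainder of the hash code.
            int index = Math.floorMod(term.hashCode(), numFeatures);
            // Terms that map to the same index add to the same count.
            frequencies.merge(index, 1.0, Double::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(transform(Arrays.asList("a", "b", "a"), 16));
    }
}
```

Only the indexes that actually occur are stored, which is why the result stays small even when numFeatures is large.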

transform

public <D extends scala.collection.Iterable<Object>> RDD<Vector> transform(RDD<D> dataset)
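Transforms each document in the input dataset into a sparse term frequency vector.

Parameters:
dataset - an RDD of documents, each given as a collection of terms
Returns:
an RDD containing one term frequency vector per input document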

transform

public <D extends Iterable<?>> JavaRDD<Vector> transform(JavaRDD<D> dataset)
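Transforms each document in the input dataset into a sparse term frequency vector (Java version).

Parameters:
dataset - a JavaRDD of documents, each given as a collection of terms
Returns:
a JavaRDD containing one term frequency vector per input document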