org.apache.spark.ml.feature
Class VectorIndexer.CategoryStats

Object
  extended by org.apache.spark.ml.feature.VectorIndexer.CategoryStats
All Implemented Interfaces:
java.io.Serializable
Enclosing class:
VectorIndexer

public static class VectorIndexer.CategoryStats
extends Object
implements scala.Serializable

Helper class for tracking unique values for each feature.

TODO: Track which features are known to be continuous already; do not update counts for them.

param: numFeatures This class fails if it encounters a Vector whose length is not numFeatures.
param: maxCategories This class caps the number of unique values collected at maxCategories.

See Also:
Serialized Form

Constructor Summary
VectorIndexer.CategoryStats(int numFeatures, int maxCategories)
           
 
Method Summary
 void addVector(Vector v)
          Add a new vector to this index, updating the sets of unique feature values.
 scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> getCategoryMaps()
          Based on stats collected, decide which features are categorical, and choose indices for categories.
 VectorIndexer.CategoryStats merge(VectorIndexer.CategoryStats other)
          Merge with another instance, modifying this instance.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VectorIndexer.CategoryStats

public VectorIndexer.CategoryStats(int numFeatures,
                                   int maxCategories)
Method Detail

merge

public VectorIndexer.CategoryStats merge(VectorIndexer.CategoryStats other)
Merge with another instance, modifying this instance.
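
The following is a minimal sketch of what an in-place merge of per-feature value sets can look like. It is not Spark's implementation: the class MergeableStats and its record and values helpers are hypothetical, and returning this from merge is an assumption based only on the return type in the signature above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for CategoryStats; illustrates merge semantics only. */
final class MergeableStats {
    // One set of observed values per feature (column).
    private final List<Set<Double>> valueSets = new ArrayList<>();

    MergeableStats(int numFeatures) {
        for (int i = 0; i < numFeatures; i++) {
            valueSets.add(new HashSet<>());
        }
    }

    /** Hypothetical helper: record one observed value for one feature. */
    void record(int featureIndex, double value) {
        valueSets.get(featureIndex).add(value);
    }

    /** Unions the other instance's value sets into this one and returns this (assumed). */
    MergeableStats merge(MergeableStats other) {
        if (other.valueSets.size() != valueSets.size()) {
            throw new IllegalArgumentException("Feature counts differ");
        }
        for (int i = 0; i < valueSets.size(); i++) {
            valueSets.get(i).addAll(other.valueSets.get(i));
        }
        return this;
    }

    /** Hypothetical helper: the values seen so far for one feature. */
    Set<Double> values(int featureIndex) {
        return valueSets.get(featureIndex);
    }
}
```

Because the receiver is mutated, a merge of this shape suits a reduce step over partial results, where the left operand can be reused as the accumulator.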


addVector

public void addVector(Vector v)
Add a new vector to this index, updating the sets of unique feature values.
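
The sketch below illustrates the two constraints described for numFeatures and maxCategories: a length check on each incoming vector and a cap on how many unique values are kept per feature. It is not Spark's code. The class UniqueValueTracker is hypothetical, it takes a plain double[] in place of a Vector, and keeping one value beyond the cap (so that "more than maxCategories" remains detectable) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for CategoryStats; illustrates addVector semantics only. */
final class UniqueValueTracker {
    private final int numFeatures;
    private final int maxCategories;
    // One set of observed values per feature (column).
    private final List<Set<Double>> valueSets = new ArrayList<>();

    UniqueValueTracker(int numFeatures, int maxCategories) {
        this.numFeatures = numFeatures;
        this.maxCategories = maxCategories;
        for (int i = 0; i < numFeatures; i++) {
            valueSets.add(new HashSet<>());
        }
    }

    /** Records each feature value of v, rejecting vectors of the wrong length. */
    void addVector(double[] v) {
        if (v.length != numFeatures) {
            throw new IllegalArgumentException(
                "Expected " + numFeatures + " features but got " + v.length);
        }
        for (int i = 0; i < numFeatures; i++) {
            Set<Double> seen = valueSets.get(i);
            // Assumption: stop collecting once the set already exceeds the cap,
            // so at most maxCategories + 1 values are ever stored per feature.
            if (seen.size() <= maxCategories) {
                seen.add(v[i]);
            }
        }
    }

    /** Hypothetical helper: number of unique values stored for one feature. */
    int uniqueCount(int featureIndex) {
        return valueSets.get(featureIndex).size();
    }
}
```

With the cap in place, memory per feature stays bounded even when a column is effectively continuous.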


getCategoryMaps

public scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> getCategoryMaps()
Based on stats collected, decide which features are categorical, and choose indices for categories.

Sparsity: This tries to maintain sparsity by treating value 0.0 specially. If a categorical feature takes value 0.0, then value 0.0 is given index 0, so zero entries remain zero after indexing.

Returns:
Feature value index. Keys are categorical feature indices (column indices). Values are mappings from original feature values to 0-based category indices.
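
To make the shape of the returned map and the special handling of 0.0 concrete, here is a small sketch written under stated assumptions. It is not Spark's implementation: CategoryMapSketch and categoryMaps are hypothetical names, a feature is treated as categorical when it has at most maxCategories unique values, and giving the non-zero values indices in ascending order is an assumption that the description above does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration of building per-feature category maps. */
final class CategoryMapSketch {
    /**
     * Returns feature index -> (original value -> 0-based category index)
     * for every feature with at most maxCategories unique values.
     */
    static Map<Integer, Map<Double, Integer>> categoryMaps(
            List<Set<Double>> valueSets, int maxCategories) {
        Map<Integer, Map<Double, Integer>> result = new TreeMap<>();
        for (int feature = 0; feature < valueSets.size(); feature++) {
            Set<Double> values = valueSets.get(feature);
            if (values.size() > maxCategories) {
                continue; // too many unique values: treated as continuous
            }
            // Assumption: non-zero values are indexed in ascending order.
            List<Double> ordered = new ArrayList<>();
            for (double v : values) {
                if (v != 0.0) {
                    ordered.add(v);
                }
            }
            Collections.sort(ordered);
            // If 0.0 was observed, place it first so it receives index 0.
            if (ordered.size() < values.size()) {
                ordered.add(0, 0.0);
            }
            Map<Double, Integer> indexOf = new LinkedHashMap<>();
            for (int i = 0; i < ordered.size(); i++) {
                indexOf.put(ordered.get(i), i);
            }
            result.put(feature, indexOf);
        }
        return result;
    }
}
```

For example, with maxCategories set to 3, a feature that took the values -1.0, 0.0 and 2.0 would map 0.0 to index 0, -1.0 to index 1 and 2.0 to index 2 in this sketch, while a feature with four unique values would be left out of the result.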