org.apache.spark.ml.feature
Class VectorIndexerModel

Object
  extended by org.apache.spark.ml.PipelineStage
      extended by org.apache.spark.ml.Transformer
          extended by org.apache.spark.ml.Model<VectorIndexerModel>
              extended by org.apache.spark.ml.feature.VectorIndexerModel
All Implemented Interfaces:
java.io.Serializable, Logging, Params

public class VectorIndexerModel
extends Model<VectorIndexerModel>

:: Experimental :: Transform categorical features to use 0-based indices instead of their original values.
 - Categorical features are mapped to indices.
 - Continuous features (columns) are left unchanged.

This also appends metadata to the output column, marking features as Numeric (continuous), Nominal (categorical), or Binary (either continuous or categorical). Non-ML metadata is not carried over from the input to the output column.

This maintains vector sparsity.

param: numFeatures Number of features, i.e., the length of the Vectors which this model transforms.
param: categoryMaps Feature value index. Keys are categorical feature indices (column indices). Values are maps from original feature values to 0-based category indices. If a feature is not in this map, it is treated as continuous.
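
The mapping described by categoryMaps can be illustrated with a small plain-Java sketch. It does not use Spark and is not the library's implementation: the class CategoryMapSketch and the method applyCategoryMaps are hypothetical names, a feature vector is modeled as a double[], and throwing on a value missing from a feature's map is an assumption of this sketch.

```java
import java.util.Map;

final class CategoryMapSketch {
    /**
     * Returns a copy of features in which every categorical feature
     * (a key of categoryMaps) is replaced by its 0-based category index.
     * Features that are not keys of categoryMaps are treated as
     * continuous and copied unchanged.
     */
    static double[] applyCategoryMaps(
            double[] features,
            Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = features.clone();
        for (Map.Entry<Integer, Map<Double, Integer>> entry : categoryMaps.entrySet()) {
            int featureIndex = entry.getKey();
            double original = features[featureIndex];
            Integer categoryIndex = entry.getValue().get(original);
            if (categoryIndex == null) {
                // Assumption of this sketch: unseen values are an error.
                throw new IllegalArgumentException(
                    "Unseen value " + original + " for categorical feature " + featureIndex);
            }
            out[featureIndex] = categoryIndex;
        }
        return out;
    }
}
```

With a map of {1 -> {-1.0 -> 0, 5.0 -> 1}}, the vector [0.3, 5.0, 2.5] becomes [0.3, 1.0, 2.5]: only feature 1 is categorical, so only that entry is rewritten.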

See Also:
Serialized Form

Method Summary
 scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> categoryMaps()
           
 VectorIndexerModel copy(ParamMap extra)
          Creates a copy of this instance with the same UID and some extra params.
 int getMaxCategories()
           
 java.util.Map<Integer,java.util.Map<Double,Integer>> javaCategoryMaps()
          Java-friendly version of categoryMaps
 IntParam maxCategories()
          Threshold for the number of values a categorical feature can take.
 int numFeatures()
           
 VectorIndexerModel setInputCol(String value)
           
 VectorIndexerModel setOutputCol(String value)
           
 DataFrame transform(DataFrame dataset)
          Transforms the input dataset.
 StructType transformSchema(StructType schema)
          :: DeveloperApi ::
 String uid()
           
 
Methods inherited from class org.apache.spark.ml.Model
hasParent, parent, setParent
 
Methods inherited from class org.apache.spark.ml.Transformer
transform, transform, transform
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Method Detail

uid

public String uid()

numFeatures

public int numFeatures()

categoryMaps

public scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> categoryMaps()

javaCategoryMaps

public java.util.Map<Integer,java.util.Map<Double,Integer>> javaCategoryMaps()
Java-friendly version of categoryMaps
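
Because the returned structure uses plain java.util.Map types, it can be processed with ordinary collection code. As a hedged illustration, the sketch below inverts one inner map (original value to category index) so that an index can be traced back to its original value. CategoryMapInverter and invert are hypothetical names and are not part of the Spark API.

```java
import java.util.HashMap;
import java.util.Map;

final class CategoryMapInverter {
    /**
     * Given one feature's map from original value to 0-based category
     * index, builds the reverse lookup from index to original value.
     */
    static Map<Integer, Double> invert(Map<Double, Integer> valueToIndex) {
        Map<Integer, Double> indexToValue = new HashMap<>();
        for (Map.Entry<Double, Integer> entry : valueToIndex.entrySet()) {
            indexToValue.put(entry.getValue(), entry.getKey());
        }
        return indexToValue;
    }
}
```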


setInputCol

public VectorIndexerModel setInputCol(String value)

setOutputCol

public VectorIndexerModel setOutputCol(String value)

transform

public DataFrame transform(DataFrame dataset)
Description copied from class: Transformer
Transforms the input dataset.

Specified by:
transform in class Transformer
Parameters:
dataset - (undocumented)
Returns:
(undocumented)

transformSchema

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi ::

Derives the output schema from the input schema.

Specified by:
transformSchema in class PipelineStage
Parameters:
schema - (undocumented)
Returns:
(undocumented)

copy

public VectorIndexerModel copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.

Specified by:
copy in interface Params
Specified by:
copy in class Model<VectorIndexerModel>
Parameters:
extra - (undocumented)
Returns:
(undocumented)
See Also:
defaultCopy()

maxCategories

public IntParam maxCategories()
Threshold for the number of values a categorical feature can take. If a feature is found to have more than maxCategories distinct values, it is declared continuous. Must be >= 2.

(default = 20)

Returns:
(undocumented)
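
The threshold rule above can be sketched in plain Java without Spark. This is an illustration under stated assumptions, not the library's fitting code: MaxCategoriesSketch and buildCategoryMaps are hypothetical names, the dataset is modeled as a double[][] of equal-length rows, and category indices are assigned in ascending order of the original values, which is a choice of this sketch. It also makes no attempt to preserve vector sparsity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class MaxCategoriesSketch {
    /**
     * For each feature (column), collects its distinct values. A feature
     * with at most maxCategories distinct values is treated as categorical
     * and gets a value-to-index map; any other feature is treated as
     * continuous and is left out of the result.
     */
    static Map<Integer, Map<Double, Integer>> buildCategoryMaps(
            double[][] rows, int maxCategories) {
        if (maxCategories < 2) {
            throw new IllegalArgumentException("maxCategories must be >= 2");
        }
        Map<Integer, Map<Double, Integer>> result = new HashMap<>();
        if (rows.length == 0) {
            return result;
        }
        int numFeatures = rows[0].length;
        for (int j = 0; j < numFeatures; j++) {
            TreeSet<Double> distinct = new TreeSet<>();
            for (double[] row : rows) {
                distinct.add(row[j]);
            }
            if (distinct.size() <= maxCategories) {
                Map<Double, Integer> indexByValue = new HashMap<>();
                int next = 0;
                for (double value : distinct) {
                    // Assumption of this sketch: ascending value order.
                    indexByValue.put(value, next++);
                }
                result.put(j, indexByValue);
            }
        }
        return result;
    }
}
```

For rows {0.0, 1.5}, {1.0, 2.5}, {0.0, 3.5} and maxCategories = 2, feature 0 has two distinct values and is categorical, while feature 1 has three and is treated as continuous.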

getMaxCategories

public int getMaxCategories()