org.apache.spark.ml.feature
Class StringIndexer

Object
  extended by org.apache.spark.ml.PipelineStage
      extended by org.apache.spark.ml.Estimator<StringIndexerModel>
          extended by org.apache.spark.ml.feature.StringIndexer
All Implemented Interfaces:
java.io.Serializable, Logging, Params

public class StringIndexer
extends Estimator<StringIndexerModel>

:: Experimental :: A label indexer that maps a string column of labels to an ML column of label indices. If the input column is numeric, it is cast to string and the string values are indexed. The indices are in [0, numLabels), ordered by label frequency, so the most frequent label gets index 0.
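The frequency ordering described above can be illustrated with a minimal plain-Java sketch. It does not use Spark and is not the library's implementation: the class name LabelIndexSketch and the method indexByFrequency are hypothetical, indices are plain ints, and the tie-breaking rule (first-seen order) is an assumption, since this page does not say how ties are resolved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark.
final class LabelIndexSketch {

    // Maps each distinct label to an index in [0, numLabels),
    // with the most frequent label first.
    // Assumption: labels with equal counts keep first-seen order.
    static Map<String, Integer> indexByFrequency(List<?> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Object label : labels) {
            // Numeric inputs are converted to strings before counting.
            counts.merge(String.valueOf(label), 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts keep insertion order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        Map<String, Integer> index = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            // The next free index equals the number of labels assigned so far.
            index.put(e.getKey(), index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("a", "b", "c", "a", "a", "c");
        System.out.println(indexByFrequency(labels)); // prints {a=0, c=1, b=2}
    }
}
```

Here "a" occurs three times, "c" twice, and "b" once, so they receive indices 0, 1, and 2 respectively.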

See Also:
Serialized Form

Constructor Summary
StringIndexer()
           
StringIndexer(String uid)
           
 
Method Summary
 StringIndexer copy(ParamMap extra)
          Creates a copy of this instance with the same UID and some extra params.
 StringIndexerModel fit(DataFrame dataset)
          Fits a model to the input data.
 StringIndexer setInputCol(String value)
           
 StringIndexer setOutputCol(String value)
           
 StructType transformSchema(StructType schema)
          :: DeveloperApi :: Derives the output schema from the input schema.
 String uid()
           
 StructType validateAndTransformSchema(StructType schema)
          Validates and transforms the input schema.
 
Methods inherited from class org.apache.spark.ml.Estimator
fit, fit, fit, fit
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

StringIndexer

public StringIndexer(String uid)

StringIndexer

public StringIndexer()
Method Detail

uid

public String uid()

setInputCol

public StringIndexer setInputCol(String value)

setOutputCol

public StringIndexer setOutputCol(String value)

fit

public StringIndexerModel fit(DataFrame dataset)
Description copied from class: Estimator
Fits a model to the input data.

Specified by:
fit in class Estimator<StringIndexerModel>
Parameters:
dataset - the input DataFrame to fit on
Returns:
the fitted StringIndexerModel

transformSchema

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi ::

Derives the output schema from the input schema.

Specified by:
transformSchema in class PipelineStage
Parameters:
schema - the input schema
Returns:
the output schema derived from the input schema

copy

public StringIndexer copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.

Specified by:
copy in interface Params
Specified by:
copy in class Estimator<StringIndexerModel>
Parameters:
extra - the extra params to apply to the copy
Returns:
a copy of this StringIndexer with the same UID
See Also:
defaultCopy()

validateAndTransformSchema

public StructType validateAndTransformSchema(StructType schema)
Validates and transforms the input schema.