OneHotEncoder (Spark 1.4.1 JavaDoc)


org.apache.spark.ml.feature
Class OneHotEncoder

Object
  org.apache.spark.ml.PipelineStage
      org.apache.spark.ml.Transformer
          org.apache.spark.ml.feature.OneHotEncoder

All Implemented Interfaces:: java.io.Serializable, Logging, Params

public class OneHotEncoder
extends Transformer

:: Experimental :: A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 maps to the output vector [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because keeping it would make the entries of every vector sum to one, so the encoded columns would be linearly dependent. As a result, with 5 categories an input value of 4.0 (the last category) maps to [0.0, 0.0, 0.0, 0.0]. Note that this differs from scikit-learn's OneHotEncoder, which keeps all categories. The output vectors are sparse.
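The mapping described above can be sketched in plain Java. This is a minimal, hypothetical helper (`OneHotSketch.encode` is not part of Spark) that reproduces the two documented examples; it returns a dense `double[]` purely for readability, whereas Spark's output vectors are sparse. Rejecting negative or fractional indices is a choice of this sketch, not a statement about Spark's behavior.

```java
import java.util.Arrays;

// Hypothetical helper that mirrors the documented mapping; it is not Spark's code
// and returns a dense array only for readability (Spark's output vectors are sparse).
final class OneHotSketch {

    static double[] encode(double categoryIndex, int numCategories, boolean dropLast) {
        // Sketch-only validation: a category index must be a whole number in [0, numCategories).
        if (categoryIndex < 0 || categoryIndex >= numCategories
                || categoryIndex != Math.floor(categoryIndex)) {
            throw new IllegalArgumentException("Invalid category index: " + categoryIndex);
        }
        // dropLast = true leaves out the slot for the last category.
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] out = new double[size];
        int idx = (int) categoryIndex;
        if (idx < size) {
            out[idx] = 1.0;  // the single one-value marks the input category
        }
        // Otherwise idx is the dropped last category and the vector stays all zeros.
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2.0, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4.0, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
    }
}
```

Calling `encode(4.0, 5, false)` instead returns a vector of length 5 with a one in the last position, which is the behavior selected by setting dropLast to false.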

See Also:: StringIndexer for converting categorical values into category indices, Serialized Form

Constructor Summary
`OneHotEncoder()`
`OneHotEncoder(String uid)`

Method Summary
`OneHotEncoder`	`copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params.
`BooleanParam`	`dropLast()` Whether to drop the last category in the encoded vector (default: true)
`OneHotEncoder`	`setDropLast(boolean value)`
`OneHotEncoder`	`setInputCol(String value)`
`OneHotEncoder`	`setOutputCol(String value)`
`DataFrame`	`transform(DataFrame dataset)` Transforms the input dataset.
`StructType`	`transformSchema(StructType schema)` :: DeveloperApi :: Derives the output schema from the input schema.
`String`	`uid()`

Methods inherited from class org.apache.spark.ml.Transformer
`transform, transform, transform`

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.spark.ml.param.Params
`clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams`

Methods inherited from interface org.apache.spark.Logging
`initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning`

Constructor Detail

OneHotEncoder

public OneHotEncoder(String uid)

OneHotEncoder

public OneHotEncoder()

Method Detail

uid

public String uid()

dropLast

public final BooleanParam dropLast()

Whether to drop the last category in the encoded vector (default: true)

Returns:: (undocumented)
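The linear-dependence argument behind the dropLast default can be checked numerically. In this hypothetical sketch (`DropLastSketch` is not a Spark class), every row encoded with dropLast = false sums to exactly one, so the encoded columns add up to a constant column of ones (such as a model's intercept); with dropLast = true the last category encodes to all zeros and the sum is no longer constant.

```java
// Hypothetical sketch: why keeping every category makes the encoded columns linearly dependent.
final class DropLastSketch {

    // Dense one-hot encoding of a valid index in [0, numCategories).
    static double[] encode(int categoryIndex, int numCategories, boolean dropLast) {
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] out = new double[size];
        if (categoryIndex < size) {
            out[categoryIndex] = 1.0;
        }
        return out;
    }

    // Sum of the entries of one encoded row.
    static double rowSum(double[] row) {
        double s = 0.0;
        for (double v : row) {
            s += v;
        }
        return s;
    }

    public static void main(String[] args) {
        int numCategories = 5;
        for (int c = 0; c < numCategories; c++) {
            double kept = rowSum(encode(c, numCategories, false));
            double dropped = rowSum(encode(c, numCategories, true));
            System.out.println("category " + c + ": sum(dropLast=false)=" + kept
                    + ", sum(dropLast=true)=" + dropped);
        }
    }
}
```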

setDropLast

public OneHotEncoder setDropLast(boolean value)

setInputCol

public OneHotEncoder setInputCol(String value)

setOutputCol

public OneHotEncoder setOutputCol(String value)
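Because setDropLast, setInputCol, and setOutputCol each return a OneHotEncoder, calls can be chained. To keep this runnable without a Spark dependency, the sketch below mimics that fluent style with a hypothetical `EncoderSettings` class (not part of Spark); the column names are illustrative, and the dropLast default of true follows the documentation above.

```java
// Hypothetical stand-in for the fluent setters; it does not depend on Spark.
final class EncoderSettings {
    private String inputCol;          // unset until setInputCol is called
    private String outputCol;         // unset until setOutputCol is called
    private boolean dropLast = true;  // documented default of dropLast

    // Each setter returns this instance so calls can be chained.
    EncoderSettings setInputCol(String value) { this.inputCol = value; return this; }
    EncoderSettings setOutputCol(String value) { this.outputCol = value; return this; }
    EncoderSettings setDropLast(boolean value) { this.dropLast = value; return this; }

    String inputCol() { return inputCol; }
    String outputCol() { return outputCol; }
    boolean dropLast() { return dropLast; }

    public static void main(String[] args) {
        EncoderSettings s = new EncoderSettings()
                .setInputCol("categoryIndex")   // hypothetical column of category indices
                .setOutputCol("categoryVec")    // hypothetical name for the encoded column
                .setDropLast(false);
        System.out.println(s.inputCol() + " -> " + s.outputCol() + ", dropLast=" + s.dropLast());
    }
}
```

With Spark 1.4.1 on the classpath, the same chain would start from `new OneHotEncoder()` instead, since each of the three setters is declared to return OneHotEncoder.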

transformSchema

public StructType transformSchema(StructType schema)

Description copied from class: PipelineStage

:: DeveloperApi ::

Derives the output schema from the input schema.

Specified by:: transformSchema in class PipelineStage

Parameters:: schema - (undocumented)
Returns:: (undocumented)
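Deriving the output schema amounts to appending one vector-typed column to the input columns. The sketch below models a schema as a hypothetical name-to-type map rather than Spark's StructType; the two validation checks (input column present, output column absent) are assumptions about typical transformer behavior, not a transcription of Spark's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of schema derivation: column name -> type name.
final class SchemaSketch {

    static Map<String, String> transformSchema(Map<String, String> schema,
                                               String inputCol, String outputCol) {
        // Assumed check: the input column must exist.
        if (!schema.containsKey(inputCol)) {
            throw new IllegalArgumentException("Input column " + inputCol + " does not exist.");
        }
        // Assumed check: the output column must not already exist.
        if (schema.containsKey(outputCol)) {
            throw new IllegalArgumentException("Output column " + outputCol + " already exists.");
        }
        Map<String, String> out = new LinkedHashMap<>(schema);  // keep input columns in order
        out.put(outputCol, "vector");                           // append the encoded column
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("id", "int");
        in.put("categoryIndex", "double");
        System.out.println(transformSchema(in, "categoryIndex", "categoryVec"));
    }
}
```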

transform

public DataFrame transform(DataFrame dataset)

Description copied from class: Transformer

Transforms the input dataset.

Specified by:: transform in class Transformer

Parameters:: dataset - (undocumented)
Returns:: (undocumented)
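Transforming a dataset applies the encoding to every value of the input column and yields sparse vectors. The sketch below uses a plain `double[]` as the column and a hypothetical `SparseVec` class (size, indices, values) in place of Spark's DataFrame and vector types; the dropped last category becomes a vector with no stored entries. The input validation is a choice of this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sparse vector: only non-zero positions are stored.
final class SparseVec {
    final int size;
    final int[] indices;
    final double[] values;

    SparseVec(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    @Override
    public String toString() {
        return "(" + size + "," + Arrays.toString(indices) + "," + Arrays.toString(values) + ")";
    }
}

// Hypothetical column-wise transform; not Spark's implementation.
final class TransformSketch {

    static List<SparseVec> transform(double[] column, int numCategories, boolean dropLast) {
        int size = dropLast ? numCategories - 1 : numCategories;
        List<SparseVec> out = new ArrayList<>(column.length);
        for (double label : column) {
            // Sketch-only validation: indices must be whole numbers in [0, numCategories).
            if (label < 0 || label >= numCategories || label != Math.floor(label)) {
                throw new IllegalArgumentException("Invalid category index: " + label);
            }
            int idx = (int) label;
            if (idx < size) {
                out.add(new SparseVec(size, new int[] {idx}, new double[] {1.0}));
            } else {
                // Dropped last category: no stored entries, i.e. the all-zeros vector.
                out.add(new SparseVec(size, new int[0], new double[0]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(transform(new double[] {0.0, 2.0, 4.0}, 5, true));
    }
}
```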

copy

public OneHotEncoder copy(ParamMap extra)

Description copied from interface: Params

Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.

Specified by:: copy in interface Params
Specified by:: copy in class Transformer

Parameters:: extra - (undocumented)
Returns:: (undocumented)
See Also:: defaultCopy()
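The copy contract (same UID, plus some extra params) can be modeled with plain maps. In this hypothetical sketch (`CopySketch` and the UID string are illustrative, not Spark types or values), the copy keeps the original UID, and values supplied in `extra` are assumed to take precedence over the instance's existing values; the original instance is left unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of copy(extra): same uid, params merged with extra taking precedence.
final class CopySketch {
    final String uid;
    final Map<String, Object> params;

    CopySketch(String uid, Map<String, Object> params) {
        this.uid = uid;
        this.params = new HashMap<>(params);  // defensive copy; instances do not share state
    }

    CopySketch copy(Map<String, Object> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra);  // assumed: extra values override existing ones
        return new CopySketch(uid, merged);
    }

    public static void main(String[] args) {
        Map<String, Object> p = new HashMap<>();
        p.put("dropLast", true);
        p.put("inputCol", "categoryIndex");
        CopySketch original = new CopySketch("oneHot_1234", p);  // hypothetical UID
        Map<String, Object> extra = new HashMap<>();
        extra.put("dropLast", false);
        CopySketch copied = original.copy(extra);
        System.out.println(copied.uid + " " + copied.params.get("dropLast")
                + " " + original.params.get("dropLast"));
    }
}
```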
