public class VectorAssembler extends Transformer implements HasInputCols, HasOutputCol, HasHandleInvalid, DefaultParamsWritable
This requires one pass over the entire dataset. If column lengths must be inferred from the data, an additional call to the Dataset method 'first' is required; see the 'handleInvalid' parameter.
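The per-row work done in that single pass can be pictured with a small plain-Java model. This is a sketch under stated assumptions, not Spark's implementation: the class `AssembleSketch` and the method `assembleRow` are hypothetical names, and each cell is assumed to be either a scalar number or a `double[]` standing in for a vector column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of column assembly (hypothetical; not Spark's implementation). */
final class AssembleSketch {

    /**
     * Concatenates the cells of one row into a flat vector, in column order.
     * A Number contributes one element; a double[] contributes all of its elements.
     */
    static double[] assembleRow(Object[] cells) {
        List<Double> out = new ArrayList<>();
        for (Object cell : cells) {
            if (cell instanceof Number) {
                out.add(((Number) cell).doubleValue());
            } else if (cell instanceof double[]) {
                for (double v : (double[]) cell) {
                    out.add(v);
                }
            } else {
                // Null or unsupported cells are rejected in this simplified model.
                throw new IllegalArgumentException("unsupported cell: " + cell);
            }
        }
        double[] result = new double[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two scalar columns followed by one three-element vector column.
        Object[] row = {18, 1.0, new double[] {0.0, 10.0, 0.5}};
        System.out.println(Arrays.toString(assembleRow(row)));
    }
}
```

The output length is the sum of the input column lengths, which is why the length of every vector column has to be known or inferred before the output schema can be described.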
| Constructor and Description |
|---|
| VectorAssembler() |
| VectorAssembler(String uid) |
| Modifier and Type | Method and Description |
|---|---|
| VectorAssembler | copy(ParamMap extra) - Creates a copy of this instance with the same UID and some extra params. |
| Param<String> | handleInvalid() - Param for how to handle invalid data (NULL values). |
| StringArrayParam | inputCols() - Param for input column names. |
| static VectorAssembler | load(String path) |
| Param<String> | outputCol() - Param for output column name. |
| static MLReader<T> | read() |
| VectorAssembler | setHandleInvalid(String value) |
| VectorAssembler | setInputCols(String[] value) |
| VectorAssembler | setOutputCol(String value) |
| String | toString() |
| Dataset<Row> | transform(Dataset<?> dataset) - Transforms the input dataset. |
| StructType | transformSchema(StructType schema) - Check transform validity and derive the output schema from the input schema. |
| String | uid() - An immutable unique ID for the object and its derivatives. |
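Methods inherited from class Transformer: transform, transform, transform
Methods inherited from class PipelineStage: params
Methods inherited from interface HasInputCols: getInputCols
Methods inherited from interface HasOutputCol: getOutputCol
Methods inherited from interface HasHandleInvalid: getHandleInvalid
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface DefaultParamsWritable: write
Methods inherited from interface MLWritable: save
Methods inherited from interface Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public VectorAssembler(String uid)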
public VectorAssembler()
public static VectorAssembler load(String path)
public static MLReader<T> read()
public final Param<String> outputCol()
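Description copied from interface: HasOutputCol
Param for output column name.
Specified by: outputCol in interface HasOutputCol
public final StringArrayParam inputCols()
Description copied from interface: HasInputCols
Param for input column names.
Specified by: inputCols in interface HasInputCols
public String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
public VectorAssembler setInputCols(String[] value)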
public VectorAssembler setOutputCol(String value)
public VectorAssembler setHandleInvalid(String value)
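The effect of the value passed to setHandleInvalid can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Spark's implementation: `HandleInvalidSketch` and `assemble` are hypothetical names, rows hold only nullable scalar cells, and the three modes follow the Spark documentation's description of 'error' (fail on invalid data), 'skip' (drop the row), and 'keep' (emit NaN in place of the missing value). The exception type used for 'error' here is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the handleInvalid modes (hypothetical; not Spark's implementation). */
final class HandleInvalidSketch {

    /**
     * Assembles rows of nullable scalar cells into vectors.
     * "error": throw on a row containing null; "skip": drop that row;
     * "keep": keep the row and write NaN where the null was.
     */
    static List<double[]> assemble(List<Double[]> rows, String mode) {
        if (!Arrays.asList("error", "skip", "keep").contains(mode)) {
            throw new IllegalArgumentException("unknown mode: " + mode);
        }
        List<double[]> out = new ArrayList<>();
        for (Double[] row : rows) {
            double[] vec = new double[row.length];
            boolean invalid = false;
            for (int i = 0; i < row.length; i++) {
                if (row[i] == null) {
                    invalid = true;
                    vec[i] = Double.NaN; // only reaches the output in "keep" mode
                } else {
                    vec[i] = row[i];
                }
            }
            if (invalid && mode.equals("error")) {
                throw new IllegalStateException("null value encountered in row");
            }
            if (invalid && mode.equals("skip")) {
                continue; // drop the whole row
            }
            out.add(vec);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double[]> rows = Arrays.asList(
                new Double[] {1.0, 2.0},
                new Double[] {null, 3.0});
        System.out.println(assemble(rows, "skip").size());
        System.out.println(Arrays.toString(assemble(rows, "keep").get(1)));
    }
}
```

With scalar cells a null always maps to exactly one NaN. For a null vector cell, the number of NaN values to emit depends on that column's length, which is presumably why length inference from the first rows is described as safe only for 'error' and 'skip'.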
public Param<String> handleInvalid()
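Param for how to handle invalid data (NULL values). Options are 'skip' (filter out rows with invalid data), 'error' (throw an error), or 'keep' (return relevant number of NaN in the output). Column lengths are taken from the size of ML Attribute Group, which can be set using VectorSizeHint in a pipeline before VectorAssembler. Column lengths can also be inferred from the first rows of the data, which is safe only in the case of 'error' or 'skip'.
Default: "error"
Specified by: handleInvalid in interface HasHandleInvalid
public Dataset<Row> transform(Dataset<?> dataset)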
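Description copied from class: Transformer
Transforms the input dataset.
Specified by: transform in class Transformer
Parameters: dataset - (undocumented)
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Check transform validity and derive the output schema from the input schema.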
We check validity for interactions between parameters during transformSchema and
raise an exception if any parameter value is invalid. Parameter value checks which
do not depend on other parameters are handled by Param.validate().
Typical implementation should first conduct verification on schema change and parameter validity, including complex parameter interaction checks.
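Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
public VectorAssembler copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Transformer
Parameters: extra - (undocumented)
public String toString()
Specified by: toString in interface Identifiable
Overrides: toString in class Object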