public class StandardScaler
extends Object
implements org.apache.spark.internal.Logging
The "unit std" is computed using the corrected sample standard deviation (https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation), which is computed as the square root of the unbiased sample variance.
param: withMean False by default. Centers the data with mean before scaling. It will build a dense output, so take care when applying to sparse input.
param: withStd True by default. Scales the data to unit standard deviation.
| Constructor and Description |
| --- |
| `StandardScaler()` |
| `StandardScaler(boolean withMean, boolean withStd)` |
| Modifier and Type | Method and Description |
| --- | --- |
| `StandardScalerModel` | `fit(RDD<Vector> data)` Computes the mean and variance and stores as a model to be used for later scaling. |
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging:
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public StandardScaler(boolean withMean, boolean withStd)
public StandardScaler()
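A minimal sketch (the class name `ConstructorDefaults` is illustrative) showing that, per the defaults documented above, the no-argument constructor is equivalent to passing `withMean = false, withStd = true` explicitly:

```java
import org.apache.spark.mllib.feature.StandardScaler;

public class ConstructorDefaults {
  public static void main(String[] args) {
    // No-arg constructor: uses the documented defaults.
    StandardScaler byDefault = new StandardScaler();
    // Equivalent explicit form: withMean = false, withStd = true.
    StandardScaler explicit = new StandardScaler(false, true);
  }
}
```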
public StandardScalerModel fit(RDD<Vector> data)

Computes the mean and variance and stores as a model to be used for later scaling.

Parameters:
data - The data used to compute the mean and variance to build the transformation model.
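A minimal end-to-end sketch of the fit/transform workflow, assuming a local Spark deployment; the class name, app name, and sample data are illustrative. Note that the `transform` call comes from the returned `StandardScalerModel` (a `VectorTransformer`), not from this class:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.feature.StandardScaler;
import org.apache.spark.mllib.feature.StandardScalerModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.rdd.RDD;

public class StandardScalerExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("StandardScalerExample")
        .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // Illustrative dense data; remember that withMean = true densifies sparse input.
    JavaRDD<Vector> data = jsc.parallelize(Arrays.asList(
        Vectors.dense(1.0, 10.0),
        Vectors.dense(2.0, 20.0),
        Vectors.dense(3.0, 30.0)));

    // Center each feature to zero mean and scale to unit standard deviation.
    StandardScaler scaler = new StandardScaler(true, true);

    // fit() computes the per-column mean and variance and returns a reusable model.
    StandardScalerModel model = scaler.fit(data.rdd());

    // Apply the fitted statistics to the same (or new) data.
    RDD<Vector> scaled = model.transform(data.rdd());
    for (Vector v : scaled.toJavaRDD().collect()) {
      System.out.println(v);
    }

    jsc.stop();
  }
}
```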