org.apache.spark.mllib.feature
Class StandardScaler

Object
  extended by org.apache.spark.mllib.feature.StandardScaler
All Implemented Interfaces:
Logging

public class StandardScaler
extends Object
implements Logging

:: Experimental :: Standardizes features by removing the mean and scaling to unit standard deviation, using column summary statistics on the samples in the training set.

param: withMean False by default. Centers the data with the mean before scaling. This builds a dense output, so it does not work on sparse input and will raise an exception.
param: withStd True by default. Scales the data to unit standard deviation.
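The effect of the two flags can be sketched in plain Python (a hypothetical stand-in for illustration, not the Spark API; assumes the column statistics use the sample standard deviation, i.e. an n - 1 denominator):

```python
# Hypothetical sketch of column-wise standardization with the
# withMean / withStd flags; not the actual Spark implementation.
import statistics

def standardize(rows, with_mean=False, with_std=True):
    """Standardize a list of equal-length rows, column by column."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]  # sample standard deviation
    out = []
    for row in rows:
        scaled = []
        for x, m, s in zip(row, means, stds):
            v = x - m if with_mean else x
            if with_std and s != 0:  # leave zero-variance columns unscaled
                v /= s
            scaled.append(v)
        out.append(scaled)
    return out
```

With both flags on, each column ends up with mean 0 and standard deviation 1; with the default `with_mean=False`, values are only divided by the column standard deviation, which preserves sparsity.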


Constructor Summary
StandardScaler()
           
StandardScaler(boolean withMean, boolean withStd)
           
 
Method Summary
 StandardScalerModel fit(RDD<Vector> data)
          Computes the mean and variance and stores them in a model to be used for later scaling.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

StandardScaler

public StandardScaler(boolean withMean,
                      boolean withStd)

StandardScaler

public StandardScaler()
Method Detail

fit

public StandardScalerModel fit(RDD<Vector> data)
Computes the mean and variance and stores them in a model to be used for later scaling.

Parameters:
data - The data used to compute the mean and variance to build the transformation model.
Returns:
a StandardScalerModel
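The fit-then-transform pattern can be sketched in plain Python (hypothetical stand-in classes for illustration, not the actual Spark API): fit computes the column statistics once over the training data, and the returned model applies those stored statistics to any later data.

```python
# Hypothetical stand-in for StandardScaler.fit / StandardScalerModel:
# fit() computes column statistics once; the model reuses them later.
import statistics

class ScalerModel:
    def __init__(self, means, stds, with_mean, with_std):
        self.means, self.stds = means, stds
        self.with_mean, self.with_std = with_mean, with_std

    def transform(self, row):
        """Apply the stored statistics to one row."""
        out = []
        for x, m, s in zip(row, self.means, self.stds):
            v = x - m if self.with_mean else x
            if self.with_std and s != 0:
                v /= s
            out.append(v)
        return out

def fit(rows, with_mean=False, with_std=True):
    """Compute per-column mean and sample std, return a reusable model."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]  # sample standard deviation
    return ScalerModel(means, stds, with_mean, with_std)
```

The key design point mirrored here is that the statistics come only from the training set: new data passed to `transform` is scaled with the training mean and variance, which keeps training and later scaling consistent.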