public class StandardScaler
extends Object
implements org.apache.spark.internal.Logging
The "unit std" is computed using the corrected sample standard deviation (https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation), which is computed as the square root of the unbiased sample variance.
param: withMean False by default. Centers the data with mean before scaling. It will build a dense output, so take care when applying to sparse input.
param: withStd True by default. Scales the data to unit standard deviation.
| Constructor and Description |
| --- |
| `StandardScaler()` |
| `StandardScaler(boolean withMean, boolean withStd)` |
| Modifier and Type | Method and Description |
| --- | --- |
| `StandardScalerModel` | `fit(RDD<Vector> data)` Computes the mean and variance and stores as a model to be used for later scaling. |
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging:
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public StandardScaler(boolean withMean, boolean withStd)
public StandardScaler()
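A minimal sketch (the class name `ConstructorDefaults` is illustrative) showing that, per the defaults documented above, the no-argument constructor is equivalent to passing `withMean = false, withStd = true` explicitly:

```java
import org.apache.spark.mllib.feature.StandardScaler;

public class ConstructorDefaults {
  public static void main(String[] args) {
    // No-arg constructor: uses the documented defaults.
    StandardScaler byDefault = new StandardScaler();
    // Equivalent explicit form: withMean = false, withStd = true.
    StandardScaler explicit = new StandardScaler(false, true);
  }
}
```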
public StandardScalerModel fit(RDD<Vector> data)

Computes the mean and variance and stores as a model to be used for later scaling.

Parameters:
data - The data used to compute the mean and variance to build the transformation model.
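A minimal end-to-end sketch of the fit/transform workflow, assuming a local Spark deployment; the class name, app name, and sample data are illustrative. Note that the `transform` call comes from the returned `StandardScalerModel` (a `VectorTransformer`), not from this class:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.feature.StandardScaler;
import org.apache.spark.mllib.feature.StandardScalerModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.rdd.RDD;

public class StandardScalerExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("StandardScalerExample")
        .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // Illustrative dense data; remember that withMean = true densifies sparse input.
    JavaRDD<Vector> data = jsc.parallelize(Arrays.asList(
        Vectors.dense(1.0, 10.0),
        Vectors.dense(2.0, 20.0),
        Vectors.dense(3.0, 30.0)));

    // Center each feature to zero mean and scale to unit standard deviation.
    StandardScaler scaler = new StandardScaler(true, true);

    // fit() computes the per-column mean and variance and returns a reusable model.
    StandardScalerModel model = scaler.fit(data.rdd());

    // Apply the fitted statistics to the same (or new) data.
    RDD<Vector> scaled = model.transform(data.rdd());
    for (Vector v : scaled.toJavaRDD().collect()) {
      System.out.println(v);
    }

    jsc.stop();
  }
}
```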