org.apache.spark.ml.feature
Class Bucketizer

Object
  extended by org.apache.spark.ml.PipelineStage
      extended by org.apache.spark.ml.Transformer
          extended by org.apache.spark.ml.Model<Bucketizer>
              extended by org.apache.spark.ml.feature.Bucketizer
All Implemented Interfaces:
java.io.Serializable, Logging, Params

public final class Bucketizer
extends Model<Bucketizer>

:: Experimental :: Bucketizer maps a column of continuous features to a column of feature buckets.

See Also:
Serialized Form

Constructor Summary
Bucketizer()
           
Bucketizer(String uid)
           
 
Method Summary
static double binarySearchForBuckets(double[] splits, double feature)
          Uses binary search over the splits to find the bucket that contains a data point.
static boolean checkSplits(double[] splits)
          Checks that splits has length >= 3 and is in strictly increasing order.
 Bucketizer copy(ParamMap extra)
          Creates a copy of this instance with the same UID and some extra params.
 double[] getSplits()
           
 Bucketizer setInputCol(String value)
           
 Bucketizer setOutputCol(String value)
           
 Bucketizer setSplits(double[] value)
           
 DoubleArrayParam splits()
          Parameter for mapping continuous features into buckets.
 DataFrame transform(DataFrame dataset)
          Transforms the input dataset.
 StructType transformSchema(StructType schema)
          :: DeveloperApi ::
 String uid()
           
 
Methods inherited from class org.apache.spark.ml.Model
hasParent, parent, setParent
 
Methods inherited from class org.apache.spark.ml.Transformer
transform, transform, transform
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

Bucketizer

public Bucketizer(String uid)

Bucketizer

public Bucketizer()

Method Detail

checkSplits

public static boolean checkSplits(double[] splits)
Checks that splits has length >= 3 and is in strictly increasing order.
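
The rule above (length >= 3, strictly increasing) can be sketched in plain Java as follows. This is an illustrative stand-in, not Spark's source: the class name SplitsCheckSketch and the method name isValidSplits are hypothetical, and rejecting null and NaN entries is an assumption of this sketch rather than documented behavior.

```java
/** Hypothetical stand-in for the documented rule; not Spark's implementation. */
final class SplitsCheckSketch {
    private SplitsCheckSketch() {}

    /** Returns true when splits has at least 3 entries and is strictly increasing. */
    static boolean isValidSplits(double[] splits) {
        // Assumption of this sketch: a null array is simply invalid.
        if (splits == null || splits.length < 3) {
            return false;
        }
        for (int i = 0; i < splits.length - 1; i++) {
            // Written as !(a < b) so that equal neighbours and NaN entries both fail.
            if (!(splits[i] < splits[i + 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] ok = {Double.NEGATIVE_INFINITY, 0.0, Double.POSITIVE_INFINITY};
        double[] tooShort = {0.0, 1.0};
        double[] repeated = {0.0, 1.0, 1.0};
        System.out.println(isValidSplits(ok));
        System.out.println(isValidSplits(tooShort));
        System.out.println(isValidSplits(repeated));
    }
}
```

Three splits is the minimum here: two boundaries would define only a single bucket.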


binarySearchForBuckets

public static double binarySearchForBuckets(double[] splits,
                                            double feature)
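Uses binary search over the splits to find the bucket that contains a data point.

Parameters:
splits - the split points that define the bucket boundaries, in strictly increasing order
feature - the feature value to place in a bucket
Returns:
the index of the bucket that contains feature, as a double
Throws:
SparkException - if feature is less than the first split or greater than the last split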
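
A minimal sketch of such a lookup in plain Java, built on java.util.Arrays.binarySearch. It is one way to get the documented behavior, not necessarily how Spark implements it. The names BucketSearchSketch and findBucket are hypothetical, and IllegalArgumentException stands in for SparkException so that the example has no Spark dependency.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the bucket lookup; not Spark's implementation. */
final class BucketSearchSketch {
    private BucketSearchSketch() {}

    /** Returns the index of the bucket containing feature, as a double. */
    static double findBucket(double[] splits, double feature) {
        int last = splits.length - 1;
        // The last bucket is closed on the right, so its upper bound belongs to it.
        if (feature == splits[last]) {
            return last - 1;
        }
        int idx = Arrays.binarySearch(splits, feature);
        if (idx >= 0) {
            // Exact hit on a split: that split is the lower bound of bucket idx.
            return idx;
        }
        // A miss returns -(insertionPoint) - 1; recover the insertion point.
        int insertPos = -idx - 1;
        if (insertPos == 0 || insertPos == splits.length) {
            // Stand-in for the SparkException described above.
            throw new IllegalArgumentException(
                "Feature value " + feature + " is outside the splits");
        }
        // The feature lies between splits[insertPos - 1] and splits[insertPos].
        return insertPos - 1;
    }

    public static void main(String[] args) {
        double[] splits = {
            Double.NEGATIVE_INFINITY, -0.5, 0.0, 0.5, Double.POSITIVE_INFINITY
        };
        for (double x : new double[] {-0.7, -0.5, 0.2, 0.5, 10.0}) {
            System.out.println(x + " -> bucket " + findBucket(splits, x));
        }
    }
}
```

The search takes O(log n) comparisons per value, which is what makes the splits array practical even when it defines many buckets.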

uid

public String uid()

splits

public DoubleArrayParam splits()
Parameter for mapping continuous features into buckets. With n+1 splits, there are n buckets. A bucket defined by splits x,y holds values in the range [x,y), except the last bucket, which also includes y. Splits should be strictly increasing. To cover all Double values, include -inf and inf explicitly as the first and last splits; otherwise, values outside the specified splits are treated as errors.

Returns:
(undocumented)
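
To make the interval rules concrete, the following plain-Java sketch lists the range covered by each bucket for a given splits array. The class name BucketRangesSketch and the method name describeBuckets are hypothetical helpers for illustration only; they are not part of the Bucketizer API.

```java
/** Hypothetical helper that spells out the bucket ranges; not part of Spark. */
final class BucketRangesSketch {
    private BucketRangesSketch() {}

    /** With n+1 splits, returns n range labels; only the last range is closed. */
    static String[] describeBuckets(double[] splits) {
        if (splits.length < 2) {
            throw new IllegalArgumentException("need at least two splits");
        }
        int n = splits.length - 1;
        String[] ranges = new String[n];
        for (int i = 0; i < n; i++) {
            // Every bucket is [x, y) except the last, which also includes y.
            char close = (i == n - 1) ? ']' : ')';
            ranges[i] = "[" + splits[i] + ", " + splits[i + 1] + close;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Infinite end points so that every double value falls in some bucket.
        double[] splits = {
            Double.NEGATIVE_INFINITY, 0.0, 0.5, Double.POSITIVE_INFINITY
        };
        String[] ranges = describeBuckets(splits);
        for (int i = 0; i < ranges.length; i++) {
            System.out.println("bucket " + i + ": " + ranges[i]);
        }
    }
}
```

With the four splits in main, the sketch yields three buckets: [-Infinity, 0.0), [0.0, 0.5) and [0.5, Infinity]. Dropping the infinite end points would leave values below 0.0 or above 0.5 outside every bucket, which is the error case described above.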

getSplits

public double[] getSplits()

setSplits

public Bucketizer setSplits(double[] value)

setInputCol

public Bucketizer setInputCol(String value)

setOutputCol

public Bucketizer setOutputCol(String value)

transform

public DataFrame transform(DataFrame dataset)
Description copied from class: Transformer
Transforms the input dataset.

Specified by:
transform in class Transformer
Parameters:
dataset - (undocumented)
Returns:
(undocumented)

transformSchema

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi ::

Derives the output schema from the input schema.

Specified by:
transformSchema in class PipelineStage
Parameters:
schema - (undocumented)
Returns:
(undocumented)

copy

public Bucketizer copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.

Specified by:
copy in interface Params
Specified by:
copy in class Model<Bucketizer>
Parameters:
extra - (undocumented)
Returns:
(undocumented)
See Also:
defaultCopy()