package feature
Feature transformers
The ml.feature package provides common feature transformers that help convert raw data or
features into more suitable forms for model fitting.
Most feature transformers are implemented as Transformers, which transform one DataFrame
into another, e.g., HashingTF.
Some feature transformers are implemented as Estimators, because the transformation requires
some aggregated information of the dataset, e.g., document frequencies in IDF.
For those feature transformers, Estimator.fit must be called first to obtain the model,
e.g., IDFModel, which then applies the transformation.
The transformation is usually done by appending new columns to the input DataFrame, so all
input columns are carried over.
We try to keep each transformer minimal, so that feature transformation pipelines are flexible to assemble. Pipeline can be used to chain feature transformers, and VectorAssembler can be used to combine multiple feature transformations, for example:
    import org.apache.spark.ml.feature._
    import org.apache.spark.ml.Pipeline

    // a DataFrame with three columns: id (integer), text (string), and rating (double).
    val df = spark.createDataFrame(Seq(
      (0, "Hi I heard about Spark", 3.0),
      (1, "I wish Java could use case classes", 4.0),
      (2, "Logistic regression models are neat", 4.0)
    )).toDF("id", "text", "rating")

    // define feature transformers
    val tok = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val sw = new StopWordsRemover()
      .setInputCol("words")
      .setOutputCol("filtered_words")
    val tf = new HashingTF()
      .setInputCol("filtered_words")
      .setOutputCol("tf")
      .setNumFeatures(10000)
    val idf = new IDF()
      .setInputCol("tf")
      .setOutputCol("tf_idf")
    val assembler = new VectorAssembler()
      .setInputCols(Array("tf_idf", "rating"))
      .setOutputCol("features")

    // assemble and fit the feature transformation pipeline
    val pipeline = new Pipeline()
      .setStages(Array(tok, sw, tf, idf, assembler))
    val model = pipeline.fit(df)

    // save transformed features with raw data
    model.transform(df)
      .select("id", "text", "rating", "features")
      .write.format("parquet").save("/output/path")
Some feature transformers implemented in MLlib are inspired by those implemented in scikit-learn. The major difference is that most scikit-learn feature transformers operate eagerly on the entire input dataset, while MLlib's feature transformers operate lazily on individual columns, which makes them more efficient and flexible when handling large and complex datasets.
- Source
- package.scala
Type Members
- final class Binarizer extends Transformer with HasThreshold with HasThresholds with HasInputCol with HasOutputCol with HasInputCols with HasOutputCols with DefaultParamsWritable
  Binarize a column of continuous features given a threshold. Since 3.0.0, Binarizer can map multiple columns at once by setting the inputCols parameter. Note that when both the inputCol and inputCols parameters are set, an Exception will be thrown. The threshold parameter is used for single column usage, and thresholds is for multiple columns.
  - Annotations
    - @Since( "1.4.0" )
 
- class BucketedRandomProjectionLSH extends LSH[BucketedRandomProjectionLSHModel] with BucketedRandomProjectionLSHParams with HasSeed
  This BucketedRandomProjectionLSH implements Locality Sensitive Hashing functions for Euclidean distance metrics. The input is dense or sparse vectors, each of which represents a point in the Euclidean distance space. The output will be vectors of configurable dimension. Hash values in the same dimension are calculated by the same hash function.
  References:
  1. Wikipedia on Stable Distributions
  2. Wang, Jingdong et al. "Hashing for similarity search: A survey." arXiv preprint arXiv:1408.2927 (2014).
  - Annotations
    - @Since( "2.1.0" )
 
- class BucketedRandomProjectionLSHModel extends LSHModel[BucketedRandomProjectionLSHModel] with BucketedRandomProjectionLSHParams
  Model produced by BucketedRandomProjectionLSH, where multiple random vectors are stored. The vectors are normalized to be unit vectors and each vector is used in a hash function:
    h_i(x) = floor(r_i.dot(x) / bucketLength)
  where r_i is the i-th random unit vector. The number of buckets will be (max L2 norm of input vectors) / bucketLength.
  - Annotations
    - @Since( "2.1.0" )
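The hash function above can be sketched in plain Scala. This is an illustration of the formula only, not the model's actual implementation; the object and method names are hypothetical, and vectors are represented as plain `Seq[Double]` rather than Spark vectors.

```scala
object BucketedProjectionSketch {
  // Dot product of two equal-length vectors.
  def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Scale a vector to unit L2 norm, as described for the stored random vectors.
  def normalize(r: Seq[Double]): Seq[Double] = {
    val norm = math.sqrt(dot(r, r))
    r.map(_ / norm)
  }

  // h_i(x) = floor(r_i.dot(x) / bucketLength): one hash value per random unit vector.
  def hash(x: Seq[Double], randomUnitVectors: Seq[Seq[Double]], bucketLength: Double): Seq[Double] =
    randomUnitVectors.map(r => math.floor(dot(r, x) / bucketLength))
}
```

A larger `bucketLength` places more points in the same bucket, while a smaller one separates nearby points more often.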
 
- final class Bucketizer extends Model[Bucketizer] with HasHandleInvalid with HasInputCol with HasOutputCol with HasInputCols with HasOutputCols with DefaultParamsWritable
  Bucketizer maps a column of continuous features to a column of feature buckets. Since 2.3.0, Bucketizer can map multiple columns at once by setting the inputCols parameter. Note that when both the inputCol and inputCols parameters are set, an Exception will be thrown. The splits parameter is only used for single column usage, and splitsArray is for multiple columns.
  - Annotations
    - @Since( "1.4.0" )
 
- final class ChiSqSelectorModel extends SelectorModel[ChiSqSelectorModel]
  Model fitted by ChiSqSelector.
  - Annotations
    - @Since( "1.6.0" )
 
- class CountVectorizer extends Estimator[CountVectorizerModel] with CountVectorizerParams with DefaultParamsWritable
  Extracts a vocabulary from document collections and generates a CountVectorizerModel.
  - Annotations
    - @Since( "1.5.0" )
 
- class CountVectorizerModel extends Model[CountVectorizerModel] with CountVectorizerParams with MLWritable
  Converts a text document to a sparse vector of token counts.
  - Annotations
    - @Since( "1.5.0" )
 
- class DCT extends UnaryTransformer[Vector, Vector, DCT] with DefaultParamsWritable
  A feature transformer that takes the 1D discrete cosine transform of a real vector. No zero padding is performed on the input vector. It returns a real vector of the same length representing the DCT. The return vector is scaled such that the transform matrix is unitary (aka scaled DCT-II). More information on DCT-II in Discrete cosine transform (Wikipedia).
  - Annotations
    - @Since( "1.5.0" )
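As a rough illustration of what "scaled DCT-II" means, the following plain-Scala sketch computes the unitary DCT-II directly from its textbook definition. It is a naive O(n²) version written for clarity, not the transformer's implementation, and the names are hypothetical.

```scala
object DctSketch {
  // Unitary (scaled) DCT-II of a real vector; the output has the same length
  // as the input and no zero padding is applied.
  def dct(x: Array[Double]): Array[Double] = {
    val n = x.length
    Array.tabulate(n) { k =>
      val sum = (0 until n).map(i => x(i) * math.cos(math.Pi / n * (i + 0.5) * k)).sum
      // The k = 0 term uses a different scale so that the transform matrix is unitary.
      val scale = if (k == 0) math.sqrt(1.0 / n) else math.sqrt(2.0 / n)
      scale * sum
    }
  }
}
```

Because the transform is unitary, the sum of squares of the output equals the sum of squares of the input, up to floating-point error.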
 
- class ElementwiseProduct extends UnaryTransformer[Vector, Vector, ElementwiseProduct] with DefaultParamsWritable
  Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector. In other words, it scales each column of the dataset by a scalar multiplier.
  - Annotations
    - @Since( "1.4.0" )
 
- class FeatureHasher extends Transformer with HasInputCols with HasOutputCol with HasNumFeatures with DefaultParamsWritable
  Feature hashing projects a set of categorical or numerical features into a feature vector of specified dimension (typically substantially smaller than that of the original feature space). This is done using the hashing trick (https://en.wikipedia.org/wiki/Feature_hashing) to map features to indices in the feature vector.
  The FeatureHasher transformer operates on multiple columns. Each column may contain either numeric or categorical features. Behavior and handling of column data types is as follows:
  - Numeric columns: For numeric features, the hash value of the column name is used to map the feature value to its index in the feature vector. By default, numeric features are not treated as categorical (even when they are integers). To treat them as categorical, specify the relevant columns in categoricalCols.
  - String columns: For categorical features, the hash value of the string "column_name=value" is used to map to the vector index, with an indicator value of 1.0. Thus, categorical features are "one-hot" encoded (similarly to using OneHotEncoder with dropLast=false).
  - Boolean columns: Boolean values are treated in the same way as string columns. That is, boolean features are represented as "column_name=true" or "column_name=false", with an indicator value of 1.0.
  Null (missing) values are ignored (implicitly zero in the resulting feature vector).
  The hash function used here is also the MurmurHash 3 used in HashingTF. Since a simple modulo on the hashed value is used to determine the vector index, it is advisable to use a power of two as the numFeatures parameter; otherwise the features will not be mapped evenly to the vector indices.

    val df = Seq(
      (2.0, true, "1", "foo"),
      (3.0, false, "2", "bar")
    ).toDF("real", "bool", "stringNum", "string")

    val hasher = new FeatureHasher()
      .setInputCols("real", "bool", "stringNum", "string")
      .setOutputCol("features")

    hasher.transform(df).show(false)

    +----+-----+---------+------+------------------------------------------------------+
    |real|bool |stringNum|string|features                                              |
    +----+-----+---------+------+------------------------------------------------------+
    |2.0 |true |1        |foo   |(262144,[51871,63643,174475,253195],[1.0,1.0,2.0,1.0])|
    |3.0 |false|2        |bar   |(262144,[6031,80619,140467,174475],[1.0,1.0,1.0,3.0]) |
    +----+-----+---------+------+------------------------------------------------------+

  - Annotations
    - @Since( "2.3.0" )
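The per-type rules above can be sketched without Spark as follows. This is only an illustration of the hashing trick: it uses `scala.util.hashing.MurmurHash3.stringHash` from the standard library as a stand-in hash, so the indices it produces will not match those in the example output above, and summing colliding features is a choice made in this sketch. All names are hypothetical.

```scala
import scala.util.hashing.MurmurHash3

object FeatureHashingSketch {
  // Map a key to a non-negative index in [0, numFeatures) with a simple modulo.
  def index(key: String, numFeatures: Int): Int = {
    val raw = MurmurHash3.stringHash(key) % numFeatures
    if (raw < 0) raw + numFeatures else raw
  }

  // Numeric features hash the column name and keep the value; string and boolean
  // features hash "column_name=value" with an indicator value of 1.0.
  // Null values match none of the cases and are therefore ignored.
  def hashRow(row: Map[String, Any], numFeatures: Int): Map[Int, Double] = {
    val entries = row.toSeq.collect {
      case (name, v: Double)  => index(name, numFeatures) -> v
      case (name, v: String)  => index(s"$name=$v", numFeatures) -> 1.0
      case (name, v: Boolean) => index(s"$name=$v", numFeatures) -> 1.0
    }
    // Features that land on the same index are summed in this sketch.
    entries.groupBy(_._1).map { case (i, vs) => i -> vs.map(_._2).sum }
  }
}
```

For example, `FeatureHashingSketch.hashRow(Map("real" -> 2.0, "bool" -> true, "string" -> "foo"), 1 << 18)` yields a sparse index-to-value map whose values sum to 4.0.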
 
- class HashingTF extends Transformer with HasInputCol with HasOutputCol with HasNumFeatures with DefaultParamsWritable
  Maps a sequence of terms to their term frequencies using the hashing trick. Currently we use Austin Appleby's MurmurHash 3 algorithm (MurmurHash3_x86_32) to calculate the hash code value for the term object. Since a simple modulo is used to transform the hash function to a column index, it is advisable to use a power of two as the numFeatures parameter; otherwise the features will not be mapped evenly to the columns.
  - Annotations
    - @Since( "1.2.0" )
 
- final class IDF extends Estimator[IDFModel] with IDFBase with DefaultParamsWritable
  Compute the Inverse Document Frequency (IDF) given a collection of documents.
  - Annotations
    - @Since( "1.4.0" )
 
- class IDFModel extends Model[IDFModel] with IDFBase with MLWritable
  Model fitted by IDF.
  - Annotations
    - @Since( "1.4.0" )
 
- class Imputer extends Estimator[ImputerModel] with ImputerParams with DefaultParamsWritable
  Imputation estimator for completing missing values, using the mean, median or mode of the columns in which the missing values are located. The input columns should be of numeric type. Currently Imputer does not support categorical features (SPARK-15041) and possibly creates incorrect values for a categorical feature.
  Note that when an input column is integer, the imputed value is cast (truncated) to an integer type. For example, if the input column is IntegerType (1, 2, 4, null), the output will be IntegerType (1, 2, 4, 2) after mean imputation.
  Note that the mean/median/mode value is computed after filtering out missing values. All Null values in the input columns are treated as missing, and so are also imputed. For computing median, DataFrameStatFunctions.approxQuantile is used with a relative error of 0.001.
  - Annotations
    - @Since( "2.2.0" )
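The integer truncation described above can be reproduced with a small plain-Scala sketch. It models a nullable integer column as `Seq[Option[Int]]` and is meant only to illustrate the arithmetic; the names are hypothetical and this is not how Imputer is implemented.

```scala
object MeanImputationSketch {
  // Replace missing (None) entries of an integer column with the mean of the
  // observed entries, truncated to an integer.
  def imputeMean(column: Seq[Option[Int]]): Seq[Int] = {
    val present = column.flatten // the mean is computed after dropping missing values
    require(present.nonEmpty, "cannot impute a column with no observed values")
    val mean = present.map(_.toDouble).sum / present.size
    val fill = mean.toInt // truncation, e.g. 7 / 3 = 2.33... becomes 2
    column.map(_.getOrElse(fill))
  }
}
```

Calling `MeanImputationSketch.imputeMean(Seq(Some(1), Some(2), Some(4), None))` gives `Seq(1, 2, 4, 2)`, matching the example in the description.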
 
- class ImputerModel extends Model[ImputerModel] with ImputerParams with MLWritable
  Model fitted by Imputer.
  - Annotations
    - @Since( "2.2.0" )
 
- class IndexToString extends Transformer with HasInputCol with HasOutputCol with DefaultParamsWritable
  A Transformer that maps a column of indices back to a new column of corresponding string values. The index-string mapping is either from the ML attributes of the input column, or from user-supplied labels (which take precedence over ML attributes).
  - Annotations
    - @Since( "1.5.0" )
  - See also
    - StringIndexer for converting strings into indices
 
- class Interaction extends Transformer with HasInputCols with HasOutputCol with DefaultParamsWritable
  Implements the feature interaction transform. This transformer takes in Double and Vector type columns and outputs a flattened vector of their feature interactions. To handle interaction, we first one-hot encode any nominal features. Then, a vector of the feature cross-products is produced.
  For example, given the input feature values Double(2) and Vector(3, 4), the output would be Vector(6, 8) if all input features were numeric. If the first feature was instead nominal with four categories, the output would then be Vector(0, 0, 0, 0, 3, 4, 0, 0).
  - Annotations
    - @Since( "1.6.0" )
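Both outputs in the example above follow from a flattened cross-product, which the plain-Scala sketch below reproduces. The names are hypothetical, vectors are plain `Seq[Double]`, and the element ordering is chosen to agree with the documented example rather than taken from the implementation.

```scala
object InteractionSketch {
  // One-hot encode a nominal value from its category index and category count.
  def oneHot(index: Int, numCategories: Int): Seq[Double] =
    Seq.tabulate(numCategories)(i => if (i == index) 1.0 else 0.0)

  // Flattened cross-product: every element of `left` times every element of
  // `right`, with `left` varying slowest.
  def interact(left: Seq[Double], right: Seq[Double]): Seq[Double] =
    for (l <- left; r <- right) yield l * r
}
```

With a numeric first feature, `interact(Seq(2.0), Seq(3.0, 4.0))` gives `Seq(6.0, 8.0)`. Treating the first feature as category index 2 of four categories, `interact(oneHot(2, 4), Seq(3.0, 4.0))` gives `Seq(0.0, 0.0, 0.0, 0.0, 3.0, 4.0, 0.0, 0.0)`.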
 
- case class LabeledPoint(label: Double, features: Vector) extends Product with Serializable
  Class that represents the features and label of a data point.
  - label: Label for this data point.
  - features: List of features for this data point.
  - Annotations
    - @Since( "2.0.0" )
 
- class MaxAbsScaler extends Estimator[MaxAbsScalerModel] with MaxAbsScalerParams with DefaultParamsWritable
  Rescale each feature individually to range [-1, 1] by dividing by the largest absolute value in each feature. It does not shift/center the data, and thus does not destroy any sparsity.
  - Annotations
    - @Since( "2.0.0" )
 
- class MaxAbsScalerModel extends Model[MaxAbsScalerModel] with MaxAbsScalerParams with MLWritable
  Model fitted by MaxAbsScaler.
  - Annotations
    - @Since( "2.0.0" )
 
- class MinHashLSH extends LSH[MinHashLSHModel] with HasSeed
  LSH class for Jaccard distance. The input can be dense or sparse vectors, but it is more efficient if it is sparse. For example, Vectors.sparse(10, Array((2, 1.0), (3, 1.0), (5, 1.0))) means there are 10 elements in the space. This set contains elements 2, 3, and 5. Also, any input vector must have at least 1 non-zero index, and all non-zero values are treated as binary "1" values.
  References: Wikipedia on MinHash
  - Annotations
    - @Since( "2.1.0" )
 
- class MinHashLSHModel extends LSHModel[MinHashLSHModel]
  Model produced by MinHashLSH, where multiple hash functions are stored. Each hash function is picked from the following family of hash functions, where a_i and b_i are randomly chosen integers less than prime:
    h_i(x) = ((x \cdot a_i + b_i) \mod prime)
  This hash family is approximately min-wise independent according to the reference.
  Reference: Tom Bohman, Colin Cooper, and Alan Frieze. "Min-wise independent linear permutations." Electronic Journal of Combinatorics 7 (2000): R26.
  - Annotations
    - @Since( "2.1.0" )
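A minimal plain-Scala sketch of one hash function from this family is shown below. Taking the minimum over the non-zero indices of the input is the standard MinHash construction and is an assumption here, since the description above only gives the per-element formula; the names and the small constants used in the example are hypothetical.

```scala
object MinHashSketch {
  // h(x) = (x * a + b) mod prime for a single element index x.
  def elementHash(x: Int, a: Long, b: Long, prime: Long): Long =
    (x * a + b) % prime

  // One MinHash value: the minimum hash over the non-zero indices of the input set.
  def minHash(nonZeroIndices: Seq[Int], a: Long, b: Long, prime: Long): Long = {
    require(nonZeroIndices.nonEmpty, "input must have at least one non-zero index")
    nonZeroIndices.map(elementHash(_, a, b, prime)).min
  }
}
```

For the set {2, 3, 5} with a = 3, b = 1 and prime = 7, the element hashes are 0, 3 and 2, so `minHash` returns 0. Two sets with high Jaccard similarity are likely to share the same minimum for a randomly chosen (a, b).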
 
- class MinMaxScaler extends Estimator[MinMaxScalerModel] with MinMaxScalerParams with DefaultParamsWritable
  Rescale each feature individually to a common range [min, max] linearly using column summary statistics, which is also known as min-max normalization or Rescaling. The rescaled value for feature E is calculated as:
  $$ Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + min $$
  For the case \(E_{max} == E_{min}\), \(Rescaled(e_i) = 0.5 * (max + min)\).
  - Annotations
    - @Since( "1.5.0" )
  - Note
    - Since zero values will probably be transformed to non-zero values, output of the transformer will be DenseVector even for sparse input.
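The rescaling formula, including the constant-feature case, can be written out in plain Scala as below. This sketch works on a single feature column held in a `Seq[Double]`; the names are hypothetical and it is not the estimator's implementation.

```scala
object MinMaxSketch {
  // Rescale one feature column to [min, max] using the column's own
  // minimum (E_min) and maximum (E_max).
  def rescale(values: Seq[Double], min: Double = 0.0, max: Double = 1.0): Seq[Double] = {
    val eMin = values.min
    val eMax = values.max
    values.map { e =>
      if (eMax == eMin) 0.5 * (max + min) // constant feature: midpoint of the target range
      else (e - eMin) / (eMax - eMin) * (max - min) + min
    }
  }
}
```

For instance, `MinMaxSketch.rescale(Seq(1.0, 2.0, 3.0))` gives `Seq(0.0, 0.5, 1.0)`, and a constant column such as `Seq(5.0, 5.0)` rescaled to [-1, 1] maps every value to 0.0.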
 
- class MinMaxScalerModel extends Model[MinMaxScalerModel] with MinMaxScalerParams with MLWritable
  Model fitted by MinMaxScaler.
  - Annotations
    - @Since( "1.5.0" )
 
- class NGram extends UnaryTransformer[Seq[String], Seq[String], NGram] with DefaultParamsWritable
  A feature transformer that converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words.
  When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.
  - Annotations
    - @Since( "1.5.0" )
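The edge cases described above (empty input, input shorter than n) fall out naturally from a sliding window that keeps only full-size windows. A plain-Scala sketch, with hypothetical names and without any null handling:

```scala
object NGramSketch {
  // Slide a window of size n over the words and join each full window with spaces.
  // Windows shorter than n (possible when the input has fewer than n words) are dropped.
  def ngrams(words: Seq[String], n: Int): Seq[String] =
    words.sliding(n).filter(_.size == n).map(_.mkString(" ")).toList
}
```

The explicit size filter matters: `sliding(n)` on a non-empty sequence shorter than n yields one partial window, which would otherwise be emitted as a truncated n-gram.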
 
- class Normalizer extends UnaryTransformer[Vector, Vector, Normalizer] with DefaultParamsWritable
  Normalize a vector to have unit norm using the given p-norm.
  - Annotations
    - @Since( "1.4.0" )
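As a small illustration of p-norm normalization, the sketch below handles finite p only and returns zero vectors unchanged; both are simplifying choices of this sketch, and the names are hypothetical.

```scala
object NormalizerSketch {
  // p-norm of a vector for finite p >= 1: (sum |x_i|^p)^(1/p).
  def pNorm(v: Seq[Double], p: Double): Double =
    math.pow(v.map(x => math.pow(math.abs(x), p)).sum, 1.0 / p)

  // Divide every element by the p-norm so that the result has unit p-norm.
  def normalize(v: Seq[Double], p: Double = 2.0): Seq[Double] = {
    val norm = pNorm(v, p)
    if (norm == 0.0) v else v.map(_ / norm)
  }
}
```

With the default p = 2, the vector (3, 4) has norm 5 and normalizes to approximately (0.6, 0.8); with p = 1, the vector (1, 3) normalizes to (0.25, 0.75).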
 
- class OneHotEncoder extends Estimator[OneHotEncoderModel] with OneHotEncoderBase with DefaultParamsWritable
  A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example with 5 categories, an input value of 2.0 would map to an output vector of [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because including it would make the vector entries sum up to one, and hence linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
  - Annotations
    - @Since( "3.0.0" )
  - Note
    - This is different from scikit-learn's OneHotEncoder, which keeps all categories. The output vectors are sparse.
    - When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as last category. So when dropLast is true, invalid values are encoded as all-zeros vector.
    - When encoding multi-column by using inputCols and outputCols params, input/output cols come in pairs, specified by the order in the arrays, and each pair is treated independently.
  - See also
    - StringIndexer for converting categorical values into category indices
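The effect of dropLast on the 5-category example can be sketched in plain Scala. The sketch returns dense `Seq[Double]` values for readability, whereas the note above says the real output vectors are sparse; the names are hypothetical and invalid-value handling is left out.

```scala
object OneHotSketch {
  // Encode a category index as a binary vector. With dropLast = true the last
  // category has no slot of its own and is encoded as all zeros.
  def encode(index: Int, numCategories: Int, dropLast: Boolean = true): Seq[Double] = {
    require(index >= 0 && index < numCategories, s"invalid category index: $index")
    val size = if (dropLast) numCategories - 1 else numCategories
    Seq.tabulate(size)(i => if (i == index) 1.0 else 0.0)
  }
}
```

With 5 categories, `encode(2, 5)` gives `Seq(0.0, 0.0, 1.0, 0.0)` and `encode(4, 5)` gives four zeros, while `encode(4, 5, dropLast = false)` keeps a fifth slot and sets it to 1.0.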
 
- class OneHotEncoderModel extends Model[OneHotEncoderModel] with OneHotEncoderBase with MLWritable
  - Annotations
    - @Since( "3.0.0" )
 
- class PCA extends Estimator[PCAModel] with PCAParams with DefaultParamsWritable
  PCA trains a model to project vectors to a lower dimensional space of the top PCA.k principal components.
  - Annotations
    - @Since( "1.5.0" )
 
- class PCAModel extends Model[PCAModel] with PCAParams with MLWritable
  Model fitted by PCA. Transforms vectors to a lower dimensional space.
  - Annotations
    - @Since( "1.5.0" )
 
- class PolynomialExpansion extends UnaryTransformer[Vector, Vector, PolynomialExpansion] with DefaultParamsWritable
  Perform feature expansion in a polynomial space. As described in Polynomial expansion (Wikipedia), "In mathematics, an expansion of a product of sums expresses it as a sum of products by using the fact that multiplication distributes over addition". Take a 2-variable feature vector as an example: (x, y). If we want to expand it with degree 2, then we get (x, x * x, y, x * y, y * y).
  - Annotations
    - @Since( "1.4.0" )
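The degree-2 example above can be reproduced with a short plain-Scala sketch. It only covers degree 2, and the output ordering for more than two variables is an assumption extrapolated from the two-variable example; the names are hypothetical.

```scala
object PolynomialExpansionSketch {
  // Degree-2 expansion: for each variable v(i), emit v(i) itself followed by
  // its products with v(0) .. v(i). For (x, y) this gives (x, x*x, y, x*y, y*y).
  def expandDegree2(v: Seq[Double]): Seq[Double] =
    v.indices.flatMap { i =>
      v(i) +: (0 to i).map(j => v(j) * v(i))
    }
}
```

For the input (2, 3), the result is (2, 4, 3, 6, 9), which corresponds to (x, x * x, y, x * y, y * y).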
 
- final class QuantileDiscretizer extends Estimator[Bucketizer] with QuantileDiscretizerBase with DefaultParamsWritable
  QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features. The number of bins can be set using the numBuckets parameter. It is possible that the number of buckets used will be smaller than this value, for example, if there are too few distinct values of the input to create enough distinct quantiles.
  Since 2.3.0, QuantileDiscretizer can map multiple columns at once by setting the inputCols parameter. If both of the inputCol and inputCols parameters are set, an Exception will be thrown. To specify the number of buckets for each column, the numBucketsArray parameter can be set, or if the number of buckets should be the same across columns, numBuckets can be set as a convenience. Note that in multiple columns case, relative error is applied to all columns.
  NaN handling: null and NaN values will be ignored from the column during QuantileDiscretizer fitting. This will produce a Bucketizer model for making predictions. During the transformation, Bucketizer will raise an error when it finds NaN values in the dataset, but the user can also choose to either keep or remove NaN values within the dataset by setting handleInvalid. If the user chooses to keep NaN values, they will be handled specially and placed into their own bucket, for example, if 4 buckets are used, then non-NaN data will be put into buckets[0-3], but NaNs will be counted in a special bucket[4].
  Algorithm: The bin ranges are chosen using an approximate algorithm (see the documentation for org.apache.spark.sql.DataFrameStatFunctions.approxQuantile for a detailed description). The precision of the approximation can be controlled with the relativeError parameter. The lower and upper bin bounds will be -Infinity and +Infinity, covering all real values.
  - Annotations
    - @Since( "1.6.0" )
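The bucket numbering in the NaN-handling paragraph can be illustrated with a plain-Scala sketch of bucket assignment under the "keep" behaviour. The split points are supplied by hand rather than estimated from quantiles, intervals are treated as closed on the left (an assumption of this sketch), and the names are hypothetical.

```scala
object BucketSketch {
  // Assign a value to a bucket defined by increasing split points. With n + 1
  // splits there are n regular buckets (0 .. n - 1); NaN goes to the extra bucket n.
  def bucket(value: Double, splits: Array[Double]): Int = {
    val numBuckets = splits.length - 1
    if (value.isNaN) numBuckets
    else {
      val i = splits.lastIndexWhere(_ <= value)
      math.min(i, numBuckets - 1) // the top split point belongs to the last bucket
    }
  }
}
```

With the splits (-Infinity, 1.0, 2.0, 3.0, +Infinity) there are 4 regular buckets: 0.5 falls into bucket 0, 2.5 into bucket 2, 100.0 into bucket 3, and NaN into the special bucket 4.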
 
- class RFormula extends Estimator[RFormulaModel] with RFormulaBase with DefaultParamsWritable
  Implements the transforms required for fitting a dataset against an R model formula. Currently we support a limited subset of the R operators, including '~', '.', ':', '+', '-', '*' and '^'. Also see the R formula docs here: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/formula.html
  The basic operators are:
  - ~ separate target and terms
  - + concat terms, "+ 0" means removing intercept
  - - remove a term, "- 1" means removing intercept
  - : interaction (multiplication for numeric values, or binarized categorical values)
  - . all columns except target
  - * factor crossing, includes the terms and interactions between them
  - ^ factor crossing to a specified degree
  Suppose a and b are double columns, we use the following simple examples to illustrate the effect of RFormula:
  - y ~ a + b means model y ~ w0 + w1 * a + w2 * b where w0 is the intercept and w1, w2 are coefficients.
  - y ~ a + b + a:b - 1 means model y ~ w1 * a + w2 * b + w3 * a * b where w1, w2, w3 are coefficients.
  - y ~ a * b means model y ~ w0 + w1 * a + w2 * b + w3 * a * b where w0 is the intercept and w1, w2, w3 are coefficients.
  - y ~ (a + b)^2 means model y ~ w0 + w1 * a + w2 * b + w3 * a * b where w0 is the intercept and w1, w2, w3 are coefficients.
  RFormula produces a vector column of features and a double or string column of label. Like when formulas are used in R for linear regression, string input columns will be one-hot encoded, and numeric columns will be cast to doubles. If the label column is of type string, it will be first transformed to double with StringIndexer. If the label column does not exist in the DataFrame, the output label column will be created from the specified response variable in the formula.
  - Annotations
    - @Since( "1.5.0" )
 
- class RFormulaModel extends Model[RFormulaModel] with RFormulaBase with MLWritable
  Model fitted by RFormula. Fitting is required to determine the factor levels of formula terms.
  - Annotations
    - @Since( "1.5.0" )
 
- class RegexTokenizer extends UnaryTransformer[String, Seq[String], RegexTokenizer] with DefaultParamsWritable
  A regex based tokenizer that extracts tokens either by using the provided regex pattern to split the text (default) or repeatedly matching the regex (if gaps is false). Optional parameters also allow filtering tokens using a minimal length. It returns an array of strings that can be empty.
  - Annotations
    - @Since( "1.4.0" )
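The two extraction modes can be contrasted with a plain-Scala sketch built on `scala.util.matching.Regex`. The parameter names mirror the description (gaps, minimal token length) but the object and method are hypothetical, and the sketch performs no case conversion.

```scala
import scala.util.matching.Regex

object RegexTokenizerSketch {
  // gaps = true: the pattern describes the separators, so the text is split on it.
  // gaps = false: the pattern describes the tokens, so every match becomes a token.
  def tokenize(text: String, pattern: Regex, gaps: Boolean = true, minTokenLength: Int = 1): Seq[String] = {
    val tokens =
      if (gaps) pattern.split(text).toList
      else pattern.findAllIn(text).toList
    // Drop tokens shorter than the minimal length (this also removes empty strings).
    tokens.filter(_.length >= minTokenLength)
  }
}
```

Splitting "Hi I heard about Spark" on `\s+` yields five tokens, whereas matching `\w+` with `gaps = false` and `minTokenLength = 2` drops the single-letter token "I". An empty input produces an empty result.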
 
- class RobustScaler extends Estimator[RobustScalerModel] with RobustScalerParams with DefaultParamsWritable
  Scale features using statistics that are robust to outliers. RobustScaler removes the median and scales the data according to the quantile range. The quantile range is by default IQR (Interquartile Range, the range between the 1st quartile, i.e. the 25th percentile, and the 3rd quartile, i.e. the 75th percentile) but can be configured. Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Median and quantile range are then stored to be used on later data using the transform method.
  Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and the quantile range often give better results. Note that NaN values are ignored in the computation of medians and ranges.
  - Annotations
    - @Since( "3.0.0" )
 
- class RobustScalerModel extends Model[RobustScalerModel] with RobustScalerParams with MLWritable
  Model fitted by RobustScaler.
  - Annotations
    - @Since( "3.0.0" )
 
- class SQLTransformer extends Transformer with DefaultParamsWritable
  Implements the transformations which are defined by SQL statement. Currently we only support SQL syntax like 'SELECT ... FROM __THIS__ ...' where '__THIS__' represents the underlying table of the input dataset. The select clause specifies the fields, constants, and expressions to display in the output; it can be any select clause that Spark SQL supports. Users can also use Spark SQL built-in functions and UDFs to operate on these selected columns. For example, SQLTransformer supports statements like:
    SELECT a, a + b AS a_b FROM __THIS__
    SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5
    SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b
  - Annotations
    - @Since( "1.6.0" )
 
- class StandardScaler extends Estimator[StandardScalerModel] with StandardScalerParams with DefaultParamsWritable
  Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.
  The "unit std" is computed using the corrected sample standard deviation, which is computed as the square root of the unbiased sample variance.
  - Annotations
    - @Since( "1.2.0" )
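To make "corrected sample standard deviation" concrete, the plain-Scala sketch below divides by n - 1 when computing the variance and then standardizes a single feature column. It assumes both mean removal and scaling are applied; the names are hypothetical and this is not the estimator's implementation.

```scala
object StandardScalerSketch {
  // Corrected sample standard deviation: square root of the unbiased
  // sample variance, which divides by n - 1 rather than n.
  def sampleStd(values: Seq[Double]): Double = {
    require(values.size >= 2, "need at least two samples")
    val mean = values.sum / values.size
    math.sqrt(values.map(v => (v - mean) * (v - mean)).sum / (values.size - 1))
  }

  // Remove the mean and scale to unit sample standard deviation.
  // A constant column (std == 0) is mapped to zeros in this sketch.
  def standardize(values: Seq[Double]): Seq[Double] = {
    val mean = values.sum / values.size
    val std = sampleStd(values)
    values.map(v => if (std == 0.0) 0.0 else (v - mean) / std)
  }
}
```

For the column (1, 2, 3, 4, 5), the unbiased variance is 10 / 4 = 2.5, so the values are centered at 3 and divided by sqrt(2.5); the standardized column then has sample standard deviation 1.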
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        StandardScalerModel extends Model[StandardScalerModel] with StandardScalerParams with MLWritable
      
      
      Model fitted by StandardScaler.
- Annotations
- @Since( "1.2.0" )
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        StopWordsRemover extends Transformer with HasInputCol with HasOutputCol with HasInputCols with HasOutputCols with DefaultParamsWritable
      
      
      A feature transformer that filters out stop words from input. Since 3.0.0, StopWordsRemover can filter out multiple columns at once by setting the inputCols parameter. Note that when both the inputCol and inputCols parameters are set, an Exception will be thrown.
- Annotations
- @Since( "1.5.0" )
- Note
- null values in the input array are preserved unless null is explicitly added to stopWords.
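The note on null handling can be illustrated with a plain Scala sketch. It is a simplification that uses exact, case-sensitive matching, and the object and method names are hypothetical rather than part of the Spark API.

```scala
object StopWordsSketch {
  /** Drops every word contained in stopWords; null entries survive unless null is in the set. */
  def removeStopWords(words: Seq[String], stopWords: Set[String]): Seq[String] =
    words.filterNot(w => stopWords.contains(w))
}
```

With `Set("the")` a null entry is kept, whereas `Set("the", null)` removes it along with the stop word.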
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        StringIndexer extends Estimator[StringIndexerModel] with StringIndexerBase with DefaultParamsWritable
      
      
      A label indexer that maps string column(s) of labels to ML column(s) of label indices. If the input columns are numeric, we cast them to string and index the string values. The indices are in [0, numLabels). By default, this is ordered by label frequencies so the most frequent label gets index 0. The ordering behavior is controlled by setting stringOrderType.
- Annotations
- @Since( "1.4.0" )
- See also
- IndexToString for the inverse transformation
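The default frequency-based ordering can be sketched in plain Scala as follows. The names are hypothetical, and breaking ties alphabetically is an assumption of this sketch, not something stated above.

```scala
object LabelIndexSketch {
  /** Assigns index 0 to the most frequent label, 1 to the next, and so on. */
  def fitFrequencyDesc(labels: Seq[String]): Map[String, Int] =
    labels.groupBy(identity).toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      // Most frequent first; ties broken alphabetically (assumption of this sketch).
      .sortBy { case (label, count) => (-count, label) }
      .map(_._1)
      .zipWithIndex
      .toMap
}
```

The resulting indices always lie in [0, numLabels), matching the range described above.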
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        StringIndexerModel extends Model[StringIndexerModel] with StringIndexerBase with MLWritable
      
      
      Model fitted by StringIndexer.
- Annotations
- @Since( "1.4.0" )
- Note
- During transformation, if any input column does not exist, StringIndexerModel.transform would skip the input column. If all input columns do not exist, it returns the input dataset unmodified. This is a temporary fix for the case when target labels do not exist during prediction.
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        Tokenizer extends UnaryTransformer[String, Seq[String], Tokenizer] with DefaultParamsWritable
      
      
      A tokenizer that converts the input string to lowercase and then splits it by white spaces.
- Annotations
- @Since( "1.2.0" )
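The described behavior (lowercase, then split on whitespace) amounts to the following plain Scala sketch. The names are hypothetical, and treating a run of whitespace as a single separator is an assumption of this sketch.

```scala
import java.util.Locale

object TokenizerSketch {
  /** Lowercases the input, then splits it on whitespace. */
  def tokenize(text: String): Seq[String] =
    text.toLowerCase(Locale.ROOT).split("\\s+").toSeq
}
```

For example, `TokenizerSketch.tokenize("Hi I heard about Spark")` yields the five lowercase tokens `hi`, `i`, `heard`, `about`, `spark`.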
 
- 
      
      
      
        
      
    
      
        final 
        class
      
      
        UnivariateFeatureSelector extends Estimator[UnivariateFeatureSelectorModel] with UnivariateFeatureSelectorParams with DefaultParamsWritable
      
      
      Feature selector based on univariate statistical tests against labels. Currently, Spark supports three univariate feature selectors: chi-squared, ANOVA F-test and F-value. Users can choose a univariate feature selector by setting featureType and labelType, and Spark will pick the score function based on the specified featureType and labelType.
 The following combinations of featureType and labelType are supported:
- featureType categorical and labelType categorical: Spark uses chi-squared, i.e. chi2 in sklearn.
- featureType continuous and labelType categorical: Spark uses ANOVA F-test, i.e. f_classif in sklearn.
- featureType continuous and labelType continuous: Spark uses F-value, i.e. f_regression in sklearn.
 The UnivariateFeatureSelector supports different selection modes: numTopFeatures, percentile, fpr, fdr, fwe.
- numTopFeatures chooses a fixed number of top features according to a hypothesis test.
- percentile is similar but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
- fdr uses the Benjamini-Hochberg procedure to choose all features whose false discovery rate is below a threshold.
- fwe chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
 By default, the selection mode is numTopFeatures.
- Annotations
- @Since( "3.1.1" )
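The three p-value-based selection modes (fpr, fwe, fdr) differ only in how the threshold is applied. The plain Scala sketch below illustrates that difference on a list of per-feature p-values. It is not Spark's implementation; the names are hypothetical, and the choice of strict versus non-strict comparison is an assumption.

```scala
object SelectionModeSketch {
  /** fpr: keep features whose p-value is below alpha. */
  def fpr(pValues: Seq[Double], alpha: Double): Seq[Int] =
    pValues.zipWithIndex.collect { case (p, i) if p < alpha => i }

  /** fwe: same test, with the threshold scaled by 1 / numFeatures. */
  def fwe(pValues: Seq[Double], alpha: Double): Seq[Int] =
    fpr(pValues, alpha / pValues.size)

  /** fdr: Benjamini-Hochberg step-up procedure. */
  def fdr(pValues: Seq[Double], alpha: Double): Seq[Int] = {
    val n = pValues.size
    // Rank features by ascending p-value, remembering the original index.
    val ranked = pValues.zipWithIndex.sortWith(_._1 < _._1)
    // Largest 0-based rank k with p_(k) <= alpha * (k + 1) / n, or -1 if none.
    val lastRank = ranked.zipWithIndex.lastIndexWhere {
      case ((p, _), rank) => p <= alpha * (rank + 1) / n
    }
    ranked.take(lastRank + 1).map(_._2).sorted
  }
}
```

With p-values `Seq(0.01, 0.045, 0.02, 0.2)` and a threshold of 0.05, fpr keeps features 0, 1 and 2, fdr keeps features 0 and 2, and fwe keeps only feature 0, which shows how each mode is progressively stricter.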
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        UnivariateFeatureSelectorModel extends Model[UnivariateFeatureSelectorModel] with UnivariateFeatureSelectorParams with MLWritable
      
      
      Model fitted by UnivariateFeatureSelector.
- Annotations
- @Since( "3.1.1" )
 
- 
      
      
      
        
      
    
      
        final 
        class
      
      
        VarianceThresholdSelector extends Estimator[VarianceThresholdSelectorModel] with VarianceThresholdSelectorParams with DefaultParamsWritable
      
      
      Feature selector that removes all low-variance features. Features with a (sample) variance not greater than the threshold will be removed. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples.
- Annotations
- @Since( "3.1.0" )
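The selection rule can be sketched in plain Scala on a small row-major matrix. The names are hypothetical, and the default threshold of 0.0 mirrors the "keep all features with non-zero variance" default described above.

```scala
object VarianceThresholdSketch {
  /** Unbiased sample variance of one feature column. */
  def sampleVariance(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum / (xs.size - 1)
  }

  /** Indices of the features whose sample variance is strictly greater than the threshold. */
  def selectedFeatures(rows: Seq[Seq[Double]], threshold: Double = 0.0): Seq[Int] = {
    val numFeatures = rows.head.size
    (0 until numFeatures).filter(j => sampleVariance(rows.map(_(j))) > threshold)
  }
}
```

A column that holds the same value in every row has variance 0 and is therefore dropped at the default threshold.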
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        VarianceThresholdSelectorModel extends Model[VarianceThresholdSelectorModel] with VarianceThresholdSelectorParams with MLWritable
      
      
      Model fitted by VarianceThresholdSelector.
- Annotations
- @Since( "3.1.0" )
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        VectorAssembler extends Transformer with HasInputCols with HasOutputCol with HasHandleInvalid with DefaultParamsWritable
      
      
      A feature transformer that merges multiple columns into a vector column. This requires one pass over the entire dataset. If column lengths need to be inferred from the data, an additional call to the 'first' Dataset method is required; see the 'handleInvalid' parameter.
- Annotations
- @Since( "1.4.0" )
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        VectorIndexer extends Estimator[VectorIndexerModel] with VectorIndexerParams with DefaultParamsWritable
      
      
      Class for indexing categorical feature columns in a dataset of Vector.
 This has 2 usage modes:
- Automatically identify categorical features (default behavior)
  - This helps process a dataset of unknown vectors into a dataset with some continuous features and some categorical features. The choice between continuous and categorical is based upon a maxCategories parameter.
  - Set maxCategories to the maximum number of categories any categorical feature should have.
  - E.g.: Feature 0 has unique values {-1.0, 0.0}, and feature 1 values {1.0, 3.0, 5.0}. If maxCategories = 2, then feature 0 will be declared categorical and use indices {0, 1}, and feature 1 will be declared continuous.
- Index all features, if all features are categorical
  - If maxCategories is set to be very large, then this will build an index of unique values for all features.
  - Warning: This can cause problems if features are continuous since this will collect ALL unique values to the driver.
  - E.g.: Feature 0 has unique values {-1.0, 0.0}, and feature 1 values {1.0, 3.0, 5.0}. If maxCategories is greater than or equal to 3, then both features will be declared categorical.
 This returns a model which can transform categorical features to use 0-based indices.
 Index stability:
- This is not guaranteed to choose the same category index across multiple runs.
- If a categorical feature includes value 0, then this is guaranteed to map value 0 to index 0. This maintains vector sparsity.
- More stability may be added in the future.
 TODO: Future extensions: The following functionality is planned for the future:
- Preserve metadata in transform; if a feature's metadata is already present, do not recompute.
- Specify certain features to not index, either via a parameter or via existing metadata.
- Add warning if a categorical feature has only 1 category.
- Annotations
- @Since( "1.4.0" )
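The maxCategories decision from the examples above can be sketched in plain Scala for a single feature. The names are hypothetical, and apart from mapping 0.0 to index 0 the index order (first appearance) is an arbitrary choice of this sketch, since the real ordering is not guaranteed.

```scala
object CategoryIndexSketch {
  /** Some(value -> index) if the feature is declared categorical, None if it is continuous. */
  def categoryMap(values: Seq[Double], maxCategories: Int): Option[Map[Double, Int]] = {
    val distinct = values.distinct
    if (distinct.size > maxCategories) None
    else {
      // Put 0.0 first so that it always receives index 0, which preserves sparsity.
      val (zeros, others) = distinct.partition(_ == 0.0)
      Some((zeros ++ others).zipWithIndex.toMap)
    }
  }
}
```

With maxCategories = 2, the values {-1.0, 0.0} are indexed while {1.0, 3.0, 5.0} are reported as continuous; raising maxCategories to 3 indexes both.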
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        VectorIndexerModel extends Model[VectorIndexerModel] with VectorIndexerParams with MLWritable
      
      
      Model fitted by VectorIndexer. Transform categorical features to use 0-based indices instead of their original values.
- Categorical features are mapped to indices.
- Continuous features (columns) are left unchanged.
 This also appends metadata to the output column, marking features as Numeric (continuous), Nominal (categorical), or Binary (either continuous or categorical). Non-ML metadata is not carried over from the input to the output column.
 This maintains vector sparsity.
- Annotations
- @Since( "1.4.0" )
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        VectorSizeHint extends Transformer with HasInputCol with HasHandleInvalid with DefaultParamsWritable
      
      
      A feature transformer that adds size information to the metadata of a vector column. VectorAssembler needs size information for its input columns and cannot be used on streaming dataframes without this metadata. Note: VectorSizeHint modifies inputCol to include size metadata and does not have an outputCol.
- Annotations
- @Since( "2.3.0" )
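A minimal sketch of pairing VectorSizeHint with VectorAssembler. The column names `hour`, `userFeatures` and `features`, the size 3, and the object name are assumptions for illustration only.

```scala
import org.apache.spark.ml.feature.{VectorAssembler, VectorSizeHint}
import org.apache.spark.sql.DataFrame

object VectorSizeHintSketch {
  // Declares that the (hypothetical) vector column "userFeatures" always has 3 entries.
  val sizeHint: VectorSizeHint = new VectorSizeHint()
    .setInputCol("userFeatures")
    .setSize(3)
    .setHandleInvalid("error")

  val assembler: VectorAssembler = new VectorAssembler()
    .setInputCols(Array("hour", "userFeatures"))
    .setOutputCol("features")

  /** Applies the size hint first so the assembler can read the size from column metadata. */
  def assemble(df: DataFrame): DataFrame =
    assembler.transform(sizeHint.transform(df))
}
```

Since the hint only rewrites metadata on its input column, the assembler's input column names stay the same after the hint is applied.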
 
- 
      
      
      
        
      
    
      
        final 
        class
      
      
        VectorSlicer extends Transformer with HasInputCol with HasOutputCol with DefaultParamsWritable
      
      
      This class takes a feature vector and outputs a new feature vector with a subarray of the original features. The subset of features can be specified with either indices (setIndices()) or names (setNames()). At least one feature must be selected. Duplicate features are not allowed, so there can be no overlap between selected indices and names. The output vector will order features with the selected indices first (in the order given), followed by the selected names (in the order given).
- Annotations
- @Since( "1.5.0" )
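The ordering and validation rules can be sketched in plain Scala on an array with parallel feature names. This illustrates the rules as stated above and is not the Spark implementation; all names are hypothetical.

```scala
object VectorSliceSketch {
  /** Selects features by index first (in the order given), then by name (in the order given). */
  def slice(
      features: Array[Double],
      featureNames: Array[String],
      indices: Seq[Int],
      names: Seq[String]): Array[Double] = {
    // Resolve each requested name to its position in the vector.
    val nameIndices = names.map { n =>
      val i = featureNames.indexOf(n)
      require(i >= 0, s"unknown feature name: $n")
      i
    }
    val selected = indices ++ nameIndices
    require(selected.nonEmpty, "at least one feature must be selected")
    require(selected.distinct.size == selected.size, "duplicate features are not allowed")
    selected.map(i => features(i)).toArray
  }
}
```

Selecting index 2 together with names "a" and "d" from a four-feature vector therefore returns the features in the order 2, "a", "d".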
 
- 
      
      
      
        
      
    
      
        final 
        class
      
      
        Word2Vec extends Estimator[Word2VecModel] with Word2VecBase with DefaultParamsWritable
      
      
      Word2Vec trains a model of Map(String, Vector), i.e. it transforms a word into a code for further natural language processing or machine learning processes.
- Annotations
- @Since( "1.4.0" )
 
- 
      
      
      
        
      
    
      
        
        class
      
      
        Word2VecModel extends Model[Word2VecModel] with Word2VecBase with MLWritable
      
      
      Model fitted by Word2Vec.
- Annotations
- @Since( "1.4.0" )
 
- 
      
      
      
        
      
    
      
        final 
        class
      
      
        ChiSqSelector extends Selector[ChiSqSelectorModel]
      
      
      Chi-Squared feature selection, which selects categorical features to use for predicting a categorical label. The selector supports different selection methods: numTopFeatures, percentile, fpr, fdr, fwe.
- numTopFeatures chooses a fixed number of top features according to a chi-squared test.
- percentile is similar but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
- fdr uses the Benjamini-Hochberg procedure (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose all features whose false discovery rate is below a threshold.
- fwe chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
 By default, the selection method is numTopFeatures, with the default number of top features set to 50.
- Annotations
- @deprecated @Since( "1.6.0" )
- Deprecated
- (Since version 3.1.1) use UnivariateFeatureSelector instead 
 
Value Members
- 
      
      
      
        
      
    
      
        
        object
      
      
        Binarizer extends DefaultParamsReadable[Binarizer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        BucketedRandomProjectionLSH extends DefaultParamsReadable[BucketedRandomProjectionLSH] with Serializable
      
      
      - Annotations
- @Since( "2.1.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        BucketedRandomProjectionLSHModel extends MLReadable[BucketedRandomProjectionLSHModel] with Serializable
      
      
      - Annotations
- @Since( "2.1.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Bucketizer extends DefaultParamsReadable[Bucketizer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        ChiSqSelector extends DefaultParamsReadable[ChiSqSelector] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        ChiSqSelectorModel extends MLReadable[ChiSqSelectorModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        CountVectorizer extends DefaultParamsReadable[CountVectorizer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        CountVectorizerModel extends MLReadable[CountVectorizerModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        DCT extends DefaultParamsReadable[DCT] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        ElementwiseProduct extends DefaultParamsReadable[ElementwiseProduct] with Serializable
      
      
      - Annotations
- @Since( "2.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        FeatureHasher extends DefaultParamsReadable[FeatureHasher] with Serializable
      
      
      - Annotations
- @Since( "2.3.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        HashingTF extends DefaultParamsReadable[HashingTF] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        IDF extends DefaultParamsReadable[IDF] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        IDFModel extends MLReadable[IDFModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Imputer extends DefaultParamsReadable[Imputer] with Serializable
      
      
      - Annotations
- @Since( "2.2.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        ImputerModel extends MLReadable[ImputerModel] with Serializable
      
      
      - Annotations
- @Since( "2.2.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        IndexToString extends DefaultParamsReadable[IndexToString] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Interaction extends DefaultParamsReadable[Interaction] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        MaxAbsScaler extends DefaultParamsReadable[MaxAbsScaler] with Serializable
      
      
      - Annotations
- @Since( "2.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        MaxAbsScalerModel extends MLReadable[MaxAbsScalerModel] with Serializable
      
      
      - Annotations
- @Since( "2.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        MinHashLSH extends DefaultParamsReadable[MinHashLSH] with Serializable
      
      
      - Annotations
- @Since( "2.1.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        MinHashLSHModel extends MLReadable[MinHashLSHModel] with Serializable
      
      
      - Annotations
- @Since( "2.1.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        MinMaxScaler extends DefaultParamsReadable[MinMaxScaler] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        MinMaxScalerModel extends MLReadable[MinMaxScalerModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        NGram extends DefaultParamsReadable[NGram] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Normalizer extends DefaultParamsReadable[Normalizer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        OneHotEncoder extends DefaultParamsReadable[OneHotEncoder] with Serializable
      
      
      - Annotations
- @Since( "3.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        OneHotEncoderModel extends MLReadable[OneHotEncoderModel] with Serializable
      
      
      - Annotations
- @Since( "3.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        PCA extends DefaultParamsReadable[PCA] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        PCAModel extends MLReadable[PCAModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        PolynomialExpansion extends DefaultParamsReadable[PolynomialExpansion] with Serializable
      
      
      The expansion is done via recursion. Given n features and degree d, the size after expansion is (n + d choose d) (including 1 and first-order values). For example, let f([a, b, c], 3) be the function that expands [a, b, c] to their monomials of degree 3. We have the following recursion:
 $$ f([a, b, c], 3) = f([a, b], 3) ++ f([a, b], 2) * c ++ f([a, b], 1) * c^2 ++ [c^3] $$
 To handle sparsity, if c is zero, we can skip all monomials that contain it. We remember the current index and increment it properly for sparse input.
- Annotations
- @Since( "1.6.0" )
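The recursion and the size formula can be checked with a small plain Scala sketch. It reads f(xs, d) as "all monomials of degree at most d, constant 1 included", which is the reading that matches the stated size (n + d choose d). The names, the output ordering, and the dense-only handling are assumptions of this sketch, not the library's implementation.

```scala
object PolyExpansionSketch {
  /** f(xs, d) = f(init, d) ++ f(init, d - 1) * c ++ ... ++ f(init, 0) * c^d, where c is the last feature. */
  def expand(xs: Vector[Double], degree: Int): Vector[Double] =
    if (xs.isEmpty || degree == 0) Vector(1.0) // only the constant monomial remains
    else {
      val c = xs.last
      (0 to degree).toVector.flatMap { k =>
        expand(xs.init, degree - k).map(_ * math.pow(c, k))
      }
    }

  /** Binomial coefficient (n choose k), computed incrementally so every step stays integral. */
  def binomial(n: Int, k: Int): BigInt =
    (1 to k).foldLeft(BigInt(1))((acc, i) => acc * (n - k + i) / i)
}
```

For [a, b] = [2, 3] and degree 2 the sketch produces the six values 1, a, a^2, b, ab, b^2, and six equals (2 + 2 choose 2).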
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        QuantileDiscretizer extends DefaultParamsReadable[QuantileDiscretizer] with Logging with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        RFormula extends DefaultParamsReadable[RFormula] with Serializable
      
      
      - Annotations
- @Since( "2.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        RFormulaModel extends MLReadable[RFormulaModel] with Serializable
      
      
      - Annotations
- @Since( "2.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        RegexTokenizer extends DefaultParamsReadable[RegexTokenizer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        RobustScaler extends DefaultParamsReadable[RobustScaler] with Serializable
      
      
      - Annotations
- @Since( "3.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        RobustScalerModel extends MLReadable[RobustScalerModel] with Serializable
      
      
      - Annotations
- @Since( "3.0.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        SQLTransformer extends DefaultParamsReadable[SQLTransformer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        StandardScaler extends DefaultParamsReadable[StandardScaler] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        StandardScalerModel extends MLReadable[StandardScalerModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        StopWordsRemover extends DefaultParamsReadable[StopWordsRemover] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        StringIndexer extends DefaultParamsReadable[StringIndexer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        StringIndexerModel extends MLReadable[StringIndexerModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Tokenizer extends DefaultParamsReadable[Tokenizer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        UnivariateFeatureSelector extends DefaultParamsReadable[UnivariateFeatureSelector] with Serializable
      
      
      - Annotations
- @Since( "3.1.1" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        UnivariateFeatureSelectorModel extends MLReadable[UnivariateFeatureSelectorModel] with Serializable
      
      
      - Annotations
- @Since( "3.1.1" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VarianceThresholdSelector extends DefaultParamsReadable[VarianceThresholdSelector] with Serializable
      
      
      - Annotations
- @Since( "3.1.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VarianceThresholdSelectorModel extends MLReadable[VarianceThresholdSelectorModel] with Serializable
      
      
      - Annotations
- @Since( "3.1.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VectorAssembler extends DefaultParamsReadable[VectorAssembler] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VectorIndexer extends DefaultParamsReadable[VectorIndexer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VectorIndexerModel extends MLReadable[VectorIndexerModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VectorSizeHint extends DefaultParamsReadable[VectorSizeHint] with Serializable
      
      
      - Annotations
- @Since( "2.3.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        VectorSlicer extends DefaultParamsReadable[VectorSlicer] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Word2Vec extends DefaultParamsReadable[Word2Vec] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )
 
- 
      
      
      
        
      
    
      
        
        object
      
      
        Word2VecModel extends MLReadable[Word2VecModel] with Serializable
      
      
      - Annotations
- @Since( "1.6.0" )