Class Transformer

Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Transformer
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, Identifiable
Direct Known Subclasses:
Binarizer, ColumnPruner, FeatureHasher, HashingTF, IndexToString, Interaction, Model, SQLTransformer, StopWordsRemover, UnaryTransformer, VectorAssembler, VectorAttributeRewriter, VectorSizeHint, VectorSlicer

public abstract class Transformer extends PipelineStage
Abstract class for transformers that transform one dataset into another.
  • Constructor Details

    • Transformer

      public Transformer()
  • Method Details

    • copy

      public abstract Transformer copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class PipelineStage
      Parameters:
      extra - extra params to set on the copy
      Returns:
      a copy of this instance with the same UID and the extra params applied
    • transform

      public Dataset<Row> transform(Dataset<?> dataset, ParamPair<?> firstParamPair, ParamPair<?>... otherParamPairs)
      Transforms the dataset with optional parameters, which override the embedded params.
      Parameters:
      dataset - input dataset
      firstParamPair - the first param pair; overrides the embedded params
      otherParamPairs - other param pairs; these override the embedded params
      Returns:
      transformed dataset
    • transform

      public Dataset<Row> transform(Dataset<?> dataset, ParamPair<?> firstParamPair, scala.collection.immutable.Seq<ParamPair<?>> otherParamPairs)
      Transforms the dataset with optional parameters, which override the embedded params.
      Parameters:
      dataset - input dataset
      firstParamPair - the first param pair; overrides the embedded params
      otherParamPairs - other param pairs; these override the embedded params
      Returns:
      transformed dataset
    • transform

      public Dataset<Row> transform(Dataset<?> dataset, ParamMap paramMap)
      Transforms the dataset with the provided parameter map as additional parameters.
      Parameters:
      dataset - input dataset
      paramMap - additional parameters; these override the embedded params
      Returns:
      transformed dataset
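
      The phrase "overwrite embedded params" can be read as a per-call override: the transformer is copied with the extra params and the copy performs the transformation, so the original instance keeps its own settings. The sketch below models that pattern in plain Java so it runs without Spark on the classpath. ToyBinarizer, its "threshold" key, and the use of Map and List in place of ParamMap and Dataset are hypothetical stand-ins, not Spark classes, and the sketch is an illustration of the idea rather than Spark's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the copy-then-transform override pattern; none of this is Spark API. */
final class ParamOverrideSketch {

    /** Hypothetical miniature transformer: maps values above a threshold to 1.0, others to 0.0. */
    static final class ToyBinarizer {
        // "Embedded" params: settings stored on the instance itself.
        private final Map<String, Double> params = new HashMap<>();

        ToyBinarizer(double threshold) {
            params.put("threshold", threshold);
        }

        double getThreshold() {
            return params.get("threshold");
        }

        /** Stand-in for copy(ParamMap extra): same settings, with the extras taking precedence. */
        ToyBinarizer copy(Map<String, Double> extra) {
            ToyBinarizer that = new ToyBinarizer(getThreshold());
            that.params.putAll(extra);
            return that;
        }

        /** Stand-in for transform(dataset): uses only the embedded params. */
        List<Double> transform(List<Double> dataset) {
            double threshold = getThreshold();
            List<Double> out = new ArrayList<>();
            for (double value : dataset) {
                out.add(value > threshold ? 1.0 : 0.0);
            }
            return out;
        }

        /** Stand-in for transform(dataset, paramMap): the override applies to this call only. */
        List<Double> transform(List<Double> dataset, Map<String, Double> paramMap) {
            return copy(paramMap).transform(dataset);
        }
    }

    public static void main(String[] args) {
        ToyBinarizer binarizer = new ToyBinarizer(0.5);
        List<Double> data = List.of(0.1, 0.6, 0.9);

        System.out.println(binarizer.transform(data));                           // [0.0, 1.0, 1.0]
        System.out.println(binarizer.transform(data, Map.of("threshold", 0.8))); // [0.0, 0.0, 1.0]
        System.out.println(binarizer.getThreshold());                            // 0.5 (unchanged)
    }
}
```

      Under this reading, passing a ParamMap to transform suits one-off experiments with different settings, while the setters on the transformer remain the place for settings that should persist across calls.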
    • transform

      public abstract Dataset<Row> transform(Dataset<?> dataset)
      Transforms the input dataset.
      Parameters:
      dataset - input dataset
      Returns:
      transformed dataset