Class PipelineStage

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, Identifiable, scala.Serializable
Direct Known Subclasses:
Estimator, Transformer

public abstract class PipelineStage extends Object implements Params, org.apache.spark.internal.Logging
A stage in a pipeline, either an Estimator or a Transformer.
  • Constructor Details

    • PipelineStage

      public PipelineStage()
  • Method Details

    • copy

      public abstract PipelineStage copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Parameters:
      extra - extra param values to apply to the copy
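
      The copy contract described above (same UID, current params plus the extras) can be illustrated with a minimal plain-Java sketch. CopySketchStage and its Map-based params are hypothetical stand-ins, not Spark's PipelineStage or ParamMap; the sketch also assumes that extra values take precedence over existing ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a pipeline stage; NOT Spark's PipelineStage or ParamMap.
final class CopySketchStage {
    final String uid;
    final Map<String, Object> params;

    CopySketchStage(String uid, Map<String, Object> params) {
        this.uid = uid;
        this.params = new HashMap<>(params); // defensive copy of the caller's map
    }

    // Models the documented contract: keep the UID, carry over the current
    // params, then apply the extras. In this sketch, extras override
    // existing values with the same name.
    CopySketchStage copy(Map<String, Object> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra);
        return new CopySketchStage(uid, merged);
    }

    public static void main(String[] args) {
        Map<String, Object> base = new HashMap<>();
        base.put("maxIter", 10);
        CopySketchStage original = new CopySketchStage("stage_1", base);

        Map<String, Object> extra = new HashMap<>();
        extra.put("maxIter", 20);
        CopySketchStage copied = original.copy(extra);

        System.out.println(copied.uid + " maxIter=" + copied.params.get("maxIter"));
        System.out.println(original.uid + " maxIter=" + original.params.get("maxIter"));
    }
}
```

      The original stage is left unchanged, so a copy can be tuned independently while remaining identifiable by the same UID.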
    • params

      public Param<?>[] params()
      Description copied from interface: Params
      Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.

      Specified by:
      params in interface Params
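
      The reflection technique mentioned above can be sketched in plain Java. SketchParam, SketchStage, and listParams are hypothetical names introduced for illustration only; this is not Spark's implementation, just one way to find public no-argument methods with a given return type and sort the results by name.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a param; NOT Spark's org.apache.spark.ml.param.Param.
final class SketchParam {
    final String name;
    SketchParam(String name) { this.name = name; }
}

// Hypothetical stage that exposes its params through public no-argument methods.
class SketchStage {
    public SketchParam maxIter() { return new SketchParam("maxIter"); }
    public SketchParam inputCol() { return new SketchParam("inputCol"); }
    public String uid() { return "sketch_1"; }                             // skipped: wrong return type
    public SketchParam withArg(int x) { return new SketchParam("withArg"); } // skipped: takes an argument
}

final class ParamsReflectionSketch {
    // Collects the result of every public, no-argument method returning SketchParam,
    // then sorts the params by name.
    static List<SketchParam> listParams(Object stage) {
        List<SketchParam> found = new ArrayList<>();
        for (Method m : stage.getClass().getMethods()) { // getMethods() yields public methods only
            if (m.getParameterCount() == 0
                    && SketchParam.class.isAssignableFrom(m.getReturnType())) {
                try {
                    found.add((SketchParam) m.invoke(stage));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot read param via " + m.getName(), e);
                }
            }
        }
        found.sort(Comparator.comparing((SketchParam p) -> p.name));
        return found;
    }

    public static void main(String[] args) {
        for (SketchParam p : listParams(new SketchStage())) {
            System.out.println(p.name); // prints inputCol, then maxIter
        }
    }
}
```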
    • transformSchema

      public abstract StructType transformSchema(StructType schema)
      Check transform validity and derive the output schema from the input schema.

      Validity of interactions between parameters is checked during transformSchema, and an exception is raised if any parameter value is invalid. Checks on a parameter value that do not depend on other parameters are handled by Param.validate().

      A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.

      Parameters:
      schema - the input schema
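
      The order of work described for transformSchema (validate first, then derive the output schema) can be sketched in plain Java. Everything here is a hypothetical stand-in: a map from column name to type name replaces StructType, and the inputCol/outputCol params and the specific checks are assumptions chosen for illustration, not Spark's own rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a stage; a Map of column name -> type name
// stands in for Spark's StructType.
final class TransformSchemaSketch {
    final String inputCol;
    final String outputCol;

    TransformSchemaSketch(String inputCol, String outputCol) {
        this.inputCol = inputCol;
        this.outputCol = outputCol;
    }

    Map<String, String> transformSchema(Map<String, String> schema) {
        // 1. Parameter interaction check: involves two params at once, so it
        //    cannot be done by a per-param validator.
        if (inputCol.equals(outputCol)) {
            throw new IllegalArgumentException("inputCol and outputCol must differ");
        }
        // 2. Schema checks: the input column must exist with the expected type,
        //    and the output column must not already exist.
        String inputType = schema.get(inputCol);
        if (inputType == null) {
            throw new IllegalArgumentException("Missing input column: " + inputCol);
        }
        if (!"string".equals(inputType)) {
            throw new IllegalArgumentException("Input column must be string, got " + inputType);
        }
        if (schema.containsKey(outputCol)) {
            throw new IllegalArgumentException("Output column already exists: " + outputCol);
        }
        // 3. Derive the output schema without mutating the input schema.
        Map<String, String> out = new LinkedHashMap<>(schema);
        out.put(outputCol, "array<string>");
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("text", "string");
        System.out.println(new TransformSchemaSketch("text", "words").transformSchema(in));
    }
}
```

      Failing fast on the schema lets configuration errors surface before any data is processed.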