org.apache.spark.ml

Pipeline

class Pipeline extends Estimator[PipelineModel]

:: AlphaComponent :: A simple pipeline that acts as an estimator. A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When Pipeline.fit is called, the stages are executed in order. If a stage is an Estimator, its Estimator.fit method is called on the input dataset to fit a model; that model, which is a Transformer, then transforms the dataset to produce the input to the next stage. If a stage is a Transformer, its Transformer.transform method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If there are no stages, the fitted PipelineModel acts as an identity transformer.
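The stage-by-stage behavior described above can be sketched in plain Scala. This is a minimal illustration, not Spark's implementation: `Dataset`, `Stage`, `Transformer`, `Estimator`, and `PipelineModel` below are hypothetical toy stand-ins defined inside the sketch, and a dataset is assumed to be a simple `Vector[Double]`.

```scala
// Toy stand-ins for illustration only; these are NOT the Spark ML types.
object PipelineSketch {
  type Dataset = Vector[Double]

  sealed trait Stage
  // A transformer maps a dataset to a new dataset.
  final case class Transformer(transform: Dataset => Dataset) extends Stage
  // An estimator learns a transformer (its "model") from a dataset.
  final case class Estimator(fit: Dataset => Transformer) extends Stage

  // The fitted result: one transformer per original stage, applied in order.
  final case class PipelineModel(stages: Vector[Transformer]) {
    def transform(data: Dataset): Dataset =
      stages.foldLeft(data)((d, t) => t.transform(d))
  }

  def fit(stages: Seq[Stage], data: Dataset): PipelineModel = {
    val (_, fitted) = stages.foldLeft((data, Vector.empty[Transformer])) {
      case ((current, acc), stage) =>
        val transformer = stage match {
          case e: Estimator   => e.fit(current) // fit a model on the current dataset
          case t: Transformer => t              // transformers are used as-is
        }
        // The (fitted) transformer produces the input for the next stage.
        (transformer.transform(current), acc :+ transformer)
    }
    PipelineModel(fitted)
  }
}
```

In this sketch, fitting an empty sequence of stages yields a `PipelineModel` with no transformers, so its `transform` returns the input unchanged, which mirrors the identity behavior noted above.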

Annotations
@AlphaComponent()
Linear Supertypes
Estimator[PipelineModel], Params, Identifiable, PipelineStage, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new Pipeline()

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def explainParams(): String

    Returns the documentation of all params.

    Definition Classes
    Params
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. def fit(dataset: SchemaRDD, paramMap: ParamMap): PipelineModel

    Fits the pipeline to the input dataset with additional parameters. If a stage is an Estimator, its Estimator.fit method is called on the input dataset to fit a model; that model, which is a Transformer, then transforms the dataset to produce the input to the next stage. If a stage is a Transformer, its Transformer.transform method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If there are no stages, the output model acts as an identity transformer.

    dataset

    input dataset

    paramMap

    parameter map

    returns

    fitted pipeline

    Definition Classes
    Pipeline → Estimator
  13. def fit(dataset: JavaSchemaRDD, paramMaps: Array[ParamMap]): List[PipelineModel]

    Fits multiple models to the input data with multiple sets of parameters.

    dataset

    input dataset

    paramMaps

    an array of parameter maps

    returns

    fitted models, matching the input parameter maps

    Definition Classes
    Estimator
  14. def fit(dataset: JavaSchemaRDD, paramMap: ParamMap): PipelineModel

    Fits a single model to the input data with provided parameter map.

    dataset

    input dataset

    paramMap

    parameter map

    returns

    fitted model

    Definition Classes
    Estimator
  15. def fit(dataset: JavaSchemaRDD, paramPairs: ParamPair[_]*): PipelineModel

    Fits a single model to the input data with optional parameters.

    dataset

    input dataset

    paramPairs

    optional list of param pairs (these override embedded params)

    returns

    fitted model

    Definition Classes
    Estimator
    Annotations
    @varargs()
  16. def fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[PipelineModel]

    Fits multiple models to the input data with multiple sets of parameters. The default implementation uses a for loop over the parameter maps. Subclasses can override this to optimize multi-model training.

    dataset

    input dataset

    paramMaps

    an array of parameter maps

    returns

    fitted models, matching the input parameter maps

    Definition Classes
    Estimator
  17. def fit(dataset: SchemaRDD, paramPairs: ParamPair[_]*): PipelineModel

    Fits a single model to the input data with optional parameters.

    dataset

    input dataset

    paramPairs

    optional list of param pairs (these override embedded params)

    returns

    fitted model

    Definition Classes
    Estimator
    Annotations
    @varargs()
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def getStages: Array[PipelineStage]

  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. def isSet(param: Param[_]): Boolean

    Checks whether a param is explicitly set.

    Definition Classes
    Params
  23. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  24. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  25. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  31. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  32. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  37. final def notify(): Unit

    Definition Classes
    AnyRef
  38. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  39. val paramMap: ParamMap

    Internal param map.

    Attributes
    protected
    Definition Classes
    Params
  40. def params: Array[Param[_]]

    Returns all params.

    Definition Classes
    Params
  41. def setStages(value: Array[PipelineStage]): Pipeline.this.type

  42. val stages: Param[Array[PipelineStage]]

    param for pipeline stages

  43. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  44. def toString(): String

    Definition Classes
    AnyRef → Any
  45. def transformSchema(schema: StructType, paramMap: ParamMap, logging: Boolean): StructType

    Derives the output schema from the input schema and parameters, optionally with logging.

    Attributes
    protected
    Definition Classes
    PipelineStage
  46. def validate(): Unit

    Validates parameter values stored internally. Raises an exception if any parameter value is invalid.

    Definition Classes
    Params
  47. def validate(paramMap: ParamMap): Unit

    Validates parameter values stored internally plus the input parameter map. Raises an exception if any parameter is invalid.

    Definition Classes
    Params
  48. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
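The `fit` overloads listed above either take extra parameters that take precedence over the embedded ones, or take an array of parameter maps and return one model per map. A rough sketch of that contract in plain Scala follows. It assumes a hypothetical, simplified `ParamMap` (a plain `Map[String, Double]`) and toy `ToyEstimator` and `Model` classes; none of these are the Spark types, and the parameter names in the usage note are purely illustrative.

```scala
// Toy model of the fit overloads; names and types here are hypothetical.
object MultiFitSketch {
  // Simplified parameter map: parameter name -> value.
  type ParamMap = Map[String, Double]

  // A toy "model" that only records the parameters it was fitted with.
  final case class Model(usedParams: ParamMap)

  final class ToyEstimator(embedded: ParamMap) {
    // Single fit: entries in `extra` take precedence over embedded params.
    def fit(data: Seq[Double], extra: ParamMap): Model =
      Model(embedded ++ extra)

    // Multi fit: a plain loop, one independent fit per map, in input order.
    def fit(data: Seq[Double], paramMaps: Array[ParamMap]): Seq[Model] =
      for (pm <- paramMaps.toSeq) yield fit(data, pm)
  }
}
```

For example, an estimator constructed with `Map("maxIter" -> 10.0, "regParam" -> 0.1)` and fitted with `Array(Map("regParam" -> 0.01), Map("maxIter" -> 5.0))` would return two models in that order, each using the embedded values except where its own map supplies a replacement.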
