Package org.apache.spark.ml
Class Estimator<M extends Model<M>>
Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Estimator<M>
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, Identifiable, scala.Serializable
- Direct Known Subclasses:
ALS, BisectingKMeans, BucketedRandomProjectionLSH, ChiSqSelector, CountVectorizer, CrossValidator, FPGrowth, GaussianMixture, IDF, Imputer, IsotonicRegression, KMeans, LDA, MaxAbsScaler, MinHashLSH, MinMaxScaler, OneHotEncoder, OneVsRest, PCA, Pipeline, Predictor, QuantileDiscretizer, RFormula, RobustScaler, StandardScaler, StringIndexer, TrainValidationSplit, UnivariateFeatureSelector, VarianceThresholdSelector, VectorIndexer, Word2Vec
Abstract class for estimators that fit models to data.
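The estimator/model split can be illustrated with a minimal, Spark-free sketch. The class names below (`EstimatorSketch`, `CenteringEstimator`, `CenteringModel`) are hypothetical stand-ins and not part of org.apache.spark.ml; they only mirror the idea that `fit` learns something from data and returns a model that can then be applied to other data.

```java
import java.util.Arrays;

public class EstimatorSketch {
    /** Toy "model" (not a Spark class): holds the learned mean and applies it. */
    public static final class CenteringModel {
        private final double mean;

        CenteringModel(double mean) {
            this.mean = mean;
        }

        public double mean() {
            return mean;
        }

        /** Subtracts the learned mean from every value. */
        public double[] transform(double[] values) {
            return Arrays.stream(values).map(v -> v - mean).toArray();
        }
    }

    /** Toy "estimator" (not a Spark class): fit() learns the mean of the data. */
    public static final class CenteringEstimator {
        public CenteringModel fit(double[] dataset) {
            // An empty dataset falls back to a mean of 0.0 (an arbitrary choice for this sketch).
            double mean = Arrays.stream(dataset).average().orElse(0.0);
            return new CenteringModel(mean);
        }
    }

    public static void main(String[] args) {
        CenteringModel model = new CenteringEstimator().fit(new double[] {1.0, 2.0, 3.0});
        System.out.println(Arrays.toString(model.transform(new double[] {2.0, 5.0})));
    }
}
```

In Spark itself the dataset is a `Dataset<?>` and the fitted result is a `Model<M>`; the sketch replaces both with plain Java types to stay self-contained.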
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
-
Method Summary
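| Modifier and Type | Method | Description |
| --- | --- | --- |
| abstract Estimator<M> | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params. |
| abstract M | fit(Dataset<?> dataset) | Fits a model to the input data. |
| M | fit(Dataset<?> dataset, ParamMap paramMap) | Fits a single model to the input data with provided parameter map. |
| M | fit(Dataset<?> dataset, ParamPair<?> firstParamPair, ParamPair<?>... otherParamPairs) | Fits a single model to the input data with optional parameters. |
| M | fit(Dataset<?> dataset, ParamPair<?> firstParamPair, scala.collection.Seq<ParamPair<?>> otherParamPairs) | Fits a single model to the input data with optional parameters. |
| scala.collection.Seq<M> | fit(Dataset<?> dataset, scala.collection.Seq<ParamMap> paramMaps) | Fits multiple models to the input data with multiple sets of parameters. |

Methods inherited from class org.apache.spark.ml.PipelineStage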
params, transformSchema
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString, uid
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
-
Constructor Details
-
Estimator
public Estimator()
-
-
Method Details
-
copy
public abstract Estimator<M> copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by:
copy in interface Params
- Specified by:
copy in class PipelineStage
- Parameters:
extra - (undocumented)
- Returns:
(undocumented)
-
fit
public M fit(Dataset<?> dataset, ParamPair<?> firstParamPair, ParamPair<?>... otherParamPairs)
Fits a single model to the input data with optional parameters.
- Parameters:
dataset - input dataset
firstParamPair - the first param pair, overrides embedded params
otherParamPairs - other param pairs. These values override any specified in this Estimator's embedded ParamMap.
- Returns:
fitted model
-
fit
public M fit(Dataset<?> dataset, ParamPair<?> firstParamPair, scala.collection.Seq<ParamPair<?>> otherParamPairs)
Fits a single model to the input data with optional parameters.
- Parameters:
dataset - input dataset
firstParamPair - the first param pair, overrides embedded params
otherParamPairs - other param pairs. These values override any specified in this Estimator's embedded ParamMap.
- Returns:
fitted model
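One way to read the param-pair overloads is as a convenience that gathers the pairs into a single parameter map before fitting. The sketch below models only that gathering step in plain Java. `ParamPairSketch`, `pair`, and `toParamMap` are hypothetical names introduced for illustration; they are not Spark APIs, and a `Map.Entry<String, Object>` stands in for `ParamPair<?>`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamPairSketch {
    /** Hypothetical stand-in for a param pair: a (name, value) entry. */
    public static Map.Entry<String, Object> pair(String name, Object value) {
        return new SimpleEntry<>(name, value);
    }

    /** Gathers one required pair plus any further pairs into a single map. */
    @SafeVarargs
    public static Map<String, Object> toParamMap(Map.Entry<String, Object> first,
                                                 Map.Entry<String, Object>... others) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put(first.getKey(), first.getValue());
        for (Map.Entry<String, Object> p : others) {
            // In this sketch, a later pair with the same name replaces an earlier one.
            map.put(p.getKey(), p.getValue());
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> params = toParamMap(pair("maxIter", 50), pair("regParam", 0.1));
        System.out.println(params);
    }
}
```

The parameter names `maxIter` and `regParam` are examples only; which params exist depends on the concrete estimator.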
-
fit
public M fit(Dataset<?> dataset, ParamMap paramMap)
Fits a single model to the input data with provided parameter map.
- Parameters:
dataset - input dataset
paramMap - Parameter map. These values override any specified in this Estimator's embedded ParamMap.
- Returns:
fitted model
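The override rule described above (supplied values win over the estimator's embedded params, and the estimator itself is left unchanged) can be sketched without Spark. `ParamOverrideSketch` is a hypothetical toy class, not a Spark type: a `TreeMap` stands in for `ParamMap`, and the "model" returned by `fit` is simply the set of params that were in effect. Fitting on a copy is an assumption of this sketch, chosen so that the original's params are never mutated.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for an estimator with embedded params; NOT a Spark class. */
public class ParamOverrideSketch {
    // Plays the role of the estimator's embedded ParamMap.
    private final Map<String, Object> embedded = new TreeMap<>();

    public ParamOverrideSketch set(String name, Object value) {
        embedded.put(name, value);
        return this;
    }

    /** Returns a copy whose params are the embedded ones overlaid with extra. */
    public ParamOverrideSketch copy(Map<String, Object> extra) {
        ParamOverrideSketch that = new ParamOverrideSketch();
        that.embedded.putAll(embedded);
        that.embedded.putAll(extra); // supplied values win on conflicts
        return that;
    }

    /** Stand-in for fit(dataset): reports the params in effect. */
    public Map<String, Object> fit() {
        return new TreeMap<>(embedded);
    }

    /** Stand-in for fit(dataset, paramMap): fit on an overridden copy. */
    public Map<String, Object> fit(Map<String, Object> paramMap) {
        return copy(paramMap).fit();
    }

    public static void main(String[] args) {
        ParamOverrideSketch est = new ParamOverrideSketch().set("maxIter", 10).set("regParam", 0.1);
        Map<String, Object> overrides = new TreeMap<>();
        overrides.put("maxIter", 50);
        System.out.println(est.fit(overrides)); // overrides applied
        System.out.println(est.fit());          // original params untouched
    }
}
```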
-
fit
public abstract M fit(Dataset<?> dataset)
Fits a model to the input data.
- Parameters:
dataset - (undocumented)
- Returns:
(undocumented)
-
fit
public scala.collection.Seq<M> fit(Dataset<?> dataset, scala.collection.Seq<ParamMap> paramMaps)
Fits multiple models to the input data with multiple sets of parameters. The default implementation fits one model per parameter map, in order. Subclasses may override this to optimize multi-model training.
- Parameters:
dataset - input dataset
paramMaps - A sequence of parameter maps. These values override any specified in this Estimator's embedded ParamMap.
- Returns:
fitted models, matching the input parameter maps
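The default behaviour described above, one fit per parameter map with results in the same order as the inputs, can be sketched in plain Java. `MultiFitSketch` and `fitAll` are hypothetical names, not Spark APIs; a `Function` stands in for the single-model `fit`, and a `List` stands in for the Scala `Seq`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class MultiFitSketch {
    /** Fits one "model" per parameter setting, preserving input order. */
    public static <P, M> List<M> fitAll(Function<P, M> fitOne, List<P> paramMaps) {
        List<M> models = new ArrayList<>(paramMaps.size());
        for (P paramMap : paramMaps) {
            models.add(fitOne.apply(paramMap));
        }
        return models;
    }

    public static void main(String[] args) {
        // Hypothetical "training": the model is just a label derived from the setting.
        Function<Double, String> fitOne = reg -> "model(regParam=" + reg + ")";
        List<String> models = fitAll(fitOne, Arrays.asList(0.01, 0.1, 1.0));
        System.out.println(models);
    }
}
```

A subclass that can share work across settings (for example, reusing a cached intermediate result) would replace the loop with something smarter; the sketch shows only the straightforward baseline.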
-