Package org.apache.spark.ml.feature
Class ChiSqSelector
Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Estimator<T>
org.apache.spark.ml.feature.ChiSqSelector
- All Implemented Interfaces:
- Serializable, org.apache.spark.internal.Logging, SelectorParams, Params, HasFeaturesCol, HasLabelCol, HasOutputCol, DefaultParamsWritable, Identifiable, MLWritable
Deprecated.
Use UnivariateFeatureSelector instead. Since 3.1.1.
Chi-squared feature selection, which selects categorical features to use for predicting a
 categorical label.
 The selector supports five selection methods: numTopFeatures, percentile, fpr, fdr, and fwe.
  - numTopFeatures chooses a fixed number of top features according to a chi-squared test.
  - percentile is similar but chooses a fraction of all features instead of a fixed number.
  - fpr chooses all features whose p-values are below a threshold, thus controlling the false
    positive rate of selection.
  - fdr uses the Benjamini-Hochberg procedure
    (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
    to choose all features whose false discovery rate is below a threshold.
  - fwe chooses all features whose p-values are below a threshold. The threshold is scaled by
    1/numFeatures, thus controlling the family-wise error rate of selection.
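Each method ranks or filters features by the p-value of a per-feature chi-squared test against the label. As a rough illustration, the sketch below computes Pearson's chi-squared statistic and its degrees of freedom for a single categorical feature. Turning the statistic into a p-value additionally requires the upper tail of the chi-squared distribution, which is not shown. The class ChiSquaredSketch and its methods are hypothetical names, and this is not Spark's internal implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Spark's implementation: Pearson's chi-squared
// statistic for one categorical feature against a categorical label.
final class ChiSquaredSketch {

    // Sum of (observed - expected)^2 / expected over every (feature value, label value) cell.
    static double statistic(int[] feature, int[] label) {
        if (feature.length != label.length || feature.length == 0) {
            throw new IllegalArgumentException("feature and label must be non-empty and of equal length");
        }
        int n = feature.length;
        Map<Integer, Map<Integer, Integer>> observed = new HashMap<>();
        Map<Integer, Integer> featureTotals = new HashMap<>();
        Map<Integer, Integer> labelTotals = new HashMap<>();
        for (int i = 0; i < n; i++) {
            // Count co-occurrences and the marginal total of each category.
            observed.computeIfAbsent(feature[i], k -> new HashMap<>()).merge(label[i], 1, Integer::sum);
            featureTotals.merge(feature[i], 1, Integer::sum);
            labelTotals.merge(label[i], 1, Integer::sum);
        }
        double chi2 = 0.0;
        for (Map.Entry<Integer, Integer> f : featureTotals.entrySet()) {
            for (Map.Entry<Integer, Integer> l : labelTotals.entrySet()) {
                // Expected count under independence: rowTotal * columnTotal / n.
                double expected = f.getValue().doubleValue() * l.getValue() / n;
                int count = observed.get(f.getKey()).getOrDefault(l.getKey(), 0);
                double diff = count - expected;
                chi2 += diff * diff / expected;
            }
        }
        return chi2;
    }

    // Degrees of freedom: (distinct feature values - 1) * (distinct label values - 1).
    static long degreesOfFreedom(int[] feature, int[] label) {
        long featureLevels = Arrays.stream(feature).distinct().count();
        long labelLevels = Arrays.stream(label).distinct().count();
        return (featureLevels - 1) * (labelLevels - 1);
    }
}
```

Consider a feature {0, 0, 1, 1} that perfectly predicts the label {0, 0, 1, 1}. Its statistic is 4.0 with one degree of freedom. Against the label {0, 1, 0, 1}, the same feature is independent and its statistic is 0.0.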
 By default, the selection method is numTopFeatures, with the default number of top features
 set to 50.
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
Constructors
Method Summary
- copy: Deprecated. Creates a copy of this instance with the same UID and some extra params.
- final DoubleParam fdr(): The upper bound of the expected false discovery rate.
- featuresCol(): Param for features column name.
- fit: Deprecated. Fits a model to the input data.
- final DoubleParam fpr(): The highest p-value for features to be kept.
- final DoubleParam fwe(): The upper bound of the expected family-wise error rate.
- labelCol(): Param for label column name.
- static ChiSqSelector load: Deprecated.
- final IntParam numTopFeatures(): Number of features that the selector will select, ordered by ascending p-value.
- outputCol(): Param for output column name.
- final DoubleParam percentile(): Percentile of features that the selector will select, ordered by ascending p-value.
- static MLReader<T> read(): Deprecated.
- selectorType(): The selector type.
- setFdr(double value): Deprecated.
- setFeaturesCol(String value): Deprecated.
- setFpr(double value): Deprecated.
- setFwe(double value): Deprecated.
- setLabelCol(String value): Deprecated.
- setNumTopFeatures(int value): Deprecated.
- setOutputCol(String value): Deprecated.
- setPercentile(double value): Deprecated.
- setSelectorType(String value): Deprecated.
- transformSchema(StructType schema): Deprecated. Check transform validity and derive the output schema from the input schema.
- uid(): Deprecated. An immutable unique ID for the object and its derivatives.
Methods inherited from class org.apache.spark.ml.PipelineStage: params
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable: write
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol: getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol: getLabelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasOutputCol: getOutputCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable: toString
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.feature.SelectorParams: getFdr, getFpr, getFwe, getNumTopFeatures, getPercentile, getSelectorType
Constructor Details
- ChiSqSelector
  Deprecated.
- ChiSqSelector
  public ChiSqSelector()
  Deprecated.
Method Details
- load
  Deprecated.
- read
  Deprecated.
- uid
  Deprecated.
  Description copied from interface: Identifiable
  An immutable unique ID for the object and its derivatives.
  Returns: (undocumented)
 
- setNumTopFeatures
  Deprecated.
- setPercentile
  Deprecated.
- setFpr
  Deprecated.
- setFdr
  Deprecated.
- setFwe
  Deprecated.
- setSelectorType
  Deprecated.
- setFeaturesCol
  Deprecated.
- setOutputCol
  Deprecated.
- setLabelCol
  Deprecated.
- fit
  Deprecated.
  Description copied from class: Estimator
  Fits a model to the input data.
  Parameters: dataset - (undocumented)
  Returns: (undocumented)
 
- transformSchema
  Deprecated.
  Description copied from class: PipelineStage
  Check transform validity and derive the output schema from the input schema.
  We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by Param.validate().
  A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
  Parameters: schema - (undocumented)
  Returns: (undocumented)
 
- copy
  Deprecated.
  Description copied from interface: Params
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- fdr
  Description copied from interface: SelectorParams
  The upper bound of the expected false discovery rate. Only applicable when selectorType = "fdr". Default value is 0.05.
  Specified by: fdr in interface SelectorParams
  Returns: (undocumented)
 
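The Benjamini-Hochberg rule behind fdr can be sketched on precomputed per-feature p-values. The sketch sorts the p-values in ascending order and finds the largest rank k with p_(k) <= fdr * k / numFeatures, the standard step-up formulation. It then keeps the k features with the smallest p-values. The class FdrSelectionSketch and method selectFdr are hypothetical names for illustration, not part of Spark's API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical sketch of the Benjamini-Hochberg step-up rule behind fdr,
// operating on precomputed p-values; not Spark's implementation.
final class FdrSelectionSketch {

    // Returns the indices (ascending) of the features kept at the given fdr level.
    static int[] selectFdr(double[] pValues, double fdr) {
        int m = pValues.length;
        // Feature indices ordered by ascending p-value.
        Integer[] order = IntStream.range(0, m).boxed().toArray(Integer[]::new);
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));
        // Largest 1-based rank k with p_(k) <= fdr * k / m; stays 0 if no rank qualifies.
        int maxRank = 0;
        for (int k = 1; k <= m; k++) {
            if (pValues[order[k - 1]] <= fdr * k / m) {
                maxRank = k;
            }
        }
        // Keep every feature ranked at or before maxRank, then restore index order.
        return Arrays.stream(order, 0, maxRank).mapToInt(Integer::intValue).sorted().toArray();
    }
}
```

Take p-values {0.001, 0.03, 0.035, 0.9} with fdr = 0.05. The per-rank thresholds are 0.0125, 0.025, 0.0375 and 0.05. The third-ranked p-value (0.035) passes, so features 0, 1 and 2 are all kept, even though 0.03 exceeds its own rank threshold of 0.025. This step-up behaviour is what separates the procedure from a plain per-rank filter.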
- featuresCol
  Description copied from interface: HasFeaturesCol
  Param for features column name.
  Specified by: featuresCol in interface HasFeaturesCol
  Returns: (undocumented)
 
- fpr
  Description copied from interface: SelectorParams
  The highest p-value for features to be kept. Only applicable when selectorType = "fpr". Default value is 0.05.
  Specified by: fpr in interface SelectorParams
  Returns: (undocumented)
 
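Here is a minimal sketch of the fpr rule. It assumes precomputed p-values and a strict less-than comparison, keeping every feature whose p-value is below fpr. FprSelectionSketch and selectFpr are hypothetical names, not Spark API.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of the fpr rule on precomputed p-values; not Spark's implementation.
final class FprSelectionSketch {

    // Keeps every feature whose p-value is strictly below fpr (assumed comparison).
    static int[] selectFpr(double[] pValues, double fpr) {
        return IntStream.range(0, pValues.length)
                .filter(i -> pValues[i] < fpr)
                .toArray();
    }
}
```

For p-values {0.01, 0.05, 0.2, 0.049} and the default fpr of 0.05, this keeps features 0 and 3.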
- fwe
  Description copied from interface: SelectorParams
  The upper bound of the expected family-wise error rate. Only applicable when selectorType = "fwe". Default value is 0.05.
  Specified by: fwe in interface SelectorParams
  Returns: (undocumented)
 
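The fwe rule can be sketched the same way, again assuming precomputed p-values and a strict comparison. The threshold fwe is first divided by the number of features, a Bonferroni-style correction, and features below the scaled threshold are kept. FweSelectionSketch and selectFwe are hypothetical names.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of the fwe rule on precomputed p-values; not Spark's implementation.
final class FweSelectionSketch {

    static int[] selectFwe(double[] pValues, double fwe) {
        if (pValues.length == 0) {
            return new int[0];
        }
        // Scale the threshold by 1/numFeatures to bound the family-wise error rate.
        double threshold = fwe / pValues.length;
        return IntStream.range(0, pValues.length)
                .filter(i -> pValues[i] < threshold)
                .toArray();
    }
}
```

With four features and fwe = 0.05, the effective threshold is 0.0125. The p-values {0.01, 0.02, 0.001, 0.2} therefore keep features 0 and 2.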
- labelCol
  Description copied from interface: HasLabelCol
  Param for label column name.
  Specified by: labelCol in interface HasLabelCol
  Returns: (undocumented)
 
- numTopFeatures
  Description copied from interface: SelectorParams
  Number of features that the selector will select, ordered by ascending p-value. If the number of features is less than numTopFeatures, then all features are selected. Only applicable when selectorType = "numTopFeatures". The default value of numTopFeatures is 50.
  Specified by: numTopFeatures in interface SelectorParams
  Returns: (undocumented)
 
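Below is a sketch of the numTopFeatures rule on precomputed p-values. Features are ordered by ascending p-value and the first numTopFeatures are kept, or all of them if there are fewer. Breaking ties by feature index and rejecting values below 1 are assumptions of this sketch. TopFeaturesSketch and selectTop are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical sketch of the numTopFeatures rule; not Spark's implementation.
final class TopFeaturesSketch {

    // Returns the indices (ascending) of the numTopFeatures features with the smallest p-values.
    static int[] selectTop(double[] pValues, int numTopFeatures) {
        if (numTopFeatures < 1) {
            throw new IllegalArgumentException("numTopFeatures must be at least 1 in this sketch");
        }
        Integer[] order = IntStream.range(0, pValues.length).boxed().toArray(Integer[]::new);
        // Arrays.sort on objects is stable, so equal p-values keep their feature-index order.
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));
        // Select everything when there are fewer features than numTopFeatures.
        int keep = Math.min(numTopFeatures, pValues.length);
        return Arrays.stream(order, 0, keep).mapToInt(Integer::intValue).sorted().toArray();
    }
}
```

For p-values {0.3, 0.01, 0.2, 0.05} and numTopFeatures = 2, features 1 and 3 are kept. With the default of 50, all four features are kept.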
- outputCol
  Description copied from interface: HasOutputCol
  Param for output column name.
  Specified by: outputCol in interface HasOutputCol
  Returns: (undocumented)
 
- percentile
  Description copied from interface: SelectorParams
  Percentile of features that the selector will select, ordered by ascending p-value. Only applicable when selectorType = "percentile". Default value is 0.1.
  Specified by: percentile in interface SelectorParams
  Returns: (undocumented)
 
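The percentile rule differs from numTopFeatures only in how many features are kept. In this sketch, the count is the number of features multiplied by percentile and rounded down. That rounding, the [0, 1] range check, and tie-breaking by feature index are all assumptions of the sketch. PercentileSketch and selectPercentile are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical sketch of the percentile rule; not Spark's implementation.
final class PercentileSketch {

    // Keeps floor(numFeatures * percentile) features with the smallest p-values (assumed rounding).
    static int[] selectPercentile(double[] pValues, double percentile) {
        if (percentile < 0.0 || percentile > 1.0) {
            throw new IllegalArgumentException("percentile must be in [0, 1] in this sketch");
        }
        Integer[] order = IntStream.range(0, pValues.length).boxed().toArray(Integer[]::new);
        // Stable sort: equal p-values keep their feature-index order.
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));
        int keep = (int) (pValues.length * percentile);
        return Arrays.stream(order, 0, keep).mapToInt(Integer::intValue).sorted().toArray();
    }
}
```

For example, with four features and percentile = 0.5, the two features with the smallest p-values are kept.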
- selectorType
  Description copied from interface: SelectorParams
  The selector type. Supported options: "numTopFeatures" (default), "percentile", "fpr", "fdr", "fwe".
  Specified by: selectorType in interface SelectorParams
  Returns: (undocumented)
 