Package org.apache.spark.mllib.feature
Class ChiSqSelector
Object
org.apache.spark.mllib.feature.ChiSqSelector
- All Implemented Interfaces:
 Serializable
Creates a ChiSquared feature selector.
 The selector supports five selection methods: numTopFeatures, percentile, fpr, fdr, fwe.
  - numTopFeatures chooses a fixed number of top features according to a chi-squared test.
  - percentile is similar but chooses a fraction of all features instead of a fixed number.
  - fpr chooses all features whose p-values are below a threshold, thus controlling the false
    positive rate of selection.
  - fdr uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
    to choose all features whose false discovery rate is below a threshold.
  - fwe chooses all features whose p-values are below a threshold. The threshold is scaled by
    1/numFeatures, thus controlling the family-wise error rate of selection.
 By default, the selection method is numTopFeatures, with the default number of top features
 set to 50.
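The five selection rules described above can be sketched in plain Java, applied to an array of per-feature p-values. This is an illustrative sketch only, not Spark's implementation: the class name SelectionRulesSketch and its method names are hypothetical, the ranking is assumed to be by ascending p-value, the percentile count is assumed to be truncated to an integer, and the p-values in main are made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Hypothetical helper illustrating the selection rules; not part of Spark. */
final class SelectionRulesSketch {

    /** Feature indices ordered by ascending p-value (ties keep index order). */
    static int[] rankByPValue(double[] p) {
        return IntStream.range(0, p.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> p[i]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** First k ranked indices, returned in ascending index order. */
    static int[] firstK(int[] ranked, int k) {
        int[] out = Arrays.copyOf(ranked, Math.max(0, Math.min(k, ranked.length)));
        Arrays.sort(out);
        return out;
    }

    /** numTopFeatures: keep a fixed number of best-ranked features. */
    static int[] numTopFeatures(double[] p, int n) {
        return firstK(rankByPValue(p), n);
    }

    /** percentile: keep a fraction of all features (count truncated, by assumption). */
    static int[] percentile(double[] p, double fraction) {
        return firstK(rankByPValue(p), (int) (p.length * fraction));
    }

    /** fpr: keep every feature whose p-value is below alpha. */
    static int[] fpr(double[] p, double alpha) {
        return IntStream.range(0, p.length).filter(i -> p[i] < alpha).toArray();
    }

    /** fwe: same as fpr, with the threshold scaled by 1/numFeatures. */
    static int[] fwe(double[] p, double alpha) {
        return fpr(p, alpha / p.length);
    }

    /** fdr: Benjamini-Hochberg; keep the k best where k is the largest rank with p(k) <= alpha*k/m. */
    static int[] fdr(double[] p, double alpha) {
        int[] ranked = rankByPValue(p);
        int keep = 0;
        for (int k = 1; k <= ranked.length; k++) {
            if (p[ranked[k - 1]] <= alpha * k / ranked.length) {
                keep = k;
            }
        }
        return firstK(ranked, keep);
    }

    public static void main(String[] args) {
        double[] p = {0.001, 0.30, 0.012, 0.04, 0.8}; // made-up p-values
        System.out.println("numTopFeatures(3): " + Arrays.toString(numTopFeatures(p, 3)));
        System.out.println("percentile(0.5):   " + Arrays.toString(percentile(p, 0.5)));
        System.out.println("fpr(0.05):         " + Arrays.toString(fpr(p, 0.05)));
        System.out.println("fdr(0.05):         " + Arrays.toString(fdr(p, 0.05)));
        System.out.println("fwe(0.05):         " + Arrays.toString(fwe(p, 0.05)));
    }
}
```

With these sample p-values and a threshold of 0.05, fpr keeps features 0, 2 and 3, fdr keeps 0 and 2, and fwe keeps only feature 0, which shows how the three threshold-based methods become progressively stricter.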
 
- 
Constructor Summary
Constructors

| Constructor | Description |
|---|---|
| ChiSqSelector() | |
| ChiSqSelector(int numTopFeatures) | Equivalent to calling this() followed by setNumTopFeatures(numTopFeatures). |

- 
Method Summary
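| Modifier and Type | Method | Description |
|---|---|---|
| double | fdr() | |
| ChiSqSelectorModel | fit(RDD<LabeledPoint> data) | Returns a ChiSquared feature selector. |
| double | fpr() | |
| double | fwe() | |
| int | numTopFeatures() | |
| double | percentile() | |
| String | selectorType() | |
| ChiSqSelector | setFdr(double value) | |
| ChiSqSelector | setFpr(double value) | |
| ChiSqSelector | setFwe(double value) | |
| ChiSqSelector | setNumTopFeatures(int value) | |
| ChiSqSelector | setPercentile(double value) | |
| ChiSqSelector | setSelectorType(String value) | |
| static String[] | supportedSelectorTypes() | Set of selector types that ChiSqSelector supports. |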
- 
Constructor Details
- 
ChiSqSelector
public ChiSqSelector() - 
ChiSqSelector
public ChiSqSelector(int numTopFeatures)
Equivalent to calling this() followed by setNumTopFeatures(numTopFeatures).
- Parameters:
 numTopFeatures - the number of top features to select
 
 - 
 - 
Method Details
- 
supportedSelectorTypes
public static String[] supportedSelectorTypes()
Set of selector types that ChiSqSelector supports. - 
numTopFeatures
public int numTopFeatures() - 
percentile
public double percentile() - 
fpr
public double fpr() - 
fdr
public double fdr() - 
fwe
public double fwe() - 
selectorType
 - 
setNumTopFeatures
 - 
setPercentile
 - 
setFpr
 - 
setFdr
 - 
setFwe
 - 
setSelectorType
 - 
fit
Returns a ChiSquared feature selector.
- Parameters:
 data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features are treated as categorical, with each distinct value counted as its own category, so apply a feature discretizer before calling this function.
- Returns:
 (undocumented)
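To illustrate why discretization matters before fit, the following plain-Java sketch bins a real-valued feature into a small number of categories. It is a hedged example only: DiscretizeSketch and its methods are hypothetical names, equal-width binning is just one possible discretization scheme (the source does not prescribe one), and the sample values are invented.

```java
import java.util.Arrays;

/** Hypothetical helper showing one way to discretize a real-valued feature; not part of Spark. */
final class DiscretizeSketch {

    /** Maps each value to an equal-width bin index in [0, numBins), stored as a double. */
    static double[] equalWidthBins(double[] values, int numBins) {
        if (numBins <= 0) {
            throw new IllegalArgumentException("numBins must be positive");
        }
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / numBins;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant feature has zero width and collapses into bin 0.
            int bin = width == 0.0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would land in bin numBins, so clamp it into the last bin.
            out[i] = Math.min(bin, numBins - 1);
        }
        return out;
    }

    /** Number of distinct values, i.e. the number of categories a chi-squared test would see. */
    static long distinctCount(double[] values) {
        return Arrays.stream(values).distinct().count();
    }

    public static void main(String[] args) {
        double[] raw = {1.0, 1.5, 2.25, 4.0, 6.5, 7.75, 9.0}; // invented sample values
        double[] binned = equalWidthBins(raw, 4);
        System.out.println("categories before: " + distinctCount(raw));
        System.out.println("categories after:  " + distinctCount(binned));
        System.out.println("binned values:     " + Arrays.toString(binned));
    }
}
```

Without binning, each of the seven raw values would be its own category; after binning into four equal-width bins, the feature has four categories, which gives a chi-squared test more observations per category.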
 
 
 -