Package org.apache.spark.mllib.feature
Class ChiSqSelector
Object
org.apache.spark.mllib.feature.ChiSqSelector
- All Implemented Interfaces:
- Serializable
Creates a ChiSquared feature selector.
 The selector supports different selection methods: numTopFeatures, percentile, fpr, fdr, fwe.
  - numTopFeatures chooses a fixed number of top features according to a chi-squared test.
  - percentile is similar but chooses a fraction of all features instead of a fixed number.
  - fpr chooses all features whose p-values are below a threshold, thus controlling the false
    positive rate of selection.
  - fdr uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
    to choose all features whose false discovery rate is below a threshold.
  - fwe chooses all features whose p-values are below a threshold. The threshold is scaled by
    1/numFeatures, thus controlling the family-wise error rate of selection.
 By default, the selection method is numTopFeatures, with the default number of top features
 set to 50.
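The five selection rules described above can be sketched in plain Java, given one p-value per feature. This is a minimal illustration of the documented behaviour, not Spark's own code: the class `SelectionRulesSketch` and all of its method names are hypothetical, the p-values are made up for the example, and the truncation used for percentile is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of Spark.
final class SelectionRulesSketch {

    // Feature indices ordered by ascending p-value (most significant first).
    static Integer[] rankByPValue(double[] pValues) {
        Integer[] order = new Integer[pValues.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));
        return order;
    }

    // numTopFeatures: keep the n features with the smallest p-values.
    static List<Integer> topN(double[] pValues, int n) {
        Integer[] order = rankByPValue(pValues);
        int keep = Math.max(0, Math.min(n, order.length));
        List<Integer> selected = new ArrayList<>(Arrays.asList(order).subList(0, keep));
        Collections.sort(selected); // report indices in ascending order
        return selected;
    }

    // percentile: like topN, but n is a fraction of all features
    // (truncating to an integer is an assumption of this sketch).
    static List<Integer> percentile(double[] pValues, double fraction) {
        return topN(pValues, (int) (pValues.length * fraction));
    }

    // fpr: keep every feature whose p-value is below the threshold.
    static List<Integer> fpr(double[] pValues, double alpha) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            if (pValues[i] < alpha) {
                selected.add(i);
            }
        }
        return selected;
    }

    // fwe: same as fpr, with the threshold scaled by 1/numFeatures.
    static List<Integer> fwe(double[] pValues, double alpha) {
        return fpr(pValues, alpha / pValues.length);
    }

    // fdr: Benjamini-Hochberg. Find the largest rank k (1-based) with
    // p_(k) <= alpha * k / m, then keep the k smallest p-values.
    static List<Integer> fdr(double[] pValues, double alpha) {
        Integer[] order = rankByPValue(pValues);
        int m = order.length;
        int cutoff = 0;
        for (int k = 1; k <= m; k++) {
            if (pValues[order[k - 1]] <= alpha * k / m) {
                cutoff = k;
            }
        }
        return topN(pValues, cutoff);
    }

    public static void main(String[] args) {
        double[] p = {0.001, 0.20, 0.012, 0.04, 0.60}; // made-up p-values
        System.out.println(topN(p, 2));         // [0, 2]
        System.out.println(percentile(p, 0.5)); // [0, 2]
        System.out.println(fpr(p, 0.05));       // [0, 2, 3]
        System.out.println(fdr(p, 0.05));       // [0, 2]
        System.out.println(fwe(p, 0.05));       // [0]
    }
}
```

With the same threshold of 0.05, fpr keeps three features, fdr keeps two, and fwe keeps one, which shows how each rule is stricter than the one before it.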
- 
Constructor Summary
Constructors
- ChiSqSelector()
- ChiSqSelector(int numTopFeatures): This is the same as calling this() and setNumTopFeatures(numTopFeatures).
- 
Method Summary
Modifier and Type, Method, Description
- double fdr()
- fit(RDD<LabeledPoint> data): Returns a ChiSquared feature selector.
- double fpr()
- double fwe()
- int numTopFeatures()
- double percentile()
- selectorType()
- setFdr(double value)
- setFpr(double value)
- setFwe(double value)
- setNumTopFeatures(int value)
- setPercentile(double value)
- setSelectorType(String value)
- static String[] supportedSelectorTypes(): Set of selector types that ChiSqSelector supports.
- 
Constructor Details
- 
ChiSqSelector
public ChiSqSelector()
- 
ChiSqSelector
public ChiSqSelector(int numTopFeatures)
This is the same as calling this() and setNumTopFeatures(numTopFeatures).
- Parameters:
- numTopFeatures - (undocumented)
- 
Method Details
- 
supportedSelectorTypes
Set of selector types that ChiSqSelector supports.
- 
numTopFeatures
public int numTopFeatures()
- 
percentile
public double percentile()
- 
fpr
public double fpr()
- 
fdr
public double fdr()
- 
fwe
public double fwe()
- 
selectorType
- 
setNumTopFeatures
- 
setPercentile
- 
setFpr
- 
setFdr
- 
setFwe
- 
setSelectorType
- 
fit
Returns a ChiSquared feature selector.
- Parameters:
- data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features are treated as categorical, with each distinct value counted as its own category, so apply a feature discretizer before calling this method.
- Returns:
- (undocumented)
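Because fit treats every distinct value as a separate category, a continuous feature is normally binned first. The following is a minimal sketch of equal-width binning in plain Java. The class `DiscretizeSketch`, the method `toBins`, and the value range and bin count are all hypothetical choices for illustration; they are not part of Spark.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical helper, not part of Spark.
final class DiscretizeSketch {

    // Map each value to one of numBins equal-width bins over [min, max).
    // Bin indices are returned as doubles so they can be used directly
    // as feature values.
    static double[] toBins(double[] values, double min, double max, int numBins) {
        double width = (max - min) / numBins;
        double[] bins = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = (int) Math.floor((values[i] - min) / width);
            // Clamp so that out-of-range values still land in a valid bin.
            bins[i] = Math.max(0, Math.min(numBins - 1, bin));
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] raw = {0.0, 15.9, 16.0, 128.0, 255.0};
        // 16 bins of width 16 over [0, 256)
        System.out.println(Arrays.toString(toBins(raw, 0.0, 256.0, 16)));
        // [0.0, 0.0, 1.0, 8.0, 15.0]
    }
}
```

After binning, a feature that could take 256 distinct values has at most 16 categories, which keeps the contingency tables used by the chi-squared test small.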