public class ChiSqSelector
extends Object
implements scala.Serializable
Creates a ChiSquared feature selector. The selector supports different selection methods: numTopFeatures, percentile, fpr, fdr, fwe.
- numTopFeatures chooses a fixed number of top features according to a chi-squared test.
- percentile is similar but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
- fdr uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose all features whose false discovery rate is below a threshold.
- fwe chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
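
The three threshold-based methods above differ only in how a p-value is compared with the threshold. The following is a minimal plain-Java sketch of those rules applied to an array of per-feature p-values. It is not Spark's implementation: the class name `SelectionSketch`, the method names, and the sample p-values are hypothetical, and the choice of strict versus non-strict comparison is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Spark.
class SelectionSketch {

    // fpr: keep every feature whose p-value is below the threshold.
    static List<Integer> selectFpr(double[] pValues, double alpha) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            if (pValues[i] < alpha) {
                kept.add(i);
            }
        }
        return kept;
    }

    // fwe: same rule, with the threshold scaled by 1/numFeatures.
    static List<Integer> selectFwe(double[] pValues, double alpha) {
        return selectFpr(pValues, alpha / pValues.length);
    }

    // fdr: Benjamini-Hochberg step-up procedure. Sort p-values ascending,
    // find the largest rank k with p_(k) <= alpha * k / n, keep ranks 1..k.
    static List<Integer> selectFdr(double[] pValues, double alpha) {
        int n = pValues.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));

        int cutoff = -1; // zero-based rank of the last feature to keep
        for (int rank = 0; rank < n; rank++) {
            if (pValues[order[rank]] <= alpha * (rank + 1) / n) {
                cutoff = rank;
            }
        }

        List<Integer> kept = new ArrayList<>();
        for (int rank = 0; rank <= cutoff; rank++) {
            kept.add(order[rank]);
        }
        Collections.sort(kept); // report feature indices in ascending order
        return kept;
    }

    public static void main(String[] args) {
        double[] pValues = {0.001, 0.02, 0.04, 0.2}; // made-up p-values
        System.out.println(selectFpr(pValues, 0.05)); // [0, 1, 2]
        System.out.println(selectFdr(pValues, 0.05)); // [0, 1]
        System.out.println(selectFwe(pValues, 0.05)); // [0]
    }
}
```

With the same threshold of 0.05, fpr is the most permissive rule, fwe the strictest, and fdr falls in between on this sample.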
By default, the selection method is numTopFeatures, with the default number of top features set to 50.

Constructor and Description |
---|
ChiSqSelector() |
ChiSqSelector(int numTopFeatures) This is the same as calling this() and setNumTopFeatures(numTopFeatures). |

Modifier and Type | Method and Description |
---|---|
double | fdr() |
ChiSqSelectorModel | fit(RDD<LabeledPoint> data) Returns a ChiSquared feature selector. |
double | fpr() |
double | fwe() |
int | numTopFeatures() |
double | percentile() |
String | selectorType() |
ChiSqSelector | setFdr(double value) |
ChiSqSelector | setFpr(double value) |
ChiSqSelector | setFwe(double value) |
ChiSqSelector | setNumTopFeatures(int value) |
ChiSqSelector | setPercentile(double value) |
ChiSqSelector | setSelectorType(String value) |
static String[] | supportedSelectorTypes() Set of selector types that ChiSqSelector supports. |

public ChiSqSelector()
public ChiSqSelector(int numTopFeatures)
numTopFeatures - (undocumented)
public static String[] supportedSelectorTypes()
public int numTopFeatures()
public double percentile()
public double fpr()
public double fdr()
public double fwe()
public String selectorType()
public ChiSqSelector setNumTopFeatures(int value)
public ChiSqSelector setPercentile(double value)
public ChiSqSelector setFpr(double value)
public ChiSqSelector setFdr(double value)
public ChiSqSelector setFwe(double value)
public ChiSqSelector setSelectorType(String value)
public ChiSqSelectorModel fit(RDD<LabeledPoint> data)
data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features will be treated as categorical for each distinct value. Apply a feature discretizer before using this function.
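
Because each distinct real value is treated as its own category, continuous features are usually binned first. The following plain-Java sketch shows one simple way to do that, assuming equal-width bins. The class `DiscretizeSketch`, the method `discretize`, and the bin width of 16 are hypothetical choices for illustration and are not part of the Spark API.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark.
class DiscretizeSketch {

    // Maps each real-valued feature to the index of its equal-width bin,
    // so nearby values share one category instead of each being distinct.
    static double[] discretize(double[] features, double binWidth) {
        if (binWidth <= 0) {
            throw new IllegalArgumentException("binWidth must be positive");
        }
        double[] binned = new double[features.length];
        for (int i = 0; i < features.length; i++) {
            binned[i] = Math.floor(features[i] / binWidth);
        }
        return binned;
    }

    public static void main(String[] args) {
        double[] raw = {0.0, 15.9, 16.0, 255.0}; // made-up feature values
        System.out.println(Arrays.toString(discretize(raw, 16.0)));
        // [0.0, 0.0, 1.0, 15.0]
    }
}
```

The binned values would then replace the raw features in each LabeledPoint before calling fit, so that the chi-squared test sees a small number of categories per feature.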