public class ChiSquareTest
extends Object
See Wikipedia for more information on the Chi-squared test.
Constructor and Description |
---|
ChiSquareTest() |
Modifier and Type | Method and Description |
---|---|
static Dataset<Row> | test(Dataset<Row> dataset, String featuresCol, String labelCol) Conduct Pearson's independence test for every feature against the label. |
static Dataset<Row> | test(Dataset<Row> dataset, String featuresCol, String labelCol, boolean flatten) |
public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol)
The null hypothesis is that the occurrence of the outcomes is statistically independent.
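As a rough illustration of what a Pearson independence test computes for a single feature, the sketch below builds a contingency matrix from (feature, label) pairs, treating every distinct value as its own category, and derives the chi-squared statistic and degrees of freedom. The class `PearsonSketch` and its methods are hypothetical helpers written for this example; they are not part of Spark and are not claimed to mirror its implementation. The sketch assumes no continuity correction and does not compute a p-value, which would require a chi-squared distribution function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not part of Spark): Pearson's chi-squared
// statistic for ONE feature column against a label column.
final class PearsonSketch {

    // Collects the distinct values of a column in ascending order.
    private static List<Double> distinctSorted(double[] xs) {
        TreeSet<Double> set = new TreeSet<>();
        for (double x : xs) {
            set.add(x);
        }
        return new ArrayList<>(set);
    }

    // Rows are distinct feature values, columns are distinct label values.
    static long[][] contingency(double[] feature, double[] label) {
        if (feature.length == 0 || feature.length != label.length) {
            throw new IllegalArgumentException(
                "feature and label must be non-empty and of equal length");
        }
        List<Double> rows = distinctSorted(feature);
        List<Double> cols = distinctSorted(label);
        long[][] counts = new long[rows.size()][cols.size()];
        for (int i = 0; i < feature.length; i++) {
            counts[rows.indexOf(feature[i])][cols.indexOf(label[i])]++;
        }
        return counts;
    }

    // Sum over all cells of (observed - expected)^2 / expected, where
    // expected = rowTotal * colTotal / grandTotal under independence.
    static double statistic(long[][] counts) {
        int r = counts.length;
        int c = counts[0].length;
        double[] rowSum = new double[r];
        double[] colSum = new double[c];
        double total = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double chi2 = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                double expected = rowSum[i] * colSum[j] / total;
                double diff = counts[i][j] - expected;
                chi2 += diff * diff / expected;
            }
        }
        return chi2;
    }

    // (number of feature categories - 1) * (number of label categories - 1)
    static int degreesOfFreedom(long[][] counts) {
        return (counts.length - 1) * (counts[0].length - 1);
    }

    public static void main(String[] args) {
        double[] feature = {0, 0, 1, 1, 1, 0};
        double[] label = {0, 0, 1, 1, 0, 1};
        long[][] table = contingency(feature, label);
        System.out.println("statistic = " + statistic(table));
        System.out.println("degreesOfFreedom = " + degreesOfFreedom(table));
    }
}
```

A large statistic relative to the degrees of freedom is evidence against the null hypothesis of independence. Because every row and column of the matrix comes from a value that was actually observed, no expected count is zero in this sketch.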
dataset
- DataFrame of categorical labels and categorical features.
Real-valued features will be treated as categorical for each distinct value.
featuresCol
- Name of features column in dataset, of type Vector (VectorUDT)
labelCol
- Name of label column in dataset, of any numerical type
Returns a DataFrame containing a single Row with the following fields:
- pValues: Vector
- degreesOfFreedom: Array[Int]
- statistics: Vector
Each of these fields has one value per feature.

public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol, boolean flatten)
dataset
- DataFrame of categorical labels and categorical features.
Real-valued features will be treated as categorical for each distinct value.
featuresCol
- Name of features column in dataset, of type Vector (VectorUDT)
labelCol
- Name of label column in dataset, of any numerical type
flatten
- If false, the returned DataFrame contains only a single Row; otherwise, it contains one row per feature.