public class ChiSquareTest
extends Object
See Wikipedia for more information on the Chi-squared test.
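For reference, the textbook definition of Pearson's statistic for one feature with r distinct values tested against a label with c distinct values is given below. This is general background, not quoted from the Spark documentation. O_ij is the number of rows with the i-th feature value and the j-th label value, R_i and C_j are the row and column totals, and N is the number of rows:

$$
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}, \qquad \mathrm{df} = (r - 1)(c - 1)
$$

The p-value is the probability that a chi-squared random variable with df degrees of freedom is at least as large as the observed statistic.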
| Constructor and Description |
|---|
| ChiSquareTest() |
| Modifier and Type | Method and Description |
|---|---|
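| static Dataset<Row> | test(Dataset<Row> dataset, String featuresCol, String labelCol) - Conduct Pearson's independence test for every feature against the label. |
| static Dataset<Row> | test(Dataset<Row> dataset, String featuresCol, String labelCol, boolean flatten) - Conduct Pearson's independence test for every feature against the label, returning either a single Row or one row per feature depending on flatten. |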
public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol)
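Conduct Pearson's independence test for every feature against the label. The null hypothesis is that the occurrence of the outcomes is statistically independent, that is, that each feature is independent of the label.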
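To make the per-feature computation concrete, here is a minimal plain-Java sketch (no Spark) of Pearson's independence test for a single feature against the label, assuming both are given as double arrays of category values. The class ChiSquareSketch and its methods are hypothetical and are not Spark's implementation. The p-value uses the standard identity that the chi-squared upper tail with df degrees of freedom equals the regularized upper incomplete gamma function Q(df/2, statistic/2), evaluated with the usual series / continued-fraction split. Reporting p = 1 when df is 0 is a choice made in this sketch, not documented Spark behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Pearson's chi-squared independence test for ONE categorical
// feature against a categorical label. Illustrative only, not Spark's implementation.
final class ChiSquareSketch {
    // The three values reported per feature.
    static final class Result {
        final double statistic;
        final int degreesOfFreedom;
        final double pValue;
        Result(double statistic, int degreesOfFreedom, double pValue) {
            this.statistic = statistic;
            this.degreesOfFreedom = degreesOfFreedom;
            this.pValue = pValue;
        }
    }

    static Result test(double[] feature, double[] label) {
        if (feature.length == 0 || feature.length != label.length)
            throw new IllegalArgumentException("feature and label must be non-empty and equally long");
        // Every distinct value, even a real-valued one, becomes its own category.
        Map<Double, Integer> rowOf = categories(feature);
        Map<Double, Integer> colOf = categories(label);
        int rows = rowOf.size(), cols = colOf.size();
        // Contingency table of observed counts, plus row and column totals.
        long[][] observed = new long[rows][cols];
        long[] rowSum = new long[rows], colSum = new long[cols];
        for (int i = 0; i < feature.length; i++) {
            int r = rowOf.get(feature[i]);
            int c = colOf.get(label[i]);
            observed[r][c]++;
            rowSum[r]++;
            colSum[c]++;
        }
        // Sum (O - E)^2 / E, where E = rowTotal * colTotal / n is the count expected under independence.
        double statistic = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double expected = (double) rowSum[r] * colSum[c] / feature.length;
                double diff = observed[r][c] - expected;
                statistic += diff * diff / expected;
            }
        }
        int df = (rows - 1) * (cols - 1);
        return new Result(statistic, df, chiSquaredUpperTail(statistic, df));
    }

    // Assigns indices 0, 1, 2, ... to distinct values in order of first appearance.
    private static Map<Double, Integer> categories(double[] values) {
        Map<Double, Integer> index = new HashMap<>();
        for (double v : values) index.putIfAbsent(v, index.size());
        return index;
    }

    // P(X >= statistic) for X ~ chi-squared(df): the regularized upper incomplete gamma
    // Q(df/2, statistic/2). This sketch reports p = 1 when df == 0 (constant feature or label).
    static double chiSquaredUpperTail(double statistic, int df) {
        double a = df / 2.0, x = statistic / 2.0;
        if (df == 0 || x <= 0.0) return 1.0;
        double prefactor = Math.exp(-x + a * Math.log(x) - logGammaHalf(df));
        if (x < a + 1.0) {
            // Series for the lower function P(a, x), then Q = 1 - P.
            double term = 1.0 / a, sum = term, ap = a;
            for (int i = 0; i < 1000 && Math.abs(term) > Math.abs(sum) * 1e-15; i++) {
                ap += 1.0;
                term *= x / ap;
                sum += term;
            }
            return 1.0 - sum * prefactor;
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        final double tiny = 1e-300;
        double b = x + 1.0 - a, c = 1.0 / tiny, d = 1.0 / b, h = d;
        for (int i = 1; i < 1000; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < tiny) d = tiny;
            c = b + an / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-15) break;
        }
        return prefactor * h;
    }

    // log Gamma(k/2) for k >= 1, using Gamma(1) = 1, Gamma(1/2) = sqrt(pi), Gamma(a + 1) = a * Gamma(a).
    private static double logGammaHalf(int k) {
        boolean even = k % 2 == 0;
        double result = even ? 0.0 : 0.5 * Math.log(Math.PI);
        for (double a = even ? 1.0 : 0.5; a < k / 2.0; a += 1.0) result += Math.log(a);
        return result;
    }
}
```

For example, ChiSquareSketch.test(new double[] {0, 0, 1, 1}, new double[] {0, 0, 1, 1}) gives a statistic of 4.0 with 1 degree of freedom. The documented method reports one such (p-value, degrees of freedom, statistic) triple per feature of the vector column.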
Parameters:
dataset - DataFrame of categorical labels and categorical features. Real-valued features are treated as categorical, with each distinct value as a separate category.
featuresCol - Name of features column in dataset, of type Vector (VectorUDT)
labelCol - Name of label column in dataset, of any numerical type
Returns:
DataFrame containing the test result for every feature against the label, as a single Row with the following fields:
- pValues: Vector
- degreesOfFreedom: Array[Int]
- statistics: Vector
Each of these fields has one value per feature.
public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol, boolean flatten)
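Conduct Pearson's independence test for every feature against the label, as in the three-argument overload; flatten controls the shape of the returned DataFrame.
Parameters:
dataset - DataFrame of categorical labels and categorical features. Real-valued features are treated as categorical, with each distinct value as a separate category.
featuresCol - Name of features column in dataset, of type Vector (VectorUDT)
labelCol - Name of label column in dataset, of any numerical type
flatten - If false, the returned DataFrame contains only a single Row; otherwise, it contains one row per feature.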
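As documented, flatten decides whether the results come back as a single Row of per-feature arrays (flatten = false, as with test(dataset, featuresCol, labelCol)) or as one row per feature (flatten = true). The sketch below models that reshaping with plain Java arrays instead of Spark Rows, assuming entry i of each array belongs to feature i. The class FlattenSketch, its FeatureResult holder and the field name featureIndex are hypothetical and need not match the column names Spark uses for the flattened DataFrame.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two result layouts selected by flatten, using plain
// arrays instead of Spark Rows. Names are illustrative, not Spark's schema.
final class FlattenSketch {
    // One entry of the "one row per feature" layout (flatten = true).
    static final class FeatureResult {
        final int featureIndex;
        final double pValue;
        final int degreesOfFreedom;
        final double statistic;
        FeatureResult(int featureIndex, double pValue, int degreesOfFreedom, double statistic) {
            this.featureIndex = featureIndex;
            this.pValue = pValue;
            this.degreesOfFreedom = degreesOfFreedom;
            this.statistic = statistic;
        }
    }

    // Splits the single-row layout (flatten = false), where entry i of each array
    // belongs to feature i, into one record per feature.
    static List<FeatureResult> flatten(double[] pValues, int[] degreesOfFreedom, double[] statistics) {
        if (pValues.length != degreesOfFreedom.length || pValues.length != statistics.length)
            throw new IllegalArgumentException("each array must hold exactly one value per feature");
        List<FeatureResult> rows = new ArrayList<>(pValues.length);
        for (int i = 0; i < pValues.length; i++)
            rows.add(new FeatureResult(i, pValues[i], degreesOfFreedom[i], statistics[i]));
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder values for two features, not real test output.
        List<FeatureResult> rows = flatten(new double[] {0.05, 0.6}, new int[] {1, 2}, new double[] {3.84, 1.02});
        for (FeatureResult r : rows)
            System.out.println(r.featureIndex + " " + r.pValue + " " + r.degreesOfFreedom + " " + r.statistic);
    }
}
```

Running main prints one line per feature. Converting back to the single-row layout is the inverse: collect each field into an array indexed by feature.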