Package org.apache.spark.ml.stat
Class ChiSquareTest
Object
org.apache.spark.ml.stat.ChiSquareTest
Chi-square hypothesis testing for categorical data.
 
See Wikipedia for more information on the Chi-squared test.
Constructor Details
ChiSquareTest
public ChiSquareTest()
Method Details
test
public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol)
Conduct Pearson's independence test for every feature against the label. For each feature, the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared statistic is computed. All label and feature values must be categorical.

The null hypothesis is that the occurrence of the outcomes is statistically independent.

Parameters:
- dataset - DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
- featuresCol - Name of features column in dataset, of type Vector (VectorUDT)
- labelCol - Name of label column in dataset, of any numerical type
Returns:
- DataFrame containing the test result for every feature against the label. This DataFrame will contain a single Row with the following fields:
  - pValues: Vector
  - degreesOfFreedom: Array[Int]
  - statistics: Vector
  Each of these fields has one value per feature.
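The contingency-matrix step described for this method can be illustrated without Spark. The following is a minimal plain-Java sketch, not Spark's implementation: the class `ChiSquareSketch` and its methods are hypothetical names introduced here, and it assumes the textbook Pearson statistic (sum of (observed - expected)^2 / expected) for a single feature column. It does not compute p-values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of org.apache.spark.ml.stat.
final class ChiSquareSketch {

    // Assigns each distinct value a dense index in ascending order,
    // so every distinct value becomes its own category.
    private static TreeMap<Double, Integer> indexOf(double[] values) {
        TreeMap<Double, Integer> index = new TreeMap<>();
        for (double v : values) {
            index.put(v, 0);
        }
        int next = 0;
        for (Map.Entry<Double, Integer> e : index.entrySet()) {
            e.setValue(next++);
        }
        return index;
    }

    // counts[i][j] = number of (feature, label) pairs holding the i-th
    // distinct feature value and the j-th distinct label value.
    static long[][] contingency(double[] feature, double[] label) {
        if (feature.length != label.length) {
            throw new IllegalArgumentException("feature and label lengths differ");
        }
        TreeMap<Double, Integer> rows = indexOf(feature);
        TreeMap<Double, Integer> cols = indexOf(label);
        long[][] counts = new long[rows.size()][cols.size()];
        for (int k = 0; k < feature.length; k++) {
            counts[rows.get(feature[k])][cols.get(label[k])]++;
        }
        return counts;
    }

    // Pearson statistic: sum over cells of (observed - expected)^2 / expected,
    // where expected = rowTotal * colTotal / grandTotal.
    // Assumes every row and column has a nonzero total, which holds for
    // matrices built by contingency().
    static double statistic(long[][] observed) {
        int r = observed.length;
        int c = observed[0].length;
        double[] rowSum = new double[r];
        double[] colSum = new double[c];
        double total = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowSum[i] += observed[i][j];
                colSum[j] += observed[i][j];
                total += observed[i][j];
            }
        }
        double stat = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                double expected = rowSum[i] * colSum[j] / total;
                double diff = observed[i][j] - expected;
                stat += diff * diff / expected;
            }
        }
        return stat;
    }

    // Degrees of freedom of an r x c contingency matrix: (r - 1) * (c - 1).
    static int degreesOfFreedom(long[][] observed) {
        return (observed.length - 1) * (observed[0].length - 1);
    }

    public static void main(String[] args) {
        // Made-up toy data: one feature column and one label column.
        double[] feature = {0, 0, 0, 0, 1, 1, 1, 1};
        double[] label   = {0, 0, 0, 1, 0, 1, 1, 1};
        long[][] m = contingency(feature, label);
        System.out.println("statistic = " + statistic(m));
        System.out.println("degreesOfFreedom = " + degreesOfFreedom(m));
    }
}
```

For the toy data in `main`, the contingency matrix is [[3, 1], [1, 3]], every expected count is 2, and the sketch yields a statistic of 2.0 with 1 degree of freedom. In the method documented above, one such value per feature would populate the `statistics` and `degreesOfFreedom` fields.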
test
public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol, boolean flatten)
Parameters:
- dataset - DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
- featuresCol - Name of features column in dataset, of type Vector (VectorUDT)
- labelCol - Name of label column in dataset, of any numerical type
- flatten - If false, the returned DataFrame contains only a single Row; otherwise, it contains one row per feature.
- Returns:
- (undocumented)
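The difference between the single-Row and one-row-per-feature layouts can be sketched in plain Java. This is only an illustration of the reshaping, not Spark code: `FlattenSketch`, `FeatureResult`, and the per-row field names (`featureIndex`, `pValue`, `degreesOfFreedom`, `statistic`) are assumptions made for this example, since the return value of this overload is undocumented above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of org.apache.spark.ml.stat.
final class FlattenSketch {

    // One per-feature result; field names are assumed for illustration.
    static final class FeatureResult {
        final int featureIndex;
        final double pValue;
        final int degreesOfFreedom;
        final double statistic;

        FeatureResult(int featureIndex, double pValue, int degreesOfFreedom, double statistic) {
            this.featureIndex = featureIndex;
            this.pValue = pValue;
            this.degreesOfFreedom = degreesOfFreedom;
            this.statistic = statistic;
        }
    }

    // Turns the three parallel per-feature arrays of the single-Row layout
    // into a list holding one entry per feature.
    static List<FeatureResult> flatten(double[] pValues, int[] degreesOfFreedom, double[] statistics) {
        if (pValues.length != degreesOfFreedom.length || pValues.length != statistics.length) {
            throw new IllegalArgumentException("all fields must have one value per feature");
        }
        List<FeatureResult> rows = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            rows.add(new FeatureResult(i, pValues[i], degreesOfFreedom[i], statistics[i]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up placeholder values, not real test output.
        List<FeatureResult> rows = flatten(
            new double[]{0.5, 0.1}, new int[]{1, 2}, new double[]{0.45, 4.6});
        for (FeatureResult r : rows) {
            System.out.println(r.featureIndex + " " + r.pValue + " "
                + r.degreesOfFreedom + " " + r.statistic);
        }
    }
}
```

The flattened layout is typically easier to filter or sort per feature, while the single-Row layout keeps all features together in vector form.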