Class ChiSqTest

Object
org.apache.spark.mllib.stat.test.ChiSqTest

public class ChiSqTest extends Object
Conduct the chi-squared test for the input RDDs using the specified method. The goodness-of-fit test is conducted on two Vectors (observed and expected), whereas the test of independence is conducted on an input of type Matrix in which independence between columns is assessed. We also provide a method for computing the chi-squared statistic between each feature and the label for an input RDD[LabeledPoint], returning an Array[ChiSqTestResult] whose size equals the number of features in the input RDD.

Supported methods for goodness of fit: pearson (default)
Supported methods for independence: pearson (default)

More information on Chi-squared test: http://en.wikipedia.org/wiki/Chi-squared_test
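For reference, the standard Pearson statistics behind the two tests are written out below. The symbols are our own notation, not identifiers from the Spark API: $O_i$ and $E_i$ are the observed and expected counts over $k$ categories; for an $r \times c$ contingency table, $O_{ij}$ are the cell counts, $R_i$ and $C_j$ the row and column totals, and $N$ the grand total. The p-value is read from a chi-squared distribution with the stated degrees of freedom.

```latex
% Goodness of fit (observed vs. expected over k categories)
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \mathrm{df} = k - 1

% Independence (r x c contingency table)
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```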

  • Constructor Details

    • ChiSqTest

      public ChiSqTest()
  • Method Details

    • PEARSON

      public static ChiSqTest.Method PEARSON()
      The Pearson chi-squared test method; its name, "pearson", is the value to pass as methodName.
    • chiSquaredFeatures

      public static ChiSqTestResult[] chiSquaredFeatures(RDD<LabeledPoint> data, String methodName)
      Conduct Pearson's independence test for each feature against the label across the input RDD. The contingency table is constructed from the raw (feature, label) pairs and used to conduct the independence test. Returns an array containing the ChiSqTestResult for every feature against the label.
      Parameters:
      data - RDD of LabeledPoint whose feature values and labels are treated as categories when building the contingency tables
      methodName - name of the test method; only "pearson" (the default) is supported
      Returns:
      an array of ChiSqTestResult, one per feature
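To make "the contingency table is constructed from the raw (feature, label) pairs" concrete, here is a minimal plain-Java sketch for a single feature. It does not use Spark and is not Spark's implementation; the class `FeatureLabelChiSq` and method `pearsonIndependence` are hypothetical names, and the feature column and labels are assumed to be given as two equal-length `double[]` arrays. Each distinct feature value becomes a row, each distinct label a column, and Pearson's statistic is computed from expected counts `rowSum * colSum / total`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java sketch (not Spark code): Pearson's independence test of one
// feature against the label, with the contingency table built from raw (feature, label) pairs.
final class FeatureLabelChiSq {

    // Returns {statistic, degreesOfFreedom} for one feature column against the labels.
    static double[] pearsonIndependence(double[] feature, double[] label) {
        if (feature.length == 0 || feature.length != label.length) {
            throw new IllegalArgumentException("feature and label must be non-empty and of equal length");
        }
        // Give each distinct feature value a row index and each distinct label a column index.
        Map<Double, Integer> rowIndex = new LinkedHashMap<>();
        Map<Double, Integer> colIndex = new LinkedHashMap<>();
        for (int i = 0; i < feature.length; i++) {
            rowIndex.putIfAbsent(feature[i], rowIndex.size());
            colIndex.putIfAbsent(label[i], colIndex.size());
        }
        int r = rowIndex.size();
        int c = colIndex.size();
        // Contingency table: counts[row][col] = occurrences of (feature value, label).
        double[][] counts = new double[r][c];
        double[] rowSum = new double[r];
        double[] colSum = new double[c];
        for (int i = 0; i < feature.length; i++) {
            int row = rowIndex.get(feature[i]);
            int col = colIndex.get(label[i]);
            counts[row][col] += 1.0;
            rowSum[row] += 1.0;
            colSum[col] += 1.0;
        }
        double total = feature.length;
        // Sum (observed - expected)^2 / expected, with expected = rowSum * colSum / total.
        // Every row and column occurs at least once, so expected is never zero.
        double stat = 0.0;
        for (int row = 0; row < r; row++) {
            for (int col = 0; col < c; col++) {
                double expected = rowSum[row] * colSum[col] / total;
                double diff = counts[row][col] - expected;
                stat += diff * diff / expected;
            }
        }
        return new double[] {stat, (double) (r - 1) * (c - 1)};
    }
}
```

With feature {0, 0, 1, 1} and label {0, 0, 1, 1} the table is [[2, 0], [0, 2]], every expected count is 1, and the statistic is 4 with 1 degree of freedom; chiSquaredFeatures conceptually repeats this for every feature column.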
    • chiSquared

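      public static ChiSqTestResult chiSquared(Vector observed, Vector expected, String methodName)
      Conduct the goodness-of-fit test of the observed counts against the expected distribution. Only "pearson" is supported as methodName.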
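The goodness-of-fit statistic compared between the observed and expected vectors can be sketched in plain Java as below. This is an illustration, not Spark's code: `GoodnessOfFitSketch` and `pearson` are hypothetical names, plain `double[]` arrays stand in for `Vector`, and the sketch assumes (as a common convention, not something stated on this page) that the expected values are rescaled to the observed total, so they may be given as counts or as probabilities.

```java
// Hypothetical plain-Java sketch (not Spark code) of Pearson's goodness-of-fit statistic.
// Assumption: expected values are rescaled so that they sum to the observed total.
final class GoodnessOfFitSketch {

    // Returns {statistic, degreesOfFreedom}.
    static double[] pearson(double[] observed, double[] expected) {
        if (observed.length != expected.length || observed.length < 2) {
            throw new IllegalArgumentException("need two equal-length vectors with at least 2 categories");
        }
        double obsSum = 0.0;
        double expSum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] < 0.0 || expected[i] <= 0.0) {
                throw new IllegalArgumentException("observed must be >= 0 and expected must be > 0");
            }
            obsSum += observed[i];
            expSum += expected[i];
        }
        // Rescale expected (e.g. probabilities) to the same total as the observed counts.
        double scale = obsSum / expSum;
        double stat = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double e = expected[i] * scale;
            double diff = observed[i] - e;
            stat += diff * diff / e;
        }
        // k categories give k - 1 degrees of freedom.
        return new double[] {stat, observed.length - 1};
    }
}
```

For observed {10, 20, 30} against a uniform expected {1, 1, 1}, each expected count becomes 20 and the statistic is (100 + 0 + 100) / 20 = 10 with 2 degrees of freedom.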
    • chiSquaredMatrix

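      public static ChiSqTestResult chiSquaredMatrix(Matrix counts, String methodName)
      Conduct the test of independence on the contingency table given by counts. Only "pearson" is supported as methodName.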
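As an illustration of what the independence test computes on a contingency matrix, the plain-Java sketch below takes the counts as a column-major `double[]`, the same layout used by Spark's `Matrices.dense(numRows, numCols, values)`. It is not Spark's implementation; `IndependenceSketch` and `pearson` are hypothetical names, and rejecting empty rows or columns is a choice of this sketch (their expected counts would be zero).

```java
// Hypothetical plain-Java sketch (not Spark code) of Pearson's independence statistic
// for a numRows x numCols contingency table stored in column-major order.
final class IndependenceSketch {

    // Returns {statistic, degreesOfFreedom}.
    static double[] pearson(int numRows, int numCols, double[] colMajor) {
        if (numRows < 1 || numCols < 1 || colMajor.length != numRows * numCols) {
            throw new IllegalArgumentException("values must hold numRows * numCols entries");
        }
        double[] rowSum = new double[numRows];
        double[] colSum = new double[numCols];
        double total = 0.0;
        for (int j = 0; j < numCols; j++) {
            for (int i = 0; i < numRows; i++) {
                double v = colMajor[j * numRows + i]; // entry (i, j) in column-major order
                rowSum[i] += v;
                colSum[j] += v;
                total += v;
            }
        }
        // An all-zero row or column would give an expected count of zero.
        for (double s : rowSum) {
            if (s == 0.0) throw new IllegalArgumentException("row with zero total");
        }
        for (double s : colSum) {
            if (s == 0.0) throw new IllegalArgumentException("column with zero total");
        }
        double stat = 0.0;
        for (int j = 0; j < numCols; j++) {
            for (int i = 0; i < numRows; i++) {
                double expected = rowSum[i] * colSum[j] / total;
                double diff = colMajor[j * numRows + i] - expected;
                stat += diff * diff / expected;
            }
        }
        return new double[] {stat, (double) (numRows - 1) * (numCols - 1)};
    }
}
```

For the 2 x 2 table with rows [10, 20] and [30, 40] (column-major {10, 30, 20, 40}), the expected counts are 12, 28, 18 and 42, giving a statistic of 4/12 + 4/28 + 4/18 + 4/42 = 50/63 with 1 degree of freedom.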
    • org$apache$spark$internal$Logging$$log_

      public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
    • org$apache$spark$internal$Logging$$log__$eq

      public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)