Package org.apache.spark.mllib.stat.test
Class ChiSqTest
Object
org.apache.spark.mllib.stat.test.ChiSqTest
Conducts the chi-squared test for the input RDDs using the specified method. The goodness-of-fit test is conducted on two Vectors, whereas the test of independence is conducted on an input of type Matrix, in which independence between columns is assessed.
A method is also provided for computing the chi-squared statistic between each feature and the label for an input RDD[LabeledPoint]; it returns an Array[ChiSqTestResult] whose size equals the number of features in the input RDD.
Supported methods for goodness of fit: pearson (default)
Supported methods for independence: pearson (default)
More information on Chi-squared test: http://en.wikipedia.org/wiki/Chi-squared_test
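The goodness-of-fit case described above can be illustrated with a small standalone sketch. It is not Spark's implementation and does not call ChiSqTest; the class name PearsonGoodnessOfFitSketch and its methods are hypothetical, plain arrays stand in for the two Vectors, and the rescaling of the expected counts to the observed total is an assumption of this sketch.

```java
// Hypothetical helper, not part of Spark: Pearson's goodness-of-fit statistic on plain arrays.
final class PearsonGoodnessOfFitSketch {

    /** Returns the sum over categories of (observed - expected)^2 / expected. */
    static double statistic(double[] observed, double[] expected) {
        if (observed.length != expected.length || observed.length == 0) {
            throw new IllegalArgumentException("observed and expected must be non-empty and of equal length");
        }
        double observedSum = 0.0;
        double expectedSum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] < 0.0 || expected[i] <= 0.0) {
                throw new IllegalArgumentException("observed must be >= 0 and expected must be > 0");
            }
            observedSum += observed[i];
            expectedSum += expected[i];
        }
        // Assumption of this sketch: rescale expected so both vectors have the same total.
        double scale = observedSum / expectedSum;
        double stat = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double e = expected[i] * scale;
            double diff = observed[i] - e;
            stat += diff * diff / e;
        }
        return stat;
    }

    /** Degrees of freedom for a goodness-of-fit test with the given number of categories. */
    static int degreesOfFreedom(int categories) {
        return categories - 1;
    }

    public static void main(String[] args) {
        double[] observed = {18.0, 22.0, 20.0};
        double[] expected = {1.0, 1.0, 1.0}; // uniform expectation, rescaled internally
        System.out.println("statistic = " + statistic(observed, expected));
        System.out.println("degrees of freedom = " + degreesOfFreedom(observed.length));
    }
}
```

With the sample counts in main, each rescaled expected count is 20, so the statistic is (4 + 4 + 0) / 20 = 0.4 with 2 degrees of freedom.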
-
Nested Class Summary
Modifier and Type | Class | Description
static class | | param: name String name for the method.
static class | |
static class | |
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static ChiSqTestResult | chiSquared(Vector observed, Vector expected, String methodName) |
static ChiSqTestResult[] | chiSquaredFeatures(RDD<LabeledPoint> data, String methodName) | Conduct Pearson's independence test for each feature against the label across the input RDD.
static ChiSqTestResult | chiSquaredMatrix(Matrix counts, String methodName) |
static org.apache.spark.internal.Logging.LogStringContext | LogStringContext(scala.StringContext sc) |
static org.slf4j.Logger | org$apache$spark$internal$Logging$$log_() |
static void | org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
static ChiSqTest.Method | PEARSON() |
-
Constructor Details
-
ChiSqTest
public ChiSqTest()
-
-
Method Details
-
PEARSON
public static ChiSqTest.Method PEARSON()
-
chiSquaredFeatures
public static ChiSqTestResult[] chiSquaredFeatures(RDD<LabeledPoint> data, String methodName)
Conduct Pearson's independence test for each feature against the label across the input RDD. The contingency table is constructed from the raw (feature, label) pairs and used to conduct the independence test. Returns an array containing the ChiSqTestResult for every feature against the label.
Parameters:
data - (undocumented)
methodName - (undocumented)
Returns:
(undocumented)
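The contingency-table step described for chiSquaredFeatures can be sketched for a single feature in plain Java. This is an illustration under stated assumptions, not Spark's code: the class PearsonIndependenceSketch and its methods are hypothetical, arrays of doubles stand in for the (feature, label) pairs of an RDD[LabeledPoint], and distinct values are indexed in sorted order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: Pearson's independence test for one feature.
final class PearsonIndependenceSketch {

    /** Maps each distinct value to a 0-based index, in ascending order of value. */
    private static Map<Double, Integer> indexDistinct(double[] values) {
        TreeMap<Double, Integer> index = new TreeMap<>();
        for (double v : values) {
            index.put(v, 0);
        }
        int next = 0;
        for (Map.Entry<Double, Integer> entry : index.entrySet()) {
            entry.setValue(next++);
        }
        return index;
    }

    /** counts[i][j] = number of pairs with the i-th distinct feature value and j-th distinct label. */
    static long[][] contingencyTable(double[] feature, double[] label) {
        if (feature.length != label.length || feature.length == 0) {
            throw new IllegalArgumentException("feature and label must be non-empty and of equal length");
        }
        Map<Double, Integer> rows = indexDistinct(feature);
        Map<Double, Integer> cols = indexDistinct(label);
        long[][] counts = new long[rows.size()][cols.size()];
        for (int k = 0; k < feature.length; k++) {
            counts[rows.get(feature[k])][cols.get(label[k])]++;
        }
        return counts;
    }

    /** Pearson statistic with expected[i][j] = rowSum[i] * colSum[j] / total. */
    static double statistic(long[][] counts) {
        int r = counts.length;
        int c = counts[0].length;
        double[] rowSums = new double[r];
        double[] colSums = new double[c];
        double total = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowSums[i] += counts[i][j];
                colSums[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double stat = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                double expected = rowSums[i] * colSums[j] / total;
                if (expected == 0.0) {
                    throw new IllegalArgumentException("table has an all-zero row or column");
                }
                double diff = counts[i][j] - expected;
                stat += diff * diff / expected;
            }
        }
        return stat;
    }

    /** Degrees of freedom: (rows - 1) * (columns - 1). */
    static int degreesOfFreedom(long[][] counts) {
        return (counts.length - 1) * (counts[0].length - 1);
    }
}
```

For example, the pairs (0,0), (0,0), (0,0), (0,1), (1,0), (1,1), (1,1), (1,1) give the table [[3, 1], [1, 3]], where every expected count is 2 and the statistic is 2.0 with 1 degree of freedom. Repeating the computation once per feature column would yield one result per feature, matching the array size described above.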
-
chiSquared
public static ChiSqTestResult chiSquared(Vector observed, Vector expected, String methodName)
-
chiSquaredMatrix
public static ChiSqTestResult chiSquaredMatrix(Matrix counts, String methodName)
-
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
-