Package org.apache.spark.ml.stat
Class KolmogorovSmirnovTest
Object
org.apache.spark.ml.stat.KolmogorovSmirnovTest
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a
continuous distribution. By computing the largest difference between the empirical cumulative
distribution of the sample data and the theoretical distribution, we can test
the null hypothesis that the sample data comes from that theoretical distribution.
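To make the "largest difference" concrete, here is a minimal plain-Java sketch of the KS statistic. It is not Spark's implementation: the class `KsStatisticSketch` and the method `ksStatistic` are hypothetical names introduced for illustration, and the uniform CDF on [0, 1] is used only because it keeps the arithmetic easy to follow.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class KsStatisticSketch {

    /**
     * Returns the largest absolute gap between the empirical CDF of the sample
     * and the supplied theoretical CDF (hypothetical helper, not a Spark API).
     */
    public static double ksStatistic(double[] sample, DoubleUnaryOperator cdf) {
        if (sample.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted point,
            // so the gap has to be checked on both sides of the jump.
            double above = (i + 1) / (double) n - f;
            double below = f - i / (double) n;
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    public static void main(String[] args) {
        // Assumption for illustration: theoretical distribution is Uniform(0, 1).
        DoubleUnaryOperator uniformCdf = x -> Math.min(1.0, Math.max(0.0, x));
        double[] sample = {0.7, 0.1, 0.4};
        System.out.println(ksStatistic(sample, uniformCdf));
    }
}
```

For the three-point sample above, the largest gap occurs at 0.7, where the empirical CDF reaches 1 while the uniform CDF is 0.7, so the statistic is about 0.3. Turning the statistic into a p-value requires the Kolmogorov distribution, which this sketch leaves out.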
For more information on KS Test:
- See Also:
-
Constructor Details
-
KolmogorovSmirnovTest
public KolmogorovSmirnovTest()
-
-
Method Details
-
test
public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, String distName, double... params)
Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. Currently supports the normal distribution, taking as parameters the mean and standard deviation.
- Parameters:
dataset - A Dataset or a DataFrame containing the sample of data to test
sampleCol - Name of the sample column in dataset, of any numerical type
distName - A String name for a theoretical distribution; currently only "norm" is supported
params - Double* specifying the parameters to be used for the theoretical distribution. For the "norm" distribution, the parameters are the mean and standard deviation.
- Returns:
DataFrame containing the test result for the input sampled data. This DataFrame will contain a single Row with the following fields:
- pValue: Double
- statistic: Double
-
test
-
test
-
test
-