Package org.apache.spark.ml.stat
Class KolmogorovSmirnovTest
Object
org.apache.spark.ml.stat.KolmogorovSmirnovTest
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a
 continuous distribution. By measuring the largest difference between the empirical cumulative
 distribution of the sample data and the theoretical distribution, we can provide a test for
 the null hypothesis that the sample data comes from that theoretical distribution.
 For more information on KS Test:
- See Also:
- 
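The "largest difference" in the description above can be sketched in plain Java. This is a minimal illustration and not Spark's implementation: the class name `KsStatisticSketch`, the method `ksStatistic`, and the sample values are hypothetical, and the Uniform(0, 1) CDF is used only because it needs no special functions.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class KsStatisticSketch {

    /** Largest gap between the empirical CDF of the sample and a theoretical CDF. */
    public static double ksStatistic(double[] sample, DoubleUnaryOperator cdf) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF jumps from i/n to (i+1)/n at sorted[i],
            // so the gap is checked on both sides of the jump.
            double above = (i + 1) / (double) n - f;
            double below = f - i / (double) n;
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical sample, compared against the Uniform(0, 1) CDF.
        double[] sample = {0.1, 0.4, 0.7};
        DoubleUnaryOperator uniformCdf = x -> Math.min(1.0, Math.max(0.0, x));
        System.out.println("KS statistic: " + ksStatistic(sample, uniformCdf));
    }
}
```

A small statistic means the sample tracks the theoretical distribution closely; a large one is evidence against the null hypothesis.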
Constructor Summary
Method Summary
Constructor Details
KolmogorovSmirnovTest
public KolmogorovSmirnovTest()

Method Details
test
public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, String distName, double... params)
Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. Currently supports the normal distribution, taking as parameters the mean and standard deviation.
- Parameters:
- dataset - A Dataset or a DataFrame containing the sample of data to test
- sampleCol - Name of the sample column in dataset, of any numerical type
- distName - A String name for a theoretical distribution; currently only "norm" is supported.
- params - Double* specifying the parameters to be used for the theoretical distribution. For the "norm" distribution, the parameters are the mean and standard deviation.
- Returns:
- DataFrame containing the test result for the input sampled data.
         This DataFrame will contain a single Row with the following fields:
          - pValue: Double
          - statistic: Double
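To make the "norm" parameters and the two result fields more concrete, here is a hedged, stdlib-only Java sketch. It assumes the two parameters are the mean and standard deviation of the theoretical normal CDF, and it turns a statistic into a p-value with the textbook asymptotic Kolmogorov series. That series is a swapped-in approximation chosen for illustration: this page does not say how Spark computes `pValue`, so values may differ, especially for small samples. The names `NormKsSketch`, `normalCdf`, and `asymptoticPValue` are hypothetical.

```java
public class NormKsSketch {

    // Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Theoretical CDF for "norm", assuming params = (mean, standard deviation). */
    public static double normalCdf(double x, double mean, double stdDev) {
        return 0.5 * (1.0 + erf((x - mean) / (stdDev * Math.sqrt(2.0))));
    }

    /** Asymptotic two-sided p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n D^2). */
    public static double asymptoticPValue(double statistic, long n) {
        double lambda = Math.sqrt((double) n) * statistic;
        // For very small lambda the p-value is 1 to within rounding,
        // and the alternating series converges slowly, so return 1 directly.
        if (lambda < 0.2) {
            return 1.0;
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            sum += sign * Math.exp(-2.0 * k * k * lambda * lambda);
            sign = -sign;
        }
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        // Hypothetical parameters: mean 10.0, standard deviation 2.0.
        System.out.println("CDF at 12.0: " + normalCdf(12.0, 10.0, 2.0));
        // Hypothetical statistic 0.136 observed on 100 samples.
        System.out.println("approx. pValue: " + asymptoticPValue(0.136, 100));
    }
}
```

Passing a variance where a standard deviation is expected would rescale the theoretical CDF and change both result fields, which is why the parameter meaning matters.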
 