Package org.apache.spark.ml.stat
Class KolmogorovSmirnovTest
Object
org.apache.spark.ml.stat.KolmogorovSmirnovTest
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a
 continuous distribution. By comparing the largest difference between the empirical cumulative
 distribution of the sample data and the theoretical distribution, we can provide a test for
 the null hypothesis that the sample data comes from that theoretical distribution.
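The "largest difference" described above can be illustrated with a minimal plain-Java sketch. This is not Spark's implementation; the class name KsStatisticSketch and the method ksStatistic are hypothetical names chosen for illustration, and the theoretical CDF is passed in as a function.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class KsStatisticSketch {

    /**
     * Returns the largest absolute gap between the empirical CDF of the
     * sample and the supplied theoretical CDF (hypothetical helper).
     */
    public static double ksStatistic(double[] sample, DoubleUnaryOperator cdf) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF steps from i/n to (i+1)/n at sorted[i],
            // so the gap is checked on both sides of the step.
            double below = f - (double) i / n;
            double above = (double) (i + 1) / n - f;
            d = Math.max(d, Math.max(below, above));
        }
        return d;
    }

    public static void main(String[] args) {
        // Assumed example: compare three points against the Uniform(0, 1) CDF.
        DoubleUnaryOperator uniformCdf = x -> Math.min(1.0, Math.max(0.0, x));
        double[] sample = {0.1, 0.4, 0.7};
        System.out.println("KS statistic: " + ksStatistic(sample, uniformCdf));
    }
}
```

A larger statistic means the sample departs further from the theoretical distribution, which makes the null hypothesis less plausible.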
Constructor Details
KolmogorovSmirnovTest
public KolmogorovSmirnovTest()
Method Details
test
public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, String distName, double... params)
Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. Currently supports the normal distribution, taking as parameters the mean and standard deviation.
- Parameters:
 dataset - A Dataset or a DataFrame containing the sample of data to test
 sampleCol - Name of the sample column in dataset, of any numerical type
 distName - A String name for a theoretical distribution; currently only "norm" is supported
 params - Double* specifying the parameters to be used for the theoretical distribution. For the "norm" distribution, the parameters are the mean and standard deviation.
- Returns:
 DataFrame containing the test result for the input sampled data.
 This DataFrame will contain a single Row with the following fields:
  - pValue: Double
  - statistic: Double
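A hedged usage sketch of the method above, assuming Spark SQL and MLlib are on the classpath and a local master is acceptable. The class name KsTestUsageSketch, the column name "sample", and the data values are illustrative assumptions, not part of the API.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.stat.KolmogorovSmirnovTest;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class KsTestUsageSketch {
    public static void main(String[] args) {
        // Assumption: a local session is sufficient for this illustration.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("KsTestUsageSketch")
            .getOrCreate();

        // Illustrative sample values in a single numeric column named "sample".
        List<Row> rows = Arrays.asList(
            RowFactory.create(0.1),
            RowFactory.create(0.15),
            RowFactory.create(0.2),
            RowFactory.create(0.3),
            RowFactory.create(0.25));
        StructType schema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("sample", DataTypes.DoubleType, false)
        });
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Test against a normal distribution with mean 0.0 and standard deviation 1.0.
        Dataset<Row> result = KolmogorovSmirnovTest.test(df, "sample", "norm", 0.0, 1.0);

        // The result holds a single Row with the fields pValue and statistic.
        Row row = result.head();
        double pValue = row.getDouble(row.fieldIndex("pValue"));
        double statistic = row.getDouble(row.fieldIndex("statistic"));
        System.out.println("pValue = " + pValue + ", statistic = " + statistic);

        spark.stop();
    }
}
```

A small pValue suggests rejecting the null hypothesis that the column was drawn from the specified normal distribution.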