Class KolmogorovSmirnovTest

Object
org.apache.spark.ml.stat.KolmogorovSmirnovTest

public class KolmogorovSmirnovTest extends Object
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution. By comparing the largest difference between the empirical cumulative distribution of the sample data and the theoretical distribution, we can test the null hypothesis that the sample data comes from that theoretical distribution. For more information on KS Test:
See Also:
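The statistic described above, the largest gap between the empirical CDF of the sample and a theoretical CDF, can be sketched in plain Java. This is a minimal illustration and not Spark's implementation: the class name `KsStatisticSketch`, the method `ksStatistic`, and the uniform CDF used in `main` are hypothetical names chosen for the example, and the p-value computation is left out.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, not part of Spark: computes the one-sample,
// two-sided KS statistic D = sup_x |F_n(x) - F(x)|.
final class KsStatisticSketch {

    static double ksStatistic(double[] sample, DoubleUnaryOperator cdf) {
        if (sample.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF jumps from i/n to (i+1)/n at sorted[i],
            // so the gap is checked on both sides of the jump.
            double above = (i + 1) / (double) n - f;
            double below = f - i / (double) n;
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    public static void main(String[] args) {
        // Assumed theoretical distribution for the demo: Uniform(0, 1).
        DoubleUnaryOperator uniformCdf = x -> Math.max(0.0, Math.min(1.0, x));
        double d = ksStatistic(new double[] {0.7, 0.1, 0.4}, uniformCdf);
        System.out.println("KS statistic: " + d);
    }
}
```

A large statistic relative to the sample size is evidence against the null hypothesis; the test methods below report it in the statistic field together with the corresponding pValue.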
  • Constructor Details

    • KolmogorovSmirnovTest

      public KolmogorovSmirnovTest()
  • Method Details

    • test

      public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, String distName, double... params)
      Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. Currently supports the normal distribution, taking as parameters the mean and standard deviation.

      Parameters:
      dataset - A Dataset or a DataFrame containing the sample of data to test
      sampleCol - Name of sample column in dataset, of any numerical type
      distName - Name of the theoretical distribution; currently only "norm" is supported.
      params - Variable-length list of doubles specifying the parameters of the theoretical distribution. For the "norm" distribution, the parameters are the mean and the standard deviation.
      Returns:
      DataFrame containing the test result for the input sampled data. This DataFrame will contain a single Row with the following fields:
      - pValue: Double
      - statistic: Double
    • test

      public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, scala.Function1<Object,Object> cdf)
    • test

      public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, Function<Double,Double> cdf)
    • test

      public static Dataset<Row> test(Dataset<?> dataset, String sampleCol, String distName, scala.collection.Seq<Object> params)
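
To make the "norm" parameters of the distName overloads concrete, here is a minimal sketch of a normal CDF parameterized by mean and standard deviation, which is the kind of function the cdf overloads accept. It is an illustration only and not how Spark evaluates the distribution: the class `NormalCdfSketch` and its methods are hypothetical, and `erf` uses the Abramowitz and Stegun polynomial approximation (formula 7.1.26, accurate to roughly 1e-7) because the Java standard library has no error function.

```java
// Hypothetical helper, not part of Spark: normal CDF with (mean, stdDev).
final class NormalCdfSketch {

    // Approximate error function (Abramowitz and Stegun 7.1.26).
    static double erf(double z) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(z));
        double poly = t * (0.254829592
                + t * (-0.284496736
                + t * (1.421413741
                + t * (-1.453152027
                + t * 1.061405429))));
        double y = 1.0 - poly * Math.exp(-z * z);
        return z >= 0 ? y : -y;
    }

    // The second parameter is the standard deviation, not the variance.
    static double normalCdf(double x, double mean, double stdDev) {
        if (stdDev <= 0) {
            throw new IllegalArgumentException("stdDev must be positive");
        }
        return 0.5 * (1.0 + erf((x - mean) / (stdDev * Math.sqrt(2.0))));
    }

    public static void main(String[] args) {
        // With mean 0 and standard deviation 2, x = 2 lies one standard
        // deviation above the mean.
        System.out.println(normalCdf(2.0, 0.0, 2.0));
    }
}
```

Passing a variance where a standard deviation is expected changes the theoretical distribution being tested, so the two should not be confused when filling in params.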