Class SamplePathFilter

All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.fs.PathFilter

public class SamplePathFilter extends org.apache.hadoop.conf.Configured implements org.apache.hadoop.fs.PathFilter
Filter that allows loading a fraction of HDFS files.
  • Constructor Details

    • SamplePathFilter

      public SamplePathFilter()
  • Method Details

    • ratioParam

      public static String ratioParam()
    • seedParam

      public static String seedParam()
    • isFile

      public static boolean isFile(org.apache.hadoop.fs.Path path)
    • withPathFilter

      public static <T> T withPathFilter(double sampleRatio, SparkSession spark, long seed, scala.Function0<T> f)
      Sets the HDFS PathFilter flag, evaluates f, and then restores the flag. The filter is applied only if sampleRatio is less than 1.

      Parameters:
      sampleRatio - Fraction of the files that the filter picks
      spark - Existing Spark session
      seed - Random number seed
      f - The function to evaluate after setting the flag
      Returns:
      the evaluation result T of the function
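
      The set-then-restore behavior described for withPathFilter can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Spark's implementation: a Map stands in for the Hadoop configuration, a java.util.function.Supplier stands in for scala.Function0, and the class name and key names (PathFilterFlagSketch, FILTER_FLAG, RATIO_KEY, SEED_KEY) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class PathFilterFlagSketch {
    // Hypothetical key names, used only for this illustration.
    static final String FILTER_FLAG = "example.pathFilter.class";
    static final String RATIO_KEY = "example.sampleRatio";
    static final String SEED_KEY = "example.seed";

    // Sets the filter flag, evaluates f, then puts the configuration back as it was.
    static <T> T withPathFilter(double sampleRatio, Map<String, String> conf,
                                long seed, Supplier<T> f) {
        // A ratio of 1 or more means "keep everything", so no filter is installed.
        if (sampleRatio >= 1.0) {
            return f.get();
        }
        String previous = conf.get(FILTER_FLAG); // remember any earlier filter
        conf.put(RATIO_KEY, Double.toString(sampleRatio));
        conf.put(SEED_KEY, Long.toString(seed));
        conf.put(FILTER_FLAG, "SamplePathFilter");
        try {
            return f.get();
        } finally {
            // Restore even if f throws, so the flag never leaks to later reads.
            conf.remove(RATIO_KEY);
            conf.remove(SEED_KEY);
            if (previous == null) {
                conf.remove(FILTER_FLAG);
            } else {
                conf.put(FILTER_FLAG, previous);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Inside f the flag is visible; afterwards the configuration is empty again.
        String seen = withPathFilter(0.5, conf, 42L, () -> conf.get(FILTER_FLAG));
        System.out.println("flag during f: " + seen);
        System.out.println("conf restored: " + conf.isEmpty());
    }
}
```

      Using try/finally in the sketch keeps the restore step on the exceptional path as well, which matters when the configuration is shared by later reads.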
    • random

      public scala.util.Random random()
    • sampleRatio

      public double sampleRatio()
    • setConf

      public void setConf(org.apache.hadoop.conf.Configuration conf)
      Specified by:
      setConf in interface org.apache.hadoop.conf.Configurable
      setConf in class org.apache.hadoop.conf.Configured
    • accept

      public boolean accept(org.apache.hadoop.fs.Path path)
      Specified by:
      accept in interface org.apache.hadoop.fs.PathFilter
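
      This page does not document how accept decides which paths to keep. One plausible reading of "loading a fraction of HDFS files", combined with the random() and sampleRatio() accessors, is a seeded pseudo-random draw per path. The sketch below illustrates that idea only; SamplingFilterSketch is a hypothetical stand-alone class that takes a String path instead of org.apache.hadoop.fs.Path and does not implement the Hadoop PathFilter interface.

```java
import java.util.Random;

final class SamplingFilterSketch {
    private final Random random;
    private final double sampleRatio;

    SamplingFilterSketch(double sampleRatio, long seed) {
        this.sampleRatio = sampleRatio;
        this.random = new Random(seed); // fixed seed gives a repeatable sequence of draws
    }

    // Assumed rule: keep the path when a uniform draw in [0, 1) falls below the ratio.
    // The path itself is not inspected in this simplified version.
    boolean accept(String path) {
        return random.nextDouble() < sampleRatio;
    }

    public static void main(String[] args) {
        SamplingFilterSketch filter = new SamplingFilterSketch(0.25, 7L);
        int kept = 0;
        for (int i = 0; i < 1000; i++) {
            if (filter.accept("/data/img-" + i + ".png")) {
                kept++;
            }
        }
        // The kept count varies with the seed but should be near 25% of 1000.
        System.out.println("kept " + kept + " of 1000 paths");
    }
}
```

      Because each call consumes one draw, two filters built with the same ratio and seed accept the same paths only when they see the paths in the same order.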