HadoopFSUtils (Spark 3.5.5 JavaDoc)

Object
- org.apache.spark.util.HadoopFSUtils

```
public class HadoopFSUtils
extends Object
```
Utility functions to simplify and speed up file listing.

Constructor Summary

Constructors
Constructor and Description

HadoopFSUtils()

Method Summary

Modifier and Type	Method and Description
`static void`	`org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)`
`static org.slf4j.Logger`	`org$apache$spark$internal$Logging$$log_()`
`static scala.collection.Seq<scala.Tuple2<org.apache.hadoop.fs.Path,scala.collection.Seq<org.apache.hadoop.fs.FileStatus>>>`	`parallelListLeafFiles(SparkContext sc, scala.collection.Seq<org.apache.hadoop.fs.Path> paths, org.apache.hadoop.conf.Configuration hadoopConf, org.apache.hadoop.fs.PathFilter filter, boolean ignoreMissingFiles, boolean ignoreLocality, int parallelismThreshold, int parallelismMax)` Lists a collection of paths recursively.
`static boolean`	`shouldFilterOutPathName(String pathName)` Checks if we should filter out this path name.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- HadoopFSUtils
```
public HadoopFSUtils()
```

Method Detail

parallelListLeafFiles

```
public static scala.collection.Seq<scala.Tuple2<org.apache.hadoop.fs.Path,scala.collection.Seq<org.apache.hadoop.fs.FileStatus>>> parallelListLeafFiles(SparkContext sc,
    scala.collection.Seq<org.apache.hadoop.fs.Path> paths,
    org.apache.hadoop.conf.Configuration hadoopConf,
    org.apache.hadoop.fs.PathFilter filter,
    boolean ignoreMissingFiles,
    boolean ignoreLocality,
    int parallelismThreshold,
    int parallelismMax)
```

Lists a collection of paths recursively. Picks the listing strategy adaptively depending on the number of paths to list.

This may only be called on the driver.

Parameters:
- `sc` - Spark context used to run parallel listing.
- `paths` - Input paths to list.
- `hadoopConf` - Hadoop configuration.
- `filter` - Path filter used to exclude leaf files from the result.
- `ignoreMissingFiles` - Whether to ignore files that go missing during recursive listing (e.g., due to race conditions).
- `ignoreLocality` - Whether to skip fetching data locality info when listing leaf files. If true, this returns FileStatus without BlockLocation info.
- `parallelismThreshold` - The threshold to enable parallelism. If the number of input paths is smaller than this value, this falls back to sequential listing.
- `parallelismMax` - The maximum parallelism for listing. If the number of input paths is larger than this value, parallelism is throttled to this value to avoid generating too many tasks.

Returns:
- For each input path, the set of discovered files for the path.
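The interaction between `parallelismThreshold` and `parallelismMax` can be sketched in plain Java. This is only an illustration of the documented behavior, not Spark's implementation: the class and method names (`ListingStrategySketch`, `useParallelListing`, `numTasks`) are hypothetical, the numeric values are illustrative, and the boundary case (path count equal to the threshold) follows the wording of the parameter description, which the real code may treat differently.

```java
final class ListingStrategySketch {

    // Hypothetical helper: per the parameter description, listing stays
    // sequential while the number of input paths is smaller than the threshold.
    static boolean useParallelListing(int numPaths, int parallelismThreshold) {
        return numPaths >= parallelismThreshold;
    }

    // Hypothetical helper: when listing in parallel, the number of tasks is
    // capped at parallelismMax to avoid generating too many tasks.
    static int numTasks(int numPaths, int parallelismMax) {
        return Math.min(numPaths, parallelismMax);
    }

    public static void main(String[] args) {
        int threshold = 32;   // illustrative value
        int max = 10000;      // illustrative value
        for (int numPaths : new int[] {10, 32, 50000}) {
            if (useParallelListing(numPaths, threshold)) {
                System.out.println(numPaths + " paths: parallel, "
                    + numTasks(numPaths, max) + " tasks");
            } else {
                System.out.println(numPaths + " paths: sequential on the driver");
            }
        }
    }
}
```

With few input paths the listing runs sequentially on the driver. With many, the work is distributed but never split into more than `parallelismMax` tasks.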

shouldFilterOutPathName
```
public static boolean shouldFilterOutPathName(String pathName)
```
Checks if we should filter out this path name.
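This page does not state which names are filtered. The sketch below is an assumption, not something documented here. It supposes that names starting with `_` (unless they contain `=`) or `.`, or ending with `._COPYING_`, are filtered out, while names starting with `_metadata` or `_common_metadata` are kept. The class `PathNameFilterSketch` and its method are hypothetical stand-ins, so check the Spark source for the authoritative behavior.

```java
final class PathNameFilterSketch {

    // Assumed rules (not taken from this page): hide names that look like
    // hidden or in-progress files, but keep Parquet-style metadata files.
    static boolean shouldFilterOut(String pathName) {
        boolean exclude =
            (pathName.startsWith("_") && !pathName.contains("="))
                || pathName.startsWith(".")
                || pathName.endsWith("._COPYING_");
        boolean include =
            pathName.startsWith("_common_metadata")
                || pathName.startsWith("_metadata");
        return exclude && !include;
    }

    public static void main(String[] args) {
        String[] names = {
            "part-00000.parquet", "_SUCCESS", ".hidden",
            "_metadata", "_col=1", "data.csv._COPYING_"
        };
        for (String name : names) {
            System.out.println(name + " -> " + shouldFilterOut(name));
        }
    }
}
```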

org$apache$spark$internal$Logging$$log_

```
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
```

org$apache$spark$internal$Logging$$log__$eq

```
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
```
