pyspark.sql.DataFrame.inputFiles

DataFrame.inputFiles()[source]

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

New in version 3.1.0.

Examples

>>> df = spark.read.load("examples/src/main/resources/people.json", format="json")
>>> len(df.inputFiles())
1