DataStreamReader.parquet(path: str, mergeSchema: Optional[bool] = None, pathGlobFilter: Union[bool, str, None] = None, recursiveFileLookup: Union[bool, str, None] = None, datetimeRebaseMode: Union[bool, str, None] = None, int96RebaseMode: Union[bool, str, None] = None) → DataFrame

Loads a Parquet file stream, returning the result as a DataFrame.

New in version 2.0.0.


Parameters
path : str
    the path in any Hadoop supported file system

Other Parameters
Extra options

For the extra options, refer to Data Source Option in the version you use.


Examples

Load a data stream from a temporary Parquet file.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a temporary Parquet file to read it.
...     spark.range(10).write.mode("overwrite").format("parquet").save(d)
...     # Start a streaming query to read the Parquet file.
...     q = spark.readStream.schema(
...         "id LONG").parquet(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()