pyspark.sql.streaming.DataStreamReader.load

DataStreamReader.load(path: Optional[str] = None, format: Optional[str] = None, schema: Union[pyspark.sql.types.StructType, str, None] = None, **options: OptionalPrimitiveType) → DataFrame[source]

Loads a data stream from a data source and returns it as a DataFrame.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Parameters
pathstr, optional

optional string for file-system backed data sources.

formatstr, optional

optional string for format of the data source. Default to ‘parquet’.

schemapyspark.sql.types.StructType or str, optional

optional pyspark.sql.types.StructType for the input schema or a DDL-formatted string (For example col0 INT, col1 DOUBLE).

**optionsdict

all other string options

Notes

This API is evolving.

Examples

Load a data stream from a temporary JSON file.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a temporary JSON file to read it.
...     spark.createDataFrame(
...         [(100, "Hyukjin Kwon"),], ["age", "name"]
...     ).write.mode("overwrite").format("json").save(d)
...
...     # Start a streaming query to read the JSON file.
...     q = spark.readStream.schema(
...         "age INT, name STRING"
...     ).format("json").load(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()