pyspark.sql.streaming.DataStreamReader.json
DataStreamReader.json(path, schema=None, primitivesAsString=None, prefersDecimal=None, allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None, allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None, mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None, multiLine=None, allowUnquotedControlChars=None, lineSep=None, locale=None, dropFieldIfAllNull=None, encoding=None, pathGlobFilter=None, recursiveFileLookup=None, allowNonNumericNumbers=None, useUnsafeRow=None)
Loads a JSON file stream and returns the results as a DataFrame.

JSON Lines (newline-delimited JSON) is supported by default. For JSON (one record per file), set the multiLine parameter to true.

If the schema parameter is not specified, this function goes through the input once to determine the input schema.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Parameters
path : str
string representing the path to the JSON dataset.
schema : pyspark.sql.types.StructType or str, optional
an optional pyspark.sql.types.StructType for the input schema, or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
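The two schema forms are interchangeable. A minimal sketch, not part of the original docs, assuming an active SparkSession bound to spark and a hypothetical input directory /tmp/json-input:

>>> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
>>> # Equivalent schemas: a programmatic StructType and a DDL-formatted string.
>>> struct_schema = StructType([
...     StructField("age", IntegerType()),
...     StructField("name", StringType()),
... ])
>>> df1 = spark.readStream.json("/tmp/json-input", schema=struct_schema)  # doctest: +SKIP
>>> df2 = spark.readStream.json("/tmp/json-input", schema="age INT, name STRING")  # doctest: +SKIP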
 
Other Parameters
Extra options
For the extra options, refer to Data Source Option in the version you use.
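Extra options can be passed either as keyword arguments to json() or through option() calls on the reader. A minimal sketch, not from the original docs; the schema, option values, and /tmp/json-input path are illustrative assumptions:

>>> # Read multi-line JSON permissively, routing malformed records into a
>>> # dedicated column (the column must be declared in the schema).
>>> df = (
...     spark.readStream
...     .schema("age INT, name STRING, _corrupt_record STRING")
...     .option("multiLine", "true")
...     .option("mode", "PERMISSIVE")
...     .option("columnNameOfCorruptRecord", "_corrupt_record")
...     .json("/tmp/json-input")
... )  # doctest: +SKIP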
 
Notes

This API is evolving.

Examples

Load a data stream from a temporary JSON file.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory(prefix="json") as d:
...     # Write a temporary JSON file to read it.
...     spark.createDataFrame(
...         [(100, "Hyukjin Kwon"),], ["age", "name"]
...     ).write.mode("overwrite").format("json").save(d)
...
...     # Start a streaming query to read the JSON file.
...     q = spark.readStream.schema(
...         "age INT, name STRING"
...     ).json(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()
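Instead of printing batches to the console, the same stream can be collected into an in-memory table for inspection. A sketch under the same assumptions as the example above; the query name people is an arbitrary choice, and the memory sink is intended for testing rather than production:

>>> # Collect the stream into an in-memory table named "people".
>>> q = (
...     spark.readStream.schema("age INT, name STRING").json(d)
...     .writeStream.format("memory").queryName("people").start()
... )  # doctest: +SKIP
>>> q.processAllAvailable()  # wait until all available input is processed  # doctest: +SKIP
>>> spark.sql("SELECT * FROM people").show()  # doctest: +SKIP
>>> q.stop()  # doctest: +SKIP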