pyspark.sql.streaming.DataStreamReader.json

DataStreamReader.json(path, schema=None, primitivesAsString=None, prefersDecimal=None, allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None, allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None, mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None, multiLine=None, allowUnquotedControlChars=None, lineSep=None, locale=None, dropFieldIfAllNull=None, encoding=None, pathGlobFilter=None, recursiveFileLookup=None, allowNonNumericNumbers=None)[source]

Loads a JSON file stream and returns the results as a DataFrame.

JSON Lines (newline-delimited JSON) is supported by default. For JSON (one record per file), set the multiLine parameter to true.

If the schema parameter is not specified, this function goes through the input once to determine the input schema.

New in version 2.0.0.

Parameters
pathstr

string represents path to the JSON dataset, or RDD of Strings storing JSON objects.

schemapyspark.sql.types.StructType or str, optional

an optional pyspark.sql.types.StructType for the input schema or a DDL-formatted string (For example col0 INT, col1 DOUBLE).

Other Parameters
Extra options

For the extra options, refer to Data Source Option in the version you use.

Notes

This API is evolving.

Examples

>>> json_sdf = spark.readStream.json(tempfile.mkdtemp(), schema = sdf_schema)
>>> json_sdf.isStreaming
True
>>> json_sdf.schema == sdf_schema
True