pyspark.sql.DataFrameReader.json
DataFrameReader.json(path, schema=None, primitivesAsString=None, prefersDecimal=None, allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None, allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None, mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None, multiLine=None, allowUnquotedControlChars=None, lineSep=None, samplingRatio=None, dropFieldIfAllNull=None, encoding=None, locale=None, pathGlobFilter=None, recursiveFileLookup=None, modifiedBefore=None, modifiedAfter=None, allowNonNumericNumbers=None, useUnsafeRow=None)
Loads JSON files and returns the results as a DataFrame.

JSON Lines (newline-delimited JSON) is supported by default. For JSON (one record per file), set the multiLine parameter to true.

If the schema parameter is not specified, this function goes through the input once to determine the input schema.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.
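For instance, a minimal sketch of the multiLine case (the file name and record below are illustrative, not from this page):

>>> import os, tempfile
>>> with tempfile.TemporaryDirectory(prefix="json_multiline") as d:
...     path = os.path.join(d, "person.json")
...     with open(path, "w") as f:
...         _ = f.write('{\n  "age": 30,\n  "name": "Bob"\n}')
...
...     # multiLine=True treats each file as one JSON document rather than JSON Lines.
...     spark.read.json(path, multiLine=True).show()
+---+----+
|age|name|
+---+----+
| 30| Bob|
+---+----+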
Parameters
path : str, list or RDD
    string representing the path to the JSON dataset, a list of paths, or an RDD of strings storing JSON objects (see the RDD sketch after the examples below).
schema : pyspark.sql.types.StructType or str, optional
    an optional pyspark.sql.types.StructType for the input schema or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
 
Other Parameters
Extra options
    For the extra options, refer to Data Source Option for the version you use.
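As an illustrative sketch (the dateFormat value and sample file are assumptions, not from this page), such options can be passed as keyword arguments alongside a schema:

>>> import os, tempfile
>>> with tempfile.TemporaryDirectory(prefix="json_opts") as d:
...     path = os.path.join(d, "people.json")
...     with open(path, "w") as f:
...         _ = f.write('{"name": "Alice", "born": "1990-05-01"}')
...
...     # Pass dateFormat as a keyword option so "born" parses as a DATE column.
...     spark.read.json(path, dateFormat="yyyy-MM-dd", schema="name STRING, born DATE").show()
+-----+----------+
| name|      born|
+-----+----------+
|Alice|1990-05-01|
+-----+----------+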
 
Examples

Example 1: Write a DataFrame into a JSON file and read it back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="json1") as d:
...     # Write a DataFrame into a JSON file
...     spark.createDataFrame(
...         [{"age": 100, "name": "Hyukjin"}]
...     ).write.mode("overwrite").format("json").save(d)
...
...     # Read the JSON file as a DataFrame.
...     spark.read.json(d).show()
+---+-------+
|age|   name|
+---+-------+
|100|Hyukjin|
+---+-------+

Example 2: Read JSON from multiple files in a directory

>>> from tempfile import TemporaryDirectory
>>> with TemporaryDirectory(prefix="json2") as d1, TemporaryDirectory(prefix="json3") as d2:
...     # Write a DataFrame into the first JSON file
...     spark.createDataFrame(
...         [{"age": 30, "name": "Bob"}]
...     ).write.mode("overwrite").format("json").save(d1)
...
...     # Write another DataFrame into the second JSON file
...     spark.createDataFrame(
...         [{"age": 25, "name": "Alice"}]
...     ).write.mode("overwrite").format("json").save(d2)
...
...     # Read both JSON files as a single DataFrame.
...     spark.read.json([d1, d2]).show()
+---+-----+
|age| name|
+---+-----+
| 25|Alice|
| 30|  Bob|
+---+-----+

Example 3: Read JSON with a custom schema

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="json4") as d:
...     # Write a DataFrame into a JSON file
...     spark.createDataFrame(
...         [{"age": 30, "name": "Bob"}]
...     ).write.mode("overwrite").format("json").save(d)
...
...     # Read it back with an explicit DDL-formatted schema.
...     custom_schema = "name STRING, age INT"
...     spark.read.json(d, schema=custom_schema).show()
+----+---+
|name|age|
+----+---+
| Bob| 30|
+----+---+
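Beyond the examples above, path can also be an RDD of JSON strings, as noted in the parameter description. A minimal sketch of that form (the sample record is illustrative); note that spark.sparkContext, and therefore RDD input, is not available under Spark Connect:

>>> # Build an RDD of JSON strings and read it directly.
>>> rdd = spark.sparkContext.parallelize(['{"age": 25, "name": "Alice"}'])
>>> spark.read.json(rdd).show()
+---+-----+
|age| name|
+---+-----+
| 25|Alice|
+---+-----+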