pyspark.sql.DataFrameWriter.csv

DataFrameWriter.csv(path, mode=None, compression=None, sep=None, quote=None, escape=None, header=None, nullValue=None, escapeQuotes=None, quoteAll=None, dateFormat=None, timestampFormat=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None, charToEscapeQuoteEscaping=None, encoding=None, emptyValue=None, lineSep=None)

Saves the content of the DataFrame in CSV format at the specified path.

New in version 2.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

path : str

The path in any Hadoop-supported file system.

mode : str, optional

Specifies the behavior of the save operation when data already exists at the path.

append: Append contents of this DataFrame to existing data.
overwrite: Overwrite existing data.
ignore: Silently ignore this operation if data already exists.
error or errorifexists (default): Throw an exception if data already exists.

Other Parameters

Extra options: Refer to Data Source Option in the CSV data source documentation for the Spark version you use.

Examples

Write a DataFrame into a CSV file and read it back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="csv") as d:
...     # Write a DataFrame into a CSV file
...     df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
...     df.write.csv(d, mode="overwrite")
...
...     # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon'.
...     spark.read.schema(df.schema).format("csv").option(
...         "nullValue", "Hyukjin Kwon").load(d).show()
+---+----+
|age|name|
+---+----+
|100|NULL|
+---+----+