pyspark.sql.DataFrameWriter.orc

DataFrameWriter.orc(path, mode=None, partitionBy=None, compression=None)

Saves the content of the DataFrame in ORC format at the specified path.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
path : str

the path in any Hadoop-supported file system

mode : str, optional

specifies the behavior of the save operation when data already exists.

  • append: Append contents of this DataFrame to existing data.

  • overwrite: Overwrite existing data.

  • ignore: Silently ignore this operation if data already exists.

  • error or errorifexists (default case): Throw an exception if data already exists.

partitionBy : str or list, optional

names of partitioning columns

Other Parameters
Extra options

For the extra options, refer to Data Source Option for the version you use.

Examples

Write a DataFrame into an ORC file and read it back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="orc") as d:
...     # Write a DataFrame into an ORC file.
...     spark.createDataFrame(
...         [{"age": 100, "name": "Hyukjin Kwon"}]
...     ).write.orc(d, mode="overwrite")
...
...     # Read the ORC file as a DataFrame.
...     spark.read.format("orc").load(d).show()
+---+------------+
|age|        name|
+---+------------+
|100|Hyukjin Kwon|
+---+------------+