pyspark.sql.DataFrameWriter.parquet

DataFrameWriter.parquet(path, mode=None, partitionBy=None, compression=None)

Saves the content of the DataFrame in Parquet format at the specified path.

New in version 1.4.0.

Parameters
path : str

the path in any Hadoop-supported file system
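"Hadoop-supported" means the filesystem is selected by the URI scheme of the path (for example `hdfs://` or `s3a://`). A path with no scheme is resolved against Hadoop's `fs.defaultFS` setting, which is `file:///` unless the cluster configures otherwise. The sketch below models only that routing rule. `filesystem_scheme` is a hypothetical helper, not part of PySpark or Hadoop, and it ignores edge cases such as Windows drive letters.

```python
from urllib.parse import urlparse


def filesystem_scheme(path: str, default_fs: str = "file") -> str:
    """Hypothetical helper: name the filesystem a path would be routed to.

    Paths without a scheme fall back to the default filesystem
    (Hadoop's fs.defaultFS, assumed here to be the local "file" scheme).
    """
    scheme = urlparse(path).scheme
    # An empty scheme means "use the default filesystem".
    return scheme or default_fs


for p in ["/tmp/output/data", "hdfs://namenode:8020/warehouse/data", "s3a://my-bucket/data"]:
    print(p, "->", filesystem_scheme(p))
```

On a cluster whose default filesystem is HDFS, the same unqualified path `/tmp/output/data` would be written to HDFS rather than to local disk.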

mode : str, optional

specifies the behavior of the save operation when data already exists.

  • append: Append contents of this DataFrame to existing data.

  • overwrite: Overwrite existing data.

  • ignore: Silently ignore this operation if data already exists.

  • error or errorifexists (default case): Throw an exception if data already exists.
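The four modes reduce to a small decision table. The function below restates the rules listed above as plain Python so they can be checked in isolation. `plan_save` and its return values are hypothetical names chosen for illustration; this is not how Spark implements the modes internally.

```python
VALID_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def plan_save(mode, data_exists):
    """Hypothetical model: return 'write', 'append', 'overwrite' or 'skip'.

    Raises FileExistsError where the documented behaviour is to throw.
    """
    if mode is None:
        mode = "errorifexists"  # documented default
    if mode not in VALID_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    if not data_exists:
        return "write"  # with nothing at the path, every mode just writes
    if mode == "append":
        return "append"
    if mode == "overwrite":
        return "overwrite"
    if mode == "ignore":
        return "skip"  # silently do nothing
    # "error" / "errorifexists"
    raise FileExistsError("data already exists at the target path")


for m in ["append", "overwrite", "ignore"]:
    print(m, "->", plan_save(m, data_exists=True))
```

In PySpark the mode is passed either as `df.write.parquet(path, mode="overwrite")` or through the builder, as `df.write.mode("overwrite").parquet(path)`.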

partitionBy : str or list, optional

name or names of the columns to partition the output by
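Partitioning produces a Hive-style directory layout: each partition column becomes a `column=value` path segment, in the order the columns are given. Spark writes null partition values as `__HIVE_DEFAULT_PARTITION__`. The sketch below computes the directory a single row would land in. `partition_dir` is a hypothetical helper, and it deliberately skips the escaping that real writers apply to special characters in values.

```python
import posixpath

# Directory value used for null partition values (Hive convention).
NULL_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_dir(base, partition_cols, row):
    """Hypothetical helper: Hive-style directory for one row.

    partition_cols may be a single column name or a list, mirroring the
    str-or-list form accepted by partitionBy. Value escaping is omitted.
    """
    if isinstance(partition_cols, str):
        partition_cols = [partition_cols]
    parts = []
    for col in partition_cols:
        value = row[col]
        parts.append(f"{col}={NULL_PARTITION if value is None else value}")
    return posixpath.join(base, *parts)


row = {"year": 2024, "month": 1, "amount": 9.5}
print(partition_dir("/data/sales", ["year", "month"], row))
```

The corresponding PySpark call would be `df.write.parquet(path, partitionBy=["year", "month"])`. Because the partition values are encoded in the directory names, they are typically not stored again inside the Parquet files themselves.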

compression : str, optional

compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). If set, it overrides spark.sql.parquet.compression.codec. If None, the value of spark.sql.parquet.compression.codec is used.
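The precedence rule is: an explicit `compression` argument wins, otherwise the session setting `spark.sql.parquet.compression.codec` applies. The sketch below models that rule with a plain dictionary standing in for the session configuration. `resolve_codec` is a hypothetical helper; the fallback of `"snappy"` reflects Spark's default for that setting, and the `ValueError` stands in for whatever error Spark actually raises on an unknown name.

```python
# Codec short names listed in the parameter description.
KNOWN_CODECS = {"none", "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd"}


def resolve_codec(compression, session_conf):
    """Hypothetical helper: pick the codec a Parquet write would use.

    session_conf is a plain dict standing in for the Spark session config.
    """
    if compression is None:
        # No explicit argument: fall back to the session-wide setting,
        # assumed to default to "snappy" when it is not set.
        compression = session_conf.get("spark.sql.parquet.compression.codec", "snappy")
    codec = compression.lower()  # short names are case-insensitive
    if codec not in KNOWN_CODECS:
        raise ValueError(f"unsupported Parquet codec: {compression!r}")
    return codec


conf = {"spark.sql.parquet.compression.codec": "zstd"}
print(resolve_codec("GZIP", conf))  # explicit argument overrides the session value
print(resolve_codec(None, conf))    # falls back to the session value
```

In PySpark, the per-write form is `df.write.parquet(path, compression="zstd")`, and the session-wide form is `spark.conf.set("spark.sql.parquet.compression.codec", "zstd")`. Some codecs, such as lzo and brotli, may need additional codec libraries on the cluster.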

Examples

>>> import os, tempfile
>>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))