pyspark.sql.DataFrame.foreachPartition

DataFrame.foreachPartition(f: Callable[[Iterator[pyspark.sql.types.Row]], None]) → None[source]

Applies the f function to each partition of this DataFrame.

This a shorthand for df.rdd.foreachPartition().

New in version 1.3.0.

Parameters
ffunction

A function that accepts one parameter which will receive each partition to process.

Examples

>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> def func(itr):
...     for person in itr:
...         print(person.name)
...
>>> df.foreachPartition(func)