pyspark.RDD.cleanShuffleDependencies

RDD.cleanShuffleDependencies(blocking: bool = False) → None

Removes an RDD’s shuffles and its non-persisted ancestors.

When running without an external shuffle service, cleaning up shuffle files enables downscaling. If you intend to use the RDD after this call, checkpoint and materialize it before calling this method.

New in version 3.3.0.

Parameters
blocking : bool, optional

Whether to block until the shuffle cleanup tasks finish. Disabled by default.

Notes

This API is a developer API.