pyspark.RDD.localCheckpoint¶
- 
RDD.localCheckpoint() → None[source]¶
- Mark this RDD for local checkpointing using Spark’s existing caching layer. - This method is for users who wish to truncate RDD lineages while skipping the expensive step of replicating the materialized data in a reliable distributed file system. This is useful for RDDs with long lineages that need to be truncated periodically (e.g. GraphX). - Local checkpointing sacrifices fault-tolerance for performance. In particular, checkpointed data is written to ephemeral local storage in the executors instead of to a reliable, fault-tolerant storage. The effect is that if an executor fails during the computation, the checkpointed data may no longer be accessible, causing an irrecoverable job failure. - This is NOT safe to use with dynamic allocation, which removes executors along with their cached blocks. If you must use both features, you are advised to set spark.dynamicAllocation.cachedExecutorIdleTimeout to a high value. - The checkpoint directory set through - SparkContext.setCheckpointDir()is not used.- New in version 2.2.0. - Examples - >>> rdd = sc.range(5) >>> rdd.isLocallyCheckpointed() False - >>> rdd.localCheckpoint() >>> rdd.isLocallyCheckpointed() True