pyspark.RDD.checkpoint
RDD.checkpoint()
Mark this RDD for checkpointing. It will be saved to files inside the checkpoint directory set with SparkContext.setCheckpointDir(), and all references to its parent RDDs will be removed. Calling this method only marks the RDD; the data is written after the next job on this RDD completes. This function must be called before any job has been executed on this RDD. It is strongly recommended that this RDD be persisted in memory; otherwise, saving it to a file requires recomputing it.

New in version 0.7.0.
See also
Examples
>>> rdd = sc.range(5)
>>> rdd.is_checkpointed
False
>>> rdd.getCheckpointFile() is None
True

>>> rdd.checkpoint()
>>> rdd.is_checkpointed
True
>>> rdd.getCheckpointFile() is None
True

>>> rdd.count()
5
>>> rdd.is_checkpointed
True
>>> rdd.getCheckpointFile() is None
False