pyspark.streaming.StreamingContext.checkpoint

StreamingContext.checkpoint(directory)

Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. The graph will be checkpointed every batch interval.

Parameters
directory : str

HDFS-compatible directory where the checkpoint data will be reliably stored
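
Examples

A minimal sketch of enabling checkpointing on a new context. The helper name `create_context`, the application name, the `local[2]` master, and the default batch interval are illustrative assumptions, not part of the API; only `SparkContext`, `StreamingContext`, and `checkpoint` come from PySpark.

```python
def create_context(checkpoint_dir, batch_seconds=1):
    """Build a StreamingContext with checkpointing enabled.

    `create_context` is a hypothetical helper used for illustration only.
    """
    # Imported inside the function so the sketch can be loaded
    # without a Spark installation on the import path.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Assumed settings: a local master with two threads and an arbitrary app name.
    sc = SparkContext(master="local[2]", appName="CheckpointSketch")
    ssc = StreamingContext(sc, batch_seconds)

    # Checkpoint data for the DStream operations is written to this directory.
    ssc.checkpoint(checkpoint_dir)
    return ssc
```

A call might look like `ssc = create_context("hdfs:///tmp/streaming-checkpoint")`, where the path is a placeholder for any HDFS-compatible directory available to the cluster.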