public class CheckpointRDD<T> extends RDD<T>
| Constructor and Description |
|---|
CheckpointRDD(SparkContext sc,
String checkpointPath,
scala.reflect.ClassTag<T> evidence$1) |
| Modifier and Type | Method and Description |
|---|---|
Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> |
broadcastedConf() |
void |
checkpoint()
Mark this RDD for checkpointing.
|
String |
checkpointPath() |
scala.collection.Iterator<T> |
compute(Partition split,
TaskContext context)
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
|
org.apache.hadoop.fs.FileSystem |
fs() |
Partition[] |
getPartitions()
Implemented by subclasses to return the set of partitions in this RDD.
|
scala.collection.Seq<String> |
getPreferredLocations(Partition split)
Optionally overridden by subclasses to specify placement preferences.
|
static void |
main(String[] args) |
static <T> scala.collection.Iterator<T> |
readFromFile(org.apache.hadoop.fs.Path path,
Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf,
TaskContext context) |
static String |
splitIdToFile(int splitId) |
static <T> void |
writeToFile(String path,
Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf,
int blockSize,
TaskContext ctx,
scala.collection.Iterator<T> iterator,
scala.reflect.ClassTag<T> evidence$2) |
aggregate, cache, cartesian, checkpointData, coalesce, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doCheckpoint, elementClassTag, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getCreationSite, getNarrowAncestors, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, markCheckpointed, max, min, name, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, reduce, repartition, retag, retag, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public CheckpointRDD(SparkContext sc, String checkpointPath, scala.reflect.ClassTag<T> evidence$1)
public static String splitIdToFile(int splitId)
public static <T> void writeToFile(String path, Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf, int blockSize, TaskContext ctx, scala.collection.Iterator<T> iterator, scala.reflect.ClassTag<T> evidence$2)
public static <T> scala.collection.Iterator<T> readFromFile(org.apache.hadoop.fs.Path path, Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf, TaskContext context)
public static void main(String[] args)
public String checkpointPath()
public Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf()
public org.apache.hadoop.fs.FileSystem fs()
public Partition[] getPartitions()
Specified by: getPartitions in class RDD&lt;T&gt;
public scala.collection.Seq<String> getPreferredLocations(Partition split)
Overrides: getPreferredLocations in class RDD&lt;T&gt;
public scala.collection.Iterator<T> compute(Partition split, TaskContext context)
Specified by: compute in class RDD&lt;T&gt;
public void checkpoint()
Overrides:
checkpoint in class RDD&lt;T&gt;