pyspark.SparkContext.pickleFile

SparkContext.pickleFile(name, minPartitions=None)

Load an RDD previously saved with the RDD.saveAsPickleFile() method.

Examples

>>> from tempfile import NamedTemporaryFile
>>> # assumes an active SparkContext bound to `sc` (as in the PySpark shell)
>>> tmpFile = NamedTemporaryFile(delete=True)
>>> tmpFile.close()
>>> sc.parallelize(range(10)).saveAsPickleFile(tmpFile.name, 5)  # batchSize=5
>>> sorted(sc.pickleFile(tmpFile.name, 3).collect())  # minPartitions=3
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
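
Because saveAsPickleFile() stores pickled Python objects rather than string representations, arbitrary picklable values (dicts, tuples, and so on) round-trip intact through pickleFile(). The following is a minimal standalone sketch, not part of the API docs: the local SparkContext, app name, temporary directory handling, and sample records are illustrative assumptions.

# Standalone sketch: round-trip arbitrary picklable objects through
# saveAsPickleFile() / pickleFile(). Assumes a local Spark installation.
import shutil
import tempfile

from pyspark import SparkContext

sc = SparkContext("local[2]", "pickleFileExample")  # illustrative master/app name

path = tempfile.mkdtemp()
shutil.rmtree(path)  # the output directory must not exist before saving

records = [{"id": i, "tags": ("a", "b")} for i in range(4)]
sc.parallelize(records, 2).saveAsPickleFile(path)

# minPartitions is only a hint; Spark may choose more partitions.
loaded = sc.pickleFile(path, minPartitions=2)
print(sorted(loaded.collect(), key=lambda r: r["id"]))

sc.stop()
shutil.rmtree(path, ignore_errors=True)  # clean up the temporary output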