pyspark.RDD.flatMap

RDD.flatMap(f, preservesPartitioning=False)

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
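The "apply, then flatten" behaviour can be illustrated without a Spark cluster. The following is a minimal local sketch, not the PySpark implementation: `flat_map_local` is a hypothetical helper that works on a plain Python list, and it assumes, as the description states, that each result is flattened by exactly one level.

```python
from itertools import chain


def flat_map_local(f, elements):
    """Hypothetical local stand-in for RDD.flatMap on a plain list.

    Applies f to every element, then concatenates the resulting
    iterables into a single flat list (one level of flattening only).
    """
    return list(chain.from_iterable(map(f, elements)))


data = [2, 3, 4]

# A plain map keeps one result per input element, so the output is nested.
mapped = [list(range(1, x)) for x in data]

# Flat-mapping the same function concatenates the per-element results.
flattened = flat_map_local(lambda x: range(1, x), data)

# An element whose result is empty contributes nothing to the output.
dropped = flat_map_local(lambda x: [], data)
```

In this sketch `mapped` has one inner list per input element, while `flattened` is a single list whose length is the total number of items produced, so the output can be longer or shorter than the input.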

New in version 0.7.0.

Parameters

f (function): a function that turns a T into a sequence of U
preservesPartitioning (bool, optional, default False): indicates whether the input function preserves the partitioner; this should be False unless this is a pair RDD and the input function doesn’t modify the keys
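Why the keys matter for `preservesPartitioning` can be seen in a simplified local model. This sketch assumes hash partitioning of the form `hash(key) % num_partitions`. The helpers `partition_by_key`, `flat_map_partitions`, and `layout_still_valid` are hypothetical names introduced for illustration and are not part of PySpark.

```python
def partition_by_key(pairs, num_partitions):
    """Place each (key, value) pair in partition hash(key) % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def flat_map_partitions(f, parts):
    """Flat-map f within each partition, without moving data between partitions."""
    return [[out for elem in part for out in f(elem)] for part in parts]


def layout_still_valid(parts):
    """Check that every pair still sits in the partition its key hashes to."""
    n = len(parts)
    return all(hash(key) % n == i
               for i, part in enumerate(parts)
               for key, _ in part)


pairs = [(1, "a b"), (2, "c"), (3, "d e")]
parts = partition_by_key(pairs, 2)

# Keys untouched: every output pair stays where the partitioner expects it.
keeps_keys = flat_map_partitions(
    lambda kv: [(kv[0], word) for word in kv[1].split()], parts)

# Keys modified: output pairs may now sit in the "wrong" partition.
changes_keys = flat_map_partitions(
    lambda kv: [(kv[0] + 1, word) for word in kv[1].split()], parts)
```

In this model `layout_still_valid(keeps_keys)` holds and `layout_still_valid(changes_keys)` does not, which is the situation the parameter description warns about: a function that rewrites keys leaves the data laid out in a way the original partitioner no longer describes.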

Returns

RDD: a new RDD containing the flattened results of applying the function to every element

See also

RDD.map()
RDD.mapPartitions()
RDD.mapPartitionsWithIndex()
RDD.mapPartitionsWithSplit()

Examples

>>> rdd = sc.parallelize([2, 3, 4])
>>> sorted(rdd.flatMap(lambda x: range(1, x)).collect())
[1, 1, 1, 2, 2, 3]
>>> sorted(rdd.flatMap(lambda x: [(x, x), (x, x)]).collect())
[(2, 2), (2, 2), (3, 3), (3, 3), (4, 4), (4, 4)]