pyspark.SparkContext.broadcast

SparkContext.broadcast(value: T) → pyspark.broadcast.Broadcast[T]

Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions. The value is sent to each machine only once and cached there, instead of being shipped along with every task.
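To make the "sent only once" behavior concrete, here is a toy, pure-Python model of the idea. It is not PySpark's implementation: the class `ToyBroadcast`, its `value_on` method and the node names are hypothetical, and a real cluster moves bytes over the network rather than between dictionaries. The value is pickled once on the driver, each simulated node deserializes it the first time a task there asks for it, and later tasks on the same node reuse the cached copy.

```python
import pickle
from typing import Any, Dict


class ToyBroadcast:
    """Hypothetical toy model of broadcast semantics (not PySpark's code).

    The driver pickles the value once; each simulated node unpickles it
    at most once and then serves every task on that node from its cache.
    """

    def __init__(self, value: Any) -> None:
        self._payload = pickle.dumps(value)    # serialized exactly once on the "driver"
        self._node_cache: Dict[str, Any] = {}  # node name -> deserialized local copy
        self.transfers = 0                     # how many times the payload was "shipped"

    def value_on(self, node: str) -> Any:
        # The first task on a node fetches and unpickles the payload;
        # later tasks on the same node reuse the cached object.
        if node not in self._node_cache:
            self.transfers += 1
            self._node_cache[node] = pickle.loads(self._payload)
        return self._node_cache[node]


mapping = {1: 10001, 2: 10002}
bc = ToyBroadcast(mapping)

# Six tasks spread over two nodes, each looking up one key (default -1).
tasks = [("node-a", i) for i in range(3)] + [("node-b", i) for i in range(3, 6)]
results = [bc.value_on(node).get(i, -1) for node, i in tasks]

print(results)       # [-1, 10001, 10002, -1, -1, -1]
print(bc.transfers)  # 2: one transfer per node, not one per task
```

With a plain closure instead, the mapping would be serialized into every task; the broadcast model pays the transfer cost once per node, which is what matters when the value is large.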

New in version 0.7.0.

Parameters
value : T

value to broadcast to the Spark nodes
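PySpark serializes a broadcast value with Python's standard `pickle` module when `broadcast` is called, so `value` has to be picklable. A minimal pre-flight check, assuming that serialization path, might look like the sketch below. `is_broadcastable` is a hypothetical helper, not part of the PySpark API, and passing it does not guarantee that the value can also be unpickled on the executors (for example, if its class is not importable there).

```python
import pickle
import threading
from typing import Any


def is_broadcastable(value: Any) -> bool:
    """Hypothetical check: can `value` be serialized with standard pickle?

    Assumes broadcast values go through `pickle`, so anything pickle
    rejects (lambdas, locks, open files, ...) cannot be broadcast.
    """
    try:
        pickle.dumps(value)
    except Exception:
        return False
    return True


print(is_broadcastable({1: 10001, 2: 10002}))  # True: plain dict of ints
print(is_broadcastable(lambda x: x * x))       # False: standard pickle cannot serialize lambdas
print(is_broadcastable(threading.Lock()))      # False: locks are not picklable
```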

Returns
Broadcast

Broadcast object, a read-only variable cached on each machine

Examples

>>> mapping = {1: 10001, 2: 10002}
>>> bc = sc.broadcast(mapping)
>>> rdd = sc.range(5)
>>> rdd2 = rdd.map(lambda i: bc.value[i] if i in bc.value else -1)
>>> rdd2.collect()
[-1, 10001, 10002, -1, -1]
>>> bc.destroy()