pyspark.Broadcast

class pyspark.Broadcast(sc: Optional[SparkContext] = None, value: Optional[T] = None, pickle_registry: Optional[BroadcastPickleRegistry] = None, path: Optional[str] = None, sock_file: Optional[BinaryIO] = None)

A broadcast variable created with SparkContext.broadcast(). Access its value through the value attribute.

Examples

>>> from pyspark.context import SparkContext
>>> sc = SparkContext('local', 'test')
>>> b = sc.broadcast([1, 2, 3, 4, 5])
>>> b.value
[1, 2, 3, 4, 5]
>>> sc.parallelize([0, 0]).flatMap(lambda x: b.value).collect()
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> b.unpersist()
>>> large_broadcast = sc.broadcast(range(10000))

Methods

destroy([blocking])

Destroy all data and metadata related to this broadcast variable.

dump(value, f)

load(file)

load_from_path(path)

unpersist([blocking])

Delete cached copies of this broadcast on the executors.

Attributes

value

Return the broadcast value.