Main entry point for Spark functionality. A SparkContext represents
the connection to a Spark cluster, and can be used to create RDDs and broadcast
variables on that cluster.
__init__(self, master, jobName, sparkHome=None, pyFiles=None, environment=None, batchSize=1024)
    Create a new SparkContext.
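    A minimal construction sketch; the master URL "local" and the job name "MyApp" are hypothetical values:

    >>> from pyspark import SparkContext
    >>> sc = SparkContext("local", "MyApp")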
parallelize(self, c, numSlices=None)
    Distribute a local Python collection to form an RDD.
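    For example, assuming sc is an existing SparkContext:

    >>> sc.parallelize([1, 2, 3, 4, 5], numSlices=2).collect()
    [1, 2, 3, 4, 5]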
textFile(self, name, minSplits=None)
    Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.
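    A sketch, assuming sc is an existing SparkContext; the HDFS path is hypothetical:

    >>> lines = sc.textFile("hdfs:///data/input.txt")
    >>> num_lines = lines.count()  # number of lines in the file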
union(self, rdds)
    Build the union of a list of RDDs.
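    For example:

    >>> rdd1 = sc.parallelize([1, 2])
    >>> rdd2 = sc.parallelize([3, 4])
    >>> sc.union([rdd1, rdd2]).collect()
    [1, 2, 3, 4]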
broadcast(self, value)
    Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
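    For example:

    >>> b = sc.broadcast([1, 2, 3, 4, 5])
    >>> b.value
    [1, 2, 3, 4, 5]
    >>> sc.parallelize([0, 0]).flatMap(lambda _: b.value).collect()
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]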
addFile(self, path)
    Add a file to be downloaded with this Spark job on every node.
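    A sketch of locating the file from a task, assuming SparkFiles is available from pyspark.files in this version; the path is hypothetical:

    >>> from pyspark.files import SparkFiles
    >>> sc.addFile("/path/to/lookup.txt")
    >>> def tag(x):
    ...     # SparkFiles.get returns the file's download location on the node
    ...     return (x, SparkFiles.get("lookup.txt"))
    >>> pairs = sc.parallelize([1, 2, 3]).map(tag).collect()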
clearFiles(self)
    Clear the job's list of files added by addFile or addPyFile so that they do not get downloaded to any new nodes.
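    For example (the path is hypothetical):

    >>> sc.addFile("/path/to/lookup.txt")
    >>> sc.clearFiles()  # nodes added after this point will not fetch lookup.txt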
addPyFile(self, path)
    Add a .py or .zip dependency for all tasks to be executed on this SparkContext in the future.
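    A sketch, assuming a hypothetical dependency archive helpers.zip:

    >>> sc.addPyFile("/path/to/helpers.zip")
    >>> # tasks submitted from now on can import modules packaged in helpers.zip
    >>> doubled = sc.parallelize([1, 2, 3]).map(lambda x: x * 2).collect()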
Inherited from object:
    __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__