install.spark {SparkR}    R Documentation

Download and Install Apache Spark to a Local Directory


install.spark downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment and that directory exists, it is returned instead. The Spark version installed is the same as the SparkR version. Users can specify the desired Hadoop version, the remote mirror site, and the local directory where the package is installed.
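
The SPARK_HOME check described above can be pictured with a minimal sketch. The helper name find_existing_spark is hypothetical and not part of SparkR; it assumes that "found" simply means the directory exists.

```r
# Hypothetical helper, not part of SparkR: returns the SPARK_HOME directory
# if the variable is set and the directory exists, otherwise NULL.
find_existing_spark <- function(env = Sys.getenv("SPARK_HOME")) {
  if (nzchar(env) && dir.exists(env)) {
    return(env)
  }
  NULL
}

existing <- find_existing_spark()
if (is.null(existing)) {
  message("SPARK_HOME not usable; install.spark() would look for or download Spark.")
} else {
  message("install.spark() would return: ", existing)
}
```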


install.spark(hadoopVersion = "2.7", mirrorUrl = NULL, localDir = NULL,
  overwrite = FALSE)



hadoopVersion: Version of Hadoop to install. Default is "2.7". Other version numbers can be given in the format "x.y", where x and y are integers. If hadoopVersion = "without", the "Hadoop free" build is installed. See "Hadoop Free" Build for more information. Other patched version names can also be used, e.g. "cdh4".


mirrorUrl: base URL of the repositories to use. The directory layout should follow that of the Apache mirrors.


localDir: a local directory where Spark is installed. The directory contains version-specific folders of Spark packages. Default is the path to the cache directory:

  • Mac OS X: ‘~/Library/Caches/spark’

  • Unix: $XDG_CACHE_HOME if defined, otherwise ‘~/.cache/spark’

  • Windows: ‘%LOCALAPPDATA%\Apache\Spark\Cache’.
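
The platform-dependent default listed above could be resolved along these lines. This is only a sketch: default_spark_cache_dir is a hypothetical helper, not a SparkR function, and it takes the Unix case literally as documented (the value of XDG_CACHE_HOME is returned unchanged), which is an assumption about the real behaviour.

```r
# Hypothetical helper, not part of SparkR: picks the default cache directory
# from the platform name, following the three cases listed above.
default_spark_cache_dir <- function(sysname = Sys.info()[["sysname"]],
                                    xdg = Sys.getenv("XDG_CACHE_HOME"),
                                    localAppData = Sys.getenv("LOCALAPPDATA")) {
  if (identical(sysname, "Windows")) {
    # %LOCALAPPDATA%\Apache\Spark\Cache
    paste(localAppData, "Apache", "Spark", "Cache", sep = "\\")
  } else if (identical(sysname, "Darwin")) {
    # Mac OS X
    file.path("~", "Library", "Caches", "spark")
  } else if (nzchar(xdg)) {
    # Assumption: the variable's value is used exactly as documented.
    xdg
  } else {
    file.path("~", ".cache", "spark")
  }
}

default_spark_cache_dir()
```

Passing the platform name and environment values as arguments keeps the helper easy to try out for a platform other than the one it runs on.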


overwrite: If TRUE, download and overwrite the existing tar file in localDir and force a re-install of Spark (in case the local directory or file is corrupted).


The full URL of the remote file is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (which corresponds to the SparkR version), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. For example, a Spark 2.0.0 package for Hadoop 2.7 is named spark-2.0.0-bin-hadoop2.7.tgz and sits in the spark-2.0.0 subfolder under mirrorUrl. For hadoopVersion = "without", [Hadoop version] in the filename is without-hadoop.
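
The URL composition described above can be sketched as follows. spark_package_url is a hypothetical helper, not part of SparkR, and the mirror address is a placeholder. The sketch assumes that an "x.y" version is prefixed with "hadoop" in the filename and that patched names such as "cdh4" are used as-is.

```r
# Hypothetical helper, not part of SparkR: builds the remote tar URL from a
# mirror base URL, a Spark version, and a hadoopVersion argument.
spark_package_url <- function(mirrorUrl, sparkVersion, hadoopVersion = "2.7") {
  hadoopName <- if (identical(hadoopVersion, "without")) {
    "without-hadoop"
  } else if (grepl("^[0-9]+\\.[0-9]+$", hadoopVersion)) {
    paste0("hadoop", hadoopVersion)  # assumption: "x.y" becomes "hadoopx.y"
  } else {
    hadoopVersion                    # assumption: patched names used as-is
  }
  sparkName <- paste0("spark-", sparkVersion)
  fileName <- paste0(sparkName, "-bin-", hadoopName, ".tgz")
  # Drop any trailing slash on the mirror, then join mirror/subfolder/file.
  paste(sub("/+$", "", mirrorUrl), sparkName, fileName, sep = "/")
}

# The mirror URL below is a placeholder, not a real Apache mirror.
spark_package_url("https://mirror.example.org/spark", "2.0.0", "2.7")
# "https://mirror.example.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz"
```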


the (invisible) local directory where Spark is found or installed


install.spark since 2.1.0

See Also

See available Hadoop versions: Apache Spark


## Not run: 
install.spark()
## End(Not run)

[Package SparkR version 2.3.0 Index]