install.spark downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment and that directory exists, that directory is returned instead. The Spark version installed is the same as the SparkR version. Users can specify the desired Hadoop version, the remote mirror site, and the local directory where the package is installed.
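The SPARK_HOME check described above can be sketched in a few lines of base R. This is only an illustration of the rule, not SparkR's own code: the helper name existing_spark_home is hypothetical, and returning NULL when nothing is found is an assumption made for the example.

```r
# Hypothetical helper, not part of SparkR: returns the SPARK_HOME directory
# if the variable is set and points at an existing directory, otherwise NULL.
existing_spark_home <- function() {
  home <- Sys.getenv("SPARK_HOME", unset = "")
  if (nzchar(home) && dir.exists(home)) {
    home
  } else {
    NULL  # assumption: the caller would go on to download Spark in this case
  }
}
```

A caller could test is.null(existing_spark_home()) to decide whether a download is needed at all.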


install.spark(
  hadoopVersion = "3",
  mirrorUrl = NULL,
  localDir = NULL,
  overwrite = FALSE
)



hadoopVersion: Version of Hadoop to install. Default is "3". If hadoopVersion = "without", the "Hadoop free" build is installed. See "Hadoop Free" Build for more information. Other patched version names can also be used.


mirrorUrl: Base URL of the repositories to use. The directory layout should follow that of the Apache mirrors.


localDir: A local directory where Spark is installed. The directory contains version-specific folders of Spark packages. The default is the path to the cache directory:

  • Mac OS X: ~/Library/Caches/spark

  • Unix: $XDG_CACHE_HOME if defined, otherwise ~/.cache/spark

  • Windows: %LOCALAPPDATA%\Apache\Spark\Cache.
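The per-platform defaults listed above can be sketched as follows. The helper default_spark_cache_dir is hypothetical and not part of SparkR. Two assumptions are made for the example: the platform is detected via Sys.info(), and a spark subfolder is appended to $XDG_CACHE_HOME, which the list above does not state explicitly.

```r
# Hypothetical helper, not part of SparkR: resolves the default cache
# directory following the platform rules listed above.
default_spark_cache_dir <- function(sysname = Sys.info()[["sysname"]]) {
  if (sysname == "Windows") {
    file.path(Sys.getenv("LOCALAPPDATA"), "Apache", "Spark", "Cache")
  } else if (sysname == "Darwin") {
    # Mac OS X
    file.path(Sys.getenv("HOME"), "Library", "Caches", "spark")
  } else {
    # Other Unix: prefer $XDG_CACHE_HOME when it is defined.
    xdg <- Sys.getenv("XDG_CACHE_HOME", unset = "")
    # Assumption: a "spark" subfolder is appended to $XDG_CACHE_HOME.
    base <- if (nzchar(xdg)) xdg else file.path(Sys.getenv("HOME"), ".cache")
    file.path(base, "spark")
  }
}
```

Passing sysname explicitly, for example default_spark_cache_dir("Darwin"), makes it easy to inspect the path for a platform other than the current one.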


overwrite: If TRUE, download and overwrite the existing tar file in localDir and force a re-install of Spark (in case the local directory or file is corrupted).


Returns (invisibly) the local directory where Spark is found or installed.


The full URL of the remote file is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (the one that corresponds to SparkR), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. For example, the full path for a Spark 3.3.1 package from has path: For hadoopVersion = "without", [Hadoop version] in the filename is then without-hadoop.
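The URL composition described above can be sketched as a small base R function. The helper spark_package_url and the mirror https://mirror.example.org/spark are hypothetical. The "spark-" prefix on the version folder and the "hadoop" prefix for numeric Hadoop versions are assumptions about the naming scheme, not something the text above states.

```r
# Hypothetical helper, not part of SparkR: builds the remote tar URL from a
# mirror URL, a Spark version, and a Hadoop version.
spark_package_url <- function(mirrorUrl, sparkVersion, hadoopVersion = "3") {
  hadoopPart <- if (hadoopVersion == "without") {
    "without-hadoop"                 # stated in the text above
  } else if (grepl("^[0-9]+$", hadoopVersion)) {
    paste0("hadoop", hadoopVersion)  # assumption: numeric versions get a "hadoop" prefix
  } else {
    hadoopVersion                    # other patched version names are used as-is
  }
  sparkPart <- paste0("spark-", sparkVersion)  # assumption: "spark-" prefix
  fileName <- paste0(sparkPart, "-bin-", hadoopPart, ".tgz")
  # Drop trailing slashes so the pieces join with exactly one "/".
  paste(sub("/+$", "", mirrorUrl), sparkPart, fileName, sep = "/")
}

spark_package_url("https://mirror.example.org/spark", "3.3.1", "without")
# "https://mirror.example.org/spark/spark-3.3.1/spark-3.3.1-bin-without-hadoop.tgz"
```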


install.spark since 2.1.0

