spark.lapply {SparkR}    R Documentation

Run a function over a list of elements, distributing the computations with Spark.

Description

Applies a function to each element of a list, in a manner similar to doParallel or lapply. The computations are distributed using Spark. It is conceptually equivalent to the following code:

lapply(list, func)
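The equivalence with lapply can be checked locally before distributing anything. The following is a minimal sketch that uses only base R; the names square and inputs are illustrative and not part of SparkR, and the spark.lapply call is left commented out because it assumes an existing Spark context sc.

```r
# Local stand-in: base R lapply, no Spark involved.
square <- function(x) x^2          # a function that takes one argument
inputs <- list(1, 2, 3)            # the list of elements

local_result <- lapply(inputs, square)
str(local_result)

# Assuming a Spark context `sc` exists, the distributed form following the
# Usage section would be:
# spark_result <- spark.lapply(sc, inputs, square)
```

Developing the function against plain lapply first tends to make errors easier to diagnose than debugging them on the cluster.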

Known limitations:

- variable scoping and capture: compared to R's rich support for variable resolutions, the

- loading external packages: to use a package, you need to load it inside the closure. For example, if you rely on the MASS package, here is how you would use it:

## Not run: 
train <- function(hyperparam) {
  library(MASS)
  model <- lm.ridge(y ~ x + z, data, lambda = hyperparam)
  model
}
## End(Not run)
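The package-loading pattern above can be exercised locally as a rough sketch. It assumes the MASS package is installed; the synthetic data, the column names x, z and y, and the list lambdas are hypothetical. The data frame is built inside the function so that the sketch does not depend on how outer variables are captured, and base lapply stands in for spark.lapply.

```r
# Hypothetical training function; the synthetic data is illustrative only.
train <- function(hyperparam) {
  library(MASS)  # loaded inside the closure, as the limitation requires
  set.seed(42)
  data <- data.frame(x = rnorm(50), z = rnorm(50))
  data$y <- 1 + 2 * data$x - data$z + rnorm(50, sd = 0.1)
  model <- lm.ridge(y ~ x + z, data, lambda = hyperparam)
  coef(model)    # return the coefficients rather than the model object
}

lambdas <- list(0.01, 0.1, 1)
fits <- lapply(lambdas, train)   # local stand-in for the distributed call

# Assuming a Spark context `sc` exists:
# fits <- spark.lapply(sc, lambdas, train)
```

Returning the coefficient vector instead of the fitted object keeps each result small, which may matter when results are collected back from workers.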

Usage

spark.lapply(sc, list, func)

Arguments

sc

Spark Context to use

list

the list of elements

func

a function that takes one argument.

Value

a list of results (the exact type being determined by the function)
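Because the element type is determined by the function, the returned list often needs post-processing. The sketch below uses base lapply as a local stand-in; describe is a hypothetical function, not part of SparkR.

```r
# Hypothetical function returning a named list per element.
describe <- function(x) list(input = x, doubled = 2 * x)

# Local stand-in for spark.lapply(sc, 1:3, describe).
results <- lapply(1:3, describe)

# Each element has whatever type the function returned, here a named list.
# vapply extracts one field into a plain numeric vector.
doubled <- vapply(results, function(r) r$doubled, numeric(1))
```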

Examples

## Not run: 
##D doubled <- spark.lapply(sc, 1:10, function(x) { 2 * x })
## End(Not run)

[Package SparkR version 2.0.0 Index]