Alternating Least Squares (ALS) for Collaborative Filtering
spark.als learns latent factors in collaborative filtering via alternating least squares. Users can call summary to obtain fitted latent factors, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.
Usage
spark.als(data, ...)
# S4 method for SparkDataFrame
spark.als(
  data,
  ratingCol = "rating",
  userCol = "user",
  itemCol = "item",
  rank = 10,
  regParam = 0.1,
  maxIter = 10,
  nonnegative = FALSE,
  implicitPrefs = FALSE,
  alpha = 1,
  numUserBlocks = 10,
  numItemBlocks = 10,
  checkpointInterval = 10,
  seed = 0
)
# S4 method for ALSModel
summary(object)
# S4 method for ALSModel
predict(object, newData)
# S4 method for ALSModel,character
write.ml(object, path, overwrite = FALSE)
Arguments
- data
a SparkDataFrame for training.
- ...
additional argument(s) passed to the method.
- ratingCol
column name for ratings.
- userCol
column name for user ids. Ids must be (or can be coerced into) integers.
- itemCol
column name for item ids. Ids must be (or can be coerced into) integers.
- rank
rank of the matrix factorization (> 0).
- regParam
regularization parameter (>= 0).
- maxIter
maximum number of iterations (>= 0).
- nonnegative
logical value indicating whether to apply nonnegativity constraints.
- implicitPrefs
logical value indicating whether to use implicit preference.
- alpha
alpha parameter in the implicit preference formulation (>= 0).
- numUserBlocks
number of user blocks used to parallelize computation (> 0).
- numItemBlocks
number of item blocks used to parallelize computation (> 0).
- checkpointInterval
checkpoint interval (>= 1), i.e. the number of iterations between checkpoints, or -1 to disable checkpointing. Note: this setting is ignored if the checkpoint directory is not set.
- seed
integer seed for random number generation.
- object
a fitted ALS model.
- newData
a SparkDataFrame for testing.
- path
the directory where the model is saved.
- overwrite
logical value indicating whether to overwrite the output path if it already exists. The default is FALSE, which means an exception is thrown if the output path exists.
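As a minimal sketch of how checkpointInterval depends on the checkpoint directory, the helper below sets the directory before fitting. The helper name fit_als_with_checkpoints, its arguments, and the maxIter value are hypothetical and chosen for illustration only; the sketch assumes an active SparkR session and a SparkDataFrame with the default user, item, and rating columns.

```r
# Hypothetical helper (not part of SparkR): fit ALS with periodic checkpoints.
# Assumes SparkR is installed and a Spark session is already running.
fit_als_with_checkpoints <- function(df, checkpoint_dir, interval = 5L) {
  # checkpointInterval is ignored unless a checkpoint directory is set first.
  SparkR::setCheckpointDir(checkpoint_dir)
  SparkR::spark.als(
    df,
    ratingCol = "rating",
    userCol = "user",
    itemCol = "item",
    maxIter = 20,                  # illustrative value
    checkpointInterval = interval  # checkpoint every `interval` iterations
  )
}
```

Passing interval = -1 would disable checkpointing, as described for the checkpointInterval argument.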
Value
spark.als returns a fitted ALS model.
summary returns summary information of the fitted model, which is a list. The list includes user (the name of the user column), item (the item column), rating (the rating column), userFactors (the estimated user factors), itemFactors (the estimated item factors), and rank (rank of the matrix factorization model).
predict returns a SparkDataFrame containing predicted values.
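To illustrate how the user and item factors relate to predicted values, the base R sketch below uses made-up rank-2 factors. In a matrix factorization model with explicit ratings, the predicted rating for a (user, item) pair is the inner product of the two factor vectors. The factor values and the row names (u0, i0, ...) are assumptions for illustration and do not come from a fitted model.

```r
# Made-up rank-2 factors for 2 users and 3 items (illustrative values only).
user_factors <- rbind(u0 = c(1.0, 0.5),
                      u1 = c(0.2, 1.5))
item_factors <- rbind(i0 = c(2.0, 1.0),
                      i1 = c(0.5, 1.0),
                      i2 = c(1.0, 2.0))

# Each predicted rating is the inner product of a user row and an item row.
predicted <- user_factors %*% t(item_factors)

predicted["u0", "i0"]  # 1.0 * 2.0 + 0.5 * 1.0 = 2.5
predicted["u1", "i2"]  # 0.2 * 1.0 + 1.5 * 2.0 = 3.2
```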
Details
For more details, see MLlib: Collaborative Filtering.
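The idea of alternating least squares can be sketched in base R on a tiny dense matrix. This is a simplified, single-machine illustration, not Spark's distributed implementation: it fixes the item factors and solves a ridge regression for each user, then fixes the user factors and does the same for each item. The names R, k, lambda, and solve_row are chosen here for illustration.

```r
# Toy ratings: rows are users, columns are items, NA marks an unobserved rating.
R <- matrix(c( 4, 2, NA,
              NA, 3,  4,
              NA, 1,  5), nrow = 3, byrow = TRUE)
k <- 2         # rank of the factorization
lambda <- 0.1  # regularization strength

set.seed(0)
U <- matrix(runif(nrow(R) * k), nrow(R), k)  # user factors
V <- matrix(runif(ncol(R) * k), ncol(R), k)  # item factors

# Ridge regression for one factor row, holding the other side fixed and
# using only the observed ratings.
solve_row <- function(ratings, fixed, lambda) {
  obs <- !is.na(ratings)
  A <- fixed[obs, , drop = FALSE]
  solve(crossprod(A) + lambda * diag(ncol(fixed)),
        crossprod(A, ratings[obs]))
}

# Root mean squared error over the observed entries only.
rmse <- function() sqrt(mean((R - U %*% t(V))^2, na.rm = TRUE))

rmse_before <- rmse()
for (iter in 1:10) {
  for (u in seq_len(nrow(R))) U[u, ] <- solve_row(R[u, ], V, lambda)
  for (i in seq_len(ncol(R))) V[i, ] <- solve_row(R[, i], U, lambda)
}
rmse_after <- rmse()
```

Comparing rmse_before with rmse_after shows how the alternating updates reduce the error on the observed ratings.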
Note
spark.als since 2.1.0
The input rating dataframe to the ALS implementation should be deterministic.
Nondeterministic data can cause failure when fitting the ALS model. For example,
an order-sensitive operation such as sampling after a repartition makes the dataframe output
nondeterministic, as in sample(repartition(df, 2L), FALSE, 0.5, 1618L).
Checkpointing the sampled dataframe or adding a sort before sampling can help make the
dataframe deterministic.
summary(ALSModel) since 2.1.0
predict(ALSModel) since 2.1.0
write.ml(ALSModel, character) since 2.1.0
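The two remedies mentioned in the note on deterministic input could be sketched as follows. Both helper names are hypothetical, and the sketch assumes an active SparkR session, a SparkDataFrame with user and item columns, and, for the checkpoint variant, a checkpoint directory already set with setCheckpointDir.

```r
# Hypothetical helper: sort before sampling so the sample does not depend on
# how repartition happened to order the rows.
sample_after_sort <- function(df, fraction = 0.5, seed = 1618L) {
  sorted <- SparkR::arrange(SparkR::repartition(df, 2L), "user", "item")
  SparkR::sample(sorted, FALSE, fraction, seed)
}

# Hypothetical helper: checkpoint the sampled dataframe so later stages reuse
# the same materialized rows instead of recomputing the sample.
# Assumes SparkR::setCheckpointDir() has been called beforehand.
sample_then_checkpoint <- function(df, fraction = 0.5, seed = 1618L) {
  sampled <- SparkR::sample(SparkR::repartition(df, 2L), FALSE, fraction, seed)
  SparkR::checkpoint(sampled)
}
```

Either result can then be passed to spark.als as the training data.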
Examples
if (FALSE) {
  ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
                  list(2, 1, 1.0), list(2, 2, 5.0))
  df <- createDataFrame(ratings, c("user", "item", "rating"))
  model <- spark.als(df, "rating", "user", "item")

  # extract latent factors
  stats <- summary(model)
  userFactors <- stats$userFactors
  itemFactors <- stats$itemFactors

  # make predictions
  predicted <- predict(model, df)
  showDF(predicted)

  # save and load the model
  path <- "path/to/model"
  write.ml(model, path)
  savedModel <- read.ml(path)
  summary(savedModel)

  # set other arguments
  modelS <- spark.als(df, "rating", "user", "item", rank = 20,
                      regParam = 0.1, nonnegative = TRUE)
  statsS <- summary(modelS)
}