MatrixFactorizationModel

class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model: py4j.java_gateway.JavaObject)

A matrix factorization model trained by regularized alternating least squares (ALS).

New in version 0.9.0.
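To make "regularized alternating least squares" concrete, here is a minimal pure-Python sketch of the idea for rank 1, using the same three ratings as the examples. It is an illustration under stated assumptions, not Spark's implementation. The function names (`als_rank1`, `objective`) are hypothetical, every factor is a single float, and the L2 penalty is scaled by each user's or product's rating count, which is our understanding of how Spark MLlib scales its regularization parameter. Each half-step fixes one side and solves the other side's least-squares problem in closed form, so the penalized objective never increases. Spark's solver additionally handles arbitrary rank, distributes the work across the cluster, and offers the nonnegative and implicit-feedback variants that this sketch omits.

```python
from typing import Dict, List, Tuple

# (user, product, rating) triples, matching r1, r2, r3 in the examples.
Ratings = List[Tuple[int, int, float]]


def objective(ratings: Ratings, user_f: Dict[int, float],
              prod_f: Dict[int, float], lam: float) -> float:
    """Squared error plus a count-weighted L2 penalty (what this sketch minimizes)."""
    err = sum((r - user_f[u] * prod_f[p]) ** 2 for u, p, r in ratings)
    n_u = {u: sum(1 for uu, _, _ in ratings if uu == u) for u in user_f}
    n_p = {p: sum(1 for _, pp, _ in ratings if pp == p) for p in prod_f}
    reg = sum(n_u[u] * x * x for u, x in user_f.items())
    reg += sum(n_p[p] * y * y for p, y in prod_f.items())
    return err + lam * reg


def als_rank1(ratings: Ratings, iterations: int = 20, lam: float = 0.01):
    """Hypothetical rank-1 ALS: alternate closed-form user and product updates."""
    user_f = {u: 1.0 for u, _, _ in ratings}
    prod_f = {p: 1.0 for _, p, _ in ratings}
    for _ in range(iterations):
        # Fix product factors; each user's factor solves a 1-D ridge problem.
        for u in user_f:
            rows = [(p, r) for uu, p, r in ratings if uu == u]
            num = sum(r * prod_f[p] for p, r in rows)
            den = sum(prod_f[p] ** 2 for p, _ in rows) + lam * len(rows)
            user_f[u] = num / den
        # Fix user factors; solve each product's 1-D ridge problem the same way.
        for p in prod_f:
            cols = [(u, r) for u, pp, r in ratings if pp == p]
            num = sum(r * user_f[u] for u, r in cols)
            den = sum(user_f[u] ** 2 for u, _ in cols) + lam * len(cols)
            prod_f[p] = num / den
    return user_f, prod_f


ratings = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]
user_f, prod_f = als_rank1(ratings)
# The unobserved pair (2, 2) is scored as the product of the learned factors.
print(round(user_f[2] * prod_f[2], 2))
```

Because the three observed ratings are consistent with a rank-1 matrix, the learned factors reproduce them closely, and the score for the unobserved pair follows from the same factors.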

Examples

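>>> # Assumes an active SparkContext `sc`; the DataFrame example also needs `sqlContext`.
>>> from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
>>> r1 = (1, 1, 1.0)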
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
0.4...

>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]

>>> model = ALS.train(ratings, 4, seed=10)
>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4

>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4

>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4

>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))

>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))

>>> model = ALS.train(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...

>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...

>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
0.4...

>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass

Methods

`call`(name, *a)	Call a method of the underlying Java model (java_model) by name.
`load`(sc, path)	Load a model from the given path.
`predict`(user, product)	Predicts the rating for the given user and product.
`predictAll`(user_product)	Returns an RDD of predicted ratings for the input user and product pairs.
`productFeatures`()	Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.
`recommendProducts`(user, num)	Recommends the top “num” products for a given user and returns a list of Rating objects sorted by the predicted rating in descending order.
`recommendProductsForUsers`(num)	Recommends the top “num” products for all users.
`recommendUsers`(product, num)	Recommends the top “num” users for a given product and returns a list of Rating objects sorted by the predicted rating in descending order.
`recommendUsersForProducts`(num)	Recommends the top “num” users for all products.
`save`(sc, path)	Save this model to the given path.
`userFeatures`()	Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

Attributes

rank

Rank of this model, i.e. the number of latent features per user and per product.

Methods Documentation

call(name: str, *a: Any) → Any: Call a method of the underlying Java model (java_model) by name.

classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.recommendation.MatrixFactorizationModel: Load a model from the given path.

New in version 1.3.1.

predict(user: int, product: int) → float: Predicts the rating for the given user and product.

New in version 0.9.0.
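Conceptually, the score for a (user, product) pair is the dot product of the user's and the product's latent feature vectors, each of length `rank`. The plain-Python sketch below illustrates only that arithmetic. It does not call Spark: the factors are hypothetical hand-written values, stored as `array('d', ...)` to match the element type that `userFeatures()` and `productFeatures()` return, and `predict` here is a local helper, not the model method.

```python
from array import array

# Hypothetical rank-2 factors, shaped like the (id, array('d', ...)) pairs
# returned by userFeatures() and productFeatures().
user_features = {1: array('d', [1.0, 0.5]), 2: array('d', [2.0, 0.0])}
product_features = {1: array('d', [0.5, 1.0]), 2: array('d', [1.0, 2.0])}


def predict(user: int, product: int) -> float:
    """Dot product of the user's and the product's latent vectors."""
    u = user_features[user]
    p = product_features[product]
    return sum(a * b for a, b in zip(u, p))


print(predict(2, 2))  # 2.0*1.0 + 0.0*2.0 = 2.0
```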

predictAll(user_product: pyspark.rdd.RDD[Tuple[int, int]]) → pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating]: Returns an RDD of predicted ratings for the input user and product pairs.

New in version 0.9.0.
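`predictAll` applies the same dot-product scoring to every pair in an RDD. The sketch below mimics that over a plain Python list with hypothetical factors. Pairs whose user or product has no learned factors are skipped; this is an assumption modelled on Spark joining the input pairs with the feature RDDs, so check the behaviour on your Spark version before relying on it. `Rating` here is a local namedtuple standing in for `pyspark.mllib.recommendation.Rating`, and `predict_all` is a hypothetical helper.

```python
from collections import namedtuple
from typing import Dict, Iterable, List, Tuple

# Local stand-in for pyspark.mllib.recommendation.Rating.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical rank-2 factors.
user_features: Dict[int, List[float]] = {1: [1.0, 0.5], 2: [2.0, 0.0]}
product_features: Dict[int, List[float]] = {1: [0.5, 1.0], 2: [1.0, 2.0]}


def predict_all(pairs: Iterable[Tuple[int, int]]) -> List[Rating]:
    """Score each (user, product) pair; skip pairs with unknown ids."""
    out = []
    for user, product in pairs:
        u = user_features.get(user)
        p = product_features.get(product)
        if u is None or p is None:
            continue  # no factors learned for this user or product
        out.append(Rating(user, product, sum(a * b for a, b in zip(u, p))))
    return out


# User 3 has no factors, so the pair (3, 1) produces no Rating.
print(predict_all([(1, 2), (1, 1), (3, 1)]))
```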

productFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]: Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.

New in version 1.2.0.

recommendProducts(user: int, num: int) → List[pyspark.mllib.recommendation.Rating]: Recommends the top “num” products for a given user and returns a list of Rating objects sorted by the predicted rating in descending order.

New in version 1.4.0.
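The ranking behind `recommendProducts` can be pictured as scoring every product for one user and keeping the `num` best, highest first. The sketch below does this with `heapq.nlargest` over hypothetical in-memory factors; `recommend_products` and the local `Rating` namedtuple are stand-ins, not the PySpark API. Spark performs its top-`num` selection on the cluster, so treat this only as an illustration of the ordering contract.

```python
import heapq
from collections import namedtuple
from typing import Dict, List

Rating = namedtuple("Rating", ["user", "product", "rating"])  # local stand-in

# Hypothetical rank-2 factors for one user and three products.
user_features: Dict[int, List[float]] = {1: [1.0, 0.5]}
product_features: Dict[int, List[float]] = {
    1: [0.5, 1.0], 2: [1.0, 2.0], 3: [0.2, 0.1],
}


def recommend_products(user: int, num: int) -> List[Rating]:
    """Top `num` products for `user`, sorted by predicted rating, descending."""
    u = user_features[user]
    scored = (
        Rating(user, pid, sum(a * b for a, b in zip(u, p)))
        for pid, p in product_features.items()
    )
    return heapq.nlargest(num, scored, key=lambda r: r.rating)


print(recommend_products(1, 2))
```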

recommendProductsForUsers(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]: Recommends the top “num” products for all users. The number of recommendations returned per user may be less than “num”.
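Applying the per-user top-“num” ranking described for recommendProducts to every user gives the shape returned here: one (user, (Rating, …)) pair per user. The sketch below, again on hypothetical in-memory factors rather than RDDs, also shows one simple way a user ends up with fewer than “num” recommendations: `heapq.nlargest` cannot return more products than exist. `recommend_products_for_users` and the local `Rating` namedtuple are illustrative stand-ins.

```python
import heapq
from collections import namedtuple
from typing import Dict, List, Tuple

Rating = namedtuple("Rating", ["user", "product", "rating"])  # local stand-in

# Hypothetical rank-2 factors: two users, only two products.
user_features: Dict[int, List[float]] = {1: [1.0, 0.5], 2: [2.0, 0.0]}
product_features: Dict[int, List[float]] = {1: [0.5, 1.0], 2: [1.0, 2.0]}


def recommend_products_for_users(num: int) -> List[Tuple[int, Tuple[Rating, ...]]]:
    """For every user, the top `num` products as a tuple of Ratings (best first)."""
    result = []
    for uid, u in user_features.items():
        scored = [
            Rating(uid, pid, sum(a * b for a, b in zip(u, p)))
            for pid, p in product_features.items()
        ]
        top = heapq.nlargest(num, scored, key=lambda r: r.rating)
        result.append((uid, tuple(top)))
    return result


# Only two products exist, so asking for 5 yields at most 2 per user.
for uid, recs in recommend_products_for_users(5):
    print(uid, len(recs))
```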

recommendUsers(product: int, num: int) → List[pyspark.mllib.recommendation.Rating]: Recommends the top “num” users for a given product and returns a list of Rating objects sorted by the predicted rating in descending order.

New in version 1.4.0.

recommendUsersForProducts(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]: Recommends the top “num” users for all products. The number of recommendations returned per product may be less than “num”.

save(sc: pyspark.context.SparkContext, path: str) → None: Save this model to the given path.

New in version 1.3.0.

userFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]: Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

New in version 1.2.0.

Attributes Documentation

rank: Rank of this model, i.e. the number of latent features per user and per product.

New in version 1.4.0.
