MatrixFactorizationModel
class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model: py4j.java_gateway.JavaObject)
A matrix factorization model trained by regularized alternating least-squares.
New in version 0.9.0.
Examples
>>> # Assumes an active SparkContext `sc` and SQLContext `sqlContext`.
>>> from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
0.4...

>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]

>>> model = ALS.train(ratings, 4, seed=10)
>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4

>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4

>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4

>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))

>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))

>>> model = ALS.train(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...

>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...

>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
0.4...

>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
Methods
call(name, *a)
Call method of java_model.
load(sc, path)
Load a model from the given path.
predict(user, product)
Predicts the rating for the given user and product.
predictAll(user_product)
Returns an RDD of predicted ratings for the input user and product pairs.
productFeatures()
Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.
recommendProducts(user, num)
Recommends the top num products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.
recommendProductsForUsers(num)
Recommends the top num products for all users.
recommendUsers(product, num)
Recommends the top num users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.
recommendUsersForProducts(num)
Recommends the top num users for all products.
save(sc, path)
Save this model to the given path.
userFeatures()
Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.
Attributes
rank
Rank for the features in this model.
Methods Documentation
call(name: str, *a: Any) → Any
Call method of java_model.
classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.recommendation.MatrixFactorizationModel
Load a model from the given path.
New in version 1.3.1.
predict(user: int, product: int) → float
Predicts the rating for the given user and product.
New in version 0.9.0.
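As a rough illustration of what predict computes: in a matrix factorization model of this kind, the predicted rating is the dot product of the user's and the product's latent feature vectors. The sketch below reproduces that arithmetic in plain Python, without Spark. The factor values and the predict_score helper are hypothetical and exist only for this example; the pairs are shaped like the collected output of userFeatures() and productFeatures().

```python
from array import array

# Hypothetical rank-2 latent factors, keyed by id. Each value mimics the
# array('d', [...]) half of the (id, array) pairs the model exposes.
user_features = {1: array("d", [1.0, 0.5]), 2: array("d", [0.2, 1.5])}
product_features = {1: array("d", [0.8, 0.4]), 2: array("d", [0.1, 1.2])}


def predict_score(user, product):
    """Illustrative helper: dot product of the two latent vectors."""
    u = user_features[user]
    p = product_features[product]
    return sum(a * b for a, b in zip(u, p))


score = predict_score(2, 2)  # 0.2 * 0.1 + 1.5 * 1.2
```

With a real model, the same check can be made by collecting both feature RDDs and comparing the dot product against model.predict(user, product).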
predictAll(user_product: pyspark.rdd.RDD[Tuple[int, int]]) → pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating]
Returns an RDD of predicted ratings for the input user and product pairs.
New in version 0.9.0.
productFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]
Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.
New in version 1.2.0.
recommendProducts(user: int, num: int) → List[pyspark.mllib.recommendation.Rating]
Recommends the top num products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.
New in version 1.4.0.
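The ordering described above can be sketched in plain Python: score every product for one user, then keep the num highest scores in descending order. This is only an illustration of the result shape and ordering, not Spark's implementation. The Rating namedtuple is a local stand-in with the same field names as pyspark.mllib.recommendation.Rating, and top_products together with all factor values are hypothetical.

```python
import heapq
from collections import namedtuple

# Local stand-in so the sketch runs without Spark; the field names mirror
# pyspark.mllib.recommendation.Rating.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical rank-2 latent factors for one user and three products.
user_vec = [1.0, 0.5]
product_vecs = {10: [0.8, 0.4], 20: [0.1, 1.2], 30: [2.0, 0.0]}


def top_products(user, user_vec, product_vecs, num):
    """Illustrative helper: the num best-scoring products, best first."""
    scored = (
        Rating(user, pid, sum(a * b for a, b in zip(user_vec, vec)))
        for pid, vec in product_vecs.items()
    )
    # nlargest returns its results sorted from largest to smallest.
    return heapq.nlargest(num, scored, key=lambda r: r.rating)


top = top_products(1, user_vec, product_vecs, 2)
```

Here product 30 scores highest and product 10 second, so those two are returned in that order and product 20 is dropped.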
recommendProductsForUsers(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, ...]]]
Recommends the top num products for all users. The number of recommendations returned per user may be less than num.
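A minimal sketch of the result shape, assuming the per-user count falls short of num simply because fewer than num products are available to rank. That reason is an assumption made for illustration and is not stated in this reference. The code runs in plain Python without Spark: Rating is a local stand-in for pyspark.mllib.recommendation.Rating, and top_products_per_user and the scores are hypothetical.

```python
from collections import namedtuple

# Local stand-in so the sketch runs without Spark.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical predicted scores keyed by (user, product).
scores = {(1, 1): 1.0, (1, 2): 1.9, (2, 1): 1.9, (2, 2): 0.4}


def top_products_per_user(scores, num):
    """Illustrative helper: (user, (Rating, ...)) pairs, best rating first."""
    by_user = {}
    for (user, product), rating in scores.items():
        by_user.setdefault(user, []).append(Rating(user, product, rating))
    return [
        (user, tuple(sorted(rs, key=lambda r: r.rating, reverse=True)[:num]))
        for user, rs in sorted(by_user.items())
    ]


# Asking for 5 per user yields only 2 each, because only 2 products exist.
result = top_products_per_user(scores, 5)
```

Each element pairs a user id with a tuple of Rating objects, matching the (1, (Rating(user=1, product=2, rating=...),)) shape shown in the examples at the top of this page.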
recommendUsers(product: int, num: int) → List[pyspark.mllib.recommendation.Rating]
Recommends the top num users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.
New in version 1.4.0.
recommendUsersForProducts(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, ...]]]
Recommends the top num users for all products. The number of recommendations returned per product may be less than num.
save(sc: pyspark.context.SparkContext, path: str) → None
Save this model to the given path.
New in version 1.3.0.
userFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]
Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.
New in version 1.2.0.
Attributes Documentation
rank
Rank for the features in this model.
New in version 1.4.0.