Incremental Collaborative Filtering recommender based on Regularized Matrix Factorization

Xin Luo, Yunni Xia, Qingsheng Zhu
College of Computer Science, Chongqing University, Chongqing 400044, China
Corresponding author: Xin Luo. Tel.: +86 13996169379. E-mail addresses: luoxin21@cqu.edu.cn (X. Luo), xiayunni@yahoo.com.cn (Y. Xia), qszhu@cqu.edu.cn (Q. Zhu).

Article history: Received 16 March 2011; Received in revised form 7 September 2011; Accepted 9 September 2011; Available online xxxx.

Keywords: Recommender system; Collaborative Filtering; Latent Factor Model; Matrix Factorization; Regularized; Incremental learning.

Abstract: Matrix-Factorization (MF) based models have become popular for building Collaborative Filtering (CF) recommenders because of their high accuracy and scalability. However, most of the current MF based models are batch models that cannot be updated incrementally, while in real world applications users expect quick responses from the system once they have provided feedback. In this work, we aim to design an incremental CF recommender based on the Regularized Matrix Factorization (RMF). To achieve this objective, we first simplify the training rule of RMF to propose the SI-RMF, which provides a simple mathematical form for further investigation; whereby we design two Incremental RMF models, namely the Incremental RMF (IRMF) and the Incremental RMF with linear biases (IRMF-B). The experiments on two large, real datasets give positive results, which demonstrate the efficiency of our strategy.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

In this era of information, online consumption is indispensable to people's daily life. Millions of commodities are provided online by numerous web retailers, who are suffering from a serious problem: how to help their potential customers find their personal favorites? Personalized recommender systems, a specified category of knowledge based systems which can provide people with suggestions according to their individual interests, have emerged to address this concern since the early 1990s [1–6]. With recommender systems, web merchants can make their online stores more human-oriented by offering personalized recommendations, which can greatly enhance the user experience. According to a recent survey, this thriving subfield of data mining is becoming more and more important to e-commerce providers [7].

Generally speaking, there are two kinds of subjects within a recommender system: the users who will benefit from the recommendation service and the items which the system is going to recommend. According to pioneering research, online recommender systems fall into two groups: the Content Based (CB) and the Collaborative Filtering (CF) [5]. CB profiles items based on their contents, and then associates users with content-matching products by comparing the corresponding profiles. On the other hand, CF accumulates users' behaviors to model individual preferences, and then makes recommendations according to these personal tastes. Compared with CB, CF requires no domain knowledge; in addition, CF offers the potential to uncover patterns that would be difficult or impossible to profile using CB techniques. Both of these characteristics make real world applications usually rely on CF recommenders [8–13].

Inside a CF recommender, the user preferences (either explicit ratings or implicit behaviors such as clicks) on the involved items are quantized into a user-item rating matrix, where high ratings suggest strong preferences.
So the problem of CF can be regarded as one of missing data estimation, where the main task is to estimate the unknown user-item pairs of the rating matrix based on the known entries. According to the recent progress in CF techniques, current CF algorithms are primarily based on two models: the Neighborhood Based Model (NBM) [14–18] and the Latent Factor Model (LFM) [19–27]. NBM constructs the relationships between users or items to build the neighborhoods of the corresponding entities, and makes predictions based on the known ratings by the active user's neighbors, or those on the active item's neighbors. Different from NBM, LFM transforms both items and users into the same latent factor space, characterizes each entity with a feature vector inferred from the existing ratings, and makes predictions for unknown ratings using the inner products of the corresponding vector pairs. Compared with NBM, LFM offers a higher expressive ability to describe various aspects of the data, usually along with higher accuracy; LFM has consequently become popular when designing CF recommenders. One of the most successful kinds of approaches to LFM is based on Matrix Factorization (MF). The earliest work of this kind was proposed by
Sarwar et al., employing the Singular Value Decomposition (SVD) [19]. More recently, several MF techniques have been successfully applied to implementing LFM, including the Probabilistic Latent Semantic Analysis by Hofmann [20], the Maximum Margin Matrix Factorization by Srebro et al. [21], and the Expectation Maximization for Matrix Factorization by Kurucz et al. [22]. During the Netflix Prize, Brandyn Webb published the Regularized Matrix Factorization [23] (RMF, which is based on Gorrell et al.'s works on Generalized Hebbian Learning [28,29]) under the pseudonym Simon Funk; RMF is accurate, highly scalable and easy to implement. Inspired by RMF, many researchers have implemented and further investigated MF based approaches. Paterek [24], Takács et al. [25], Salakhutdinov et al. [26] and Koren [27] have all proposed sophisticated MF based CF recommenders that possess high recommendation performance.

Although the aforementioned LFM-based recommenders have high scalability and accuracy, they all require a batch-training process based on static training datasets. However, in real-world applications, this pre-condition can rarely be satisfied. In modern e-commerce systems, new user feedback is generated constantly, varying from millisecond to millisecond. One strategy to tackle this rapid data expansion is to totally rebuild the model once the data growth exceeds a pre-defined threshold; however, this strategy leads to a huge amount of repetitive training overhead, whilst it still cannot respond rapidly to user feedback. In fact, the desirable way to cope with such situations is to enable the recommenders to learn incrementally by updating the current model in accordance with the newly arriving data.

In this work we focus on implementing an incremental CF recommender based on MF techniques. To achieve this objective, we intend to investigate the training process of MF based recommenders, and to incrementally update the trained parameters according to the newly arriving data. The main contributions of this work include:

– The sequence independent Regularized Matrix Factorization (SI-RMF) model, an RMF variant that removes the sensitivity to the input sequence of training examples and provides a simple mathematical form for making incremental updates.
– The Incremental RMF (IRMF), which can incrementally learn from new examples without retraining the entire model.
– IRMF with linear biases (IRMF-B), an incremental CF recommender which incorporates linear biases to provide more accurate predictions than IRMF while maintaining the incremental learning ability.
– Empirical studies on two large, real datasets.

Section 2 gives the preliminaries. Then Section 3 describes our methods. The empirical validations are given in Section 4. Finally, Section 5 concludes.

2. Preliminaries

The problem of CF can be defined as follows: given an item set I and a user set U, users' preferences on items can be denoted by a |U| × |I| matrix R, where each row vector denotes a specified user, each column vector denotes a specified item, and each entry r_ui ∈ R denotes user u's preference on item i, usually with high values indicating strong preferences (note that the entries in R could depend on either explicit user preferences like ratings, or implicit user preferences like clicks). Let R_K denote the set consisting of the known entries in R.
Given a finite set of ratings T ⊆ R_K as the training set, the objective is to construct a heuristic estimator, the so-called recommender, which can estimate the unknown ratings with minimal accumulative error based on the known ratings in T. The accumulative error of the recommender is evaluated on a validation set V ⊆ R_K; to avoid overfitting, the data of V cannot be used during the creation of this recommender, i.e., V ∩ T = ∅.

During the Netflix Prize, one of the most popular approaches to solving the CF problem was based on Matrix Factorization. This should be primarily credited to Webb's publication of the RMF model [23]. Similar to the SVD based approach proposed by Sarwar et al. [19], the goal of RMF is also to construct a low rank approximation of R. However, different from traditional SVD techniques, the RMF model can deal with matrices with missing values. As described in [23], let f denote the dimension of the feature space, P ∈ R^{m×f} denote the user feature matrix where each row represents a specified user, and Q ∈ R^{f×n} denote the item feature matrix where each column represents a specified item (naturally f ≪ m and f ≪ n); then the approximation of the rating by user u on item i is confined to the inner product of the corresponding user-item feature vector pair, given by:

\hat{r}_{ui} = p_u q_i.   (1)

Thus the task concentrates on appropriately estimating the values of the parameters in P and Q. In RMF, the parameter training is performed by applying stochastic gradient descent (SGD) to the approximation error, which is denoted by the regularized squared error (RSE) on T:

RSE = \sum_{(u,i) \in T} \Big[ \big(r_{ui} - p_u q_i\big)^2 + \lambda \big( \|p_u\|^2 + \|q_i\|^2 \big) \Big],   (2)

where \|\cdot\| denotes the standard Euclidean norm. Note that the bulk behind the parameter λ is the Tikhonov regularizing term for avoiding overfitting. The local minimum of the RSE in Eq. (2) can then be reached by employing the SGD solver, which loops through all ratings in T and modifies the involved parameters by moving in the opposite direction of the gradient for each training example.

There are two methods for updating the feature matrices when learning from each training example. One is to initialize each feature in the feature matrices with a constant value, and to train a single feature based on the rating residuals of the already trained features for each training example. The other is to initialize each entry in P and Q with random values uniformly chosen from a pre-defined scale, and to update the involved latent features simultaneously at each training example. Through empirical studies [25], Takács et al. find that these two approaches lead to approximately the same prediction accuracy; however, the latter provides a faster convergence rate. For each training example, when using the latter method to update the involved latent feature vectors, the updating rule is formulated by:

(P, Q) = \arg\min_{P,Q} RSE \;\Rightarrow\;
\begin{cases} \frac{\partial}{\partial p_u} RSE = -2(r_{ui} - p_u q_i)\, q_i + 2\lambda p_u \\ \frac{\partial}{\partial q_i} RSE = -2(r_{ui} - p_u q_i)\, p_u + 2\lambda q_i \end{cases}
\;\Rightarrow\;
\begin{cases} p_u \leftarrow p_u + \eta \big( (r_{ui} - p_u q_i)\, q_i - \lambda p_u \big) \\ q_i \leftarrow q_i + \eta \big( (r_{ui} - p_u q_i)\, p_u - \lambda q_i \big) \end{cases},   (3)

where η denotes the learning rate. As a standard stochastic gradient solver, the system trains all involved feature factors with Eq. (3) on the given training instances sequentially. This training process is repeated several times, and each pass is called a training epoch. It is common that several training epochs are needed before the model converges; the epoch number mainly depends on the values of η and λ, as well as on the size of the training dataset.
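To make the updating rule (3) concrete, the following minimal sketch shows one SGD training epoch over the training triples in T. It is an illustration, not the authors' implementation; the function name, data layout and hyperparameter values are assumptions made for the example.

import numpy as np

def rmf_epoch(T, P, Q, eta=0.01, lam=0.02):
    """One SGD epoch of RMF following Eq. (3).
    T: iterable of (u, i, r_ui) training triples.
    P: m x f user-feature matrix, Q: f x n item-feature matrix (updated in place)."""
    for u, i, r in T:
        err = r - P[u] @ Q[:, i]                      # r_ui - p_u q_i
        pu = P[u].copy()                              # keep the old p_u for the q_i update
        P[u]    += eta * (err * Q[:, i] - lam * P[u])
        Q[:, i] += eta * (err * pu      - lam * Q[:, i])
    return P, Q

# Usage sketch: initialize P and Q with small random values and repeat rmf_epoch
# until the error on the validation set V stops decreasing (early stopping).
rng = np.random.default_rng(0)
m, n, f = 5, 4, 3
P = rng.normal(scale=0.1, size=(m, f))
Q = rng.normal(scale=0.1, size=(f, n))
T = [(0, 1, 4.0), (2, 3, 2.5), (4, 0, 5.0)]           # toy (user, item, rating) triples
for _ in range(20):
    rmf_epoch(T, P, Q)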
Also, according to Takács et al. [25], an early stopping criterion is needed to avoid overfitting.

Due to its accuracy, efficiency and ease of implementation, RMF has been implemented and further investigated by many researchers [24–27]. Several variants of this model have been proposed, among which a very successful one is the RMF with linear biases (hereafter referred to as RMF-B) proposed by Paterek [24] and Koren [27]. RMF-B incorporates linear biases into the prediction rule (1) in order to decompose the observed rating into several components,
each of which explains only the relevant part of a signal. Paterek integrates two linear biases, b_u and b_i, which respectively denote the observed biases of user u and item i [24]. In addition, Koren suggests incorporating the global average μ to capture the statistical characteristics of the training data [27]. Thus, the prediction rule (1) turns into:

\hat{r}_{ui} = \mu + b_u + b_i + p_u q_i,   (4)

and the RSE (2) turns into:

RSE = \sum_{(u,i) \in T} \Big[ \big(r_{ui} - \mu - b_u - b_i - p_u q_i\big)^2 + \lambda \big( b_u^2 + b_i^2 + \|p_u\|^2 + \|q_i\|^2 \big) \Big],   (5)

where the linear biases b_u and b_i are trained along with the latent features p_u and q_i, and the global average μ is estimated by the observed average of the training dataset since it captures the statistical characteristics of the known data.

The aforementioned RMF based models have been proved to be highly accurate and scalable. However, since they are batch learning methods, they naturally require batch training processes based on static training datasets. The item and user sets are supposed to be stationary, and once the rating data expands to a certain extent, the only way to discover new patterns from the new data is to totally rebuild the recommender. However, in real-world applications, it is natural that the training data are gradually accumulated, whilst users always expect to receive quick feedback once they evaluate a specified item. In such situations, recommenders able to learn from the newly arriving data incrementally are necessary.

Incremental recommendation is an attractive topic in the area of recommender systems, and there are several relevant pioneering works. Papagelis et al. study incremental CF in the context of the similarity-based K-Nearest-Neighborhood (KNN) model, and propose an incremental user-oriented KNN based recommender [30]. Miranda et al. further investigate this incremental user-oriented KNN, and propose the incremental item-oriented KNN model [31]. The empirical studies in [31] show that these two models can achieve similar accuracy, while the latter provides better scalability due to the relatively small number of items. Nonetheless, based on the progress in the CF area during the Netflix Prize, similarity based KNN models can be easily outperformed by global-optimization methods like RMF, which calls for designing incremental recommenders based on state-of-the-art models. Agarwal et al. propose a powerful incremental recommender named FOBFM based on bilinear factors [32], yet their model requires extra information about users/items, such as demographic data, which is unavailable under many circumstances, as in the Netflix Prize.

Since MF based recommenders are highly accurate and scalable, it is desirable to implement incremental recommenders based on MF techniques. There are also some pioneering works referring to this issue. Sarwar et al. propose the incremental SVD based recommender using the fold-in technique [33]; Takács et al. [25], and Rendle and Schmidt-Thieme [34], suggest implementing the online update by fixing the current model and training the latent factors of new users/items on arrival of the corresponding feedback. However, current MF based incremental recommenders can only update the model roughly by including the new items/users, without updating the already trained latent factors. In this work, we aim at designing an MF based incremental recommender which can not only deal with new users/items, but also update the already trained features. In the following section, we will discuss our method in detail.
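Before turning to the proposed method, the sketch below illustrates how prediction rule (4) and the SGD updates implied by objective (5) might look in code. It is a hedged illustration under assumed variable names and hyperparameters, not the implementation evaluated in this paper.

import numpy as np

def rmf_b_epoch(T, mu, bu, bi, P, Q, eta=0.01, lam=0.02):
    """One SGD epoch of RMF-B. mu: global average of the training ratings;
    bu, bi: user/item bias vectors; P: m x f user features, Q: f x n item
    features (biases and features are updated in place)."""
    for u, i, r in T:
        pred = mu + bu[u] + bi[i] + P[u] @ Q[:, i]    # prediction rule (4)
        err = r - pred
        pu = P[u].copy()
        bu[u]   += eta * (err - lam * bu[u])          # gradients of objective (5)
        bi[i]   += eta * (err - lam * bi[i])
        P[u]    += eta * (err * Q[:, i] - lam * P[u])
        Q[:, i] += eta * (err * pu      - lam * Q[:, i])
    return bu, bi, P, Q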
3. Method

First of all, we focus on designing an incremental CF recommender based on the original RMF model. From the prediction rule (1), the RSE (2) and the feature updating rule (3), we see that RMF is an iterative learning method, where during each training epoch the training examples are learned sequentially. Let R(u) denote the rating set of a specified user u; then each training instance in R(u) can be orderly marked as r_{u,1}, r_{u,2}, ..., r_{u,k}, ..., r_{u,K} (here we only mark the known ratings by user u, so the subscript k denotes the kth item rated by user u, not the kth item in the rating matrix), where K = |R(u)|. Let p_u^{(0)} denote the initial value of p_u at the beginning of one training epoch, and p_u^{(1)} denote the value of p_u after the training example r_{u,1} is learned. By applying the feature updating rule (3), we get:

p_u^{(1)} = p_u^{(0)} + \eta \Big( \big(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\big) q_1^{(h_1)} - \lambda p_u^{(0)} \Big) = (1 - \eta\lambda)\, p_u^{(0)} + \eta \big(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\big) q_1^{(h_1)},   (6)

where h_1 denotes the current status of q_1. Let

c = 1 - \eta\lambda;   (7)

then Eq. (6) is transformed into:

p_u^{(1)} = c\, p_u^{(0)} + \eta \big(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\big) q_1^{(h_1)}.   (8)

Similarly, we can infer the situation from r_{u,2} to r_{u,K} as follows:

p_u^{(2)} = c\, p_u^{(1)} + \eta \big(r_{u,2} - p_u^{(1)} q_2^{(h_2)}\big) q_2^{(h_2)},
p_u^{(3)} = c\, p_u^{(2)} + \eta \big(r_{u,3} - p_u^{(2)} q_3^{(h_3)}\big) q_3^{(h_3)},
\cdots,
p_u^{(K)} = c\, p_u^{(K-1)} + \eta \big(r_{u,K} - p_u^{(K-1)} q_K^{(h_K)}\big) q_K^{(h_K)}.   (9)

By combining (8) and (9), we obtain the expression of the training result of p_u after one training epoch:

p_u^{(K)} = c^K p_u^{(0)} + c^{K-1} \eta \big(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\big) q_1^{(h_1)} + c^{K-2} \eta \big(r_{u,2} - p_u^{(1)} q_2^{(h_2)}\big) q_2^{(h_2)} + \cdots + \eta \big(r_{u,K} - p_u^{(K-1)} q_K^{(h_K)}\big) q_K^{(h_K)},   (10)

where q_k^{(h_k)} denotes the temporal state of q_k when the system is going to learn the kth example rated by user u. Similarly, let R(i) denote the rating set on item i and mark each training instance in R(i) as r_{1,i}, r_{2,i}, ..., r_{h,i}, ..., r_{H,i}, where H = |R(i)| (here we only mark the known ratings on item i, so the subscript h denotes the hth user rating on item i, not the hth user in the rating matrix). We can then infer the expression of the training result of q_i after one training epoch as:

q_i^{(H)} = c^H q_i^{(0)} + c^{H-1} \eta \big(r_{1,i} - p_1^{(k_1)} q_i^{(0)}\big) p_1^{(k_1)} + c^{H-2} \eta \big(r_{2,i} - p_2^{(k_2)} q_i^{(1)}\big) p_2^{(k_2)} + \cdots + \eta \big(r_{H,i} - p_H^{(k_H)} q_i^{(H-1)}\big) p_H^{(k_H)}.   (11)

By combining (10) and (11), we locate the task of incremental learning in finding the expressions of p_u^{(K+1)} and q_i^{(H+1)} based on p_u^{(K)} and q_i^{(H)} respectively. However, Eqs. (10) and (11) are too complicated for implementing the incremental update, in the following two aspects:

– The final outputs are sensitive to the input sequence of the examples. Since c < 1, the impact of the early inputted examples will be gradually weakened as the training process goes on.
– The variance of a latent feature learnt from the current example depends not only on its initial value, but also on the temporal state of the counterpart latent factor relevant to the active training sample; e.g., the value of p_u^{(K)} depends on the value of q_K^{(h_K)}.

Note that it usually takes several training epochs for the model to converge, and the training results of the current epoch are used as the initial values of the subsequent epoch. So it would be intractable to trace the variance of the features during the training process based on (10) and (11).
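As a quick sanity check of the derivation, the small script below (an illustration, not part of the paper's experiments) applies the sequential rule (9) to one user's K ratings and confirms numerically that the result equals the unrolled expression (10). For simplicity it holds the item-vector snapshots q_k^{(h_k)} fixed while user u's ratings are processed, which matches the snapshot notation above.

import numpy as np

rng = np.random.default_rng(0)
f, K = 4, 6                      # latent dimension, number of ratings by user u (assumed values)
eta, lam = 0.01, 0.02            # learning rate and regularization (assumed values)
c = 1.0 - eta * lam              # the constant c of Eq. (7)

p0 = rng.normal(scale=0.1, size=f)          # p_u^{(0)}
Q = rng.normal(scale=0.1, size=(K, f))      # snapshots q_1^{(h_1)}, ..., q_K^{(h_K)}
r = rng.uniform(1, 5, size=K)               # ratings r_{u,1}, ..., r_{u,K}

# Sequential updates of Eq. (9), recording the residuals e_k = r_{u,k} - p_u^{(k-1)} q_k.
p, residuals = p0.copy(), []
for k in range(K):
    e = r[k] - p @ Q[k]
    residuals.append(e)
    p = c * p + eta * e * Q[k]

# Unrolled form of Eq. (10): p_u^{(K)} = c^K p_u^{(0)} + sum_k c^{K-k} eta e_k q_k.
p_unrolled = (c ** K) * p0 + sum(
    (c ** (K - (k + 1))) * eta * residuals[k] * Q[k] for k in range(K)
)
print(np.allclose(p, p_unrolled))   # True: the two expressions coincide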
Nonetheless, the complexity of the training process is mainly caused by the RMF model's dependence on the input sequence of the examples. Thus, we can simplify Eqs. (10) and (11) straightforwardly by removing this sequence dependence. A common way to realize the sequence independence is randomizing the input order of the training instances [35]; however, this approach cannot simplify the training process since it does not alter the iterative nature of RMF. Thus, we intend to eliminate the sequence dependence of the RMF model in a different way.

3.1. Sequence independent RMF

We start by rewriting (10) and (11) as follows:

p_u^{(K)} = c^K p_u^{(0)} + A, \quad A = c^{K-1} \eta \big(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\big) q_1^{(h_1)} + \cdots + \eta \big(r_{u,K} - p_u^{(K-1)} q_K^{(h_K)}\big) q_K^{(h_K)},
q_i^{(H)} = c^H q_i^{(0)} + B, \quad B = c^{H-1} \eta \big(r_{1,i} - p_1^{(k_1)} q_i^{(0)}\big) p_1^{(k_1)} + \cdots + \eta \big(r_{H,i} - p_H^{(k_H)} q_i^{(H-1)}\big) p_H^{(k_H)}.   (12)

Note that A and B are both expressed in an iterative form, where the results trained so far are used as the initial value for the successive example. So, by assuming that all training examples are learned simultaneously, we can transform the expressions of A and B into:

A' = \alpha_K \eta \big(r_{u,1} - p_u^{(0)} q_1^{(0)}\big) q_1^{(0)} + \cdots + \alpha_K \eta \big(r_{u,K} - p_u^{(0)} q_K^{(0)}\big) q_K^{(0)},
B' = \beta_H \eta \big(r_{1,i} - p_1^{(0)} q_i^{(0)}\big) p_1^{(0)} + \cdots + \beta_H \eta \big(r_{H,i} - p_H^{(0)} q_i^{(0)}\big) p_H^{(0)},   (13)

where α_K and β_H, which depend on the sizes of R(u) and R(i), denoted by K and H respectively, are parameters for averaging the importance of each training example. In order to ensure that both A' and B' maintain a magnitude consistent with that of A and B, we choose α_K and β_H by averaging the sum of the parameters related to c in (12), formulated by:

\alpha_K = \frac{1}{K} \sum_{k=0}^{K-1} c^k = \frac{1 - c^K}{K(1 - c)}, \qquad \beta_H = \frac{1}{H} \sum_{h=0}^{H-1} c^h = \frac{1 - c^H}{H(1 - c)}.   (14)

By combining (12)–(14), we can obtain:

p_u^{(K)} = c^K p_u^{(0)} + A', \quad A' = \eta\, \frac{1 - c^K}{K(1 - c)} \sum_{k=1}^{K} \big(r_{u,k} - p_u^{(0)} q_k^{(0)}\big) q_k^{(0)},
q_i^{(H)} = c^H q_i^{(0)} + B', \quad B' = \eta\, \frac{1 - c^H}{H(1 - c)} \sum_{h=1}^{H} \big(r_{h,i} - p_h^{(0)} q_i^{(0)}\big) p_h^{(0)}.   (15)

We refer to the model employing the feature updating rule (15) as the sequence independent RMF (SI-RMF) model. In practice we find that the removal of sequence sensitivity can make the model slightly more accurate than the primitive RMF, alongside having a much simpler mathematical form for the incremental update. However, the change of the training rule also leads to a restriction on the learning rate: the SI-RMF model can only converge when the value of η is relatively small. We will further discuss this issue in Section 4.

3.2. Incremental RMF

Based on the feature updating rule (15), for a user feature vector p_u, by denoting the training result after the dth training epoch with p_u^{(K)}(d), we get:

p_u^{(K)}(1) = c^K p_u^{(0)} + \alpha_K \eta \sum_{k=1}^{K} \big(r_{u,k} - p_u^{(0)} q_k^{(0)}\big) q_k^{(0)},
p_u^{(K)}(D) = c^K p_u^{(K)}(D-1) + \alpha_K \eta \sum_{k=1}^{K} \big(r_{u,k} - p_u^{(K)}(D-1)\, q_k^{(H_k)}(D-1)\big) q_k^{(H_k)}(D-1),   (16)

where H_k denotes the size of the rating set on the kth item rated by user u. Similarly, we can obtain the training results of q_i after each training epoch as follows:

q_i^{(H)}(1) = c^H q_i^{(0)} + \beta_H \eta \sum_{h=1}^{H} \big(r_{h,i} - p_h^{(0)} q_i^{(0)}\big) p_h^{(0)},
q_i^{(H)}(D) = c^H q_i^{(H)}(D-1) + \beta_H \eta \sum_{h=1}^{H} \big(r_{h,i} - p_h^{(K_h)}(D-1)\, q_i^{(H)}(D-1)\big) p_h^{(K_h)}(D-1),   (17)

where K_h denotes the size of the rating set by the hth user who rates item i.

For implementing an incremental model, we can construct p_u^{(K+1)}(d) and q_i^{(H+1)}(d) depending on the currently trained results p_u^{(K)}(d) and q_i^{(H)}(d) in each training epoch, and iteratively repeat this process to finally reach the desired parameters p_u^{(K+1)}(D) and q_i^{(H+1)}(D).
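The per-epoch updates (16) and (17) can be read as a batch-style recomputation from the previous epoch's snapshot. The sketch below is one possible reading of a single SI-RMF epoch under that interpretation; the data layout, the function name and the small learning rate are assumptions, not the authors' code.

import numpy as np

def si_rmf_epoch(ratings_by_user, ratings_by_item, P, Q, eta=0.001, lam=0.02):
    """One SI-RMF epoch following Eqs. (14)-(17).
    ratings_by_user[u] = list of (i, r_ui); ratings_by_item[i] = list of (u, r_ui).
    P: m x f user features, Q: f x n item features (recomputed from a snapshot)."""
    c = 1.0 - eta * lam                                       # Eq. (7)
    P_old, Q_old = P.copy(), Q.copy()                         # previous-epoch snapshot
    for u, rated in ratings_by_user.items():
        K = len(rated)
        alpha = (1.0 - c ** K) / (K * (1.0 - c))              # Eq. (14)
        grad = sum((r - P_old[u] @ Q_old[:, i]) * Q_old[:, i] for i, r in rated)
        P[u] = (c ** K) * P_old[u] + alpha * eta * grad       # Eq. (16)
    for i, rated in ratings_by_item.items():
        H = len(rated)
        beta = (1.0 - c ** H) / (H * (1.0 - c))               # Eq. (14)
        grad = sum((r - P_old[u] @ Q_old[:, i]) * P_old[u] for u, r in rated)
        Q[:, i] = (c ** H) * Q_old[:, i] + beta * eta * grad  # Eq. (17)
    return P, Q

Because every vector is recomputed from the same snapshot, the outcome of an epoch no longer depends on the order in which the ratings are visited.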
We begin by designing the incremental update rule of the user feature vector p_u by analyzing the training process in detail.

3.2.1. Constructing p_u^{(K+1)}(1) based on p_u^{(K)}(1)

Suppose that the batch training has been completed, which means that, for user u, the feature vector p_u^{(K)}(D) has been successfully constructed and the interim training results p_u^{(K)}(d) (1 ≤ d ≤ D−1) have been cached. Then, once a new example related to user u, marked r_{u,K+1}, has arrived, by deducing the batch training result of p_u^{(K+1)}(1) from (16), we can build p_u^{(K+1)}(1) from p_u^{(K)}(1) as follows:

p_u^{(K)}(1) = c^K p_u^{(0)} + \alpha_K \eta \sum_{k=1}^{K} \big(r_{u,k} - p_u^{(0)} q_k^{(0)}\big) q_k^{(0)},
p_u^{(K+1)}(1) = c^{K+1} p_u^{(0)} + \alpha_{K+1} \eta \sum_{k=1}^{K+1} \big(r_{u,k} - p_u^{(0)} q_k^{(0)}\big) q_k^{(0)}
 = c^{K+1} p_u^{(0)} + \alpha_{K+1} \eta \Big[ \sum_{k=1}^{K} \big(r_{u,k} - p_u^{(0)} q_k^{(0)}\big) q_k^{(0)} + \big(r_{u,K+1} - p_u^{(0)} q_{K+1}^{(0)}\big) q_{K+1}^{(0)} \Big]
 = \underbrace{\frac{\alpha_{K+1}}{\alpha_K}\big(p_u^{(K)}(1) - c^K p_u^{(0)}\big) + c^{K+1} p_u^{(0)}}_{\text{rebalancing the trained parameter}} + \underbrace{\alpha_{K+1} \eta \big(r_{u,K+1} - p_u^{(0)} q_{K+1}^{(0)}\big) q_{K+1}^{(0)}}_{\text{including the newly arriving example}}.   (18)

Note that p_u^{(K)}(1), p_u^{(0)} and q_{K+1}^{(0)} are cached interim values; since K is known, α_{K+1} and α_K can be inferred from (14); and c is the constant given in (7). As marked, Eq. (18) can be divided into two parts: the first part rebalances the trained parameters, and the second part includes the variance from the newly arriving example. So, with (18) we can obtain p_u^{(K+1)}(1) by making an incremental update on p_u^{(K)}(1).

3.2.2. Constructing p_u^{(K+1)}(2) based on p_u^{(K)}(2)

From (16) we see that the training result of one training epoch is used as the initial value of the subsequent one. So when constructing p_u^{(K+1)}(2) based on p_u^{(K)}(2), we must take the difference between p_u^{(K)}(1) and p_u^{(K+1)}(1) into account. Let Δp_u(1) = p_u^{(K+1)}(1) − p_u^{(K)}(1); then p_u^{(K+1)}(2) can be given by:
p_u^{(K)}(2) = c^K p_u^{(K)}(1) + \alpha_K \eta \sum_{k=1}^{K} \big(r_{u,k} - p_u^{(K)}(1)\, q_k^{(H_k)}(1)\big) q_k^{(H_k)}(1),
p_u^{(K+1)}(2) = c^{K+1} p_u^{(K+1)}(1) + \alpha_{K+1} \eta \sum_{k=1}^{K+1} \big(r_{u,k} - p_u^{(K+1)}(1)\, q_k^{(H_k)}(1)\big) q_k^{(H_k)}(1)
 = c^{K+1} p_u^{(K+1)}(1) + \alpha_{K+1} \eta \Big[ \sum_{k=1}^{K} \big(r_{u,k} - \big(p_u^{(K)}(1) + \Delta p_u(1)\big) q_k^{(H_k)}(1)\big) q_k^{(H_k)}(1) + \big(r_{u,K+1} - p_u^{(K+1)}(1)\, q_{K+1}^{(H_{K+1})}(1)\big) q_{K+1}^{(H_{K+1})}(1) \Big]
 = \underbrace{\frac{\alpha_{K+1}}{\alpha_K}\big(p_u^{(K)}(2) - c^K p_u^{(K)}(1)\big) + c^{K+1} p_u^{(K+1)}(1)}_{\text{rebalancing the trained parameter}} + \alpha_{K+1} \eta \Big[ \underbrace{\big(r_{u,K+1} - p_u^{(K+1)}(1)\, q_{K+1}^{(H_{K+1})}(1)\big) q_{K+1}^{(H_{K+1})}(1)}_{\text{including the newly arriving example}} - \underbrace{\sum_{k=1}^{K} \big(\Delta p_u(1)\, q_k^{(H_k)}(1)\big) q_k^{(H_k)}(1)}_{\text{tackling the initial difference on } p_u} \Big].   (19)

Eq. (19) can be decomposed into three parts: the first and second parts respectively rebalance the trained parameter and include the variance from the newly arriving example; moreover, since the initial value changes at the very beginning of this training epoch, we need the third part to tackle the initial difference on p_u.

In the meanwhile, from (17) we see that for each training example r_{u,k} in R(u) except r_{u,K+1}, the training result of the corresponding item feature vector q_k^{(H_k)}(2) will be impacted by the incremental difference on p_u^{(K+1)}(1). Formally, for each item k ∈ R(u), k ≠ K+1, the difference on q_k^{(H_k)}(2) brought by p_u^{(K+1)}(1) can be given by:

q_k^{(H_k)}(2) = c^{H_k} q_k^{(H_k)}(1) + \beta_{H_k} \eta \Big[ \sum_{h \ne u} \big(r_{h,k} - p_h^{(K_h)}(1)\, q_k^{(H_k)}(1)\big) p_h^{(K_h)}(1) + \big(r_{u,k} - p_u^{(K)}(1)\, q_k^{(H_k)}(1)\big) p_u^{(K)}(1) \Big],
q'^{(H_k)}_k(2) = c^{H_k} q_k^{(H_k)}(1) + \beta_{H_k} \eta \Big[ \sum_{h \ne u} \big(r_{h,k} - p_h^{(K_h)}(1)\, q_k^{(H_k)}(1)\big) p_h^{(K_h)}(1) + \big(r_{u,k} - p_u^{(K+1)}(1)\, q_k^{(H_k)}(1)\big) p_u^{(K+1)}(1) \Big]
 = c^{H_k} q_k^{(H_k)}(1) + \beta_{H_k} \eta \Big[ \sum_{h \ne u} \big(r_{h,k} - p_h^{(K_h)}(1)\, q_k^{(H_k)}(1)\big) p_h^{(K_h)}(1) + \big(r_{u,k} - \big(p_u^{(K)}(1) + \Delta p_u(1)\big) q_k^{(H_k)}(1)\big) \big(p_u^{(K)}(1) + \Delta p_u(1)\big) \Big]
 = q_k^{(H_k)}(2) + \underbrace{\beta_{H_k} \eta \Big[ \big(r_{u,k} - p_u^{(K+1)}(1)\, q_k^{(H_k)}(1)\big) \Delta p_u(1) - \big(\Delta p_u(1)\, q_k^{(H_k)}(1)\big) p_u^{(K)}(1) \Big]}_{\text{tackling the initial difference on } p_u}.   (20)

Eq. (20) denotes that for each item k ∈ R(u), k ≠ K+1, we can apply the bulk behind β_{H_k} to include the incremental difference brought by p_u^{(K+1)}(1) based on the currently trained result.
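To illustrate how the cached quantities are combined in practice, the following hedged sketch implements the first-epoch incremental update (18) for a newly arriving rating r_{u,K+1} and the correction (20) for an item user u has already rated. The caching scheme, function names and variable names are assumptions made for the example.

import numpy as np

def alpha(K, c):
    return (1.0 - c ** K) / (K * (1.0 - c))                   # Eq. (14)

def incremental_first_epoch(p_u_K1, p_u_0, q_new_0, r_new, K, eta, lam):
    """Eq. (18). p_u_K1: cached p_u^{(K)}(1); p_u_0: cached initial p_u^{(0)};
    q_new_0: initial feature vector q_{K+1}^{(0)} of the newly rated item.
    Returns p_u^{(K+1)}(1) and the difference Delta p_u(1)."""
    c = 1.0 - eta * lam
    rebalance = (alpha(K + 1, c) / alpha(K, c)) * (p_u_K1 - (c ** K) * p_u_0) \
                + (c ** (K + 1)) * p_u_0                       # rebalancing part of Eq. (18)
    include = alpha(K + 1, c) * eta * (r_new - p_u_0 @ q_new_0) * q_new_0
    p_u_new = rebalance + include
    return p_u_new, p_u_new - p_u_K1

def propagate_to_item(q_k_2, q_k_1, p_u_K1, p_u_new, delta_p, r_uk, H_k, eta, lam):
    """Correction of Eq. (20) for an item k already rated by u (k != K+1).
    q_k_2: cached q_k^{(H_k)}(2); q_k_1: cached q_k^{(H_k)}(1)."""
    c = 1.0 - eta * lam
    beta = (1.0 - c ** H_k) / (H_k * (1.0 - c))
    return q_k_2 + beta * eta * ((r_uk - p_u_new @ q_k_1) * delta_p
                                 - (delta_p @ q_k_1) * p_u_K1)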
3.2.3. Constructing p_u^{(K+1)}(d) based on p_u^{(K)}(d) (d > 2)

During the subsequent training epochs, the interactions between the latent features become very complicated. After the 3rd training epoch, for each user v ∈ R(k){k ∈ R(u), k ≠ K+1}, p_v^{(K_v)}(3) would be affected by the incremental difference on q_k^{(H_k)}(2) in (20); and after the 4th training epoch, the effect of the incremental difference spreads to each item j ∈ R(v){v ∈ R(k){k ∈ R(u), k ≠ K+1}}. Actually, the broadcast of the effect caused by a newly arriving example can be described by the chain depicted in Fig. 1.

Since our objective is to design an incremental recommender able to respond quickly to user feedback, we only consider the feature interactions at Levels 1 and 2 shown in Fig. 1. Thus, for a newly arriving example r_{u,K+1}, the incremental feature updating rule of p_u can be summarized as follows.

After the first training epoch, for p_u:

p_u^{(K+1)}(1) = \frac{\alpha_{K+1}}{\alpha_K}\big(p_u^{(K)}(1) - c^K p_u^{(0)}\big) + c^{K+1} p_u^{(0)} + \alpha_{K+1} \eta \big(r_{u,K+1} - p_u^{(0)} q_{K+1}^{(0)}\big) q_{K+1}^{(0)}.

After the dth (d > 1) training epoch, for p_u:

p_u^{(K+1)}(d) = \frac{\alpha_{K+1}}{\alpha_K}\big(p_u^{(K)}(d) - c^K p_u^{(K)}(d-1)\big) + c^{K+1} p_u^{(K+1)}(d-1) + \alpha_{K+1} \eta \Big[ \big(r_{u,K+1} - p_u^{(K+1)}(d-1)\, q_{K+1}^{(H_{K+1})}(d-1)\big) q_{K+1}^{(H_{K+1})}(d-1) - \sum_{k=1}^{K} \big(\Delta p_u(d-1)\, q_k^{(H_k)}(d-1)\big) q_k^{(H_k)}(d-1) \Big];   (21)

for each item k ∈ R(u), k ≠ K+1:

q'^{(H_k)}_k(d+1) = q_k^{(H_k)}(d+1) + \beta_{H_k} \eta \Big[ \big(r_{u,k} - p_u^{(K+1)}(d)\, q_k^{(H_k)}(d)\big) \Delta p_u(d) - \big(\Delta p_u(d)\, q_k^{(H_k)}(d)\big) p_u^{(K)}(d) \Big].

Similarly, we can infer the feature updating rule of item i on a newly arriving example r_{H+1,i} as follows.

After the first training epoch, for q_i:

q_i^{(H+1)}(1) = \frac{\beta_{H+1}}{\beta_H}\big(q_i^{(H)}(1) - c^H q_i^{(0)}\big) + c^{H+1} q_i^{(0)} + \beta_{H+1} \eta \big(r_{H+1,i} - p_{H+1}^{(0)} q_i^{(0)}\big) p_{H+1}^{(0)}.

After the dth (d > 1) training epoch, for q_i:

q_i^{(H+1)}(d) = \frac{\beta_{H+1}}{\beta_H}\big(q_i^{(H)}(d) - c^H q_i^{(H)}(d-1)\big) + c^{H+1} q_i^{(H+1)}(d-1) + \beta_{H+1} \eta \Big[ \big(r_{H+1,i} - p_{H+1}^{(K_{H+1})}(d-1)\, q_i^{(H+1)}(d-1)\big) p_{H+1}^{(K_{H+1})}(d-1) - \sum_{h=1}^{H} \big(p_h^{(K_h)}(d-1)\, \Delta q_i(d-1)\big) p_h^{(K_h)}(d-1) \Big];   (22)

for each user h ∈ R(i), h ≠ H+1:

p'^{(K_h)}_h(d+1) = p_h^{(K_h)}(d+1) + \alpha_{K_h} \eta \Big[ \big(r_{h,i} - p_h^{(K_h)}(d)\, q_i^{(H+1)}(d)\big) \Delta q_i(d) - \big(p_h^{(K_h)}(d)\, \Delta q_i(d)\big) q_i^{(H)}(d) \Big].

Based on the incremental updating rules for latent features in (21) and (22), we design the training process of our Incremental Regularized Matrix Factorization (IRMF) model in Table 1. As shown, IRMF divides the training process into two phases, namely the batch training phase and the incremental updating phase.

– During the batch training phase, the feature training rule is the same as in the primitive RMF model, whilst IRMF requires caching the interim training results after each training epoch for incremental updates. This implies that the storage complexity

Fig. 1. The broadcast of the effect of the incremental difference on p_u caused by a new example.