Information Sciences 181 (2011) 1552–1572
Contents lists available at ScienceDirect
Information Sciences
journal homepage: www.elsevier.com/locate/ins

Personalized recommendation of popular blog articles for mobile applications

Duen-Ren Liu*, Pei-Yun Tsai, Po-Huan Chiu
Institute of Information Management, National Chiao Tung University, Hsinchu, Taiwan

* Corresponding author. Tel.: +886 3 5131245; fax: +886 3 5723792. E-mail addresses: dliu@mail.nctu.edu.tw, dliu@iim.nctu.edu.tw (D.-R. Liu).

Article info

Article history: Received 5 October 2009; Received in revised form 22 October 2010; Accepted 1 January 2011; Available online 9 January 2011

Keywords: Mobile service; Blog recommenders; Time-sensitive topic; Collaborative filtering

Abstract

Weblogs have emerged as a new communication and publication medium on the Internet for diffusing the latest useful information. Providing value-added mobile services, such as blog articles, is increasingly important to attract mobile users to mobile commerce, in order to benefit from the proliferation and convenience of using mobile devices to receive information anytime and anywhere. However, there are a tremendous number of blog articles, and mobile users generally have difficulty in browsing weblogs owing to the limitations of mobile devices. Accordingly, providing mobile users with blog articles that suit their particular interests is an important issue. Very little research, however, has focused on this issue. In this work, we propose a novel Customized Content Service on a mobile device (m-CCS) to filter and push blog articles to mobile users. The m-CCS includes a novel forecasting approach to predict the latest popular blog topics based on the trend of time-sensitive popularity of weblogs. Mobile users may, however, have different interests regarding the latest popular blog topics. Thus, the m-CCS further analyzes the mobile users’ browsing logs to determine their interests, which are then combined with the latest popular blog topics to derive their preferred blog topics and articles. A novel hybrid approach is proposed to recommend blog articles by integrating personalized popularity of topic clusters, item-based collaborative filtering (CF) and attention degree (click times) of blog articles. The experimental results demonstrate that the m-CCS system can effectively recommend mobile users’ desired blog articles with respect to both popularity and personal interests.

© 2011 Elsevier Inc. All rights reserved.
0020-0255/$ - see front matter. doi:10.1016/j.ins.2011.01.005

1. Introduction

Weblogs have emerged as a new communication and publication medium on the Internet for diffusing the latest useful information. Blog articles represent the opinions of the populace and constitute a reaction to current events (e.g., news) on the Internet [13]. Accordingly, looking for the latest popular issues discussed by blogs and attracting readers’ attention is an interesting subject. Moreover, providing value-added mobile services, such as blog articles, is increasingly important to attract mobile users to mobile commerce, in order to benefit from the proliferation and convenience of using mobile devices to receive information anytime and anywhere. There are, however, a tremendous number of blog articles, and mobile users generally have difficulty in browsing weblogs owing to the inherent limitations of mobile devices, such as small screens, short usage time and poor input mechanisms. Accordingly, providing mobile users with blog articles that suit their interests is an important issue. Very little research, however, has focused on this issue.

There are three main types of research regarding blogs. The first type focuses on analyzing the link structure between blogs to form a community [19,20]. Through the hyperlinks between blogs, people can communicate across blogs by publishing content related to other blogs. Nakajima et al. [31] proposed a method to identify the important bloggers in the conversations, based on their roles in preceding blog threads, and to identify “hot” conversations. The second type focuses on content analysis to derive the propagation of topics and trends in the blogosphere. Gruhl et al. [11,12] modeled the information propagation of topics among blogs based on blog text. By tracking topic and user drift, Hayes et al. [13] examined the relationship between blogs over time. Mei et al. [28] proposed a method to discover the distributions and evolution patterns of topics across time and space. Although existing studies have investigated the evolution of blog topics, they have not considered how to predict the degree of popularity of blog topics. The last type focuses on how to model bloggers and derive their interests in order to generate personal recommendations [38,40]. A variety of methods has been proposed to model bloggers’ interests and provide recommended content which is similar to their earlier experiences [15,24].

The majority of previous studies on blogs have ignored the hot topics and popular articles discussed by mass groups of readers, who engage in browsing actions related to the blog articles. Moreover, existing studies do not consider recommending blog articles to mobile readers in mobile environments. With more and more blog articles continually being published on the Internet, the scale and complexity of blog contents are growing rapidly, resulting in information overload for blog readers. Mobile readers can only browse a very limited number of blog articles because of the restrictions of mobile devices.
Accordingly, traditional recommendation methods, such as the collaborative filtering approach [1,2,5,17,25,35], may suffer from the sparsity problem of finding similar users or items due to insufficient historical records of browsing blog articles by mobile readers. To address the sparsity issue and blog information overload, it is essential to design an appropriate mechanism for recommending blog articles in mobile environments.

Blog readers are often interested in browsing emerging and popular blog topics, from which the popularity of blogs can be inferred according to the accumulated click times on blogs. Popularity based solely on click times, however, cannot truly reflect popularity trends. For example, a new event may trigger emerging discussions such that the number of related blog articles and browsing actions is small at the beginning and rapidly increases as time goes on. Thus, it is important to analyze the trend of time-sensitive popularity of blogs to predict emerging hot blog topics. In addition, blog readers may have different interests regarding the emerging popular blog topics. Nevertheless, existing studies have not addressed how to predict the popularity trend of blog topics or how to identify personalized popular topics.

More specifically, several studies have been proposed to model bloggers’ interests and provide personal recommendations [15,24,38,40]. Traditional approaches of recommender systems can also be adopted to recommend blog articles to mobile users. However, existing studies have not addressed the issue of recommending personalized popular blog articles, which is especially important for mobile environments where mobile users cannot freely browse a tremendous number of blog articles on the Internet due to the restrictions of mobile devices, and therefore must rely on service providers’ recommendations to browse a small and feasible subset of blog articles.
Many blog articles are new articles to the system, since they have not been viewed by any mobile user in the system due to the limitations of mobile devices. Traditional recommendation methods may suffer from the new item problem, in which there is no rating record on new items by which to derive the prediction [1]. This means that most new articles, which are popular on the Internet and to which the masses of Internet users pay attention, may be ignored by conventional recommendation methods. Accordingly, the recommended feasible set of blog articles should contain those articles which are new to the system but are popular with Internet users and also suit mobile users’ personal interests. Existing recommendation approaches have neither addressed such issues nor considered the popularity degree of blog articles.

In this work, we propose a novel Customized Content Service on a mobile device (m-CCS) to recommend personalized and popular blog articles to mobile users. Conventional recommender systems mainly employ the users’ behavior logs recorded in the systems to make recommendations. Differing from existing recommender systems, we use an additional data source collected from the Internet, i.e., the Internet users’ click times on blog articles, to identify the popularity degree of blog articles, which is integrated with the recommendation approach to improve recommendation quality in mobile recommender services.

First, we propose a novel approach to predict the trend of time-sensitive popularity of blog topics. We analyze blog contents retrieved by co-RSS to derive topic clusters, i.e., blog topics. We define a topic as a set of significant terms that are clustered together based on aspects of similarity. By examining the clusters, we can extract the salient features of topics. Moreover, we analyze the click times of Internet readers accessing articles.
For each topic cluster, we modify a double exponential smoothing method [6,7] to predict the popularity degree of the topic according to the variation in trends of click times by Internet readers. Second, mobile users may have different interests regarding the latest popular blog topics. Thus, we further propose a novel approach to infer mobile users’ preferred (personalized) popular blog topics based on the predicted popularity degree of blog topics and mobile users’ personal interests, derived by analyzing their browsing logs. Third, a novel hybrid recommendation approach is proposed to recommend blog articles by integrating personalized popularity of topic clusters, item-based collaborative filtering (CF) and attention degree (click times) of blog articles. The major novel ideas are as follows. The hybrid prediction is derived according to the clarity of personal preference derived from collaborative filtering, based on the historical behavior of the mobile user. With clear preference, i.e., more browsing records of the mobile user, the hybrid prediction will be influenced more by the user preference prediction based on collaborative filtering. The hybrid prediction is, however, dominated by the Internet attention degree of articles for the mobile users who have very few
browsing records with which to infer their preferences. Moreover, the hybrid prediction considers the predictive personalized popularity degree of the topic cluster to which each article belongs; the more popular the topic of an article is, the more numerous the users who are interested in the article.

The filtered articles are sent to the individual’s mobile device via a WAP Push service. This allows the user to receive personalized and relevant articles, satisfying the demand for instant information. Finally, we conduct on-line experiments to compare different strategies: unified push of articles selected by experts and personalized push of articles selected by the m-CCS system’s novel recommendation service. The experimental results show that our proposed approach, which considers the customized predictive popularity degree, can increase the click rates of blog articles and thereby enhance the quality of recommendation. The proposed m-CCS system can effectively recommend desirable blog articles to mobile users based on popularity and personal interests.

The remainder of this paper is organized as follows. Section 2 introduces work related to blogs, forecasting and recommendations; a brief introduction to our system is given in Section 3; detailed descriptions of the processing modules of our system are presented in Sections 4 and 5; Section 6 illustrates how to integrate different modules of our system to develop recommendation methods; the system architecture is illustrated in Section 7; Section 8 presents the evaluation of the usefulness of m-CCS empirically and practically; and the conclusions and suggestions for future work are presented in Section 9.

2. Literature review

2.1. Discovering the trend of blog topics

Blog content represents the opinions of the populace and reactions to current events (e.g., news) on the Internet [13]. With Web 2.0, blogs have become such a powerful force that mainstream media cannot help but take notice [9].
Several studies focus on analyzing blog content to derive the propagation of topics and trends in the blogosphere. Gruhl et al. [11,12] modeled the information propagation of blog topics, based on blog texts. The patterns they proposed for topic propagation were useful for forecasting sales. In addition, a growing number of studies have recently focused on blog content. Blog text analysis focuses on eliciting useful information from blog entry collections, and determining certain trends in the blogosphere. A Natural Language Processing (NLP) algorithm has been used to determine the most important keywords within a definite time period; it can automatically discover trends across blogs [9]. Nevertheless, the above-mentioned studies emphasize assigning blog articles to only one topic, while blogs, in fact, contain many topics. Mei et al. [28] focused on a mixture of subtopics and recognized the spatiotemporal topic patterns within blog documents. They proposed a probabilistic method to model the most salient topics from a text collection, and discover the distributions and evolution patterns across time and space. To track topic and user drift, Hayes et al. [13] examined the relationship between blogs over time.

Some studies have investigated the evolution of blog topics. However, most studies have not considered how to predict the popularity degree of blog topics. In addition, existing studies mainly analyze the content of blog articles to discover the evolution and trend of blog topics without considering the Internet readers’ perspective, i.e., the click times of Internet readers on blog articles. Differing from other studies, we identify blog topics by clustering similar blog articles into clusters (topics), and then use Internet readers’ accumulated click times on the blog articles in each topic cluster to predict the popularity degree of blog topics.

2.2. Recommending blog articles

Several studies investigated user modeling and personal recommendation in blog space. A variety of methods [38,40] has been proposed to model bloggers’ interests, such as classifying articles into predefined categories to identify the author’s preference [24], and thereby automatically recommend the blog articles which suit their interests, by analyzing the contents to which bloggers have reacted. Huang et al. [15] proposed an approach to extract terms relevant to users from blog articles, and then recommend blog articles explored by Google’s search engine. While bloggers can receive recommended content which is similar to their earlier experiences, the method ignores the hot topics and popular articles discussed by the bulk of readers, which can attract mobile users’ interest. These studies mainly examined the interests of bloggers and identified which topics were widely discussed by the bloggers without considering the perspectives of Internet readers. They did not address the issue of how to predict the popularity trend of blog topics. Moreover, existing approaches to recommending blog articles did not investigate the recommendation of popular blog articles by considering the popularity degree of blog topics. Differing from existing studies, we recommend personalized and popular blog articles by considering Internet readers’ click times on blog articles and the predictive popularity degree of blog topics.

2.3. Forecasting

Forecasting methods mainly use historical data to infer future development trends. Time series prediction uses a set of observation values in time order to construct a suitable model to forecast future trends. Among the variety of methods, the exponential smoothing method [6] is easy to understand and highly reliable; this method can also use relatively little data to make short term predictions. The exponential smoothing method assumes stability and regularity in the trend of time series.
A standard exponential smoothing method [30] assigns exponentially decreasing weights to previous observations. In other words, recent observations are given relatively more weight in forecasting than older ones. The exponential smoothing method has been widely used for short-term and medium-term forecasting of economic development trends. In the simple exponential smoothing method, the current prediction value is derived from the prediction value and the actual value of the preceding time period. Simple exponential smoothing is suitable for stationary time series that do not exhibit a trend effect. The double exponential smoothing approach is usually used to process time series data with a trend effect, and the prediction is made using Eq. (1) [7]. For the preceding time series, $x(t)$ is the actual value at time $t$, $\hat{x}(t)$ is the prediction value at time $t$, and $b(t)$ represents the trend effect at time $t$. To forecast the value for time $t+1$, $\hat{x}(t+1)$ is the weighted average of two quantities, $x(t)$ and $[\hat{x}(t) + b(t)]$, weighted by the smoothing constant $\alpha$. The value of the smoothing constant therefore determines which quantity has the greater influence on the prediction value. It follows from the formula that each prediction value is a weighted combination of the series values within the past period: the more recent the historical data, the greater its weight in the prediction:

$$\hat{x}(t+1) = \alpha\, x(t) + (1-\alpha)\,[\hat{x}(t) + b(t)], \tag{1}$$

$$b(t) = \beta\,[\hat{x}(t) - \hat{x}(t-1)] + (1-\beta)\, b(t-1). \tag{2}$$

The trend effect at time $t$, $b(t)$, is calculated by Eq. (2). The value $\beta$ weights the difference between the two prediction values $\hat{x}(t)$ and $\hat{x}(t-1)$, which belong to adjacent days, against the preceding trend effect $b(t-1)$. For the double exponential smoothing method, the values of $\hat{x}(t)$ and $b(1)$ have to be assigned in the initial stage. The simplest way is to assume $\hat{x}(2) = x(1)$ and $b(1) = 0$.
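The recursion in Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It implements only the standard form stated above, not the modified method used in this work. The function name `double_exponential_forecasts` is hypothetical, and because Eq. (2) at $t = 2$ needs $\hat{x}(1)$, which the text does not specify, the sketch additionally assumes $\hat{x}(1) = x(1)$.

```python
def double_exponential_forecasts(x, alpha, beta):
    """Return [x_hat(1), ..., x_hat(T+1)] for observations x = [x(1), ..., x(T)].

    Follows Eq. (1) and Eq. (2) with x_hat(2) = x(1) and b(1) = 0.
    ASSUMPTION (not in the text): x_hat(1) = x(1), so that b(2) is defined.
    """
    if not x:
        raise ValueError("at least one observation is required")

    # Index 0 is unused so that list positions match the 1-based time index t.
    x_hat = [None, x[0], x[0]]  # x_hat(1) (assumed) and x_hat(2) = x(1)
    b = [None, 0.0]             # b(1) = 0

    for t in range(2, len(x) + 1):
        # Eq. (2): trend from the change between adjacent predictions.
        b_t = beta * (x_hat[t] - x_hat[t - 1]) + (1 - beta) * b[t - 1]
        b.append(b_t)
        # Eq. (1): x[t - 1] holds the actual value x(t) in 0-based storage.
        x_hat.append(alpha * x[t - 1] + (1 - alpha) * (x_hat[t] + b_t))

    return x_hat[1:]
```

The last element of the returned list is the forecast for the period following the final observation. With daily click counts as input, a larger $\alpha$ makes the forecast follow the most recent count more closely, while a larger $\beta$ makes the trend term react faster to changes between successive predictions.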
Some research has also suggested that the selection of the initial values is not critical for stationary series [7], since it does not have a significant effect on the prediction result. In this work, we modified the double exponential smoothing method to predict the popularity degree of a topic according to the variation in the trend of click times by Internet readers.

2.4. Recommendation approaches

The recommender system is widely used to provide suitable personalized information to users according to their needs and preferences [1–3,17,18,22,29,35]. It has been applied in many different areas [36], such as products [8,23], movies [32], books [10] and music [37], and it not only offers a personalized recommendation service for each customer, but also benefits business marketing strategies. Generally, recommender systems are mainly based on content-based filtering and collaborative filtering.

The content-based filtering (CBF) approach analyzes customers’ preferences regarding the attribute features of items to build up a personal feature profile, and then predicts which items the customer will like [14,41]. In other words, this approach recommends items whose attribute features are similar to the customer’s profile, according to his/her past preferences; it is mostly used for document, webpage and news article recommendations. However, this method has some limitations: it is not easy to analyze the features of items, and users can only receive recommended items which are similar to past ones [21].

The collaborative filtering (CF) approach is one of the most popular recommendation approaches, and it has been successfully applied in many areas [4,32]. This method can solve some of the problems of the content-based method mentioned above. There is no need to analyze the contents of an item; the recommended items are identified for target users solely based on the similarities to the historical profiles of other users.
Furthermore, it can deal with items whose content is dissimilar to those seen in the past. Based on the relationship between items or users, the CF method can be classified into two types [35]: user-based CF and item-based CF. User-based CF calculates the similarity between users, and predicts the target user’s preference regarding different items; GroupLens is an example of such a system [32]. The CF approach involves two steps: neighborhood formation and prediction. The neighborhood of a target user is selected according to his/her similarity to other users, which is computed by the Pearson correlation coefficient or the cosine measure. Either the k-NN (nearest neighbor) approach or a threshold-based approach is used to choose the users who are most similar to the target user.

With the numbers of users and items exploding, determining how to quickly produce high quality recommendations and how to search a large number of potential neighbors in real time are important issues, especially for commercial systems. The item-based CF method has been proposed to identify the relationships between different items that users have already rated, and then to rank recommended items that each user has not viewed before; this method has already been applied on the Amazon platform [10], achieving good performance.

The item-based collaborative filtering (ICF) algorithm [34] first analyzes the relationships between items (e.g., documents), rather than the relationships between users. Then, the item relationships are used to compute recommendations for users indirectly, by finding items that are similar to other items which the user has previously accessed. Thus, the prediction for item $j$ for user $u$ is calculated as the sum of the ratings given by the user for items similar to $j$, weighted by item similarity, as shown in Eq. (3):

$$p_{u,j} = \frac{\sum_{i=1}^{n} w(j,i) \times r_{u,i}}{\sum_{i=1}^{n} |w(j,i)|}, \tag{3}$$
where $p_{u,j}$ represents the predicted rating of item $j$ for user $u$; $w(j,i)$ is the similarity between two items $j$ and $i$; and $r_{u,i}$ denotes the rating of user $u$ for item $i$. A number of methods can be used to determine the similarity between items, e.g., cosine-based similarity, correlation-based similarity, and adjusted cosine similarity. Since the adjusted cosine similarity method performs better than the others [34], we used it as the similarity measure for the ICF method. The adjusted cosine similarity between two items $i$ and $j$ is given by Eq. (4):

$$\mathrm{AdjSim}(i,j) = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_u)^2}}, \tag{4}$$

where $r_{u,i}$ and $r_{u,j}$ are the ratings of items $i$ and $j$ given by user $u$, and $\bar{r}_u$ is the average item rating of user $u$.

The CBF method is limited in that it cannot provide serendipitous recommendations, since the recommendation is based solely on the content features of items that the user has preferred. The success of collaborative filtering relies on the availability of a sufficiently large set of quality preference ratings provided by users. Accordingly, finding users with similar preferences is difficult if the user rating matrix is very sparse (few preference ratings), causing the sparsity problem for the CF method. In addition, the CF method may suffer from the new item problem, in which there is no rating record on new items from which to derive the prediction [1].

3. System process overview

We propose a novel value-added mobile service, namely Customized Content Service on mobiles (m-CCS), to provide customized blog articles for mobile users based on the time-sensitive popularity of topics and personal preference patterns, as shown in Fig. 1. The first step of our system is to collect blog articles from the Internet.
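Returning briefly to the item-based CF prediction of Eq. (3) and the adjusted cosine similarity of Eq. (4) in Section 2.4, the two formulas can be combined in a small Python sketch. The function names and the nested-dictionary layout of `ratings` are hypothetical. The sketch also makes two assumptions that the text leaves open: the set $U$ is taken to be the users who rated both items, and the sum in Eq. (3) runs over every item the target user has rated rather than over a selected neighborhood.

```python
from math import sqrt


def adjusted_cosine(ratings, i, j):
    """Eq. (4): adjusted cosine similarity between items i and j.

    ratings maps user -> {item: rating}.
    ASSUMPTION: U is the set of users who rated both i and j.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            # Average rating of this user over all items he/she rated.
            mean = sum(user_ratings.values()) / len(user_ratings)
            d_i = user_ratings[i] - mean
            d_j = user_ratings[j] - mean
            num += d_i * d_j
            den_i += d_i * d_i
            den_j += d_j * d_j
    if den_i == 0 or den_j == 0:
        return 0.0  # similarity is undefined; treated as 0 in this sketch
    return num / (sqrt(den_i) * sqrt(den_j))


def predict_rating(ratings, user, j):
    """Eq. (3): similarity-weighted average of the user's own ratings.

    Returns None when no rated item has a non-zero similarity to j.
    """
    num = den = 0.0
    for i, r_ui in ratings[user].items():
        if i == j:
            continue
        w = adjusted_cosine(ratings, j, i)
        num += w * r_ui
        den += abs(w)
    return num / den if den else None
```

Because every pairwise similarity is recomputed on each call, this version is suitable only for small examples; a practical system would typically precompute and cache the item similarities.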
The RSS mechanism is a useful way to capture the latest articles automatically without visiting each site. RSS is an abbreviation for Really Simple Syndication, an XML-based document format for aggregating information from multiple web sources. Any mobile user can subscribe to RSS feeds. However, there may be a shortage of information when an individual subscribes to too few RSS feeds. Thus, we propose a co-RSS method to solve this problem. The co-RSS method gathers all RSS feeds from users such that RSS flocks, called crows-RSS, are formed to enrich information sources. After this preliminary procedure, the system can automatically collect desirable contents from diverse resources. Moreover, we use information retrieval technology (e.g., the tf-idf approach) [33] to pre-process the articles which are trawled every day from blog websites according to the crows-RSS feeds. After extracting the features (term vectors) of blog articles, the time-sensitive popularity tracking (TPT) module groups articles into topic clusters and automatically predicts their trend of popularity. The details of the TPT module are presented in Section 4.

Fig. 1. System overview for m-CCS.
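As a rough illustration of the feature extraction step, the following Python sketch builds tf-idf term vectors for a small set of already tokenized articles. The function name `tfidf_vectors` is hypothetical, and the weighting used here (raw term count multiplied by $\log(N/df)$) is only one common tf-idf variant, assumed for illustration; the text does not state which variant or which tokenization the system uses.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dictionary per document.

    documents is a list of token lists (tokenization is assumed done).
    ASSUMPTION: weight = term count * log(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))

    vectors = []
    for tokens in documents:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors
```

Under this weighting, a term that occurs in every article receives weight 0, so the resulting vectors emphasize the terms that distinguish one article from the others, which is what a subsequent clustering step needs.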