Link recommender systems: The suggestions by Cumulative evidence A proach ean-Yves delort a, bernadette bouchon- Meunier a Computer Science Laboratory(LIP6) Paris 6 University 8, rue du Capitaine Scott 75015 Paris, France E-mail: JJean-Yoes. Delort, Bernadette. Bouchon-Meunierl alip6 fr The increasing interest of website designers in customization technologies has stimulated recommender system researches. This paper deals with link recommender systems(LRS). LRS are tools intended to support and assist the users during their navigation on the Internet or on a specific website. lrs differ from object recommender systems(ORS) which are tools intended to make the user buy or access resources. If ORS efficiency depends on the number of times the users have followed the recommendation proposed by the system, LrS can neither be evaluated nor compared, with respect to this only criteria. LRS suffer from the uncertainty of the available information about the user navigation behaviors and preferences. This paper introduces a new LRS intended to take this uncertainty into account in the recommendation process. Then, the issue of the evaluation of lrs is addressed and a some general features of LRS are put forward and discussed. Then, the proposed approach is compared with two other link recommendation algorithms. The proposed approach provides good results when evaluated with respect to ORS criteria and appears to be the best tested approach in the context of link recommendation Keywords: Recommender systems, Links, Uncertainty 1 INTRODUCTION areoftenusedbyretailerssuchasamAzon.com, to suggest future purchases. The purposes of LRS The increasing interest of website designers in are different. LRS aim at supporting or assisting customization technologies has stimulated recom- users when they are browsing on a specific web- mender system researches(RS). Online recom- site or on the Internet. For instance. a lrs coule be designed to help the users not to get lost when menders can be distinguished into two categories: they are browsing in a large and depth website. In Object recommender systems(ORS)and Link rec- this paper, we focus on LRS intended to a support ommender systems(LRS). ORS aim at providing user while they are browsing on a given website suggestions on a set of resources such as products or news. They are intended to motivate or encour The evaluation process of LRs is more complex than the ors one Indeed OrS can be evaluated age users to follow them. Then, a set (or a lis with respect to their ability to predict the next of related objects matching the user's current in- user clicks. This ability can be precisely and easily terests is proposed. Preferences are often, but not computed. Conversely, LRS are harder to compar for instance, to which extent a Rs helps the user ambiguous(i.e. the users intents or preferences can to feel comfortable on the website and not to get lost on it? ser votes in favor of, or against the objects. For In general, the making of user profiles and the instance, if a user buys a product, this is very likely extraction of user's preferences is based on a pre- to mean a user implicit vote in favor of the it. ORs processing step taking as an input the transaction Jean-Yves Delort. All rights reserved
1 Link Recommender Systems: The Suggestions by Cumulative Evidence Approach Jean-Yves Delort a , Bernadette Bouchon-Meunier a a Computer Science Laboratory (LIP6) Paris 6 University 8, rue du Capitaine Scott 75015 Paris, France E-mail: {Jean-Yves.Delort,Bernadette.Bouchon-Meunier}@lip6.fr The increasing interest of website designers in customization technologies has stimulated recommender system researches. This paper deals with link recommender systems (LRS). LRS are tools intended to support and assist the users during their navigation on the Internet or on a specific website. LRS differ from object recommender systems (ORS) which are tools intended to make the user buy or access resources. If ORS efficiency depends on the number of times the users have followed the recommendation proposed by the system, LRS can neither be evaluated, nor compared, with respect to this only criteria. LRS suffer from the uncertainty of the available information about the user navigation behaviors and preferences. This paper introduces a new LRS intended to take this uncertainty into account in the recommendation process. Then, the issue of the evaluation of LRS is addressed and a some general features of LRS are put forward and discussed. Then, the proposed approach is compared with two other link recommendation algorithms. The proposed approach provides good results when evaluated with respect to ORS criteria and appears to be the best tested approach in the context of link recommendation. Keywords: Recommender systems, Links, Uncertainty 1. INTRODUCTION The increasing interest of website designers in customization technologies has stimulated recommender system researches (RS). Online recommenders can be distinguished into two categories: Object recommender systems (ORS) and Link recommender systems (LRS). ORS aim at providing suggestions on a set of resources such as products or news. They are intended to motivate or encourage users to follow them. Then, a set (or a list) of related objects matching the user’s current interests is proposed. Preferences are often, but not necessarily, explicit votes. When behaviors are not ambiguous (i.e. the users intents or preferences can easily be understood) they will be interpreted as user votes in favor of, or against the objects. For instance, if a user buys a product, this is very likely to mean a user implicit vote in favor of the it. ORS are often used by retailers, such as Amazon.com, to suggest future purchases. The purposes of LRS are different. LRS aim at supporting or assisting users when they are browsing on a specific website or on the Internet. For instance, a LRS could be designed to help the users not to get lost when they are browsing in a large and depth website. In this paper, we focus on LRS intended to a support user while they are browsing on a given website. The evaluation process of LRS is more complex than the ORS one. Indeed ORS, can be evaluated with respect to their ability to predict the next user clicks. This ability can be precisely and easily computed. Conversely, LRS are harder to compare: for instance, to which extent a RS helps the user to feel comfortable on the website and not to get lost on it? In general, the making of user profiles and the extraction of user’s preferences is based on a preprocessing step taking as an input the transaction Jean-Yves Delort. All rights reserved
set recorded by the Hypertext Transfer Protocol A frequent assumption with LRS is that a user server. This leads. because of the draw ccess to a page implies an implicit vote in fa- backs of this protocol, to an uncertain represent vor of this page. LRS approach is more general tion of the user 's behavior and of his choices han the object one, and such Rs can be deployed This paper makes the following contributions: on any website because algorithms just need al- First SCE, an original link recommendation algo- ways available data. Mobasher et al. [12 pioneered rithm that handles uncertainty in user sessions is LRS research. They proposed two LRS algorithms. presented. It is based on Dempster-Shafer theory The former uses frequent association rules between of evidence [15. Then, general features of LRS are URLs within the same sessions. while the latter introduced and discussed. One of their advantage clusters similar user sessions. However the algo- is that they can be expressed as numerical values rithms they proposed have not been evaluated nor which can be compared. Finally, the proposed al- any evaluation methodology has been proposed gorithm is compared with two other link recom- Letizia [11] is an agent to assist users when they are browsing on the Internet. The system com- evaluation criteria s well as with the new ones. The bines two different recommendation strategies: In proposed approach is shown to provide good re- one hand it tries to predict the next user clicks and sults when evaluated with respect to ORs criteria on the other hand it takes into account heuristics and appears to be the best tested approach in the inferring user interests from their browsing behav context of link recommendation Breese 3 and Spiekerman [16 have introduced 2. RELATED WORKS alternate evaluation strategies for ORS. Yet, the proposed approaches were still highly connected to the abilities of the rs to make accurate predic Recommender systems tend to be distinguished tions. On the contrary, the issue of the LRS eval- into two classes: content-based RS use the con- uation implies cognitive factors. For instance to tent of the items(or some information related to which extent a rS helps the user to feel comfort them)to suggest links or objects(for instance the able on the website and not to get lost on it textual content or metadata such as annotations) Collaborative RS are trying to find similarities be- At the same time of Rs researches. the out. comes of works on prefetching have highlighted tween the users' behaviors and interests in order the efficiency of user navigation models on a to decide which recommendations to make. Exam- web site. Prefetcher systems try to predict the ples of content-based RS are search engines and examples of collaborative RS are the Alexa! sys- next resources that a visitor will request immedi- tem or the groupLens system [10. Alexa is also ately after. This can be very efficient if the pages an example of LRS: it provides a"What's re- are dynamically made because they can be pre- lated"service that suggests Web pages or web- generated Prefetcher objectives are closed to ORS sites with respect to common points discovered be but surprisingly they have not been used for a rec- tween the user's behavior and the database of all ommendation purpose. In this paper an interest the alexa users' navigation behaviors Another ex into one of the most used and efficient user nav- ample of collaborative RS, but of the Ors kind igation model is taken. The markovian approach is the grouplens project that recommends mes- has been extensively studied as a user navigation model(14,6, 13. It is based on a graph the states of ges in a newsgroup. The Surfen [7] project is a which represent the last k resources accessed by a RS close to Alexa. Each client sends to a server its navigation history. Then a recommendation engine uses the whole user browsing history database to compute recommendations and send them back to link with respect to the k last pages he has seen. the clients. Balabanovic [2 has proposed an ORS The main drawback of this statistical representa ing advantage of the content as well, ning at tak- tion is its extreme memory complexity when k and ased on a collaborative algorithm ai the number of pages increases. However this pa- per will also demonstrated that this approach is efficient when used by an ORs
2 set recorded by the Hypertext Transfer Protocol (HTTP) server. This leads, because of the drawbacks of this protocol, to an uncertain representation of the user’s behavior and of his choices. This paper makes the following contributions: First SCE, an original link recommendation algorithm that handles uncertainty in user sessions is presented. It is based on Dempster-Shafer theory of evidence [15]. Then, general features of LRS are introduced and discussed. One of their advantage is that they can be expressed as numerical values which can be compared. Finally, the proposed algorithm is compared with two other link recommendation algorithms with respect to the ORS evaluation criteria s well as with the new ones. The proposed approach is shown to provide good results when evaluated with respect to ORS criteria and appears to be the best tested approach in the context of link recommendation. 2. RELATED WORKS Recommender systems tend to be distinguished into two classes: content-based RS use the content of the items (or some information related to them) to suggest links or objects (for instance the textual content or metadata such as annotations). Collaborative RS are trying to find similarities between the users’ behaviors and interests in order to decide which recommendations to make. Examples of content-based RS are search engines and examples of collaborative RS are the Alexa1 system or the GroupLens system [10]. Alexa is also an example of LRS: it provides a “What’s related” service that suggests Web pages or websites with respect to common points discovered between the user’s behavior and the database of all the Alexa users’ navigation behaviors. Another example of collaborative RS, but of the ORS kind, is the GroupLens Project that recommends messages in a newsgroup. The Surflen [7] project is a RS close to Alexa. Each client sends to a server its navigation history. Then a recommendation engine uses the whole user browsing history database to compute recommendations and send them back to the clients. Balabanovic [2] has proposed an ORS based on a collaborative algorithm aiming at taking advantage of the content as well. 1http://www.alexa.com A frequent assumption with LRS is that a user access to a page implies an implicit vote in favor of this page. LRS approach is more general than the object one, and such RS can be deployed on any website because algorithms just need always available data. Mobasher et al. [12] pioneered LRS research. They proposed two LRS algorithms. The former uses frequent association rules between URLs within the same sessions, while the latter clusters similar user sessions. However the algorithms they proposed have not been evaluated nor any evaluation methodology has been proposed. Letizia [11] is an agent to assist users when they are browsing on the Internet. The system combines two different recommendation strategies: In one hand it tries to predict the next user clicks and on the other hand it takes into account heuristics inferring user interests from their browsing behavior. Breese [3] and Spiekerman [16] have introduced alternate evaluation strategies for ORS. Yet, the proposed approaches were still highly connected to the abilities of the RS to make accurate predictions. On the contrary, the issue of the LRS evaluation implies cognitive factors. For instance to which extent a RS helps the user to feel comfortable on the website and not to get lost on it? At the same time of RS researches, the outcomes of works on prefetching have highlighted the efficiency of user navigation models on a web site. Prefetcher systems try to predict the next resources that a visitor will request immediately after. This can be very efficient if the pages are dynamically made because they can be pregenerated. Prefetcher objectives are closed to ORS but surprisingly they have not been used for a recommendation purpose. In this paper an interest into one of the most used and efficient user navigation model is taken. The markovian approach has been extensively studied as a user navigation model[14,6,13]. It is based on a graph the states of which represent the last k resources accessed by a user called active windows. Transition matrix expresses the probability that a user will click on a link with respect to the k last pages he has seen. The main drawback of this statistical representation is its extreme memory complexity when k and the number of pages increases. However this paper will also demonstrated that this approach is efficient when used by an ORS
3. UNCERTAIN DATA of the IP address is not sufficient to know exactly if the user is a single person or not. Indeed, if the 3.1. Hypertert Transfer Protocol terminal is used by more than one person, or a set of users access the Internet via a gateway, the One of the reasons of the success and popularity Ip address recorded by the Http server will be of the Internet is probably that the mainly used the same. In the sequel, we will use indistinctly protocol Http lets the users remain anonymous the terms user and client to refer to any terminal As a counterpart of this, if we want to add person- that makes requests to a server and which can be cause available data on the users are scarce. fur- To relieve the load of the server, a request is thermore,ashttpdoesnotmaintainpersistentmadeonlyiftheresourceisneitherintheclient connection, it is neither possible to know what the cache nor in a proxy server or the document needs page the user is currently viewing nor it is possible to be refreshed. Thus, nothing guarantees that to be sure that the user is still viewing a page of some pages have been accessed more than once, the website or that some urls not recorded in the log file With the introduction of client-side stored in- have been accessed. The cache client keeps in mem- formation(the cookies)user profiling has become ory the previously seen pages in order not to have possible. Nevertheless, due to privacy reasons the to require resources that have recently been ac- right of using cookies depends on the user ac- cessed. In the same vein, proxy servers act like a ceptance. Yet, the users are often reluctant to be unique shared client cache among a set of users. tracked while they are browsing on a website. A Provided one of the users accessed the resources great deal of personalization services use the link they will be directly sent to any other users con- structure (the dependency graph), the resources nected to the same proxy and who is willing to and meta-descriptions(if they exist), and above access these same documents. These transaction allthehttpuseraccesslogfileeaChlineofit vill not be reported to the Web server. Thanks to contains the minimum pieces of information for a the local cache and the proxy servers, the number transaction between a client and the server to oc- of requests is tremendously reduced. However this cur. It is described by, at least 1 performance improvement has a cost: it goes with a loss of information since the moves and current the client ip addres positions in the website of the users are then un- the requested URL the protocol version, the transaction status2 Finally, another problem comes from the fact that the users never explicitly vote for or against the resources they access. The starting point is Depending on the Http protocol version ad- thus to choose a way to decide whether he liked ditional information can be sent, such as the re them or not, with respect to his behavior. The gen- FERRER which indicates the previous resource erally accepted hypothesis is to consider that each accessed by a user on his browser before his last time a document is accessed it means an implicit request. The REFERRER is an interesting data vote in favor of it which can be used to improve the retrieving pro- cess of the user accessed pages 4. USER SESSIONS 3. 2. Uncertain data 4. 1. Notations Because of dynamic IP addressing, it is not al- ways possible to recognize a user because he may A resource domain referrers to a set of ccessible by users and is denoted by l. A ses- ferent IP addresses. Besides, the unique knowledge sion is a suite of transactions which have occurred for some time between a client and a server. A A number. for instance 200 for OK 200" and 404 for ransa can be ented by a tuple t Not found404” (ID, URL, DATE,.where t ID is the user id
3 3. UNCERTAIN DATA 3.1. Hypertext Transfer Protocol One of the reasons of the success and popularity of the Internet is probably that the mainly used protocol, HTTP lets the users remain anonymous. As a counterpart of this, if we want to add personalization services to a website, we are bridled because available data on the users are scarce. Furthermore, as HTTP does not maintain persistent connection, it is neither possible to know what the page the user is currently viewing nor it is possible to be sure that the user is still viewing a page of the website. With the introduction of client-side stored information (the cookies) user profiling has become possible. Nevertheless, due to privacy reasons the right of using cookies depends on the user acceptance. Yet, the users are often reluctant to be tracked while they are browsing on a website. A great deal of personalization services use the link structure (the dependency graph), the resources and meta-descriptions (if they exist), and above all, the HTTP user access log file. Each line of it contains the minimum pieces of information for a transaction between a client and the server to occur. It is described by, at least [1]: – the client IP address, – the requested URL, – the protocol version, – the transaction status2 , – the date and time. Depending on the HTTP protocol version additional information can be sent, such as the REFERRER which indicates the previous resource accessed by a user on his browser before his last request. The REFERRER is an interesting data which can be used to improve the retrieving process of the user accessed pages. 3.2. Uncertain data Because of dynamic IP addressing, it is not always possible to recognize a user because he may have accessed the same website by means of different IP addresses. Besides, the unique knowledge 2A number, for instance 200 for ”OK 200” and 404 for ”Not Found 404”. of the IP address is not sufficient to know exactly if the user is a single person or not. Indeed, if the terminal is used by more than one person, or a set of users access the Internet via a gateway, the IP address recorded by the HTTP server will be the same. In the sequel, we will use indistinctly the terms user and client to refer to any terminal that makes requests to a server and which can be labeled with a unique identifier. To relieve the load of the server, a request is made only if the resource is neither in the client cache nor in a proxy server or the document needs to be refreshed. Thus, nothing guarantees that some pages have been accessed more than once, or that some URLs not recorded in the log file have been accessed. The cache client keeps in memory the previously seen pages in order not to have to require resources that have recently been accessed. In the same vein, proxy servers act like a unique shared client cache among a set of users. Provided one of the users accessed the resources, they will be directly sent to any other users connected to the same proxy and who is willing to access these same documents. These transaction will not be reported to the Web server. Thanks to the local cache and the proxy servers, the number of requests is tremendously reduced. However this performance improvement has a cost: it goes with a loss of information since the moves and current positions in the website of the users are then uncertain. Finally, another problem comes from the fact that the users never explicitly vote for or against the resources they access. The starting point is thus to choose a way to decide whether he liked them or not, with respect to his behavior. The generally accepted hypothesis is to consider that each time a document is accessed it means an implicit vote in favor of it. 4. USER SESSIONS 4.1. Notations A resource domain referrers to a set of resources accessible by users and is denoted by U. A session is a suite of transactions which have occurred for some time between a client and a server. A transaction can be represented by a tuple t = (ID,URL, DATE, ...) where t.ID is the user id
t URL is the requested URL and t DATE is the ion. In this situation. the user will be considered transaction time. These three data uniquely iden- to be still present on the site. Then s is called an tify any possible transaction. In the sequel, the active session and the user will be said to be ar letters t and u will be used when dealing with active user transactions and resources respectively. <u> de- notes the set of all possible sequences of consecu- tive transactions ofl. We call length of the session 5. RECOMMENDATION ALGORITHMS s, and we note s the length of the sequence t1,..., tn > The symbol <s> is an abbrevia- We call recommendation function any proce tion of the sequence <t1.URL, .,tm. URL> taking as an input a session s and giving as an 4.2. User sessions DEFINITION 1 A recommendation function on a The concept of user session is often used to re- resource domain l is a multivaluated function fer to the relations between the users information 3→P( needs and their interactions: it represents a set of accessed pages related to the same search activ- 3一→Rec()={u1,…p} ity. Most session boundaries detection algorithms The set Rec(s)is called the recommendation of are based on a time constraint between the access the session s time of the resources within the same session 9, 4 Recommendation processes are often based on Content-based approach have also been proposed a function that associates a weight with each re- Generally, the cohesion between the elements source of u. This weight conveys the degree of like- within a session depends on four relations linking liness that the resource is a good or a bad recom mendation the transactions together 1. all the transactions where request by the DEFINITION 2 Given a domainu, s the set of u. a valuated recommendation 2. all URLs belong to the same resource do- function is a function Recval defined by (3×l4)→→[0,1 3. the transactions occurred in a row and 4. there exists an additional temporal relation (s,u)→ Rectal(s,u) The value RecVal(s, u) is called weight of the Usually the last condition comes down to a max- recommendation of u. A recommendation is a set imum time span between the transactions. It can of pages ofl having the highest Recval value be To-static, if the time span between the first to When a recommender system cannot give a re and the last accessed document t cannot exceed a given number To of seconds ommendation, then it outputs the empty set deFINITION 3 Given a domain l, a recommen- tn DATE≤to.DATE+7o dation function Rec and that a recommendation succeeds if It can also be chosen T1-dynamic which means that the time span between two consecutive transac- Rec(3)≠ tions cannot exceed a given number Ti of seconds vi=2.n,t计+1.DATE≤ ti DATE+n1 6. MANAGING UNCERTAINTY WITH EVIDENCE THEORY Let s =< t1,.,tp> be a user session, sup- pose that a new request tp+l occurres, then the In this section the fundamentals of evidence the. concatenation of sequences of transactions s and ory are outlined. Then, the Suggestion by Cumu <tp+1 >,denoted by s.< tp+1> is still a ses- lative Evidence(SCe) algorithm is introduced
4 t.URL is the requested URL and t.DATE is the transaction time. These three data uniquely identify any possible transaction. In the sequel, the letters t and u will be used when dealing with transactions and resources respectively. < U > denotes the set of all possible sequences of consecutive transactions of U. We call length of the session →−s , and we note |→−s | the length of the sequence < t1, ...,tn >. The symbol < s > is an abbreviation of the sequence < t1.URL, ..,tn.URL >. 4.2. User sessions The concept of user session is often used to refer to the relations between the users’ information needs and their interactions: it represents a set of accessed pages related to the same search activity. Most session boundaries detection algorithms are based on a time constraint between the access time of the resources within the same session [9,4]. Content-based approach have also been proposed [5]. Generally, the cohesion between the elements within a session depends on four relations linking the transactions together: 1. all the transactions where request by the same user, 2. all URLs belong to the same resource domain, 3. the transactions occurred in a row and 4. there exists an additional temporal relation on the transactions. Usually the last condition comes down to a maximum time span between the transactions. It can be τ0-static, if the time span between the first t0 and the last accessed document tn cannot exceed a given number τ0 of seconds: tn.DATE ≤ t0.DATE + τ0 It can also be chosen τ1-dynamic which means that the time span between two consecutive transactions cannot exceed a given number τ1 of seconds: ∀i = 2..n, ti+1.DATE ≤ ti .DATE + τ1 Let →−s =< t1, ..,tp > be a user session, suppose that a new request tp+1 occurres, then the concatenation of sequences of transactions →−s and < tp+1 >, denoted by →−s . < tp+1 > is still a session. In this situation, the user will be considered to be still present on the site. Then →−s is called an active session and the user will be said to be an active user. 5. RECOMMENDATION ALGORITHMS We call recommendation function any process taking as an input a session →−s and giving as an output a subset (possibly empty) of U. DEFINITION 1 A recommendation function on a resource domain U is a multivaluated function: →− S −→ P(U) →−s 7−→ Rec(→−s ) = {u1, .., up} The set Rec(→−s ) is called the recommendation of the session →−s . Recommendation processes are often based on a function that associates a weight with each resource of U. This weight conveys the degree of likeliness that the resource is a good or a bad recommendation. DEFINITION 2 Given a domain U, →− S the set of all the sessions on U, a valuated recommendation function is a function RecV al defined by: RecV al : ( →− S × U) −→ [0, 1] (→−s , u) 7−→ RecV al(→−s , u) The value RecV al(→−s , u) is called weight of the recommendation of u. A recommendation is a set of pages of U having the highest RecV al value. When a recommender system cannot give a recommendation, then it outputs the empty set. DEFINITION 3 Given a domain U, a recommendation function Rec and a session →−s , we will say that a recommendation succeeds if: Rec(→−s ) 6= ∅ 6. MANAGING UNCERTAINTY WITH EVIDENCE THEORY In this section the fundamentals of evidence theory are outlined. Then, the Suggestion by Cumulative Evidence (SCE) algorithm is introduced
6. 1. BELIEF THEORY: BASIC CONCEPTS pages are ranked with respect to their degree of be- lief and a classical technique(support pruned cri- Evidence theory is a powerful tool to handle terion 6)is used to prune the pages that have low problems with uncertain and imprecise data. It upport in the learning database. In other words, was introduced by Dempster and Shafer [15.a to each pair(A, u), a weight P(A, u)is associated reference set U, called universe of the Discourse hich gives the strength of the following assertion: or equally frame of discernment is introduced. It In a session 3, such that < s >= ul,,un represents a set of mutually exclusive alternatives, if A S<s> then u E< s>. p(A, u conveys for instance all the possible values of an attribute. the strength of the relation characterized by the simultaneous presence of the resources of A and u deFinition 4 Let u be a universe of the Dis within a given session. Thus P(A, u) is the condi course. A function m: 24-[0, 1] is called a basic tional probability of u knowing A. If the confidence probability assignement over l if of the rule A= u is not zero, P(A, u)matches (1)m()=0 with the confidence in the rule A= u. Otherwise P(A, u) is set to zero. During a training phase, the values of p(A, u) are pre-computed for given min- imum support minsup and minimum confidence minconf. To each pair (u,)E(u, P(U))the fol- The amount m(A)is called basic value of the lowing weight is associated probability m(A) associated with the event A.It measures the strength of the belief that A will oc- mn(u)=cmf(→{ua) cur. A focal element of a belief function Bel is any subset A Cu such that m(a)>0 and the core In order to respect condition(2)of definition 4 of Bel is the union set of all its focal elements. To it is necessary to normalize: represent the reasons to believe in A, all the quan- tities m(B) such that b C A must be added to m(a) m(A). This leads to define cu mu(e) DEFINITION 5 Let l be a frame of discernment and m be a basic probability assignement over l Thus, coefficients mu are basic probability assign- A belief function over l is a function Bel ments, The valuated recommendation function as- 0,1 defined by sociated with Sce algorithm is the following belief function: Bel(A)=∑m(B Recce(S,u)= w C<s> 6.2. Suggestions by Cumulative Evidence 0 otherwise To each page a hashtable is associated. A key The proposed method relaxes the consecutively for this hashtable is a frequent set w and its cor- hypothesis of the pages within a session. Thus it responding value mu(u), is recorded if it is differ can take into account all the transactions previ- ent from 0. As the model depends on minimum ously occurred in the session. The proposed ap- support and minimum confidence thresholds proach, called Suggestions by Cumulative evidence easily tunable to reach a reasonable size (SCE)is based on the idea that all previously seen pages and their combinations must play a role in the link recommendation decision process. 7. EVALUATION METRICS After a suitable aggregation of all the evidence suggesting that a resource is connected to oth- Recall, precision and coverage measures are gen- ers, the global information on each resource should erally used to assess the efficiency of ORS 3, 16 creases. Thus, for each admissible resource, a However, these measures do not necessarily char- degree of belief that this page may interest the acterize the intent of LRS. In this section,new user, with regard to his history is computed. Then, criteria intended to represent LRS are introduced
5 6.1. BELIEF THEORY: BASIC CONCEPTS Evidence theory is a powerful tool to handle problems with uncertain and imprecise data. It was introduced by Dempster and Shafer [15]. A reference set U, called universe of the Discourse or equally frame of discernment is introduced. It represents a set of mutually exclusive alternatives, for instance all the possible values of an attribute. DEFINITION 4 Let U be a universe of the Discourse. A function m : 2 U → [0, 1] is called a basic probability assignement over U if: (1) m(∅) = 0 (2) X A⊆U m(A) = 1 The amount m(A) is called basic value of the probability m(A) associated with the event A. It measures the strength of the belief that A will occur. A focal element of a belief function Bel is any subset A ⊂ U such that m(A) > 0 and the core of Bel is the union set of all its focal elements. To represent the reasons to believe in A, all the quantities m(B) such that B ⊂ A must be added to m(A). This leads to define: DEFINITION 5 Let U be a frame of discernment, and m be a basic probability assignement over U. A belief function over U is a function Bel : 2 U → [0, 1] defined by: Bel(A) = X B⊆A m(B) (1) 6.2. Suggestions by Cumulative Evidence The proposed method relaxes the consecutively hypothesis of the pages within a session. Thus it can take into account all the transactions previously occurred in the session. The proposed approach, called Suggestions by Cumulative Evidence (SCE) is based on the idea that all previously seen pages and their combinations must play a role in the link recommendation decision process. After a suitable aggregation of all the evidence suggesting that a resource is connected to others, the global information on each resource should increases. Thus, for each admissible resource, a degree of belief that this page may interest the user, with regard to his history is computed. Then, pages are ranked with respect to their degree of belief and a classical technique (support pruned criterion [6]) is used to prune the pages that have low support in the learning database. In other words, to each pair (A, u), a weight p(A, u) is associated which gives the strength of the following assertion: ”In a session →−s , such that < s >=< u1, .., un >, if A ⊆< s > then u ∈< s >”. p(A, u) conveys the strength of the relation characterized by the simultaneous presence of the resources of A and u within a given session. Thus p(A, u) is the conditional probability of u knowing A. If the confidence of the rule A ⇒ u is not zero, p(A, u) matches with the confidence in the rule A ⇒ u. Otherwise, p(A, u) is set to zero. During a training phase, the values of p(A, u) are pre-computed for given minimum support minsup and minimum confidence minconf. To each pair (u, w) ∈ (U,P(U)) the following weight is associated: m0 u (w) = conf(w ⇒ {u}) In order to respect condition (2) of definition 4 it is necessary to normalize: mu(w) = m0 u (w) P z⊆U m0 u (z) Thus, coefficients mu are basic probability assignments. The valuated recommendation function associated with SCE algorithm is the following belief function: RecSCE(→−s , u) = X w⊆<s> mu(w) = 0 otherwise To each page a hashtable is associated. A key for this hashtable is a frequent set w and its corresponding value mu(w), is recorded if it is different from 0. As the model depends on minimum support and minimum confidence thresholds, it is easily tunable to reach a reasonable size. 7. EVALUATION METRICS Recall, precision and coverage measures are generally used to assess the efficiency of ORS [3,16]. However, these measures do not necessarily characterize the intent of LRS. In this section, new criteria intended to represent LRS are introduced