C. DeLong, P. Desikan, and J. Srivastava

• A set of concepts C = {C1, C2, C3, …}, where each document Di can be labeled by a subset of concepts from C
• A set of Web pages, S, generated by the content management system, which have a one-to-one mapping with each document D
• Web server usage logs

The views of an expert are captured explicitly by building a graph of Web pages connected by concepts that are derived from a Web graph generated by a set of experts using a content management system. While our earlier work [25] concentrated on deriving expert knowledge from the content management system itself, in our current work we present a more generally applicable case of dealing with a Web graph, thus removing the dependence on the content management system. The expert views are captured using this graph, and the expert opinion of the relative importance of any given Web page is captured using an ExpertRank. The navigational importance is captured from the Web graph by the StructureRank. The queries of the learner are captured by the UsageRank. By defining these three different kinds of rank, we are able to capture both what the learner is intending to learn and what the expert thinks the learner needs to know.

The most popular kind of information infrastructure for representing documents and their interconnections is a link graph. Such an information infrastructure can be modeled as a graph G(V, E), where V is a set of vertices that represent units of information and E is a set of edges that represent the interactions between them. For the scope of this paper, a vertex represents a single Web page and an edge represents a hyperlink between pages or a relation between two pages. As can be seen in Figure 2, two different graphs are constructed, each corresponding to a different type of edge relationship.
In order to generate relevance ranks of nodes on these graphs, Google’s PageRank [15] is used as a foundation due to its stability and success in the Web search domain. However, it should be noted that the best way to capture an expert’s advice on the whole set of documents automatically, without an expert involved, is an open research issue.

ExpertRank: A concept graph is constructed from the Web graph. In this graph, a vertex is a single Web page and its edges correspond to its conceptual links with other Web pages. In the prototype recommender system, the concepts of a Web page are derived from the anchor text of the Web pages pointing to it. The concepts are represented by individual anchor text words and the information-retaining phrases that can be grown out of them. If two Web pages share a concept but are not already explicitly linked, then an implicit link is introduced between them. Two pages are determined to share a concept if the intersection of the sets of concepts they represent is not empty. On this constructed graph, PageRank is applied to obtain the importance ranking of these documents. Thus the rank of a document d is defined as:

$$ ER(d) = \frac{\alpha}{N_D} + (1 - \alpha) \cdot \sum_{d',\, d \in G} \frac{ER(d')}{OutDeg(d')} \qquad (1) $$

where d is a given Web page, d' ranges over the set of all Web pages that point to d, either by an explicit or an implicit link, N_D is the number of documents in the Web graph, and α is the dampening factor.
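The ExpertRank construction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_concept_graph` and `expert_rank`, the toy pages and concepts, the value α = 0.15, the fixed iteration count, and the choice to treat implicit links as bidirectional are all assumptions made for illustration. Pages with no outgoing links simply pass on no rank, a simplification that Equation (1) does not address.

```python
from itertools import combinations


def build_concept_graph(explicit_links, concepts):
    """Return adjacency sets (page -> pages it points to) for the concept graph.

    explicit_links: dict mapping page -> set of pages it hyperlinks to.
    concepts: dict mapping page -> set of concepts (e.g. anchor-text words).
    """
    pages = set(concepts) | set(explicit_links)
    for targets in explicit_links.values():
        pages |= set(targets)
    graph = {p: set(explicit_links.get(p, ())) for p in pages}
    for a, b in combinations(sorted(pages), 2):
        shares_concept = bool(concepts.get(a, set()) & concepts.get(b, set()))
        explicitly_linked = (b in explicit_links.get(a, ())
                             or a in explicit_links.get(b, ()))
        if shares_concept and not explicitly_linked:
            # Implicit link; modeled as bidirectional here (an assumption).
            graph[a].add(b)
            graph[b].add(a)
    return graph


def expert_rank(graph, alpha=0.15, iterations=100):
    """Iterate Equation (1) a fixed number of times from a uniform start."""
    n = len(graph)
    rank = {d: 1.0 / n for d in graph}
    for _ in range(iterations):
        new_rank = {d: alpha / n for d in graph}
        for src, targets in graph.items():
            if not targets:
                continue  # dangling page: passes on no rank (simplification)
            share = (1 - alpha) * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy data: four pages with anchor-text concepts.
explicit_links = {"d1": {"d2"}, "d2": {"d3"}, "d3": {"d1"}, "d4": {"d1"}}
concepts = {
    "d1": {"regression"},
    "d2": {"regression", "sampling"},
    "d3": {"sampling"},
    "d4": {"sampling"},
}
graph = build_concept_graph(explicit_links, concepts)
ranks = expert_rank(graph)
```

In this toy data, d2 and d4 share the concept "sampling" without an explicit hyperlink, so an implicit link is added between them, while d1 and d2 share "regression" but are already explicitly linked and gain nothing new.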
USER: User-Sensitive Expert Recommendations

Fig. 2. Technical approach to obtain User Sensitive Expert Rank (USER). [Diagram labels: documents D1 to D5 with explicit and implicit links; Web pages S1 to S5; Web server logs; concept generation; concept (expert) graph analysis (ER); Web graph analysis (PR); association rule generator, rules, and usage confidence analysis (UR); query; offline and online stages; USER.]

GraphRank: The graph of the Web site is generated by the content management system. Here, the vertices represent individual Web pages and the edges represent the hyperlinks connecting them. Though the Web pages can be mapped to the document set, the edge set is different, primarily due to the difference in purpose and method of creating the edges. The Structure graph contains explicit links that are created mainly for easy navigation. Applying PageRank to this graph gives a set of rankings indicating which Web pages are important in a very general sense. However, since the edges between linked pages in the Web graph do not take context into account, this is not the optimal representation of the relationships between documents/pages defined by experts. This graph contains information about certain Web pages that are important, such as good hubs, but its edge set is vastly different from that of the concept graph, so this importance will not be reflected in the document rank in the same way.
The PageRank of a page s is computed using the PageRank metric:

$$ PR(s) = \frac{\alpha}{N_S} + (1 - \alpha) \cdot \sum_{s',\, s \in G} \frac{PR(s')}{OutDeg(s')} \qquad (2) $$

where s is a given Web page, s' ranges over the set of all Web pages that point to s, either by an explicit or an implicit link, N_S is the number of documents in the Web graph, and α is the dampening factor.
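To make Equation (2) concrete, the following worked example solves it by hand on a small graph. The three-page graph and the value α = 0.15 are assumptions chosen for illustration and do not come from the paper. Page s_1 acts as a hub that links to s_2 and s_3, and both link back to it.

```latex
% Hypothetical graph: s_1 -> s_2, s_1 -> s_3, s_2 -> s_1, s_3 -> s_1
% N_S = 3, OutDeg(s_1) = 2, OutDeg(s_2) = OutDeg(s_3) = 1, assumed alpha = 0.15
\begin{align*}
PR(s_1) &= \frac{\alpha}{3} + (1-\alpha)\bigl(PR(s_2) + PR(s_3)\bigr), \\
PR(s_2) &= PR(s_3) = \frac{\alpha}{3} + (1-\alpha)\,\frac{PR(s_1)}{2}. \\
\intertext{Every page has at least one outgoing link, so the ranks sum to 1 and
$PR(s_2) + PR(s_3) = 1 - PR(s_1)$. Substituting into the first equation:}
PR(s_1) &= \frac{\alpha}{3} + (1-\alpha)\bigl(1 - PR(s_1)\bigr)
\;\Longrightarrow\;
PR(s_1) = \frac{3 - 2\alpha}{3\,(2-\alpha)} \approx 0.486, \\
PR(s_2) &= PR(s_3) = \frac{1 - PR(s_1)}{2} \approx 0.257.
\end{align*}
```

The hub s_1 receives the highest rank because both other pages pass their full rank to it, while s_1 splits its own rank between two outgoing links.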