A multi-disciplinar recommender system to advice research resources in University Digital Libraries

C. Porcel a,*, J.M. Moreno b, E. Herrera-Viedma c

a University of Jaen, Department of Computer Science, Jaen, Spain
b University of Murcia, Department of Computer Science, Murcia, Spain
c University of Granada, Department of Computer Science and A.I., Granada, Spain

Keywords: Recommender systems; Fuzzy linguistic modeling; University Digital Libraries

Abstract

The Web is one of the most important information media, and it influences the development of other media such as newspapers, journals, books, and libraries. In this paper, we analyze the logical extensions of traditional libraries in the Information Society. In the Information Society people want to communicate and collaborate, so libraries must develop services for connecting people in information environments. Library staff therefore need automatic techniques that allow a great number of users to access a great number of resources. Recommender systems are tools whose objective is to evaluate and filter the great amount of information available on the Web to assist users in their information access processes. We present a model of a fuzzy linguistic recommender system to help University Digital Library users access their research resources. The system recommends to researchers both specialized and complementary resources, in order to discover collaboration possibilities and form multi-disciplinary groups. In this way, the system increases social collaboration possibilities in a university framework and contributes to improving the services provided by a University Digital Library.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years the concept of the digital library has been gaining ground. Digital libraries are information collections that have associated services delivered to user communities using a variety of technologies.
The information collections can be scientific, business or personal data, and can be represented as digital text, image, audio, video, or other media. This information can be digitalized paper or born-digital material, and the services offered on such information can be varied and can be offered to individuals or user communities (Callan et al., 2003; Gonçalves, Fox, Watson, & Kipp, 2004; Renda & Straccia, 2005).

Digital libraries are the logical extensions of physical libraries in the electronic information society. These extensions amplify existing resources and services. As such, digital libraries offer new levels of access to broader audiences of users and new opportunities for the library. In practice, a digital library makes its contents and services remotely accessible through networks such as the Web or limited-access intranets (Marchionini, 2009).

Digital libraries rely on human resources (staff) who manage the collection and enable users to access the documents that are most interesting for them, taking into account their needs or areas of interest. The library staff searches, evaluates, selects, catalogues, classifies, preserves and schedules access to the digital documents (Gonçalves et al., 2004). Some of the main functions of digital libraries are the following:

- To evaluate and select digital materials to add to the repository.
- To preserve the security and conservation of the materials.
- To describe and index the new digital materials (catalogue and classify).
- To deliver to users the material stored in the library.
- Other managerial tasks.

Libraries offer different types of reference and referral services (e.g., ready reference, exhaustive search, and selective dissemination of information), instructional services (e.g., bibliographic instruction and database searching), added value services (e.g., bibliography preparation and language translation) and promotional services (e.g., literacy and freedom of expression).
As digital libraries become commonplace and as their contents and services become more varied, users expect more sophisticated services from their digital libraries (Callan et al., 2003; Gonçalves et al., 2004; Renda & Straccia, 2005).

* Corresponding author. E-mail addresses: cporcel@ujaen.es (C. Porcel), jmmoreno@um.es (J.M. Moreno), viedma@decsai.ugr.es (E. Herrera-Viedma).
Expert Systems with Applications 36 (2009) 12520–12528. Journal homepage: www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.04.038

A service that is particularly important is the selective dissemination of information or filtering (Morales del Castillo, Pedraza-Jiménez, Ruíz, Peis, & Herrera-Viedma, 2009; Morales del Castillo, Peis, Moreno, & Herrera-Viedma, in press). Users develop profiles
that reveal their areas of interest, and as new materials are added to the collection, they are compared to the profiles and relevant items are sent to the users (Marchionini, 2009).

One interesting extension of this concept is to use the connectivity inherent in digital libraries to support collaborative filtering, where users rate or add value to information objects and these ratings are shared with a large community, so that popular items can be easily located or people can search for objects found useful by others with similar profiles (Hanani, Shapira, & Shoval, 2001; Marchionini, 2009; Reisnick & Varian, 1997).

Digital libraries have been applied in many contexts, but in this paper we focus on an academic environment. University Digital Libraries (UDLs) provide information resources and services to students, faculty and staff in an environment that supports learning, teaching and research (Chao, 2002).

In this paper we propose a fuzzy linguistic recommender system to achieve major advances in the activities of UDLs in order to improve their performance. The system is oriented to researchers and it recommends two types of resources: first, specialized resources from the user's research area, and second, complementary resources from related areas that could be of interest for discovering collaboration possibilities with other researchers and forming multi-disciplinary groups. As in (Porcel, López-Herrera, & Herrera-Viedma, 2009) we combine a recommender system, to filter out the information, with a multi-granular Fuzzy Linguistic Modeling (FLM), to represent and handle flexible information by means of linguistic labels (Chang, Wang, & Wang, 2007; Chen & Ben-Arieh, 2006; Herrera & Martínez, 2001; Herrera-Viedma, Cordón, Luque, López, & Muñoz, 2003; Herrera-Viedma, Martínez, Mata, & Chiclana, 2005; Herrera, Herrera-Viedma, & Martínez, 2008).

The paper is structured as follows.
Section 2 reviews some preliminaries, i.e., the concept and main aspects of recommender systems and the approaches of FLM that we use in the system design: the 2-tuple FLM and the multi-granular FLM. In Section 3 we present a multi-disciplinary fuzzy linguistic recommender system to advise on research resources in UDLs. Section 4 reports the system evaluation and some experimental results. Finally, some concluding remarks are pointed out.

2. Preliminaries

2.1. Recommender systems

Recommender systems could be defined as systems that produce individualized recommendations as output or have the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options (Burke, 2002). They are becoming popular tools for reducing information overload and for improving sales on e-commerce web sites (Burke, 2007; Cao & Li, 2007; Hsu, 2008; Reisnick & Varian, 1997).

It is a research area that offers tools for discriminating between relevant and irrelevant information by providing personalized assistance for continuous information access, filtering the information and delivering it to the people who need it (Reisnick & Varian, 1997). Automatic filtering services differ from retrieval services in that, in filtering, the corpus changes continuously, the users have long-term information needs (described by means of user profiles rather than by a query submitted to the system), and the objective is to remove irrelevant data from incoming streams of data items (Hanani et al., 2001; Marchionini, 2009; Reisnick & Varian, 1997). A result from a recommender system is understood as a recommendation, an option worthy of consideration; a result from an information retrieval system is interpreted as a match to the user's query (Burke, 2007).

A variety of techniques have been proposed as the basis for recommender systems.
We can distinguish four different classes of recommendation techniques based on the source of knowledge (Burke, 2007; Hanani et al., 2001; Reisnick & Varian, 1997):

- Content-based systems: They generate the recommendations taking into account the terms used in the item representations and the ratings that a user has given to them (Basu, Hirsh, & Cohen, 1998; Claypool, Gokhale, & Miranda, 1999). These recommender systems tend to fail when little is known about the user's information needs.
- Collaborative systems: The system generates recommendations using explicit or implicit preferences from many users, ignoring the item representations. Collaborative systems locate peer users with a rating history similar to that of the current user and generate recommendations using this neighborhood (Good et al., 1999; Renda & Straccia, 2005).
- Demographic systems: A demographic recommender system provides recommendations based on a demographic profile of the user. Recommended items can be generated for different demographic niches by combining the ratings of users in those niches (Pazzani, 1999).
- Knowledge-based systems: These systems generate the recommendations based on inferences about items that satisfy the users, drawn from the information each user provides regarding his/her knowledge about the items that can be recommended (Burke, 2002).

All these techniques have benefits and disadvantages. However, we can use a hybrid approach to smooth out the disadvantages of each one of them and to exploit their benefits (Basu et al., 1998; Claypool et al., 1999; Good et al., 1999). In this kind of system, the users' information preferences can be used to define user profiles that are applied as filters to streams of documents. Therefore, the construction of accurate profiles is a key task, and the system's success will depend to a large extent on the ability of the learned profiles to represent the user's preferences (Quiroga & Mostafa, 2002).
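As an illustration of the collaborative class described above, the following minimal sketch locates the peers whose rating history is most similar to the current user's and scores unseen items from that neighborhood. The cosine similarity, the similarity-weighted average, and all names (`Ratings`, `cosine_similarity`, `recommend`) are assumptions made for this example; they are not taken from the cited works or from the system proposed in this paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical data layout: user -> {item -> numeric rating}.
Ratings = Dict[str, Dict[str, float]]


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two rating histories (0.0 if no overlap)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: Ratings, user: str, k: int = 2,
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Score items the user has not rated, using the k most similar peers."""
    peers = sorted(
        ((cosine_similarity(ratings[user], hist), other)
         for other, hist in ratings.items() if other != user),
        reverse=True,
    )[:k]
    weighted: Dict[str, float] = {}
    total_sim: Dict[str, float] = {}
    for sim, other in peers:
        if sim <= 0:
            continue  # ignore peers with no usable similarity
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            weighted[item] = weighted.get(item, 0.0) + sim * rating
            total_sim[item] = total_sim.get(item, 0.0) + sim
    predicted = {item: weighted[item] / total_sim[item] for item in weighted}
    # Highest predicted rating first; item name breaks ties deterministically.
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Note that the item representations are never inspected, which is the defining trait of the collaborative class; a content-based or hybrid variant would additionally compare item terms against the user profile.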
The recommendation activity is followed by a relevance feedback phase. Relevance feedback is a cyclic process whereby the user feeds back into the system decisions on the relevance of retrieved documents and the system then uses these evaluations to automatically update the user profile (Hanani et al., 2001; Reisnick & Varian, 1997).

2.2. Fuzzy linguistic modeling

The use of fuzzy sets theory has given very good results for modeling qualitative information (Zadeh, 1975) and it has proven to be useful in many problems, e.g., in decision making (Cabrerizo, Alonso, & Herrera-Viedma, 2009; Herrera, Herrera-Viedma, & Verdegay, 1996; Mata, Martínez, & Herrera-Viedma, 2009), quality evaluation (Herrera-Viedma, Pasi, López-Herrera, & Porcel, 2006; Herrera-Viedma & Peis, 2003), models of information retrieval (Herrera-Viedma, 2001a, 2001b; Herrera-Viedma & López-Herrera, 2007; Herrera-Viedma, López-Herrera, Luque, & Porcel, 2007; Herrera-Viedma, López-Herrera, & Porcel, 2005), and political analysis (Arfi, 2005). It is a tool based on the concept of linguistic variable proposed by Zadeh (1975). Next we analyze the two approaches of FLM that we use in our system.

2.2.1. The 2-tuple fuzzy linguistic approach

The 2-tuple FLM (Herrera & Martínez, 2000) is a continuous model of representation of information which reduces the loss of information typical of other fuzzy linguistic approaches
(classical and ordinal (Herrera & Herrera-Viedma, 1997; Zadeh, 1975)). To define it we have to establish the 2-tuple representation model and the 2-tuple computational model to represent and aggregate the linguistic information, respectively.

Let $S = \{s_0, \ldots, s_g\}$ be a linguistic term set with odd cardinality, where the mid term represents an indifference value and the rest of the terms are symmetrically related to it. We assume that the semantics of the labels is given by means of triangular membership functions and we consider that all terms are distributed on a scale on which a total order is defined, $s_i \le s_j \iff i \le j$. In this fuzzy linguistic context, if a symbolic method (Herrera & Herrera-Viedma, 1997; Herrera et al., 1996) aggregating linguistic information obtains a value $\beta \in [0, g]$, and $\beta \notin \{0, \ldots, g\}$, then an approximation function is used to express the result in $S$.

Definition 1 (Herrera and Martínez, 2000). Let $\beta$ be the result of an aggregation of the indexes of a set of labels assessed in a linguistic term set $S$, i.e., the result of a symbolic aggregation operation, $\beta \in [0, g]$. Let $i = \mathrm{round}(\beta)$ and $\alpha = \beta - i$ be two values such that $i \in [0, g]$ and $\alpha \in [-0.5, 0.5)$; then $\alpha$ is called a Symbolic Translation.

The 2-tuple fuzzy linguistic approach is developed from the concept of symbolic translation by representing the linguistic information by means of 2-tuples $(s_i, \alpha_i)$, $s_i \in S$ and $\alpha_i \in [-0.5, 0.5)$:

- $s_i$ represents the linguistic label of the information, and
- $\alpha_i$ is a numerical value expressing the value of the translation from the original result $\beta$ to the closest index label, $i$, in the linguistic term set ($s_i \in S$).

This model defines a set of transformation functions between numeric values and 2-tuples.

Definition 2 (Herrera and Martínez, 2000). Let $S = \{s_0, \ldots, s_g\}$ be a linguistic term set and $\beta \in [0, g]$ a value representing the result of a symbolic aggregation operation; then the 2-tuple that expresses the equivalent information to $\beta$ is obtained with the following function:

$$\Delta : [0, g] \to S \times [-0.5, 0.5)$$

$$\Delta(\beta) = (s_i, \alpha), \quad \text{with} \quad \begin{cases} s_i, & i = \mathrm{round}(\beta), \\ \alpha = \beta - i, & \alpha \in [-0.5, 0.5), \end{cases}$$

where $\mathrm{round}(\cdot)$ is the usual round operation, $s_i$ has the closest index label to $\beta$, and $\alpha$ is the value of the symbolic translation.

For all $\Delta$ there exists $\Delta^{-1}$, defined as $\Delta^{-1}(s_i, \alpha) = i + \alpha$. On the other hand, it is obvious that the conversion of a linguistic term into a linguistic 2-tuple consists of adding a symbolic translation value of 0: $s_i \in S \Rightarrow (s_i, 0)$.

The computational model is defined by presenting the following operators:

(1) Negation operator: $\mathrm{Neg}((s_i, \alpha)) = \Delta(g - \Delta^{-1}(s_i, \alpha))$.
(2) Comparison of 2-tuples $(s_k, \alpha_1)$ and $(s_l, \alpha_2)$:
    If $k < l$ then $(s_k, \alpha_1)$ is smaller than $(s_l, \alpha_2)$.
    If $k = l$ then
    (a) if $\alpha_1 = \alpha_2$ then $(s_k, \alpha_1)$ and $(s_l, \alpha_2)$ represent the same information,
    (b) if $\alpha_1 < \alpha_2$ then $(s_k, \alpha_1)$ is smaller than $(s_l, \alpha_2)$,
    (c) if $\alpha_1 > \alpha_2$ then $(s_k, \alpha_1)$ is bigger than $(s_l, \alpha_2)$.
(3) Aggregation operators. The aggregation of information consists of obtaining a value that summarizes a set of values; therefore, the result of the aggregation of a set of 2-tuples must be a 2-tuple. In the literature we can find many aggregation operators which allow us to combine the information according to different criteria. Using the functions $\Delta$ and $\Delta^{-1}$, which transform numerical values into linguistic 2-tuples and vice versa without loss of information, any of the existing aggregation operators can be easily extended for dealing with linguistic 2-tuples. Some examples are:

Definition 3 (Arithmetic mean). Let $x = \{(r_1, \alpha_1), \ldots, (r_n, \alpha_n)\}$ be a set of linguistic 2-tuples; the 2-tuple arithmetic mean $\bar{x}^e$ is computed as

$$\bar{x}^e[(r_1, \alpha_1), \ldots, (r_n, \alpha_n)] = \Delta\left(\sum_{i=1}^{n} \frac{1}{n} \Delta^{-1}(r_i, \alpha_i)\right) = \Delta\left(\frac{1}{n} \sum_{i=1}^{n} \beta_i\right).$$

Definition 4 (Weighted average operator). Let $x = \{(r_1, \alpha_1), \ldots, (r_n, \alpha_n)\}$ be a set of linguistic 2-tuples and $W = \{w_1, \ldots, w_n\}$ be their associated weights. The 2-tuple weighted average $\bar{x}^w$ is:

$$\bar{x}^w[(r_1, \alpha_1), \ldots, (r_n, \alpha_n)] = \Delta\left(\frac{\sum_{i=1}^{n} \Delta^{-1}(r_i, \alpha_i) \cdot w_i}{\sum_{i=1}^{n} w_i}\right) = \Delta\left(\frac{\sum_{i=1}^{n} \beta_i \cdot w_i}{\sum_{i=1}^{n} w_i}\right).$$

Definition 5 (Linguistic weighted average operator). Let $x = \{(r_1, \alpha_1), \ldots, (r_n, \alpha_n)\}$ be a set of linguistic 2-tuples and $W = \{(w_1, \alpha_1^w), \ldots, (w_n, \alpha_n^w)\}$ be their linguistic 2-tuple associated weights. The 2-tuple linguistic weighted average $\bar{x}_l^w$ is:

$$\bar{x}_l^w[((r_1, \alpha_1), (w_1, \alpha_1^w)) \ldots ((r_n, \alpha_n), (w_n, \alpha_n^w))] = \Delta\left(\frac{\sum_{i=1}^{n} \beta_i \cdot \beta_{W_i}}{\sum_{i=1}^{n} \beta_{W_i}}\right),$$

with $\beta_i = \Delta^{-1}(r_i, \alpha_i)$ and $\beta_{W_i} = \Delta^{-1}(w_i, \alpha_i^w)$.

2.2.2. The multi-granular fuzzy linguistic modeling

In any fuzzy linguistic approach, an important parameter to determine is the "granularity of uncertainty", i.e., the cardinality of the linguistic term set $S$. Depending on the degree of uncertainty that an expert qualifying a phenomenon has about it, the linguistic term set chosen to express his knowledge will have more or fewer terms. When different experts have different uncertainty degrees on the phenomenon, several linguistic term sets with different granularities of uncertainty are necessary (Herrera & Martínez, 2001; Herrera-Viedma et al., 2005). The use of different label sets to assess information is also necessary when an expert has to assess different concepts, as happens for example in information retrieval problems, to evaluate the importance of the query terms and the relevance of the retrieved documents (Herrera-Viedma et al., 2003). In such situations, we need tools to manage multi-granular linguistic information. In (Herrera & Martínez, 2001) a multi-granular 2-tuple FLM based on the concept of linguistic hierarchy is proposed.

A Linguistic Hierarchy, $LH$, is a set of levels $l(t, n(t))$, i.e., $LH = \bigcup_t l(t, n(t))$, where each level $t$ is a linguistic term set with a granularity $n(t)$ different from that of the remaining levels of the hierarchy.
The levels are ordered according to their granularity, i.e., a level $t + 1$ provides a linguistic refinement of the previous level $t$. We can define a level from its predecessor level as $l(t, n(t)) \to l(t + 1, 2 \cdot n(t) - 1)$. Table 1 shows the granularity needed in each linguistic term set of the level $t$ depending on the value $n(t)$ defined in the first level (3 and 7, respectively). A graphical example of a linguistic hierarchy is shown in Fig. 1.

Table 1
Linguistic hierarchies.

              Level 1     Level 2     Level 3
l(t, n(t))    l(1, 3)     l(2, 5)     l(3, 9)
l(t, n(t))    l(1, 7)     l(2, 13)
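The operators of Definitions 1 to 4 and the level construction rule can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: a 2-tuple $(s_i, \alpha)$ is stored as the pair `(i, alpha)`, and all function names (`delta`, `delta_inv`, `neg`, `weighted_average`, `arithmetic_mean`, `next_granularity`) are hypothetical. One assumption is made explicit: rounding is done half up, so that $\alpha$ stays in $[-0.5, 0.5)$.

```python
import math
from typing import Sequence, Tuple

# A 2-tuple (s_i, alpha) is stored as (i, alpha): label index and symbolic translation.
TwoTuple = Tuple[int, float]


def delta(beta: float, g: int) -> TwoTuple:
    """Delta (Definition 2): map beta in [0, g] to (i, alpha), alpha in [-0.5, 0.5)."""
    if not 0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    # Assumption: round half up. Python's built-in round() rounds halves to
    # even, which could produce alpha = +0.5, outside the required interval.
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_inv(t: TwoTuple) -> float:
    """Delta^{-1}: (i, alpha) -> i + alpha."""
    i, alpha = t
    return i + alpha


def neg(t: TwoTuple, g: int) -> TwoTuple:
    """Negation operator: Delta(g - Delta^{-1}(t))."""
    return delta(g - delta_inv(t), g)


def weighted_average(tuples: Sequence[TwoTuple],
                     weights: Sequence[float], g: int) -> TwoTuple:
    """2-tuple weighted average (Definition 4) with numeric weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    beta = sum(delta_inv(t) * w for t, w in zip(tuples, weights)) / total
    # Clamp to [0, g] to guard against floating-point drift.
    return delta(min(max(beta, 0.0), float(g)), g)


def arithmetic_mean(tuples: Sequence[TwoTuple], g: int) -> TwoTuple:
    """2-tuple arithmetic mean (Definition 3): equal weights."""
    return weighted_average(tuples, [1.0] * len(tuples), g)


def next_granularity(n: int) -> int:
    """Granularity of level t + 1 given n(t): 2 * n(t) - 1."""
    return 2 * n - 1
```

Because the pair `(i, alpha)` is compared lexicographically by Python, the built-in tuple ordering coincides with the comparison operator (2): first the label index, then the symbolic translation. Starting from 3 labels, `next_granularity` yields 5 and then 9, matching the first row of Table 1.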
Herrera and Martínez (2001) demonstrated that linguistic hierarchies are useful to represent multi-granular linguistic information and allow multi-granular linguistic information to be combined without loss of information. To do this, a family of transformation functions between labels from different levels was defined:

Definition 6. Let $LH = \bigcup_t l(t, n(t))$ be a linguistic hierarchy whose linguistic term sets are denoted as $S^{n(t)} = \{s_0^{n(t)}, \ldots, s_{n(t)-1}^{n(t)}\}$. The transformation function between a 2-tuple that belongs to level $t$ and another 2-tuple in level $t' \neq t$ is defined as:

\[ TF^{t}_{t'} : l(t, n(t)) \to l(t', n(t')), \]
\[ TF^{t}_{t'}(s_i^{n(t)}, \alpha^{n(t)}) = \Delta\left( \frac{\Delta^{-1}(s_i^{n(t)}, \alpha^{n(t)}) \cdot (n(t') - 1)}{n(t) - 1} \right). \]

As was pointed out in Herrera and Martínez (2001), this family of transformation functions is bijective. This result guarantees that the transformations between levels of a linguistic hierarchy are carried out without loss of information. To define the computational model, we select a level to make the information uniform (for instance, the greatest granularity level) and then we can use the operators defined in the 2-tuple FLM.

3. A multi-disciplinar recommender system to advice research resources in UDL

In this section we present a fuzzy linguistic recommender system designed using a hybrid approach and assuming a multi-granular FLM. This system is applied to advise users on the best research resources that could satisfy their information needs in a UDL. Moreover, the system recommends complementary resources that could be used by the users to meet other researchers of related areas, with the aim of discovering collaboration possibilities and so forming multi-disciplinar groups. In this way, it improves the services that a UDL could provide to users.

The UDL staff manages and spreads a lot of information resources, such as electronic books, electronic papers, electronic journals, and official dailies (Callan et al., 2003; Renda & Straccia, 2005).
Nowadays, this amount of information is growing, and the staff need automated tools to filter and spread that information to the users in a simple and timely manner. A traditional search function is normally an integral part of any digital library; however, users' frustration increases as their needs become more complex and as the volume of information managed by digital libraries increases. Digital libraries must move from being passive, with little adaptation to their users, to being more proactive in offering and tailoring information for individuals and communities, and in supporting community efforts to capture, structure and share knowledge (Callan et al., 2003; Gonçalves et al., 2004; Renda & Straccia, 2005). So, digital libraries should anticipate the users' needs and recommend resources that could be interesting for them.

We present a hybrid recommender system that combines both the content-based and collaborative approaches (Burke, 2007; Hanani et al., 2001; Lekakos & Giaglis, 2006). The system filters the incoming information stream and delivers it to the suitable researchers according to their research areas. It recommends to users research resources of their own research areas and of complementary areas. We use typical similarity functions based on threshold values to identify research resources of the users' own areas (Porcel et al., 2009). For example, we could use the threshold semantic functions defined in Information Retrieval to evaluate weighted queries (Bordogna & Pasi, 1993; Korfhage, 1997). On the other hand, to identify research resources of the complementary areas, we use Gaussian similarity functions (Bordogna & Pasi, 1993; Yager, 2007).

To represent the linguistic information we use different label sets, i.e., the communication among the users and the system is carried out by using multi-granular linguistic information, in order to allow a higher flexibility in the communication processes of the system.
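Working with several label sets relies on the transformation function $TF^{t}_{t'}$ of Definition 6. The following Python sketch illustrates it for the 5-label and 9-label levels of the hierarchy in Fig. 1. It is an illustration under stated assumptions, not the authors' code: 2-tuples are stored as `(i, alpha)` pairs, the names `delta` and `transform` are hypothetical, and rounding is done half up so that $\alpha$ stays in $[-0.5, 0.5)$.

```python
import math
from typing import Tuple

# A 2-tuple (s_i, alpha) is stored as (i, alpha): label index and symbolic translation.
TwoTuple = Tuple[int, float]


def delta(beta: float, g: int) -> TwoTuple:
    """Delta: map beta in [0, g] to (i, alpha) with alpha in [-0.5, 0.5)."""
    if not 0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    i = math.floor(beta + 0.5)  # assumption: round half up
    return i, beta - i


def transform(t: TwoTuple, n_from: int, n_to: int) -> TwoTuple:
    """TF between a level with n_from labels and a level with n_to labels.

    Implements Delta(Delta^{-1}(t) * (n_to - 1) / (n_from - 1)).
    """
    i, alpha = t
    beta = (i + alpha) * (n_to - 1) / (n_from - 1)
    return delta(beta, n_to - 1)


# Index 3 of the 5-label level maps to index 6 of the 9-label level.
high_in_9 = transform((3, 0.0), 5, 9)

# Going down and back up returns the starting 2-tuple, illustrating
# that no information is lost.
down = transform((5, 0.0), 9, 5)
back = transform(down, 5, 9)
```

In this sketch a label of the coarser level always lands exactly on a label of the finer level, whereas a label of the finer level may land between two coarser labels; the symbolic translation is what keeps the transformation reversible.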
Therefore, the system uses different label sets $(S_1, S_2, \ldots)$ to represent the different concepts to be assessed in its filtering activity. These label sets $S_i$ are chosen from the label sets of $LH$, i.e., $S_i \in LH$. We should point out that the number of different label sets that we can use is limited by the number of levels of $LH$; therefore, in many cases the label sets $S_i$ and $S_j$ can be associated with the same label set of $LH$ but with different interpretations depending on the concept to be modeled. In our system, we distinguish between three concepts that can be assessed:

- Importance degree $(S_1)$ of a discipline with respect to a resource scope or user preferences.
- Relevance degree $(S_2)$ of a resource for a user.
- Complementary degree $(S_3)$ between the resource scope and the user topics of interest.

Following the linguistic hierarchy shown in Fig. 1, in our system we use level 2 (5 labels) to assign importance degrees $(S_1 = S^5)$ and level 3 (9 labels) to assign relevance degrees $(S_2 = S^9)$ and complementary degrees $(S_3 = S^9)$. Using this $LH$, the linguistic terms in each level are:

$S^5 = \{b_0 = \mathit{Null} = N,\ b_1 = \mathit{Low} = L,\ b_2 = \mathit{Medium} = M,\ b_3 = \mathit{High} = H,\ b_4 = \mathit{Total} = T\}$;

$S^9 = \{c_0 = \mathit{Null} = N,\ c_1 = \mathit{Very\_Low} = VL,\ c_2 = \mathit{Low} = L,\ c_3 = \mathit{More\_Less\_Low} = MLL,\ c_4 = \mathit{Medium} = M,\ c_5 = \mathit{More\_Less\_High} = MLH,\ c_6 = \mathit{High} = H,\ c_7 = \mathit{Very\_High} = VH,\ c_8 = \mathit{Total} = T\}$.

Fig. 1. Linguistic hierarchy of 3, 5 and 9 labels.

The system has three main components: resources management, user profiles management and recommendation process (see Fig. 2).

3.1. Resources management

This module is responsible for the management and representation of the research resources. To characterize a resource, the library staff must insert all the available information, such as the title, author(s), kind of resource (book, book chapter, paper, journal, conference, or official daily), journal
(if it is part of a journal, the system stores the journal name), conference name and dates (if it is a conference), book (if it is a book chapter, the system stores the book title), official daily (if it is part of an official daily, the system stores the daily title), date, source, text, access link to the resource and its scope.

We use the vector model to represent the resource scope (Korfhage, 1997). Thus, to represent a resource $i$, we use a classification composed of 25 disciplines (see Fig. 3). In each position we store a linguistic 2-tuple value representing the importance degree for the resource scope of the discipline in that position:

\[ VR_i = (VR_{i1}, VR_{i2}, \ldots, VR_{i25}). \]

Then, each component $VR_{ij} \in S_1$, with $j = 1, \ldots, 25$, indicates the importance degree of the discipline $j$ with regard to the resource $i$. These importance degrees are assigned by the library staff when they add a new resource.

3.2. User profiles management

To characterize a user, the system stores the following basic information: nickname, password (necessary to access the system), passport number, name and surname, department and center, address, phone number, mobile phone and fax, web, email (essential to send the resources and recommendations), research group (a six-character string: three characters indicating the research area and three digits identifying the group), preferences about resources (the users choose the kind of desired resources, i.e., whether they want only books, or papers, etc.) and topics of interest.

We also use the vector model (Korfhage, 1997) to represent the topics of interest. Then, for a user $x$, we have a vector:

\[ VU_x = (VU_{x1}, VU_{x2}, \ldots, VU_{x25}), \]

where each component $VU_{xy} \in S_1$, with $y = 1, \ldots, 25$, stores a linguistic 2-tuple indicating the importance degree of the discipline $y$ with regard to the topics of interest of user $x$. These 2-tuple values are also assigned by the library staff.
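The resource vector $VR_i$ and the user vector $VU_x$ share the same shape: 25 linguistic 2-tuples over $S_1 = S^5$. A minimal Python sketch of that data structure follows. Everything beyond the shape is an assumption made for illustration: the function name `scope_vector`, the numbering of disciplines from 1 to 25, and the choice to fill disciplines that the staff did not assess with the label Null.

```python
from typing import Dict, List, Tuple

# A 2-tuple (s_i, alpha) is stored as (i, alpha): label index and symbolic translation.
TwoTuple = Tuple[int, float]

N_DISCIPLINES = 25                 # size of the classification (Fig. 3)
S5 = ["N", "L", "M", "H", "T"]     # Null, Low, Medium, High, Total


def scope_vector(assigned: Dict[int, str]) -> List[TwoTuple]:
    """Build a 25-component vector of 2-tuples from staff-assigned labels.

    `assigned` maps a discipline number (1..25, hypothetical numbering) to a
    label of S5. Disciplines not mentioned default to Null (an assumption).
    A plain label s_i becomes the 2-tuple (s_i, 0).
    """
    vector: List[TwoTuple] = [(0, 0.0)] * N_DISCIPLINES
    for discipline, label in assigned.items():
        if not 1 <= discipline <= N_DISCIPLINES:
            raise ValueError(f"unknown discipline number: {discipline}")
        if label not in S5:
            raise ValueError(f"unknown label: {label}")
        vector[discipline - 1] = (S5.index(label), 0.0)
    return vector


# Hypothetical resource: High importance for discipline 3, Total for discipline 10.
vr = scope_vector({3: "H", 10: "T"})
```

Using the same constructor for resources and for user profiles keeps both vectors directly comparable position by position.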
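The system is based on a content-based approach, but this approach suffers from the cold-start problem when handling new items or new users (Burke, 2007). New items cannot be recommended to any user until they have been rated by someone. Recommendations for new resources are considerably weaker than those for more widely rated resources. To overcome this problem, in our system, as was done in other systems (for example in MovieLens), when a new user is inserted, the first action required to confirm his/her registration is to access and assess more than 15 of the resources in the system.

Another aspect of our system is that users can modify the threshold that defines the number of recommendations that they want to receive. So, if the system sends a lot of recommendations, the users can limit this number to $N$, and in the future they will receive only the $N$ most relevant resources.

Fig. 2. Structure of the system. Components shown: information sources, resources insertion process, resources representation, users, users insertion process, user profiles, matching process, relevant resources for users, complementary recommendations and feedback, grouped into resources management, user profile management and recommendation process.

Fig. 3. Disciplines to define the resource scope. The disciplines include: Agriculture, animal breeding and fishing; Vegetal and animal biology and ecology; Biotechnology, molecular and cellular biology and genetics; Food science and technology; Materials science and technology; Earth science; Social science; Computers science and technology; Energy and combustibles; Philology and philosophy; Physics and space sciences; History and art; Civil engineering, transportations, construction and ...; Industrial, mechanics, naval and aeronautic engineering; Mathematics; Medicine and veterinary; Environment and environmental technology; Multi-disciplinar; Scientific policy; Psychology and education sciences; Chemistry and chemistry technology; Telecommunications, electric, electronics and ...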
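The user-defined limit $N$ can be illustrated with a short Python sketch that keeps the $N$ most relevant resources, ranking relevance degrees in $S_2 = S^9$ with the 2-tuple comparison operator. This is a sketch under assumptions, not the system's code: the function name `top_n`, the resource identifiers and the relevance values are hypothetical, and how the relevance degrees are computed is outside its scope.

```python
from typing import Dict, List, Tuple

# A 2-tuple (s_i, alpha) is stored as (i, alpha): label index and symbolic translation.
TwoTuple = Tuple[int, float]


def top_n(relevance: Dict[str, TwoTuple], n: int) -> List[str]:
    """Return the identifiers of the n most relevant resources.

    Python compares (i, alpha) pairs lexicographically, which matches the
    2-tuple comparison operator: label index first, then symbolic translation.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    ranked = sorted(relevance, key=lambda rid: relevance[rid], reverse=True)
    return ranked[:n]


# Hypothetical relevance degrees over the 9-label set (indexes 0..8).
relevance = {
    "r1": (6, 0.2),
    "r2": (6, -0.1),
    "r3": (8, 0.0),
    "r4": (3, 0.4),
}
selected = top_n(relevance, 2)
```

Sorting on the 2-tuples themselves, rather than on rounded labels, lets the symbolic translation break ties between resources that share the same linguistic label.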