A recommender system for research resources based on fuzzy linguistic modeling

C. Porcel a, A.G. López-Herrera a, E. Herrera-Viedma b,*

a Department of Computer Science, University of Jaén, 23071 Jaén, Spain
b Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain

Keywords: Recommender systems; Information filtering; Fuzzy linguistic modeling; Multi-granular linguistic information

Abstract

The increasing popularity of the Internet has led to an abundance of information created and delivered over electronic media. As a result, information access has become a complex activity, and users need tools that help them obtain the information they require. Recommender systems are tools whose objective is to evaluate and filter the large amount of information available in a specific scope in order to assist users in their information access processes. Another obstacle is the great variety of representations of information, especially when users take part in the process, so more flexibility is needed in the information processing. Fuzzy linguistic modeling makes it possible to represent and handle flexible information. Similar problems appear in other frameworks, such as digital academic libraries, research offices and business contacts. We focus on information access processes in technology transfer offices. The aim of this paper is to develop a recommender system for research resources based on fuzzy linguistic modeling. The system helps researchers and environment companies by automatically providing them with information about research resources (calls or projects) in their areas of interest. It is designed using some filtering tools and a particular fuzzy linguistic modeling, called multi-granular fuzzy linguistic modeling, which is useful when different qualitative concepts have to be assessed.
The system is in operation at the University of Granada, and experimental results show that it is feasible and effective.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

A Technology Transfer Office (TTO) is responsible for putting into action and managing the activities that generate knowledge and technical and scientific collaboration, thus enhancing the interrelation between researchers at the University and the entrepreneurial world, as well as their participation in various support programmes designed to carry out research, development and innovation activities. The main mission of this office is to encourage and help, from the University, the generation of knowledge and its spread and transfer to society, with the aim of rapidly meeting society's needs and demands. A graphical representation of this mission is shown in Fig. 1 (The Centre for Innovation, XXXX).

To carry out its objectives, a TTO runs a number of services, among which we highlight the following (The Centre for Innovation, XXXX):

- Information (R&D bulletins, R&D&I, calls, notices, projects).
- Guidance for Research and Development (R&D) and Technology Transfer funding.
- Advice in the preparation of offers (management, spread and exploitation).
- Support in the elaboration and negotiation of contracts with companies.
- Management of contacts.
- Technological offer (the elaboration of the offer, spread and promotion).
- Advice in the creation of new businesses.
- Evaluation, protection and transfer of ownership rights, both intellectual and industrial.

To fulfil these objectives and manage all the services, a TTO is composed of a team of technicians who are experts in technology transfer. Each one manages a specific task, but all of them must provide information about research resources to researchers and companies, that is, bulletins, projects, calls, notices, events, congresses, courses, and so on. This task requires the expert to select the suitable researchers to whom the information should be delivered.
Expert Systems with Applications 36 (2009) 5173–5183. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.06.038
* Corresponding author. E-mail addresses: cporcel@ujaen.es (C. Porcel), viedma@decsai.ugr.es (E. Herrera-Viedma).

This task poses a first problem: the large increase in research resources means that TTO experts are no longer able to spread the information to the suitable users (both researchers and companies) in a simple and timely manner. TTO experts therefore need tools to help them cope with the large amount of information available about research resources. A promising direction to improve the information access about research resources concerns
the way in which the large amount of available information can be filtered. Recommender systems are tools whose objective is to evaluate and filter the large amount of information available in a specific scope in order to assist users in their information access processes (Basu, Hirsh, & Cohen, 1998; Cao & Li, 2007; Hanani, Shapira, & Shoval, 2001; Hsu, 2008; Ungar, Pennock, & Lawrence, 2001; Reisnick & Varian, 1997).

Another problem is the great variety of representations and evaluations of the information. The problem becomes more noticeable when users take part in the process. Therefore, to improve the information representations and the user interface, more flexibility is needed in the information processing. To solve this problem we propose the use of Fuzzy Linguistic Modeling (FLM) (Ben-Arieh & Zhifeng, 2006; Herrera & Herrera-Viedma, 1997; Herrera, Herrera-Viedma, & Martínez, 2008; Herrera, Herrera-Viedma, & Verdegay, 1996; Herrera & Martínez, 2000; Zadeh, 1975) to represent and handle flexible information by means of linguistic labels.

In this paper, we propose SIRE2IN, a recommender system for research resources based on FLM. The system allows researchers to automatically obtain information about research resources in their areas of interest, and it recommends companies or other researchers who could collaborate with them in projects (Chang, Wang, & Wang, 2007; Chen & Ben-Arieh, 2006; Herrera & Martínez, 2001; Herrera-Viedma, Cordón, Luque, López, & Muñoz, 2003; Herrera-Viedma, Martínez, Mata, & Chiclana, 2005). SIRE2IN is designed using both recommendation techniques and the multi-granular FLM to represent and handle flexible information by means of linguistic labels. To test the system's functionality we have implemented a first version, and the experimental results show its usefulness and effectiveness.

The paper is structured as follows: Section 2 reviews the recommendation approaches and the FLM.
Section 3 presents the design of the system, analyzing its architecture, data structure and activity. Section 4 reports the system evaluation and the experimental results. Finally, we point out some concluding remarks.

2. Preliminaries

2.1. Recommender systems

Gathering information on the Internet is a complex activity. Finding the appropriate information required by users on the Web is not a simple task. This problem becomes more acute with the ever increasing use of the Internet. For example, users who subscribe to Internet lists waste a great deal of time reading, viewing or deleting irrelevant e-mail messages. To improve information access on the Web, users need tools to filter the large amount of information available across the Web. Recommender systems can provide information services by delivering the information to the people who need it. This research area offers tools for discriminating between relevant and irrelevant information by providing personalized assistance for continuous retrieval of information (Reisnick & Varian, 1997).

Recommender systems can be characterized by the following features (Hanani et al., 2001; Reisnick & Varian, 1997):

- they are applicable to unstructured or semi-structured data (e.g. Web documents or e-mail messages),
- their users have long-term information needs that are described by means of user profiles,
- they handle large amounts of data,
- they deal primarily with textual data, and
- their objective is to remove irrelevant data from incoming streams of data items.

Traditionally, recommender systems have fallen into two main categories (Good et al., 1999; Hanani et al., 2001; Popescul et al., 2001; Reisnick & Varian, 1997). Content-based recommender systems recommend information by matching the terms used in the representation of user profiles with the index terms used in the representation of documents, ignoring data from other users. These recommender systems tend to fail when little is known about the user's information needs.
Collaborative recommender systems use explicit or implicit preferences from many users to recommend documents to a given user, ignoring the representation of documents. These recommender systems tend to fail when little is known about a user, or when he/she has uncommon interests (Popescul et al., 2001). In this kind of system, the users' information preferences can be used to define user profiles that are applied as filters to streams of documents; the recommendations to a user are based on the recommendations of other users with similar profiles. The construction of accurate profiles is a key task, and the system's success will depend to a large extent on the ability of the learned profiles to represent the user's preferences (Quiroga & Mostafa, 2002). Moreover, a hybrid approach can be used to smooth out the disadvantages of each category and to exploit their benefits (Basu et al., 1998; Claypool, Gokhale, & Miranda, 1999; Good et al., 1999; Popescul et al., 2001).

On the other hand, the matching process is a central process in the activity of recommender systems. The two major approaches followed in the design and implementation of recommender systems to do the matching are the statistical approach and the knowledge-based approach (Hanani et al., 2001). In our system, we have applied the statistical approach. This approach represents the documents and the user profiles as weighted vectors of index terms. To filter the information, the system implements a statistical algorithm that computes the similarity between the vector of terms representing the data item being filtered and the user's profile. The most commonly used algorithms are the correlation and the cosine measure between the user's profile and the document's vector (Korfhage, 1997).

The recommendation activity is followed by a relevance feedback phase.
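As a rough illustration of the statistical matching described above, the following Python sketch compares a user profile with candidate resources using the cosine measure. It is not the SIRE2IN implementation: the sparse dictionary representation, the function names `cosine_similarity` and `filter_items`, the `threshold` value and the sample term weights are all assumptions made for the example.

```python
import math


def cosine_similarity(profile, document):
    """Cosine measure between two sparse term-weight vectors (dict: term -> weight)."""
    shared_terms = set(profile) & set(document)
    dot = sum(profile[t] * document[t] for t in shared_terms)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_document = math.sqrt(sum(w * w for w in document.values()))
    if norm_profile == 0.0 or norm_document == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_profile * norm_document)


def filter_items(profile, items, threshold=0.5):
    """Keep the items whose similarity to the profile reaches the threshold.

    `items` maps an item identifier to its term-weight vector. The result is
    a list of (identifier, similarity) pairs, most similar first.
    The default threshold is an arbitrary value chosen for illustration.
    """
    scored = [(item_id, cosine_similarity(profile, vector))
              for item_id, vector in items.items()]
    kept = [(item_id, score) for item_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical profile and resources, invented for the example.
    profile = {"fuzzy": 0.9, "retrieval": 0.6}
    items = {
        "call_A": {"fuzzy": 0.8, "retrieval": 0.5, "logic": 0.1},
        "call_B": {"biology": 1.0},
    }
    for item_id, score in filter_items(profile, items):
        print(item_id, round(score, 3))
```

Only the terms shared by both vectors contribute to the dot product, so a resource with no term in common with the profile scores 0 and is filtered out regardless of the threshold.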
Relevance feedback is a cyclic process whereby the user feeds back into the system decisions on the relevance of retrieved documents, and the system then uses these evaluations to automatically update the user profile (Hanani et al., 2001; Popescul et al., 2001; Reisnick & Varian, 1997).

[Fig. 1. Main mission in a TTO. Diagram elements: Researchers (research groups); Environment companies; TTO; Generation of knowledge and its transfer to the society.]

Another important aspect to bear in mind when designing a recommender system is the method used to gather user information. In order to discriminate between relevant and irrelevant information for a user, we must have some information about this user, i.e., we must know the user's preferences. Information about user preferences can be obtained in two different ways (Hanani et al., 2001; Quiroga & Mostafa, 2002), implicitly and explicitly, although these ways need not be mutually exclusive.

The implicit approach is implemented by inference from some kind of observation. The observation is applied to user behavior or to detecting a user's environment (such as bookmarks or visited URLs). The user preferences are updated by detecting changes while observing the user. The explicit approach, on the other hand, interacts with the users by acquiring feedback on the information that is filtered, that is, the users state some specifications of what they desire. This last approach is widely used (Hanani et al., 2001; Popescul et al., 2001; Reisnick & Varian, 1997).

2.2. Fuzzy linguistic modeling

There are situations in which information cannot be assessed precisely in a quantitative form but may be assessed in a qualitative one. For example, when attempting to qualify phenomena related to human perception, we are often led to use words in natural language instead of numerical values. In other cases, precise quantitative information cannot be stated because either it is unavailable or the cost of computing it is too high, and an approximate value is acceptable.
The use of Fuzzy Sets Theory has given very good results for modeling qualitative information (Zadeh, 1975) and it has proven to be useful in many problems, e.g., in decision making (Herrera, Herrera-Viedma, & Verdegay, 1996; Herrera et al., 1996; Herrera, Herrera-Viedma, & Verdegay, 1998; Xu, 2006), quality evaluation (Herrera-Viedma, Pasi, López-Herrera, & Porcel, 2006; Herrera-Viedma & Peis, 2003; Herrera-Viedma, Peis, Morales del Castillo, Alonso, & Anaya, 2007), information retrieval (HerreraViedma, 2001; Herrera-Viedma, 2001; Herrera-Viedma & LópezHerrera, 2007; Herrera-Viedma, López-Herrera, Luque, & Porcel, 2007; Herrera-Viedma, López-Herrera, & Porcel, 2005), political analysis (Arfi, 2005), etc. It is a tool based on the concept of linguistic variable proposed by Zadeh (1975). Next we analyze the two approaches of FLM that we use in our system. 2.2.1. The 2-tuple fuzzy linguistic approach The 2-tuple FLM (Herrera & Martı´ nez, 2000) is a continuous model of representation of information that allows to reduce the loss of information typical of other fuzzy linguistic approaches (classical and ordinal Herrera & Herrera-Viedma, 1997; Zadeh, 1975). To define it we have to establish the 2-tuple representation model and the 2-tuple computational model to represent and aggregate the linguistic information, respectively. Let S ¼ fs0; ... ; sg g be a linguistic term set with odd cardinality, where the mid term represents a indifference value and the rest of the terms are symmetric relate to it. We assume that the semantics of labels is given by means of triangular membership functions and consider all terms distributed on a scale on which a total order is defined, si 6 sj () i 6 j. In this fuzzy linguistic context, if a symbolic method (Herrera & Herrera-Viedma, 1997; Herrera et al., 1996) aggregating linguistic information obtains a value b 2 ½0; g, and b R f0; ... ; gg; then an approximation function is used to express the result in S. Definition 1. 
(Herrera & Martı´nez, 2000). Let b be the result of an aggregation of the indexes of a set of labels assessed in a linguistic term set S, i.e., the result of a symbolic aggregation operation, b 2 ½0; g. Let i ¼ roundðbÞ and a ¼ b i be two values, such that, i 2 ½0; g and a 2 ½:5; :5Þ then a is called a Symbolic Translation. The 2-tuple fuzzy linguistic approach is developed from the concept of symbolic translation by representing the linguistic information by means of 2-tuples ðsi; aiÞ, si 2 S and ai 2 ½:5; :5Þ: si represents the linguistic label of the information, and ai is a numerical value expressing the value of the translation from the original result b to the closest index label, i, in the linguistic term set (si 2 S). This model defines a set of transformation functions between numeric values and 2-tuples. Definition 2. (Herrera & Martı´nez, 2000). Let S ¼ fs0; ... ; sgg be a linguistic term set and b 2 ½0; g a value representing the result of a symbolic aggregation operation, then the 2-tuple that expresses the equivalent information to b is obtained with the following function: D : ½0; g ! S ½0:5; 0:5Þ; DðbÞ¼ðsi; aÞ; with si i ¼ roundðbÞ; a ¼ b i a 2 ½:5; :5Þ; where roundðÞ is the usual round operation, si has the closest index label to ‘‘b” and ‘‘a” is the value of the symbolic translation. For all D there exists D1 , defined as D1 ðsi; aÞ ¼ i þ a. On the other hand, it is obvious that the conversion of a linguistic term into a linguistic 2-tuple consists of adding a symbolic translation value of 0: si 2 S ) ðsi; 0Þ. The computational model is defined by presenting the following operators: 1. Negation operator: Negððsi; aÞÞ ¼ Dðg ðD1 ðsi; aÞÞÞ. 2. Comparison of 2-tuples ðsk; a1Þ and ðsl; a2Þ: If k < l then ðsk; a1Þ is smaller than ðsl; a2Þ. If k ¼ l then (a) if a1 ¼ a2 then ðsk; a1Þ and ðsl; a2Þ represent the same information, (b) if a1 < a2 then ðsk; a1Þ is smaller than ðsl; a2Þ, (c) if a1 > a2 then ðsk; a1Þ is bigger than ðsl; a2Þ. 3. 
Aggregation operators. The aggregation of information consists of obtaining a value that summarizes a set of values; therefore, the result of the aggregation of a set of 2-tuples must be a 2-tuple. In the literature we can find many aggregation operators which allow us to combine the information according to different criteria. Using the functions $\Delta$ and $\Delta^{-1}$, which transform numerical values into linguistic 2-tuples and vice versa without loss of information, any existing aggregation operator can be easily extended to deal with linguistic 2-tuples. Some examples are:

Definition 3. Arithmetic mean: Let $x = \{(r_1, \alpha_1), \ldots, (r_n, \alpha_n)\}$ be a set of linguistic 2-tuples; the 2-tuple arithmetic mean $\bar{x}^e$ is computed as

$$\bar{x}^e[(r_1, \alpha_1), \ldots, (r_n, \alpha_n)] = \Delta\left(\sum_{i=1}^{n} \frac{1}{n}\,\Delta^{-1}(r_i, \alpha_i)\right) = \Delta\left(\frac{1}{n}\sum_{i=1}^{n} \beta_i\right).$$

Definition 4. Weighted average operator: Let $x = \{(r_1, \alpha_1), \ldots, (r_n, \alpha_n)\}$ be a set of linguistic 2-tuples and $W = \{w_1, \ldots, w_n\}$ be their associated weights. The 2-tuple weighted average $\bar{x}^w$ is

$$\bar{x}^w[(r_1, \alpha_1), \ldots, (r_n, \alpha_n)] = \Delta\left(\frac{\sum_{i=1}^{n} \Delta^{-1}(r_i, \alpha_i) \cdot w_i}{\sum_{i=1}^{n} w_i}\right) = \Delta\left(\frac{\sum_{i=1}^{n} \beta_i \cdot w_i}{\sum_{i=1}^{n} w_i}\right).$$
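The representation and computational models above translate almost directly into code. The following Python sketch is only an illustration, not the authors' implementation: the names `TwoTuple`, `delta`, `delta_inv`, `negation`, `arithmetic_mean` and `weighted_average` are assumptions introduced here, and rounding is done half up (rather than with Python's built-in `round`, which rounds halves to even) so that the symbolic translation stays in $[-0.5, 0.5)$.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True, order=True)
class TwoTuple:
    """A linguistic 2-tuple (s_i, alpha); ordering compares index, then alpha."""
    index: int    # i, the index of the label s_i in S = {s_0, ..., s_g}
    alpha: float  # symbolic translation, expected in [-0.5, 0.5)


def delta(beta: float, g: int) -> TwoTuple:
    """Delta: express a value beta in [0, g] as a 2-tuple."""
    if not 0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    i = math.floor(beta + 0.5)  # round half up (assumption, see lead-in)
    return TwoTuple(i, beta - i)


def delta_inv(t: TwoTuple) -> float:
    """Delta^{-1}: recover the numeric value i + alpha."""
    return t.index + t.alpha


def negation(t: TwoTuple, g: int) -> TwoTuple:
    """Neg((s_i, alpha)) = Delta(g - Delta^{-1}(s_i, alpha))."""
    return delta(g - delta_inv(t), g)


def arithmetic_mean(tuples: Sequence[TwoTuple], g: int) -> TwoTuple:
    """2-tuple arithmetic mean (Definition 3)."""
    betas = [delta_inv(t) for t in tuples]
    return delta(sum(betas) / len(betas), g)


def weighted_average(tuples: Sequence[TwoTuple],
                     weights: Sequence[float], g: int) -> TwoTuple:
    """2-tuple weighted average with numeric weights (Definition 4)."""
    numerator = sum(delta_inv(t) * w for t, w in zip(tuples, weights))
    return delta(numerator / sum(weights), g)
```

With $g = 4$ (five labels), for instance, averaging $(s_1, 0)$, $(s_2, 0)$ and $(s_4, 0)$ gives $\beta = 7/3$, which `delta` expresses as $s_2$ with a symbolic translation of about 0.33. Declaring the dataclass with `order=True` makes Python compare the label index first and the translation second, which matches the comparison rule of operator 2.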
Definition 5. Linguistic weighted average operator: Let $x = \{(r_1, \alpha_1), \ldots, (r_n, \alpha_n)\}$ be a set of linguistic 2-tuples and $W = \{(w_1, \alpha_1^w), \ldots, (w_n, \alpha_n^w)\}$ be their linguistic 2-tuple associated weights. The 2-tuple linguistic weighted average $\bar{x}_l^w$ is

$$\bar{x}_l^w[((r_1, \alpha_1), (w_1, \alpha_1^w)) \ldots ((r_n, \alpha_n), (w_n, \alpha_n^w))] = \Delta\left(\frac{\sum_{i=1}^{n} \beta_i \cdot \beta_{W_i}}{\sum_{i=1}^{n} \beta_{W_i}}\right),$$

with $\beta_i = \Delta^{-1}(r_i, \alpha_i)$ and $\beta_{W_i} = \Delta^{-1}(w_i, \alpha_i^w)$.

2.2.2. The multi-granular fuzzy linguistic modeling

In any fuzzy linguistic approach, an important parameter to determine is the "granularity of uncertainty", i.e., the cardinality of the linguistic term set $S$. Depending on the degree of uncertainty that an expert has about the phenomenon being assessed, the linguistic term set chosen to express his or her knowledge will have more or fewer terms. When different experts have different uncertainty degrees on the phenomenon, several linguistic term sets with different granularities of uncertainty are necessary (Herrera & Martínez, 2001; Herrera-Viedma et al., 2005). The use of different label sets to assess information is also necessary when an expert has to assess different concepts, as happens, for example, in information retrieval problems, where the importance of the query terms and the relevance of the retrieved documents are evaluated (Herrera-Viedma et al., 2003). In such situations, we need tools for the management of multi-granular linguistic information. Herrera & Martínez (2001) proposed a multi-granular 2-tuple FLM based on the concept of linguistic hierarchy (Cordón, Herrera, & Zwir, 2001).

A Linguistic Hierarchy, $LH$, is a set of levels $l(t, n(t))$, i.e., $LH = \bigcup_t l(t, n(t))$, where each level $t$ is a linguistic term set with a granularity $n(t)$ different from that of the remaining levels of the hierarchy (Cordón et al., 2001). The levels are ordered according to their granularity, i.e., a level $t + 1$ provides a linguistic refinement of the previous level $t$. We can define a level from its predecessor level as: $l(t, n(t)) \to l(t + 1, 2 \cdot n(t) - 1)$. Table 1 shows the granularity needed in each linguistic term set of the level $t$ depending on the value $n(t)$ defined in the first level (3 and 7, respectively). A graphical example of a linguistic hierarchy is shown in Fig. 2.

Herrera & Martínez (2001) demonstrated that linguistic hierarchies are useful to represent multi-granular linguistic information and allow it to be combined without loss of information. To do this, a family of transformation functions between labels from different levels was defined:

Definition 6. Let $LH = \bigcup_t l(t, n(t))$ be a linguistic hierarchy whose linguistic term sets are denoted as $S^{n(t)} = \{s_0^{n(t)}, \ldots, s_{n(t)-1}^{n(t)}\}$. The transformation function between a 2-tuple that belongs to level $t$ and another 2-tuple in level $t' \ne t$ is defined as:

$$TF_{t'}^{t} : l(t, n(t)) \to l(t', n(t')), \qquad TF_{t'}^{t}\left(s_i^{n(t)}, \alpha^{n(t)}\right) = \Delta\left(\frac{\Delta^{-1}\left(s_i^{n(t)}, \alpha^{n(t)}\right) \cdot (n(t') - 1)}{n(t) - 1}\right).$$

As was pointed out in Herrera & Martínez (2001), this family of transformation functions is bijective. This result guarantees that the transformations between levels of a linguistic hierarchy are carried out without loss of information. To define the computational model, we select a level in which to make the information uniform (for instance, the level with the greatest granularity) and then we can use the operators defined in the 2-tuple FLM.

3. SIRE2IN, a recommender system for research resources

In this section, we present SIRE2IN, a recommender system based on multi-granular FLM. As we said in the introduction, the TTO technicians manage and spread a lot of information about research resources such as calls or projects. Nowadays, this amount of information is growing and the experts need automatic tools to filter and spread the information in a simple and timely manner. Because of this, our system incorporates in its activity a filtering process that follows the content-based approach.
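Because the system works with several label sets, the transformation function of Definition 6 is what brings assessments to a common level before the 2-tuple operators are applied. The following Python sketch illustrates it under stated assumptions: the function names (`next_granularity`, `delta`, `delta_inv`, `transform`) are hypothetical, a 2-tuple is represented as a plain `(index, alpha)` pair, and rounding is done half up.

```python
import math
from typing import Tuple

TwoTuple = Tuple[int, float]  # (label index i, symbolic translation alpha)


def next_granularity(n: int) -> int:
    """Granularity of level t+1 given n(t): 2 * n(t) - 1."""
    return 2 * n - 1


def delta(beta: float) -> TwoTuple:
    """Delta: numeric value -> 2-tuple, rounding half up (assumption)."""
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_inv(t: TwoTuple) -> float:
    """Delta^{-1}: 2-tuple -> numeric value i + alpha."""
    return t[0] + t[1]


def transform(t: TwoTuple, n_from: int, n_to: int) -> TwoTuple:
    """TF between a level with n_from labels and a level with n_to labels."""
    return delta(delta_inv(t) * (n_to - 1) / (n_from - 1))


# Granularities of the hierarchy that starts with 3 labels.
levels = [3, next_granularity(3), next_granularity(next_granularity(3))]

# Move the fourth label of the 9-label level down to the 5-label level and back.
coarse = transform((3, 0.0), 9, 5)
restored = transform(coarse, 5, 9)
```

Here `levels` reproduces the 3, 5 and 9 labels of the first row of Table 1, and the round trip through the coarser level returns the original 2-tuple, which illustrates the lossless behaviour claimed for the transformation functions.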
Moreover, to improve the representation of the information in the system we use multi-granular linguistic information, that is, different label sets to represent the different concepts to be assessed by different users in the filtering activity. SIRE2IN then filters the incoming information stream and generates useful recommendations for the suitable researchers in accordance with their research areas. For each user the system generates an email with a summary of the resources, their relevance degrees and recommendations about collaboration possibilities.

Fig. 2. Linguistic Hierarchy of 3, 5 and 9 labels.

Table 1
Linguistic hierarchies

              Level 1    Level 2    Level 3
l(t, n(t))    l(1, 3)    l(2, 5)    l(3, 9)
l(t, n(t))    l(1, 7)    l(2, 13)

3.1. Architecture of SIRE2IN

The architecture of SIRE2IN (Fig. 3) has three main components:

Resources management. This module is responsible for managing the information sources from which the TTO experts receive all the information about research resources. It obtains an internal representation of these items. Examples of information sources are the Internet, news bulletins, distribution lists, forums, etc. To manage the items, we represent them in accordance with their scope using the UNESCO terminology for science and technology (The UNESCO terminology, XXXX). This terminology is composed of three levels, each one a refinement of the previous level. The first level includes general topics, codified by two digits. Each topic includes some disciplines, codified by four digits, in a second level. The third level is composed of subdisciplines that represent the activities developed in each discipline; these subdisciplines are codified by six digits. We operate with the first and second levels, because we think the third level supplies too high a discrimination level, which could make the interaction with the users difficult. Moreover, for each resource we store other information that the system uses in the filtering process.

User profiles management. The users can be researchers of the University or employees of companies in the University's environment. In both cases, the system operates with an internal representation of the user's preferences or needs, that is, the system represents each user through a user profile. To define a user profile we use the basic information about the user and his/her topics of interest, also represented with the UNESCO terminology (The UNESCO terminology, XXXX), i.e., each user has a list of UNESCO codes according to his/her information needs or interests. Both research groups and companies are assigned a set of UNESCO codes that define their research activity. So, initially the system assigns to each user the UNESCO codes of his/her research group or company, and afterwards users can update their profiles in a feedback phase in which they can express some explicit specifications of their preferences.

Filtering process. This component filters the incoming information to deliver it to the appropriate users. The filtering process is based on a matching process. As our system is a content-based recommender system, it filters the information by matching the terms used in the representation of user profiles against the index terms used in the representation of resources. Later, we will study this process in detail, taking into account the data structures.

3.2. Data structures

In this subsection, we discuss the data structures that we need to represent all the information about the users and the research resources. We must bear in mind that the system stores this information because it does not work with explicit user queries.
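As a preview of the fields enumerated in the remainder of this subsection, the two kinds of records can be pictured as plain data classes. This is only a sketch: every field name, the string types used for dates and categories, and the choice of $(s_0, 0)$ as the default importance value are assumptions made for illustration, not details given in the paper; only the list of fields and the 248-position vectors come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

N_DISCIPLINES = 248  # second-level UNESCO disciplines, as stated in the text

TwoTuple = Tuple[int, float]  # (label index, symbolic translation)


def empty_vector() -> List[TwoTuple]:
    """One 2-tuple per discipline; (0, 0.0) as default is an assumption."""
    return [(0, 0.0)] * N_DISCIPLINES


@dataclass
class Resource:
    title: str
    abstract: str
    text: str
    date: str
    source: str
    link: str
    target: str          # "researchers", "companies" or "anybody"
    min_amount: float
    max_amount: float
    scope: List[TwoTuple] = field(default_factory=empty_vector)  # vector V_R


@dataclass
class UserProfile:
    identity: str        # usually the user's email
    password: str
    dni: str             # national identity document
    name: str
    surname: str
    affiliation: str     # department and center, or company
    address: str
    phones: str          # phone number, mobile phone and fax
    email: str
    collaboration: str   # other groups, companies, anybody or nobody
    min_amount: float
    max_amount: float
    research_group: Optional[str] = None  # six-character group code, if any
    topics: List[TwoTuple] = field(default_factory=empty_vector)  # vector V_U
```

Using `default_factory` gives every record its own vector, so updating the importance of one discipline for one user or resource does not affect any other record.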
To characterize a research resource, we use the following information:

• title,
• abstract,
• text,
• date,
• source,
• link: when the system sends the users information about a resource, it does not send all the information but a summary and the link to access the resource,
• target: this field indicates the kind of users at which the resource is aimed, that is, researchers, companies or anybody,
• minimum and maximum amount: it indicates the minimum and maximum amount that the user can apply for,
• scope: the system manages the resources in accordance with their scope. To represent the resource scope we use the vector model, where for each resource the system stores a vector $V_R$, i.e., an ordered list of terms. To build this vector we follow the UNESCO terminology (The UNESCO terminology, XXXX), specifically its second level. This level has 248 disciplines, so the vector must have 248 positions, one for each discipline. In each position the vector stores a 2-tuple linguistic value which represents the importance degree, for the resource scope, of the UNESCO code represented in that position.

To set up a user profile we use the following information:

• user's identity: usually his/her email,
• password: necessary to access the system,
• dni: national identity document,
• name and surname,
• department and center if the user is a University researcher, or the company if the user is a company employee,
• address, phone number, mobile phone and fax,
• email: basic information needed to send the resources and recommendations,
• research group: only if the user belongs to a research group. We use a code which is a string composed of six characters, three letters indicating the research area and three digits identifying the group,
• collaboration preferences: whether the user wants to collaborate with researchers of a different group, with companies, with anybody or with nobody,
• minimum and maximum amount: the users define the interval of amounts for which they are interested in applying to a call,
• topics of interest: to represent the topics of interest we also use the vector model, where for each user the system stores a vector $V_U$. To build this vector we follow the UNESCO terminology (The UNESCO terminology, XXXX), specifically its second level. This level has 248 disciplines, so the vector must have 248 positions, one for each discipline. In each position the vector stores a 2-tuple linguistic value which represents the importance degree, for the user's topics of interest, of the UNESCO code represented in that position.

Fig. 3. Structure of SIRE2IN. (Diagram blocks: information sources, resources insertion process, resources representation, matching process, relevant resources for users, feedback, users, users insertion process, user profiles; grouped into resources management, user profile management and filtering process.)

On the other hand, to represent the linguistic information we use different label sets, i.e., the communication among the users and the system is carried out by using multi-granular linguistic