iteration i and i + 1 is less than a very small ε) or a certain number of iterations is reached. We used eq. (4) as the relational method inside the loop, where we do not require that the neighbors x' are in the training set, but also use the probabilities of unlabeled nodes. For PWA this means that in each iteration we collectively use the neighborhood probabilities estimated in the previous iteration. PWA combined with collective inference is denoted as PWA* in the following.

For WeightedAverage we did not use relaxation labeling but applied a so-called one-shot estimation [5, 7]. We only used neighbors with known classes, i.e., in the first iteration we exploit only nodes from the training set, while in subsequent iterations we also use test nodes that were classified in previous iterations. The procedure stops when all test nodes have been classified or a specific number of iterations is reached. Hence, the tag probabilities are not re-estimated as in relaxation labeling but estimated only once. WA combined with this one-shot estimation procedure is denoted as WA*.

4.2 Ensemble

Ensemble classification may lead to a significant improvement in classification accuracy, since uncorrelated errors made by the individual classifiers are removed by the combination of different classifiers [2, 6]. Furthermore, ensemble classification reduces variance and bias. We decided to combine WA* and PWA* through simple unweighted voting, since voting performs particularly well when the results of the individual classifiers are similar; as we will see in Section 5, WA* and PWA* yielded very similar results on our holdout set. After running the individual classifiers, we receive a probability distribution from each classifier K_l as output and build the arithmetic mean of the tag-assignment probabilities for each test post and tag:

    P(t|x) = (1/L) · Σ_{l=1}^{L} P_l(t|x),   where L := |{K_l | P_l(t|x) ≠ 0}|,   t ∈ T   (6)

4.3 Weighting Schemes

The weight w in eq.
(3) and (4) is an important factor in the estimation of tag probabilities, since it describes the strength of the relation between x and x'. There are several ways to estimate these weights:

1. For two nodes (x, x') ∈ R_res, compute their similarity by representing x and x' as user-tag profile vectors. Each component of the profile vector corresponds to the count of co-occurrences between users and tags:

    φ_user-tag := (|Y ∩ ({user(x)} × R × {t})|)_{t∈T}
2. Similarly to 1, for two nodes (x, x') ∈ R_user, the node similarity is computed by representing x and x' as resource-tag profile vectors:

    φ_res-tag := (|Y ∩ (U × {res(x)} × {t})|)_{t∈T}

3. Similar to 2, but x and x' are represented as resource-user profile vectors, where each component corresponds to the count of co-occurrences between resources and users:

    φ_res-user := (|Y ∩ ({u} × {res(x)} × T)|)_{u∈U}

4. The same as in 1, but the node similarity is computed w.r.t. user-resource profile vectors:

    φ_user-res := (|Y ∩ ({user(x)} × {r} × T)|)_{r∈R}

The edge weight is finally computed by applying the cosine similarity over the desired profile vectors:

    sim(φ(x), φ(x')) := ⟨φ(x), φ(x')⟩ / (‖φ(x)‖ ‖φ(x')‖)   (7)

In our experiments we mainly used scheme 1, since there are no new users in the data and therefore we can always build user-tag profile vectors.

5 Evaluation

All the results presented in this section are reported in terms of F1-score, the official measure used by the graph-based tag recommendation task of the ECML PKDD Discovery Challenge 2009. For a given x ∈ X_test the F1-score is computed as follows:

    F1-score(T̂(x)) = 2 · Recall(T̂(x)) · Precision(T̂(x)) / (Recall(T̂(x)) + Precision(T̂(x)))   (8)

Although the methods presented in Section 4 usually do not have free parameters, we realized that R_user and R_res can have a different impact on the recommendation quality (cf. Figures 2 and 3), and therefore we introduced a parameter to reward the better relation in R_res∪user by a factor c ∈ N: if R_res yields better recommendations than R_user, for example, all edge weights in R_res∪user that refer to R_res are multiplied by c. To find the best value of c we performed a greedy search over the factor range {1, ..., 4} on a holdout set created by randomly selecting 800 posts from the training data. Tables 1 and 2 show the characteristics of the training and test/holdout datasets, respectively.
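Weighting scheme 1 together with eq. (7) can be sketched in a few lines. This is only an illustration, not the authors' implementation: Y is modeled as a set of (user, resource, tag) triples, and the toy data and helper names are made up.

```python
from collections import Counter
from math import sqrt

def user_tag_profile(user, Y):
    """Scheme 1 profile vector: co-occurrence counts between a user and each tag."""
    return Counter(tag for (u, _, tag) in Y if u == user)

def cosine(p, q):
    """Cosine similarity of two sparse count vectors, as in eq. (7)."""
    dot = sum(c * q[k] for k, c in p.items())  # Counter returns 0 for missing keys
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0

# Toy tag assignments Y: (user, resource, tag) triples.
# u1 and u2 use the same tags; u3 uses a disjoint one.
Y = {("u1", "r1", "python"), ("u1", "r2", "ml"),
     ("u2", "r3", "python"), ("u2", "r4", "ml"),
     ("u3", "r5", "cooking")}

w = cosine(user_tag_profile("u1", Y), user_tag_profile("u2", Y))  # edge weight, here 1.0
```

Representing the profiles as sparse counters keeps the computation cheap, since each user co-occurs with only a small fraction of the tag vocabulary.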
Figure 2 presents the results of WA*-Full¹, i.e., WA* performed over R_res∪user, for different c values on the holdout set according to the F1-score. We also plot the results of WA*-Res and WA*-Usr (i.e., WA* on R_res and R_user, respectively). After finding the best c value on the holdout set, we applied WA*-Full, PWA*-Full, and the ensemble (cf. eq. (6)) to the challenge test dataset (see Figure 3). Note that the

¹ Since the results of PWA* and WA* are very similar, we just report on WA*.
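The unweighted voting of eq. (6) amounts to averaging the tag probabilities over the classifiers that assign the tag a non-zero probability. A minimal sketch, with made-up probability tables standing in for the actual WA* and PWA* outputs:

```python
def ensemble_prob(tag, post, classifiers):
    """Eq. (6): arithmetic mean of P_l(t|x) over the classifiers
    that assign tag t a non-zero probability for post x."""
    probs = [P(tag, post) for P in classifiers]
    nonzero = [p for p in probs if p > 0.0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

# Illustrative stand-ins for the two relational classifiers.
wa_star  = lambda t, x: {"web": 0.6, "ml": 0.4}.get(t, 0.0)
pwa_star = lambda t, x: {"web": 0.8}.get(t, 0.0)

p_web = ensemble_prob("web", "post1", [wa_star, pwa_star])  # (0.6 + 0.8) / 2 ≈ 0.7
p_ml  = ensemble_prob("ml", "post1", [wa_star, pwa_star])   # only WA* votes, so ≈ 0.4
```

Averaging only over the non-zero votes prevents a classifier that is simply silent on a tag from dragging that tag's score down.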
dataset     |U|    |R|     |T|     |Y|      |X|
BibSonomy   1,185  22,389  13,276  253,615  64,406

Table 1. Characteristics of the 2-core BibSonomy dataset.

dataset         |U|  |R|  |X_test|
Holdout         292  788  800
Challenge test  136  667  778

Table 2. Characteristics of the holdout set and the challenge test dataset.

results on the challenge test dataset are much lower than those on the holdout set. This may indicate either that our holdout set was not representative of the population or that the challenge test dataset represents a concept drift. We plan to further investigate the reasons underlying this large deviation.

According to the rules of the challenge, the F1-score is measured over the Top-5 recommended tags, even though one is not forced to always recommend 5 tags. This is an important remark because recommending more tags than the true number of tags attached to a particular test post can lower precision. Therefore, we estimate the number of tags to recommend for each test post by taking the average number of tags the test user has assigned to his resources. If a given test user has tagged his resources with 3 tags on average, for example, we recommend the Top-3 tags delivered by our algorithms for all test posts in which this user appears.

6 Conclusions

In this paper we proposed to approach the graph-based tag recommendation task of the ECML PKDD Discovery Challenge 2009 by means of relational classification. We first turned the usual tripartite graph of social tagging systems into a homogeneous post graph, on which simple statistical relational methods can be easily applied. Our methods are training-free, and the prediction runtime depends only on the number of neighbors and tags, which is fast since the training data is sparse. The models we presented also incorporate a semi-supervised component that can possibly benefit from test data.
We presented two relational classification methods, namely WA* and PWA*, and one ensemble based on unweighted voting over the tag probabilities delivered by these methods. We also introduced a parameter to reward the more informative relations, which was learned through a greedy search on a holdout set. In future work we want to investigate new kinds of relations between the posts (e.g., content-based ones), other ensemble techniques, and new methods for automatically learning more informative weights.
[Figure: F1 over Top-n (1–5) on the holdout set for WA*-Full (c = 1, 2, 3.5, 4), WA*-Res, and WA*-Usr.]

Fig. 2. Parameter search of WA*-Full on a holdout set. The best c value found equals 3.5.

7 Acknowledgements

This work is supported by CNPq, an institution of the Brazilian government for scientific and technological development, and by the X-Media project (www.x-media-project.org), sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978. The authors also gratefully acknowledge the partial co-funding of their work through the European Commission FP7 project MyMedia (www.mymediaproject.org) under grant agreement no. 215006. For inquiries please contact info@mymediaproject.org.

References

1. Soumen Chakrabarti, Byron E. Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. In Laura M. Haas and Ashutosh Tiwary, editors, Proceedings of SIGMOD-98, pages 307–318. ACM Press, New York, US, 1998.
2. Thomas G. Dietterich. Machine-learning research: Four current directions. The AI Magazine, 18(4):97–136, 1998.
3. Robert Jaeschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and Gerd Stumme. Tag recommendations in social bookmarking systems. AI Communications, pages 231–247, 2008.
4. Qing Lu and Lise Getoor. Link-based classification using labeled and unlabeled data. In ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
[Figure: F1 over Top-n (1–5) on the challenge test data for WA*-Full, PWA*-Full, Ensemble, WA*-Res, and WA*-Usr.]

Fig. 3. Results on the challenge test dataset.

5. S. Macskassy and F. Provost. A simple relational classifier. In Proceedings of the 2nd Workshop on Multi-Relational Data Mining, KDD 2003, pages 64–76, 2003.
6. Christine Preisach and Lars Schmidt-Thieme. Relational ensemble classification. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 499–509, Washington, DC, USA, 2006. IEEE Computer Society.
7. Christine Preisach and Lars Schmidt-Thieme. Ensembles of relational classifiers. Knowledge and Information Systems, 14:249–272, 2007.
8. Steffen Rendle, Leandro B. Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 727–736. ACM, 2009.