E-business course reading: collaborative tagging as a tripartite network.pdf

I. INTRODUCTION

Recently, new kinds of websites have been dedicated to sharing people's habits and tastes, for example their preferences in music, scientific articles, movies or websites. These sites allow members to upload from their own computer a library that characterises their habits in the corresponding topic (an iTunes music library, for instance) and then to create a web page containing this list of items. Additionally, the website lets users discover new content by comparing their tastes with those of other users, thereby helping them find new music, books or websites that should (statistically) fit their profile. This method rests on feedback between the users and a central server, and is usually called collaborative filtering. The emergence of these collaborative websites answers the need of Internet users to retrieve useful and coherent information from the millions of pages and data that form the Web.

Let us stress that the use of statistical methods to make coherent suggestions from a user profile is common on commercial websites, e.g. Amazon. The main particularities of collaborative systems are: (i) their non-commercial purpose, even though the frontier with commercial companies is increasingly vague (see for instance the acquisition of del.icio.us by Yahoo in November 2005); (ii) their transparency, namely these sites are relatively open and do not hide the profiles of each user, contrary to Amazon for instance. From a scientific point of view, this transparency opens perspectives for large-scale experiments (involving thousands of people) on taste formation, quantitative sociology, musicology, etc. The available data also suggest alternative methods for large-scale classifications of music, science or the Internet. Those sub-divisions should be based on the intrinsic structure of the audience of the items.
In parallel with this sharing and statistical comparison of content, collaborative websites usually offer tagging possibilities. This process, called "folksonomy" (short for "folk taxonomy"), means that the websites allow users to publicly tag their shared content, the key point being that their tags are accessible not only to themselves but also to the whole ensemble of users. For instance, in the case of music sharing, a group like The Beatles is described in different ways, e.g. pop, 60s, britpop..., that depend on the different backgrounds, tastes, music knowledge or network of acquaintances of the users.

Both methods, i.e. collaborative filtering (CF) and collaborative tagging (CT), lead to complex networks from which structures have to be extracted in order to deliver useful information to users. In this work, we discuss methods that lead to the identification of a priori unknown collective behaviours, and to a hierarchical representation of the network structure. To do so, we focus on empirical data extracted from websites specialised in music, e.g. audioscrobbler.com and musicmobs.com, and in scientific articles, i.e. citeulike.com. We show that the collaborative tagging process leads to a tripartite network, i.e. a network with three different kinds of nodes (the users, the items and the tags) in which each link relates three nodes of different kinds. The next step of the analysis consists in projecting the tripartite network onto lower-order networks. To do so, we evaluate the correlations between the items/tags, depending on their use. Filtering methods [1], i.e. percolation idea-based (PIB) methods, make it possible to uncover the collective behaviours. The resulting hierarchical structure of the network leads to a statistical definition of the notion of genre, and draws a direct link between collaborative filtering and taxonomy. Finally, we discuss methods for measuring the diversity of people [2].

II. CLASSIFICATION METHODS

In this section, we give a short review of the usual strategies that can be used to classify and organise content [3], as well as their main differences.

1. Taxonomies

Taxonomies include the Dewey Decimal Classification for libraries, computer directory systems and the Linnaean system for classifying living things. By construction, a taxonomy is hierarchical and exclusive. In these systems, each item is associated with one category, which belongs to a more general category, each category belonging to a more general one until the root of the tree is reached. For instance, the music artists Charlie Parker and Charles Mingus can reasonably be classified in the categories Bebop and Free Jazz, both of them belonging to the category Jazz.
By construction, taxonomies automatically organise content into hierarchical structures that allow users to search with different levels of specificity.
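The hierarchical and exclusive character of a taxonomy can be illustrated with a minimal sketch. The category names Bebop, Free Jazz and Jazz come from the example above; the root category "Music", the assignment of each artist to a single category, and the helper names `path_to_root` and `items_under` are assumptions made only for this illustration.

```python
# Hypothetical toy taxonomy: every category has exactly one parent
# (None marks the root). "Music" is an assumed root, not from the text.
PARENT = {
    "Music": None,
    "Jazz": "Music",
    "Bebop": "Jazz",
    "Free Jazz": "Jazz",
}

# Exclusive classification: each item sits in exactly one category.
CATEGORY_OF = {
    "Charlie Parker": "Bebop",
    "Charles Mingus": "Free Jazz",
}


def path_to_root(category):
    """Return the chain of categories from `category` up to the root."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path


def items_under(category):
    """Return all items filed under `category` or one of its descendants."""
    return sorted(
        item
        for item, cat in CATEGORY_OF.items()
        if category in path_to_root(cat)
    )


print(path_to_root("Bebop"))   # most specific category first, root last
print(items_under("Jazz"))     # broad search: both artists
print(items_under("Bebop"))    # specific search: one artist
```

Searching at the level Jazz returns both artists, while searching at the level Bebop returns only one, which is what searching "with different levels of specificity" amounts to in a tree.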

2. Tagging systems

Tagging systems are non-hierarchical and non-exclusive. They consist in associating with each item a list of keywords, all the keywords being considered at the same level. Tagging systems are especially suited to content that is not easily divided into exclusive categories, and to situations where no hierarchical difference exists between categories. Let us take the example of music. In addition to the usual genre classification of a music group, a listener may consider additional terms describing its mood, e.g. Sad, Nervous, Happy... A taxonomical description requires a hierarchical organisation, i.e. a music group is placed in a directory Jazz/Sad or in a directory Sad/Jazz. When the relative importance of each characteristic is not clear, such a hierarchy is not adequate, and may make it difficult to retrieve all relevant items. For instance, a music group placed in Sad/Jazz is not found in the hierarchically higher category Jazz.

3. Collective description

Usually, the set of available tags is chosen by an authority, such as a librarian or an editor, while the tags are attributed by the same authority or by the creators of the item, e.g. the authors of a scientific paper. It is only recently that websites have led to the emergence of collaborative tagging, also called folksonomy. Contrary to the usual tagging classification systems, folksonomy is: (i) anarchic: the choice of keywords is not restrained by any rigid framework (contrary to PACS classifications in physics literature, for instance), but may include any word composed of letters; (ii) democratic: the tagging is performed on an equal footing by a large ensemble of persons, and not by a central one. In itself, folksonomy is especially suitable for systems where no authority is present to organise the classifications. That is one of the reasons why it is gaining popularity on the web.
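As a minimal sketch of collaborative tagging, each annotation can be stored as a (user, item, tag) triple, which is one way to picture a link joining three nodes of different kinds. The user names, the annotations and the helper names `tag_spectrum` and `items_with_tag` are hypothetical, and the simple counting used here is an assumption for illustration, not the correlation measure of the paper.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: one (user, item, tag) triple per tagging act.
TRIPLES = [
    ("alice", "The Beatles", "pop"),
    ("alice", "The Beatles", "60s"),
    ("bob", "The Beatles", "britpop"),
    ("bob", "The Beatles", "pop"),
    ("carol", "Charles Mingus", "jazz"),
    ("carol", "Charles Mingus", "sad"),
]


def tag_spectrum(triples):
    """Drop the users: count how often each tag was applied to each item."""
    spectrum = defaultdict(Counter)
    for _user, item, tag in triples:
        spectrum[item][tag] += 1
    return spectrum


def items_with_tag(triples, tag):
    """Flat, non-exclusive retrieval: items carrying `tag` at least once."""
    return sorted({item for _user, item, t in triples if t == tag})


spectrum = tag_spectrum(TRIPLES)
print(dict(spectrum["The Beatles"]))   # tags given by several users
print(items_with_tag(TRIPLES, "jazz"))
print(items_with_tag(TRIPLES, "sad"))
```

Because all tags sit at the same level, the item tagged both "jazz" and "sad" is retrieved under either keyword, unlike an item filed in a single directory such as Sad/Jazz.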
The democratic aspect of the method also leads to a very rich description of each item. Namely, items that are tagged by many persons are usually characterised by a spectrum of tags, revealing the diverse levels of description associated with them. Nonetheless, the richness of the method may also be a weakness in practice, in order to retrieve useful