A Comparative Study of Users’ Microblogging Behavior on Sina Weibo and Twitter

Qi Gao¹, Fabian Abel¹, Geert-Jan Houben¹, Yong Yu²

¹ Web Information Systems, Delft University of Technology
² APEX Data & Knowledge Management Lab, Shanghai Jiaotong University
{q.gao,f.abel,g.j.p.m.houben}@tudelft.nl, yyu@apex.stju.edu.cn

Abstract. In this article, we analyze and compare user behavior on two different microblogging platforms: (1) Sina Weibo which is the most popular microblogging service in China and (2) Twitter. Such a comparison has not been done before at this scale and is therefore essential for understanding user behavior on microblogging services. In our study, we analyze more than 40 million microblogging activities and investigate microblogging behavior from different angles. We (i) analyze how people access microblogs and (ii) compare the writing style of Sina Weibo and Twitter users by analyzing textual features of microposts. Based on semantics and sentiments that our user modeling framework extracts from English and Chinese posts, we study and compare (iii) the topics and (iv) sentiment polarities of posts on Sina Weibo and Twitter. Furthermore, (v) we investigate the temporal dynamics of the microblogging behavior such as the drift of user interests over time. Our results reveal significant differences in the microblogging behavior on Sina Weibo and Twitter and deliver valuable insights for multilingual and culture-aware user modeling based on microblogging data. We also explore the correlation between some of these differences and cultural models from social science research.

Key words: user modeling, microblogging, comparative usage analysis

1 Introduction

Microblogging services such as Twitter allow people to publish, share and discuss short messages on the Web. Nowadays, Twitter users publish more than 200 million posts, so-called tweets, per day³. In China, Sina Weibo⁴ is leading the microblogging market since Twitter is unavailable. Both Sina Weibo and Twitter basically feature the same functionality. For example, both services limit the length of microposts to 140 characters and allow users to organize themselves in a follower-followee network, where people follow the message updates of other users (unidirectional relationship). Sina Weibo and Twitter provide (real-time) access to the microposts via APIs and therefore allow for investigating and analyzing interesting applications and functionality such as event detection [1, 2] or recommending Web sites [3].

³ http://blog.twitter.com/2011/06/200-million-tweets-per-day.html
⁴ http://www.weibo.com/

By analyzing individual microblogging activities, it is possible to learn about the characteristics, preferences and concerns of users. In previous work, we therefore introduced a semantic user modeling framework for inferring user interests
from Twitter activities and proved its efficiency in a news recommendation system [4]. In this paper, we extend this Twitter-based user modeling framework to also allow for sentiment analysis and user modeling based on Chinese microblog posts. We conduct, to the best of our knowledge, the first comparative study of the microblogging behavior on Sina Weibo and Twitter and relate our findings to theories and models from social science. The main contributions of our work can be summarized as follows.

– We extend our framework for user modeling based on usage data from microblogging services with functionality for sentiment analysis and semantic enrichment of Chinese microblog posts.
– We conduct intensive analyses based on more than 40 million microblog posts and compare the microblogging behavior on Sina Weibo and Twitter regarding five dimensions: (i) access behavior, (ii) syntactic content analysis, (iii) semantic content analysis, (iv) sentiment analysis, (v) temporal behavior.
– We relate our findings to theories about cultural stereotypes developed in social sciences and therefore explain how our insights can allow for culture-aware user modeling based on microblogging streams.

2 Related Work

Various types of research efforts have been conducted on Twitter data recently, ranging from information propagation [5, 6] to applications such as Twitter-based early warning systems [1]. Furthermore, user modeling and personalization research has started to study Twitter. Chen et al. investigate recommender systems on Twitter that consider social network features or the popularity of items in the Twitter network [3]. In previous work, we developed a Twitter-based user modeling framework for inferring user interests [4] and studied different applications that exploit the framework for personalization [7].

Research on cultural characteristics of user behavior on the Social Web has also been initiated. For example, Mandl [8] investigates how blog pages from China, especially the communication patterns between bloggers and commentators, differ from those from Germany. He correlates his findings to cultural dimensions proposed by Hofstede et al. [9]. Chen et al. analyze the tagging behavior of two user groups from two popular social music sites in China and Europe respectively [10] and observe differences between the two cultural groups, e.g. Chinese users have a smaller tendency to apply subjective tags but prefer the usage of factual tags. So far, there exists little knowledge about the differences and commonalities regarding the microblogging behavior of users from different cultural groups. Yu et al. compare popular trending topics on Sina Weibo with those on Twitter [11], but only compare global trends and do not study individual user behavior. In this paper, we close this gap: based on our extended user modeling framework, we conduct a large-scale analysis and comparison of users’ microblogging behavior on Sina Weibo and Twitter.

3 Research Methodology and Evaluation Platform

In this section, we detail our research questions and present our enhanced user modeling environment that allows us to investigate the research questions.
3.1 Research Questions

Our research goal is to analyze and compare user behavior on Sina Weibo and Twitter to gain insights for user modeling on microblogging streams. Therefore, we investigate (1) how people access microblogging services, (2) the content, (3) semantics and (4) sentiment of microblog posts and (5) the temporal behavior of users’ microblogging activities.

Analysis of Access Behavior. Microblogging services such as Sina Weibo and Twitter can be accessed via different client applications from both mobile devices and desktop devices. User behavior that can be observed on a microblogging service may be influenced by the way in which a user accesses the service. We thus first study the following research questions:

– RQ1: How do people access Sina Weibo and Twitter respectively to publish microposts?
– RQ2: To what extent do individual users access a microblogging service from different client applications?

Syntactic Content Analysis. Both Sina Weibo and Twitter limit the length of posts to 140 characters. This limitation impacts the writing style of microblog users and may result in characteristic usage patterns that we would like to compare between Sina Weibo (Chinese) and Twitter (English):

– RQ3: How does the usage of hashtags, URLs and other syntactic patterns (e.g. punctuation) differ between Sina Weibo and Twitter for both (i) the entire user population and (ii) individual users?
– RQ4: To what extent is the usage of hashtags and URLs influenced by the users’ access behavior?

Semantic Content Analysis. To better understand the meaning of the messages that users post on microblogging services, we analyze the semantics and investigate the following aspects:

– RQ5: What kind of topics and concepts do users mention and discuss on Sina Weibo and Twitter respectively?
– RQ6: To what extent do the types of concepts that users mention in their posts depend on the client applications via which they publish their posts?

Sentiment Analysis. Microblogs allow users to express and discuss their opinions about topics that people are concerned with. We therefore analyze the sentiment of Chinese and English messages and study the following questions:

– RQ7: To what extent do users reveal their sentiment on Sina Weibo and Twitter respectively?
– RQ8: To what extent does the sentiment correlate with the type of topics and concepts that people mention in their Sina Weibo and Twitter messages?

Analysis of Temporal Behavior. The users’ microblogging behavior may change over time and may, for example, differ between working hours and leisure time. Therefore, we investigate the following research questions:

– RQ9: How does the posting behavior of users, particularly regarding the type of topics that the users mention, change between weekdays and weekends on Sina Weibo and Twitter?
– RQ10: How do individual user interests change over time in the two microblogging services?
3.2 Evaluation Platform

Extended User Modeling Framework for Microblogging Services. In previous work, we developed a Twitter-based user modeling framework for inferring user interests from tweets [4, 7]. Our framework monitors Twitter activities of a user and enriches the semantics of her Twitter messages by extracting meaningful concepts and topics (e.g. named entities) from the messages’ content and by linking posts to external relevant Web resources such as news articles. Different weighting schemes such as time-sensitive or term-frequency-based functions allow for estimating to what extent a user might be interested in a given concept at a particular point in time. The generated user profiles can therefore be considered as a set of weighted semantic concepts.

In this paper, we extend our framework with three core features: (1) functionality for monitoring microblogging activities and collecting microposts published on Sina Weibo, (2) named entity recognition for Chinese microposts and (3) sentiment analysis for both Chinese and English microposts. We use ICTCLAS⁵ as part-of-speech tagger for Chinese text and extract named entities such as locations, organizations and persons from Chinese posts. We implemented a baseline approach to analyze the sentiment of Chinese and English microposts as proposed in [12]. Given these additional features, we are able to apply the same user modeling techniques on both microblogging services, Sina Weibo and Twitter, and can therefore analyze and compare user characteristics and behavior on the Asian and Western microblogging platforms.

Data Collection. Given the framework, we collected microposts over a period of more than two months via the Sina Weibo Open API and the Twitter Streaming API respectively. For Twitter, we started from a seed set of 56 Twitter users and then gradually extended this set in a snowball manner. Overall, we collected more than 24 million tweets published by more than 1 million users. For Sina Weibo, since it does not provide functionality similar to Twitter’s Streaming API, we monitored the most recent public microposts and finally collected more than 22 million microposts published by more than 6 million users. Twitter posts and Sina Weibo posts were then processed by our framework in order to enrich the semantics of the posts (e.g. entity extraction, sentiment analysis). To better understand the behavior on the level of individual users, we extracted a sample of 1200 active Twitter users (who post in English) and 2616 active Sina Weibo users. According to their Twitter profiles, the majority of the Twitter users (more than 80%) are from the United States, while the great majority of the Sina Weibo users (more than 95%) are located in China. For a detailed description of the dataset characteristics we refer the reader to [4] and [2] respectively.

4 Analysis of User Behavior on Sina Weibo and Twitter

Based on the more than 40 million posts that we collected from Sina Weibo and Twitter and processed with our user modeling framework, we study the users’ behavior on the two platforms and answer the research questions regarding the five dimensions ranging from access behavior to temporal behavior.
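To make the notion of a user profile as a set of weighted semantic concepts (Section 3.2) more concrete, the following minimal Python sketch contrasts a term-frequency-based weighting with a time-sensitive one. It is an illustration under stated assumptions and not the implementation of the framework: the function name `build_profile`, the exponential half-life decay, the `half_life_days` parameter and the normalization to a sum of 1 are all hypothetical choices, since the text above does not specify the concrete weighting functions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_profile(mentions, now=None, half_life_days=None):
    """Return a {concept: weight} profile whose weights sum to 1.

    mentions: iterable of (concept, timestamp) pairs, one pair per
        occurrence of a concept in a user's posts.
    half_life_days: None gives plain term-frequency weighting; a number
        gives an ASSUMED time-sensitive weighting in which a mention that
        is half_life_days old counts half as much as a mention made now.
    """
    mentions = list(mentions)
    if not mentions:
        return {}
    if now is None:
        # Default reference time: the most recent mention.
        now = max(ts for _, ts in mentions)

    weights = defaultdict(float)
    for concept, ts in mentions:
        if half_life_days is None:
            weights[concept] += 1.0  # term frequency: each mention counts 1
        else:
            age_days = (now - ts).total_seconds() / 86400.0
            weights[concept] += 0.5 ** (age_days / half_life_days)

    total = sum(weights.values())
    return {concept: weight / total for concept, weight in weights.items()}


# Toy data (invented for illustration): two old mentions, one recent mention.
now = datetime(2012, 1, 31)
mentions = [
    ("Delft", now - timedelta(days=30)),
    ("Delft", now - timedelta(days=30)),
    ("Shanghai", now),
]

tf_profile = build_profile(mentions)                              # Delft 2/3, Shanghai 1/3
ts_profile = build_profile(mentions, now=now, half_life_days=30)  # 0.5 each
```

With term-frequency weighting the older but more frequent concept dominates the profile, whereas the assumed decay lets a single recent mention weigh as much as two mentions from a month earlier. This is one way in which a time-sensitive scheme can capture the drift of user interests over time.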
type of access                                   fraction of posts (%)
                                                 Weibo      Twitter
posted on a Web or desktop application           54.9       66.2
posted on a mobile application                   45.1       33.8
primary product of microblogging activity        90.6       96.7
byproduct of an activity on another platform      9.4        3.3

Table 1. Number of posts published via different categories of access clients

[Fig. 1. Number of distinct access clients for individual users. Axes: users (0% to 100%) and number of distinct clients (ticks at 1, 10, 100); series: Weibo, Twitter.]

4.1 Analysis of Access Behavior

Results. We first analyzed the most popular client applications that people use to publish posts on Sina Weibo and Twitter. On both platforms, the Web interface is the most popular way to access the microblogging services: 43.1% of the posts are published via the Web on Sina Weibo and 38.5% on Twitter. Other popular clients on Sina Weibo are mainly designed for mobile devices such as the iPhone (7.6%) and Nokia devices (9.4%). Among the most popular Twitter clients are many desktop-based applications such as TweetDeck, via which 10.7% of the posts are published. Moreover, we observe on both platforms that people publish posts that are rather byproducts of activities the users perform on other platforms. For example, 1.3% of the posts in our Twitter dataset are published via Twitterfeed, an application that allows for publishing announcements on a user’s Twitter timeline whenever she publishes a new blog article.

In Table 1, we give an overview of the types of client applications that people use to publish microblog posts. To this end, we manually categorized the 50 most popular clients, which generate more than 90% of the posts on both microblogging services. We observe that the fraction of posts that are published via mobile devices is significantly higher on Sina Weibo (45.1%) in comparison to Twitter (33.8%). Furthermore, we discover that the fraction of posts which are rather byproducts of other Web activities of the users (hence where the intent of the actual user activity was not targeted towards Sina Weibo or Twitter) is almost three times higher on Sina Weibo (9.4%) than on Twitter (3.3%).

In Fig. 1, we plot for each of the sample users the number of distinct applications which they utilize for publishing microposts. We see that on Twitter more than 95% of the people use more than one client application while on Sina Weibo around 65% of the users switch between different clients.

Findings. From the results above, we conclude the analysis of access behavior with two main findings, referring to the research questions RQ1 and RQ2:

– F1: On both platforms, the main way of accessing the microblogging services is via the official Web interfaces or desktop-based applications. Chinese users seem to differ from the English-speaking Twitter users regarding two core aspects: (i) they use mobile applications more extensively and (ii) publish microposts more often as a byproduct of their other Social Web activities.

⁵ http://ictclas.org/