Chinese social media reaction to the MERS-CoV and avian influenza A(H7N9) outbreaks

Background: As internet and social media use have skyrocketed, epidemiologists have begun to use online data such as Google query data and Twitter trends to track the activity levels of influenza and other infectious diseases. In China, Weibo is an extremely popular microblogging site that is equivalent to Twitter. Capitalizing on the wealth of public opinion data contained in posts on Weibo, this study used Weibo as a measure of the Chinese people’s reactions to two different outbreaks: the 2012 Middle East Respiratory Syndrome Coronavirus (MERS-CoV) outbreak, and the 2013 outbreak of human infection of avian influenza A(H7N9) in China. Methods: Keyword searches were performed in Weibo data collected by The University of Hong Kong’s Weiboscope project. Baseline values were determined for each keyword, and reaction values, expressed per million posts, were measured in the days after outbreak information was released to the public. Results: The Chinese people reacted significantly to both outbreaks online. Their social media reaction to the H7N9 influenza outbreak, which occurred in China, was two orders of magnitude stronger than their reaction to the MERS-CoV outbreak, which occurred far from China. Conclusions: These results demonstrate that social media could be a useful measure of public awareness of, and reaction to, disease outbreak information released by health authorities.


Background
Digital epidemiology is a quickly growing field that uses digital (e.g. Internet) information to study the distribution of diseases and other health conditions over time and in different geographical areas [1,2]. Various online data have been harnessed for public health surveillance purposes [3]. For example, search engine query data from Google have been used to estimate weekly influenza activity in a number of countries (Google Flu Trends) [4] and Google query data in French were correlated with French surveillance data for influenza, acute diarrhea and chickenpox [5]. Search engine query data from other search engines, namely Yahoo and Baidu, also correlated well with influenza surveillance data in the US and China, respectively [6,7]. Online news data from HealthMap [8] were used to track the 2010 Haitian cholera outbreak, along with social media data (Twitter) [9].
Social media data could be harnessed to analyze the public's concern about an infectious disease outbreak. Scientists studied Twitter data to monitor influenza activity [10], public concern about H1N1 influenza [11,12], and sentiments about H1N1 influenza vaccination [13]. Algorithms were developed to distinguish tweets that mentioned someone's experiences with influenza from those that expressed worries about it [14]. The 2013 H7N9 influenza outbreak in China also drew the attention of epidemiologists toward the potential ability to monitor disease outbreaks using digital data [15].
Weibo, which translates as "microblog", is the Chinese social media equivalent of Twitter. Like Twitter, Weibo allows users to post and share messages of at most 140 Chinese characters. Users may optionally attach links, images, or videos to their messages. Weibo also allows users to "follow" others' Weibo accounts ("friends") or to repost (or "retweet", in Twitter parlance) another user's posts to one's own readership ("followers"). Despite the government's control over Internet content [16], Weibo still enables Chinese people to publish messages about public incidents or disseminate information during natural disasters [17]. It was described by Western media as a new "free speech platform" [18]. One major Weibo service provider in China, Sina Weibo, claimed to have over 500 million registered users at the end of 2012 [19].
Our study is the first to use Chinese social media (Weibo) data to study the Chinese online community's reaction to the release of official outbreak data from health authorities, namely the outbreaks of MERS-CoV in 2012 [20] and of human infections of avian influenza A(H7N9) in 2013 [21,22]. Our hypothesis was that China's online community would have a stronger reaction to an outbreak in China than one outside China. Our analysis allows health authorities and the media to better understand the online dynamics of health communications in outbreak scenarios.

Data acquisition and sampling
Weibo data were collected by The University of Hong Kong's Weiboscope project. The project's primary aim is to develop a data collection and visualization system for better understanding of Weibo in China. Details of the methodology have been reported elsewhere [16]. In summary, the project generated a list of about 350,000 indexed microbloggers by systematically searching the Sina Weibo user database using the Application Programming Interface (API) functions provided by Sina Weibo. Users with at least 1,000 followers were included. We used high-follower-count samples for two reasons. First, in social media, high-follower-count users are relatively more influential and can often draw disproportionately large public attention [23]. Second, this sampling strategy can minimize the influence of spam accounts, which have been found to be widespread in China's social media [24]. Because of heightened restrictions on Sina Weibo API access, the microbloggers included in the data acquisition since January 2013 were restricted to a selective group of around 50,000 "opinion leaders" with at least 10,000 followers. This group of microbloggers was selected for analysis in the current study to allow a fair comparison between the keyword frequencies in 2012 and 2013.
For each indexed microblogger on the list, all new Weibo messages posted were fetched periodically by using Sina Weibo's user timeline API function. Newly collected messages were cached in the database for future data analysis. The frequency of revisiting the user timeline of the indexed microbloggers varied from every three minutes to once monthly, depending on multiple factors that were chosen to maximize detection of each user's posts [16] while making efficient use of the per-hour API rate limit imposed by Sina Weibo as well as our limited computing resources (see Additional file 2: Appendix for more details).

Keyword detection and data analysis
The raw Weibo data were acquired over the period of January 1, 2012 to June 30, 2013 in Comma-Separated Values (CSV) format and sorted by week [16]. The CSV files contain metadata available for analysis, including the Weibo posts, the creation date and the user ID. The user IDs were "hashed" before being stored, meaning that they were converted into a different string of characters so that the original user ID is not directly displayed in the database. The first line of each file describes the properties of the file and is followed by the Weibo post records.
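The storage step described above can be illustrated with a minimal sketch. The source does not state which hash function, salt, or column names were used, so SHA-256, the salt value, and the header `user_id,created_at,text` are all assumptions made for illustration only.

```python
import csv
import hashlib
import io


def hash_user_id(user_id: str, salt: str = "hypothetical-salt") -> str:
    """Replace a user ID with a one-way digest (SHA-256 is an assumption)."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


# Hypothetical one-record file: a header line followed by Weibo post records.
raw = io.StringIO(
    "user_id,created_at,text\n"
    "12345,2013-04-05 10:00,H7N9 最新消息\n"
)

rows = []
for row in csv.DictReader(raw):
    # Store only the hashed ID so the original ID is not displayed.
    row["user_id"] = hash_user_id(row["user_id"])
    rows.append(row)
```

Because the same input always yields the same digest, posts by one user can still be grouped together without exposing the original ID.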
Keyword detection started with a simple string-searching algorithm: given a keyword for a particular disease, for example, H7N9, the algorithm searched every Weibo post and recorded whether and how many times the keyword appeared in the data file. Table 1 shows the list of keywords that were used in the searching process and were included in the final analysis. Figure 1 shows the workflow for keyword selection and analysis. Figure S1 in Additional file 2: Appendix shows the flowchart of the Keyword Detection Scheme. Please refer to Additional file 2: Appendix for more details.
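A simple string search of this kind might look like the following sketch. It is not the authors' implementation (see their Appendix for that); the function names and the sample posts are hypothetical, and plain case-sensitive substring matching is assumed.

```python
from typing import Iterable, Sequence, Tuple

KEYWORDS = ["H7N9", "禽流感"]  # two of the keywords named in the text


def count_keyword(posts: Iterable[str], keyword: str) -> Tuple[int, int]:
    """Return (number of posts containing the keyword, total occurrences)."""
    posts_with_keyword = 0
    total_occurrences = 0
    for text in posts:
        n = text.count(keyword)  # non-overlapping, case-sensitive matches
        if n:
            posts_with_keyword += 1
            total_occurrences += n
    return posts_with_keyword, total_occurrences


def count_any(posts: Iterable[str], keywords: Sequence[str]) -> int:
    """Number of posts containing at least one keyword (one, the other, or both)."""
    return sum(1 for text in posts if any(k in text for k in keywords))


# Hypothetical sample posts, for illustration only.
sample_posts = [
    "H7N9 禽流感 最新消息",
    "今天天气很好",
    "H7N9 H7N9 确诊病例",
]
```

Counting posts rather than occurrences is what matters for a per-million-posts rate; a post that mentions a keyword twice is still one post.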
We used official press releases of outbreak data by WHO and the Chinese government as "signals" (or the assumed sources of outbreak news) to which the Chinese online community reacted. The Global Alert and Response press release by WHO on September 23, 2012 was used as a "signal" for news on MERS-CoV (then known as "a novel coronavirus") [20], and the March 31, 2013 press release by the Chinese National Health and Family Planning Commission was used as a "signal" for news on human infections of avian influenza A(H7N9) [22]. Statistical analysis was performed using Microsoft Excel, SAS 9.3 Base and R 2.15.3. We first established the baseline for each keyword and then measured the online response (both magnitude and time to peak) compared to the baseline. We normalized the number of posts with a particular keyword on a given day by dividing it by the total number of posts in our sample for that day, and then multiplying it by 1,000,000 to obtain the number of Weibo posts with that keyword per 1 million Weibo posts. The 2012 data (January 3 to December 30) were used to establish the baseline for Weibo posts with the keywords "avian flu" and "H7N9". Likewise, the part of the 2012 data prior to September 23, 2012 was used to establish the baseline for the keywords that were related to MERS-CoV. We chose 2012 as the baseline year, assuming that the underlying Weibo conversations about health-related information were not significantly different between 2012 and 2013. A one-sample t-test (two-sided) was used to measure the statistical significance of the difference between the peaks and their corresponding baseline values.
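The normalization and baseline steps can be sketched as follows. The daily counts are made-up numbers, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def per_million(keyword_posts: int, total_posts: int) -> float:
    """Posts with a keyword per 1,000,000 posts collected on the same day."""
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return keyword_posts / total_posts * 1_000_000


# Hypothetical baseline days: (posts with keyword, all posts in the sample).
baseline_days = [(2, 100_000), (3, 120_000), (1, 80_000), (4, 100_000)]

baseline_rates = [per_million(k, n) for k, n in baseline_days]
baseline_mean = mean(baseline_rates)
baseline_sd = stdev(baseline_rates)  # sample s.d.; an assumption
```

Dividing by each day's total makes days with different collection volumes comparable, which matters here because the number of posts fetched per day depended on API rate limits.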
A new website dedicated to this project, named Weibo-Health [25], was created to share our updated results with public health researchers and practitioners.

Human infections of avian influenza A(H7N9), March -April 2013
The reaction to the news of human infection of avian influenza A(H7N9) was very strong in the Chinese online community. Among the users with ≥10,000 followers, posts containing the keywords "禽流感" (Qinliugan in pinyin, the Mandarin Chinese phonetic script; avian flu) or "H7N9" or both peaked at 33,904 per million Weibo posts (t = −20,836; p < 0.001) on April 5, 2013, five days after the Chinese government press release on March 31, 2013. This peak was 1,093.6 standard deviations (s.d.) above the mean of the baseline value in 2012 (mean, 24.19; s.d., 30.98) (Table 2). After the peak, there was a quick decline in Weibo discussion on this topic. The number of Weibo posts containing "H7N9" and/or "禽流感" (avian flu) declined to 7,469 per million on April 12 (a decline of 3,638.7 per million per day from April 5 to 12, assuming a linear trend, R² = 0.9433). On April 13, the Chinese National Health and Family Planning Commission announced that there was an H7N9-positive case in Beijing. The H7N9 avian flu-related posts doubled (15,864 per million; t = −9,741; p < 0.001). After this second peak, attention waned and the number of posts on H7N9 avian flu declined at a rate of 1,873.6 per million per day to 1,883 per million on April 20, 2013 (Figure 2). If only the keyword "H7N9" was used, the signal was even more sensitive. Given its very low baseline in 2012 (mean, 0.027 per million posts; s.d., 0.265), its peak of 8,803 per million posts (t = −632,933; p < 0.001) was 33,220 s.d. above the baseline mean.
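The reported statistics can be checked with a short calculation. This sketch assumes that the one-sample t-test compared the 2012 daily baseline values against the peak value, and that the baseline contained one value per day from January 3 to December 30, 2012 (363 days). Both points are inferences from the reported numbers rather than statements in the text.

```python
from math import sqrt

N_DAYS = 363  # assumption: one baseline value per day, Jan 3 to Dec 30, 2012


def sd_distance(peak: float, base_mean: float, base_sd: float) -> float:
    """How many baseline standard deviations the peak lies above the mean."""
    return (peak - base_mean) / base_sd


def one_sample_t(peak: float, base_mean: float, base_sd: float, n: int) -> float:
    """t statistic for testing the baseline mean against the peak value."""
    return (base_mean - peak) / (base_sd / sqrt(n))


# Values reported in the text for "禽流感" and/or "H7N9".
z_first = sd_distance(33_904, 24.19, 30.98)
t_first = one_sample_t(33_904, 24.19, 30.98, N_DAYS)
t_second = one_sample_t(15_864, 24.19, 30.98, N_DAYS)
```

Under these assumptions the results come out close to the reported 1,093.6 s.d., t = −20,836 and t = −9,741, which suggests this reading of the test is plausible. Small differences are expected because the published mean and s.d. are rounded.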
Baseline and peak values are presented as number per million Weibo posts that contain keywords for avian flu and H7N9 in our samples of about 50,000 users with ≥10,000 followers, in 2012 and 2013.
In our pilot studies, we had also tried the keywords "流行性感冒" (liúxíngxìng gǎnmào; influenza) and "流感" (liúgǎn; short form of liúxíngxìng gǎnmào; English equivalent: flu). For the former, few posts per day contained this formal technical term, and so we decided to drop it from further analysis (data not shown). For the latter, since the keyword "禽流感" (avian flu) is more specific and itself contains the term "流感" (flu), we decided to use "禽流感" (avian flu) in our analysis instead of "流感" (flu) (data not shown).

MERS-CoV, September 2012
The Chinese online community also reacted to the news of a novel coronavirus, now known as MERS-CoV, identified in a patient in the UK, but in a less pronounced way (Figure 3; Table 3). Nine different keywords that were related to SARS were tested, and three of them were found both sensitive and specific enough to reflect the Chinese online community's reaction to this novel coronavirus (Tables 1 and 4).

SARS-related posts during the H7N9 outbreak, 2013
We also studied how the traffic of Weibo posts carrying SARS-related keywords reacted to the H7N9 outbreak. Beginning on March 31, 2013, Weibo posts with keywords "非典" (Feidian, short for atypical pneumonia) or the English acronym SARS rocketed, and reached a peak on April 3, 2013. Likewise, Weibo posts with keywords "沙士" (SARS) or "冠状病毒" (Coronavirus) increased, and reached a peak on April 5, 2013 (Figure 4).

*Baseline data were based on data from January 3 to December 30, 2012. We did not use the data for Jan 1-2, 2012 because of a peak for "禽流感" (Qinliugan, avian flu) on Jan 2, 2012 that was considered as an outlier. That peak was a result of the news released on that day about a patient who died of highly pathogenic avian influenza in Shenzhen, Guangdong Province, China, on December 31, 2011 [26]. Unit: per million posts. †p < 0.001. s.d., standard deviation.

Discussion
The Chinese online community reacted rapidly to news about infectious disease outbreaks both within and beyond China, as shown in our study. This paper is the first to document this online response using Weibo and to compare the reaction to the MERS-CoV outbreak in 2012 with the reaction to the human infections of avian influenza A(H7N9) in 2013. We found that the reaction to the H7N9 outbreak in 2013 was about two orders of magnitude stronger than the one to the MERS-CoV outbreak in 2012. The results confirmed our hypothesis that the Chinese online community reacted more strongly to an outbreak that was in China than to one outside China. The reaction in the Chinese online community exploded within the first five days of the first report of three human cases (two in Shanghai and one in Anhui) of avian influenza A(H7N9) [22]. Within these five days, more cases were identified in Shanghai and in the two neighboring provinces of Jiangsu and Zhejiang. Attention then declined rapidly until April 13, 2013, when the Chinese government announced that a child was found H7N9-positive in Beijing, the capital of China. This piece of news triggered a second explosion of online discussion via Weibo on that day. Attention then declined rapidly again (Figure 2).
Keywords that were sensitive and specific to the signals were identified. Keywords like "H7N9" and "冠状病毒" (Coronavirus) were highly sensitive and specific. Keywords like "禽流感" (avian flu) and SARS, while less specific, remained sensitive enough to detect the signals.
While the keyword "非典" (Feidian, short for atypical pneumonia) was not sensitive to the news of MERS-CoV on September 23, 2012 (Figure 3b), we would like to highlight its significance in the lexicon of the current Chinese online community as one of its most frequently used terms for SARS in online discussion. As a keyword, "非典" (Feidian) was sensitive to rumors of SARS in the city of Baoding, China, on February 19, 2012. The rumors were later rejected by the Chinese authorities on February 26, 2012, when the possibility of SARS infection among feverish hospitalized patients in a hospital in Baoding was excluded (Figure 3b) [27]. This keyword, however, also led to a "false positive". On July 21, 2012, there was a severe flood in Beijing, resulting in dozens of deaths. The Chinese online community complained about the Beijing municipal government's disaster management. The government reacted by holding a press conference on July 24, saying that they had learned the lessons of SARS in 2003 and did not conceal the true death toll [28]. This incident also led to a peak in posts with the keyword "非典" (Feidian) (Figure 3b). On January 30, 2013, in a telephone interview with China Central Television, Prof. ZHONG Nan-Shan, a well-respected medical researcher with a reputation as a leader in the fight against SARS in China in 2003, mentioned that air pollution in China was more dreadful than "非典" (Feidian) because no one could escape from it [29]. His quote from the interview also led to a peak of Weibo posts with the keyword "非典" (Feidian) (Figure 4).
The observation that Weibo posts with the keywords "非典" (Feidian) and SARS rose to 3131. 9   the MERS-CoV outbreak. These results again confirmed our hypothesis that the Chinese online community reacted more strongly to an outbreak that happened in China than one outside China. Drawing on the social amplification of risk model [31], public risk perception is shaped by a process of interplays between psychological, cultural, social, and institutional factors that may result in amplifying or attenuating the public attention to risk. Mass communication is among the list of factors. Public health officials have long recognised the role of the mass media in disseminating risk and emergency information before, during, and after a catastrophe [32]. The World Health Organization establishes guidelines for "effective media communication", through which the authorities are able to disseminate information to the public [33]. Communication during crisis was traditionally understood to be a one-way and top-down process, in which the public are assumed to be "deficient" in knowledge, while the scientists, public health experts, and emergency managers, are "sufficient" [34]. But this presumption was profoundly challenged by the emergence of social media. For instance, Leung and Nicoll argued that the 2009 H1N1 pandemic was the first pandemic in which social media "challenged conventional public health communication" [35]. In China, online messages were published ahead of the official statement in the 2008 Sichuan Earthquake [36]. Social media enabled people under crisis to share information and experience and to seek message credibility and confirmation via multiple media platforms and social networks [34]. Our study demonstrated that official data released by health authorities, whether in Beijing or Geneva, received strong reactions in the Chinese online community. 
With such knowledge, social media should be incorporated in the best practices for risk and crisis communication [37]. Social media data can also provide health authorities, researchers and the media a quantifiable measure of public attention towards a particular disease outbreak [11].
Social media, in addition to being a tool to release and track official outbreak information [38], offers a new opportunity for public health practitioners to understand social and behavioral barriers to infection control, to identify misinformation and emerging rumors [39], and to better understand the sentiments and risk perception associated with outbreaks and preventive and control measures [13]. In turn, these will help facilitate better health communication between public health agencies and the society at large, as well as among citizens themselves.
With our Weibo data, there are at least two potential directions for future research. First, we can study how information about a given disease spreads across the social network as represented by Weibo. Kwak et al. [40] identified a non-power-law follower distribution, a short effective diameter and low reciprocity in the Twitter follower-following topology, which was different from most human social networks. Over 85% of the top trending topics on Twitter are headline news or persistent news. Once retweeted, a tweet would reach an average of 1,000 users regardless of the number of followers of the original tweet's author [40]. However, a previous study has found that Chinese Weibo exhibits a distinct pattern of information dissemination [41]. For example, the network connections between Chinese microbloggers are markedly more hierarchical than those between Twitter users, i.e. Chinese users tend to follow those at a higher or similar social level [42]; the majority of Weibo posts are in fact reposts that originate from a small percentage of original messages [24]. Further research could shed light on how information sharing over Weibo affects the offline human response to diseases.
Second, content analysis of Weibo posts will enable us to analyze human attitudes or reactions toward health hazards [43]. The research can be extended to investigate anxiety or fear towards the infectious diseases themselves and towards the outbreak information transmitted via the Weibo social network. Similar research on influenza has been conducted using Twitter data [12,14]. Data mining methods, such as topic models [44], may be attempted.
There are a few limitations to our study. The sampled microbloggers in our study were limited to those who have at least 10,000 followers. Although these microbloggers are more likely to be authentic users rather than spam accounts, the samples constitute less than 0.1% of the overall microblogger population [23]. However, a random sampling study finds that Weibo content contribution is unevenly distributed among users [23]. Over half of Sina Weibo subscribers have never posted, whereas about 5% of Weibo users contributed more than 80% of the original posts [23]. Hence, the sampled microbloggers in our study were the most influential microbloggers who contributed a majority of Weibo posts and drew the most attention in terms of the number of reposts and comments [23]. Therefore, for the purpose of this study, this group of high-follower-count microbloggers should be deemed fairly representative of the public attention towards the MERS-CoV and H7N9 outbreaks. But the reader should note that the findings of our study might not be generalizable to samples collected by other sampling strategies. The operational parameters of sampling were not determined to optimize collection of data specific to a given disease. Future research is warranted to reconfirm the research findings by using a research design that is customized for specific epidemiologic research purposes.

Conclusion
This is the first paper that documents the online Chinese community's reaction to the MERS-CoV outbreak in the Middle East and Europe in 2012, as well as the reaction to the H7N9 outbreak in China in 2013. The reaction to H7N9 was two orders of magnitude stronger than the reaction to MERS-CoV. Similar to the public reaction on the street, the online community's reaction is stronger when the disease outbreak happens nearby. Our study demonstrates the usefulness of using social media to measure the public reaction to disease outbreak information released by health authorities.