Transmissibility of COVID-19 in 11 major cities in China and its association with temperature and humidity in Beijing, Shanghai, Guangzhou, and Chengdu

Background The new coronavirus disease COVID-19 began in December 2019 and has spread rapidly by human-to-human transmission. This study evaluated the transmissibility of the infectious disease and analyzed its association with temperature and humidity to study the propagation pattern of COVID-19. Methods In this study, we revised the reported data in Wuhan based on several assumptions to estimate the actual number of confirmed cases considering that perhaps not all cases could be detected and reported in the complex situation there. Then we used the equation derived from the Susceptible-Exposed-Infectious-Recovered (SEIR) model to calculate R0 from January 24, 2020 to February 13, 2020 in 11 major cities in China for comparison. With the calculation results, we conducted correlation analysis and regression analysis between R0 and temperature and humidity for four major cities in China to see the association between the transmissibility of COVID-19 and the weather variables. Results It was estimated that the cumulative number of confirmed cases had exceeded 45 000 by February 13, 2020 in Wuhan. The average R0 in Wuhan was 2.7, significantly higher than those in other cities ranging from 1.8 to 2.4. The inflection points in the cities outside Hubei Province were between January 30, 2020 and February 3, 2020, while there had not been an obvious downward trend of R0 in Wuhan. R0 negatively correlated with both temperature and humidity, which was significant at the 0.01 level. Conclusions The transmissibility of COVID-19 was strong and importance should be attached to the intervention of its transmission especially in Wuhan. According to the correlation between R0 and weather, the spread of disease will be suppressed as the weather warms.


Background
On December 8, 2019, the first case of unexplained pneumonia was officially reported in Wuhan, the capital of Hubei Province in China [1]. There have been reports of the new coronavirus disease (coronavirus disease 2019, COVID-19, named by the World Health Organization on February 11, 2020) since December 2019 [1,2]. According to the National Health Commission of the People's Republic of China, the number of confirmed cases in China had reached 63 851 by February 13, 2020, including 1380 deaths. On the same day, Hubei Province alone had 51 986 confirmed cases and 1318 deaths, accounting for 81.4% and 95.5% of the national totals respectively. Of these, 35 991 confirmed cases and 1016 deaths were in Wuhan, accounting for 69.2% and 77.1% of the totals in Hubei Province respectively [3]. The cumulative number of confirmed cases kept rising, indicating the strong transmissibility of COVID-19, especially in Wuhan. It is therefore important to adopt reasonable indicators to assess the transmissibility of the disease, on the basis of which effective intervention and control measures can be put forward [4,5].
The basic reproduction number (R0) is the expected number of cases generated by a single case when all people are susceptible to infection [6]. It is widely used to evaluate the transmissibility of an emerging infectious disease and to determine what degree of control measures is needed to eradicate the disease [7][8][9][10]. When R0 > 1, the disease spreads; when R0 < 1, the disease is effectively controlled [11]. Besides the characteristics of the disease itself, R0 is influenced by many other factors, such as environmental conditions, government policies, people's awareness of infectious diseases, and social behavior. Therefore, we can use R0 to measure the transmissibility of COVID-19 and analyze its influencing factors, which provides data support for proposing suggestions and making decisions.
Research on transmissible diseases like influenza [12], severe acute respiratory syndrome (SARS) [13] and Middle East respiratory syndrome (MERS) [14] has found that disease transmission is associated with temperature and humidity of the environment [15][16][17][18][19][20]. In terms of biological methods, influenza virus spread was found to be promoted by cold temperature and low relative humidity with the guinea pig as a model host [21]; besides, an experiment on the SARS coronavirus indicated that high temperature and high humidity suppressed the spread of the virus [22]; similarly, MERS coronavirus was more stable when temperature or humidity was lower [23]. In terms of statistical methods, case studies of SARS in four major cities in China suggested that the transmissibility had a close relationship with temperature and its variation [24]; and a regression equation was derived to show how temperature, relative humidity, and wind velocity affected the transmission of SARS [25]. Thus we wonder if the spread of COVID-19 follows a similar pattern. Considering that R 0 is useful for measuring the transmission ability of infectious diseases, we conducted association analyses between R 0 and temperature, relative humidity, and absolute humidity respectively. Statistical methods such as correlation and regression were adopted for the analysis.
This paper measured the transmissibility of COVID-19 with R 0 and analyzed its correlation with temperature and humidity. First, we revised the epidemiological data in Wuhan to make R 0 more accurate. Second, we calculated R 0 and compared the average value and developing trend of R 0 in 11 cities including Wuhan. Third, we conducted correlation and regression analysis between R 0 and temperature and humidity to see the association between R 0 and weather.

Data acquisition and preprocessing
The daily cumulative number of confirmed cases and the number of new cases are reported on the official websites of the National Health Commission of the People's Republic of China and the health commission of each province. An R package has been developed to access the epidemiological data directly [26]. We used this R package to acquire the numbers of total and new cases in Wuhan, Hubei Province from January 18, 2020 to February 13, 2020, considering that the situation there was complex and needed particular attention. We also collected the daily reported cumulative number of confirmed cases from January 24, 2020 to February 13, 2020 in 10 major Chinese cities outside Hubei Province, namely Beijing, Chengdu, Chongqing, Guangzhou, Hangzhou, Hefei, Nanjing, Shanghai, Shenzhen, and Zhengzhou (listed alphabetically), for further calculation, estimation, and analysis. These 10 cities were selected because they are first-tier cities or capital cities in China with the highest numbers of cases; Wuhan also meets these criteria. These cities can represent the progress of the epidemic well, on the basis of which control measures can be put forward.
As for Wuhan, Imperial College London, UK estimated that the total number of cases had reached 4000 by January 18, 2020 [27], which was much higher than the officially reported number. We therefore attempted to revise the data in Wuhan to infer the actual transmissibility of the new coronavirus. With the substantial enhancement of case detection and reporting, the difference between the official numbers and the estimates is expected to shrink over time. The data-preprocessing procedure rests on several assumptions:
1) The first case appeared on December 8, 2019 in Wuhan and transmission started from that day on [1,28].
2) The cumulative number of cases Y(t) by day t since the first case followed the exponential function Y(t) = e^{λt} in early development [29].
3) The cumulative number of cases on January 18, 2020 was 4000, that is, Y(41) = 4000 [27].
4) From February 13, 2020 on, all cases in Wuhan can be confirmed and the number of daily new cases is correct, given that the number of newly confirmed diagnoses in Wuhan on February 12, 2020 increased significantly, exceeding 10 000.
Based on those assumptions, the data-revising procedure in Wuhan is as follows:
As for the cities outside Hubei Province, the officially reported data are assumed to be accurate. Based on the relationship ln[Y(t)] = λt, we performed logarithmic fitting between the cumulative number of diagnoses and time and inferred that transmission outside Hubei Province started on December 27, 2019.
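As a rough illustration of assumptions 2) and 3) and of the logarithmic fitting described above, the following Python sketch computes the growth rate implied by Y(41) = 4000 and estimates an onset day from a short series of cumulative counts. The function names and the synthetic series are hypothetical and are not study data; the paper does not state how the fit was implemented, so an ordinary least-squares fit of ln Y against the day index is assumed here.

```python
import math


def growth_rate(cumulative_cases, days_since_first_case):
    """Exponential growth rate from Y(t) = exp(lambda * t): lambda = ln Y(t) / t."""
    return math.log(cumulative_cases) / days_since_first_case


def fit_log_linear(days, cumulative_cases):
    """Ordinary least-squares fit of ln Y = intercept + slope * day.

    Returns (slope, intercept, onset_day), where onset_day is the day on
    which the fitted line crosses ln Y = 0, i.e. a single case.
    """
    logs = [math.log(y) for y in cumulative_cases]
    n = len(days)
    mean_day = sum(days) / n
    mean_log = sum(logs) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    sxy = sum((d - mean_day) * (v - mean_log) for d, v in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_day
    return slope, intercept, -intercept / slope


# Assumption 3: Y(41) = 4000 in Wuhan (41 days after December 8, 2019).
wuhan_lambda = growth_rate(4000, 41)
print(f"Wuhan growth rate: {wuhan_lambda:.4f} per day")

# Synthetic, noise-free series (NOT study data): growth at 0.15 per day
# from a single case on day 5, observed on days 20 to 24.
synthetic_days = [20, 21, 22, 23, 24]
synthetic_cases = [math.exp(0.15 * (d - 5)) for d in synthetic_days]
slope, intercept, onset_day = fit_log_linear(synthetic_days, synthetic_cases)
print(f"fitted rate {slope:.3f} per day, inferred onset on day {onset_day:.1f}")
```

With real reported counts the fit would not be exact, and the inferred onset day depends on which reporting window is used.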

Calculation of the basic reproduction number
The basic reproduction number indicates the average number of people infected by a patient during the infectious period in the absence of control interventions [6]. It is also denoted R 0 , which measures the transmissibility of infectious diseases. There are several ways to estimate R 0 , including formula derivation [30,31] and model fitting [32][33][34].
We describe the transmission pattern of COVID-19 with the Susceptible-Exposed-Infectious-Recovered (SEIR) model. In the exposed stage, an infected individual is not yet able to infect others. The duration of the exposed stage, T_E, is also called the latent period. In the infectious stage, which has a duration of T_I, an infected person can infect susceptible people. Assuming that the cumulative number of confirmed diagnoses increases exponentially in the early stages of an epidemic, the relationship between the basic reproduction number R0 and the exponential growth rate λ can be written as [35]
\[ R_0 = (1 + \lambda T_E)(1 + \lambda T_I). \]
The serial interval T_g is the sum of T_E and T_I. Let f = T_E/T_g be the ratio of the latent period to the serial interval, so that T_E = f T_g and T_I = (1 − f) T_g; the basic reproduction number can then be expressed as [29]
\[ R_0 = 1 + \lambda T_g + f(1 - f)(\lambda T_g)^2. \]
The exponential growth rate is λ = ln[Y(t)]/t, where t is the number of days required to generate the cumulative number of Y(t) cases from the first case. According to the research on the first 425 patients with confirmed COVID-19, the mean latent period is T_E = 5.2 days and the mean serial interval is T_g = 7.5 days [36]. Adopting these values, the ratio of the latent period to the serial interval is f = T_E/T_g = 5.2/7.5 = 0.69.
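The calculation described in this section can be sketched in Python as follows. The sketch assumes the relation R0 = 1 + λT_g + f(1 − f)(λT_g)^2, which follows from the definitions of T_E, T_I, T_g and f given above, together with λ = ln[Y(t)]/t. The function names and the example city (300 cumulative cases on February 1, 2020) are hypothetical; only the parameter values and the December 27, 2019 onset date come from the text.

```python
import math
from datetime import date

T_E = 5.2        # mean latent period in days [36]
T_G = 7.5        # mean serial interval in days [36]
F = T_E / T_G    # ratio of latent period to serial interval (about 0.69)


def basic_reproduction_number(growth_rate, serial_interval=T_G, latent_ratio=F):
    """R0 = 1 + lambda*Tg + f*(1 - f)*(lambda*Tg)**2."""
    x = growth_rate * serial_interval
    return 1 + x + latent_ratio * (1 - latent_ratio) * x ** 2


def r0_from_cumulative(cumulative_cases, report_date, onset_date):
    """R0 on a given day from the cumulative count and the assumed onset date."""
    t = (report_date - onset_date).days          # days since the first case
    growth_rate = math.log(cumulative_cases) / t  # lambda = ln Y(t) / t
    return basic_reproduction_number(growth_rate)


# Hypothetical city outside Hubei Province: 300 cumulative cases on
# February 1, 2020, with transmission assumed to start on December 27, 2019.
example_r0 = r0_from_cumulative(300, date(2020, 2, 1), date(2019, 12, 27))
print(f"R0 = {example_r0:.2f}")
```

Applying `r0_from_cumulative` to each day's cumulative count would give a daily R0 series of the kind compared across cities.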

Correlation and regression analysis between R 0 and weather
Correlation analysis is a commonly used statistical method to study the relationship between variables [37]. Regression analysis determines the quantitative relationship between two variables [38]. Among regression methods, linear regression relates the dependent variable Y to the independent variable X with a linear equation Y = a + bX [39], where the coefficient a is the intercept and b is the slope. We performed correlation analysis and linear regression between R0 and weather as follows:
1) We collected the daily average temperature and relative humidity from January 24, 2020 to February 13, 2020 in four major Chinese cities: Beijing (the capital of China), Shanghai (a municipality of China), Guangzhou (the capital of Guangdong Province) and Chengdu (the capital of Sichuan Province). We calculated absolute humidity from the temperature and relative humidity.
2) We imported the data on temperature, relative humidity, and absolute humidity together with R0 into the SPSS software and added the city as the classification label.
3) Through correlation analysis, the Pearson correlation coefficients between R0 and temperature, relative humidity, and absolute humidity were calculated respectively.
4) Through regression analysis, the intercept a and the slope b of the linear equation were estimated with R0 as the dependent variable Y and temperature, relative humidity or absolute humidity as the independent variable X.
5) We split the data by the city label and repeated steps 3 and 4 for each city separately.
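The steps above can be sketched in plain Python. This is only a stand-in for the SPSS analysis used in the study: it computes the Pearson coefficient and the least-squares line but not the significance levels. The absolute-humidity formula is an assumption, since the paper does not state which conversion was used; a common Magnus-type approximation is shown. The function names and the five data points are hypothetical.

```python
import math


def absolute_humidity(temp_c, rel_humidity_pct):
    """Approximate absolute humidity in g/m^3 (assumed Magnus-type formula)."""
    saturation_hpa = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    return saturation_hpa * rel_humidity_pct * 2.1674 / (273.15 + temp_c)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_regression(xs, ys):
    """Least-squares intercept a and slope b of Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up daily values for one city (NOT study data).
temperature = [2.0, 4.0, 6.0, 8.0, 10.0]        # degrees Celsius
relative_humidity = [40.0, 55.0, 50.0, 65.0, 70.0]  # percent
r0_values = [2.40, 2.35, 2.31, 2.24, 2.20]

abs_humidity = [absolute_humidity(t, rh)
                for t, rh in zip(temperature, relative_humidity)]

for name, xs in [("temperature", temperature),
                 ("relative humidity", relative_humidity),
                 ("absolute humidity", abs_humidity)]:
    r = pearson_r(xs, r0_values)
    a, b = linear_regression(xs, r0_values)
    print(f"{name}: r = {r:.3f}, R0 = {a:.3f} + ({b:.4f}) * X")
```

Step 5 corresponds to running the same loop once per city on that city's rows only.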

Sensitivity analysis of R 0
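To analyze the sensitivity of R0 to the three key parameters in Eq. (4), we took the partial derivatives of R0 = 1 + λT_g + f(1 − f)(λT_g)^2 with respect to λ, T_g and f respectively:
\[ \frac{\partial R_0}{\partial \lambda} = T_g + 2 f (1 - f) \lambda T_g^2, \]
\[ \frac{\partial R_0}{\partial T_g} = \lambda + 2 f (1 - f) \lambda^2 T_g, \]
\[ \frac{\partial R_0}{\partial f} = (1 - 2f)(\lambda T_g)^2. \]
The sensitivity of the basic reproduction number R0 to the exponential growth rate λ, the serial interval T_g, and the latent period ratio f can be estimated according to the range of the variables and the scale of the partial derivatives.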
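A minimal Python sketch of this sensitivity analysis is given below. It assumes the relation R0 = 1 + λT_g + f(1 − f)(λT_g)^2, differentiates it analytically, and cross-checks the result with central finite differences. The parameter values λ = 0.1372, T_g = 7.5 and f = 0.69 are the ones the paper substitutes when reporting its sensitivity results; the function names and the step size are illustrative choices.

```python
def r0(lam, t_g, f):
    """R0 = 1 + lambda*Tg + f*(1 - f)*(lambda*Tg)**2."""
    x = lam * t_g
    return 1 + x + f * (1 - f) * x ** 2


def analytic_partials(lam, t_g, f):
    """Partial derivatives of R0 with respect to lambda, Tg and f."""
    d_lam = t_g + 2 * f * (1 - f) * lam * t_g ** 2
    d_tg = lam + 2 * f * (1 - f) * lam ** 2 * t_g
    d_f = (1 - 2 * f) * (lam * t_g) ** 2
    return d_lam, d_tg, d_f


def numeric_partials(lam, t_g, f, h=1e-6):
    """Central finite-difference check of the analytic derivatives."""
    d_lam = (r0(lam + h, t_g, f) - r0(lam - h, t_g, f)) / (2 * h)
    d_tg = (r0(lam, t_g + h, f) - r0(lam, t_g - h, f)) / (2 * h)
    d_f = (r0(lam, t_g, f + h) - r0(lam, t_g, f - h)) / (2 * h)
    return d_lam, d_tg, d_f


LAMBDA, T_G, F = 0.1372, 7.5, 0.69

d_lam, d_tg, d_f = analytic_partials(LAMBDA, T_G, F)
print(f"R0       = {r0(LAMBDA, T_G, F):.3f}")
print(f"dR0/dlam = {d_lam:.3f}")   # order of magnitude 10^1
print(f"dR0/dTg  = {d_tg:.3f}")    # order of magnitude 10^-1
print(f"dR0/df   = {d_f:.3f}")     # order of magnitude 10^-1, negative
```

The negative sign of the derivative with respect to f reflects that f > 0.5 here, so increasing f lowers R0.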

Comparisons of transmission among different cities
The comparison between the officially reported data and the revised data in Wuhan is presented in Fig. 1 with important points marked on it. The estimated number of cumulative cases was higher than the official number on every day, and it had reached 46 933 by February 13, 2020, which was 1.3 times the official number of 35 991. The unusually high peak of new cases on February 12, 2020 was smoothed by the revision.
The calculation results of the basic reproduction number R0 from January 24, 2020 to February 13, 2020 in 11 major Chinese cities are shown in Fig. 2. The values labeled "Wuhan" were calculated using the officially reported number of cases, while those labeled "Wuhan (revised)" were calculated using the revised number of cases. In this way, the broken line of "Wuhan" reflects the changing trend of R0, and that of "Wuhan (revised)" reflects the magnitude of R0. The cumulative number of confirmed cases reported officially in cities outside Hubei Province is assumed to be accurate, so the broken lines of the other 10 cities represent not only trends but also actual values.
As can be seen from Fig. 2, R 0 in Wuhan is significantly higher than those in cities outside Hubei Province. Besides, R 0 in cities outside Hubei Province has begun to decrease, while R 0 in Wuhan does not show a significant downward trend.
For a more detailed analysis, the average basic reproduction number over the 21 days in each city and the date of the inflection point are presented in Table 1. The cities are listed by average R0 from high to low. The inflection point refers to the day after which R0 shows a downward trend. Table 1 shows that the average R0 in Wuhan far exceeds those in the other cities and is 0.3 higher than that in Chongqing, which ranks second. It should be noted that the average R0 in Wuhan is calculated with the revised data to better fit the real value. The average basic reproduction number calculated with the officially reported data, 2.4, is also much higher than those in other cities.
The inflection points of the cities outside Hubei Province range from January 30 to February 3, while the inflection point of Wuhan had not appeared by February 13, 2020 because the number of confirmed cases kept increasing rapidly. Although R0 in Wuhan reaches a peak on February 12, it cannot be concluded that February 12 is the inflection point, because from that day on Hubei Province included clinically diagnosed cases in the number of confirmed cases. This modification of the diagnostic criteria led to a sudden increase in newly confirmed patients, which explains why R0 is particularly high on February 12.

Correlation between R 0 and temperature
The Pearson correlation coefficients and significance between R0 and temperature are shown in Table 2. The row "Summary" suggests that, calculated as a whole, the correlation between R0 and temperature is statistically significant at the 0.01 level. The correlation coefficient is −0.459, so R0 and temperature have a negative correlation, which means that R0 decreases as the temperature increases. The higher the temperature, the lower  14.9 °C, and 9.9 °C respectively. There is no significant relationship between the average R0 of a city and its average temperature (r = −0.486, P > 0.5).
Linear regression was performed on the data for all cities combined as well as on the data for Shanghai and Chengdu, which showed a significant correlation. Table 3 presents the linear regression results. Replacing a and b in the equation R0 = a + bT (where T is temperature) with the corresponding values in Table 3 expresses the correlation between R0 and temperature more precisely. For example, the linear regression equation for Shanghai is R0 = 2.424 − 0.026T. It can be inferred from b < 0 that R0 negatively correlates with temperature in Shanghai, which is consistent with the correlation analysis result above.
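To make the slope concrete, the Shanghai equation can be evaluated at two temperatures. The values 5 °C and 15 °C are chosen here only for easy arithmetic and are not taken from the study data; the fitted line should not be read as valid far outside the temperatures observed between January 24 and February 13, 2020.

\[ R_0(5) = 2.424 - 0.026 \times 5 = 2.294, \qquad R_0(15) = 2.424 - 0.026 \times 15 = 2.034 \]

\[ R_0(15) - R_0(5) = -0.026 \times 10 = -0.26 \]

Under this fitted line, a 10 °C rise in temperature corresponds to a decrease of about 0.26 in R0.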
We plotted every pair of temperature and R 0 in a city or the whole data on the scatter figure to make correlation more intuitive, which was presented in Fig. 3. The regression lines followed the corresponding linear regression equations.

Correlation between R 0 and relative humidity
The Pearson correlation coefficients and significance between R0 and relative humidity are presented in Table 4. According to the first row, the correlation between R0 and relative humidity is statistically significant at the 0.01 level in general. The correlation coefficient is −0.391, indicating that R0 decreases as the relative humidity increases. As for the analysis of each city, R0 negatively correlates with relative humidity in Beijing and Shanghai, which is significant at the 0.01 level. In Chengdu, by contrast, the correlation is significantly positive at the 0.01 level, which implies that transmissibility and relative humidity move in the same direction there. The correlation is not significant in Guangzhou.
The correlation was significant in Beijing, Shanghai, and Chengdu, and thus we conducted linear regression on the data of the three cities as well as the summary of all cities. The linear regression results are presented in Table 5. Replace a and b in the equation R 0 = a + b RH (where RH is relative humidity) with the corresponding actual values in Table 5, and the correlation between R 0 and relative humidity can be expressed with a quantitative method.
The scatterplots and corresponding regression lines of relative humidity and R 0 summarized across all cities and by individual cities are presented in Fig. 4.

Correlation between R 0 and absolute humidity
The Pearson correlation coefficients and significance between R0 and absolute humidity are presented in Table 6. The negative correlation between R0 and absolute humidity is significant in general as well as in Beijing, Shanghai and Guangzhou, and the absolute values of the Pearson correlation coefficients for absolute humidity are larger than those for relative humidity, indicating that the relationship is stronger for absolute humidity than for relative humidity.
We conducted linear regression on the data of Beijing, Shanghai and Guangzhou as well as the summary of all cities. The linear regression results are presented in Table 7. Replacing a and b in the equation R0 = a + b AH (where AH is absolute humidity) with the corresponding values in Table 7 expresses the correlation between R0 and absolute humidity quantitatively.
The scatterplots and corresponding regression lines of absolute humidity and R 0 summarized across all cities and by individual cities are presented in Fig. 5.

Sensitivity of R 0 to parameters
The specific values of R0 and its partial derivatives can be calculated by substituting λ = 0.1372 (the average λ from January 24 to February 13 in Beijing), T_g = 7.5 and f = 0.69 into Eqs. (4-7). When the variables fluctuate within a small range around the given values, R0 increases as λ or T_g increases and decreases as f increases. λ, T_g and f vary on scales of 10^−2, 10^0 and 10^−1 respectively, and the scales of their partial derivatives are 10^1, 10^−1 and 10^−1. Multiplying each range by the corresponding derivative, the fluctuation scales of R0 are 10^−1, 10^−1 and 10^−2 for λ, T_g and f respectively, which implies that R0 is more sensitive to λ and T_g than to f.
The accuracy of the parameters and variables is important for the estimation of the basic reproduction number. As the research on COVID-19 progresses, more precise data will become available and the transmission pattern of the new coronavirus can be described better. The calculation in this paper nevertheless remains meaningful, because the comparison and correlation analyses rely on relative rather than absolute values of R0. The results are reasonable as long as the same equation and parameters are used consistently to calculate R0. The comparison shows that the control of COVID-19 is especially urgent in Wuhan and that people in other cities should also attach importance to inhibiting the spread of the disease. Vigilance should not be relaxed until R0 drops below 1.

Differences between correlation and causation
In this paper, we found a negative correlation between the transmissibility of COVID-19 and both temperature and humidity. However, it should be emphasized that correlation is different from causation. According to the Oxford Dictionary, correlation is a connection between two things in which one thing changes as the other does, while causation is the process of one event causing or producing another event. A causal relationship between two variables cannot be inferred solely from the correlation between them: correlation is a necessary but not sufficient condition for causation. Our results indicated that the transmissibility of COVID-19 was likely to decrease as temperature and humidity increased, but this does not mean that the increase in temperature or humidity caused the decrease in transmissibility. We were not able to control other variables in the observation, such as population migration and interventions, which might also affect the transmissibility of COVID-19. Future work is therefore needed to determine whether changes in temperature or humidity cause changes in transmissibility. For example, biological experiments could set temperature or humidity as the independent variable and the transmissibility of the coronavirus as the dependent variable, while controlling irrelevant variables with the elimination method, constant method, matching method or randomization. Nevertheless, this paper is useful in confirming that the transmissibility of COVID-19 correlates with temperature and humidity and that there is probably a causal relationship between them, which deserves further research.

Effects of temperature and humidity on the transmission of COVID-19
A recent study indicated that temperature and relative humidity held no significant associations with the transmissibility of COVID-19 [40]. That study was comprehensive and well conducted, but we went a step further by taking the time series into account, using daily temperature and humidity. The results show that the overall correlation between R0 and temperature or humidity is significantly negative, which is consistent with the results of the biological and statistical research on other infectious diseases. This can be explained in several ways. First, in terms of biological characteristics, a lot of research has confirmed that viruses decay more quickly at high temperature and high humidity [19,41,42]. Second, in terms of the transmission media, viruses spread as droplets or aerosols, which maintain large particle sizes at high humidity and thus can settle rapidly or be blocked by masks, the nasal cavity, etc. [19]. Third, in terms of human immunity, high temperature and high humidity protect the immune organs and benefit people's health. To sum up, the spread of COVID-19 is likely to weaken at relatively high temperature and humidity, and special attention should be paid to the prevention and control of COVID-19 in the coming winter.
As for the correlation in each city, R0 negatively correlates with both temperature and humidity in Shanghai; R0 negatively correlates with humidity in Beijing, while the correlation with temperature is not significant; R0 negatively correlates with absolute humidity in Guangzhou, while the correlations with temperature and relative humidity are not significant; R0 negatively correlates with temperature in Chengdu, while the correlation with relative humidity is positive and the correlation with absolute humidity is not significant. These differences between cities may be due to several factors.
First, considering that COVID-19 began in winter, people's activity and virus transmission mainly occur indoors. In China, the cities north of the Qinling Mountains-Huaihe River Line have central heating indoors in winter. Beijing is north of the Qinling Mountains-Huaihe River Line, while Shanghai, Guangzhou, and Chengdu are south of it. Therefore, the indoor temperature is probably much higher than the outdoor temperature in Beijing, while the indoor temperature may follow a pattern similar to the outdoor temperature in the other three cities. The indoor temperature is also probably higher in Beijing than in Shanghai, Guangzhou, and Chengdu. Although the indoor and outdoor temperatures may be associated, it would be better to measure the indoor temperature directly. As for humidity, it has been found that outdoor absolute humidity can be used more reliably as a proxy for indoor exposure than relative humidity [43,44]. Therefore, the correlation between R0 and absolute humidity may reveal the situation indoors better than that with relative humidity. Indeed, the absolute values of the Pearson correlation coefficients between R0 and absolute humidity are larger than those between R0 and relative humidity, suggesting that the relationship is stronger for absolute humidity than for relative humidity.
Second, although Beijing, Shanghai, Guangzhou, and Chengdu are all first-tier cities in China with many similarities like buildings, there are some differences between Chengdu and the other cities that may help explain the positive correlation with relative humidity. Chengdu is located in the southwest of China, the west of Sichuan Basin and the hinterland of Chengdu Plain with a subtropical monsoon humid climate, different from Beijing which has a warm temperate semi-humid continental monsoon climate. The air is more humid in Chengdu than that in Beijing. The climate in Chengdu is similar to the subtropical monsoon climate in Shanghai and Guangzhou, but Chengdu is an inland city while Shanghai and Guangzhou are coastal cities.
Third, the effect of weather on COVID-19 is complicated. The joint distribution between weather and potential confounders should be taken into account. For example, population movement might trigger the transmission of COVID-19 [45]. As for the effects of interventions, we have plotted the time series of temperatures from January 24, 2020 to February 13, 2020 in Beijing, Shanghai, Guangzhou, and Chengdu in Additional file 1: Figure S1. The figure shows that the temperature kept fluctuating during this period. Because the strength of interventions was relatively steady without large fluctuations, unlike the trends in temperature, the effects of interventions can perhaps be separated from the trends in temperature.

Conclusions
In this paper, we calculated and compared the basic reproduction number of COVID-19 in 11 major cities in China and analyzed its association with temperature and humidity in Beijing, Shanghai, Guangzhou, and Chengdu, in order to assess the transmissibility of COVID-19 in different cities and how it changes with the weather. We conclude that the spread of COVID-19 is most intense in Wuhan, Hubei Province and that R0 negatively correlates with temperature, relative humidity, and absolute humidity. Therefore, effective action should be taken to control the transmission of COVID-19, especially in Hubei Province, and the transmissibility is predicted to decrease as the weather warms.
Additional file 1: Figure S1. The time series of temperature in Beijing, Shanghai, Guangzhou and Chengdu.