A five-compartment model of age-specific transmissibility of SARS-CoV-2

Background: The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, also called 2019-nCoV), causes different morbidity risks in different age groups. This study attempts to quantify the age-specific transmissibility using a mathematical model.
Methods: An epidemiological model with five compartments (susceptible–exposed–symptomatic–asymptomatic–recovered/removed [SEIAR]) was developed based on observed transmission features. Coronavirus disease 2019 (COVID-19) cases were divided into four age groups: group 1, those ≤ 14 years old; group 2, those 15 to 44 years old; group 3, those 45 to 64 years old; and group 4, those ≥ 65 years old. The model was initially based on cases (including imported cases and secondary cases) collected in Hunan Province from January 5 to February 19, 2020. Another dataset, from Jilin Province, was used to test the model.
Results: The age-specific SEIAR model fitted the data well in each age group (P < 0.001). In Hunan Province, the highest transmissibility was from age group 4 to 3 (median: β43 = 7.71 × 10⁻⁹; SAR43 = 3.86 × 10⁻⁸), followed by group 3 to 4 (median: β34 = 3.07 × 10⁻⁹; SAR34 = 1.53 × 10⁻⁸), group 2 to 2 (median: β22 = 1.24 × 10⁻⁹; SAR22 = 6.21 × 10⁻⁹), and group 3 to 1 (median: β31 = 4.10 × 10⁻¹⁰; SAR31 = 2.08 × 10⁻⁹). The lowest transmissibility was from age group 3 to 3 (median: β33 = 1.64 × 10⁻¹⁹; SAR33 = 8.19 × 10⁻¹⁹), followed by group 4 to 4 (median: β44 = 3.66 × 10⁻¹⁷; SAR44 = 1.83 × 10⁻¹⁶), group 3 to 2 (median: β32 = 1.21 × 10⁻¹⁶; SAR32 = 6.06 × 10⁻¹⁶), and group 1 to 4 (median: β14 = 7.20 × 10⁻¹⁴; SAR14 = 3.60 × 10⁻¹³). In Jilin Province, the highest transmissibility occurred from age group 4 to 4 (median: β43 = 4.27 × 10⁻⁸; SAR43 = 2.13 × 10⁻⁷), followed by group 3 to 4 (median: β34 = 1.81 × 10⁻⁸; SAR34 = 9.03 × 10⁻⁸).
Conclusions: SARS-CoV-2 exhibits high transmissibility between middle-aged (45 to 64 years old) and elderly (≥ 65 years old) people. Children (≤ 14 years old) have very low susceptibility to COVID-19. This study improves our understanding of the transmission features of SARS-CoV-2 in different age groups and suggests that prevention measures should focus most on middle-aged and elderly people.


Background
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, also called 2019-nCoV), has spread around the world. It has been evident from the start of the pandemic that different age groups are at different risks for COVID-19. During the earliest stage of the spread (before January 2, 2020), it was noticed that most transmission involved persons over 18 years old, especially persons aged 25 to 64 [1]. For example, a study involving 425 COVID-19 patients found that they ranged in age from 15 to 89, with a median of 59. Different case distributions were reported in four age groups: 0 to 14, 15 to 44, 45 to 64, and ≥ 65 years [2]. In China, persons 30 to 79 years old accounted for 86.6% of diagnosed cases [3]. In the Republic of Korea, cases were mainly concentrated in the range of 20-50 years, with a peak at age 30 [4]. It has been reported that older people are at high risk of developing severe symptoms or even dying [5]. However, another study indicates that children and adults face the same risk of infection [6], and there have also been reports that younger adults, especially those aged 20 to 24 years, face an increasing risk of COVID-19 after an intervention [7]. Research has also shown that individuals over 65 years of age are more susceptible to infection than those 14 to 64 years old (odds ratio: 1.47, 95% CI: 1.12-1.92) [8]. Given these diverse findings, the role of age in transmission needs to be clarified. In this study, interpersonal transmission of COVID-19 is explored further to provide improved estimates of transmissibility at different ages.
Several approaches to the mathematical modeling of COVID-19 [9,10], such as calculating the basic reproduction number (R0) using the serial intervals and intrinsic growth rate [2,9,10], or using ordinary differential equations and Markov chain Monte Carlo methods [11], have been proposed. Compartmental models are often applied to infectious diseases, and have sometimes been used to study age-dependent effects [12,13]. For example, Chen et al. developed a Bats-Hosts-Reservoir-People (BHRP) transmission network model and simplified the BHRP model as a person-person (PP) transmission network model to calculate the transmissibility of SARS-CoV-2 [12]. The age-specific transmissibility of influenza A (H1N1) has been studied in a model with five compartments (the Susceptible-Exposed-Symptomatic-Asymptomatic-Recovered/removed [SEIAR] model) [13]. However, no such compartmental model is available for quantifying the age-specific transmissibility of SARS-CoV-2.
In this paper, an age-specific SEIAR model based on the PP model is proposed. It is employed to estimate the age-specific transmissibility of SARS-CoV-2 by fitting data collected in Hunan Province between January 5 and February 19, 2020. A dataset of COVID-19 cases from Jilin Province is used to test the model further.

Data collection
The present model is based on COVID-19 case data collected by the Hunan Provincial Center for Disease Control and Prevention (Hunan Provincial CDC) from January 5 to February 19, 2020. The data included patient gender, age, inter-provincial travel history, case type (symptomatic/asymptomatic), exposure date, date of onset, and date of diagnosis. To further test the model, a separate dataset (including age, travel history, case type, and date of onset) collected in Jilin Province from January 5 to February 12, 2020 was also used.

Study design
In this study, COVID-19 patients were divided into four age groups, as is done elsewhere in the published literature [2]. Age group 1 contained those ≤ 14 years old, group 2 those 15 to 44 years old, group 3 those 45 to 64 years old, and group 4 those ≥ 65 years old. Moreover, the cases in each age group were divided into two types: imported cases (patients who had traveled from other provinces) and secondary cases (patients infected within their home province by imported and local cases). All cases were classified as symptomatic or asymptomatic.

Age-specific transmission model
The age-specific SEIAR model is based on the natural history of COVID-19. In the model, people are sorted into five compartments (categories): susceptible (S), exposed (E), symptomatic (I), asymptomatic (A), and recovered/removed (R). The definitions of the five categories are presented in Table 1. The model is based on the following assumptions:
a) Susceptible individuals are infected through contact with two types of cases: symptomatic or asymptomatic cases imported from other provinces, and secondary cases in their home province. Imported symptomatic individuals are placed in the subcategory Ip and imported asymptomatic individuals in the subcategory Ap.
b) SARS-CoV-2 can be transmitted within each age group. The transmission rate within a given age group i is denoted βii.
c) SARS-CoV-2 can be transmitted between different age groups. The transmission rate from age group i to j is βij and that from j to i is βji.
d) The incubation period of an exposed person is 1/ω and the latent period is 1/ω'. The model assumes that the incubation period is equal to the latent period. Parameter p (0 ≤ p ≤ 1) gives the proportion of individuals who are asymptomatically infected. Exposed individuals move out of the E compartment into the A compartment at a rate of pωE and into the I (symptomatic) compartment at a rate of (1-p)ωE.
e) The transmissibility of the virus from members of A and that from members of I differ by a factor κ (0 ≤ κ ≤ 1).
f) The model assumes that infected individuals only spread the virus until they are diagnosed, because (whether symptomatic or asymptomatic) they are removed from the population immediately upon diagnosis. More formally, individuals in categories I and A are transferred into category R after an infectious period of 1/γ and 1/γ', respectively. Moreover, some members of I will die as a result of the infection. The case fatality rate is denoted f.
A flowchart of the model is presented in Fig. 1. In the equations of the age-specific SEIAR model, N is defined as the total population. The left side of each equation gives the instantaneous rate of change of S, E, I, A, or R at time t. The subscripts i and j (i ≠ j) represent age groups 1 to 4 in the respective equations.
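The model equations themselves are not reproduced in this text, so the following Python sketch is only one possible reading of assumptions a) to f), not the authors' published system. It assumes mass-action transmission on absolute counts, ω' = ω, deaths leaving I at an extra rate f·I, and it omits the imported subcategories (an initial infectious seed stands in for them). The names `Params`, `derivatives`, and `simulate`, the forward-Euler integrator, and all numbers in the example are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Group = Tuple[float, float, float, float, float]  # (S, E, I, A, R) for one age group


@dataclass
class Params:
    beta: List[List[float]]  # beta[i][j]: transmission rate from group i to group j
    kappa: float = 1.0       # relative transmissibility of A versus I
    p: float = 0.2015        # asymptomatic proportion
    omega: float = 1 / 7     # 1 / incubation period (omega' assumed equal)
    gamma: float = 0.2       # 1 / infectious period of I
    gamma_a: float = 0.2     # 1 / infectious period of A (gamma')
    f: float = 0.0           # assumption: deaths leave I at rate f * I


def derivatives(state: List[Group], prm: Params) -> List[Group]:
    """Right-hand side of the sketched age-structured SEIAR system."""
    n = len(state)
    out = []
    for j in range(n):
        S, E, I, A, _R = state[j]
        # Pressure on group j from infectious people in every source group i.
        force = sum(
            prm.beta[i][j] * (state[i][2] + prm.kappa * state[i][3])
            for i in range(n)
        )
        new_exposed = force * S
        to_a = prm.p * prm.omega * E          # E -> A
        to_i = (1 - prm.p) * prm.omega * E    # E -> I
        out.append((
            -new_exposed,
            new_exposed - to_a - to_i,
            to_i - prm.gamma * I - prm.f * I,
            to_a - prm.gamma_a * A,
            prm.gamma * I + prm.gamma_a * A,
        ))
    return out


def simulate(state: List[Group], prm: Params, days: float, dt: float = 0.1) -> List[Group]:
    """Advance the system with fixed-step forward Euler (chosen for brevity)."""
    for _ in range(int(round(days / dt))):
        deriv = derivatives(state, prm)
        state = [
            tuple(x + dt * dx for x, dx in zip(comp, dcomp))
            for comp, dcomp in zip(state, deriv)
        ]
    return state


# Two hypothetical age groups of 1000 people; one symptomatic seed in group 0.
beta = [[2e-4, 1e-4], [1e-4, 3e-4]]
start = [(999.0, 0.0, 1.0, 0.0, 0.0), (1000.0, 0.0, 0.0, 0.0, 0.0)]
end = simulate(start, Params(beta=beta), days=60)
```

With f = 0 the five compartments only exchange people, so the total population is conserved, which gives a quick sanity check on any alternative formulation.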

Parameter estimation
According to the literature, the incubation period was 4 days (interquartile range: 2-7) in the early epidemic in Wuhan City [14]; it was 5.1 days (95% CI: 4.5-5.8) according to other publicly reported data from China [15]. The incubation period found by a survey in Ningbo City was 5.5 days (range: 2-18); the analysis of right-truncated data in Wuhan City showed that the incubation period was 5 days (95% CI: 2-14) [16,17]. However, the latent period has been reported much less often. In this study, a fit of first-hand data from Hubei Province (Additional file 1) using the gamma distribution gave an incubation period of 3 to 4 days for a single exposure (Fig. 2a) and 10 days for single and multiple exposures (Fig. 2b). Fits were also obtained with nine other distributions (normal, lognormal, skew-normal, log-gamma, Weibull minimum, chi-square, Wald, Laplace, and exponential); the normal, lognormal, and skew-normal distributions provided fits as good as that of the gamma distribution (Fig. 3). In the model, the incubation period was set to 7 days, the average of the single-exposure and multiple-exposure gamma-distribution results. Because the incubation and latent periods are assumed to be equal, ω = ω' = 0.1429, with a range of 0.05556 to 0.5.
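The rate ω is the reciprocal of the incubation period in days, which the short Python check below makes explicit. The helper name `rate_from_period` is an illustrative assumption. The reported bounds of 0.05556 and 0.5 coincide numerically with periods of 18 and 2 days, although the text does not state which source the bounds were taken from.

```python
def rate_from_period(days: float) -> float:
    """Convert a mean duration in days into a per-day transition rate."""
    if days <= 0:
        raise ValueError("duration must be positive")
    return 1.0 / days


omega = rate_from_period(7)        # incubation period set to 7 days
omega_low = rate_from_period(18)   # longest period matching the lower bound
omega_high = rate_from_period(2)   # shortest period matching the upper bound

print(round(omega, 4), round(omega_low, 5), omega_high)
```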
According to reference [18], asymptomatic cases constitute 5 to 28% of all COVID-19 cases. The asymptomatic proportion in the Diamond Princess cruise ship was 17.9% (95% CI: 15.5-20.2%) [19]. One study adopted the binomial distribution to estimate the asymptomatic ratio as 30.8% (95% CI: 7.7-53.8%) [20]. The asymptomatic proportion was 20.75% in Ningbo City (15.8% among children) [16]. However, another study has indicated that the percentage of asymptomatic cases is much higher (78%) [21]. In the present work, first-hand data were used: according to the reported data in Hunan Province, 392 secondary cases included 79 asymptomatic ones. Therefore, the asymptomatic proportion (p) was set to 79/392 = 0.2015 in the present model.
One study has indicated that the spreading capacity of symptomatic cases is 3.9 times that of asymptomatic cases [22]. Another study indicated that 4.11% of the close contacts of asymptomatic cases became infected, versus 6.30% of the close contacts of symptomatic cases [16]. According to a report from the UK [23], an asymptomatic individual may cause 11 infectious cases. In the model, κ is set to 1.0, thus conservatively allowing for the worst-case scenario that asymptomatic and symptomatic persons are equally infectious.
Fig. 2 The fitting results of the gamma distribution of the incubation period. a Incubation period for a single exposure; b Incubation period for single and multiple exposures
A mean delay of 5 days has been reported from symptom onset to detection/hospitalization; in Thailand and Japan, patients were hospitalized between 3 and 7 days following onset [24][25][26]. Another study has indicated that the mean time from illness onset to hospital admission (for treatment and/or isolation) is 3 to 4 days without truncation and 5 to 9 days when right truncated [17]. However, Xu et al. [27] reported that the median time from illness onset to initial hospital admission was 2 (range: 1-4) days. A study including 45 patients diagnosed prior to January 1, 2020 estimated the mean time from illness onset to first medical visit as 5.8 days (95% CI: 4.3-7.5) [2]. Another study indicated that the median communicable period of 24 asymptomatic cases was 9.5 days (range: 1-21) [28]. In this study, it is assumed that any person diagnosed with COVID-19 is removed from the population immediately. Therefore, the infectious period is the same as the number of days from illness onset to diagnosis. Chi-square distribution results indicated that the highest frequency corresponded to day 5 (Fig. 4); therefore, the infectious period of the symptomatic and asymptomatic cases was set to 5 days in this study (γ = γ' = 0.2).
According to the analyses of the data collected by Hubei Provincial CDC, a total of four cases died as a result of COVID-19. Thus, the fatality rate (f) was set to 0.003552.
The age-specific secondary attack rate (SAR) matrix is defined as the four-by-four matrix with elements SARij = βij/γ, i.e. the rate per encounter at which the virus spreads from age group i to age group j divided by the frequency of removal. The diagonal elements of the matrix give the age-specific SAR values within each age group. Instead of the directly calculated SAR value, the min-max normalized version (the lower and upper bounds of relative transmissibility) is used. Moreover, a "knock-out" simulation was performed as in reference [29] to quantify the age-specific transmissibility of SARS-CoV-2. To "knock out" means to cut off the transmission route between or within the various age groups. The simulation was performed for the following scenarios: A) βii = 0; B) βji = 0; C) βij = 0; D) βjj = 0; and E) control (no cutting off of the transmission route).
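As an illustration of the SAR definition, the Python sketch below divides the eight Hunan median β values quoted in the abstract by γ = 0.2 and then rescales them. The normalization formula is not reproduced in this text, so the standard min-max form (x − min)/(max − min), applied here only to the eight reported routes, is an assumption, as are the function names. The `knock_out` helper only prepares a scenario by zeroing one route; rerunning the fitted model with it is outside this sketch.

```python
from typing import Dict, Tuple

Route = Tuple[int, int]  # (source age group i, target age group j)

GAMMA = 0.2  # removal rate, 1 / (5-day infectious period)

# Median beta values for Hunan Province as quoted in the abstract.
beta_hunan: Dict[Route, float] = {
    (4, 3): 7.71e-9, (3, 4): 3.07e-9, (2, 2): 1.24e-9, (3, 1): 4.10e-10,
    (3, 3): 1.64e-19, (4, 4): 3.66e-17, (3, 2): 1.21e-16, (1, 4): 7.20e-14,
}


def secondary_attack_rates(beta: Dict[Route, float], gamma: float) -> Dict[Route, float]:
    """SAR_ij = beta_ij / gamma for every listed route."""
    return {route: b / gamma for route, b in beta.items()}


def min_max_normalize(values: Dict[Route, float]) -> Dict[Route, float]:
    """Rescale values to [0, 1]; assumed form of the normalization."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {route: 0.0 for route in values}
    return {route: (v - lo) / (hi - lo) for route, v in values.items()}


def knock_out(beta: Dict[Route, float], route: Route) -> Dict[Route, float]:
    """Return a copy of beta with one transmission route cut off."""
    cut = dict(beta)
    cut[route] = 0.0
    return cut


sar = secondary_attack_rates(beta_hunan, GAMMA)
relative = min_max_normalize(sar)
ranked = sorted(relative, key=relative.get, reverse=True)
print(ranked[0], ranked[-1])  # strongest and weakest reported routes
```

Because the published values are medians, β/γ computed this way can differ from the quoted SAR in the last digit (for example 3.855 × 10⁻⁸ here versus the reported 3.86 × 10⁻⁸ for route 4 to 3).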

Sensitivity analysis
In this study, six parameters were used to analyze the sensitivity of the model: κ

Epidemiological characteristics of COVID-19
Data for 1126 COVID-19 cases were collected in Hunan Province from January 5 to February 19, 2020. The data showed that 734 cases involved people with a history of traveling to other provinces and 392 were secondary cases in Hunan Province (Fig. 5). In the ≤ 14 years old group, there were 14 imported symptomatic cases, 16 imported asymptomatic cases, 13 secondary symptomatic cases, and 17 secondary asymptomatic cases. In the 15 to 44 years old group, there were 318 imported symptomatic cases, 51 imported asymptomatic cases, 118 secondary symptomatic cases, and 29 secondary asymptomatic cases.
In the 45 to 64 years old group, there were 234 imported symptomatic cases, 22 imported asymptomatic cases, 129 secondary symptomatic cases, and 25 secondary asymptomatic cases. In the ≥ 65 years old group, there were 69 imported symptomatic cases, 10 imported asymptomatic cases, 53 secondary symptomatic cases, and 8 secondary asymptomatic cases.
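The case counts above can be cross-checked against the reported totals with a short tally. The Python sketch below uses only the numbers given in this section; the dictionary layout and key names are illustrative assumptions.

```python
# Counts per age group: (imported symptomatic, imported asymptomatic,
#                        secondary symptomatic, secondary asymptomatic)
cases = {
    "<=14":  (14, 16, 13, 17),
    "15-44": (318, 51, 118, 29),
    "45-64": (234, 22, 129, 25),
    ">=65":  (69, 10, 53, 8),
}

imported = sum(c[0] + c[1] for c in cases.values())
secondary = sum(c[2] + c[3] for c in cases.values())
secondary_asymptomatic = sum(c[3] for c in cases.values())
total = imported + secondary

# Asymptomatic proportion among secondary cases, as used for parameter p.
p = secondary_asymptomatic / secondary

print(imported, secondary, total, round(p, 4))
```

The tally reproduces the 734 imported cases, 392 secondary cases, and 1126 total cases, and the 79 asymptomatic secondary cases give p = 79/392 ≈ 0.2015.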

Sensitivity analysis
All parameter values set in the model fell within the mean ± SD range of the simulated values. The model was very sensitive to the three parameters p, ω, and γ, but not to κ or γ' (Fig. 11).

Discussion
This is the first study to develop an age-specific SEIAR model for quantifying the transmissibility of COVID-19 within and between various age groups. The model fitted the reported data effectively, and therefore offers the capability of estimating or predicting the age-specific transmissibility of the virus.
For COVID-19, as for other epidemic diseases, the numbers of reported cases differed among people of various ages [2,18,37]. One study indicated an especially high rate of infection in the 30-79 years age group [3]. This is similar to the present study's finding that the highest numbers of cases occur among persons 15-44 and 45-64 years old. It is also important to distinguish imported cases from secondary ones. In this study, it was found that susceptible persons in Hunan Province were most often infected by imported cases from elsewhere (especially from Wuhan City). This finding suggests that the monitoring and management of imported cases should be improved further.
The age-specific model fitted the reported data effectively in three age groups, but less effectively for age group 1 (≤ 14 years old). The poor fit in age group 1 was a result of the low number of cases. However, the age-specific SEIAR model was still suitable for this study. These results were consistent with an earlier study on shigellosis using an age- and sex-specific SEIAR model [29]. According to the model for Hunan Province, the highest transmissibility occurred from ≥ 65 year-olds to 45-64 year-olds, followed by that from 45-64 year-olds to ≥ 65 year-olds, that among 15-44 year-olds, and that from 45-64 year-olds to ≤ 14 year-olds. Another study, adopting a generalized linear mixed model to divide the total population into three age groups, found that the risk of infection in persons ≥ 65 years old is higher than that in persons 15-64 years old (odds ratio: 1.47, 95% CI: 1.12-1.92) [8]. This result differs slightly from that of the present study, perhaps simply because the total population was divided into groups differently. Note that the generalized linear mixed model cannot be used to assess transmission features in the population, such as the influence of asymptomatic and imported cases.
In the model given here, the lowest transmissibility occurred among 45-64 year-olds, followed by that among ≥ 65 year-olds, that from 45-64 year-olds to 15-44 year-olds, and that from ≤ 14 year-olds to ≥ 65 year-olds. The "knock-out" simulation results differed from the SAR values, giving the following order: from ≥ 65 year-olds to 45-64 year-olds, among 15-44 year-olds, from 45-64 year-olds to ≥ 65 year-olds, and from 45-64 year-olds to ≤ 14 year-olds. According to the model, the virus was most likely to be transmitted between middle-aged and elderly people. This may relate to the custom of middle-aged people caring for their ill parents, resulting in a high contact frequency with the elderly. Although physical fitness and resistance in elders are lower than in younger adults, this study found a high transmissibility from ≥ 65 year-olds to 45-64 year-olds in Hunan Province. This may relate to the lifestyle differences between the generations and to clustering in families. A more detailed study and a larger sample are needed to test the model more extensively. Moreover, relatively high transmissibility was observed among 15-44 year-olds. This is similar to the age-specific transmissibility of influenza A (H1N1) [13]. Therefore, age-specific control and prevention interventions are necessary. The SAR value is very small (nearly zero) because the model was built on the total population of Hunan Province (68 988 303 persons), which is so large that it drowns out the signal. SAR was only used to compare relative transmissibility between different age groups, as in a similar study of a sex-based and age-based model of shigellosis in Hubei Province [29]. The results for Jilin Province (especially the importance of transmission among the elderly) differed somewhat from those for Hunan Province (where the most important transmission route is from middle-aged to elders). This may relate to the small sample of COVID-19 cases in Jilin Province. Nevertheless, the most significant transmission in Jilin did involve middle-aged and elderly individuals. Here, too, those 15-44 years old have relatively low susceptibility and those ≤ 14 years old very low susceptibility to COVID-19.
The reasons for the age-specific transmissibility differences remain unclear but may be related to the different kinds of contact characteristics of various age groups. Adults are more likely to work outside and to come into contact with different individuals in workplaces, buses, subways, or airplanes. Even under the powerful intervention and management implemented in China, young and middle-aged people would still engage in certain cluster activities such as visiting relatives and having parties. However, children or younger people may have stayed at home constantly during the outbreak and been less likely to be infected, except by adults or elderly people in the same family.
The results are more certain for the parameters that were estimated from first-hand data from Hunan Province. Some studies have indicated that infection may occur at the end of an incubation period [38,39]. According to one survey, in 59 out of 468 reports the infected person exhibited symptoms earlier than the person who infected them [40], implying that it is possible to infect other people during the incubation period. However, according to the survey conducted by Hunan Provincial CDC, there is no obvious evidence that exposed persons are infective during the incubation period. This issue will have to be resolved for effective application of this model in other areas. Bai et al. [41] reported an asymptomatic proportion of 0.17. Previous research demonstrated that an asymptomatically infected person can shed SARS-CoV-2 for 5 days [42]. This is consistent with the parameter values in the proposed model. However, not enough evidence or first-hand data analyses were available to provide clear epidemiological estimates of the parameters ω' and γ', which are related to asymptomatic individuals. Additional epidemiological data are required to explore these parameters. Furthermore, the results of the sensitivity analysis also showed that additional accurate first-hand data are needed to better determine the three parameters p, ω, and γ.
Fig. 10 Results of the "knock-out" simulation from the age-specific SEIAR model. βij refers to the relative transmission rate from age group i to age group j, where the subscripts i and j take values 1 to 4: 1, ≤ 14 years; 2, 15-44 years; 3, 45-64 years; 4, ≥ 65 years
Owing to the poor fit in the youngest age group, additional first-hand data are necessary to verify the age-specific SEIAR model. Xie et al. found a positive linear relationship between mean temperature and the number of COVID-19 cases with a threshold of 3°C [43]. Ma et al. found that temperature and humidity may affect COVID-19 mortality [44]. Furthermore, some studies have suggested that COVID-19 incidence is connected not only to meteorological factors but also to population size [45][46][47]. The present study focused on a short period; therefore, these factors were not considered owing to the limited availability of data. In future work, the meteorological factors related to COVID-19 should be explored further.