Risk factors associated with mortality of COVID-19 in 3125 counties of the United States
Infectious Diseases of Poverty volume 10, Article number: 3 (2021)
The number of cumulative confirmed cases of COVID-19 in the United States has risen sharply since March 2020. A county health ranking and roadmaps program has been established to identify factors associated with disparity in mobility and mortality of COVID-19 in all counties in the United States. The risk factors associated with county-level mortality of COVID-19 with various levels of prevalence are not well understood.
Using the data obtained from the County Health Rankings and Roadmaps program, this study applied a negative binomial design to the county-level mortality counts of COVID-19 as of August 27, 2020 in the United States. In this design, the infected counties were categorized into three levels of infection using clustering analysis based on time-varying cumulative confirmed cases from March 1 to August 27, 2020. COVID-19 patients were not analyzed individually but were aggregated at the county level, where the county-level deaths of COVID-19 were confirmed by the local health agencies. Clustering analysis and Kruskal–Wallis tests were used in our statistical analysis.
A total of 3125 infected counties were assigned to three classes corresponding to low, median, and high prevalence levels of infection. Several risk factors were significantly associated with the mortality counts of COVID-19: a higher level of air pollution (0.153, P < 0.001) was associated with increased mortality in the low prevalence counties, and elderly individuals were more vulnerable in both the median (0.049, P < 0.001) and high (0.114, P < 0.001) prevalence counties. Greater segregation between non-Whites and Whites (low: 0.015, P < 0.001; median: 0.025, P < 0.001; high: 0.019, P = 0.005) and a larger Hispanic population (low and median: 0.020, P < 0.001; high: 0.014, P = 0.009) were associated with a higher risk of death in all infected counties.
The mortality of COVID-19 depended on sex, race/ethnicity, and outdoor environment. Increased awareness of the impact of these significant factors may help decision makers, public health officials, and the general public to better control the risk of the pandemic, particularly by reducing the mortality of COVID-19.
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a novel coronavirus with an estimated average incubation period of 5.1 days. It spreads through person-to-person transmission and had reached 215 countries and regions, with over 24 million total confirmed cases, as of August 27, 2020. The United States had 5 867 785 confirmed cases on August 27, 2020, the highest in the world, up from only 69 confirmed cases on March 1, 2020.
The United States has been suffering from a severe epidemic, with COVID-19 related deaths occurring all over the country. For instance, New York City had the largest number of total deaths (23 674), accounting for the majority of deaths in the infected counties, while King County, Texas, had no reported infections as of August 27, 2020. It is therefore of great interest to identify the risk factors that influence the number of deaths of COVID-19. It is known that infectious diseases are affected by factors other than medical treatments [4, 5]. For example, influenza A is associated with obesity, and the spread of severe acute respiratory syndrome (SARS) in 2003 depended on seasonal temperature changes.
The County Health Rankings and Roadmaps program was launched by the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute. Since 2010, this program has provided a sustained annual source of data covering health outcomes, health behaviors, clinical care, social and economic factors, physical environment, and demographics. We explored putative risk factors that may affect the mortality of COVID-19 in different areas of the United States in order to increase awareness of the disparity and aid the development of risk reduction strategies.
We collected the number of cumulative confirmed cases and deaths from March 1 to August 27, 2020, for counties in the United States from the New York Times. The COVID-19 confirmed cases and deaths were identified by the laboratory RNA test and by specific criteria for symptoms and exposures from health departments and the US Centers for Disease Control and Prevention (CDC). The county health rankings reports for 2020 were compiled from the official website of the County Health Rankings and Roadmaps program. There were 77 measures for each of 3142 counties, covering health outcomes, health behaviors, clinical care, social and economic factors, physical environment, and demographics. We refer to the official website of the County Health Rankings and Roadmaps program for detailed information.
As of August 27, 2020, a total of 3208 counties reported confirmed cases in the United States, leaving 3125 counties with both confirmed cases of COVID-19 and county health ranking data recorded to be analyzed in this study. The total number of deaths as of August 27, 2020 was considered as the outcome of this study.
Assessment of covariates in health factors
We divided the putative risk factors into five categories: health behaviors (e.g., access to exercise opportunities, insufficient sleep), clinical care (e.g., primary care physicians ratio), social and economic factors (e.g., racial segregation index), physical environment (e.g., transit problems and air quality), and demographics (age, sex, rural residence, and race/ethnicity). For example, previous studies found that air pollution may be related to high levels of COVID-19 and that the elderly population was at high risk from COVID-19. Besides these identified risk factors, we were interested in the adverse health factors that may be linked to the mortality of COVID-19. The descriptive definitions, sources, and literature for the 12 risk factors are presented in Table 1. All deaths resulted from complications of COVID-19.
The trend of the cumulative confirmed cases varied greatly across counties of the United States. We used the partitioning around medoids (PAM) clustering algorithm [12, 13] to assign counties with similar trends to a homogeneous class after standardizing the time series of cumulative confirmed cases from March 1 to August 27, 2020. Based on the clustering results, we used the Kruskal–Wallis test to detect whether there were significant differences in the distributions of the 12 risk factors across the different classes of counties. The 12 risk factors were used to build a negative binomial model [15, 16] for each class of counties. The analysis was conducted in R version 3.6.1, an open source statistical analysis software available from the R project at https://cloud.r-project.org/.
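The clustering step can be illustrated with a small, self-contained sketch of PAM in Python (the study itself used R, and its code is not reproduced here). The function names, the Euclidean distance, the first-improvement swap rule, and the toy series are all assumptions made for illustration; the input series are assumed to be already standardized, since the text does not state how the standardization was done.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points, k):
    """Partitioning around medoids: greedy BUILD, then SWAP until no gain."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # BUILD: repeatedly add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # SWAP: exchange a medoid for a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True

    # Each point is labelled with the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Made-up cumulative-case series for eight hypothetical counties.
toy_series = [
    [0, 1, 2], [0, 1, 3], [0, 2, 4],
    [0, 10, 20], [0, 11, 21], [0, 12, 19],
    [0, 50, 100], [0, 52, 98],
]
medoids, labels = pam(toy_series, k=3)
print("medoid indices:", medoids)
print("class labels:", labels)
```

Each medoid is an actual observation, which is why the results can name a representative county for each class. Computing the full distance matrix is affordable for a few thousand counties but grows quadratically with the number of series.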
We randomly divided counties (samples) into training (70% of the counties) and testing (30% of the counties) in each class. The model obtained from the training data was employed to predict the death counts of COVID-19 in the testing data, and the accuracy was assessed by the root mean square error (RMSE) of the mortality ratio (the number of deaths divided by the number of cumulative confirmed cases).
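A minimal sketch of this evaluation step in Python follows. The helper names, the fixed random seed, and the toy numbers are hypothetical, and the predicted mortality ratio is assumed to be the predicted deaths divided by the observed cumulative confirmed cases, which the text does not spell out.

```python
import math
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mortality_ratio_rmse(predicted_deaths, observed_deaths, confirmed_cases):
    """RMSE between predicted and observed deaths-per-confirmed-case ratios."""
    squared_errors = [
        (pred / cases - obs / cases) ** 2
        for pred, obs, cases in zip(predicted_deaths, observed_deaths, confirmed_cases)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical county identifiers for one prevalence class.
counties = ["county_%d" % i for i in range(10)]
train, test = train_test_split(counties)
print(len(train), "training counties,", len(test), "testing counties")

# Made-up predictions and observations for two testing counties.
print(mortality_ratio_rmse([10, 20], [12, 18], [100, 200]))
```

The split would be applied separately within each class, so that every class keeps the same 70/30 proportion.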
Three classes of county-level infection in the United States
The clustering analysis assigned the 3125 counties to three classes. There were 2751 counties in the first class, with the lowest overall cumulative confirmed cases; its medoid was Halifax County in Virginia. There were 294 counties in the second class, with a median level of overall cumulative confirmed cases; its medoid was St. Clair County in Illinois. There were 80 counties in the third class, with the highest overall cumulative confirmed cases; its medoid was Marion County in Indiana. Here, the PAM algorithm selected the county with the most representative data as the medoid of a class [12, 13]. The geographical distribution of the counties by class is shown in Fig. 1, where the size of a circle indicates the cumulative confirmed cases on August 27, 2020. The distribution of deaths on August 27, 2020, which clearly differed among the three classes, is also presented in Fig. 1. Note that the east, south, and west coasts were the areas most severely hit by COVID-19. Most counties in the high prevalence class were in Massachusetts, New York, New Jersey, Florida, Texas, and California.
Distributions of 12 selected risk factors in the three classes of counties
The distributions of the 12 selected risk factors by class of counties are displayed in Fig. 2. The distributions were significantly different (P < 0.001) for all 12 risk factors. For example, the average population in the low prevalence class was 38 444, which was 10% and 3% of the average populations in the median and high prevalence classes, respectively. The average proportion of rural residents in the low prevalence class was 64.47%, versus 2.72% in the high prevalence class. The segregation index of non-Whites versus Whites was the largest in the high prevalence class and the smallest in the low prevalence class.
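To make the test behind these P values concrete, here is a minimal Python sketch of the Kruskal–Wallis H statistic with average ranks and the usual tie correction. The function names and the sample values are made up for illustration. The P value helper relies on the chi-square approximation, whose survival function with 2 degrees of freedom (three groups) is exp(-H/2).

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1, ..., j
        for _, g in pooled[i:j]:
            rank_sums[g] += average_rank
        tied = j - i
        tie_term += tied ** 3 - tied
        i = j
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are identical; H is undefined")
    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(values) for rs, values in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def p_value_three_groups(h):
    """Chi-square approximation with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Made-up rural-resident percentages for a few counties in each class.
low = [70.1, 62.3, 55.0, 81.4, 66.6]
median = [20.5, 31.2, 15.8, 27.9]
high = [2.1, 4.4, 0.9, 3.3]
h = kruskal_wallis_h([low, median, high])
print("H =", h, "approximate P =", p_value_three_groups(h))
```

Because the statistic is computed from ranks, it makes no normality assumption, which suits skewed county-level measures such as population size.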
Factors influencing mortality of COVID-19 in the three classes
There were three common factors, namely, residential segregation between non-Whites and Whites, resident population, and the percentage of Hispanic population, which had statistically significant (P < 0.05) effects on mortality in all classes. The negative binomial model was used to understand the within-class effects of residential segregation between non-Whites and Whites and the percentage of Hispanic population on the mortality of COVID-19, as shown in Fig. 3. Note that the higher the values of both residential segregation between non-Whites and Whites and the percentage of Hispanic population, the higher the mortality of COVID-19. In the high prevalence class, an increase in both the residential segregation between non-Whites and Whites and the percentage of Hispanic population was associated with more deaths than in the other two classes of counties.
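The text does not write out the model. As a hedged illustration, a standard negative binomial regression with a log link (the NB2 form, assumed here rather than taken from the source) for the death count Y_i in county i with the 12 risk factors x_{i1}, ..., x_{i,12} can be written as follows, where the coefficients and the dispersion parameter theta are estimated separately for each class:

```latex
Y_i \sim \mathrm{NB}(\mu_i, \theta), \qquad
\log \mu_i = \beta_0 + \sum_{j=1}^{12} \beta_j x_{ij}, \qquad
\operatorname{Var}(Y_i) = \mu_i + \frac{\mu_i^{2}}{\theta}.
```

The extra term \mu_i^2/\theta lets the variance exceed the mean, which is the usual reason for preferring this model to a Poisson model for overdispersed counts.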
The significant factors specific to each class based on the training data are presented in Table 2. Specifically, in the low prevalence class, nine variables were significantly associated with the mortality of COVID-19. Higher values of the average daily density of PM2.5 (0.153, P < 0.001), the percentage of the workforce driving alone to work (0.039, P < 0.001), the percentage of the workforce that had a commute of more than 30 min driving alone (0.015, P < 0.001), the percentage of adults who reported sleeping less than 7 h on average (0.073, P < 0.001), resident population (P < 0.001), the percentage of Hispanic (0.020, P < 0.001) and female population (0.054, P < 0.001), and the segregation index (0.015, P < 0.001) were associated with a significantly higher number of deaths, while more people living in rural areas (− 0.014, P < 0.001) was associated with fewer deaths of COVID-19.
In the median prevalence class, eight variables were significantly associated with the deaths of COVID-19. Higher values of the percentage of the workforce driving alone to work (0.016, P = 0.008), the percentage of the workforce that had a commute of more than 30 min driving alone (0.013, P < 0.001), resident population (P < 0.001), the percentage of population aged over 65 (0.049, P < 0.001), the percentage of Hispanic (0.020, P < 0.001) and female population (0.127, P = 0.001), and the segregation index (0.025, P < 0.001) were associated with an increase in deaths, whereas more people living in rural areas (− 0.011, P = 0.007) was associated with a decrease in deaths of COVID-19.
In the high prevalence class, four variables were significantly associated with mortality. Higher values of resident population (P < 0.001), the percentage of population aged over 65 (0.114, P < 0.001), the percentage of Hispanic population (0.014, P = 0.009), and the segregation index (0.019, P = 0.005) were associated with more deaths.
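If the reported estimates are coefficients on the log scale, as is standard for negative binomial regression (an assumption, since the link function is not stated in the text), each one can be read as a multiplicative change in the expected death count per one-unit increase in the covariate. The short Python sketch below converts a few of the reported values; the function name is hypothetical, and the units of each covariate follow Table 1, which is not reproduced here.

```python
import math


def rate_ratio(coefficient, increase=1.0):
    """Multiplicative change in expected deaths for a given covariate increase,
    assuming a log link (not confirmed by the source)."""
    return math.exp(coefficient * increase)


# Coefficients as reported in the text for selected class/factor pairs.
reported = {
    ("low", "average daily density of PM2.5"): 0.153,
    ("low", "percentage living in rural areas"): -0.014,
    ("median", "percentage aged over 65"): 0.049,
    ("high", "percentage aged over 65"): 0.114,
}

for (prevalence_class, factor), beta in reported.items():
    print("%s | %s: rate ratio %.3f" % (prevalence_class, factor, rate_ratio(beta)))
```

Under that assumption, the PM2.5 coefficient of 0.153 corresponds to a rate ratio of about 1.17 per unit increase with the other covariates held fixed, while the negative rural coefficient gives a rate ratio below 1.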
For each class of counties, the model obtained from the training data was employed to predict the deaths of COVID-19 on August 27, 2020 using the testing data. The corresponding RMSE values for the mortality ratio were 0.056%, 0.041%, and 0.088%, respectively, in the low, median, and high prevalence classes.
Using the time trends of the cumulative confirmed cases in 3125 counties in the United States, we categorized those counties into three levels of infection. The low prevalence class accounted for 88% of the 3125 counties. Their resident population was remarkably smaller than that of the other two classes of counties. However, resident population size was associated with higher mortality of COVID-19 regardless of the level of COVID-19 prevalence. A higher population density may increase contacts among residents and hamper social distancing [17, 18], leading to a higher risk of mortality from COVID-19. Conversely, a higher percentage of residents living in rural areas in both the low and median prevalence classes of counties may reduce the mortality. Disparities in race and ethnicity were found in the infected populations. For example, Blacks were reported to be prone to COVID-19 [19, 20], and the living settings of racial/ethnic minorities were found to be more crowded, making social distancing difficult. In this study, we found that Hispanics were more vulnerable. Further investigation is warranted to study the racial disparity in the mortality of COVID-19. Moreover, the segregation index between non-Whites and Whites revealed the racial disparity in health, leading to differences in health status not only at the individual level but also at the community level. A higher value of the segregation index indicated poorer health status, which may increase the mortality of COVID-19. This health inequality was associated with higher mortality rates of COVID-19 in all classes of counties.
For the low and the median prevalence classes of counties, a larger workforce driving alone to work and commuting long distances may increase levels of anxiety, leading to higher mortality from COVID-19. A higher percentage of long-distance commuting workforce was also linked to a high level of anxiety among commuters. In addition, the substantial time spent by long-distance commuters could inhibit their healthy behaviors. The stress and less healthy behaviors may increase an individual’s vulnerability to COVID-19 [27,28,29]. Long-distance commuting may also be necessary for people who work in more densely populated areas where the risk of COVID-19 is high.
The counties in the low and the median prevalence classes together accounted for 97.44% of the infected counties, and in these counties a higher percentage of female population was associated with higher mortality of COVID-19.
The percentage of adults with inadequate sleeping time was found to be associated with higher mortality of COVID-19 in the low prevalence class of counties. Sleeping time was reported to be associated with the immune system. The more people who had inadequate sleeping time, the more adverse effects of sleep loss on immunity were identified. Air quality was also reported to be associated with the mortality rate of COVID-19 [10, 30, 31].
For both the median and the high prevalence classes of counties, there was an age trend in the mortality rate of COVID-19. In those counties, a higher percentage of residents aged over 65 was associated with a higher mortality rate of COVID-19.
One caveat of this study is that we analyzed data only up to August 27, 2020; as the data evolve, the risk factor dynamics may change accordingly.
This study identified several significant risk factors associated with the mortality of COVID-19, and our findings are valuable and timely for decision-makers developing strategies to reduce the mortality of COVID-19. The study relied on mortality data as of August 27, 2020, and the counties were randomly divided into training and testing data only once. Nevertheless, we offer an epidemiological picture that facilitates the identification of important factors influencing the mortality of COVID-19 across different levels of infected counties in the United States. Regardless of region, the factors linked to poor health status contributed to higher mortality of COVID-19. Improving clinical care and eliminating racial health inequality, combined with improving the physical environment, are expected to significantly decrease the mortality rate of COVID-19. Thus, we recommend that local governments reduce physical and psychological risks in residential environments.
Availability of data and materials
The data that support the findings of this study are available from the New York Times and the County Health Rankings and Roadmaps program website. The data and R files supporting the conclusions of this article are available at https://github.com/tingT0929/Risk-factors-associated-with-mortality-of-COVID-19.
Lauer SA, Grantz KH, Bi Q, Jones FK, Zheng Q, Meredith HR, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann Intern Med. 2020;172(9):577–82. https://doi.org/10.7326/M20-0504.
China, CDC. Distribution of COVID-19 cases in the world. 2020. http://2019ncov.chinacdc.cn/2019-nCoV/global.html. Accessed 30 Aug 2020.
CDC. Coronavirus disease 2019 (COVID-19). 2020. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html. Accessed 30 Aug 2020.
Hadler JL, Yousey-Hindes K, Pérez A, Anderson EJ, Bargsten M, Bohm SR, et al. Influenza-related hospitalizations and poverty levels—United States, 2010–2012. Morb Mortal Wkly Rep. 2016;65(5):101–5.
Noppert GA, Yang Z, Clarke P, Ye W, Davidson P, Wilson ML. Individual- and neighborhood-level contextual factors are associated with Mycobacterium tuberculosis transmission: genotypic clustering of cases in Michigan, 2004–2012. Ann Epidemiol. 2017;27(6):371-376.e5. https://doi.org/10.1016/j.annepidem.2017.05.009.
Maier HE, Lopez R, Sanchez N, Ng S, Gresh L, Ojeda S, et al. Obesity increases the duration of influenza A virus shedding in adults. J Infect Dis. 2018;218(9):1378–82. https://doi.org/10.1093/infdis/jiy370.
Lin K, Fong DY-T, Zhu B, Karlberg J. Environmental factors on the SARS epidemic: air temperature, passage of time and multiplicative effect of hospital infection. Epidemiol Infect. 2006;134(2):223–30. https://doi.org/10.1017/S0950268805005054.
Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute. State Rankings Data & Reports. 2020. https://www.countyhealthrankings.org/reports/county-health-rankings-reports. Accessed 20 Aug 2020.
The New York Times. Coronavirus in the U.S.: latest map and case count. 2020. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html. Accessed 27 Aug 2020.
Conticini E, Frediani B, Caro D. Can atmospheric pollution be considered a co-factor in extremely high level of SARS-CoV-2 lethality in Northern Italy? Environ Pollut. 2020;261:114465. https://doi.org/10.1016/j.envpol.2020.114465.
Onder G, Rezza G, Brusaferro S. Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy. JAMA. 2020;323(18):1775–6. https://doi.org/10.1001/jama.2020.4683.
Zhang LS, Yang MJ, Lei DJ. An improved PAM clustering algorithm based on initial clustering centers. Appl Mech Mater. 2012;135:244–9. https://doi.org/10.4028/www.scientific.net/AMM.135-136.244.
Lei D, Zhu Q, Chen J, Lin H. Automatic PAM clustering algorithm for outlier detection. J Softw. 2012;7(5):1045.
Brunner E, Konietschke F, Bathke AC, Pauly M. Ranks and pseudo-ranks-paradoxical results of rank tests. arXiv. 2018;1802.05650.
Hilbe JM. Negative binomial regression. Cambridge: Cambridge University Press; 2011.
Zeileis A, Kleiber C, Jackman S. Regression models for count data in R. J Stat Softw. 2008;27(8):1–25. https://doi.org/10.18637/jss.v027.i08.
Dowd JB, Andriano L, Brazel DM, Rotondi V, Block P, Ding X, et al. Demographic science aids in understanding the spread and fatality rates of COVID-19. PNAS. 2020;117(18):9696–8. https://doi.org/10.1073/pnas.2004911117.
Greenstone M, Nigam V. Does social distancing matter? University of Chicago, Becker Friedman Institute for Economics. Working Paper. 2020; 202026. https://doi.org/10.2139/ssrn.3561244.
Hooper MW, Nápoles AM, Pérez-Stable E. COVID-19 and racial/ethnic disparities. JAMA. 2020;323(24):2466–7. https://doi.org/10.1001/jama.2020.8598.
Laurencin CT, McClinton A. The COVID pandemic: a call to action to identify and address racial and ethnic disparities. J Racial Ethn Health Disparities. 2020;7:398–402. https://doi.org/10.1007/s40615-020-00756-0.
Noonan AS, Velasco-Mondragon HE, Wagner FA. Improving the health of African Americans in the USA: an overdue opportunity for social justice. Public Health Rev. 2016;37:12. https://doi.org/10.1186/s40985-016-0025-4.
Williams DR, Collins C. Racial residential segregation. In: LaVeist TA, Isaac LA, editors. Race, ethnicity, and health: a public health reader. New Jersey: Wiley; 2012. p. 331–46.
Van Rooy DL. Effects of automobile commute characteristics on affect and job candidate evaluations: a field experiment. Environ Behav. 2006;38(5):626–55. https://doi.org/10.1177/0013916505280767.
Wolin KY, Bennett GG, McNeill LH, Sorensen G, Emmons KM. Low discretionary time as a barrier to physical activity and intervention uptake. Am J Health Behav. 2008;32(6):563–9. https://doi.org/10.5993/AJHB.32.6.1.
Besedovsky L, Lange T, Haack M. The sleep-immune crosstalk in health and disease. Physiol Rev. 2019;99(3):1325–80. https://doi.org/10.1152/physrev.00010.2018.
Irwin M. Effects of sleep and sleep loss on immunity and cytokines. Brain Behav Immun. 2002;16(5):503–12. https://doi.org/10.1016/S0889-1591(02)00003-X.
Mazza C, Ricci E, Biondi S, Colasanti M, Ferracuti S, Napoli C, et al. A nationwide survey of psychological distress among Italian people during the COVID-19 pandemic: immediate psychological responses and associated factors. Int J Environ Res Public Health. 2020;17(9):3165. https://doi.org/10.3390/ijerph17093165.
Qiu J, Shen B, Zhao M, Wang Z, Xie B, Xu Y. A nationwide survey of psychological distress among Chinese people in the COVID-19 epidemic: implications and policy recommendations. Gen Psychiatr. 2020;33(2):e100213. https://doi.org/10.1136/gpsych-2020-100213.
Wang C, Pan R, Wan X, Tan Y, Xu L, Ho CS, et al. Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China. Int J Environ Res Public Health. 2020;17(5):1729. https://doi.org/10.3390/ijerph17051729.
Wu X, Nethery RC, Sabath BM, Braun D, Dominici F. Exposure to air pollution and COVID-19 mortality in the United States. MedRxiv. 2020. https://doi.org/10.1101/2020.04.05.20054502.
Contini D, Costabile F. Does air pollution influence COVID-19 outbreaks? Atmosphere (Basel). 2020;11(4):377. https://doi.org/10.3390/atmos11040377.
Martelletti L, Martelletti P. Air pollution and the novel Covid-19 disease: a putative disease risk factor. SN Compr Clin Med. 2020;2:383–7. https://doi.org/10.1007/s42399-020-00274-4.
Wenham C, Smith J, Morgan R. COVID-19: the gendered impacts of the outbreak. Lancet. 2020;395(10227):846–8. https://doi.org/10.1016/S0140-6736(20)30526-2.
Abdulamir AS, Hafidh RR. The possible immunological pathways for the variable immunopathogenesis of COVID-19 infections among healthy adults, elderly and children. Electron J Gen Med. 2020;17(4):em202. https://doi.org/10.29333/ejgm/7850.
We would like to thank all individuals who are collecting epidemiological data of the COVID-19 outbreak around the world.
This work was supported by the National Natural Science Foundation of China (No. 71991474; No. 12001554; No. 11771462), the Key Research and Development Program of Guangdong, China (No. 2019B020228001), the Science and Technology Program of Guangzhou, China (No. 202002030129), and the Pearl River S&T Nova Program of Guangzhou (No. 201806010142). The funding agencies had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript.
The authors declare that they have no competing interests.
Tian, T., Zhang, J., Hu, L. et al. Risk factors associated with mortality of COVID-19 in 3125 counties of the United States. Infect Dis Poverty 10, 3 (2021). https://doi.org/10.1186/s40249-020-00786-0