Risk factors associated with mortality of COVID-19 in 2692 counties of the United States

Background The number of cumulative confirmed cases of COVID-19 in the United States has risen sharply since March 2020. A county health ranking and roadmaps program has been established to identify factors associated with disparity in mobility and mortality of COVID-19 in all counties in the United States. Methods To various negative binomial was to the into three using


INTRODUCTION
COVID-19 is an infectious disease caused by a novel coronavirus with an estimated average incubation period of 5.1 days (Lauer et al., 2020). It spreads through person-to-person transmission and had reached 210 countries and regions, with over 2 million confirmed cases in total, as of April 15, 2020 (National Health Commission of the People's Republic of China, 2020). The United States had 652,474 confirmed cases on April 15, 2020, the highest in the world, up from only 69 confirmed cases on March 1, 2020 (Centers for Disease Control and Prevention, 2020).
The United States has been suffering from a severe epidemic, with COVID-19 related deaths occurring all over the country, although unevenly. For instance, New York City had the largest number of total deaths, accounting for a large share of deaths in the country, while Madison County, North Carolina, had no confirmed infections (Centers for Disease Control and Prevention, 2020). Therefore, it is of great interest to identify the risk factors that influence the number of COVID-19 deaths. It is known that infectious diseases are affected by factors other than medical treatments (Hadler et al., 2016; Noppert et al., 2017). For example, influenza A is associated with obesity (Maier et al., 2018), and the spread of SARS depends on seasonal temperature changes (Lin et al., 2006).

The County Health Rankings and Roadmaps program was launched by the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute (A Robert Wood Johnson Foundation program, 2020). Since 2010, this program has provided an annual, sustained source of data on health outcomes, health behaviors, clinical care, social and economic factors, physical environment, and demographics. We explored putative risk factors that may affect the mortality of COVID-19 in different areas of the United States in order to increase awareness of the disparity and aid the development of risk reduction strategies.

Data sources
We collected the number of cumulative confirmed cases and deaths from March 1 to April 15, 2020, for counties in the United States from the New York Times (New York Times, 2020). The COVID-19 confirmed cases and deaths were identified by laboratory RNA tests and by specific criteria for symptoms and exposures from health departments and the U.S. Centers for Disease Control and Prevention (CDC). The county health rankings reports for 2020 were compiled from the official website of the County Health Rankings and Roadmaps program (A Robert Wood Johnson Foundation program, 2020). There were 77 measures for each of 3,142 counties, covering health outcomes, health behaviors, clinical care, social and economic factors, physical environment, and demographics. We refer to the official website of the County Health Rankings and Roadmaps program (A Robert Wood Johnson Foundation program, 2020) for detailed information.

Study areas
As of April 15, 2020, a total of 2,692 counties in the United States had reported confirmed cases; the remaining 450 counties, which had no confirmed cases of COVID-19, were excluded from this study. The outcome of this study was the total number of deaths as of April 15, 2020.

Assessment of covariates in health factors
We divided the putative risk factors (A Robert Wood Johnson Foundation program, 2020) into 5 categories: health behaviors (e.g., access to exercise opportunities, insufficient sleep), clinical care (e.g., primary care physicians ratio), social and economic factors (e.g., racial segregation index), physical environment (e.g., transit problems and air quality), and demographics (age, sex, rural, and race/ethnicity). For example, previous studies suggested that air pollution may be related to high levels of COVID-19 (Conticini et al., 2020) and that the elderly population was at high risk from COVID-19 (Onder et al., 2020). Besides these identified risk factors, we were interested in adverse health factors that may be linked to the mortality of COVID-19. Table 1 presents the definitions, sources, and supporting literature for the 12 risk factors. All deaths resulted from complications of COVID-19.

medRxiv preprint doi: https://doi.org/10.1101/2020.05.18.20105544; this version posted June 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Statistical analysis
The trend of the cumulative confirmed cases varied greatly across counties in the United States. We used the partitioning around medoids (PAM) clustering algorithm (Zhang et al., 2012; Lei et al., 2012) to assign counties with similar trends to homogeneous classes after standardizing the time series of cumulative confirmed cases from March 1 to April 15, 2020. Based on the clustering results, we used the Kruskal-Wallis test (Brunner et al., 2018) to detect whether the distributions of the 12 risk factors differed significantly across classes of counties. The 12 risk factors were then used to build a negative binomial model (Hilbe, 2011; Zeileis et al., 2008) for each class of counties. The analysis was conducted in R version 3.6.1.
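As a rough illustration of the clustering step, the sketch below standardizes cumulative-case series and runs a small partitioning-around-medoids procedure in plain Python. The analysis itself was run in R, and the text does not say how the series were standardized or which distance was used. The per-day z-scoring across counties, the Euclidean distance, the first-improvement SWAP rule, the function names, and the toy counts are therefore all assumptions made for this example, not the authors' implementation.

```python
import math
import statistics


def standardize_by_day(series_by_county):
    """Z-score each day (column) across counties; an assumed standardization."""
    columns = list(zip(*series_by_county))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
    return [[(x - m) / s for x, m, s in zip(row, means, sds)]
            for row in series_by_county]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points, k):
    """PAM-style clustering: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    medoids = []
    # BUILD: add, one at a time, the point that lowers the total cost the most.
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(dist, medoids + [i])))
    # SWAP: replace a medoid by a non-medoid whenever that lowers the total cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    # Each county is labeled with the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical cumulative counts for six counties over four days.
toy_series = [
    [1, 2, 3, 4], [1, 2, 4, 5], [0, 1, 2, 3],   # slow growth
    [10, 20, 40, 80], [12, 25, 45, 90],         # intermediate growth
    [100, 300, 900, 2000],                      # rapid growth
]
medoids, labels = pam(standardize_by_day(toy_series), k=3)
print("medoid rows:", medoids, "labels:", labels)
```

Each medoid is an actual county from the data, which is why the results can name a representative county for every class.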

Validation analysis
We randomly divided counties (samples) into training (70% of the counties) and testing (30% of the counties) in each class. The model obtained from the training data was employed to predict the death counts of COVID-19 in the testing data, and the accuracy was assessed by the root mean square error (RMSE) of the mortality ratio (the number of deaths divided by the number of cumulative confirmed cases).
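The validation step can be sketched as follows. This is a minimal illustration, assuming a simple shuffled 70/30 split and an RMSE computed on deaths divided by cumulative confirmed cases. The function names, the seed, and the numbers are hypothetical, and the model that produces the predicted deaths is not shown.

```python
import math
import random


def split_counties(county_ids, train_fraction=0.7, seed=0):
    """Shuffle county identifiers and split them into training and testing lists."""
    ids = list(county_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


def mortality_ratio_rmse(predicted_deaths, observed_deaths, confirmed_cases):
    """RMSE between predicted and observed deaths, each divided by confirmed cases."""
    squared_errors = [
        (pred / cases - obs / cases) ** 2
        for pred, obs, cases in zip(predicted_deaths, observed_deaths, confirmed_cases)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: ten county identifiers and two testing counties.
train_ids, test_ids = split_counties(range(10), seed=1)
rmse = mortality_ratio_rmse(
    predicted_deaths=[12.0, 3.0], observed_deaths=[10, 5], confirmed_cases=[100, 50]
)
print(len(train_ids), len(test_ids), round(rmse, 4))
```

Because the study kept only counties with at least one confirmed case, the division by confirmed cases is well defined for every county in a class.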

Three classes of county-level infection in the United States
The clustering analysis assigned the 2,692 counties to 3 classes. The first class contained 2,523 counties and had the lowest overall cumulative confirmed cases; its medoid was Austin County, Texas. The second class contained 141 counties with a median level of overall cumulative confirmed cases; its medoid was Monroe County, Pennsylvania. The third class contained 28 counties with the highest overall cumulative confirmed cases; its medoid was Fairfield County, Connecticut. Here, the PAM algorithm selected the county with the most representative data as the medoid of each class (Hilbe, 2011; Zeileis et al., 2008). The geographical distribution of the counties by class is shown in a figure.
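The class sizes can be checked against the county totals reported elsewhere in the text. The short sketch below uses only numbers stated in the paper (3,142 counties with health measures, and class sizes of 2,523, 141, and 28); the variable names are illustrative.

```python
# Counties per class, as reported in the text.
class_sizes = {"low": 2523, "median": 141, "high": 28}

total_with_cases = sum(class_sizes.values())   # counties included in the study
excluded = 3142 - total_with_cases             # counties without confirmed cases

# Share of included counties in each class, in percent.
shares = {name: round(100 * size / total_with_cases, 1)
          for name, size in class_sizes.items()}
print(total_with_cases, excluded, shares)
```

The three classes add up to the 2,692 included counties, and the low prevalence class makes up 93.7% of them.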

The distributions were significantly different (P < 0.001) for most of the 12 risk factors. For example, the average population in the low prevalence class was 63,438, which was 8% and 4% of the average populations in the median and high prevalence classes, respectively. The average proportion of rural residents in the low prevalence class was 57.58%, versus 2.5% in the high prevalence class. The segregation index of non-Whites versus Whites was largest in the high prevalence class and smallest in the low prevalence class.
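To show how a between-class comparison of this kind works, the sketch below computes the Kruskal-Wallis H statistic by hand for one risk factor across three classes. It is a plain-Python illustration rather than the authors' R code, and the rural percentages are hypothetical. With three groups the reference chi-square distribution has 2 degrees of freedom, so its upper tail is exactly exp(-H/2); `p_value_three_groups` relies on that and is valid only for three groups.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    # Give each distinct value the average of the 1-based ranks it occupies.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / correction


def p_value_three_groups(h):
    """Chi-square upper tail with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical percentages of rural residents in a few counties per class.
rural_pct = {
    "low": [57.6, 61.2, 48.9, 70.3],
    "median": [20.1, 15.4, 25.7],
    "high": [2.5, 1.1, 4.0],
}
h_stat = kruskal_wallis_h(list(rural_pct.values()))
p_value = p_value_three_groups(h_stat)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

The test uses ranks rather than raw values, which suits skewed county-level measures such as population or percentage rural.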


Factors influencing mortality of COVID-19 in the three classes
One common factor, residential segregation between non-Whites and Whites, had a statistically significant (P < 0.05) effect on mortality in all classes. The negative binomial model was used to understand the within-class effects of this factor on mortality of COVID-19, as shown in Figure 3. The higher the residential segregation between non-Whites and Whites, the higher the mortality of COVID-19. In the high prevalence class, an increase in residential segregation between non-Whites and Whites was associated with more deaths than in the other two classes of counties.
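For reference, a negative binomial regression of the kind cited (Hilbe, 2011; Zeileis et al., 2008) is commonly written as below. This is a sketch under assumptions: the paper does not state the link function, the parameterization, or whether an offset was used. Here $Y_i$ is the death count in county $i$ of a given class, $x_{ij}$ is the value of the $j$-th of the 12 risk factors, $\mu_i$ is the expected death count, and $\theta$ is a dispersion parameter; a log link and the quadratic (NB2) variance function are assumed.

```latex
Y_i \sim \mathrm{NB}(\mu_i, \theta), \qquad
\log \mu_i = \beta_0 + \sum_{j=1}^{12} \beta_j x_{ij}, \qquad
\operatorname{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta}
```

Under this form, a positive $\beta_j$ means that expected deaths rise multiplicatively, by a factor of $e^{\beta_j}$, for each one-unit increase in $x_{ij}$ with the other factors held fixed.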


In the median prevalence class, three variables were significantly associated with COVID-19 deaths. Higher values of the percentage of the workforce driving alone to work (0.058, P = 0.006), the segregation index (0.033, P = 0.004), and the percentage of the workforce that commuted more than 30 minutes driving alone (0.031, P = 0.002) were associated with more deaths.


In the high prevalence class, four variables were significantly associated with mortality. Higher values of the average daily density of PM2.5 (0.186, P = 0.005), the segregation index (0.032, P = 0.023), the percentage of adults who reported sleeping less than 7 hours on average (0.081, P = 0.021), and the percentage of the population aged over 65 (0.221, P = 0.001) were associated with more deaths.
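If the values in parentheses are coefficients from a negative binomial model with a log link (an assumption; the text does not state the link or the units of each covariate), exponentiating them gives rate ratios. The sketch below does this for the four high prevalence class coefficients; the function name is illustrative.

```python
import math

# Coefficients reported in the text for the high prevalence class.
high_class_coefficients = {
    "PM2.5 average daily density": 0.186,
    "segregation index": 0.032,
    "% adults sleeping < 7 hours": 0.081,
    "% population aged over 65": 0.221,
}


def rate_ratio(beta, delta=1.0):
    """Multiplicative change in expected deaths for a `delta` increase in a covariate,
    assuming a log link."""
    return math.exp(beta * delta)


rate_ratios = {name: rate_ratio(beta) for name, beta in high_class_coefficients.items()}
for name, ratio in rate_ratios.items():
    print(f"{name}: expected deaths multiplied by {ratio:.3f} per one-unit increase")
```

Read this way, the coefficient of 0.221 for the population aged over 65 corresponds to a rate ratio of about 1.25, that is, roughly 25% more expected deaths per one-unit increase, with the other covariates held fixed.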
For each class of counties, the model obtained from the training data was employed to predict COVID-19 deaths as of April 15, 2020 using the testing data. The corresponding RMSE values for the mortality ratio were 0.09, 0.07, and 0.03 in the low, median, and high prevalence classes, respectively.


DISCUSSION
Using the time trends of the cumulative confirmed cases in 2,692 counties in the United States, we categorized the counties into three levels of infection. The low prevalence class accounted for 93.7% of the 2,692 counties. Its resident population was markedly smaller than that of the other two classes. Thus, resident population density appeared to be a significant contributor to the mortality of COVID-19. A higher population density may increase contacts despite social distancing (Dowd et al., 2020; Greenstone and Nigam, 2020), leading to a higher risk of COVID-19 mortality. Conversely, a higher percentage of residents living in rural areas in the median prevalence class may reduce mortality. The segregation index between non-Whites and Whites revealed racial disparity in health, which leads to differences in health status not only at the individual level but also at the community level (Williams and Collins, 2012). Higher values of the segregation index indicated poorer health status, which may increase the mortality of COVID-19 (Dowd et al., 2020). This health inequality was associated with higher COVID-19 mortality in all classes of counties.
For the low prevalence class of counties, a higher percentage of the workforce commuting long distances has been linked to a high level of anxiety among commuters (Van Rooy, 2006). Sleep duration has been reported to be associated with health (Besedovsky et al., 2019), and adverse effects of inadequate sleep on immunity have been identified (Irwin, 2002). These two factors together may increase psychological distress and subsequently make people feel vulnerable to COVID-19 (Mazza et al., 2020; Qiu et al., 2020; Wang et al., 2020). Disparities in race and ethnicity were found in the infected populations. For example, Blacks were reported to be prone to COVID-19 (Hooper et al., 2020; Laurencin and McClinton, 2020). However, Hispanic populations in more rural areas may be better protected against COVID-19.
For the median prevalence class of counties, a larger share of the workforce driving alone to work and commuting long distances may increase levels of anxiety (Van Rooy, 2006), which may contribute to higher COVID-19 mortality.


For the high prevalence class of counties, there was an age trend in the mortality rate of COVID-19. Those counties had a higher percentage of residents aged over 65, which was associated with a higher COVID-19 mortality rate (Onder et al., 2020). Air quality was also associated with the mortality rate of COVID-19 (Conticini et al., 2020; Wu et al., 2020; Contini and Costabile, 2020).
This study identified several significant risk factors associated with the mortality of COVID-19, and our findings are timely and may help decision-makers develop strategies to reduce COVID-19 mortality. The study has limitations: it relied on mortality data as of April 15, 2020, and the counties were randomly divided into training and testing data only once. Nevertheless, we offer an epidemiological picture that facilitates the identification of important factors influencing COVID-19 mortality across counties with different levels of infection in the United States. Regardless of region, factors linked to poor health status contributed to higher COVID-19 mortality. Improving clinical care and eliminating racial health inequality, combined with improving the physical environment, are expected to significantly decrease the mortality rate of COVID-19. Thus, we recommend that local governments reduce physical and psychological risks in residential environments.

ACKNOWLEDGMENTS
We would like to thank all individuals who collected epidemiological data on the COVID-19 outbreak and the data in the County Health Rankings and Roadmaps program.