Characterization of SARS-CoV-2 worldwide transmission based on evolutionary dynamics and specific viral mutations in the spike protein

Background The coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a pandemic. However, the origins and global transmission pattern of SARS-CoV-2 remain largely unknown. We aimed to characterize the origin and transmission of SARS-CoV-2 based on evolutionary dynamics.
Methods Using full-length SARS-CoV-2 sequences with intact geographic, demographic, and temporal information, collected worldwide between 26 December 2019 and 30 November 2020 and deposited in the GISAID database, we constructed a transmission tree with the R package “outbreaker” to depict the evolutionary process. The affinity of the mutated receptor-binding region of the spike protein to angiotensin-converting enzyme 2 (ACE2) was predicted using mCSM-PPI2 software. Viral infectivity and antigenicity were tested in ACE2-transfected HEK293T cells by pseudovirus transfection and a neutralizing antibody test.
Results From 26 December 2019 to 8 March 2020, the early stage of the COVID-19 pandemic, SARS-CoV-2 strains identified worldwide were mainly composed of three clusters: the Europe-based cluster, including two USA-based sub-clusters; the Asia-based cluster, including isolates in China, Japan, the USA, Singapore, Australia, Malaysia, and Italy; and the USA-based cluster. The SARS-CoV-2 strains identified in the USA formed four independent clades, while those identified in China formed one clade. After 8 March 2020, the clusters of SARS-CoV-2 strains tended to be independent and became “pure” in each of the major countries. Twenty-two of 60 mutations in the receptor-binding domain of the spike protein were predicted to increase the binding affinity of SARS-CoV-2 to ACE2. Of all predicted mutants, E484K was the most numerous, with 86 585 sequences worldwide, followed by S477N with 55 442 sequences. In more than ten countries, the frequencies of the isolates with E484K and S477N increased significantly. V367F and N354D mutations increased the infectivity of SARS-CoV-2 pseudoviruses (P < 0.001). SARS-CoV-2 with V367F was more sensitive to the S1-targeting neutralizing antibody than the wild-type counterpart (P < 0.001).
Conclusions SARS-CoV-2 strains might have originated in several countries simultaneously under certain evolutionary pressure. Travel restrictions might cause location-specific SARS-CoV-2 clustering. SARS-CoV-2 evolution appears to facilitate its transmission by altering the affinity to ACE2 or through immune evasion.
Supplementary Information The online version contains supplementary material available at 10.1186/s40249-021-00895-4.

COVID-19 was first diagnosed in the USA on January 19, 2020 [6]. Since then, the number of COVID-19 cases has continually increased globally and become a pandemic. As an RNA virus, SARS-CoV-2 often mutates. Soon after the outbreak, SARS-CoV-2 mutations including D614G appeared [7]. SARS-CoV-2 infects humans via binding to its receptor ACE2, a key step in cell entry. The high-affinity binding of the spike (S) protein to human ACE2 is an essential prerequisite for rapid transmission of SARS-CoV-2 in humans. The strains with mutations at the ACE2 binding site including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), and Delta (B.1.617.2) increase viral infectivity and immune evasion, thus becoming regional adaptive strains [8,9]. The affinity of the S protein binding to human ACE2 reflects the direction of SARS-CoV-2 evolution in humans. It is important to identify the specific mutations worldwide, especially the mutations in the S protein, and their changing affinity to ACE2. However, origination, evolution, and transmission patterns of SARS-CoV-2 remain largely unknown.
Whole-genome sequencing, phylogenetic analysis, and transmission reconstruction of pathogens are promising tools for understanding the spread of infectious diseases in near real time; they make it possible to pinpoint outbreak origins and to resolve transmission patterns at multiple geographic scales [10][11][12][13]. Here, we conducted bioinformatic analyses to investigate possible recombination events, origins, and transmission processes of SARS-CoV-2 and to evaluate the influence of mutations in the S protein on its transmission. Cell experiments were then performed to evaluate the effects of specific viral mutations on infectivity and on reactivity to a neutralizing antibody against SARS-CoV-2. This study helps elucidate the evolution of SARS-CoV-2 and may inform the development of suitable prophylactic options against COVID-19.

Retrieval of SARS-CoV-2 full-length sequences worldwide
All full-length sequences or segments of human SARS-CoV-2 updated to 30 November 2020 were retrieved from the GISAID database (https://www.gisaid.org/) [14]. To reconstruct the transmission network of SARS-CoV-2, we included full-length sequences of SARS-CoV-2 according to the following criteria: (i) sequences with information on the geographic locations where the viruses were identified; (ii) those with dates of collection and with available patient information; (iii) a genome length of > 29 000 bp; (iv) undefined bases < 1%; and (v) no insertion or deletion unless verified by submitters. In total, 8795 of the 230 103 sequences met the criteria and were all included in the evolutionary analysis.
The number of confirmed cases of COVID-19 surpassed 100 000 globally, and the World Health Organization (WHO) declared COVID-19 a pandemic, in early March 2020 (https://www.who.int/news/item/29-06-2020-covidtimeline). Therefore, we defined the period from 26 December 2019 to 8 March 2020 as the early stage of the pandemic. Of the 8795 full-length SARS-CoV-2 strains included in the evolutionary analysis, 1861 were collected at this stage.

Quantitative monitoring of SARS-CoV-2 strains
To monitor changes in the number of mutant strains, specific sequences or segments were counted with the online tool offered by GISAID, which reports the exact number of specific mutants for a given location and period (https://www.gisaid.org/) [14]. Quantitative changes in local SARS-CoV-2 mutant strains were monitored as previously reported [7]. Briefly, the onset time of the local epidemic of a mutant strain was defined as the date when the cumulative number of specific sequences reached 15. A mutant was analyzed only when the number of its strains reached 100 locally by the cutoff date (30 May, 2021). The proportions of mutant strains before and after the onset time were compared by the two-sided Fisher's exact test.
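The onset rule and the before/after comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually run through the GISAID tool or SPSS: the function names, the input layout (date, daily count), and the 2 × 2 table orientation are assumptions made for this example.

```python
from math import comb


def onset_date(daily_counts, threshold=15):
    """Return the first date on which the cumulative count reaches threshold.

    daily_counts: iterable of (date, number of mutant sequences that day).
    Returns None if the threshold is never reached.
    """
    total = 0
    for day, n in sorted(daily_counts):
        total += n
        if total >= threshold:
            return day
    return None


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = before / after onset,
    columns = mutant / non-mutant sequences.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

With real data, `a` and `b` would be the mutant and non-mutant sequence counts before the onset date and `c` and `d` the corresponding counts afterwards; a statistics package would normally be used instead of this hand-written test.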

Reconstruction and visualization of transmission tree
The R package "outbreaker", a statistical method exploiting viral genetic sequences and collection dates, was applied to reconstruct the transmission tree [17]. The sequences in the evolutionary dynamics analysis included 1861 strains from 26 December, 2019 to 8 March, 2020; 1432 strains from 9 to 31 March, 2020; 1476 strains from 1 to 30 April, 2020; 1591 strains from 1 May to 30 June, 2020; 1447 strains from 1 July to 31 August, 2020; and 988 strains from 1 September to 30 November, 2020. Among those 8795 sequences, 2000 were randomly selected to depict the evolution network from 26 December 2019 to 30 November, 2020, according to stratified randomization by quantity in each month and a table of random numbers. By combining genomic sequences and collection dates of SARS-CoV-2, network analysis of the viral evolutionary process was performed. Gephi 0.9.2 software (Gephi Consortium 2010) was applied for network visualization [18]. The Force Atlas and Fruchterman Reingold models were applied to align isolates in Gephi. In the network, COVID-19 patients were set as nodes, whose colors represented locations. The distance between clades represented evolutionary distances. Colors of lines between clades represented the direction of evolution. Lines inherited colors from parental clades.
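The stratified selection of 2000 sequences can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: the study stratified by month and used a table of random numbers, whereas this example uses the six collection periods listed above as strata (monthly counts are not given in the text), proportional allocation with the largest-remainder rule, and a seeded pseudo-random generator. All function names are hypothetical.

```python
import random

# Period sizes reported in the text; they sum to 8795.
PERIOD_SIZES = {
    "2019-12-26 to 2020-03-08": 1861,
    "2020-03-09 to 2020-03-31": 1432,
    "2020-04-01 to 2020-04-30": 1476,
    "2020-05-01 to 2020-06-30": 1591,
    "2020-07-01 to 2020-08-31": 1447,
    "2020-09-01 to 2020-11-30": 988,
}


def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to their sizes.

    The largest-remainder method makes the allocations sum exactly
    to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * n / population for k, n in stratum_sizes.items()}
    alloc = {k: int(v) for k, v in exact.items()}
    shortfall = total_sample - sum(alloc.values())
    # Give the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:
        alloc[k] += 1
    return alloc


def stratified_sample(strata, total_sample, seed=0):
    """Draw a stratified random sample without replacement.

    strata: dict mapping stratum label -> list of sequence identifiers.
    """
    rng = random.Random(seed)
    sizes = {k: len(v) for k, v in strata.items()}
    alloc = proportional_allocation(sizes, total_sample)
    return {k: rng.sample(strata[k], alloc[k]) for k in strata}
```

Calling `proportional_allocation(PERIOD_SIZES, 2000)` gives the number of sequences to draw from each period; `stratified_sample` then draws that many identifiers from each stratum.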

Predicting amino acid mutations and their effect
The 1861 strains from the early stage of the SARS-CoV-2 pandemic were selected to analyze amino acid mutations and to predict the change in affinity to ACE2 relative to the reference sequence. Affinity changes were also predicted among strains collected after the early stage, using 2000 randomly selected sequences collected from 9 March, 2020 to 30 November, 2020. These 2000 sequences were selected by stratified randomization according to the quantity in each month, using a table of random numbers. To summarize amino acid mutations, Glimmer v3.02 was applied to identify the open reading frames (ORFs) of the S protein of SARS-CoV-2 from nucleotide sequences [19]. ORFs were extracted and translated into amino acids by BioPerl [20]. Multiple sequence alignment of the S proteins was performed with MUSCLE 3.8.31 [21]. Taking the sequence of EPI_ISL_406798 as a reference, we extracted the amino acid mutations of the included sequences. The dimer structure of the S protein and ACE2 [in protein data bank (pdb) format] was downloaded from the National Microbiology Data Center (accession: NMDCS0000001). Mutations in the receptor-binding domain (RBD) were used as the input of mCSM-PPI2 (http://biosig.unimelb.edu.au/mcsm_ppi2/) to predict the change in binding free energy, as previously described [22].
Spike-pseudotyped lentiviral particles were quantitated using HEK293T-ACE2 cells. The cells were seeded in 96-well plates at 1 × 10⁴/well. Then, serial tenfold dilutions of pseudovirus were added to the cultures to infect the cells. Six hours later, the supernatants were removed and replaced with fresh culture medium. Forty-eight hours later, the pseudovirus titer was measured by counting the cells expressing green fluorescent protein under a fluorescence microscope. The measured titer was expressed as transduction units per milliliter (TU/ml).
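The step of reading amino acid mutations off an alignment against the reference can be sketched as follows. This is a minimal Python illustration, not the BioPerl/MUSCLE pipeline used in the study: the function names are hypothetical, the inputs are assumed to be two rows of the same alignment, and the region boundaries passed to `in_region` are left to the caller because the RBD coordinates are not stated in the text.

```python
def extract_mutations(ref_aligned, query_aligned, gap="-", unknown="X"):
    """List amino acid substitutions of the query relative to the reference.

    Both inputs are rows of the same multiple sequence alignment and must
    have equal length. Positions are numbered along the ungapped reference.
    Gaps and unknown residues in the query are skipped, so only
    substitutions (not insertions or deletions) are reported.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    pos = 0
    for ref_aa, qry_aa in zip(ref_aligned, query_aligned):
        if ref_aa == gap:
            continue  # insertion relative to the reference: no reference position
        pos += 1
        if qry_aa in (gap, unknown):
            continue
        if qry_aa != ref_aa:
            mutations.append(f"{ref_aa}{pos}{qry_aa}")
    return mutations


def in_region(mutation, start, end):
    """True if a label such as 'V367F' falls within positions [start, end]."""
    return start <= int(mutation[1:-1]) <= end
```

Labels produced this way (for example `V367F`) have the form commonly used to name single substitutions; only those falling in the RBD would be carried forward to the affinity prediction.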

Pseudovirus infectivity assay and neutralization assay
Pseudovirus infectivity directly corresponded to the relative luminescent units (RLUs) produced by the luciferase gene incorporated into the pseudovirus genome. HEK293T-ACE2 cells were seeded in 96-well plates at 1.5 × 10⁴/well, and 3700 TU pseudoviruses (wild type, V367F mutant, or N354D mutant) were added to the culture medium. Six hours after the infection, the culture medium was replaced with fresh DMEM. Forty-eight hours after the infection, luciferase activity was measured using a luciferase assay kit (Yeasen, No. 11401ES60) according to the manufacturer's instructions.
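The relation between the titer from the titration step (TU/ml) and the 3700 TU dose used here can be written out as simple arithmetic. The sketch below uses the conventional limiting-dilution formula (positive cells × dilution factor ÷ inoculum volume), which is an assumption: the text does not state the formula or the inoculum volume, and both function names are hypothetical.

```python
def titer_tu_per_ml(gfp_positive_cells, dilution_factor, inoculum_ml):
    """Estimate the titer in transduction units per milliliter.

    gfp_positive_cells: GFP-expressing cells counted in one well.
    dilution_factor: fold dilution of the stock in that well (e.g. 1000).
    inoculum_ml: volume of diluted pseudovirus added to the well, in ml
                 (not given in the text; supplied by the user).
    """
    if dilution_factor <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution_factor and inoculum_ml must be positive")
    return gfp_positive_cells * dilution_factor / inoculum_ml


def stock_volume_ul(target_tu, titer):
    """Volume of undiluted stock (in microliters) that contains target_tu."""
    if titer <= 0:
        raise ValueError("titer must be positive")
    return target_tu / titer * 1000.0
```

Dosing each well by transduction units rather than by volume is what makes the luciferase readings of the wild-type and mutant pseudoviruses comparable.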

Statistical analysis
GraphPad Prism 6.0 software and Statistical Package for Social Sciences (SPSS) version 21.0 (IBM Corp., Armonk, NY) were applied to perform all statistical analyses.
The proportions of mutant strains before and after the onset time in different countries were compared by the two-sided Fisher's exact test. The data from the in vitro experiments are presented as means and standard deviations (SDs), and Student's t-test was performed for two-group comparisons. Differences with P-values < 0.05 were deemed statistically significant.

Potential recombination with CoVs from natural reservoirs
The sequences of the included CoVs from animal sources are presented in Additional file 1: Table S1. The CoV sharing the highest full-length genome similarity with SARS-CoV-2 was Bat-RaTG13 [4]. Even so, nucleotide variations were evenly distributed across the full-length genomes of the CoVs (Fig. 1). Although two possible recombination events were detected, they might not be real because of the relatively low sequence similarities and the geographic separation. These data indicate that SARS-CoV-2 is a naturally evolved CoV.

Evolutionary analysis of global transmission network of SARS-CoV-2
We first evaluated the transmission network using 2000 representative SARS-CoV-2 strains randomly selected from the 8795 sequences collected between 26 December, 2019 and 30 November, 2020. The 2000 SARS-CoV-2 strains clustered into three groups (Fig. 2A). SARS-CoV-2 strains that clustered together were mainly identified on the same continent (Fig. 2B).
We then evaluated the transmission network using all the full-length SARS-CoV-2 strains collected worldwide at different stages during 2019-2020. At the early stage of the COVID-19 outbreak (26 December, 2019-8 March 2020), SARS-CoV-2 strains across the world were mainly composed of three clusters (Fig. 3). Viruses identified in Italy and England constituted the core of Cluster A. The strains in Cluster A were further divided into four clades: mainly identified in the USA, Italy, the Netherlands, Belgium, England, Scotland, and Brazil. Viral strains in Cluster B were mainly identified in the USA, with only several viruses identified in Canada and Australia. Cluster C contained the strains identified in China, Japan, the USA, Singapore, Australia, Malaysia, and Italy. Strains of Clusters A and B were phylogenetically linked to Cluster C; however, the evolutionary distance between Clusters B and C was longer than that between Clusters A and C. Thus, SARS-CoV-2 strains identified in the USA had at least four independent clades, in which Cluster B was the major one. The colors of the links between Clusters B and C were mostly the same as the main color of Cluster B, indicating that the different clusters of SARS-CoV-2 identified in the USA were cross-linked.
The strains collected from 9 to 31 March, 2020 were divided into three major clusters (Additional file 3: Fig. S2). Unlike the situation in the early stage of the COVID-19 pandemic, the clusters of SARS-CoV-2 strains were mostly independent of each other, especially for the evolutionary relationship between Clusters A and B or among Clusters C, D, and E. The situation was also observed in the subsequent months. Clusters A and C contained SARS-CoV-2 strains identified in the USA, Israel, France, and Singapore, forming the first major group. The strains in Cluster C were identified in Vietnam, China, Italy, Brazil, France, and Spain. The strains in Clusters B and E were independent, having no evolutionary relationship with other groups. The strains in Cluster B were identified in Russia, Italy, Brazil, and Japan.
The evolutionary pattern of global SARS-CoV-2 showed more obvious characteristics of clustering after April 2020. The color of the isolates in each clade was becoming pure, indicating that the SARS-CoV-2 variants in a given country tend to cluster together (Additional file 4: Fig. S3).
From 1 May to 31 August, 2020, the pandemic eased. SARS-CoV-2 strains were identified more often in India (Clusters B and C) and Singapore (Clusters A and D) (Additional file 5: Fig. S4). Cluster C also contained strains identified in Saudi Arabia, South Africa, and the USA, while Cluster B also contained strains identified in Brazil and Italy.
The strains identified in India and South Africa formed several clusters in June and August 2020 (Additional file 6: Fig. S5). The strains identified in South Africa shaped the core of Clusters A and D. Strains identified in South Africa and India formed Clusters B and C, two clusters with a weak link. Figure 4 shows the evolutionary relationship of the SARS-CoV-2 strains globally from 1 September to 30 November, 2020. Cluster A contained the SARS-CoV-2 strains identified in Hungary, France, and Italy. It had no links with other clusters. The SARS-CoV-2 strains identified in South Africa were the main strains in Clusters B and C. Cluster C also contained strains identified in

The effect of the S protein mutations on the binding to ACE2
We identified possible mutations in the S protein of SARS-CoV-2 and then estimated the effect of these mutations on the affinity of the S protein binding to ACE2. The S genes from all available SARS-CoV-2 sequences were identified. Among sequences reported at the early stage, we extracted 38 amino acid mutations located within the RBD of the S protein. Based on sequences reported after 8 March, 2020, 26 amino acid mutations (4 were previously predicted) were extracted. We predicted that the binding free energy of the S protein decreased (affinity increased) for 12 of the 38 mutations at the early stage and 12 of the 26 mutations after the early stage (Table 1). This result indicates that some mutations increase the binding affinity of the S protein to ACE2, thus facilitating the transmission of SARS-CoV-2 in humans.
Then, we monitored the predicted mutations. Countries with more than 100 strains carrying the relevant mutations before 30 May, 2021 were included. Of all 60 types of mutants, E484K was the most numerous with 86 585 sequences, followed by S477N with 55 442 sequences (Table 2). Up to 30 May, 2021, E484K strains in Brazil and S477N strains in Australia accounted for more than 50%, while S477N strains accounted for more than 10% in Switzerland, France, and Luxembourg.

Effects of SARS-CoV-2 spike mutations on viral infectivity and the reactivity to the neutralizing antibody
We infected HEK293T cells with SARS-CoV-2 pseudoviruses (wild-type, V367F mutant, and N354D mutant) and then tested infectivity and immune reactivity. The V367F mutant (5.132 × 10⁶ RLU) and the N354D mutant (5.408 × 10⁶ RLU) were more infectious than the wild-type counterpart (2.243 × 10⁶ RLU) (Fig. 5). Immune reactivity was evaluated using a SARS-CoV-2 S neutralizing antibody. The N354D mutant and the wild-type counterpart showed similar sensitivity to the neutralizing antibody, while the V367F mutant was more sensitive to the neutralizing antibody than the wild-type counterpart (P < 0.001) (Fig. 6).
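The RLU values reported above can be turned into fold changes relative to wild type with a short calculation. The Python sketch below uses only the three means quoted in the text; the `percent_inhibition` helper follows the normalization stated in the neutralization figure legend (no-antibody control set to 0% inhibition), but its exact formula and both function names are assumptions made for illustration.

```python
# Mean luciferase readings (RLU) quoted in the text.
RLU = {"wild-type": 2.243e6, "V367F": 5.132e6, "N354D": 5.408e6}


def fold_change(rlu, reference="wild-type"):
    """Express each reading as a multiple of the reference reading."""
    ref = rlu[reference]
    return {name: value / ref for name, value in rlu.items()}


def percent_inhibition(rlu_with_antibody, rlu_without_antibody):
    """Percentage inhibition of transduction by the antibody.

    The no-antibody control corresponds to 0% inhibition; a reading of
    zero with antibody corresponds to 100%.
    """
    if rlu_without_antibody <= 0:
        raise ValueError("control reading must be positive")
    return 100.0 * (1.0 - rlu_with_antibody / rlu_without_antibody)


if __name__ == "__main__":
    for name, fc in fold_change(RLU).items():
        print(f"{name}: {fc:.2f}-fold of wild-type")
```

On these means, both mutants give a little more than twice the wild-type signal (about 2.3-fold for V367F and 2.4-fold for N354D).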

Discussion
In this study, we analyzed the origination and evolution of SARS-CoV-2 using public databases and experiments in vitro. In the bioinformatic parts, we offered a pipeline to analyze the dynamics of SARS-CoV-2 evolution globally according to viruses' genome, collecting details including geographic and temporal information. The results were also combined with the recombination analysis, the affinity prediction, and the quantitative monitoring of sequences to depict the nature of SARS-CoV-2 evolution. This pipeline of "evolutionary dynamics" helps identify the origination and transmission pattern of SARS-CoV-2.
Our recombination analysis of SARS-CoV-2 and CoVs from animals indicated that the nucleotide variations of CoVs were evenly distributed in their genomes, without insertion or recombination of large fragment(s). The two possible recombination events are unlikely to be real because of geographic isolation (Fig. 1). To the best of our knowledge, there is no evidence of artificial modification of SARS-CoV-2. Our data support the result of a previous sequence analysis that SARS-CoV-2 arose through natural origin and evolution [24], which is also supported by the WHO report: the spillover of SARS-CoV-2 to humans was likely through direct zoonotic transmission or an intermediate host and was extremely unlikely to be due to a laboratory incident (https://www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part). Thus, SARS-CoV-2 might have come from natural hosts rather than being a man-made CoV.
This evolutionary dynamics analysis provides evidence for determining the origins and transmission of SARS-CoV-2. SARS-CoV-2 strains that cluster together are more likely to have been transmitted between the locations concerned. At the early stage, the strains identified in China, Japan, the USA, Singapore, Australia, Malaysia, and Italy clustered together as Cluster C (Fig. 3), indicating that these strains could have been transmitted among those countries. Strains in Cluster B, which was distinct from Cluster C, were identified in the USA, Canada, and Australia, indicating that this clade is unlikely to have been transmitted from the strains identified in China. Throughout this period, viruses collected in China mainly gathered in one clade and had no strong links with other clusters. As a location with a large number of isolates, the USA had various kinds of mutant strains, which formed at least four clades at the same time. In the transmission network of the early stage, no single, obvious source node was observed. These data imply that SARS-CoV-2 in China might have been introduced from other countries.
In the USA, the first COVID-19 case was diagnosed on January 19, 2020 [6]. However, a recent study indicated that of 7389 routine blood donations collected in nine states of the USA from December 13, 2019 to January 17, 2020, 1.3% were seropositive for neutralizing antibody against SARS-CoV-2 [25], indicating that SARS-CoV-2 might have been circulating in the USA before January 19, 2020. Retrospective detection of the SARS-CoV-2 genome in respiratory samples of symptomatic patients without relevant travel history indicated that a patient positive for SARS-CoV-2 was identified in the USA on January 13, 2020 [26]. In Europe, blood samples collected on November 4, 2019 in France and from September to November 25, 2019 in Italy were positive for antibody against SARS-CoV-2 [27][28][29], before the outbreak of COVID-19 in China [4]. SARS-CoV-2 genomic RNA has been detected in the sewage systems of different countries during the COVID-19 outbreak [30][31][32]. Importantly, SARS-CoV-2 genomic RNA was detected in wastewater samples collected on 18 December, 2019 in Italy [33]. Cold-chain delivery of imported fresh seafood was the major route by which SARS-CoV-2 was introduced into cities including Beijing, Qingdao, Tianjin, and Dalian after May 2020, when the outbreak was well controlled in China [34][35][36]. The COVID-19 outbreak occurred during the Spring Festival season, when people routinely buy imported seafood to celebrate the holiday. Although nucleic acid tests for SARS-CoV-2 were positive in environmental samples from stalls related to patients, wild animals in the Huanan seafood market tested negative for SARS-CoV-2. Furthermore, a total of 38 515 livestock and poultry samples and 41 696 wild animal samples collected from 31 provinces in China during 2018-2020 tested negative for antibody against SARS-CoV-2 or for SARS-CoV-2 nucleic acids (https://www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part).
Our data, together with the reported evidence, imply that SARS-CoV-2 might have originated in several geographic areas, including Europe, America, and Asia, simultaneously under certain evolutionary pressure. China might not be the original location where the spillover of SARS-CoV-2 from wildlife to humans occurred. The ancestors of SARS-CoV-2 might circulate among natural reservoirs and keep evolving in given ecological environments. The spillover to humans might be a specific stage in the evolutionary course, as with SARS-CoV-1, which has disappeared for > 17 years. After the spillover, SARS-CoV-2 strains in different countries followed their own directions of evolution, producing increasingly obvious location-based clustering. The colors of different clusters became "purer" during the global pandemic, with fewer nodes of mixed colors (Fig. 4). Appropriate control strategies from governments help contain the pandemic [37,38]. Travel restrictions were implemented across the world [39]. After the international travel restrictions, the strains clustered locally and the risk of introducing mutant strains into a given country decreased. Since May 2020, India and South Africa have reported large numbers of clustered strains. Viruses identified in the two countries played key roles in forming the cores of clusters in the transmission network. The mutant strains in both countries possibly showed higher infectivity and antigenicity than the SARS-CoV-2 strains of the early stage [40,41]. Mutant strains including B.1.617 were epidemic in India, according to WHO reports. Meanwhile, mutant strains identified in South Africa include B.1.351, a strain reported in late 2020 [41]. The timing of these mutant epidemics was consistent with the improved clustering of SARS-CoV-2 in both countries.
The affinity of the mutated RBD to ACE2 was predicted, and mutated sequences throughout the pandemic were quantitated in this study. Mutations in the RBD may alter ACE2-binding ability and antigenicity [42]. We monitored those mutations until 30 May, 2021. We found that 60 amino acid mutations of the S protein might alter SARS-CoV-2 transmission (Table 1). Of those, E484K was the most frequent (n = 86 585). E484K was associated with a decreased affinity (Table 2), which is consistent with a previous report [43]. However, E484K leads to immune evasion from both natural and vaccine-induced sera [44,45]. S477N, a mutation mainly identified in the USA, Australia, and some European countries, also had a large number of uploaded sequences. S477N enhances the binding affinity [45]. It was reported that COVID-19 influences host immunity [46]. This process might be altered by mutant SARS-CoV-2, inducing more severe cases or a wider epidemic. Thus, rapid identification of emerging mutants with immune evasion, including E484K, and those with increased binding affinity, such as S477N, is important in tracing SARS-CoV-2 evolution.
Fig. 6 The effect of SARS-CoV-2 spike protein mutations on the reactivity to the neutralizing antibody. Pseudoviruses with the indicated SARS-CoV-2 S proteins (wild-type, N354D, or V367F) were incubated (1 h, room temperature) with different concentrations of neutralizing antibody before being inoculated into HEK293T-ACE2 cells. Transduction efficiency was quantified by measuring the virus-encoded luciferase activity at 48 h post-transduction. For normalization, inhibition of pseudovirus transduction in HEK293T-ACE2 cells without neutralizing antibody was set as 0%. The data are presented as the mean percentages of inhibition, and error bars indicate standard deviation. RLU: relative luminescent unit.
I468F, Q414E, V367F, A367T, A520S, N354D, and A435S were identified as the early mutations with an affinity change of ΔΔG(wild-mutation) > 0.1 kcal/mol (Table 1). Of those, the sequences with I468F, Q414E, A367T, or A435S were not chosen because < 10 strains were uploaded in 2020. A520S was reported to be associated with low antigenicity [47]. V367F was present at the early stage and thereafter. Thus, V367F and N354D were selected for the in vitro experiments. The V367F and N354D mutants showed higher infectivity than the wild-type counterpart (Fig. 5). For the first time, we demonstrated that the V367F mutant is more sensitive to the neutralizing antibody than the wild-type counterpart (P < 0.001), possibly because this mutation increases antigenicity [47]. Although V367F increases the binding affinity to ACE2, it also increases the reactivity to the neutralizing antibody. Thus, the proportion of this mutant did not increase significantly during the pandemic in the Western world (Table 2). SARS-CoV-2 particles contain 24 ± 9 S trimers [48]. It remains to be clarified whether SARS-CoV-2 mutations influence antigenicity by affecting the conformation and number of trimers on SARS-CoV-2 particles. The neutralizing antibody used in this study is a monoclonal antibody targeting the S1 protein, which has a higher reactivity to the V367F-related antigenic determinant. In most cases, however, SARS-CoV-2 mutations facilitate escape from antibody neutralization [49]. The combined application of two or more neutralizing antibodies against the SARS-CoV-2 S protein can be effective against mutated viruses [50,51]. The N354D mutation increased the infectivity of SARS-CoV-2 but did not alter antibody neutralization. These data indicate that the association of SARS-CoV-2 mutations with antibody neutralization is complicated and needs extensive epidemiological study.
Our study has limitations. First, the effect of combined SARS-CoV-2 mutations was not evaluated because suitable methods were lacking. SARS-CoV-2 mutants that acquire several immune escape mutations may be highly infectious. Second, the effects of SARS-CoV-2 mutations on the conformation and number of S trimers were not evaluated in this study. Third, SARS-CoV-2 sequences were more often identified and uploaded in countries with a higher level of academic activity, thus introducing a selection bias. Finally, the numbers of uploaded strains were not consistent with the actual case numbers.

Conclusions
In conclusion, the present study indicates that SARS-CoV-2 strains might have originated in several countries simultaneously under certain evolutionary pressure. Continent- and country-specific clustering of SARS-CoV-2 strains might be caused by travel restrictions. SARS-CoV-2 evolution affects transmission by altering the affinity to ACE2, immune escape, and possibly viral replication. The method of evolutionary dynamics used in this study can be applied to trace the transmission and predict key SARS-CoV-2 mutations worldwide in the future.