A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection. Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3. Mol. 87, 6270–6282 (2013). By mid-January 2020, the virus was spreading widely within Hubei province, and by early March SARS-CoV-2 was declared a pandemic (ref. 8). Specifically, using a formal Bayesian approach (ref. 42; see Methods), we estimate a fast evolutionary rate (0.00169 substitutions per site per year, 95% highest posterior density (HPD) interval (0.00131, 0.00205)) for SARS viruses sampled over a limited timescale (1 year), a slower rate (0.00078 (0.00063, 0.00092) substitutions per site per year) for MERS-CoV on a timescale of about 4 years and the slowest rate (0.00024 (0.00019, 0.00029) substitutions per site per year) for HCoV-OC43 over almost five decades. 26 March 2020. These are in general agreement with estimates using NRR2 and NRA3, which result in divergence times of 1982 (1948–2009) and 1948 (1879–1999), respectively, for SARS-CoV-2, and estimates of 1952 (1906–1989) and 1970 (1932–1996), respectively, for the divergence time of SARS-CoV from its closest known bat relative. A hypothesis of snakes as intermediate hosts of SARS-CoV-2 was posited during the early epidemic phase (ref. 54), but we found no evidence of this (refs 55, 56); see Extended Data Fig. The coverage threshold and consensus sequence generation threshold were set to 20 and 90, respectively.
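The rate estimates above are reported with 95% HPD intervals. As a minimal sketch of what such an interval is, the function below returns the shortest interval covering a given fraction of a set of posterior draws. The function name, the toy draws and the 80% level used in the demonstration are illustrative assumptions, not values or code from the analysis itself, which relied on BEAST output.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains at least `mass` of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of draws the interval must cover; round() guards against
    # floating-point noise in mass * n before taking the ceiling.
    k = max(1, math.ceil(round(mass * n, 9)))
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Hypothetical posterior draws of a rate (substitutions per site per year).
toy_rates = [0.0013, 0.0015, 0.0016, 0.0017, 0.0017,
             0.0018, 0.0019, 0.0020, 0.0021, 0.0030]
low, high = hpd_interval(toy_rates, mass=0.8)
print(f"80% HPD: ({low}, {high})")
```

Unlike an equal-tailed interval, the HPD interval shifts toward the dense part of a skewed posterior, which is why it is the usual summary for rate and date estimates.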
We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. 26, 450–452 (2020). In other words, a true breakpoint is less likely to be called as such (this is breakpoint-conservative), and thus the construction of a non-recombining region may contain true recombination breakpoints (with insufficient evidence to call them as such). Current sampling of pangolins does not implicate them as an intermediate host. We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. The Artic Network receives funding from the Wellcome Trust through project no. The fact that they are geographically relatively distant is in agreement with their somewhat distant TMRCA, because the spatial structure suggests that migration between their locations may be uncommon. Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. 82, 1819–1826 (2008). Means and 95% HPD intervals are 0.080 [0.058–0.101] and 0.530 [0.304–0.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.109–0.180] and 0.154 [0.093–0.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. All four of these breakpoints were also identified with the tree-based recombination detection method GARD (ref. 35). Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2.
While pangolins could be acting as intermediate hosts for bat viruses to get into humans (they develop severe respiratory disease (ref. 38) and commonly come into contact with people through trafficking), there is no evidence that pangolin infection is a requirement for bat viruses to cross into humans. This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity (refs 27, 37). We thank all authors who have kindly deposited and shared genome data on GISAID. NTD, N-terminal domain; CTD, C-terminal domain. Uncertainty measures are shown in Extended Data Fig. As of December 2, 2021, SJdRP, a medium-sized city in the Northwest region of São Paulo state, Brazil (Fig. The PANGOLIN lineage database (15, 16) was used to analyze the frequency of lineages among countries. 36) (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ) and considered recombination signals detected by more than two methods for breakpoint identification. After removal of A1 and A4, we named the new region A. & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. b, Similarity plot between SARS-CoV-2 and several selected sequences including RaTG13 (black), SARS-CoV (pink) and two pangolin sequences (orange). a, Breakpoints identified by 3SEQ illustrated by percentage of sequences (out of 68) that support a particular breakpoint position. Sliding window analysis of changes in the patterns of sequence similarity between human SARS-CoV-2, and pangolin and bat coronaviruses as described further in Fig. While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences.
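The similarity plot and sliding-window analysis mentioned above compare a query genome against other aligned sequences window by window. The following is a minimal sketch of that idea, assuming two pre-aligned sequences of equal length with "-" as the gap character; the function name, window and step sizes, and toy sequences are hypothetical and are not the settings used for the published figures.

```python
def window_identity(seq_a, seq_b, window=500, step=100):
    """Fraction of identical sites per window of two aligned sequences.

    Columns with a gap ("-") in either sequence are skipped. A window that
    contains only gapped columns yields None instead of a fraction.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        columns = [
            (a, b)
            for a, b in zip(seq_a[start:start + window], seq_b[start:start + window])
            if a != "-" and b != "-"
        ]
        if columns:
            identity = sum(a == b for a, b in columns) / len(columns)
        else:
            identity = None
        results.append((start, identity))
    return results


# Toy aligned sequences (hypothetical, far shorter than a real genome).
query = "ACGTACGTACGTTTTT"
other = "ACGTACGAACGTAAAA"
for start, identity in window_identity(query, other, window=8, step=4):
    print(start, identity)
```

A drop in identity confined to a few adjacent windows, as described for the variable loop, is the kind of pattern that prompts a closer look for recombination in that region.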
As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). B 281, 20140732 (2014). Because there is no single accepted method of inferring breakpoints and identifying clean subregions with high certainty, we implemented several approaches to identifying three classic statistical signals of recombination: mosaicism, phylogenetic incongruence and excessive homoplasy (ref. 51). Regions B and C span nt 3,625–9,150 and 9,261–11,795, respectively. is funded by the MRC (no. When the first genome sequence of SARS-CoV-2, Wuhan-Hu-1, was released on 10 January 2020 (GMT) on Virological.org by a consortium led by Zhang (ref. 6), it enabled immediate analyses of its ancestry. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. Wang, H., Pipes, L. & Nielsen, R. Synonymous mutations and the molecular evolution of SARS-CoV-2 origins. Researchers have found that SARS-CoV-2 in humans shares about 90.3% of its genome sequence with a coronavirus found in pangolins (Cyranoski, 2020). To begin characterizing any ancestral relationships for SARS-CoV-2, NRRs of the genome must be identified so that reliable phylogenetic reconstruction and dating can be performed. The genetic distances between SARS-CoV-2 and RaTG13 (bottom) demonstrate that their relationship is consistent across all regions except for the variable loop. & Boni, M. F. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. 3) to examine the sensitivity of date estimates to this prior specification.
The new paper finds that the genetic sequences of several strains of coronavirus found in pangolins were between 88.5 percent and 92.4 percent similar to those of the novel coronavirus. Sequence similarity. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003). Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019), with the light and dark coloured version based on the HCoV-OC43 and MERS-CoV centred priors, respectively. Among the 68 sequences in the aligned sarbecovirus sequence set, 67 show evidence of mosaicism (all Dunn–Sidak-corrected P < 4 × 10⁻⁴ and 3SEQ (ref. 14)), indicating involvement in homologous recombination either directly with identifiable parentals or in their deeper shared evolutionary history; that is, due to shared ancestral recombination events. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. In this approach, we considered a breakpoint as supported only if it had three types of statistical support: from (1) mosaic signals identified by 3SEQ, (2) PI signals identified by building trees around 3SEQ's breakpoints and (3) the GARD algorithm (ref. 35), which identifies breakpoints by identifying PI signals across proposed breakpoints. Genetics 172, 2665–2681 (2006). Virological.org http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339 (2020). These authors contributed equally: Maciej F. Boni, Philippe Lemey. and T.A.C.
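The three-way support rule described above (a 3SEQ mosaic signal, a PI signal with bootstrap support above 80%, and a GARD breakpoint) amounts to a simple filter. The sketch below shows one way to express it, assuming breakpoints are given as alignment positions. The function name, the input layout and the `tolerance` parameter (how far a GARD breakpoint may sit from a 3SEQ breakpoint and still count as the same one) are illustrative assumptions; the source does not state how positions were matched across methods.

```python
def supported_breakpoints(mosaic_positions, pi_bootstrap, gard_positions,
                          min_bootstrap=80.0, tolerance=0):
    """Return breakpoints that have all three kinds of support.

    mosaic_positions: breakpoint positions reported by a mosaicism test
    pi_bootstrap:     dict mapping a position to the bootstrap support (%)
                      of its phylogenetic incongruence (PI) signal
    gard_positions:   breakpoint positions reported by a tree-based method
    tolerance:        hypothetical matching distance in nucleotides
    """
    supported = []
    for position in sorted(set(mosaic_positions)):
        # Strictly greater than the threshold, matching the ">80%" wording.
        has_pi = pi_bootstrap.get(position, 0.0) > min_bootstrap
        has_gard = any(abs(position - g) <= tolerance for g in gard_positions)
        if has_pi and has_gard:
            supported.append(position)
    return supported


# Toy inputs (hypothetical positions and support values).
mosaic = [1000, 5000, 9000]
bootstrap = {1000: 95.0, 5000: 60.0, 9000: 88.0}
gard = [1010, 5005]
print(supported_breakpoints(mosaic, bootstrap, gard, tolerance=20))
```

Requiring agreement among all three signals keeps only the best-supported breakpoints, at the cost of missing true breakpoints with weaker evidence.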
Biazzo et al. obtained the genome sequences of 10 SARS-CoV-2 strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genomes with the pangolin software; the strains were assigned to the B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe (Biazzo et al., 2021). 56, 152–179 (1992). 36) gives a putative recombination-free alignment that we call non-recombinant alignment 3 (NRA3) (see Methods). We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new . Influenza viruses reassort (ref. 17) but they do not undergo homologous recombination within RNA segments (refs 18, 19), meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenza's eight RNA segments. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. Intraspecies diversity of SARS-like coronaviruses in Rhinolophus sinicus and its implications for the origin of SARS coronaviruses in humans. Because 3SEQ identified ten BFRs >500 nt, we used GARD's (v.2.5.0) inference on 10, 11 and 12 breakpoints. A deep dive into the genetics of the novel coronavirus shows it seems to have spent some time infecting both bats and pangolins before it jumped into humans, researchers said. 1a-c), has the third-highest number of confirmed COVID-19 cases in the state of São. Now, the two researchers used genomic sequencing to compare the genome sequence of the new coronavirus in humans with that in animals and found a 99% match with pangolins. Except for specifying that sequences are linear, all settings were kept to their defaults. Indeed, the rates reported by these studies are in line with the short-term SARS rates that we estimate (Fig.
Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. & Holmes, E. C. Recombination in evolutionary genomics. The extent of sarbecovirus recombination history can be illustrated by five phylogenetic trees inferred from BFRs or concatenated adjacent BFRs (Fig. # File containing the ID of the samples, the Sequence of the haplotype, the Continent, the country, the Region, the Data, the Lineage of Pangolin and Nextstrain clade, and the haplotype number. # In this order. # Could be obtained from the database. Given what was known about the origins of SARS, as well as identification of SARS-like viruses circulating in bats that had binding sites adapted to human receptors (refs 29–31), appropriate measures should have been in place for immediate control of outbreaks of novel coronaviruses. Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events (ref. 19). 2, vew007 (2016). A third approach attempted to minimize the number of regions removed while also minimizing signals of mosaicism and homoplasy. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). 35, 247–251 (2018). Even before the COVID-19 pandemic, pangolins had been making headlines. Phylogenetic trees and exact breakpoints for all ten BFRs are shown in Supplementary Figs. BEAST inferences made use of the BEAGLE v.3 library (ref. 68) for efficient likelihood computations. This long divergence period suggests there are unsampled virus lineages circulating in horseshoe bats that have zoonotic potential due to the ancestral position of the human-adapted contact residues in the SARS-CoV-2 RBD.
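Root-to-tip divergence plotted against sampling time, as in the panels described above, is usually summarized by a least-squares line: the slope is a rough evolutionary rate and the x-intercept a rough date for the root. The sketch below shows that calculation under stated assumptions: ordinary least squares, decimal-year sampling dates, and toy data. It is an exploratory diagnostic only and is not the Bayesian rate estimation used for the reported results; the function name and inputs are hypothetical.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    Returns (rate, root_date, r_squared). root_date is the x-intercept and
    is None when the slope is not positive (no usable temporal signal).
    """
    if len(dates) != len(divergences) or len(dates) < 2:
        raise ValueError("need two or more paired observations")
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / rate if rate > 0 else None
    return rate, root_date, r_squared


# Toy data (hypothetical): decimal sampling years and distances from the root.
years = [2003.0, 2006.5, 2011.0, 2013.2, 2019.9]
distances = [0.012, 0.015, 0.021, 0.022, 0.029]
rate, root_date, r2 = root_to_tip_regression(years, distances)
print(f"rate={rate:.5f} subs/site/yr, root date={root_date:.1f}, R^2={r2:.3f}")
```

A low R² or a non-positive slope indicates weak temporal signal, the situation in which informative rate priors from better-sampled viruses become necessary.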
In our second stage, we wanted to construct non-recombinant regions where our approach to breakpoint identification was as conservative as possible. With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges (ref. 57). S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. Our approach resulted in similar posterior rates using two different prior means, implying that the sarbecovirus data do inform the rate estimate even though a root-to-tip temporal signal was not apparent. & Holmes, E. C. A genomic perspective on the origin and emergence of SARS-CoV-2. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin lineages were acquired from bat viruses divergent to those that gave rise to SARS-CoV-2. Preprint at https://doi.org/10.1101/2020.05.28.122366 (2020). Zhou et al. (ref. 2) concluded from the genetic proximity of SARS-CoV-2 to RaTG13 that a bat origin for the current COVID-19 outbreak is probable. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. There is a 90% genome sequence match between SARS-CoV-2 and a coronavirus in pangolins. Combining regions A, B and C and removing the five named sequences gives us putative NRR1, as an alignment of 63 sequences. G066215N, G0D5117N and G0B9317N)) and by the European Union's Horizon 2020 project MOOD (no. 36, 1793–1803 (2019).
The consistency of the posterior rates for the different prior means also implies that the data do contribute to the evolutionary rate estimate, despite the fact that a temporal signal was visually not apparent (Extended Data Fig. The pangolin tool allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. While such models have recently been made available, we lack the information to calibrate the rate decline over time (for example, through internal node calibrations (ref. 44)). The histogram allows for the identification of non-recombining regions (NRRs) by revealing regions with no breakpoints. The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus genome sequences. The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. Nature 583, 282–285 (2020). 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. 1, vev003 (2015). 25, 35–48 (2017). Bioinformatics 28, 3248–3256 (2012). Despite the high frequency of recombination among bat viruses, the block-like nature of the recombination patterns across the genome permits retrieval of a clean subalignment for phylogenetic analysis.
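Reading breakpoint-free regions (BFRs) off a set of breakpoint positions, as the histogram above is used to do, can be sketched as follows. The example assumes 0-based, half-open alignment coordinates and uses the 500 nt minimum length mentioned for BFRs; the function name, the coordinate convention and the toy breakpoints are illustrative assumptions rather than the procedure actually applied to the sarbecovirus alignment.

```python
def breakpoint_free_regions(breakpoints, alignment_length, min_length=500):
    """Return (start, end) regions between consecutive breakpoints.

    Coordinates are 0-based and half-open. Only regions strictly longer
    than min_length nucleotides are kept.
    """
    if any(b <= 0 or b >= alignment_length for b in breakpoints):
        raise ValueError("breakpoints must lie strictly inside the alignment")
    edges = [0] + sorted(set(breakpoints)) + [alignment_length]
    return [
        (start, end)
        for start, end in zip(edges, edges[1:])
        if end - start > min_length
    ]


# Toy breakpoints on a hypothetical 6,000 nt alignment.
for start, end in breakpoint_free_regions([1000, 1200, 5000], 6000):
    print(start, end, end - start)
```

Adjacent regions that show no phylogenetic incongruence across their shared breakpoint could then be concatenated into a longer candidate non-recombinant region, which is a separate decision not made by this helper.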