Coronavirus genomic surveillance: Where have we been, where are we going next?
Nick Loman (University of Birmingham, UK) began his plenary talk by going back to late 2019, when a cluster of cases of fatal respiratory illness was identified in Wuhan, China. The cause remained unknown for a couple of weeks, until metagenomic sequencing linked the illness to the novel coronavirus that became known as SARS-CoV-2. On 10th January 2020, Yong-Zhen Zhang (Fudan University, China) shared an early genome sequence of the pathogen, and other groups in China soon uploaded further sequences. Nick stressed how crucial this early data sharing was in the fight against COVID-19: the sequences provided information essential to understanding the biology of the virus, revealed its relationship to SARS-CoV-1, informed the development of diagnostic tests and kick-started research into vaccines.
Using this first genome sequence, Josh Quick (University of Birmingham, UK) was able to adapt existing protocols for sequencing RNA viruses for use with SARS-CoV-2. Released by the ARTIC Network, this multiplexed, PCR-based library preparation method had previously been used to sequence Ebola, Zika and yellow fever viruses. In this method, the SARS-CoV-2 genome is reverse transcribed and then amplified as ~400 bp overlapping (“tiled”) fragments split across two primer pools, and the resulting amplicons are barcoded and sequenced in multiplex. Because it gives good coverage across the SARS-CoV-2 genome from very low inputs, Nick described how the open-source protocol has since become the most popular method of sequencing SARS-CoV-2. Josh has continued to optimise the method and has worked with Oxford Nanopore to help reduce the cost further, to under £20 per sample on nanopore sequencing devices. Turning to bioinformatics, Nick described the ARTIC pipeline, which demultiplexes the barcoded reads, aligns them to the SARS-CoV-2 reference sequence, performs polishing and variant calling with nanopolish or medaka, and generates consensus genome sequences. Together, these efforts meant that by late January 2020 an end-to-end system was available for sequencing SARS-CoV-2. The ARTIC protocol was widely adopted and, in many countries, was the method used to sequence the country’s first SARS-CoV-2 genome.
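The two-pool tiling idea can be illustrated with a short sketch. This is not the real ARTIC primer scheme: the fixed 400 bp amplicon length, the 75 bp overlap and the helper names (`tile_amplicons`, `covers_genome`, `pool_is_non_overlapping`) are hypothetical, and only the 29,903 bp length of the Wuhan-Hu-1 reference (MN908947.3) is taken as given. Neighbouring amplicons overlap so that every position is covered, and they alternate between pools so that no two overlapping amplicons share a reaction, where primers from adjacent amplicons could otherwise produce short spurious products.

```python
"""Sketch of a two-pool tiled amplicon layout (illustrative only).

Assumptions, not the real ARTIC scheme: fixed-length amplicons, a fixed
overlap between neighbours, and primers placed at regular intervals.
"""
from typing import List, Tuple

GENOME_LENGTH = 29_903  # length of the Wuhan-Hu-1 reference (MN908947.3)


def tile_amplicons(genome_length: int, amplicon_len: int = 400,
                   overlap: int = 75) -> List[Tuple[int, int, int]]:
    """Return (start, end, pool) tuples; 'end' is exclusive, pools are 1 and 2."""
    if overlap >= amplicon_len:
        raise ValueError("overlap must be shorter than the amplicon")
    step = amplicon_len - overlap
    amplicons = []
    start, index = 0, 0
    while start < genome_length:
        end = min(start + amplicon_len, genome_length)
        # Alternate pools so that overlapping neighbours never share a reaction.
        pool = 1 if index % 2 == 0 else 2
        amplicons.append((start, end, pool))
        if end == genome_length:
            break
        start += step
        index += 1
    return amplicons


def covers_genome(amplicons: List[Tuple[int, int, int]], genome_length: int) -> bool:
    """True if every genome position falls inside at least one amplicon."""
    covered = [False] * genome_length
    for start, end, _ in amplicons:
        for pos in range(start, end):
            covered[pos] = True
    return all(covered)


def pool_is_non_overlapping(amplicons: List[Tuple[int, int, int]], pool: int) -> bool:
    """True if no two amplicons within the given pool overlap."""
    spans = sorted((s, e) for s, e, p in amplicons if p == pool)
    return all(prev_end <= next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))
```

With these assumed parameters the sketch yields 92 amplicons, 46 per pool. Real schemes place primers according to sequence constraints rather than at fixed intervals, so their amplicon counts and boundaries differ.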
In March 2020, the COVID-19 Genomics UK Consortium (COG-UK) was set up in Birmingham, UK. The group discussed how best to set up a SARS-CoV-2 sequencing network quickly, deciding from the outset that a “federated, distributed model” was essential, linking UK academia with public health agencies and bulk sequencing labs; the consortium was supported by funding from the UK government. Nick described the protocol used at the University of Birmingham for nanopore sequencing of SARS-CoV-2 genomes, highlighting that theirs was just one of many sites contributing genome sequences to COG-UK. The Birmingham team sequences primarily on the GridION platform, multiplexing up to 48 samples per flow cell; where higher throughput is required, they sequence 96 samples per flow cell on the PromethION. The sequencing data are then analysed using the ARTIC workflow, and genome sequences are shared with the network, including Public Health England. In total, the team has sequenced over 6,600 samples. As a partner in the CLIMB project, they have been working with a team in Cardiff to ensure that the server and software infrastructure is in place to process the huge number of SARS-CoV-2 genomes generated by COG-UK; over 468,050 genomes have been processed so far. Sequencing data and metadata are gathered, quality control (QC) is performed, and downstream analysis is undertaken. A large part of this downstream analysis is the daily construction of phylogenetic trees and assignment of lineages by Andrew Rambaut and his team, which provides up-to-date information crucial to outbreak response.
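The final consensus and QC steps of such a workflow can be sketched as follows. This is a simplified illustration, not the ARTIC pipeline’s code: it assumes single-nucleotide variants only (no indels), 0-based positions, a hypothetical minimum depth of 20 reads below which a position is masked with `N`, and a hypothetical genome-completeness threshold for QC. The function names are illustrative.

```python
from typing import Dict, List


def build_consensus(reference: str, snvs: Dict[int, str], depth: List[int],
                    min_depth: int = 20) -> str:
    """Apply SNVs to the reference, then mask low-depth positions with 'N'.

    Positions are 0-based; indels are ignored in this sketch.
    The default min_depth of 20 is an assumption, not a pipeline default.
    """
    if len(depth) != len(reference):
        raise ValueError("need one depth value per reference position")
    bases = list(reference)
    for pos, alt in snvs.items():
        if not 0 <= pos < len(bases) or len(alt) != 1:
            raise ValueError(f"invalid SNV at position {pos}: {alt!r}")
        bases[pos] = alt
    # Masking runs after variant application, so a variant at a
    # poorly covered position is also reported as 'N'.
    for pos, d in enumerate(depth):
        if d < min_depth:
            bases[pos] = "N"
    return "".join(bases)


def completeness(consensus: str) -> float:
    """Fraction of consensus positions that are called (not 'N')."""
    if not consensus:
        return 0.0
    return 1.0 - consensus.count("N") / len(consensus)


def passes_qc(consensus: str, min_completeness: float = 0.9) -> bool:
    """Hypothetical QC rule: keep genomes above a completeness threshold."""
    return completeness(consensus) >= min_completeness
```

For example, `build_consensus("ACGTACGTAC", {3: "A"}, [30] * 8 + [5, 5])` returns `"ACGAACGTNN"`, which is 80% complete and so fails the assumed 0.9 threshold.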
Early in the pandemic, COG-UK linked sequencing data with travel data to estimate how SARS-CoV-2 came to the UK. This revealed more than 1,300 independent introductions of the virus from mainland Europe into the UK at the start of March 2020, before lockdown; Nick described this as a common theme worldwide, noting the importance of understanding importations and their role in starting an epidemic wave. Moving forward to December 2020, Nick described a ‘shift in gears’ in the behaviour of the virus, as a surge in cases was seen in Kent, UK, at odds with the rest of the country. Genomic epidemiology revealed that more than 50% of cases in this region belonged to the same new lineage, B.1.1.7, in stark contrast to the mix of lineages observed previously. This suggested that the surge was being driven by the new lineage, and allowed other epidemiological explanations to be ruled out. Nick emphasised the importance of generating real-time genomic data to share with public health agencies and epidemiologists to understand what was at play, and of sharing such data on a global scale. For example, Andrew Rambaut learned at a WHO meeting of a new mutation, N501Y, in lineage B.1.351, which was associated with a surge in cases in South Africa. The mutation was predicted to alter the shape of the SARS-CoV-2 spike protein and increase its binding to the human ACE2 receptor. Looking at the UK genomic data, he found the same mutation in the B.1.1.7 lineage. Nick described this lineage as having ‘way more mutations than it should have’, around 20 more than would be expected given the evolutionary rate, with many producing functional changes in the spike protein, including its receptor-binding domain. Genomic modelling showed how quickly B.1.1.7 outcompeted other lineages each time it was introduced to a new part of the UK, giving an estimated 30-70% increase in transmissibility.
By February 2021, B.1.1.7 was the dominant lineage in the UK, and similar patterns were seen in other countries it reached.
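One common way to quantify how quickly a new lineage outcompetes others is to fit a logistic growth model to its frequency over time: the logit (log-odds) of the lineage’s frequency rises roughly linearly, and the slope is its growth-rate advantage per day. The sketch below fits that slope by ordinary least squares and converts it to a relative transmission advantage assuming a fixed generation time. It illustrates the general technique only and is not the modelling used for B.1.1.7; the function names, the 5.5-day generation time and the synthetic data are all assumptions.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def fit_logistic_growth_rate(days: Sequence[float], freqs: Sequence[float]) -> float:
    """Least-squares slope of logit(frequency) against time, per day.

    Under a logistic model, logit(p_t) = logit(p_0) + s * t, where s is the
    variant's growth-rate advantage over the lineages it is replacing.
    """
    if len(days) != len(freqs) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    ys = [logit(p) for p in freqs]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("observation times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    return sxy / sxx


def transmission_advantage(rate_per_day: float, generation_time_days: float) -> float:
    """Relative increase in reproduction number, assuming a fixed generation time.

    With a fixed generation time g, R_variant / R_other = exp(s * g).
    """
    return math.exp(rate_per_day * generation_time_days) - 1.0


def logistic_frequency(p0: float, rate_per_day: float, day: float) -> float:
    """Variant frequency on a given day under the logistic model (synthetic data)."""
    return 1.0 / (1.0 + math.exp(-(logit(p0) + rate_per_day * day)))
```

Fed noise-free synthetic frequencies generated with a daily advantage of 0.08, the fit recovers 0.08, which with the assumed 5.5-day generation time corresponds to roughly a 55% transmission advantage. Real surveillance data are noisy counts, so analyses of this kind typically use binomial or more elaborate models rather than least squares on logit-transformed proportions.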
Nick went on to highlight other SARS-CoV-2 variants of concern. One such variant, known as P.1, was identified through the CADDE project, a successor to their work on Zika, in collaboration with a team at the University of São Paulo, Brazil. This lineage carries mutations similar to those in B.1.1.7, and it began to dominate in Brazil, especially in Amazonas, where a second wave appeared to be driven by the variant. Nick asked: do some of the mutations that change transmissibility also change antigenicity, increasing the virus’s ability to re-infect individuals and cause a second wave? He explained that this was not yet entirely clear, but possible.
Nick stressed that the more the virus is allowed to spread, the more chances it has to evolve combinations of mutations that increase transmissibility and change antigenicity. If the virus is able to circulate in areas that lack sufficient public health measures to control it, or lack access to vaccination, SARS-CoV-2 is more likely to evolve increased transmissibility or immune evasion, ‘and that is what we, as a community, need to try and stop now’. Nick showed a graph of the surge in COVID-19 cases in India from March 2021, which is overwhelming health services and resulting in huge numbers of deaths. Although the picture is not yet entirely clear, Nick described how this surge is also thought to be associated with a new variant, B.1.617.2, which carries mutations thought to change its transmissibility. He noted that little data was available yet, most of it deduced from travellers arriving in the UK; despite the biotechnology capacity and research funding available in India, there remains a gap in genomic surveillance resources. The recent CLIMB-GLOBAL-HEALTH project, in partnership with the ARTIC Network, has been set up to help provide the resources developed by COG-UK and ARTIC to a global audience. Training courses are available for anyone to use via their website: https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop
Nick finished his talk by highlighting examples of how the ARTIC protocol and nanopore sequencing have been used for SARS-CoV-2 genome analysis worldwide. Now, Nick said, the approach needs to be optimised and used more widely, and more funding is needed. Stressing that ‘nobody is safe until we’re all safe’, he concluded that ‘we need to establish that principle of real-time data sharing worldwide, because everything is connected and we’re all in this together’.