NCM 2021: SARS-CoV-2 sequencing: on the midnight train to throughput
Global sequencing efforts in the COVID-19 pandemic
Julie and Robert (University of Wisconsin-Madison, USA) discussed their team’s SARS-CoV-2 sequencing efforts in the state of Wisconsin, and their recent switch to the Midnight protocol. Julie introduced the global picture of the COVID-19 pandemic as it stood on November 11th, 2021, almost two years after the first detected case. At that point, there had been 251,266,207 confirmed cases and 5,070,244 deaths worldwide (as reported by the World Health Organisation), and more than 7 billion vaccine doses had been administered. Julie pointed out, however, that ‘while vaccination efforts are reducing the numbers of severe cases’, SARS-CoV-2 is still circulating actively everywhere, and variants of concern continue to arise.
Compared with previous pandemics, ‘this virus is remarkably well studied’: as of November 11th, more than 5 million SARS-CoV-2 genome sequences from around the world had been submitted to the GISAID database. These efforts are ‘thanks largely to the ARTIC and ONT teams, for rapidly developing a protocol, deployable on such an easily accessible sequencing device’. Within the United States, every state has submitted SARS-CoV-2 sequences to public databases such as GISAID and NCBI. This includes Wisconsin, where the team’s own sequencing efforts are based: state-wide, there have been >800,000 confirmed cases, of which ~3.7% have been sequenced. Milwaukee County, the most populous county, accounted for ~140,000 of these cases; Dane County, the second most populous county and home to Julie’s team, had ~57,000 confirmed cases, of which ~7.2% were sequenced. Her team was the third-largest contributor of sequences. Julie noted that, throughout the pandemic, there were generally only two full-time employees at a time working on these efforts, as this work was just one of many projects in their lab.
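As a rough check on scale, the percentages above can be turned into approximate genome counts. This is a minimal sketch; the helper name `sequenced_count` is hypothetical, and the inputs are the rounded figures quoted in the talk, so the outputs are estimates rather than reported counts.

```python
def sequenced_count(confirmed_cases: int, fraction_sequenced: float) -> int:
    """Approximate number of cases sequenced, given a sequencing fraction."""
    return round(confirmed_cases * fraction_sequenced)

# Rounded figures from the talk (Wisconsin is a lower bound: >800,000 cases).
wisconsin = sequenced_count(800_000, 0.037)  # ~3.7% sequenced state-wide
dane = sequenced_count(57_000, 0.072)        # ~7.2% sequenced in Dane County

print(f"Wisconsin: ~{wisconsin:,} genomes")
print(f"Dane County: ~{dane:,} genomes")
```

That works out to roughly 29,600 genomes state-wide and roughly 4,100 for Dane County.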
Case studies
Julie explained how her team have been able to look at ‘some very interesting cases’ throughout the pandemic; she next presented two of these.
Madison (Dane County) had the 12th diagnosed SARS-CoV-2 case in the US, in early February 2020. Julie’s team sequenced a research sample from this individual and found no descendant viruses, i.e., onward transmission from this case was contained. Combined with other sequence data, their analyses also detected multiple subsequent introductions of SARS-CoV-2 into both Dane and Milwaukee Counties, with more introductions but less extensive community spread in Dane County.
Another interesting example was a persistent SARS-CoV-2 infection in an immunocompromised patient, in whom the team monitored SARS-CoV-2 levels over time. The individual had common variable immunodeficiency and MALT lymphoma; they were initially diagnosed with COVID-19 using standard PCR testing, and repeated PCR testing was then performed over the course of around 300 days. The Ct value of the PCR assay never rose above 30. Unfortunately, no sample was available from the time of diagnosis, but the team obtained a research sample from around day 100 and sequenced it. This revealed a mutation in the receptor binding motif of the spike protein (E484A) that had previously been observed in other immunocompromised patients. The patient started neutralising antibody therapy at around day 200, and subsequent sequencing of research samples from the individual revealed a globally unique substitution at the same position, E484T. Julie questioned whether this change might have been driven by the antibody therapy.
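In labels such as E484A, the first letter is the reference amino acid (E, glutamate), the number is the residue position in the spike protein, and the last letter is the observed amino acid (A, alanine; in E484T, T is threonine). The two changes in this patient therefore hit the same residue. A minimal parsing sketch; the `Substitution` type and `parse_substitution` function are hypothetical names, and only simple single-residue substitutions are handled.

```python
import re
from typing import NamedTuple

class Substitution(NamedTuple):
    ref: str       # reference amino acid (one-letter code)
    position: int  # residue position in the protein (here, spike)
    alt: str       # observed amino acid (one-letter code)

# One-letter codes for the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")

def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'E484A' into (ref, position, alt)."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return Substitution(ref, int(pos), alt)

early = parse_substitution("E484A")  # seen in the ~day 100 sample
late = parse_substitution("E484T")   # seen after antibody therapy began
print(early, late)
print("same residue:", early.position == late.position and early.ref == late.ref)
```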
Julie explained that her team have a contract with the CDC to sequence and investigate SARS-CoV-2 in cases of immune failure, such as infection despite vaccination, reinfection and persistent infection; a major focus will be sequencing the virus from immunocompromised individuals, such as transplant recipients, people living with HIV/AIDS, and cancer patients receiving immunosuppressive therapies. They also plan to extend this work beyond SARS-CoV-2, to investigate prolonged influenza infection.
Meanwhile, as the team plan to continue to perform SARS-CoV-2 sequencing, they are transitioning from the ARTIC to the Midnight protocol to improve throughput and decrease cost.
Relative efficiency of Midnight vs ARTICv3 workflows for SARS-CoV-2 sequencing
Robert provided an overview of real-world sample collection. Most of their sequencing is performed on nasal swab samples that a local testing provider has identified as SARS-CoV-2-positive by qPCR. By the time they are sequenced, these samples will have undergone at least one freeze-thaw cycle and additional handling, so the number of non-degraded viral genome copies is likely to be lower than when the sample was originally tested.
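Because Ct values are on a log2 scale, even modest RNA loss shifts a sample’s effective Ct. A minimal sketch under idealised assumptions (100% PCR efficiency, so every cycle exactly doubles the template); the function names and the `fraction_remaining` parameter are hypothetical, and the talk gave no estimate of how much RNA is actually lost during freeze-thaw and handling.

```python
import math

def fold_difference(ct_low: float, ct_high: float) -> float:
    """How many times more starting template the lower-Ct sample had.

    Assumes ideal amplification: every cycle exactly doubles the template.
    """
    return 2.0 ** (ct_high - ct_low)

def effective_ct(reported_ct: float, fraction_remaining: float) -> float:
    """Ct expected on re-testing if only `fraction_remaining` of RNA survived."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in (0, 1]")
    return reported_ct + math.log2(1.0 / fraction_remaining)

print(fold_difference(20, 30))   # 1024.0
print(effective_ct(25.0, 0.25))  # 27.0
```

Under these assumptions, losing three-quarters of the intact RNA would make a sample reported at Ct 25 behave like a Ct 27 sample by the time it reaches the sequencer.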
Their lab aims to sequence around 192 samples per week for surveillance, but receives many more than this: for example, between 28th August and 10th November 2021, they sequenced 839 samples but received and catalogued 12,905. The primary goal of their surveillance sequencing is to generate consensus sequences for GISAID submission, which allows a maximum of 10% masked sites (see below) across the entire consensus sequence. To meet this requirement, they preferentially select samples with lower Ct values for sequencing, while ensuring that surveillance coverage remains broad. They also include samples of particular interest, such as those from immunocompromised individuals or outbreak clusters; as a result, they are looking to develop strategies for sequencing samples with a wide range of Ct values on a single flow cell.
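The selection policy described above (favour low Ct, always include samples of special interest, keep surveillance broad) could be sketched as follows. This is a hypothetical illustration, not the lab’s actual procedure: the `Sample` fields, the round-robin-by-county rule and the Ct cutoff of 30 are all assumptions; only the weekly target of 192 samples comes from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    sample_id: str
    ct: float               # Ct value reported by the testing provider
    county: str
    priority: bool = False  # e.g. immunocompromised patient, outbreak cluster

def select_for_sequencing(samples, capacity=192, max_ct=30.0):
    """Choose up to `capacity` samples for one week's sequencing.

    Priority samples go first, regardless of Ct. Remaining slots are filled
    round-robin across counties (to keep surveillance broad), taking the
    lowest-Ct sample left in each county and skipping samples above `max_ct`.
    """
    chosen = sorted((s for s in samples if s.priority), key=lambda s: s.ct)[:capacity]
    by_county = defaultdict(list)
    for s in samples:
        if not s.priority and s.ct <= max_ct:
            by_county[s.county].append(s)
    queues = [sorted(group, key=lambda s: s.ct) for group in by_county.values()]
    while len(chosen) < capacity and any(queues):
        for queue in queues:
            if queue and len(chosen) < capacity:
                chosen.append(queue.pop(0))
    return chosen

received = [
    Sample("a", 18.0, "Dane"),
    Sample("b", 22.0, "Dane"),
    Sample("c", 25.0, "Milwaukee"),
    Sample("d", 33.0, "Milwaukee"),
    Sample("e", 34.0, "Dane", priority=True),
]
picked = select_for_sequencing(received, capacity=3)
print([s.sample_id for s in picked])  # ['e', 'a', 'c']
```

Here sample c (Ct 25) is chosen over b (Ct 22) so that Milwaukee County is represented, d falls above the cutoff, and the high-Ct priority sample e is still sequenced. That last case is why a single flow cell may need to handle a wide range of Ct values.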
Both the ARTIC Classic and Midnight protocols use a tiled-amplicon approach rather than direct whole-genome sequencing. The ARTIC Classic protocol uses 98 primer pairs producing ~400 bp amplicons tiled across the ~30 kb genome; the amplicons are split between two primer pools so that overlapping amplicons are never amplified in the same reaction (preventing primer-primer interactions during PCR). Because the target regions are amplified first, this approach has an advantage over direct whole-genome sequencing: it can work with samples that have very low viral copy numbers. Robert further detailed the specific steps that they perform for normalisation, and the considerations for running multiple samples with varying Ct values on a single MinION Flow Cell. He also presented some key differences between the Midnight and ARTIC Classic workflows, such as the number of amplicons produced (29 vs. 98, respectively), amplicon length (1,200 bp vs. 400 bp), turnaround time (~5.5 h vs. 10 h) and relative cost, which is ‘down to around $10 with Midnight’. As fewer primers are used in the Midnight approach, mutations in future SARS-CoV-2 variants are less likely to fall in a primer binding site, and therefore less likely to impact sequencing. Robert also described the steps involved in the Midnight library prep workflow, explaining how it reduces hands-on time and sequencing costs.
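The two-pool tiling idea, and why fewer primers mean fewer vulnerable sites, can be sketched with evenly spaced amplicons. This is an idealised model, not the real ARTIC or Midnight primer schemes: the overlap values and the 22-nt average primer length are assumptions, and the function names are hypothetical.

```python
import math

# Length of the Wuhan-Hu-1 reference genome (GenBank MN908947.3), in bases.
GENOME_LENGTH = 29_903

def amplicons_needed(genome_len: int, amplicon_len: int, overlap: int) -> int:
    """Minimum number of evenly spaced amplicons that cover the genome."""
    step = amplicon_len - overlap
    return math.ceil((genome_len - overlap) / step)

def tile(genome_len: int, amplicon_len: int, overlap: int) -> list:
    """Return (start, end, pool) tuples, alternating pools 1 and 2.

    Neighbouring amplicons overlap, but with overlap <= amplicon_len / 2,
    amplicons in the same pool never do (end of i <= start of i + 2).
    """
    if overlap < 0 or 2 * overlap > amplicon_len:
        raise ValueError("overlap must be between 0 and amplicon_len / 2")
    step = amplicon_len - overlap
    tiles = []
    for i in range(amplicons_needed(genome_len, amplicon_len, overlap)):
        start = i * step
        end = min(start + amplicon_len, genome_len)
        tiles.append((start, end, 1 if i % 2 == 0 else 2))
    return tiles

def primer_footprint(n_pairs: int, primer_len: int = 22,
                     genome_len: int = GENOME_LENGTH) -> float:
    """Approximate fraction of the genome lying under primer-binding sites."""
    return 2 * n_pairs * primer_len / genome_len

artic_like = tile(GENOME_LENGTH, amplicon_len=400, overlap=75)        # hypothetical overlap
midnight_like = tile(GENOME_LENGTH, amplicon_len=1200, overlap=100)   # hypothetical overlap
print(len(artic_like), len(midnight_like))
print(f"ARTIC-like footprint:    {primer_footprint(98):.1%}")
print(f"Midnight-like footprint: {primer_footprint(29):.1%}")
```

The amplicon counts this produces depend on the hypothetical overlaps and will not exactly match the real schemes. The comparison that matters is the primer footprint. Under the assumed primer length, roughly 14% of the genome sits under ARTIC-like primer sites versus roughly 4% for Midnight-like ones. The ratio between them (98/29, about 3.4-fold) does not depend on the primer length chosen.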
Sequencing coverage for Midnight workflow
Robert shared the read analysis for their last two Midnight sequencing runs (n=175), explaining the causes and consequences of masked sites. Masked sites are positions in the consensus sequence recorded as N, resulting from poor-quality reads and/or low depth of coverage (<20 reads at a given position). For GISAID submission, a maximum of 10% of the consensus can be masked across the ~30 kb SARS-CoV-2 genome. Some samples did not meet this threshold; Robert emphasised that the Ct values are reported by the testing provider before the additional sample handling, and so likely overstate how much non-degraded viral RNA remains at the time of sequencing. He next pointed out that certain amplicons in the ARTICv3 protocol consistently underperformed, causing more frequent dropouts in the consensus; this was not observed in the Midnight samples. As for GISAID-eligible sequences, just over half of the ARTICv3-derived consensus sequences were eligible vs. just over two-thirds of the Midnight-derived ones.
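A minimal sketch of the depth-based masking rule and the 10% eligibility threshold described above. It models only the depth criterion (<20 reads); masking because of poor read or call quality, which real consensus pipelines also apply, is omitted. The function names are hypothetical.

```python
from typing import Sequence

def mask_consensus(consensus: str, depths: Sequence[int], min_depth: int = 20) -> str:
    """Replace bases supported by fewer than `min_depth` reads with 'N'."""
    if len(consensus) != len(depths):
        raise ValueError("need one depth value per consensus position")
    return "".join(base if depth >= min_depth else "N"
                   for base, depth in zip(consensus, depths))

def masked_fraction(consensus: str) -> float:
    """Fraction of consensus positions that are N."""
    return consensus.upper().count("N") / len(consensus)

def gisaid_eligible(consensus: str, max_masked: float = 0.10) -> bool:
    """Apply the 'at most 10% masked sites' rule described in the talk."""
    return masked_fraction(consensus) <= max_masked

depths = [35, 40, 12, 50, 60, 45, 38, 41, 29, 33]  # toy per-base read depths
masked = mask_consensus("ACGTACGTAC", depths)
print(masked, masked_fraction(masked), gisaid_eligible(masked))
# ACNTACGTAC 0.1 True
```

On a real genome the same check runs over ~30,000 positions, so a single dropped ~1,200 bp Midnight amplicon masks about 4% of the consensus. Two or three dropouts can push a sample past the limit.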
Robert discussed how they plan to further optimise the Midnight protocol, acknowledging that they will ‘always be working with potentially degraded samples’, and that there may be some primer optimisation they can make to suit the variants circulating locally. They are also evaluating the Bonito basecaller as an alternative to Guppy, their current basecaller.