Periscope: sub-genomic RNA identification in SARS-CoV-2 ARTIC Network Nanopore Sequencing Data
Matthew Parker (Sheffield Biomedical Research Centre, UK) opened his talk by describing how he and his colleagues are part of the COVID-19 Genomics UK Consortium (COG-UK), the goal of which is to sequence as many SARS-CoV-2 genomes as possible to help inform the public health response to the virus. In Sheffield, they have sequenced over 2,800 SARS-CoV-2 genomes. They collaborate with Sheffield Teaching Hospitals, part of the NHS, which sends the team SARS-CoV-2-positive, pre-extracted RNA samples. Matthew and his team then perform RT-PCR using the ARTIC Network protocol and sequence the resulting amplicons on a GridION. The ARTIC protocol takes an amplicon tiling approach to produce a good-quality consensus sequence from SARS-CoV-2 RNA samples that may be degraded. To date, the team have been sequencing approximately 48 samples per MinION Flow Cell (47 SARS-CoV-2-positive samples + one negative control). For analysis, they use the ARTIC Bioinformatics SOP, a workflow that maps the reads to the reference genome, calls variants, and generates consensus viral genome sequences; these are then uploaded to COG-UK.
Matthew then introduced how the SARS-CoV-2 virus is thought to express its genome. The positive-stranded RNA viral genome is 30 kb in length and comprises several open reading frames (ORFs). Two ORFs are translated into protein directly from the genome, whilst the ORFs towards the 3’ end are translated from RNA intermediates: sub-genomic RNAs (sgRNAs). Matthew and his team have developed periscope, which enables the identification of sgRNAs from nanopore sequencing data. The nature of the transcription process in SARS-CoV-2 produces a set of short sgRNAs, each with a leader sequence at its 5’ end. Primer 1 of Pool 1 from the ARTIC Network primers, Matthew explained, is an almost-perfect match for this leader sequence. As a result, amplicons representing the sgRNAs are produced in RT-PCR. The team decided to use the presence of the leader sequence in amplicons to classify their sequencing reads as genomic or sub-genomic RNA. Matthew noted that they use raw, unfiltered sequencing data for this, so that the shorter sgRNA reads are not removed by the ARTIC SOP filters.
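The idea of classifying reads by the presence of the leader sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not periscope's actual implementation: the function names, the mismatch tolerance, the search window, and the short toy leader sequence are all hypothetical, and a simple mismatch count stands in for whatever matching method the tool really uses.

```python
from collections import Counter


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_leader(read: str, leader: str, max_mismatches: int = 2,
               search_window: int = 60) -> bool:
    """Return True if `leader` occurs near the start of `read`.

    Assumption: the leader sits within the first `search_window` bases and
    may differ by up to `max_mismatches` substitutions (nanopore reads are
    noisy). Indels and reverse-complement reads are not handled here.
    """
    window = read[:search_window].upper()
    k = len(leader)
    return any(
        hamming_distance(window[i:i + k], leader) <= max_mismatches
        for i in range(len(window) - k + 1)
    )


def classify_reads(reads, leader, **kwargs) -> Counter:
    """Count reads labelled 'sgRNA' (leader found) or 'gRNA' (no leader)."""
    return Counter(
        "sgRNA" if has_leader(read, leader, **kwargs) else "gRNA"
        for read in reads
    )


# Toy data: a made-up 12-base "leader", NOT the real SARS-CoV-2 leader.
TOY_LEADER = "ACGTTGCAAGCT"
reads = [
    "TT" + TOY_LEADER + "GGGGGGGGGG",  # exact leader match
    "ACGTTGCTAGCT" + "CCCCCCCC",       # leader with one substitution
    "GGGGGGGGGGGGGGGGGGGGGGGG",        # no leader
]
counts = classify_reads(reads, TOY_LEADER)
print(counts["sgRNA"], counts["gRNA"])
```

Running the classification on raw reads, as described in the talk, matters here because any length filter applied beforehand could remove the shorter sgRNA-derived reads before they are counted.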
Counting the number of sgRNA reads in their dataset, and normalising this number per 1000 genomic reads from the same amplicon, enables Matthew and his colleagues to determine expression levels for individual sgRNAs across different SARS-CoV-2 samples and cohorts. Matthew compared sgRNA expression levels in the Sheffield and Glasgow datasets, which gave comparable results. To determine reproducibility, they used full technical replicates from the same samples and observed good correlation between sgRNA levels. For orthogonal validation, they compared their ARTIC nanopore data to short-read metagenomic data generated in vitro, seeing broad agreement between the two datasets both in the levels of different sgRNAs and in total sgRNA levels over time.
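The normalisation described above is simple arithmetic: sgRNA reads divided by genomic reads from the same amplicon, scaled to 1000. A minimal sketch follows; the function name, the ORF labels, and the read counts are made up for illustration and are not taken from the talk or from periscope.

```python
def sgrna_per_1000_genomic(sgrna_reads: int, genomic_reads: int):
    """Normalise an sgRNA read count per 1000 genomic reads.

    Returns None when there are no genomic reads for the amplicon, since
    the ratio is undefined in that case (a choice made for this sketch).
    """
    if genomic_reads <= 0:
        return None
    return 1000 * sgrna_reads / genomic_reads


# Hypothetical counts per ORF: (sgRNA reads, genomic reads from same amplicon)
read_counts = {
    "ORF_A": (25, 5000),
    "ORF_B": (3, 1500),
    "ORF_C": (4, 0),  # amplicon dropout: no genomic reads to normalise by
}

normalised = {
    orf: sgrna_per_1000_genomic(sg, genomic)
    for orf, (sg, genomic) in read_counts.items()
}
print(normalised)
```

Normalising against genomic reads from the same amplicon, rather than against total reads, means that differences in amplification efficiency between amplicons affect the numerator and denominator alike.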
Matthew then introduced his investigation of non-canonical sgRNAs in SARS-CoV-2 samples. These are sgRNAs in which the leader junction does not lie in the region of a TRS-B site. He showed how non-canonical sgRNAs have been repeatedly observed to originate from the same sites across the SARS-CoV-2 genome, in both the Sheffield and Glasgow datasets. Not much is known about the process by which these arise, so Matthew was keen to apply periscope to investigate them. The non-canonical sgRNAs are also present in surprisingly high amounts in some samples: Matthew showed an example in which one was represented by nearly 200 reads. In vitro analysis revealed that non-canonical sgRNA broadly follows the same expression pattern as canonical sgRNA.
Concluding his presentation, Matthew highlighted how ‘sgRNA is readily and reproducibly detectable in ARTIC Network nanopore sequencing data’, via a simple search for reads containing leader sequences. He described how periscope enables these data to be used to better understand sgRNA.
Periscope is freely available at: https://github.com/sheffield-bioinformatics-core/periscope