David R. Greig - Comparison of single nucleotide variants identified by Illumina and Oxford Nanopore technologies in the context of a potential outbreak of Shiga toxin-producing E. coli
London Calling 2019
Short-read sequencing platforms have been adopted by public health agencies for infectious disease surveillance worldwide and have proved to be a robust and accurate method for quantifying relatedness between bacterial genomes. However, this approach offers less flexibility for the urgent, small-scale sequencing that is often required during public health emergencies. In contrast, Oxford Nanopore Technologies offers a range of rapid, real-time sequencing platforms, although it has been suggested that their lower read accuracy compared to other sequencing technologies might be problematic for variant identification. We compared Illumina and Oxford Nanopore sequencing data of two isolates of Shiga toxin-producing Escherichia coli to assess the utility of nanopore technologies for urgent, small-scale sequencing. We investigated whether the same single nucleotide variants were identified by the two sequencing technologies and whether inference of relatedness was consistent. We show that, with optimised variant calling, it is possible to rapidly determine from nanopore sequencing data alone whether or not two cases were likely to be epidemiologically linked.
David R. Greig from Public Health England (PHE) closed the Bacterial genomics breakout session by presenting his team’s work characterising single nucleotide variants of Shiga toxin-producing Escherichia coli (STEC) using both short-read sequencing and long-read nanopore sequencing. David told the delegates that the team at the Gastrointestinal Bacteria Reference Unit (GBRU, part of PHE) sequences over 2,500 STEC samples per year. STEC is a common foodborne pathogen characterised by the presence of the Shiga toxin (encoded by the stx gene). Infection produces a range of well-known symptoms, such as fever and vomiting, but the disease can also lead to kidney, cardiac, and neurological complications that can be fatal, particularly in children. At the GBRU, STEC is primarily typed through culture, PCR, and short-read sequencing.
In the initial pilot study, a single STEC sample was analysed using both short- and long-read sequencing technology. Following detailed optimisation of the analysis and of the variant calling thresholds for the nanopore data, the results revealed 1,424 variants across both technologies when compared to the reference genome. After masking prophage regions, which can make up as much as 20% of the STEC genome and confound short-read analysis, 531 variants remained. Setting the reference genome aside and comparing just the short-read and nanopore calls for the same strain revealed 101 discrepant variants. David described this as somewhat higher than anticipated; on examining the data, the team found that the nanopore sequencing was contributing 95 of these variants. Further investigation determined that 94 of these variants were caused by a repeated cytosine methylation motif. By masking this motif, they were able to reduce the number of SNP differences between the two techniques to just 7.
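The two masking steps described above (prophage regions, then a methylation motif) can be illustrated with a minimal sketch. This is not the PHE pipeline: the function names, the toy reference sequence, the variant positions and the prophage interval are all hypothetical. The summary does not name the methylation motif, so the E. coli Dcm motif CCWGG (written here as the regular expression `CC[AT]GG`) is used purely as an illustrative stand-in.

```python
import re


def positions_in_regions(regions):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    masked = set()
    for start, end in regions:
        masked.update(range(start, end + 1))
    return masked


def motif_positions(reference, motif_regex):
    """Return the 1-based positions covered by any match of the motif.

    A lookahead is used so that overlapping matches are also found.
    """
    covered = set()
    for match in re.finditer(f"(?=({motif_regex}))", reference):
        start = match.start() + 1
        covered.update(range(start, start + len(match.group(1))))
    return covered


def mask_variants(variant_positions, masked):
    """Keep only the variant positions that fall outside the masked set."""
    return sorted(p for p in variant_positions if p not in masked)


# Toy data, invented for illustration only.
reference = "ATGCCAGGTTACGCCTGGAAATTTGGGCCC"
variants = [2, 5, 11, 15, 22, 28]      # hypothetical variant positions
prophage_regions = [(20, 24)]          # hypothetical prophage interval
assumed_motif = "CC[AT]GG"             # assumption: motif is not named in the talk summary

masked = positions_in_regions(prophage_regions) | motif_positions(reference, assumed_motif)
kept = mask_variants(variants, masked)
print(kept)  # [2, 11, 28]
```

In practice the variants would come from a VCF file and the prophage coordinates from an annotation of the reference genome; the set-based filter shown here is only the core idea.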
Based on these results, the team were sufficiently confident to move to a real-world testing scenario. David described how the GBRU received two putative STEC isolates on the same evening, which initial PCR analysis indicated to be the same strain. Both isolates were put through the short-read and nanopore sequencing pipelines. For one isolate, the two sequencing methodologies disagreed at 6 variant positions; for the other, they disagreed at 7.
David also highlighted the shorter turnaround time of nanopore sequencing. Including culturing of the strains, the nanopore methodology required just 17 hours, compared with 31 hours for the short-read technique.
Comparing the two samples to each other revealed 125 variant differences, indicating that they are not part of the same outbreak. Further examination of the 6 and 7 discrepant variants between the two technologies showed that four of them were located at the same positions in both samples. Because these positions lie in repetitive genomic regions, the team at PHE believe these variants may be false positives in the short-read data.
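The two comparisons in this step, the SNP distance between samples and the platform discrepancies shared by both samples, can be sketched as follows. The data structure (a dictionary mapping position to the called alternate base, with absent positions meaning the reference base), the function names and all values are hypothetical and chosen only to keep the example small; they are not the figures from the talk.

```python
def differing_positions(calls_a, calls_b):
    """Return the positions at which two call sets disagree.

    Each call set maps a position to its alternate base; a position that is
    absent from a call set is treated as matching the reference.
    """
    positions = set(calls_a) | set(calls_b)
    return {p for p in positions if calls_a.get(p) != calls_b.get(p)}


def snp_distance(calls_a, calls_b):
    """Number of positions at which two call sets disagree."""
    return len(differing_positions(calls_a, calls_b))


# Toy calls, invented for illustration only.
sample1_short_read = {100: "A", 250: "T", 400: "G", 900: "C"}
sample1_nanopore = {100: "A", 250: "T", 700: "C"}
sample2_short_read = {100: "A", 400: "G", 900: "C", 1200: "T"}
sample2_nanopore = {100: "A", 1200: "T", 1500: "G"}

# Disagreements between the two platforms, per sample.
discrepant1 = differing_positions(sample1_short_read, sample1_nanopore)
discrepant2 = differing_positions(sample2_short_read, sample2_nanopore)

# Discrepancies that recur at the same position in both samples are
# candidates for systematic, platform-specific artefacts.
shared = discrepant1 & discrepant2
print(sorted(shared))  # [400, 900]

# Distance between the two samples using one platform's calls.
print(snp_distance(sample1_nanopore, sample2_nanopore))  # 4
```

Whether a given distance indicates an epidemiological link depends on a threshold chosen by the laboratory, which the summary does not state, so none is encoded here.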
Summarising his talk, David asked the rhetorical question ‘can you do SNP typing with nanopore sequencing?’, to which his answer was ‘yes you can’.