High-quality, low-cost, nanopore-only bacterial genome sequences

Reference-quality bacterial genome assemblies are typically built from sequencing data obtained from either pure cultures or metagenomic samples. Short-read sequencing has been the technology of choice for this application in previous years, but it has limited ability to resolve repetitive sequences that are longer than the library insert size. Consequently, technology capable of producing long sequencing reads, including the Oxford Nanopore platform, has 'recently emerged as the choice' for assembling genomes derived from such samples1.

Professor Albertsen and colleagues, based at Aalborg University in Denmark, investigated whether nanopore sequencing data alone could be used to obtain reference-quality bacterial genome assemblies1. They noted that nanopore data has typically been polished with short reads or a reference sequence to obtain near-complete microbial genome assemblies, yet this is undesirable as it adds cost and complexity1.

The team evaluated the performance of R9 and the more recent R10 nanopore chemistry in bacterial genome assembly, obtaining sequence data derived from 'pure cultures' (in this case, a mock community) and an activated sludge sample.

Figure 1. Indels observed per 100 kb in the de novo bacterial isolate assemblies, at different depths of coverage, with and without short-read polishing. The authors noted that short-read polishing of nanopore data obtained using R10.4 chemistry provided no significant improvement in assembly quality. Image adapted from Sereika et al.1 and available under Creative Commons license (creativecommons.org/licenses/by/4.0).
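The metric plotted in Figure 1 normalises a raw indel count by assembly length, so that genomes of different sizes can be compared. A minimal sketch of that normalisation is shown here; the function name and the example numbers are hypothetical and are not taken from Sereika et al.

```python
def indels_per_100kb(indel_count: int, assembly_length_bp: int) -> float:
    """Normalise a raw indel count to events per 100 kb of assembled sequence."""
    if assembly_length_bp <= 0:
        raise ValueError("assembly_length_bp must be positive")
    return indel_count / assembly_length_bp * 100_000


# Hypothetical example: 9 indels found in a 4.5 Mb assembly.
rate = indels_per_100kb(9, 4_500_000)
print(f"{rate:.2f} indels per 100 kb")
```

A lower value indicates a more accurate consensus sequence.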

They introduced the term 'near-finished' genome to indicate a high-quality genome assembled with only long nanopore reads, for which the application of short-read polishing would not significantly improve the consensus sequence. They found that R10.4 data alone could generate near-finished bacterial genomes, without polishing (Figure 1). The depth of coverage required to achieve this was approximately 40-fold. To assess performance on metagenomic genome assembly, the team sequenced a sample of activated sludge and reached a similar conclusion: R10.4 chemistry enabled the generation of near-finished microbial genomes, without short-read polishing1.
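Depth of coverage is conventionally the total number of sequenced bases divided by the genome size, so the 40-fold figure can be turned into a rough sequencing yield target. The sketch below assumes that simple definition; the function names and the 5 Mb genome size are hypothetical, chosen only for illustration.

```python
import math


def depth_of_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Mean depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


def bases_needed(target_depth: float, genome_size_bp: int) -> int:
    """Sequencing yield (in bases) required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size_bp)


# Hypothetical 5 Mb bacterial genome, sequenced to the ~40-fold depth noted above.
genome_size = 5_000_000
needed = bases_needed(40, genome_size)
print(f"{needed / 1e6:.0f} Mb of sequence data for 40-fold coverage")
```

In practice the usable depth is lower than this estimate once reads are filtered and mapped, so the result is better treated as a lower bound on the yield to aim for.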

A notoriously challenging bacterial genome to sequence and assemble is that of Mycobacterium tuberculosis. M. tuberculosis is the pathogen responsible for tuberculosis (TB), which remains one of the deadliest infectious diseases: in 2020, 1.5 million human deaths were attributed to TB2. Drug-resistant M. tuberculosis is a particularly significant threat to effective TB control1,3. Genome sequencing of the pathogen has gained traction in recent years for both clinical research and epidemiological investigations. Such efforts have provided valuable insights into circulating strains, including mutations underlying drug resistance and virulence, and the dynamics of person-to-person transmission, offering higher resolution than culture-based phenotyping or targeted sequencing assays2.

Previously, short-read sequencing technology was typically used to investigate the genetic basis of resistance and the genomics underpinning TB transmission. However, the genome of M. tuberculosis is challenging to resolve with short reads due to its high GC content and repetitive nature. This includes the highly variable and GC-rich pe/ppe genes associated with drug resistance, which are often excluded from analysis because short reads are difficult to map accurately to these regions. Furthermore, the high capital cost and centralisation associated with these sequencing platforms have limited access to whole-genome analysis in many areas with a high TB burden and lower income3,4.

In contrast, the Oxford Nanopore platform can produce sequencing reads of any length, and a scalable range of devices is available, including portable options suitable for in situ sequencing; the technology has therefore been recognised as a 'promising platform for cost-effective application' to TB genome analysis3. However, few studies have investigated the performance of nanopore sequencing for M. tuberculosis genome analysis for drug susceptibility prediction or outbreak investigation.

'Oxford Nanopore R10.4 enables the generation of near-finished microbial genomes from pure cultures or metagenomes at coverages of 40-fold without short-read polishing'

Sereika, M. et al. Nat. Methods (2022)

In light of this, Gómez-González et al. and Hall et al. compared the performance of Oxford Nanopore and short-read sequencing platforms for these applications3,4. Gómez-González et al. sequenced 10 M. tuberculosis clinical research isolates with both nanopore and short-read technology, obtaining 93.6-fold short-read and 72.2-fold nanopore depth of coverage after mapping. The team highlighted the improved coverage of long nanopore reads in repetitive regions where short reads failed to align accurately. As expected, a higher number of large variants were detected with long nanopore reads (median 81 versus 24 across the isolates). Regarding single nucleotide polymorphisms (SNPs), for all sample pairs, >99% of the SNPs identified were called in both samples, with few discrepancies between platforms. All lineage predictions were identical between the two platforms (Figure 2). However, because the pe/ppe gene regions were successfully resolved with long nanopore reads, SNPs from these regions could also be incorporated into the lineage analysis of the nanopore data, which led to an improved resolution that 'would be of special interest in outbreak settings, where transmission analysis of closely related isolates can be potentially better established'. They also suggested that the ability to cover repetitive regions with long reads could contribute to a better understanding of drug-resistance mechanisms in M. tuberculosis3.

Hall et al. aimed to establish whether nanopore sequence data could be used to reproduce transmission clusters and drug susceptibility profiles equivalent to those generated with short-read data4. To investigate this, the team obtained matched nanopore and short-read data from 151 isolates. The study found that isolate clustering was the same between the two platforms. In terms of genotyping resistance-associated SNPs and indels, they obtained near-identical results, with a concordance of >99.99% between the two technologies4.
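Platform comparisons of this kind rest on simple agreement statistics between two sets of calls made on the same isolate. The sketch below shows two common ways such agreement can be computed. Both definitions, the function names, and the toy variants are assumptions made for illustration; they are not necessarily the metrics used by Gómez-González et al. or Hall et al.

```python
from typing import Dict, Set, Tuple

# A variant is represented here as (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


def shared_fraction(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Fraction of variants seen on either platform that were called on both."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # nothing called on either platform, so nothing disagrees
    return len(calls_a & calls_b) / len(union)


def genotype_concordance(gt_a: Dict[int, str], gt_b: Dict[int, str]) -> float:
    """Fraction of positions genotyped on both platforms with identical calls."""
    shared_positions = gt_a.keys() & gt_b.keys()
    if not shared_positions:
        raise ValueError("no positions genotyped on both platforms")
    agreeing = sum(1 for pos in shared_positions if gt_a[pos] == gt_b[pos])
    return agreeing / len(shared_positions)


# Toy call sets for one hypothetical isolate sequenced on both platforms.
nanopore_calls = {(100, "A", "G"), (250, "C", "T"), (900, "G", "A")}
short_read_calls = {(100, "A", "G"), (250, "C", "T"), (1200, "T", "C")}
print(shared_fraction(nanopore_calls, short_read_calls))

# Toy genotypes at a panel of resistance-associated positions.
nanopore_gt = {761155: "T", 2155168: "C", 781687: "A"}
short_read_gt = {761155: "T", 2155168: "C", 781687: "A"}
print(genotype_concordance(nanopore_gt, short_read_gt))
```

The first function penalises variants found by only one platform, while the second only compares positions that both platforms genotyped, so the two figures answer slightly different questions and should not be compared directly.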

Figure 2. Phylogenetic trees representing the branching order for the M. tuberculosis clinical research isolates studied, showing equal branch lengths for the 10 pairs of sequenced isolates. Image adapted from Gómez-González et al.3 and available under Creative Commons license (creativecommons.org/licenses/by/4.0).

'Our analysis shows that it is now possible to obtain high-precision SNP calls in M. tuberculosis with current nanopore data'

Hall, M.B. et al. medRxiv (2022)

  1. Sereika, M. et al. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat. Methods 19, 823–826 (2022). DOI: https://doi.org/10.1038/s41592-022-01539-7
  2. WHO. Tuberculosis. Available at: https://www.who.int/news-room/fact-sheets/detail/tuberculosis. [Accessed 23rd August 2022]
  3. Gómez-González, P.J. et al. Portable sequencing of Mycobacterium tuberculosis for clinical and epidemiological applications. Brief. Bioinform. bbac256 (2022). DOI: https://doi.org/10.1093/bib/bbac256
  4. Hall, M.B. et al. Nanopore sequencing for Mycobacterium tuberculosis drug susceptibility testing and outbreak investigation. medRxiv 22271870 (2022). DOI: https://doi.org/10.1101/2022.03.04.22271870