Using long native reads to partition and assemble genomes from complex metagenomic samples


Date: 22nd May 2019


Long PCR-free nanopore reads allow partitioning and assembly of individual genomes from complex mixtures of different organisms, using several different bioinformatics approaches

Fig. 1 De novo metagenomic assembly a) laboratory workflow b) typical bioinformatics pipeline

Long reads provide more genomic context, improving assembly from complex samples

The majority of microbes cannot be cultured in the laboratory, so the most direct way to derive whole-genome sequences from complex mixtures of organisms is metagenomic assembly, in which all genomes in the sample are assembled together (Fig. 1a). Such mixtures often contain many similar genomes at different levels of abundance, which frequently leads to misassembly. A common approach to this problem is to bin reads into subsets that each ideally represent a single genome, and then to assemble each bin individually. Long reads help by increasing the sensitivity and specificity of binning strategies and by providing longer overlaps for assembly. An example of such a workflow is shown in Fig. 1b.

Fig. 2 Assembly by coverage binning a) workflow b–e) performance on Zymo mock community

Binning using differential coverage profiles to improve assembly contiguity

For metagenomic samples in which the microbial genomes are not well represented in reference databases, differences in organism abundance between samples can be exploited as a binning strategy (Fig. 2a). We used three different extraction protocols on the Zymo mock community to create different genome abundances (Fig. 2b). We aligned reads from each sample to contigs assembled from the combined set of samples, and used aligned read depth to measure contig abundance in each sample (Fig. 2c). This allows contigs to be binned by matching abundance profiles (Fig. 2d). Finally, the initial bins can be refined by taking all reads that align to the contigs in a bin and conducting a second, bin-specific assembly (Fig. 2e).
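The binning step can be illustrated with a minimal Python sketch. It assumes that the mean aligned read depth of each contig in each sample has already been computed, and it groups contigs whose depth profiles across the samples have a similar shape. The greedy cosine-similarity grouping, the function names and the `min_similarity` threshold are illustrative assumptions, not the method used to produce Fig. 2.

```python
import math


def normalise(profile):
    """Scale a per-sample depth vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(d * d for d in profile))
    return [d / norm for d in profile] if norm > 0 else list(profile)


def cosine(unit_a, unit_b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(unit_a, unit_b))


def bin_contigs(depths, min_similarity=0.99):
    """Greedily group contigs by the shape of their abundance profile.

    depths: {contig_name: [mean depth in sample 1, sample 2, ...]}
    min_similarity: hypothetical threshold, chosen for illustration only.
    A contig joins the first bin whose seed profile is similar enough;
    otherwise it starts a new bin.
    """
    bins = []  # each entry is (seed unit profile, list of contig names)
    for name, profile in depths.items():
        unit = normalise(profile)
        for seed, members in bins:
            if cosine(seed, unit) >= min_similarity:
                members.append(name)
                break
        else:
            bins.append((unit, [name]))
    return [members for _, members in bins]


# Made-up depths for four contigs across three extraction protocols.
example_depths = {
    "contig_1": [100, 20, 5],
    "contig_2": [98, 22, 6],
    "contig_3": [10, 80, 40],
    "contig_4": [12, 78, 43],
}
example_bins = bin_contigs(example_depths)
```

With these made-up depths, contigs 1 and 2 share one profile and contigs 3 and 4 share another, so they fall into two bins. The reads aligning to each bin could then be collected for the second, bin-specific assembly.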

Fig. 3 Partitioning native bacterial reads using strain-specific methylation patterns a) overview of experimental set-up b) bioinformatics workflow c) hexbin plot showing partitioned reads

Separating reads from closely related bacterial genomes using Tombo to identify strain-specific patterns of Dam and Dcm methylation

The high degree of sequence similarity between multiple strains in microbial communities can present significant challenges for analysis. One way to resolve strain-specific sequences is to take advantage of the patterns of DNA methylation that are often present in microbial genomes. Methylation occurs at specific target motifs, yet there exists a great diversity of these motifs in the bacterial world, even among members of the same species. These naturally occurring methylation patterns can be detected in nanopore reads and can serve as epigenetic barcodes for binning reads by strain. In the example shown, two strains of E. coli were co-cultured: a wild-type K12 strain and a K12 mutant lacking the Dcm and Dam methyltransferases that methylate the 5’-CCWGG-3’ and 5’-GATC-3’ motifs, respectively (Fig. 3a). Nanopore sequencing resulted in a mixture of reads from each strain. After aligning all reads to an E. coli K12 reference sequence, the methylation detection tool Tombo was used to characterise the methylation status at 5’-GATC-3’ and 5’-CCWGG-3’ sites. Read-level statistics were compiled by assessing all motif sites from each read and taking the median methylation score for these sites (Fig. 3b). The resulting hexbin plot shows a division of reads solely based on these read-level methylation assessments at the two motifs in question: one group has high scores for both the Dcm and Dam motifs, while the other group has low methylation scores for both motifs (Fig. 3c).
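The read-level summary and partitioning described above can be sketched in a few lines of Python. The sketch assumes that per-site methylation scores have already been extracted for each read and rescaled so that higher values mean "more likely methylated". It does not call Tombo, and the input layout, function names and `threshold` value are hypothetical.

```python
from statistics import median


def read_level_scores(site_scores):
    """Summarise each read by its median score at Dam (GATC) and Dcm (CCWGG) sites.

    site_scores: {read_id: {"GATC": [scores], "CCWGG": [scores]}}
    Reads with no sites for either motif are skipped, because no median
    can be taken for them.
    """
    summary = {}
    for read_id, motifs in site_scores.items():
        dam_sites = motifs.get("GATC", [])
        dcm_sites = motifs.get("CCWGG", [])
        if dam_sites and dcm_sites:
            summary[read_id] = (median(dam_sites), median(dcm_sites))
    return summary


def partition(summary, threshold=0.5):
    """Assign each read to a group using an illustrative fixed threshold.

    Reads high at both motifs are labelled "methylated", reads low at both
    are labelled "unmethylated", and discordant reads are "ambiguous".
    """
    groups = {}
    for read_id, (dam, dcm) in summary.items():
        if dam >= threshold and dcm >= threshold:
            groups[read_id] = "methylated"
        elif dam < threshold and dcm < threshold:
            groups[read_id] = "unmethylated"
        else:
            groups[read_id] = "ambiguous"
    return groups


# Made-up per-site scores for three reads.
example_sites = {
    "read_a": {"GATC": [0.9, 0.8, 0.95], "CCWGG": [0.85, 0.9]},
    "read_b": {"GATC": [0.1, 0.05, 0.2], "CCWGG": [0.1]},
    "read_c": {"GATC": [0.9, 0.8], "CCWGG": []},
}
example_groups = partition(read_level_scores(example_sites))
```

In this toy input, `read_a` is labelled as methylated and `read_b` as unmethylated, while `read_c` is left out because it covers no CCWGG site. In practice a threshold would be chosen from the observed score distribution, such as the two clusters visible in a plot like Fig. 3c, rather than fixed in advance.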
