
Using long native reads to partition and assemble genomes from complex metagenomic samples


Date: 3rd December 2020

Long PCR-free nanopore reads allow individual genomes to be partitioned and assembled from complex mixtures of organisms, using several bioinformatics approaches.


Fig. 1 De novo metagenomic assembly a) laboratory workflow b) typical bioinformatics pipeline

Long reads provide more genomic context, improving assembly from complex samples

The majority of microbes cannot be cultured in the laboratory, so the most direct way to derive whole-genome sequences from complex mixtures of organisms is metagenomic assembly, where all genomes in the sample are assembled together (Fig. 1a). Such mixtures often contain many similar genomes at different levels of abundance, which frequently leads to misassembly. A common approach to this problem is to bin reads into subsets that ideally each represent a single genome, and then to assemble the bins individually. Long reads help in two ways: they increase the sensitivity and specificity of binning strategies, and they provide longer overlaps for assembly. An example of such a workflow is shown in Fig. 1b.

Fig. 2 Assembly by coverage binning a) workflow b–e) performance on Zymo mock community

Binning using differential coverage profiles to improve assembly contiguity

For metagenomic samples where the microbial genomes are not well represented in reference databases, differences in organism abundance between samples can be exploited as a binning strategy (Fig. 2a). We used three different extraction protocols on the Zymo mock community to create different genome abundances (Fig. 2b). We aligned reads from each sample to contigs assembled from the combined set of samples, and used aligned read depth to measure the abundance of each contig in each sample (Fig. 2c). Contigs with matching abundance profiles can then be binned together (Fig. 2d). Finally, the initial bins can be refined by taking all reads that align to the contigs in a bin and performing a second, bin-specific assembly (Fig. 2e).
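The profile-matching step can be illustrated with a minimal sketch. It assumes that a mean read depth has already been computed for each contig in each of the three samples; the contig names, depth values, the cosine-similarity measure, the greedy grouping and the `min_similarity` cutoff are all illustrative assumptions, not the method or parameters used to produce Fig. 2.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two depth profiles.

    It depends only on the ratio of depths across samples, so two contigs
    from the same genome can match even if their absolute depths differ.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bin_contigs(depths, min_similarity=0.99):
    """Greedily group contigs whose profile matches a bin's first contig.

    depths: dict of contig name -> list of mean depths, one per sample.
    min_similarity: hypothetical cutoff chosen for this toy example.
    """
    bins = []  # each bin holds the seed profile and its member contigs
    for name, profile in depths.items():
        for b in bins:
            if cosine(profile, b["seed"]) >= min_similarity:
                b["contigs"].append(name)
                break
        else:
            # No existing bin matched, so this contig starts a new one.
            bins.append({"seed": profile, "contigs": [name]})
    return [b["contigs"] for b in bins]


# Made-up depths for four contigs across three extraction protocols.
depths = {
    "contig_1": [40.0, 10.0, 5.0],
    "contig_2": [82.0, 19.0, 11.0],
    "contig_3": [5.0, 30.0, 60.0],
    "contig_4": [4.0, 33.0, 58.0],
}
bins = bin_contigs(depths)
print(bins)
```

Reads aligning to the contigs of each resulting bin would then be collected for the second, bin-specific assembly. Dedicated binning tools use more robust clustering than this greedy pass; the sketch only shows why contigs from the same genome end up together when their depth ratios agree across samples.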

Fig. 3 Partitioning bacterial reads using methylation patterns a) overview of experimental set-up b) bioinformatics workflow c) hexbin plot showing partitioned reads d) verification of results

Separating reads from closely related bacterial genomes using native DNA sequencing combined with Tombo to identify strain-specific patterns of Dam and Dcm methylation

Sequence similarity between strains in microbial communities can present challenges for analysis. Fortunately, although methylation in bacteria occurs at specific motifs, there is high diversity of motifs even among members of the same species. These methylation patterns can be detected in native nanopore reads and used to bin reads by strain.

We co-cultured a wild-type K12 E. coli strain and a mutant strain lacking the Dcm and Dam methyltransferases (5'-CCWGG-3' and 5'-GATC-3' motifs respectively, Fig. 3a). After aligning all reads to a K12 reference, we used Tombo to characterise methylation status at each motif, then calculated the median methylation score across these sites for each read (Fig. 3b). The hexbin plot shows a division of reads based solely on the read-level methylation assessments at the two motifs. In addition, each strain had been transformed with a different plasmid, and the plasmid reads segregated with the expected genome (Fig. 3c). De novo assembly of reads from the unmethylated cluster gave an assembly in which much of the Dam methyltransferase gene was deleted, explaining the lack of Dam methylation (Fig. 3d). The Dcm methyltransferase gene had also been inactivated by a point mutation (Fig. 3e).

Next, we used the methylation binning tool Nanodisco to group genomic DNA and plasmids from a 4-species mock community, including three strains with high nucleotide similarity but different methylation motifs. We used a de novo approach to discover and cluster significant motifs (Fig. 3f). Plasmids and genomic contigs from the correct hosts clustered together. Lastly, we calculated the difference in methylation between each plasmid and each host genome across known motifs (Fig. 3g). Each plasmid had the highest similarity with the correct host genome.
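The read-level partitioning described above for the two E. coli strains can be sketched as follows. This is a toy illustration under several assumptions: per-site scores are taken to lie between 0 and 1, with higher values meaning more likely methylated; sites are keyed by the start position of the motif rather than the modified base; and the 0.5 cutoff, the reference string, the read names and the scores are all invented. It does not reproduce Tombo's output format or its statistics.

```python
import re
from statistics import median

# Motifs named in the text: Dam acts at GATC, Dcm at CCWGG (W = A or T).
DAM = re.compile(r"GATC")
DCM = re.compile(r"CC[AT]GG")


def motif_positions(reference, pattern):
    """Return 0-based start positions of a motif in the reference."""
    return [m.start() for m in pattern.finditer(reference)]


def read_medians(site_scores, dam_sites, dcm_sites):
    """Median score per motif type for one read.

    site_scores: dict of reference position -> methylation score (assumed 0..1).
    Returns None for a motif type the read does not cover.
    """
    dam = [s for pos, s in site_scores.items() if pos in dam_sites]
    dcm = [s for pos, s in site_scores.items() if pos in dcm_sites]
    return (median(dam) if dam else None, median(dcm) if dcm else None)


def assign_cluster(dam_median, dcm_median, cutoff=0.5):
    """Place a read in a cluster using a hypothetical score cutoff."""
    if dam_median is None or dcm_median is None:
        return "unassigned"
    if dam_median >= cutoff and dcm_median >= cutoff:
        return "methylated"
    if dam_median < cutoff and dcm_median < cutoff:
        return "unmethylated"
    return "ambiguous"


# Invented reference fragment containing two GATC and two CCWGG sites.
reference = "TTGATCAACCAGGTTGATCCCTGGAA"
dam_sites = set(motif_positions(reference, DAM))
dcm_sites = set(motif_positions(reference, DCM))

# Invented per-read scores keyed by motif start position.
reads = {
    "read_a": {2: 0.9, 15: 0.8, 8: 0.95, 19: 0.85},
    "read_b": {2: 0.1, 15: 0.05, 8: 0.2, 19: 0.1},
}
clusters = {
    name: assign_cluster(*read_medians(scores, dam_sites, dcm_sites))
    for name, scores in reads.items()
}
print(clusters)
```

Plotting the two medians against each other for every read gives the kind of two-dimensional view a hexbin plot summarises. Using the median rather than the mean keeps a few miscalled sites from shifting a read between clusters.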
