Assembly methods for nanopore-based metagenomic sequencing: a comparative study


Adriel Latorre-Pérez, from Darwin Bioprospecting Excellence in Valencia, Spain, kicked off his presentation by outlining the purpose of metagenomic sequencing, namely obtaining individual genome assemblies from mixed microbial samples. However, this isn’t without challenges, including unknown and uneven composition of microbes, intragenomic repetition, and intergenomic overlaps between closely related species in the community. According to Adriel, when using traditional sequencing platforms, such metagenomic sequencing results in ‘incomplete genomes with hundreds or thousands of contigs’. Conversely, Adriel commented that ‘long reads generated by Oxford Nanopore Technologies’ platforms have demonstrated improved assemblies in terms of contiguity and completeness’. This has led to the development of new assembly tools such as Flye, Canu, Raven, and Pomoxis, specifically designed to handle long-read sequencing data. The objective of Adriel’s study was to benchmark the performance of these tools in order to establish a pipeline for future metagenomic analyses in his lab.

In the first evaluation, the team utilised publicly available data generated by Nicholls et al. (2019) for two commercially available mock communities. Both communities comprised the same microbial species, but one (Zymo CS) had an even composition, while the other (Zymo CSII) had a logarithmic composition. Adriel showed results revealing that, for the even community, the highest assembled fraction was provided by the metaFlye tool (70.4%). Honourable mentions were also given to Canu, Pomoxis, and Raven. Adriel pointed out, however, that ‘every tool was far from covering the entire metagenome’, which was due to two yeast species that were present at a lower abundance than the bacterial species. Upon removal of the yeast data, the average genome fraction assembled increased to almost 100%. Similar trends were also found when analysing the log community.
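To make the genome fraction metric concrete, the following is a minimal Python sketch of how such a figure can be computed: the percentage of reference bases covered by at least one aligned contig. It is not the evaluation code used in the study. The function names, reference names, lengths, and alignment intervals are all hypothetical toy values, chosen only to show how large, poorly covered genomes can pull down the overall figure.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a reference sequence


def covered_bases(intervals: List[Interval]) -> int:
    """Count reference bases covered by at least one interval (overlaps merged)."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def genome_fraction(ref_lengths: Dict[str, int],
                    alignments: Dict[str, List[Interval]]) -> float:
    """Percentage of all reference bases covered by aligned contigs."""
    total_length = sum(ref_lengths.values())
    covered = sum(covered_bases(alignments.get(name, [])) for name in ref_lengths)
    return 100.0 * covered / total_length


# Hypothetical toy community: two fully assembled bacteria, one barely covered yeast.
ref_lengths = {"bacterium_A": 1000, "bacterium_B": 2000, "yeast_A": 7000}
alignments = {
    "bacterium_A": [(0, 600), (500, 1000)],
    "bacterium_B": [(0, 2000)],
    "yeast_A": [(0, 500)],
}

with_yeast = genome_fraction(ref_lengths, alignments)
bacteria_only = {k: v for k, v in ref_lengths.items() if not k.startswith("yeast")}
without_yeast = genome_fraction(bacteria_only, alignments)
print(f"with yeast: {with_yeast:.1f}%, bacteria only: {without_yeast:.1f}%")
```

In this toy case, dropping the yeast reference from the denominator moves the fraction from 35.0% to 100.0%, the same kind of shift described above once the yeast data were removed.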

In terms of sequence accuracy after polishing with Racon and Medaka, Canu performed best for both SNPs and indels when compared against data obtained using short-read platforms. The results of this initial study were recently published in Nature.

In a second evaluation, the team used publicly available data for other mock communities (BenchEven, BenchUneven, BMock12, and MSA2006), comprising 12 bacteria and plasmids, with different taxonomic groups, distributions, and levels of complexity. Most of the assemblers tested took just a few hours to reconstruct the metagenomes; however, Canu required a number of weeks. This led the team to remove Canu from their assessments. As in the initial study, the best metagenome assemblies were obtained using metaFlye.

Summarising this research, Adriel suggested that, for most general metagenomic assembly requirements, metaFlye is his clear recommendation, but if researchers are looking for something faster, they should also consider Raven. Currently, he also suggests the use of one round of Racon and Medaka polishing for optimal results, but cautions that these tools are rapidly evolving, so researchers should ‘always check for updated benchmarks’.
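As an illustration of what one round of Racon followed by Medaka might look like in practice, the sketch below builds the three command lines in Python: a minimap2 read-to-draft mapping, a Racon pass, and a Medaka consensus pass. It is a sketch under stated assumptions, not the pipeline used in the study. The file names, output layout, thread count, and the helper names `polishing_commands` and `run_commands` are all hypothetical, and the Medaka model is left at the tool's default, which may not suit every dataset.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# (argv, file that should capture stdout, or None if the tool writes its own output)
Command = Tuple[List[str], Optional[Path]]


def polishing_commands(reads: str, draft: str, outdir: str,
                       threads: int = 8) -> List[Command]:
    """Build commands for one Racon round followed by one Medaka round."""
    out = Path(outdir)
    overlaps = out / "reads_vs_draft.paf"   # hypothetical file name
    racon_out = out / "racon.fasta"         # hypothetical file name
    medaka_dir = out / "medaka"             # hypothetical directory name
    t = str(threads)
    return [
        # 1. Map the nanopore reads back to the draft assembly (PAF on stdout).
        (["minimap2", "-x", "map-ont", "-t", t, draft, reads], overlaps),
        # 2. Racon: sequences, overlaps, target; polished FASTA on stdout.
        (["racon", "-t", t, reads, str(overlaps), draft], racon_out),
        # 3. Medaka polishes the Racon output and writes into its own directory.
        (["medaka_consensus", "-i", reads, "-d", str(racon_out),
          "-o", str(medaka_dir), "-t", t], None),
    ]


def run_commands(commands: List[Command]) -> None:
    """Run each command in order, redirecting stdout to a file where requested."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            stdout_path.parent.mkdir(parents=True, exist_ok=True)
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


if __name__ == "__main__":
    for argv, target in polishing_commands("reads.fastq", "draft.fasta", "polish"):
        suffix = f" > {target}" if target else ""
        print(" ".join(argv) + suffix)
```

Running the script only prints the commands. Calling `run_commands` would execute them and assumes minimap2, Racon, and Medaka are installed and on the `PATH`. Given the caution about rapidly evolving tools, the options shown should be checked against each tool's current documentation before use.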

Authors: Adriel Latorre-Pérez