Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads

Motivation
Recent emergence of nanopore sequencing technology set a challenge for established assembly methods. In this work, we assessed how existing hybrid and non-hybrid de novo assembly methods perform on long and error prone nanopore reads.

Results
We benchmarked five non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of Escherichia coli K-12, using several sequencing coverages of nanopore data (20×, 30×, 40× and 50×). We attempted to assess the assembly quality at each of these coverages, in order to estimate the requirements for closed bacterial genome assembly. For the purpose of the benchmark, an extensible genome assembly benchmarking framework was developed. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data and perform relatively well on lower nanopore coverages. All non-hybrid methods correctly assemble the E. coli genome when coverage is above 40×, even the non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower.

Availability: https://github.com/kkrizanovic/NanoMark

Authors: Ivan Sovic, Kresimir Krizanovic, Karolj Skala, Mile Sikic