Fig. 3 Duplex assemblies of a) Drosophila melanogaster, b) and c) Caenorhabditis elegans, and d) the Zymo mock bacterial community
Oxford Nanopore’s duplex pipeline identifies reads originating from both strands of the same double-stranded DNA molecule by comparing reads that pass through the same pore in succession. Duplex base-calling combines signal information from both strands and can generate a 2-pass consensus read sequence with >99.9% modal accuracy (>Q30 on the Phred scale). Read accuracy from duplex base-calling is independent of read length (Figs. 3a and 3b, top), allowing for arbitrarily long, highly accurate reads. These reads can be assembled with the latest generation of high-performance genome assembly tools, including Hifiasm, HiCanu, Verkko (Figs. 3a and 3b, bottom), and La Jolla Assembler (Fig. 3c), yielding near-complete chromosomes or chromosome arms in single contigs (Figs. 3a and 3b, bottom). Longer reads unlock higher-contiguity assemblies even at low coverage, as shown by our assemblies of 20x subsets of C. elegans duplex data binned by read length: 20x of duplex reads longer than 30 kb (read length N50 = 41 kb) produced a contig N50 25% higher than 20x of duplex reads of 15–30 kb (read length N50 = 22 kb), and an order of magnitude higher than 20x of duplex reads of 5–15 kb (read length N50 = 10 kb). High-accuracy reads also significantly improve assembly of samples for which obtaining high-molecular-weight DNA remains challenging, as in some bacteria. For example, a 50–200x duplex dataset for the Zymo mock community with read N50s of ca. 5 kb was assembled by the MetaFlye assembler mostly into single, nearly perfect circular contigs (Fig. 3d).
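The two summary statistics quoted above, Phred-scale accuracy and read length N50, can be illustrated with a minimal Python sketch. It assumes the standard Phred relation Q = -10 log10(error rate) and the usual N50 definition (the length L such that reads of length L or longer contain at least half of all bases). The function names and the toy read lengths are hypothetical and are not part of the duplex pipeline or of the data shown in Fig. 3.

```python
import math
from typing import Iterable


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred quality score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of
        # the total number of bases is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # 99.9% accuracy corresponds to roughly Q30.
    print(round(accuracy_to_phred(0.999), 3))
    # Toy read lengths in kb (illustrative only, not real data).
    toy_reads_kb = [10, 20, 30, 40]
    print(n50(toy_reads_kb))  # 30
```

The same `n50` function applies to contig lengths, so it can be used to compare assembly contiguity across read-length subsets in the way described above.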