Longer and longer: DNA sequence of more than two million bases now achieved with nanopore sequencing.


The first DNA read longer than 2 Mb (more than two million bases sequenced as one continuous sequence) has been reported using Oxford Nanopore sequencing technology.

The read was reported in a manuscript posted on BioRxiv by a team based at the University of Nottingham: Alex Payne, Nadine Holmes, Vardhman Rakyan (Blizard Institute, Queen Mary University of London) and Matt Loose. The longest read they observed was 2,272,580 bases long. It was reconstructed from eleven individual reads that were joined for three reasons: the reads appeared consecutively in the same channel, they mapped contiguously to the reference genome, and the evidence on which they had originally been split was weak. The authors noted that this reconstruction approach is particularly applicable to ultra-long read preparations.
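The joining criteria described above can be illustrated with a minimal Python sketch. This is not the authors' code: the `Read` record, its fields (channel, start time, reference name, strand, alignment coordinates), the `max_gap` tolerance and the function names are all hypothetical, and the third criterion (weak evidence for the original split) is not modelled. Reads are sorted by channel and start time, and a read is added to the current group only if it immediately follows the previous read in the same channel and continues that read's alignment on the same reference sequence and strand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    # Hypothetical per-read record; field names are illustrative only.
    read_id: str
    channel: int        # sequencing channel the read came from
    start_time: float   # read start time within the run (seconds)
    ref_name: str       # reference sequence the read aligned to
    strand: str         # "+" or "-"
    ref_start: int      # 0-based alignment start on the reference
    ref_end: int        # alignment end on the reference (exclusive)


def continues_alignment(prev: Read, nxt: Read, max_gap: int) -> bool:
    """Return True if `nxt` picks up on the reference where `prev` stopped."""
    if prev.ref_name != nxt.ref_name or prev.strand != nxt.strand:
        return False
    # On the + strand successive pieces move towards higher coordinates;
    # on the - strand they move towards lower coordinates.
    if prev.strand == "+":
        gap = nxt.ref_start - prev.ref_end
    else:
        gap = prev.ref_start - nxt.ref_end
    # Allow a small gap or overlap (an assumed tolerance, not from the paper).
    return abs(gap) <= max_gap


def merge_split_reads(reads: List[Read], max_gap: int = 1000) -> List[List[Read]]:
    """Group reads that appear back to back in a channel and map contiguously."""
    groups: List[List[Read]] = []
    # Sorting by (channel, start_time) puts each read directly after its
    # immediate predecessor in the same channel, if it has one.
    for read in sorted(reads, key=lambda r: (r.channel, r.start_time)):
        if groups:
            prev = groups[-1][-1]
            if prev.channel == read.channel and continues_alignment(prev, read, max_gap):
                groups[-1].append(read)
                continue
        groups.append([read])
    return groups


def reference_span(group: List[Read]) -> int:
    """Length of reference covered by a reconstructed (merged) read."""
    return max(r.ref_end for r in group) - min(r.ref_start for r in group)


# Usage with made-up reads: three pieces from channel 5, one from channel 6.
reads = [
    Read("r1", 5, 0.0, "chr1", "+", 0, 1000),
    Read("r2", 5, 12.5, "chr1", "+", 1050, 2000),
    Read("r3", 5, 30.0, "chr1", "+", 2000, 3500),
    Read("r4", 6, 0.0, "chr1", "+", 5000, 6000),
]
for group in merge_split_reads(reads, max_gap=100):
    print([r.read_id for r in group], reference_span(group))
# prints:
# ['r1', 'r2', 'r3'] 3500
# ['r4'] 1000
```

The gap tolerance is the main design choice here: too small and genuine split reads stay apart, too large and unrelated molecules that happen to land nearby would be joined, which is why a real pipeline would also need some check on how the read was originally split.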

The read itself is available in Supplementary File Collection 2 of the BioRxiv paper.

Why long reads, and why ultra-long reads?
Traditional short-read DNA sequencing technologies produce data that can be harder to assemble into a complete genome, much like a jigsaw puzzle made from a very large number of small pieces.

Nanopore read lengths depend on sample preparation: the sequencer reads the fragment as it was prepared, so longer input fragments give longer reads. Researchers now routinely sequence fragments tens or hundreds of kilobases (kb) long.

Long reads, and even more so ultra-long reads hundreds of kb in length, make genomes easier to assemble, because each read can span repetitive sequence that would break a short-read assembly into separate pieces. For more information, read this white paper on genome assembly (registration may be required).
Long reads also make it possible to span difficult regions, or even to characterise regions that have never been sequenced. It has been estimated that ~8% of the human genome remains unsequenced (1).


Some resources on nanopore long reads
•    Watch Karen Miga’s talk on spanning the centromere, or read the Nature paper.
•    This human genome assembly from the nanopore consortium, in which ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus.
•    This paper from Wigard Kloosterman's group, in which long nanopore reads resolved complex chromothriptic regions comprising over 40 breakpoints.
•    An assembly of the plant pathogen Rhizoctonia solani: KeyGene assembled the genome into almost 10-fold fewer contigs than with short-read sequencing, giving a much more complete and contiguous genome assembly.
•    Review other papers that make use of nanopore long reads.

Long reads are also critical for resolving structural variants (SVs). This white paper reviews the use of nanopore sequencing for SV detection (registration may be required).

----
1. Miga et al. Nucleic Acids Research 43(20): e133.