Accessing the inaccessible human genome with long reads


Even though the well-studied human genome sequence was published over a decade ago, it is still incomplete [1,2]. Ebbert et al. showed that the human genome contains 36,794 “dark” regions in 6,054 gene bodies that cannot be adequately assembled or aligned using short-read sequencing [3]. These loci are important to analyse because they may contain mutations associated with human disease. Comparing long-read sequencing technologies, the team found that nanopore long reads were the most effective at resolving regions that are inaccessible in short-read data, resolving 90.4% of “dark” coding sequences, compared with 49.5% and 64.4% for two other long-read technologies (Figure 1).
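
To make the idea of a “dark” region concrete, the following minimal Python sketch flags stretches of a reference where short reads either barely align or align only ambiguously. The function name, the toy inputs and both thresholds (`max_depth`, `min_low_mapq_fraction`) are illustrative assumptions, not values taken from the text above, and the sketch is not the pipeline used by Ebbert et al.

```python
def find_dark_regions(depths, low_mapq_fractions,
                      max_depth=5, min_low_mapq_fraction=0.9):
    """Return half-open (start, end) intervals of consecutive dark positions.

    depths: per-position read depth from a short-read alignment.
    low_mapq_fractions: per-position fraction of reads with low mapping
        quality (reads that could not be placed uniquely).
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    if len(depths) != len(low_mapq_fractions):
        raise ValueError("inputs must cover the same positions")

    regions = []
    start = None
    for pos, (depth, frac) in enumerate(zip(depths, low_mapq_fractions)):
        # A position is dark if too few reads cover it, or if most of the
        # reads covering it align ambiguously.
        is_dark = depth <= max_depth or frac >= min_low_mapq_fraction
        if is_dark and start is None:
            start = pos
        elif not is_dark and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions


# Toy data (hypothetical): positions 2-4 have almost no coverage,
# positions 6-7 are covered but mostly by ambiguously placed reads.
toy_depths = [30, 28, 2, 0, 1, 31, 29, 30, 27, 30]
toy_low_mapq = [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.95, 1.0, 0.2, 0.0]
print(find_dark_regions(toy_depths, toy_low_mapq))  # [(2, 5), (6, 8)]
```

The second interval illustrates why depth alone is not enough: duplicated sequence can be well covered and still unusable, because the reads cannot be assigned to a single copy.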

‘Oxford Nanopore Technologies outperformed other long-read technologies, resolving 90.4% of dark CDS regions’45

With the high-output, long-read Oxford Nanopore PromethION platform, sequencing and de novo assembly of a highly contiguous human genome can now be achieved with unprecedented efficiency, as recently demonstrated by Shafin and colleagues [1].

Figure 1: Three examples of “dark” gene resolution using long-read technologies, in genes associated with human diseases. In example (C), duplicated regions are indicated by blue bars, and white lines indicate regions that have diverged sufficiently for short reads to align uniquely. 1: Oxford Nanopore long reads; 2: Alternative long-read platform; 3: Linked-read platform; 4: Short-read platform. Image adapted from Ebbert et al. (2019) [3] and available under Creative Commons license (creativecommons.org/licenses/by/4.0).

Shafin et al. used an optimised PromethION sequencing method and assembly workflow to sequence 11 human genomes in 9 days on a single PromethION35. They obtained 2.3 Tb of sequence with an average depth of coverage per sample of 63x and read N50 of 42 kb, including 6.5x of ultra-long 100 kb+ reads.
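
The two summary statistics quoted above, read N50 and average depth of coverage, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and toy read lengths are hypothetical, and the haploid genome size of roughly 3.1 Gb is an assumed round figure rather than a value given in the text.

```python
def read_n50(lengths):
    """Return the read N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def mean_depth(total_bases, genome_size, n_samples=1):
    """Average depth of coverage per sample."""
    return total_bases / (genome_size * n_samples)


# Toy read lengths in bases (hypothetical, not from the study).
toy_reads = [100_000, 60_000, 42_000, 30_000, 20_000, 10_000]
print(read_n50(toy_reads))  # 60000

# Back-of-envelope depth: 2.3 Tb over 11 genomes, assuming ~3.1 Gb each.
print(round(mean_depth(2.3e12, 3.1e9, n_samples=11), 1))
```

With the assumed genome size, the estimate lands in the same range as the reported 63x; the exact figure depends on the genome size used and on which reads are counted.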

The team introduced three new computational tools for genome assembly and polishing: Shasta, a de novo long-read assembler, and MarginPolish and HELEN, for assembly polishing. These tools enabled the production of a draft assembly in <6 hours, followed by 29 hours of polishing, to achieve 99.9% sequence identity. The authors stated that the outcome could be improved further by taking advantage of the real-time capabilities of nanopore sequencing: ‘With real-time base calling, a DNA-to-de novo assembly could be achieved in less than 96 hours with little difficulty’ [1].
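
For readers more used to Phred-scaled quality values, the sequence identity quoted above can be converted with the standard relation Q = -10 log10(1 - identity). The sketch below is illustrative only: the function names are hypothetical, and the 3.1 Gb genome size used for the error estimate is an assumed round figure.

```python
import math


def identity_to_qv(identity):
    """Convert a sequence identity (0 <= identity < 1) to a Phred-scaled QV."""
    if not 0.0 <= identity < 1.0:
        raise ValueError("identity must be in [0, 1)")
    return -10 * math.log10(1 - identity)


def expected_errors(identity, n_bases):
    """Expected number of erroneous bases at a given identity."""
    return (1 - identity) * n_bases


# 99.9% identity corresponds to roughly QV 30.
print(round(identity_to_qv(0.999), 2))

# Over an assumed ~3.1 Gb assembly this is on the order of 3 million errors.
print(round(expected_errors(0.999, 3.1e9)))
```

Because the scale is logarithmic, each additional "nine" of identity adds 10 to the quality value, so 99.99% identity would correspond to roughly QV 40.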

  1. Shafin, K. et al. (2019). Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit. bioRxiv. doi: 10.1101/715722.
  2. Miga, K. (2019). Telomere-to-telomere assembly of a complete human X chromosome. Presentation. Available at: https://nanoporetech.com/resource-centre/telomere-telomere-assembly-complete-human-x-chromosome [Accessed: 01 Oct 2019]
  3. Ebbert, M. T. W. et al. (2019). Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight. Genome Biol 20(1):97.