Genome assembly

Reliable reference sequences depend on high-quality genome assemblies. However, the short reads produced by traditional sequencing technologies lead to highly fragmented, incomplete assemblies. Short reads cannot span important genomic regions such as repeats and structural variants, so these regions are often assembled incorrectly. In contrast, nanopore technology can deliver long and ultra-long sequencing reads (current record >4 Mb) that span complex genomic regions, enabling the generation of highly contiguous genome assemblies.

  • Generate more contiguous genome assemblies with long and ultra-long reads
  • Resolve repeats and structural variants
  • Explore epigenetic modifications and eliminate bias through direct sequencing of native DNA
  • Scale to your requirements, from small microbial genomes to large plant genomes

Introduction

Generate more contiguous genome assemblies using long sequencing reads

Large structural variants, repeat sequences, and GC-rich regions are challenging to accurately characterise with short-read sequencing technology, and the resulting genome assemblies tend to be fragmented due to the lack of read overlap. Nanopore technology routinely generates sequencing reads that are tens of kilobases in length, and is also capable of sequencing ultra-long libraries (i.e. read N50 of >100 kb; Figure 1). The greater overlap between ultra-long reads enables easier de novo genome assembly. The longest DNA fragment sequenced to date using nanopore technology is 4.2 Mb, which was achieved using the Ultra-Long DNA Sequencing Kit. The long-read capability of nanopore sequencing not only enables accurate delineation of complex genomic regions such as repeats and structural variants, but also the sequencing of smaller microbial genomes in single reads — negating the need for assembly entirely (see poster).

Figure 1: Nanopore sequencing delivers long and ultra-long read lengths that can span complex genomic regions, enabling the generation of highly complete and contiguous genome assemblies.

Tomato genome assembly metrics

Table 1: Comparison of tomato genome (Heinz 1706) assemblies. The nanopore sequencing assembly was generated using 46x duplex and 42x ultra-long simplex data, while the public SL5.0 assembly was generated using a range of technologies, including an alternative long-read capable sequencing platform plus chromosome conformation capture. Long nanopore sequencing reads enabled the generation of a full telomere-to-telomere reference genome for this organism, with the 14 contigs comprising the 12 chromosomes plus the chloroplast and mitochondrial genomes. The nanopore sequencing data added 16.7 Mb of sequence compared to the public reference genome, with a consensus accuracy of Q51.8 (>99.999% accurate). Data kindly provided by Alexander Wittenberg, KeyGene, Netherlands.
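The link between the quoted consensus quality (Q51.8) and the percentage accuracy follows the standard Phred scale, Q = -10 log10(p), where p is the per-base error probability. The short Python sketch below shows that conversion; the function names are illustrative and are not part of any Oxford Nanopore tool.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality score to per-base accuracy (0 to 1)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (0 to 1, exclusive of 1) to a Phred score."""
    return -10 * math.log10(1.0 - accuracy)


# Consensus quality quoted in Table 1
q = 51.8
accuracy = phred_to_accuracy(q)
print(f"Q{q} corresponds to {accuracy * 100:.5f}% per-base accuracy")
```

On this scale, Q50 corresponds to 99.999% accuracy, so any value above Q50, including Q51.8, exceeds that threshold.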

Comprehensive genomic analysis, including direct detection of modified bases

A common metric for assessing genome assembly quality is contig N50: the contig length such that contigs of this length or longer contain at least half of the nucleotides in the assembly. The use of long nanopore sequencing reads delivers significantly higher N50 values than provided by alternative sequencing technologies, enabling the generation of more complete and more contiguous genome assemblies (Table 1). In addition, large genome assemblies can be further scaffolded and corrected using Pore-C, a complete, end-to-end workflow for nanopore sequencing-based chromosome conformation capture. Long sequencing reads also simplify haplotyping, enabling the resolution of compound heterozygosity and parental origin. Furthermore, nanopore sequencing does not require amplification, allowing the direct detection of base modifications (e.g. methylation) alongside the nucleotide sequence for even more comprehensive genomic analyses.
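As an illustration of the N50 metric described above, the following minimal Python sketch computes N50 from a list of contig lengths. The contig lengths are made-up toy values chosen only to show the calculation; they are not taken from Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Two toy assemblies with the same total size (32 units) but different contiguity
fragmented = [2, 2, 2, 3, 3, 4, 8, 8]
contiguous = [20, 12]

print("fragmented N50:", n50(fragmented))
print("contiguous N50:", n50(contiguous))
```

The same function applies to read lengths, which is how a read N50 (for example, the >100 kb figure used to describe ultra-long libraries) would be calculated.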

Case study

Making telomere-to-telomere genome assemblies accessible

Using just a single platform, the Oxford Nanopore platform, it is now possible to generate telomere-to-telomere crop genome assemblies

Alexander Wittenberg, KeyGene, Netherlands

In this Knowledge Exchange, Sean McKenzie (Oxford Nanopore Technologies) and Alexander Wittenberg (KeyGene) discuss how nanopore sequencing makes telomere-to-telomere (T2T) genome assemblies accessible to every lab.

Using data from human and plant samples (including maize and tomato), they discuss how highly accurate and ultra-long nanopore reads facilitate the completion of genome assemblies, resolving challenging repetitive chromosomal sequences, such as telomeres and centromeres.

Accuracy improvements in crop genomes

Case study

Capturing global genomic diversity in the human pangenome

The Human Pangenome Reference Consortium (HPRC) aims to develop a human pangenome assembly that better represents human genomic diversity than current single reference genomes. Initial sequencing of 47 diverse human genomes using the PromethION device showed that nanopore sequencing requires less coverage to achieve the same level of accuracy as an alternative long-read capable sequencing technology. The team further demonstrated that long sequencing reads enabled the inclusion of structural variants that would previously have been missed in short-read studies.

Sequencing workflow

How do I assemble genomes using nanopore sequencing?

Oxford Nanopore provides a range of sequencing devices suitable for genome assembly projects of any size, from small individual microbial genomes to high-throughput, population-scale sequencing of large genomes.

For best practice advice on genome assembly, view our whole-genome sequencing Getting Started guides for small or large genomes. These guides provide a step-by-step overview of the entire sequencing workflow — from selecting the right nanopore sequencing device through to sample preparation, sequencing, and data analysis. Our best practice workflows for human, bacterial and metagenomic genome assembly provide structured, recommended workflows for assembling genomes using nanopore sequencing technology.

Looking to perform microbial genome assembly?

View our simple, end-to-end workflow for microbial genome assembly.


High-throughput assembly of large genomes

For high-throughput sequencing and assembly of large and complex genomes, such as those of humans, animals, and plants, we recommend the following:

  • Sequencing device: PromethION
  • Library preparation: Ligation Sequencing Kit
  • Data analysis: Flye + Medaka
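As a rough sketch of the analysis step recommended above, the Python snippet below builds the command lines for a Flye assembly followed by Medaka polishing. It only constructs and prints the commands and does not run them. The file and directory names are hypothetical, and the choice of Flye's `--nano-hq` input mode is an assumption that depends on how the reads were basecalled; consult the Flye and Medaka documentation for the options appropriate to your data.

```python
import shlex


def build_assembly_commands(reads, out_dir, threads=8):
    """Return [flye_cmd, medaka_cmd] as argument lists (nothing is executed)."""
    flye_dir = f"{out_dir}/flye"
    medaka_dir = f"{out_dir}/medaka"

    # De novo assembly; --nano-hq is an assumed input mode (see lead-in)
    flye_cmd = [
        "flye", "--nano-hq", reads,
        "--out-dir", flye_dir,
        "--threads", str(threads),
    ]

    # Polish the Flye draft (written to assembly.fasta in the Flye output directory)
    medaka_cmd = [
        "medaka_consensus",
        "-i", reads,
        "-d", f"{flye_dir}/assembly.fasta",
        "-o", medaka_dir,
        "-t", str(threads),
    ]
    return [flye_cmd, medaka_cmd]


# Hypothetical input file and output directory
for cmd in build_assembly_commands("reads.fastq", "assembly_out"):
    print(shlex.join(cmd))
```

To run the steps for real, each argument list could be passed to `subprocess.run(cmd, check=True)` on a machine where both tools are installed.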
