Case study: The time is ripe for gapless genome assemblies

Fruit production remains a major source of income for many countries [1,2]. However, as with many crops, the mass cultivation of fruit is threatened by climate change and the accompanying introduction of pests and diseases. Breeding new cultivars that are resistant to such stress factors is key to securing sufficient production, and success in doing so depends largely on a comprehensive understanding of the genome [1].

With over 4 million tons of cherries produced across 6.7 million hectares per year, cherries are an economically important fruit. As with many fruits, cherries are bred to be resistant to climate change and disease, as well as to be delicious. From a practical viewpoint, breeding new cultivars is labour-intensive and time-consuming owing to the perennial nature of cherry trees. To make selective breeding a more viable practice in cherries and to maximise yield, genomic information is required to guide breeding decisions. However, plant genomes assembled using short-read sequencing data tend to be fragmented and contain many gaps, primarily due to their repetitive nature [3]; the genome of Prunus fruticosa, commonly known as the dwarf cherry, has proven to be no exception. Incomplete genome assemblies can lead to ambiguity in our understanding, as key genetic information may be missed.

‘…(nanopore) technology alone can sufficiently produce a high-quality complete genome draft’ [1]

Considering these difficulties, Wöhner et al. produced a draft assembly of P. fruticosa using long reads generated on the Oxford Nanopore PromethION™ device [1]. The authors highlighted that nanopore sequencing can be a standalone technology for generating high-quality de novo assemblies of complex genomes, with no need to ‘rely anymore on short-read data for polishing’. Using long nanopore reads alone, the team obtained a final assembly with a scaffold N50 of around 44 Mb and a BUSCO score (a measure of genome completeness) of 98.7%, representing a highly contiguous and complete assembly. To add the cherry on the cake, the team were able to largely resolve the parental haplotypes of the tetraploid (4n) genome with just 30x depth of coverage. This assembly will prove to be an invaluable resource for determining future breeding strategies, and a foundation for further molecular and evolutionary Prunus research.
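For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is commonly calculated: sequences are sorted from longest to shortest, and the N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name `n50` and the toy scaffold lengths are assumptions made for illustration; they are not taken from Wöhner et al.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases).

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # total is 150, half is 75; 50 + 40 reaches it, so N50 = 40
```

A higher N50 means that more of the assembly sits in long, uninterrupted sequences, which is why the statistic is used as a shorthand for contiguity.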

The humble banana is one of the most widely consumed fruits globally [2]. Its cultivation is essential not only for providing populations with sustenance, but also for securing many economies [2]. Strategic breeding programmes are necessary to improve crop quality and yield, though achieving this depends on a comprehensive knowledge of the banana genome [2]. Like many crop genomes, the banana genome is challenging to assemble owing to the abundance of repeat elements, structural variants, and low-complexity regions [1,2]. Short-read-based banana genome assemblies suffer from low contiguity due to the poor mappability of reads in repetitive regions [2].

Figure 1: Chromosomal size comparison between two banana genome assemblies demonstrates the benefits of long and ultra-long nanopore reads. The increased lengths of the yellow chromosomes from the latest assembly (Belser et al. 2021 [2]) can be attributed to the inclusion of repeat elements accurately resolved with the long nanopore reads. The yellow chromosomes have far greater representation of centromeric sites (red) and the ability to capture telomeres at the chromosomal ends, which were missed by the older assembly (white chromosomes; Martin et al. 2016 [4]). Image taken from: Belser et al. (2021) [2].

In light of this, Belser et al. used long nanopore reads to assemble the genome of the domesticated banana species Musa acuminata. The team obtained 177x genomic depth of coverage from whole-genome sequencing on a single PromethION R9.4.1 Flow Cell; of this, 17x depth was obtained from reads >75 kb. Contig N50s increased in length from an average of 42 kb in previous short-read assemblies to 32 Mb, and crucially the size of the genome assembly closely matched the estimated genome size, of which previous assemblies had fallen short. Furthermore, previous iterations based on short-read data reported just 130 rDNA gene units, compared to the 7,696 gene units reported in this assembly (Figure 1). These data indicate that complex regions of the genome were better resolved in this latest assembly, which ultimately culminated in the completion of five chromosomes, telomere to telomere.
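The depth-of-coverage figures above follow from simple arithmetic: depth is the total number of sequenced bases divided by the genome size, and the contribution of ultra-long reads is the same calculation restricted to reads above a length threshold. The sketch below illustrates this under stated assumptions; the function name `depth_of_coverage`, its `longer_than` parameter, the read lengths, and the genome size are all hypothetical and are not data from Belser et al.

```python
def depth_of_coverage(read_lengths, genome_size, longer_than=0):
    """Return mean depth of coverage contributed by reads longer than a threshold.

    Depth = (sum of lengths of the selected reads) / genome size.
    With the default threshold of 0, every read is counted.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    selected_bases = sum(n for n in read_lengths if n > longer_than)
    return selected_bases / genome_size


# Hypothetical toy values (in bases), chosen only to keep the arithmetic easy.
toy_reads = [80_000, 100_000, 20_000, 50_000]
toy_genome_size = 10_000

total_depth = depth_of_coverage(toy_reads, toy_genome_size)
ultra_long_depth = depth_of_coverage(toy_reads, toy_genome_size, longer_than=75_000)
print(total_depth)       # 250,000 bases / 10,000 = 25.0
print(ultra_long_depth)  # 180,000 bases / 10,000 = 18.0
```

Reporting the ultra-long fraction separately is useful because those reads are the ones most likely to span long repeats such as centromeric regions.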

Even when using an alternative long-read sequencing technology, the team found that ‘centromeric regions, detected with centromeric repeats, are very fractionated ... underlying the importance of ultra-long [nanopore] reads to resolve these highly repetitive regions’. They therefore highlighted the importance of ultra-long reads in producing a high-quality assembly of the banana genome, which will help decipher this fruit’s evolutionary history and support further genetic studies [2].

‘Gapless and telomere-to-telomere assembly of chromosome sequences is now possible’ [2]

1. Wöhner, T. et al. Genomics. 113:4173-4183 (2021).

2. Belser, C. et al. Commun Biol. 4(1):1047 (2021).

3. Rousseau-Gueutin, M. et al. GigaScience. 9(12) (2020).

4. Martin, G. et al. BMC Genomics. 17:243 (2016).