Making telomere-to-telomere genomic assemblies accessible: examples from human and plant genomes


Knowledge Exchange overview

In this Knowledge Exchange, Sean McKenzie and Alexander Wittenberg discussed how Oxford Nanopore sequencing technologies make telomere-to-telomere (T2T) genome assemblies accessible. They explored how highly accurate and ultra-long reads help complete genome assemblies, including challenging, repetitive chromosomal regions such as telomeres and centromeres.
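A telomere-to-telomere assembly represents each chromosome as a single contig that ends in telomeric repeats at both ends. The minimal Python sketch below shows one simple way such ends could be checked. The helper names, the 1 kb end window and the 50% repeat-density threshold are illustrative assumptions, not parameters from the talk. TTAGGG is the human (vertebrate) telomere repeat; many plants, including maize and tomato, typically carry the TTTAGGG type, which can be passed as `motif`.

```python
# Hypothetical helpers, not from the talk: check whether both ends of a
# contig are dominated by telomeric repeats. Motif, window size and
# threshold are illustrative assumptions.

def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def repeat_fraction(window: str, motif: str) -> float:
    """Fraction of `window` covered by non-overlapping exact copies of `motif`."""
    window, motif = window.upper(), motif.upper()
    hits, i = 0, 0
    while i <= len(window) - len(motif):
        if window[i:i + len(motif)] == motif:
            hits += 1
            i += len(motif)  # skip past the matched copy
        else:
            i += 1
    return hits * len(motif) / max(len(window), 1)


def is_t2t(contig: str, motif: str = "TTAGGG",
           window: int = 1000, min_frac: float = 0.5) -> bool:
    """True if both contig ends look telomeric.

    Read on one strand, the left end of a chromosome carries the reverse
    complement of the motif (CCCTAA... for TTAGGG) and the right end
    carries the motif itself.
    """
    left, right = contig[:window], contig[-window:]
    return (repeat_fraction(left, revcomp(motif)) >= min_frac
            and repeat_fraction(right, motif) >= min_frac)


# Toy contig: telomere, unique sequence, telomere.
contig = "CCCTAA" * 200 + "ACGT" * 500 + "TTAGGG" * 200
print(is_t2t(contig))                          # True
print(is_t2t("ACGT" * 500 + "TTAGGG" * 200))   # False: left end lacks a telomere
```

Real completeness checks also look for gaps and misjoins along the whole contig; this sketch only inspects the two ends.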

Sean McKenzie provided a brief overview of T2T assembly pipelines that leverage multiple data types and presented benchmarks for human T2T genome assembly. Alexander Wittenberg then focused on plant T2T genome assemblies, using Zea mays B73 (maize) and Solanum lycopersicum Heinz 1706 (tomato) as examples. He showed how Oxford Nanopore duplex reads, combined with simplex reads, have the potential to provide a single-instrument solution for (near) complete genome assembly in a fast and cost-effective way.

What did the speakers cover?

A brief introduction to telomere-to-telomere de novo genome assembly: theory, tools, and benchmarks on human data

Abstract
Recent advances in sequencing technologies and analytical tools have finally allowed biologists to build essentially perfect “telomere-to-telomere” (T2T) genomes de novo, unlocking the investigation of an organism’s entire genetic sequence. In this talk, Sean McKenzie, Associate Director, Genomic Applications Bioinformatics at Oxford Nanopore, will give a brief overview of the T2T assembly process, including how the Verkko and hifiasm+ul assembly pipelines leverage multiple data types to build T2T genomes. He will also discuss the data and computational requirements of these tools, as well as present benchmarks for human T2T genome assembly given a variety of input data.
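Broadly, Verkko and hifiasm+ul both start from a graph built from highly accurate reads and then use ultra-long reads to decide how to traverse that graph through repeats. The toy Python sketch below illustrates only that last idea: given ultra-long reads already mapped to the graph as paths of node IDs, it counts which unique node before a repeat connects to which unique node after it. The node names, the path representation, and the functions `spanning_pairs` and `resolve_repeat` are hypothetical; neither assembler is implemented this way, and real tools must also handle read errors, copy number and nested repeats.

```python
from collections import Counter
from itertools import permutations


def spanning_pairs(paths, repeat):
    """Count (predecessor, successor) node pairs for reads that fully span `repeat`."""
    counts = Counter()
    for path in paths:
        # Only interior occurrences have a node on both sides.
        for i in range(1, len(path) - 1):
            if path[i] == repeat:
                counts[(path[i - 1], path[i + 1])] += 1
    return counts


def resolve_repeat(paths, repeat, inbound, outbound):
    """Choose the one-to-one pairing of inbound to outbound nodes with the
    most spanning-read support. Assumes len(inbound) == len(outbound)."""
    counts = spanning_pairs(paths, repeat)
    best, best_score = None, -1
    for perm in permutations(outbound):
        pairing = list(zip(inbound, perm))
        score = sum(counts[pair] for pair in pairing)
        if score > best_score:
            best, best_score = pairing, score
    return best, best_score


# Graph: A and B enter repeat R; C and D leave it.
reads = [
    ["A", "R", "C"],
    ["X", "A", "R", "C"],
    ["B", "R", "D"],
    ["A", "R"],  # ends inside the repeat, so it is uninformative
]
print(resolve_repeat(reads, "R", ["A", "B"], ["C", "D"]))
# ([('A', 'C'), ('B', 'D')], 3)
```

Only reads long enough to cross the entire repeat contribute evidence, which is why read length matters so much for resolving long repeats. The brute-force search over permutations is practical only for a handful of branches.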

Biography


Sean McKenzie is a bioinformatician with extensive experience in comparative genomics and de novo genome assembly. During his PhD at The Rockefeller University and Post-Doc at the University of Lausanne, he studied genome architecture and evolution within social insects and worked to understand the genomic basis of social behaviour. He has experience conducting de novo genome assembly for a wide range of organisms including bacteria, algae, fungi, insects, and vertebrates using all manner of data types and assembly methods. He joined the Genomic Applications team at Oxford Nanopore in 2020 and currently heads up application benchmarking, as well as US pilot project bioinformatics support.

Single platform telomere-to-telomere assemblies: examples from the tomato and maize genomes

Abstract
In March 2022, the complete sequence of a human genome was published. This milestone was made possible by combining long, accurate PacBio HiFi reads with ultra-long Oxford Nanopore reads, together with advances in computational algorithms. This recipe for “telomere-to-telomere” (T2T) genome assembly relies on sequencing data from two different platforms, which limits its adoption. Recently, Oxford Nanopore released an end-to-end workflow for telomere-to-telomere human genome sequencing using only the Oxford Nanopore PromethION device. To test whether a single platform can produce T2T crop genome assemblies, we generated Oxford Nanopore sequencing data for two important crop species, Zea mays B73 (maize) and Solanum lycopersicum Heinz 1706 (tomato). For each genome, we generated both Oxford Nanopore duplex and simplex sequencing reads. Duplex reads are high-quality consensus reads from DNA molecules in which both strands are sequenced, first the template strand and then its complementary strand, whereas simplex reads come from a single strand that is read only once. KeyGene teamed up with the Telomere-to-Telomere consortium to generate gapless genome assemblies from these Oxford Nanopore duplex/simplex reads and compared them with assemblies built from previously generated HiFi data. In particular, we were interested in a comparative analysis of hard-to-sequence genomic regions, to better characterize the sequence-context dependencies of both technologies.
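The benefit of reading both strands can be illustrated with a deliberately simplified model. The Python sketch below assumes the template and complement reads are already aligned, equal in length and free of insertions and deletions; it reverse-complements the complement read into template orientation and masks positions where the strands disagree as 'N'. Real duplex basecalling combines information from both strands far more carefully, so `toy_duplex_consensus` is a hypothetical illustration, not Oxford Nanopore's method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_duplex_consensus(template_read: str, complement_read: str) -> str:
    """Toy consensus of a template read and its complement-strand read.

    The complement strand is read in the opposite orientation, so it is
    reverse-complemented into template coordinates first. Assumes the two
    reads are equal-length and already aligned (no indels). Positions where
    the strands disagree are masked as 'N'.
    """
    if len(template_read) != len(complement_read):
        raise ValueError("toy model expects pre-aligned, equal-length reads")
    other = reverse_complement(complement_read)
    return "".join(a if a == b else "N" for a, b in zip(template_read, other))


# True molecule: AACGTTTG. The template read has an error at position 2 (C->G);
# the complement-strand read (CAAACGTT) is error-free.
print(toy_duplex_consensus("AAGGTTTG", "CAAACGTT"))  # AANGTTTG
```

An error on one strand shows up as a disagreement with the other strand; this is the basic intuition behind duplex consensus reads being more accurate than single-strand (simplex) reads.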

For Solanum lycopersicum Heinz 1706, the Verkko assembler generated a complete telomere-to-telomere genome assembly with only one gap after a few manual interventions. The gap corresponded to the highly repetitive rDNA region. Verkko took advantage of the duplex reads, whose accuracy is similar to that of PacBio HiFi reads but which are tens of kilobases longer. This allowed a very high-quality initial assembly graph to be constructed from the duplex data, which was then further resolved using the ultra-long simplex reads. The final assembly has a base accuracy exceeding 99.999% (Q50).
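Base accuracy figures such as Q50 use the Phred scale, Q = -10 log10(e), where e is the per-base error rate. An accuracy of 99.999% means e = 10^-5, hence Q50. A small Python sketch of the conversion, using hypothetical helper names:

```python
import math


def accuracy_to_q(accuracy: float) -> float:
    """Phred quality from per-base accuracy: Q = -10 * log10(1 - accuracy)."""
    error = 1.0 - accuracy
    if error <= 0:
        return math.inf  # a perfect match has no finite Phred score
    return -10 * math.log10(error)


def q_to_accuracy(q: float) -> float:
    """Inverse conversion: accuracy = 1 - 10 ** (-Q / 10)."""
    return 1.0 - 10 ** (-q / 10)


def expected_errors(length_bp: int, q: float) -> float:
    """Expected number of base errors in `length_bp` bases at quality Q."""
    return length_bp * 10 ** (-q / 10)


print(round(accuracy_to_q(0.99999), 1))       # 50.0
print(round(q_to_accuracy(40), 6))            # 0.9999
print(round(expected_errors(1_000_000, 50)))  # 10
```

At Q50, one error is expected on average every 100,000 bases; each additional 10 Q points reduces the error rate tenfold.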

We conclude that Oxford Nanopore duplex sequencing reads are a viable substitute for PacBio HiFi reads, and, in combination with simplex sequencing, have the potential to provide a single-instrument solution for (near) complete genome assembly in a fast and cost-effective way.

Biography


Alexander Wittenberg graduated with an MSc in plant breeding and crop protection from Wageningen University and completed his PhD at the Laboratory of Plant Breeding. There, he focused on developing innovative genotyping methods to study the origin of genome plasticity in crops. In 2007, he joined KeyGene as a scientist, where he continued his work on the development and application of molecular marker methods. Alexander has acquired considerable experience in next-generation sequencing, with expertise in a wide range of platforms and applications. He is currently responsible for scouting new genomics technologies and is involved in developing innovative sequence-based technologies within KeyGene’s Genome Insights crop innovation platform. Alongside his focus on innovation, he works closely with the R&D and business development departments within KeyGene to bring these technologies to market for KeyGene’s partners.

Authors: Sean McKenzie and Alexander Wittenberg