NCM 2023 Singapore: Redefining telomere-to-telomere genome assembly strategy using the Oxford Nanopore platform


An accurate, haplotype-resolved telomere-to-telomere (T2T) genome reference plays an essential role in investigating genome sequence variation, modifications, and function, and it empowers genetic studies of human diseases and population structure. Currently, the prevalent strategy for creating such references is to generate phased contigs by de novo assembly of high-fidelity long reads from the PacBio platform together with ultra-long reads from the Oxford Nanopore platform. The assembled contigs are then scaffolded into a chromosome-level assembly using long-range chromatin interaction information (such as Hi-C, 10x linked reads) and/or strand-specific sequence information (such as optical mapping and Strand-seq). A similar strategy was used to create the first T2T human genome reference (CHM13) as well as the first human pangenome reference. Here, we explore an alternative strategy in which de novo assembly is performed using high-quality (>Q30) duplex reads and ultra-long reads (>100 kb), both from the PromethION platform, and we benchmark its performance against the established methodology. To avoid potential assembler-specific bias, we ran the comparison with both the hifiasm and Verkko assemblers. The results across assemblers and sequencing platforms enable researchers to make informed decisions about the most appropriate approach for population-based pangenome construction, taking into account sequencing cost, computational requirements, and time constraints.
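
To make the two read thresholds quoted above concrete, the following is a minimal Python sketch, not the pipeline used in this work. It assumes per-base Phred scores are already available as a list of integers and relies on the standard definition Q = -10 log10(p), under which Q30 corresponds to an error probability of 0.001. The function names and labels are hypothetical and chosen only for illustration.

```python
import math

# Thresholds taken from the abstract; everything else here is illustrative.
Q_THRESHOLD = 30          # duplex reads are described as > Q30
UL_THRESHOLD = 100_000    # ultra-long reads are described as > 100 kb


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def mean_read_quality(quals):
    """Mean read quality, averaging error probabilities rather than Q values."""
    mean_err = sum(phred_to_error(q) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


def classify_read(quals):
    """Label a read (hypothetical labels) by the two thresholds above.

    The read length is taken to be the number of per-base quality scores.
    """
    labels = []
    if mean_read_quality(quals) > Q_THRESHOLD:
        labels.append("high_accuracy")
    if len(quals) > UL_THRESHOLD:
        labels.append("ultra_long")
    return labels
```

Averaging error probabilities instead of raw Q values is a deliberate choice in this sketch: a few low-quality bases then pull the read-level score down sharply, which a plain average of Q values would hide.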

Authors: Jianjun Liu