
NCM 2021: Accuracy improvements in crop genome assembly using the Q20+ chemistry


Alexander Wittenberg explained that KeyGene are developing a comprehensive computational toolset to perform crop genome analysis ‘on an unprecedented scale’, including de novo assembly, variant detection, and data visualisation. He laid out the four key aspects of reference genomes (correctness, completeness, contiguity, and cost) and explained the benefits of using nanopore sequencing technology for this process.
At KeyGene, the team have been evaluating the performance of the new Q20+ (‘Kit12’) chemistry and the R10 pore series, using samples from different plant species. They have also been investigating the impact of plant-trained basecalling models. Alexander stated that they had obtained significant improvements in raw read accuracy by using the new Q20+ chemistry and applying basecalling models trained on plant sequence data. For example, for maize whole-genome sequencing data, raw read accuracy rose by 2.5 percentage points (from 96.9% to 99.4%).
Alexander presented lettuce genome sequence data (192 Gb data yield; library prepared with the Q20+ Ligation Sequencing Kit and sequenced on R10.3 PromethION Flow Cells). The data were basecalled using a Guppy Q20 model trained on plant data, and assembly was performed with both Flye v2.9 and the KeyGene STL assembler. Alexander pointed out that their lettuce genome assembly presented at London Calling in 2018 had been ‘already impressive’ compared to the published short-read-based reference. Compared with that 2018 assembly, and with the lettuce genome assembled using data from an alternative long-read sequencing technology, ‘the KeyGene STL assembler using the Q20 data from Oxford Nanopore shows the best assembly’ in terms of contiguity and accuracy. Alexander added that, with the STL assembler, they could ‘finish the assembly within 30 hours’.
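The maize raw read accuracies quoted above (96.9% and 99.4%) can be placed on the Phred quality scale, where Q20 corresponds to 99% accuracy. The following is a minimal sketch, not part of the talk; it assumes the quoted figures can be treated as per-base accuracies, and the function name `accuracy_to_phred` is purely illustrative.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred score: Q = -10 * log10(error rate)."""
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(error_rate)


# Maize raw read accuracies as quoted in the talk summary.
# Treating them as per-base accuracies is an assumption of this sketch.
accuracy_before = 0.969
accuracy_after = 0.994

q_before = accuracy_to_phred(accuracy_before)
q_after = accuracy_to_phred(accuracy_after)

# Ratio of the two error rates: how many times fewer errors after the change.
error_reduction = (1.0 - accuracy_before) / (1.0 - accuracy_after)

print(f"Before: Q{q_before:.1f}")
print(f"After:  Q{q_after:.1f}")
print(f"Errors reduced about {error_reduction:.1f}-fold")
```

On this scale the change moves the reads from roughly Q15 to roughly Q22, that is, about five times fewer errors per base, which is a larger shift than the 2.5 percentage point difference may suggest at first glance.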
Moving on to the melon genome (library prepared with Q20+ chemistry and sequenced on R10.3 and R10.4 PromethION Flow Cells), the data were similarly assembled with Flye and the STL assembler. Duplex data were also generated for the R10.4 run (16 Gb of the 169 Gb R10.4 run yield). Compared to the melon genome assembly produced in parallel via the alternative long-read technology, there were significantly fewer contigs in the Oxford Nanopore-based assembly. When considering the R10.4 data, basecalled with a model trained on plant data and assembled with Flye v2.9, the quality score ‘became on par’ with that of the same genome assembled from data obtained using the alternative technology. Lastly, Alexander discussed their analysis of the duplex data alone (which comprised ~10% of the R10.4 run yield): comparing ~35x genome coverage of duplex data with 35x depth data from the alternative technology, the ‘consensus accuracies of the ONT data [are] actually significantly higher’. ‘We think that this is a breakthrough in the technology’, he said, concluding that ‘the duplex reads are really amazing’.
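The duplex figures quoted above can be cross-checked with simple arithmetic. This is a rough sketch rather than anything shown in the talk: the variable names are illustrative, and the implied genome size is only an inference that assumes the ~35x coverage was computed from the full 16 Gb of duplex data.

```python
# Figures as quoted in the talk summary.
duplex_yield_gb = 16.0   # duplex data from the R10.4 run
run_yield_gb = 169.0     # total R10.4 run yield
duplex_coverage = 35.0   # approximate genome coverage of the duplex data

# Share of the run yield that was duplex data (the summary says ~10%).
duplex_fraction = duplex_yield_gb / run_yield_gb

# Coverage = yield / genome size, so genome size = yield / coverage.
# Assumption: all 16 Gb of duplex data contributed to the ~35x figure.
implied_genome_gb = duplex_yield_gb / duplex_coverage

print(f"Duplex share of run yield: {duplex_fraction:.1%}")
print(f"Genome size implied by the coverage figure: {implied_genome_gb:.2f} Gb")
```

The duplex share comes out just under 10%, in line with the ‘~10%’ stated in the summary.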

Authors: Alexander Wittenberg
