
Generating high-quality reference human genomes using PromethION nanopore sequencing


Miten, from the University of California, Santa Cruz, kicked off the Assembly and Scaffolding breakout by describing a collaborative project to build a pipeline that produces reference-quality human genomes in 7 days using nanopore sequencing and Hi-C data. The aim of the project was to provide a framework that allows more high-quality reference genomes to be generated, by increasing sequencing speed and reducing cost, and by producing an analysis pipeline that is scalable and inexpensive to run.

The first part of Miten’s talk focused on the nanopore sequencing process itself. The data for 11 genomes were produced in 9 days on the PromethION platform. Using the Short Read Eliminator Kit from Circulomics, the team achieved 7-fold enrichment for reads >100 kb, with an average depth of coverage per genome of over 60X and an average read N50 of 42 kb. Basecalling with the latest basecaller (Flip-flop) and aligning against the GRCh38 reference genome gave a modal alignment identity of 93% and a median identity of 90%.
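For readers unfamiliar with the read-length statistics quoted above, the following is a minimal sketch of how a read N50 and the share of bases in very long reads can be computed from a list of read lengths. The function names and the toy read lengths are hypothetical and chosen only for illustration; they are not taken from the talk or from any tool mentioned in it.

```python
def read_n50(lengths):
    """Return the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fraction_of_bases_above(lengths, threshold):
    """Return the fraction of all bases held in reads longer than threshold."""
    total = sum(lengths)
    return sum(length for length in lengths if length > threshold) / total


# Toy read lengths in kb (illustrative values only, not data from the talk).
toy_reads_kb = [100, 80, 50, 30, 20, 10, 10]
print(read_n50(toy_reads_kb))
print(fraction_of_bases_above(toy_reads_kb, 60))
```

Because N50 is weighted by bases rather than by read count, removing short reads, as a size-selection kit does, raises the N50 even though the longest reads themselves are unchanged.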

The second part of Miten's talk focused on the assembly, polishing and scaffolding pipeline, which was run in the cloud. For assembly, the Shasta tool was used. Shasta is a new tool developed for de novo assembly of nanopore long reads that can be run on a single compute node. Miten showed that Shasta can produce complete human genome assemblies in around 6 hours, with comparable contig NG50s and fewer misassemblies than Flye, Canu and Wtdbg2, at a fraction of the time and cost. Miten then discussed two new tools for two-step polishing of the assemblies: MarginPolish, a graph-based assembly polisher, and HELEN, an RNN-based consensus sequence polisher. Miten showed data comparing consensus accuracy after the two-step MarginPolish and HELEN pipeline against the Racon (4x) and Medaka pipeline; the MarginPolish and HELEN approach came out on top in his tests, and was also quicker and cheaper to run. After assembly and polishing, the team added Hi-C long-range data to generate chromosome-level scaffolds.
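The contig NG50 used to compare assemblers differs from N50 in that it is measured against the expected genome size rather than the total assembly size, so assemblies of different total length can be compared fairly. A minimal sketch, assuming a hypothetical function name and toy contig lengths (none of these values come from the talk):

```python
def contig_ng50(contig_lengths, genome_size):
    """Return the contig length L such that contigs of length >= L
    cover at least half of genome_size, or None if the assembly
    is too small to cover half of the genome."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    running = 0
    # Accumulate contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return None


# Toy contig lengths in Mb (illustrative values only).
toy_contigs_mb = [50, 30, 10, 10]

# Measured against the assembly's own total (100 Mb), this equals the N50.
print(contig_ng50(toy_contigs_mb, sum(toy_contigs_mb)))
# Measured against a larger expected genome size, the value drops.
print(contig_ng50(toy_contigs_mb, 200))
```

Returning `None` when the contigs cannot cover half the genome is a design choice in this sketch; it makes an incomplete assembly explicit instead of reporting a misleading length.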

Wrapping up his talk, Miten stated that with the PromethION and their analysis pipeline, the team has been able to generate long-read, high-quality, high-coverage genomes in less than 7 days, for less than $10k per genome. Miten also announced that all three tools (Shasta, MarginPolish and HELEN) had been publicly released on GitHub that day.

Authors: Miten Jain
