Generating high-quality reference human genomes using PromethION nanopore sequencing
About Miten Jain
Miten is an Assistant Research Scientist at the University of California, Santa Cruz. His research interests include developing methods for long-read sequencing of DNA and RNA, methods for detection of base modifications, and software for analysis of MinION and PromethION data.
To catalogue all forms of human genetic variation and associate them with health and disease, a new generation of genome sequencing and assembly technologies is required. However, current workflows for producing high-quality human genome assemblies have cost and production-time bottlenecks that prohibit scaling to hundreds of individuals. We designed and evaluated an optimized PromethION-based workflow to produce near-reference-quality genome assemblies for the offspring from ten parent-offspring trios. We demonstrate the production of long-read, high-quality, high-coverage genomes with a total turnaround time of less than one week from sample extraction to complete assembly, and a total projected cost of less than $10k per genome. To lower costs and improve quality, we have developed three new tools: 1) Shasta - a nanopore de novo long-read assembler that can produce complete human genomes in around 6 hours on a single compute node; 2) marginPolish - a new graphical model-based assembly polisher that improves on earlier methods in both cost and accuracy; and 3) HELEN - an RNN-based multi-task learning model that further refines the base and run-length predictions for each genomic position and produces state-of-the-art results. We evaluate performance on assembly accuracy, throughput/timing, and cost, and demonstrate improvements relative to the current best-of-breed in all areas. Recognizing that even 100 kb reads are insufficient to scaffold through the most repetitive regions of the human genome, we augment this sequencing with a Hi-C long-range library to facilitate scaffolding and haplotype phasing.
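The abstract describes HELEN as refining both the base and the run-length prediction at each genomic position. As a rough illustration of what a base-plus-run-length representation of a sequence looks like, the sketch below collapses each homopolymer run into a (base, run length) pair. It is not code from Shasta, marginPolish, or HELEN; the function names are hypothetical and chosen only for this example.

```python
from itertools import groupby


def run_length_encode(sequence):
    """Collapse each homopolymer run into a (base, run_length) pair.

    Hypothetical helper for illustration only; not taken from any of
    the tools named in the abstract.
    """
    return [(base, len(list(run))) for base, run in groupby(sequence)]


def run_length_decode(pairs):
    """Expand (base, run_length) pairs back into the original sequence."""
    return "".join(base * length for base, length in pairs)


if __name__ == "__main__":
    sequence = "GATTTTACCA"
    encoded = run_length_encode(sequence)
    print(encoded)
    # Decoding the pairs recovers the input sequence exactly.
    print(run_length_decode(encoded) == sequence)
```

In this representation, a polishing step can treat "which base is here" and "how long is this run" as two separate predictions, which is one way to read the phrase "base and run-length prediction" in the abstract.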