Computational paradigm for scalable, ultra-fast, & cost-efficient nanopore sequencing


Abstract

In 2021, we pioneered an ultra-rapid nanopore sequencing and analysis pipeline that set a world record in DNA sequencing, achieving results in 7 hours and 28 minutes. Since then, we have made significant strides in improving the pipeline's accuracy, speed, and cost-effectiveness. Advancements in basecalling software (Dorado) and biochemistry (R10.4.1 Flow Cell and Kit 14) have been foundational: they have yielded higher accuracy whilst allowing us to reduce sequencing times. We have now adapted our bespoke computational framework to take advantage of these improvements while maintaining its real-time functionality. A key innovation in the framework is auto-scaling within a high-performance cloud computing infrastructure, facilitated by Ray. The auto-scaling tracks the rate at which sequencing data is generated, optimising resource use and cost without compromising the speed of basecalling and alignment. We have documented turnaround times of under 30 minutes after sequencing completes for basecalling and alignment. Additionally, our pipeline can now call small variants, structural variants, and copy number variants (CNVs) simultaneously. Integrating an automated variant curation stage allowed us to generate a curated list of variants within an hour of sequencing completion. Furthermore, in the latest phase of the project, we have robustly deployed our pipeline on clinical samples using the newest NVIDIA A100 and H100 GPUs, both on-premises and in the cloud. These advancements herald a new era in genome diagnostics. Coupled with state-of-the-art Oxford Nanopore technology, our computational framework has the potential to be implemented in clinical settings and promises substantial benefits, paving the way for rapid, accurate genomic medicine.
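As a rough illustration of auto-scaling that follows the rate of sequencing data generation, the sketch below computes how many basecalling workers a cluster might request at a given moment. It is not the pipeline's actual implementation and does not use Ray's API: the function name, parameters, units, and the backlog "drain window" are all hypothetical, chosen only to show the idea of sizing compute to match incoming data.

```python
import math


def target_workers(data_rate_gb_per_min, backlog_gb, per_worker_gb_per_min,
                   drain_minutes=10.0, min_workers=1, max_workers=16):
    """Return a worker count sized to the current sequencing load.

    All names and units here are illustrative assumptions, not values
    taken from the pipeline described in the abstract.
    """
    if per_worker_gb_per_min <= 0 or drain_minutes <= 0:
        raise ValueError("per-worker throughput and drain window must be positive")

    # Throughput needed to keep pace with the sequencer and also clear
    # any unprocessed backlog within the drain window.
    required = data_rate_gb_per_min + backlog_gb / drain_minutes

    # Round up to whole workers, then clamp to the allowed cluster size.
    wanted = math.ceil(required / per_worker_gb_per_min)
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    # Hypothetical load: 1.5 GB/min arriving, 5 GB waiting, 0.5 GB/min per worker.
    print(target_workers(1.5, 5.0, 0.5))
```

In a real deployment a rule like this would be evaluated periodically, with its result passed to whatever mechanism the cluster manager provides for requesting or releasing nodes. Clamping to a minimum and maximum keeps the cluster responsive when the sequencer is idle and bounds cost when output peaks.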

Biography

Sneha Goenka is a PhD candidate in the Electrical Engineering Department at Stanford University, where she is advised by Prof. Mark Horowitz. Her research centers on designing efficient computer systems that advance genomic pipelines for clinical and research applications, with a focus on improving speed and cost. She is a 2023 Forbes 30 Under 30 Honoree in the Science category, a 2022 NVIDIA Graduate Fellow, and a 2021 Cadence Women in Technology Scholar. She holds a BTech and an MTech (Microelectronics) in Electrical Engineering from the Indian Institute of Technology, Bombay, where she received the Akshay Dhoke Memorial Award for the most outstanding student in the program. Sneha starts as an Assistant Professor in the ECE department at Princeton University in Fall 2024.

Authors: Sneha Goenka