Evaluating the quality of long-read phasing methods in clinically relevant genes
Abstract Haplotype phasing determines the specific combination of variants on each chromosome and is a fundamental tool for fully understanding the impact genetic variants have on disease. Traditional phasing methods rely on population data generated from microarrays or short-read sequencing and require a large number of individuals to be effective, and thus carry an inherent bias against underrepresented populations. Long-read sequencing (LRS) offers direct read-based phasing without population data or parental samples, yet the effectiveness and error rates of phasing in clinically relevant OMIM genes remain underevaluated. In our study, we assess phasing error rates using the reference sample HG002, with a focus on medically relevant OMIM genes. We compare multiple Oxford Nanopore chemistries using consensus VCFs from the Genome in a Bottle (GiAB) Consortium and a high-quality Q100 VCF from the fully assembled HG002 genome. Data are aligned to GRCh38 and to the telomere-to-telomere reference genome (CHM13), with variants phased using Clair3 and WhatsHap. Aligning to CHM13 reduces phasing errors compared to GRCh38, with errors concentrated in fewer genes. We also present analyses from the 1000 Genomes Oxford Nanopore Technologies Sequencing Consortium, which aims to sequence a large subset of samples from the 1000 Genomes Project. Using preliminary results from the first 200 samples, sequenced to an average depth of 30x, we highlight the impact of read quality, length, and depth on phasing accuracy across OMIM genes. Our findings offer insights into optimizing LRS-based haplotyping methods and will be of broad utility to the human genetics community, including clinicians and researchers. We anticipate that improvements to phasing will increase the identification of disease-causing variants in individuals with suspected Mendelian conditions who are sequenced on LRS platforms.
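To make the notion of a phasing error concrete, the following is a minimal Python sketch of one common metric, the switch error rate, computed for the heterozygous variants of a single phase block (for example, those falling within one gene). It is an illustration under stated assumptions, not the evaluation pipeline used in the study: the function names, the tuple representation of phased genotypes, and the example data are all hypothetical, and the sketch assumes the called and truth sets have already been restricted to the same heterozygous sites.

```python
from typing import Sequence, Tuple

# A phased heterozygous genotype, e.g. (0, 1) for the VCF string "0|1".
# This representation is an assumption made for the illustration.
Genotype = Tuple[int, int]


def count_switch_errors(
    truth: Sequence[Genotype], called: Sequence[Genotype]
) -> Tuple[int, int]:
    """Return (switch_errors, comparisons) for one phase block.

    Both inputs must list the same heterozygous sites in genomic order.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must cover the same sites")

    # For each site, record whether the call matches the truth orientation (0)
    # or is the mirror image of it (1).
    orientations = []
    for t, c in zip(truth, called):
        if c == t:
            orientations.append(0)
        elif c == (t[1], t[0]):
            orientations.append(1)
        else:
            raise ValueError("genotypes disagree beyond phase; filter these first")

    # A switch error is a change of orientation between adjacent sites.
    switches = sum(1 for a, b in zip(orientations, orientations[1:]) if a != b)
    comparisons = max(len(orientations) - 1, 0)
    return switches, comparisons


def switch_error_rate(truth: Sequence[Genotype], called: Sequence[Genotype]) -> float:
    """Switch errors divided by the number of adjacent site pairs."""
    switches, comparisons = count_switch_errors(truth, called)
    return switches / comparisons if comparisons else 0.0


# Hypothetical example: five heterozygous sites in one block.
example_truth = [(0, 1), (1, 0), (0, 1), (0, 1), (1, 0)]
example_called = [(0, 1), (1, 0), (1, 0), (1, 0), (1, 0)]
```

In this example the call set flips orientation after the second site and flips back after the fourth, so two of the four adjacent pairs are switch errors. A block that is mirrored in its entirety counts as zero errors, because the labeling of the two haplotypes is arbitrary.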
Biography Nikhita Damaraju is a second-year PhD student at the University of Washington’s Institute for Public Health Genetics, jointly advised by Dr. Danny Miller and Dr. Brian Shirts. Her research focuses on optimizing long-read sequencing data for clinical care and population genetics, and involves developing and applying statistical methods to derive meaningful information. She is passionate about making a tangible impact on translational health using data.