needLR: a structural variant filtering and prioritization tool for long-read sequencing data
Abstract

Over half of individuals with a suspected Mendelian condition remain undiagnosed after an exhaustive clinical evaluation, limiting their ability to benefit from precision therapies, N-of-1 trials, and accurate prognosis or recurrence risk counseling. While several factors contribute to this low solve rate, one salient component is the difficulty of reliably identifying and resolving structural variants (SVs; genomic variants ≥50 bp) using traditional short-read sequencing-based approaches. Long-read sequencing (LRS) offers more accurate and comprehensive SV detection, increasing the average number of SV calls per genome from ~10k to ~25k. This increase creates a need for a comprehensive, population-scale, LRS-based SV database, along with tools to filter and prioritize candidate SVs in unsolved individuals. Here, we present needLR, a tool that uses high-coverage LRS data from samples in the 1000 Genomes Project (1KGP) to assign population allele frequencies to SVs and to perform downstream SV filtering and prioritization. needLR incorporates an SV merging tool, JASMINE, to identify analogous SVs between samples, both within the 1KGP cohort and between the 1KGP cohort and the query sample. needLR then ranks each SV in the query sample by its allele count in the 1KGP cohort, creating a frequency-based prioritization framework. Finally, needLR annotates each SV in the query sample with medically and genomically informative indicators, such as whether the SV intersects a protein-coding gene, whether it is associated with an OMIM phenotype, and whether it lies near repetitive DNA sequence, segmental duplications, centromeres, or telomeres. needLR is available as a downloadable, open-source command-line tool and as an interactive web application.

Biography

As a pre-doctoral student in Molecular and Cellular Biology at the University of Washington, Gus has a broad interest in human rare genetic disease.
Since starting graduate school, he has become especially interested in using long-read sequencing to characterize pathogenic structural variants.
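As a rough illustration of the frequency-based prioritization described in the abstract, the Python sketch below ranks query SVs by how many matching alleles a reference cohort carries and flags overlaps with gene intervals. Everything here is a hypothetical assumption, not needLR's implementation: the `SV` record, the cohort encoding (sample ID mapped to calls carrying 1 or 2 alleles), and especially `is_analogous`, a simple breakpoint-distance rule that stands in for JASMINE's merging and is not its algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "DUP"


def is_analogous(a: SV, b: SV, max_dist: int = 500) -> bool:
    """Simplified matching rule (an assumption, NOT JASMINE's algorithm):
    same chromosome and SV type, and both breakpoints within max_dist bp."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def allele_count(query: SV, cohort: dict[str, list[tuple[SV, int]]],
                 max_dist: int = 500) -> int:
    """Sum matching alleles across cohort samples.

    cohort maps sample ID -> list of (SV call, alleles carried: 1 or 2).
    Each sample contributes only its best-matching call, so 0 to 2 alleles.
    """
    total = 0
    for calls in cohort.values():
        total += max(
            (n for sv, n in calls if is_analogous(query, sv, max_dist)), default=0
        )
    return total


def overlaps(sv: SV, interval: tuple[str, int, int]) -> bool:
    """True if the SV overlaps a half-open (chrom, start, end) interval."""
    chrom, start, end = interval
    return sv.chrom == chrom and sv.start < end and start < sv.end


def prioritize(query_svs: list[SV], cohort: dict[str, list[tuple[SV, int]]],
               genes: dict[str, tuple[str, int, int]],
               max_dist: int = 500) -> list[tuple[int, SV, list[str]]]:
    """Return (allele_count, sv, overlapping_gene_names) rows, rarest first."""
    rows = []
    for sv in query_svs:
        ac = allele_count(sv, cohort, max_dist)
        hit = [name for name, iv in genes.items() if overlaps(sv, iv)]
        rows.append((ac, sv, hit))
    # Lower cohort allele count = rarer = higher priority; ties broken by position.
    rows.sort(key=lambda r: (r[0], r[1].chrom, r[1].start))
    return rows


if __name__ == "__main__":
    # Toy data; sample IDs and gene names are made up for illustration.
    cohort = {
        "SAMPLE_1": [(SV("chr1", 1000, 1600, "DEL"), 1)],
        "SAMPLE_2": [(SV("chr1", 1020, 1590, "DEL"), 2)],
    }
    genes = {"GENE_A": ("chr1", 1500, 3000)}
    query = [SV("chr1", 1010, 1610, "DEL"), SV("chr2", 5000, 5100, "DEL")]
    for ac, sv, hit in prioritize(query, cohort, genes):
        print(ac, sv.chrom, sv.start, hit)
```

With the toy data, the chr2 deletion (absent from the cohort, allele count 0) is listed before the chr1 deletion (allele count 3, overlapping `GENE_A`). Sorting by ascending allele count reflects the idea that SVs rare in a population reference are stronger candidates; a real pipeline would read VCFs and rely on a dedicated merger rather than this pairwise check.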