Detection of mosaic and somatic structural variants with Sniffles2

Abstract

Long-read sequencing remains the most accurate method to identify complex genomic alterations, including structural variations (SVs). While detecting germline SVs is technically challenging, tools like Sniffles2 have improved on existing methods. The next hurdles are accurately detecting condition-specific SVs within an organism (somatic SVs) and SVs present in only a few cells within a tissue (mosaic SVs). This work presents advances in tackling both problems. First, we assessed the somatic SV detection pipeline using the well-known COLO829 cancer cell line, providing a new benchmark for somatic SV assessment. Analysis of somatic variants (VAF 10–100%) revealed 346 SVs (1.13%) unique to single cancer replicates despite similar coverage levels (~68x), highlighting the cell line's instability. This led us to propose updated somatic SV benchmarks for GRCh38 and a novel benchmark for CHM13-T2T. Next, we investigated the role of mosaic SVs across brain samples. We identified recombinants of ALU or other repeat elements in neurodegenerative diseases such as Multiple System Atrophy. Validation using PCR and Sanger sequencing confirmed two mosaic SVs at ~6% variant allele frequency in 55x bulk nanopore-sequenced brain data: a 127-bp deletion in an intron of RBFOX3 and a recombinant between a novel inserted ALU-Y and an existing ALU-Y element in the reference. Additionally, 26 ALU-ALU recombinants were identified using Sniffles2. Overall, Sniffles2 captures germline, somatic, and mosaic SVs with high precision, enabling new insights into human genome polymorphisms and their implications in various adult diseases and cancer.

Biography

Dr. Fritz Sedlazeck is an Associate Professor at Baylor College of Medicine and an adjunct Associate Professor at Rice University. He has led a research group at the Human Genome Sequencing Center at Baylor College of Medicine since 2017.
Fritz’s research focuses on developing computational methods to detect and analyze genomic variations, particularly structural variations. Structural variations (SVs) are genomic events that alter multiple positions in a genome; they impact evolution, genomic disorders, and regulation, and play an important role in explaining multiple phenotypes. His group studies the mechanisms of SV formation across multiple species and aims to improve our understanding of how these complex alleles evolve and impact phenotypes. Over the years, Dr. Sedlazeck has led multiple efforts, from large-scale short-read projects (e.g. TOPMed, CCDG) to long-read projects (CARD, All of Us), to study the occurrence, impact, and mechanisms of SVs.

Authors: Fritz Sedlazeck