NCM 2022: Structural variants identification from long and short-read sequencing technologies for daughters from families of the 1000 Genomes Project


We present a comparative analysis of whole-genome sequencing (WGS) data from three families in the 1000 Genomes Project. The samples (the daughter from each of the three trios) were sequenced with long-read Oxford Nanopore Technologies (ONT) sequencing and compared with short-read Illumina sequencing. We focus on structural variants (SVs): segments of DNA at least 50 base pairs in length, which the 1000 Genomes Project has reported to be abundant in human genomes. We designed an ensemble-based method that takes the structural variant output of multiple SV callers, trains neural networks on those calls, and is benchmarked against the gold-standard set of structural variants (Chaisson, 2019), which serves as the truth set from the 1000 Genomes Project.
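
To make the ensemble idea concrete, here is a minimal sketch, not the method used in this work. It assumes each candidate SV is described only by which callers reported it, and it uses a single sigmoid unit (the smallest possible neural network) trained by batch gradient descent against truth-set labels. The three callers, the toy feature matrix, the labels, and the function names `train` and `predict` are all hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy data: one row per candidate SV, one column per caller
# (1 = that caller reported the SV). Labels mark whether the candidate
# appears in the truth set. These values are invented for illustration.
FEATURES = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
    (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability that a candidate SV is real, given its caller votes."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, lr=1.0, epochs=5000):
    """Fit one sigmoid unit with full-batch gradient descent on cross-entropy."""
    n_callers = len(features[0])
    n = len(features)
    weights = [0.0] * n_callers
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_callers
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = score(weights, bias, x) - y  # d(loss)/d(logit)
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Average the gradient over the batch before stepping.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Keep a candidate SV when its score reaches 0.5."""
    return score(weights, bias, x) >= 0.5
```

In this toy setting the learned weights reproduce a majority vote across callers. A realistic model would presumably add features such as SV type, length, and read support, and would be evaluated on calls held out from training.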
