Dariusz Plewczyński & Sachin Gadakh
Comparative analysis of structural variants identification by long-read and short-read sequencing technologies in selected human families
About Dariusz Plewczyński & Sachin Gadakh
Dariusz Plewczyński's interests focus on functional and structural genomics, where he aims to make use of the vast wealth of data produced by high-throughput genomic projects. The major tools used in his interdisciplinary research include statistical data analysis, genomic variation analysis using diverse data sources, bioinformatics, biophysics, and genomics. His goal is to combine structural variant, epigenomic, transcriptional, and super-resolution imaging data with the spatial and temporal structure of the nucleus to better understand the biological function of genomes, genomic structural variation within populations, the spatial constraints on natural selection during evolution, mammalian cell differentiation, and the origin and development of cancer and autoimmune diseases.
Sachin Gadakh is currently a Ph.D. student in biological sciences at the University of Warsaw. He obtained his Master of Science in Bioinformatics from the Centre for Bioinformatics, Pondicherry University. His main interests are studying the causative role of three-dimensional human genome structure in cancer development, analyzing long-read sequencing data to study structural variants, and analyzing data from chromosome conformation capture-based methods such as Pore-C. He is currently working on a comparative analysis of structural variants identified by various sequencing technologies.
We present a comprehensive comparison of Oxford Nanopore long-read sequencing with short-read technologies such as Illumina. Our study focuses on structural variants (SVs), DNA segments of at least 50 bp that are unique to individual genomes, as identified by the 1000 Genomes Project. We also designed an ensemble-based method that processes both long-read and short-read sequencing data for variant discovery, enabling accurate analysis of the human genome. We provide comparative distributions of SVs identified from short- and long-read sequencing datasets for selected families (father, mother, child) and compare SV frequencies as a function of SV length. Our findings demonstrate that, across all SV sizes, long-read-based SV inference outperforms short-read-based inference.
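The abstract does not specify how the ensemble method or the length comparison is implemented, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes SV calls have already been parsed into simple records, keeps calls reported by at least two callers, and counts SVs per length bin. The record type `SV`, the bin edges, the matching tolerances (`max_dist`, `min_len_ratio`), and the caller names are all hypothetical choices made for this example; only the 50 bp threshold comes from the text.

```python
from bisect import bisect_right
from collections import Counter
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal record for one structural variant call."""
    chrom: str
    pos: int       # start coordinate
    svtype: str    # e.g. "DEL", "INS"
    length: int    # absolute length in bp


MIN_SV_LEN = 50  # size threshold stated in the abstract

# Assumed bin edges in bp, chosen only for illustration.
BIN_EDGES = [50, 100, 1_000, 10_000, 100_000]


def length_bin(length):
    """Return a label for the length bin that contains `length` (>= 50 bp)."""
    i = bisect_right(BIN_EDGES, length) - 1
    lo = BIN_EDGES[i]
    if i + 1 < len(BIN_EDGES):
        return f"{lo}-{BIN_EDGES[i + 1] - 1}"
    return f">={lo}"


def size_distribution(svs):
    """Count SVs per length bin, ignoring calls shorter than MIN_SV_LEN."""
    return Counter(length_bin(sv.length) for sv in svs if sv.length >= MIN_SV_LEN)


def same_event(a, b, max_dist=500, min_len_ratio=0.7):
    """Heuristic match: same chromosome and type, nearby start, similar length."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    return min(a.length, b.length) / max(a.length, b.length) >= min_len_ratio


def ensemble_consensus(callsets, min_support=2):
    """Keep one representative per SV reported by at least `min_support` callers.

    `callsets` maps a caller name to its list of SV records.
    """
    clusters = []  # pairs of (representative SV, set of supporting callers)
    for caller, svs in callsets.items():
        for sv in svs:
            if sv.length < MIN_SV_LEN:
                continue
            for rep, callers in clusters:
                if same_event(rep, sv):
                    callers.add(caller)
                    break
            else:
                clusters.append((sv, {caller}))
    return [rep for rep, callers in clusters if len(callers) >= min_support]


# Toy input with made-up calls from two hypothetical callers.
callsets = {
    "caller_a": [SV("chr1", 1000, "DEL", 300), SV("chr2", 5000, "INS", 60)],
    "caller_b": [SV("chr1", 1100, "DEL", 320), SV("chr3", 10, "DEL", 20)],
}
consensus = ensemble_consensus(callsets)
distribution = size_distribution(callsets["caller_a"])
```

Running the same binning on the short-read and long-read callsets of each family member would give the per-length counts that a comparison of this kind relies on. The greedy first-match clustering here is a simplification; dedicated SV merging tools use more careful breakpoint and overlap criteria.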