Long-read sequencing improves detection of non-coding structural variation


Mohammed Uddin (Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai) focused his talk on detecting non-coding structural variants (SVs) with long-read sequencing in the context of neurodevelopmental disorders (NDDs). Twin studies have suggested that genetics play a major role in the aetiology of these disorders. Both structural and single-nucleotide variants (SNVs) contribute to their pathophysiology, and whole-exome and whole-genome sequencing are becoming first-tier tests for NDDs. However, exome sequencing results suggest that coding variants explain only 30–40% of NDD cases, leaving more than half of cases molecularly unresolved. Research into the non-coding portion of the genome is therefore important for finding risk variants associated with NDDs.

Mohammed studied triplets from an Emirati family affected by NDDs, comprising a monozygotic twin pair and a dizygotic sibling. Short-read exome sequencing and microarray analysis had not revealed the underlying genetic cause. His team therefore used Oxford Nanopore long-read sequencing to identify SVs in the triplets' genomes and try to resolve the case. Each genome was sequenced on two MinION Flow Cells, and SVs were called with NanoVar. Conventional short-read sequencing was also performed for comparison. SVs of ≥1 kb identified by nanopore sequencing in the monozygotic twin were used as the 'golden set'.
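The talk did not describe how the size cut-off was applied in practice. As a minimal sketch, assuming SV calls have already been parsed into simple records (the `SV` class and `golden_set` function below are hypothetical and are not NanoVar's output format), selecting SVs of ≥1 kb amounts to a length filter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical SV record on one chromosome (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"

    @property
    def length(self) -> int:
        # Reference span of the call. For insertions, a caller-reported
        # length (such as a VCF SVLEN) would be needed instead (assumption).
        return self.end - self.start


def golden_set(calls, min_len=1000):
    """Keep only calls at least min_len bp long (the >=1 kb cut-off)."""
    return [sv for sv in calls if sv.length >= min_len]


# Toy calls, not data from the study.
calls = [
    SV("chr1", 10_000, 70_000, "DEL"),  # 60 kb deletion, kept
    SV("chr2", 5_000, 5_400, "DEL"),    # 400 bp, below the cut-off
    SV("chr3", 1_000, 2_000, "DUP"),    # exactly 1 kb, kept
]
kept = golden_set(calls)
print(len(kept))  # 2
```

Using a frozen dataclass makes each call hashable and comparable by value, which is convenient when calls later need to be deduplicated or matched across callsets.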

Mohammed highlighted that the consensus copy number variant (CNV) callset from short-read sequencing data was ~50% smaller than the golden set from long nanopore reads. Analysis of the golden set also showed that more SVs were identified overall with long nanopore reads and, consequently, that more genes were found to be impacted. As a specific example, he described a 60 kb homozygous deletion affecting ORF genes in the affected siblings that was not called from short-read data because of repeats within the region.
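Beyond comparing callset sizes, two callsets are often compared by matching individual calls. The sketch below assumes a 50% reciprocal-overlap rule, a widely used convention that the talk did not specify, and reports what fraction of golden-set SVs a second callset recovers. The class, function names, and toy coordinates are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical SV interval (0-based start inclusive, end exclusive)."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if a and b overlap by at least min_frac of EACH call's length."""
    if a.chrom != b.chrom:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def recovered_fraction(golden, other, min_frac: float = 0.5) -> float:
    """Fraction of golden-set SVs matched by at least one call in `other`."""
    if not golden:
        return 0.0
    hits = sum(
        any(reciprocal_overlap(g, o, min_frac) for o in other) for g in golden
    )
    return hits / len(golden)


# Toy callsets, not data from the study.
golden = [SV("chr1", 0, 60_000), SV("chr2", 100, 2_100), SV("chr3", 0, 5_000)]
short_read = [SV("chr2", 150, 2_000), SV("chr3", 4_000, 9_000)]
print(recovered_fraction(golden, short_read))
```

Here only the chr2 call matches: the chr1 deletion has no short-read call at all, and the chr3 calls overlap by 1 kb, which is less than half of either 5 kb call. The pairwise loop is quadratic, so for genome-scale callsets an interval index or a dedicated tool such as bedtools would be more practical.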

Lastly, Mohammed described how his team annotated the non-coding SVs to quantify their impact on non-coding regions of the genome. They found that many non-coding elements, such as long non-coding RNAs (lncRNAs), small nucleolar RNAs (snoRNAs), and microRNAs (miRNAs), were impacted by SVs, and significantly more affected elements were identified from long-read nanopore data than from short-read data.
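The talk did not name the annotation tools used. As a minimal sketch, assuming SVs and non-coding elements (for example, taken from a gene annotation file) are available as half-open intervals, the code below counts how many distinct elements of each biotype overlap at least one SV. All names and coordinates are hypothetical:

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


class Element(NamedTuple):
    """Hypothetical annotated non-coding element."""
    name: str
    biotype: str  # e.g. "lncRNA", "snoRNA", "miRNA"
    iv: Interval


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def impacted_by_biotype(svs, elements) -> Counter:
    """Count distinct elements, per biotype, that overlap at least one SV."""
    hit = {e for e in elements if any(overlaps(sv, e.iv) for sv in svs)}
    return Counter(e.biotype for e in hit)


# Toy annotation and SV intervals, not data from the study.
elements = [
    Element("lnc_A", "lncRNA", Interval("chr1", 1_000, 20_000)),
    Element("sno_B", "snoRNA", Interval("chr1", 30_000, 30_150)),
    Element("mir_C", "miRNA", Interval("chr2", 500, 580)),
    Element("mir_D", "miRNA", Interval("chr5", 0, 90)),
]
svs = [Interval("chr1", 15_000, 35_000), Interval("chr2", 0, 1_000)]
counts = impacted_by_biotype(svs, elements)
print(dict(counts))
```

Running the same function on a long-read and a short-read callset against one annotation gives per-biotype counts that can be compared directly, which is the kind of comparison the talk reported.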

Authors: Mohammed Uddin