Haplotype-resolved repeat expansions & methylation patterns in 1000 Genomes Project data


Abstract

Expansions of short tandem repeats (STRs) and changes in DNA methylation patterns both cause rare genetic disorders, but baselines for either cannot be established from databases such as the 1000 Genomes Project (1KGP) because of the limitations of short-read sequencing. We have therefore re-sequenced 300 of a planned 800 1KGP individuals using nanopore sequencing, to an average coverage of 30x and a read N50 of 40 kbp. For the first 100 samples, analysis of known disease-causing repeat expansions, including characterization of motif sequence alleles and haplotype-resolved repeat lengths, has revealed expected levels of variation, but has also identified some premutation alleles in this presumably healthy control set. These data will allow us to filter and prioritize variants in individuals who remain unsolved after standard genetic testing, and to compare expansion sizes against benchmarks when assessing whether an expansion is likely to be tolerated. Separately, long-read sequencing data enable concurrent analysis of haplotype-resolved methylation patterns, enhancing our understanding of the impact of CG-rich repeat expansions on methylation and gene expression. We have globally quantified CpG methylation, confirming known differentially methylated regions (DMRs) on autosomes, including the loci associated with Prader-Willi and Beckwith-Wiedemann syndromes. Additionally, we assessed methylation variation in X-chromosome haplotypes of 46 XX individuals and developed computational methods for determining X-chromosome inactivation status. Our database of common DMRs in 1KGP samples will facilitate rapid identification of methylation differences related to imprinting or X-linked disorders using automated analysis pipelines.

Biography

Sophia Gibson is a second-year PhD student in the Department of Genome Sciences at the University of Washington. Sophia’s dissertation work is undertaken in Dr. Danny Miller’s lab, where she focuses on applying long-read sequencing to improve the diagnostic rate for rare Mendelian disorders, particularly those caused by tandem repeat expansions as well as imprinting and X-linked disorders. She pursues this mainly by developing interactive web-based tools and analysis pipelines.

Authors: Sophia Gibson