Mark T. W. Ebbert
Long-read sequencing technologies resolve most dark and camouflaged gene regions
About Mark T. W. Ebbert
Dr. Ebbert is an Assistant Professor of Neuroscience at the Mayo Clinic with a background in computational biology and bioinformatics, focusing on Alzheimer’s disease, amyotrophic lateral sclerosis (ALS), and frontotemporal dementia (FTD). He also has experience in genomic studies and analyses, algorithm design, and statistics. He has published in respected journals across cancer, bioinformatics, and Alzheimer’s disease research, and recently published a manuscript demonstrating that long-read technologies can traverse the challenging C9orf72 ‘GGGGCC’ repeat expansion.
Complex genomes, including the human genome, contain ‘dark’ regions that standard short-read sequencing technologies do not adequately resolve. These regions include protein-coding genes, so many variants that may be relevant to disease are entirely overlooked. We systematically identified gene regions that are ‘dark by depth’ (few mappable reads) and others that are ‘camouflaged’ (ambiguous alignment). More than 100 protein-coding genes are 100% camouflaged when using standard short-read sequencing. Many known disease-relevant genes are also camouflaged, including CR1, a top Alzheimer’s disease gene, as well as NEB, SMN1, SMN2, and ARX. We further assessed how well long-read technologies resolve these regions, including 10x Genomics, PacBio’s Sequel, and Oxford Nanopore PromethION (Cliveome v. 3.0). We found that long-read technologies largely resolve the camouflaged gene regions, making it possible to identify mutations that may be important in human disease.
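
The distinction between ‘dark by depth’ and ‘camouflaged’ positions can be made concrete with a small sketch. The Python below is an illustration only, not the pipeline used in the study: the function names, the input format (a list of per-read mapping qualities for each position), and all three thresholds are assumptions chosen for the example. It treats a position as dark by depth when very few reads cover it, and as camouflaged when it has reads but most of them align ambiguously (low mapping quality).

```python
from typing import List

# Hypothetical thresholds, chosen for illustration only.
MIN_DEPTH = 5            # this many reads or fewer counts as "dark by depth"
LOW_MAPQ = 10            # reads below this mapping quality count as ambiguous
LOW_MAPQ_FRACTION = 0.9  # fraction of ambiguous reads that marks a position


def classify_position(mapqs: List[int]) -> str:
    """Classify one position from the mapping qualities of its covering reads."""
    depth = len(mapqs)
    if depth <= MIN_DEPTH:
        return "dark_by_depth"
    ambiguous = sum(1 for q in mapqs if q < LOW_MAPQ)
    if ambiguous / depth >= LOW_MAPQ_FRACTION:
        return "camouflaged"
    return "resolved"


def percent_camouflaged(region: List[List[int]]) -> float:
    """Percentage of positions in a region that are classified as camouflaged."""
    if not region:
        return 0.0
    hits = sum(1 for mapqs in region if classify_position(mapqs) == "camouflaged")
    return 100.0 * hits / len(region)
```

Under these assumed thresholds, a gene whose every position returns "camouflaged" would be reported as 100% camouflaged. In practice, the per-position mapping qualities would come from an alignment file rather than hand-built lists.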