Interview: Advancing long-read de novo genome assembly methods in clinical research

Dr. Karen Miga is an Assistant Professor in the Biomolecular Engineering Department at University of California, Santa Cruz (UCSC), and an Associate Director of the UCSC Genomics Institute. In 2019, she co-founded the Telomere-to-Telomere (T2T) Consortium, an open, community-based effort to generate the first complete assembly of a human genome. Additionally, Karen is the Director of the Reference Production Center for the Human Pangenome Reference Consortium (HPRC).

We caught up with Karen to discuss how she became interested in the unexplored areas of the human genome, how nanopore sequencing is helping in the generation of a human reference genome, and the impact the work of the T2T Consortium is having on our understanding of human genomics.

You can also watch her recent talk here, where she covers her work in more detail.

What are your current research interests?

My career has been dedicated to understanding the structure and function of sequences in unexplored regions of our genome. The largest and most persistent gaps in our genome marked locations enriched with highly repetitive DNA, that is, sequences present as many near-identical tandem copies, known as ‘satellite DNAs’, that extend for millions of bases. My long-term objective is to develop computational and experimental methods to understand the organisation, genetic diversity, and functional impact of satellite DNAs.

What first ignited your interest in genomics?

I started my career around the time the initial draft of the human genome was released, and I was fascinated to learn that roughly half of our genome was defined by repeats. It was also at that time that I understood that some parts of our genome, around 8-10%, were so full of repeats they could not be confidently sequenced or mapped. It was so interesting to think about what role these mysterious sequences could have in cellular function and human disease.

The recent work of the Telomere-to-Telomere (T2T) Consortium has produced the first truly complete sequence of a human genome. What impact could this have for researchers?

The availability of a complete genome sequence will advance our understanding of the most difficult-to-sequence and repeat-rich parts of the human genome. In the future, when someone has their genome sequenced, researchers and clinicians will be able to identify all the variants in their DNA and use that information to better guide their healthcare. Knowing the complete sequence of the human genome will also provide a comprehensive framework for scientists to study human genomic variation, disease, and evolution.

How is nanopore sequencing helping in the generation of the human reference genome? How has it benefitted your work?

Sequencing technologies capable of providing long reads have offered a huge ‘step change’ in our ability to correctly assemble human diploid genomes. Ultra-long nanopore sequencing, meaning the ability to routinely sequence reads that are at least 100 kb in length, has been incredibly important for the Telomere-to-Telomere (T2T) Consortium in generating phased diploid assemblies, even over the most repeat-rich and complex regions. Additionally, having information about DNA modifications has allowed us to study methylation profiles across complete chromosomes, providing insight into new regions of epigenetic regulation in the genome.

What have been the main challenges in your work and how have you approached them?

Satellite DNAs are highly variable, meaning they differ in repeat copy number between individuals in the population. To study this variability and the epigenetic organisation, we need to work with accurate maps of these regions. My work with both the T2T Consortium and the Human Pangenome Reference Consortium (HPRC) has been to lead a team optimising new methods to analyse these newly assembled regions and to expand our epigenetic and genetic maps of them. To address these questions, my lab has been using long-read sequencing, assembly, and new functional assays, such as DiMeLo-seq.

What's next for your research?

By improving our understanding of variation in these highly repetitive regions, we can begin to explore variants that are associated with human health and disease.

To learn more about other applications of nanopore sequencing in human genomics research, click here.