NCM 2023 Houston: MethPhaser: methylation-based haplotype phasing of human genomes

The assignment of variants across haplotypes, a process called phasing, is crucial for predicting the consequences, interactions, and inheritance of mutations, and is a critical step toward understanding phenotype and disease. Of the three main phasing approaches, only read-based phasing can also comprehensively characterize de novo single nucleotide variants (SNVs) and their origin. However, read-based phasing is often limited by read length and by the length of homozygous regions in the genome, which contain no heterozygous SNVs to link reads together. To address this, we developed MethPhaser, the first method that uses haplotype-specific methylation signals from nanopore sequencing reads to extend SNV-based phasing. MethPhaser takes a set of already-phased SNVs and extends or merges individual phase blocks, often by extending them into homozygous regions that carry haplotype-specific methylation signatures. Benchmarking against trio-based phasing data showed that MethPhaser extends genome-wide phasing 1.6- to 2.5-fold while only marginally increasing the phasing error rate, from 0.03% to 0.05%. We further evaluated its performance on samples from different human populations (HG01109, HG02080, and HG03098) and on blood samples from a cohort of patients with cardiovascular disease. In every case, MethPhaser used methylation information to improve phase block N50. MethPhaser represents a novel approach that uses readily accessible nanopore methylation data to improve phasing and thus the interpretation of variant interactions across many medically important genes. MethPhaser is open-source and available at https://github.com/treangenlab/methphaser.
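Phase block N50 is the block length L such that blocks of length at least L together cover at least half of the total phased length. The minimal Python sketch below computes it and shows why merging two blocks across a homozygous gap raises it. The block lengths are hypothetical, not data from this study. The `merge_adjacent` helper is also hypothetical: it assumes that a bridged block spans both original blocks plus the gap between them. It illustrates the metric only and is not MethPhaser's implementation.

```python
from typing import List


def n50(block_lengths: List[int]) -> int:
    """Return the N50: the length L such that blocks of length >= L
    together cover at least half of the total phased length."""
    if not block_lengths:
        return 0
    total = sum(block_lengths)
    running = 0
    # Walk blocks from longest to shortest, accumulating covered length.
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def merge_adjacent(block_lengths: List[int], i: int, gap: int = 0) -> List[int]:
    """Hypothetical helper: join block i and block i + 1 (in genomic order),
    assuming the merged block also spans the `gap` between them."""
    merged = block_lengths[i] + gap + block_lengths[i + 1]
    return block_lengths[:i] + [merged] + block_lengths[i + 2:]


if __name__ == "__main__":
    blocks = [100, 50, 200, 30, 120]  # hypothetical block lengths in kb
    print(n50(blocks))  # 120
    bridged = merge_adjacent(blocks, 1, gap=20)  # bridge a 20 kb homozygous gap
    print(bridged, n50(bridged))  # [100, 270, 30, 120] 270
```

A single bridged gap is enough to move N50 from 120 kb to 270 kb here. This is why extending blocks into homozygous regions can improve N50 substantially even when only a few junctions are resolved.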

Authors: Yilei Fu