
Unprecedented access to haplotype-resolved biology enabled by ultra-long reads and Pore-C


Date: 19th May 2022

Longer reads contain more phase information, which both genome assemblers and molecular phasing tools can use, leading to longer haplotype-resolved contigs and longer phase blocks


Fig. 1 Haplotype-resolved assemblies: a) concept; b) and c) collapsed and trio-binned ONT assemblies, respectively; d) pipeline; e) phasing; f) and g) resolving haplotypes; h) and i) final assemblies

New assembly pipeline enables chromosome-scale haplotype-resolved assemblies of large diploid genomes using a combination of long nanopore reads and Pore-C data

Many assemblers collapse diploid genomes into a haploid assembly, mixing variants from both haplotypes randomly (Fig. 1a). Each contig/scaffold of a collapsed assembly has k-mers from both parents (Fig. 1b). It is preferable to have a separate assembly for each haplotype, and this is often achieved by trio binning: k-mers unique to each parent are extracted from that parent’s data and then used to separate the reads into paternal and maternal sets. The two sets are then assembled separately. This results in one assembly per haplotype, in which each contig/scaffold only has k-mers from one parent (Fig. 1c). However, parent data is not always available.

Here we present an alternative in which phasing, based on long reads and Pore-C, enables reads to be separated into haplotypes without the need for parent data. The pipeline, based on DipASM, first assembles ONT reads into a collapsed assembly, then aligns the long reads back to it and calls variants (Fig. 1d). These variants are then phased into chromosome-scale phase blocks. We obtain a single phase block for each chromosome, containing virtually all variants with correct phasing (Fig. 1e). Next, reads are tagged using the phased variants and separated into haplotypes. The vast majority of base pairs can be phased in this way.

A final assembly step yields a chromosome-scale assembly for each haplotype. Each resulting scaffold stems from either the paternal or the maternal haplotype, and the scaffolds have human-reference-scale N50s (Figs. 1f and 1g). Distinguishing maternal and paternal scaffolds is difficult without trio information, and thus both assemblies are a mix of paternal and maternal scaffolds. Finally, Figs. 1h and 1i show dot plots for both assembled haplotypes compared to the T2T CHM13 assembly.
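To make the trio-binning idea described above concrete, the following is a minimal Python sketch, not the implementation used in any particular tool. It assumes reads are plain strings, ignores reverse complements, sequencing errors and k-mer count thresholds (all of which real trio-binning software has to handle), and uses hypothetical function names (`kmers`, `parent_specific_kmers`, `bin_read`) chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_seqs, paternal_seqs, k):
    """Return (maternal-only, paternal-only) k-mer sets.

    A k-mer is parent-specific if it occurs in one parent's data
    but not in the other's.
    """
    mat = set().union(*(kmers(s, k) for s in maternal_seqs))
    pat = set().union(*(kmers(s, k) for s in paternal_seqs))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign a child read to a parent by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Toy data (illustrative only): the parents differ at a single base.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
for read in ["GTACGT", "GTTCGT", "ACGT"]:
    print(read, bin_read(read, mat_only, pat_only, k=4))
```

Reads that carry no parent-specific k-mers stay unassigned, which is why regions identical in both parents cannot be separated by this approach alone.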

Fig. 2 Molecular phasing with ultra-long reads

Variant phasing with ultra-long reads allows up to chromosome-arm-scale association of genomic features from a single sample – no parental sequencing required

A large number of interactions between genomic and epigenomic variants happen in cis, i.e. between variants on the same copy of a chromosome. Traditionally, trio sequencing (sequencing a proband as well as both parents) has been used to associate these variants with their chromosome of origin, and thus with each other. However, there are many drawbacks to this approach. For instance, it may be difficult to obtain samples from parents, sequencing parental samples increases costs, and the whole approach fails for loci which are heterozygous in all three samples, because the parental origin of each allele cannot then be inferred.

With ultra-long reads (reads with N50s > 50 kb), single nucleotide polymorphisms can be phased across hundreds of megabases (Figs. 2a and 2b), and this phasing information can be propagated to structural variants and haplotype-specific differentially methylated regions by haplotagging reads. Genomic variants can thus be associated with thousands of other genomic variants present on the same copy of a chromosome (Fig. 2c). Furthermore, because some haplotype-specific differentially methylated regions are consistently methylated on either the paternally or the maternally derived chromosome (imprinting control regions for imprinted genes), many phase blocks can be directly associated with parent of origin, and thus with each other (Figs. 2d-2f). Fig. 2g shows an example of multiple haplotype-specific differentially methylated regions and heterozygous SNPs and SVs which can be associated with a chromosome of origin via phasing with ultra-long reads.
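The haplotagging step mentioned above can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the method used to produce the figures: each read is reduced to a dictionary of the bases it shows at variant positions, phased variants are given as a hypothetical `position -> (haplotype 1 allele, haplotype 2 allele)` mapping, and the function name `haplotag` and the `min_margin` parameter are invented for this example.

```python
def haplotag(read_alleles, phased_variants, min_margin=1):
    """Assign a read to haplotype 1 or 2 by majority vote over phased SNPs.

    read_alleles:    dict mapping position -> base observed in the read
    phased_variants: dict mapping position -> (hap1_allele, hap2_allele)
    min_margin:      minimum vote difference required to tag the read
    Returns 1, 2, or None if the read is uninformative or ambiguous.
    """
    votes = {1: 0, 2: 0}
    for pos, base in read_alleles.items():
        if pos not in phased_variants:
            continue  # position is not a phased heterozygous site
        hap1_allele, hap2_allele = phased_variants[pos]
        if base == hap1_allele:
            votes[1] += 1
        elif base == hap2_allele:
            votes[2] += 1
        # a base matching neither allele is treated as an error and ignored
    if votes[1] - votes[2] >= min_margin:
        return 1
    if votes[2] - votes[1] >= min_margin:
        return 2
    return None


# Toy phased SNPs (illustrative only).
phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
print(haplotag({100: "A", 250: "C", 400: "A"}, phased))  # two votes vs one
print(haplotag({100: "G", 250: "T"}, phased))
print(haplotag({500: "A"}, phased))  # overlaps no phased site
```

Once a read carries a haplotype tag, any structural variant or methylation call supported by that read inherits the tag. The longer the read, the more phased SNPs it spans and the more robust the vote becomes.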
