Telomere-to-telomere sequencing (T2T) know-how document
FOR RESEARCH USE ONLY
Contents
Introduction
Data and results
- 1. Depth vs performance: ULK
- 2. Depth vs performance: Pore-C
- 3. Depth vs performance: Assembly polishing
- 4. Analysis
Change log
Introduction
This know-how document provides supplementary information for the end-to-end workflow for telomere-to-telomere sequencing of the human genome using the Oxford Nanopore PromethION platform.
The Telomere-to-telomere sequencing (T2T) on PromethION (SQK-APK114, SQK-LSK114, and SQK-ULK114) protocol includes three separate sequencing experiments: Ultra-Long DNA Sequencing Kit V14 (SQK-ULK114), Assembly Polishing Kit V14 (SQK-APK114), and the Pore-C protocol using the Ligation Sequencing Kit V14 (SQK-LSK114).
The telomere-to-telomere (T2T) workflow combines ultra-long reads, Pore-C and our new assembly-polishing chemistry to completely resolve haplotypes and achieve a state-of-the-art Q50 human assembly.
We recommend using four PromethION Flow Cells for telomere-to-telomere sequencing of a single human sample, allocated as follows:
Preparation | Kit | R10.4.1 PromethION Flow Cells | Input (whole blood) |
---|---|---|---|
Ultra-long DNA sequencing | SQK-ULK114 | 2 | 2 x 1.6 ml (2 x ~6 million cells) |
Pore-C | SQK-LSK114 | 1 | 5–10 ml (~10 million cells) |
Assembly polishing | SQK-APK114 | 1 | 1 ml (for 5 µg gDNA) (~5 million cells) |
Data and results
Depth vs performance: ULK
In our data, 130 Gb of pass data (equating to ~43X depth of the human genome) provides good assembly contiguity, with a high number of T2T contigs and scaffolds when combined with Pore-C.
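The conversion between sequencing output and depth quoted above is simple arithmetic, and can be sketched as follows. This is a minimal illustration only: the genome size of 3.055 Gb (roughly the T2T-CHM13 haploid assembly) is an assumption that is not stated in this document, and the function names are hypothetical.

```python
# Assumption: haploid human genome size in Gb (approximately T2T-CHM13).
# This value is not given in the document; adjust it to your reference.
HUMAN_GENOME_GB = 3.055


def depth_from_output(output_gb, genome_gb=HUMAN_GENOME_GB):
    """Return approximate depth of coverage (X) for a given output in Gb."""
    return output_gb / genome_gb


def output_for_depth(depth, genome_gb=HUMAN_GENOME_GB):
    """Return the approximate output in Gb needed to reach a target depth (X)."""
    return depth * genome_gb


if __name__ == "__main__":
    # 130 Gb of pass data, as quoted for the two ULK flow cells combined.
    print(round(depth_from_output(130)))
```

Under this assumed genome size, 130 Gb works out to roughly 43X, in line with the figure quoted above. A different reference size shifts the result by about one unit of depth.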
Figure 1 shows how performance (the number of T2T contigs/scaffolds) differs with varying amounts of ULK data, shown as both depth of coverage and total output (Gb). The figure represents data phased with the Verkko native Pore-C phaser. Also shown is the expected output/depth from ULK experiments.
Figure 1: T2T contig and scaffold assembly performance across ULK data coverage depth. Data is phased with the Verkko native Pore-C phaser. Solid vertical lines represent median coverage depths obtained (in total over two flow cells) from a number of ULK experiments, with the shaded green box representing the interquartile range of the dataset (which we equate to expected depth).
It is important to consider how assembly quality is affected by the read N50 obtained from the ULK preparation. Our data show that performance at an N50 of 100 kbp is on trend with that at 120 kbp, but an N50 of 75 kbp shows some deterioration in performance (Figure 2).
Figure 2: Initial data on the dependence of ULK assembly performance on read N50. At 45X depth (filtered data), the number of T2T contigs and T2T scaffolds produced with an N50 of 100 kbp (purple •) appears on trend with 120 kbp data (blue •). A noticeable drop in performance is observed at 75 kbp (green •). As in Figure 1, solid vertical lines represent median read depths obtained (in total over two flow cells) from a number of ULK experiments, with the shaded green box representing the interquartile range of the dataset (which we equate to expected depth).
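For readers who want to check the N50 of their own ULK reads against the values discussed above, the standard definition can be sketched as below: the N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases. This is an illustrative sketch, not the tooling used to produce the figures; the function name and the example read lengths are hypothetical.

```python
def read_n50(lengths):
    """Return the N50 (in bases) of an iterable of read lengths."""
    ordered = sorted(lengths, reverse=True)  # longest reads first
    if not ordered:
        raise ValueError("need at least one read length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first read that takes the running total to half of all
        # bases defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical read lengths in bases, for illustration only.
    example = [150_000, 120_000, 100_000, 80_000, 50_000]
    print(read_n50(example))
```

In practice the lengths would come from the pass reads of a run, for example parsed from a sequencing summary file or a FASTQ index.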
Depth vs performance: Pore-C
Pore-C provides long-range data, longer range than even ultra-long data, and is required for genome phasing. Our data indicate that 25X depth of the human genome from Pore-C gives good contig and scaffold assembly performance (Figure 3). In our experiments, we saw no benefit from increasing Pore-C coverage depth above 25X.
Without Pore-C data, the long-range phasing information needed for T2T contigs and scaffolds is missing. If the Pore-C experiment is omitted, T2T contigs will be broken at un-spanned homozygous regions, as Verkko cannot determine which haplotypes on either side belong together.
Figure 3: Dependence of phasing performance on Pore-C human genome depth. Solid vertical line represents median depth obtained from a number of Pore-C experiments, with the shaded blue box representing the interquartile range of the dataset (which we equate to expected depth).
For Pore-C data generation, ultimate performance is dependent on how well DNA cross-linking occurred during the sample preparation, and not solely on the outputs/depths achieved during sequencing.
Depth vs performance: Assembly polishing
Assembly polishing is one of the final steps in the T2T workflow and uses the data generated with the Assembly Polishing Kit (SQK-APK114), with the aim of attaining Q50 accuracy. Figure 4 details the relationship between the amount of APK data and Q-accuracy.
Figure 4: Dependence of Q-accuracy on APK output (depth). The solid vertical line represents median depth obtained from a number of APK experiments, with the shaded green box representing the interquartile range of the dataset (which we equate to expected depth).
Read lengths achieved in APK sequencing runs are expected to be ~6–8 kb. If assembly polishing is not used during the workflow, assemblies are expected to be in the range of ~Q40 rather than ~Q50.
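To put the difference between ~Q40 and ~Q50 in concrete terms, the sketch below converts a Q-value into an expected error rate. It assumes the Q-values in this section are Phred-scaled (Q = -10 log10 of the error probability), as is conventional for assembly quality. The function names are hypothetical.

```python
def phred_to_error_rate(q):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(q, assembly_bases):
    """Return the expected number of erroneous bases in an assembly of a given size."""
    return phred_to_error_rate(q) * assembly_bases


if __name__ == "__main__":
    for q in (40, 50):
        per_mb = expected_errors(q, 1_000_000)
        print(f"Q{q}: about {per_mb:.0f} expected errors per Mb")
```

On this scale, each step of 10 in Q corresponds to a tenfold reduction in error rate, so moving from Q40 to Q50 means going from about one error in 10,000 bases to about one in 100,000.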
Analysis
Data processing currently requires an experienced bioinformatician and an appropriate level of compute power, as described in the community protocol. Details of how to undertake the analysis are also contained within the community protocol, with appropriate GitHub links. An example Nanopore-only T2T dataset is also linked via this EPI2ME blog post.
Change log
Date | Version | Changes made |
---|---|---|
February 2025 | V1 | Document release |