Telo-Seq - info sheet and recommendations


Introduction

Overview of the document

This document offers comprehensive guidance on the Telo-Seq method for sequencing telomeres in high molecular weight genomic DNA. Telo-Seq is designed to accurately measure telomere length and assign each telomere to a specific chromosome arm. The updated workflow uses barcoded Telo-adapters for multiplexed Telo-Seq experiments on a single flow cell, along with more detailed analysis through the EPI2ME-compatible wf-teloseq pipeline.

The following key areas are covered:

  • The role of telomeres in health and disease: understanding the biological significance of telomeres.
  • Telo-Seq method and protocol: an overview and detailed steps of the Telo-Seq method.
  • Telomeric enrichment and length estimation: techniques for enriching telomeric sequences and estimating their length.
  • Sample input and fragment distribution: considerations for sample preparation and fragment analysis.
  • Sequencing setup and run parameters: guidelines for optimal sequencing performance.
  • Example sequencing performance and analysis pipeline: expected outcomes and analysis workflows.

Understanding telomeres through Telo-Seq

Telomeres are essential repetitive DNA sequences located at the ends of linear chromosomes, protecting them from degradation. In humans, they consist of repetitive n(GGTTAG) motifs, ending in a single-stranded 3' G-rich overhang (see Figure 1) (Podlevsky and Chen, 2011). Telomeres gradually shorten with each cell division, acting as protective padding because they can shorten without affecting gene expression. Once they reach a critically short length, a point known as the ‘Hayflick limit’, cells enter senescence (Lulkiewicz et al., 2020). This shortening process is closely associated with age-related diseases. It is also relevant to cancer, as many cancer cells bypass this limit by reactivating telomerase or using alternative mechanisms to maintain telomere length, allowing unchecked cell growth.


Telo seq know how image 1

Figure 1. The telomeric 3' overhang. In this example, the overhang starts with ‘GGTTAG’.


In humans there are 22 pairs of autosomal chromosomes, along with a pair of sex chromosomes XX or XY, making up 23 chromosome pairs. Both maternal and paternal chromosomes have telomeres on the P and Q arms (see Figure 2), resulting in 92 individual telomere arms.


Telo seq know how image 2

Figure 2. Inheritance of parental chromosomes and their contribution to individual telomere arms.


Telo-Seq utilises the unique properties of telomeric DNA, allowing for precise measurements of telomere length at the chromosomal arm level. This method provides significant advantages over traditional sequencing techniques, including improved accuracy and the ability to work with high molecular weight (HMW) genomic DNA. By assigning telomere lengths to specific chromosome arms, Telo-Seq allows for detailed analysis of telomere dynamics in health and disease, offering valuable insights into conditions like cancer and age-related disorders.

Telo-Seq protocol overview

Telo-Seq is designed to precisely measure telomere length and assign each telomere to its specific chromosome arm. As illustrated in Figure 3, the step-by-step process is as follows:

1. Ligation of Telo-adapters: Telo-Seq uses the telomeric 3’ overhang to ligate custom barcoded ‘Telo-adapters’ onto the end of each chromosome arm.

2. Restriction digestion: the DNA is subjected to a restriction digestion using EcoRV. The enzyme digests most of the chromosome, leaving the telomere and sub-telomere regions intact.

3. 3’ dA-tailing: after digestion, a 3’ dA-tailing step is performed to prepare the DNA for sequencing adapter ligation.

4. Splint annealing: to mitigate dissociation of the splint from the pre-annealed Telo-Adapter, a reannealing step is carried out, ensuring the presentation of a cohesive end for sequencing adapter ligation.

5. Adapter ligation: the cohesive end created by the annealed splint is then ligated with the sequencing adapter, allowing the DNA to be sequenced.


Telo seq know how image 3

Figure 3. Overview of the Telo-Seq library preparation.


Telo-Seq experiments sequence the “C strand” of the telomere, from the start of the double stranded portion of the telomere, from the outside of the telomere inwards through to the sub-telomere. The ssDNA 3’ overhang of the “G strand” of the telomere is not sequenced.
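As a rough illustration of what a C-strand read looks like, the sketch below measures the run of telomeric repeats at the start of a read. It assumes that the read has already had adapter sequence removed, that it begins with the reverse complement of the G-strand repeat (CCCTAA for TTAGGG, in any frame), and that there are no sequencing errors. The function name and the example read are hypothetical, and this is not how wf-teloseq measures telomere length.

```python
import re

# Reverse complement of the G-strand repeat TTAGGG, as read on the C strand.
C_STRAND_MOTIF = "CCCTAA"


def leading_telomere_length(read: str, motif: str = C_STRAND_MOTIF) -> int:
    """Return the length (bp) of the exact-repeat run at the start of a read.

    Every rotation of the motif is tried so that any starting frame is accepted.
    Only whole repeat units are counted.
    """
    best = 0
    for shift in range(len(motif)):
        rotated = motif[shift:] + motif[:shift]
        match = re.match(f"(?:{rotated})+", read)
        if match:
            best = max(best, match.end())
    return best


# Hypothetical read: five C-strand repeats followed by non-telomeric sequence.
example_read = "CCCTAA" * 5 + "GATTACAGGCTT"
print(leading_telomere_length(example_read))  # 30
```

Real reads contain basecalling errors and variant repeats, so an exact-match approach like this would underestimate telomere length on real data.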

Discontinuation of single-plex Telo-Seq

The Telo-Seq protocol has been updated to accommodate multiplexing through barcodes, and the previous single-plex approach has been discontinued. Multiplexing provides greater efficiency and cost-effectiveness by allowing multiple samples to be processed simultaneously on a single flow cell, enhancing output. This update responds to feedback from early access users and internal performance evaluations, which showed that multiplexing offers superior performance across different sample types and use cases:

  • Increased throughput as up to 12 different samples can be processed concurrently on a single flow cell, reducing the time and cost per sample.
  • Better utilisation of the sequencing capacity, yielding more data and greater coverage per sample.
  • Barcoding and adapters: custom barcoded Telo-adapters are used to differentiate samples within the multiplex run. Each barcode corresponds to a specific sample, and careful adapter ligation ensures high specificity and minimal barcode crosstalk.

Prepare

Input mass and multiplexing

Telo-Seq offers significant telomeric enrichment compared to standard sequencing methods, enabling precise telomere length measurements. Table 1 demonstrates how Telo-Seq consistently produces more telomeric reads than standard runs using the Ligation Sequencing Kit (SQK-LSK114).


Telo seq know how Table 1

Table 1. Telomeric read enrichment using Telo-Seq compared to the conventional SQK-LSK114 library preparation. Data was obtained from MinION flow cells run for 48 hours on GridION, with all outputs analysed with wf-teloseq using low stringency filtering. Across all tested input masses, Telo-Seq demonstrated a significant increase in telomeric reads compared to SQK-LSK114.


Optimal Telo-Seq performance requires at least 5 µg of HMW DNA per barcode for a 12-plex to achieve full flow cell occupancy for optimum sequencing output. As shown in Figure 4, increasing the DNA input mass improves telomeric read output. However, inputs of less than 5 µg per barcode for a 12-plex yield insufficient library to achieve full pore occupancy on the flow cell, which in turn results in reduced telomeric read output.


Telo seq know how image 4

Figure 4. The effect of varying input mass per barcode on Telo-Seq performance. Mean low stringency filtered telomeric reads ± SD per barcode against input mass. Increasing the input DNA mass per barcode leads to improved outputs.


For accurate telomere length estimation at the individual chromosomal arm level, at least 1,000 telomeric reads per barcode (obtained with wf-teloseq using the low stringency filter, see Filtering options and use cases) are required. For this reason, when using the multiplex protocol, we recommend a combined input of at least 60 μg across all samples loaded on the flow cell. For example:

  • 12 x 5 μg inputs = 60 μg total
  • 6 x 10 μg inputs = 60 μg total
  • 4 x 15 μg inputs = 60 μg total
  • 1 x 15 μg input will not yield sufficient library to fill the flow cell and will result in reduced telomeric output.
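The input guidance above can be sketched as a simple pre-flight check. The thresholds are taken from this section (at least 5 μg per barcode, at least 60 μg in total, up to 12 barcodes); the function name and the wording of the messages are hypothetical, and treating 5 μg as a hard per-barcode minimum for plexities below 12 is an assumption.

```python
MIN_PER_BARCODE_UG = 5.0   # minimum input per barcode stated for a 12-plex
MIN_TOTAL_UG = 60.0        # minimum combined input per flow cell
MAX_BARCODES = 12          # up to 12 samples per flow cell


def check_multiplex_plan(inputs_ug):
    """Return a list of problems found in a planned set of per-barcode inputs (in µg)."""
    problems = []
    if not 1 <= len(inputs_ug) <= MAX_BARCODES:
        problems.append(f"expected 1-{MAX_BARCODES} barcodes, got {len(inputs_ug)}")
    low = [mass for mass in inputs_ug if mass < MIN_PER_BARCODE_UG]
    if low:
        problems.append(f"{len(low)} barcode(s) below {MIN_PER_BARCODE_UG} µg")
    total = sum(inputs_ug)
    if total < MIN_TOTAL_UG:
        problems.append(f"total input {total} µg is below {MIN_TOTAL_UG} µg")
    return problems


print(check_multiplex_plan([5] * 12))  # []
print(check_multiplex_plan([15]))      # one problem: total input too low
```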

To guarantee a minimum of 1,000 telomeric reads per barcode, we recommend running the sequencing experiment for 48 hours.

Fragment distribution

Assessing fragment distribution

We have found input sample fragment distribution to be the most important variable that may impact Telo-Seq performance. Optimal Telo-Seq performance is achieved when >90% of the starting DNA fragments are longer than 10 Kbp, due to the inherent length of telomeres and sub-telomeres. Sequencing >10 Kbp fragments allows for better capture of chromosomal context for alignment and arm assignment. Successful alignment of telomeric reads to a genomic reference requires sufficiently unique sequence in the sub-telomeric regions of the chromosome. Therefore, it is recommended that DNA inputs for Telo-Seq contain no more than 10% of fragments shorter than 10 Kbp, as shorter fragments may fail to map to chromosome arms, leading to poor coverage. Fragment distributions can be assessed by pulsed-field gel electrophoresis (PFGE) or Agilent Femto Pulse.
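The 90% criterion can be checked with a few lines of code once a fragment size table has been exported from the sizing instrument. The sketch below is a minimal illustration under stated assumptions: the input is a list of fragment lengths with optional weights (for example, the mass or signal in each size bin), and the function names are hypothetical.

```python
def fraction_longer_than(lengths_bp, threshold_bp=10_000, weights=None):
    """Return the (optionally weighted) fraction of fragments longer than the threshold."""
    if weights is None:
        weights = [1.0] * len(lengths_bp)
    total = sum(weights)
    if total == 0:
        raise ValueError("no fragments supplied")
    above = sum(w for length, w in zip(lengths_bp, weights) if length > threshold_bp)
    return above / total


def meets_fragment_criterion(lengths_bp, weights=None, minimum_fraction=0.90):
    """True when more than 90% of the fragments are longer than 10 Kbp."""
    return fraction_longer_than(lengths_bp, weights=weights) > minimum_fraction


# Hypothetical sample: 19 fragments of 20 Kbp and one of 5 Kbp.
sample = [20_000] * 19 + [5_000]
print(fraction_longer_than(sample))      # 0.95
print(meets_fragment_criterion(sample))  # True
```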

Achieving optimal fragment distribution

Several DNA extraction methods have been tested at Oxford Nanopore Technologies. Optimal fragment distributions for Telo-Seq performance have been observed in the following extraction methods:

Extraction methods like QIAGEN DNeasy and QIAGEN Genomic-tip were found to provide less suitable fragment distributions and are therefore not recommended for Telo-Seq. Other extraction methods may be used, but it is important to ensure that >90% of the fragment distribution is longer than 10 Kbp.

Correcting sub-optimal fragment distribution

If the sample has a high percentage (>10%) of fragments below 10 Kbp, consider using the Short Fragment Eliminator Kit (EXP-SFE001) to deplete shorter fragments. The use of EXP-SFE001 has been shown to improve Telo-Seq performance for samples with a large proportion of fragments below 10 Kbp (Figure 5).


Telo seq know how image 5

Figure 5. Telo-Seq performance of samples with sub-optimal fragment distributions before and after size selection using the EXP-SFE001 kit. Depletion of fragments <10 Kbp as measured by Agilent Femto Pulse has a positive impact on Telo-Seq performance.


Sample origin

Telo-Seq development and validation at Oxford Nanopore Technologies primarily used HMW gDNA extracted from GM24385 cell culture, where the telomere and sub-telomere are an average of 8 Kbp long. Fundamentally, Telo-Seq should be compatible with any DNA sample containing the repetitive telomeric n(GGTTAG) motif, although some organisms may have significantly longer telomeres or sub-telomeres which could impact chromosomal mapping. It is important to consider the restriction enzyme cut site positions (see Restriction enzyme choice). If processing samples which are non-human in origin, we recommend performing an in silico digestion of the reference genome to determine theoretical cut sites and verify whether there is any cleavage within the telomere or sub-telomere.
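A minimal sketch of an in silico digestion is shown below. It relies on the EcoRV recognition sequence GATATC, which is cut between GAT and ATC to leave blunt ends. The function names and the toy sequence are hypothetical, and the sequence is assumed to be oriented with the telomere at its 3' end; for a real genome each chromosome arm would need to be read from a FASTA file and oriented accordingly.

```python
import re

ECORV_SITE = "GATATC"  # EcoRV recognition sequence (palindromic)
CUT_OFFSET = 3         # blunt cut between GAT and ATC


def ecorv_cut_positions(sequence: str):
    """Return the 0-based positions at which EcoRV would cut the sequence."""
    return [m.start() + CUT_OFFSET for m in re.finditer(ECORV_SITE, sequence.upper())]


def terminal_fragment_length(sequence: str) -> int:
    """Length of the fragment left at the 3' end after digestion.

    This is the telomere plus sub-telomere fragment when the telomere
    lies at the 3' end of the supplied sequence.
    """
    cuts = ecorv_cut_positions(sequence)
    return len(sequence) - cuts[-1] if cuts else len(sequence)


# Toy sequence: filler, one EcoRV site, then four telomeric repeats.
toy = "ACGT" * 5 + "GATATC" + "TTAGGG" * 4
print(ecorv_cut_positions(toy))       # [23]
print(terminal_fragment_length(toy))  # 27
```

A cut position that falls inside the telomere or sub-telomere of an arm would indicate that a different restriction enzyme should be considered for that sample.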

Sequence

Sequencing setup and run parameters

We recommend the following parameters in MinKNOW:

  • Flow cell type: R10.4.1.

  • The latest release of MinKNOW.

  • Basecalling:

    • The latest release of Dorado if not basecalling live.
    • HAC or SUP basecalling model (see SUP vs HAC basecalling section)
  • Kit Selection: select LSK114, even though the Telo-Seq protocol uses NBA114.

  • Run options: runtime limit of 48 hours.

  • Output:

    • Pod5 if basecalling after run.
    • FASTQ or BAM if basecalling live.

Sequencing platform

We recommend using PromethION flow cells to sequence a Telo-Seq library as this will maximise the output. Sequencing can also be performed on MinION and GridION. Table 2 illustrates the results to expect.


Telo seq know how Table 2

Table 2. Representative outputs for 12-Plex Telo-Seq libraries on MinION and PromethION.


Example sequencing performance

Telo-Seq development and validation at Oxford Nanopore Technologies have primarily used HMW genomic DNA extracted from GM24385 cell cultures. As a result, the expected outputs are based on the performance of a 5 µg input per barcode in a 12-plex Telo-Seq run with this specific sample type. Different results may be observed when using alternative samples.


Telo seq know how Table 3

Table 3. Representative outputs of a 12-plex Telo-Seq performed using HMW gDNA extracted from GM24385 cell culture


Telo seq know how image 6

Figure 6. A representative read length distribution for Telo-Seq.


Telo seq know how image 7

Figure 7. The total Gb sequenced increases over time; output plateaus at 48 hours of sequencing.


Telo seq know how image 8

Figure 8. Q score distribution over 48 hours of sequencing.


Telo seq know how image 9

Figure 9. Pore activity over 48 hours of sequencing. It is expected that a proportion of pores will remain ‘Open’ for the duration of the run.


Telo seq know how image 10

Figure 10. The health of the flow cell deteriorates more rapidly than with non-Telo-Seq experiments.


Telo seq know how image 11

Figure 11. The translocation speed and flow cell temperature over 48 hours of sequencing.


Q-score filtering

The wf-teloseq analysis workflow has a q-score filtering step integrated into the workflow. There is no need to modify the default q-score parameters within MinKNOW when setting up a Telo-Seq experiment or processing the data downstream.

SUP vs HAC basecalling

Whilst the telomere itself is a repetitive polymer of n(GGTTAG), it can contain minor variations within the repeating sequence. For this reason, we recommend using the SUP basecaller model for the greatest sequencing accuracy.

Analyse

wf-teloseq

The Telo-Seq analysis pipeline, wf-teloseq, is hosted on GitHub. wf-teloseq is currently developed and maintained as research software. It does not yet have all the features of a fully supported EPI2ME workflow.

Workflow pathways

There are three pathways to choose from when analysing Telo-Seq data, based on the desired output.

Pathway 1: Global telomere length estimation

Pathway 2: Individual chromosome arm telomere length estimation for samples with matched reference

Pathway 3: Individual chromosome arm telomere length estimation for samples without matched reference

Building and using a custom reference

Filtering options and use cases

When to use low or high stringency filters

Example wf-teloseq output for a human cell line dataset









Telomere attrition occurs during genome replication as the chromosome ends cannot be fully replicated end-to-end. The telomeres provide padding as they do not contain genes, do not require complete replication, and may shorten without impacting gene expression. However, this can only occur a finite number of times before the telomere becomes too short for replication, known as ‘the Hayflick limit’, resulting in cellular senescence (Lulkiewicz et al. 2020). Therefore, telomere length may be correlated with cellular aging. Many oncogenic cells avoid senescence by activating telomerase or alternative mechanisms to elongate telomeres. Telomere characterisation through sequencing can improve understanding of their role in health and disease (Schmidt et al. 2024).

In humans there are 22 pairs of autosomal chromosomes, along with a pair of sex chromosomes XX or XY, making up 23 chromosome pairs. Both maternal and paternal chromosomes have telomeres on the P and Q arms (see Figure 2), for a total of 92 individual telomere arms.

Telo seq Know-how V2 Fig 2 Figure 2. Inheritance of parental chromosomes and their contribution to individual telomere arms.


Telo-Seq overview

Telo-Seq aims to measure telomere length accurately and assign each telomere to a chromosome arm. Telo-Seq uses the telomeric 3’ overhang to ligate custom ‘Telo-Adapters’ onto the end of each chromosome arm (see Figure 3). The DNA is then subjected to restriction digestion and subsequent 3’ dA-tailing. The restriction enzyme digests most of the chromosome, leaving the telomere and sub-telomere intact. A complementary splint is then annealed to the Telo-Adapter to create a cohesive end compatible with the sequencing adapter which is subsequently ligated.

Telo-seq workflow v0.4 LH edit Figure 3. Overview of the Telo-Seq library preparation.


Telo-Seq is currently in registration-based early access; please register here to gain access to the Telo-Seq protocol and analysis pipeline.


Telomeric enrichment

To demonstrate the extent of telomeric enrichment that is possible through Telo-Seq, Table 1 shows the reported telomeric read outputs from a standard sequencing run with the Ligation Sequencing Kit V14 (SQK-LSK114) compared to Telo-Seq runs (with 1–15 μg input) from the same high molecular weight (HMW) genomic DNA (gDNA) sample. Without telomeric enrichment, there are very few telomeric reads in a standard sequencing run.

Telo seq Know-how V2 Table 1 Table 1. Telo-Seq telomeric read enrichment when compared to a conventional SQK-LSK114 library prep. Aggregate statistics of multiple MinION flow cells run for 48 hours. At all inputs tested, Telo-Seq demonstrates a significant increase in telomeric reads compared to standard approaches using the Ligation Sequencing Kit V14 (SQK-LSK114).


The Telo-Seq protocol recommends an input of 15 µg of HMW gDNA. As Figure 4 illustrates, starting with a higher mass of DNA is beneficial for Telo-Seq performance. Telo-Seq can be used to provide sequencing data for telomere length estimation and specific chromosome arm mapping (see Analysis section). Global estimation of telomere length may be achieved with 300–500 telomeric reads, whereas specific chromosomal arm telomere length estimation requires >1,000 reads to achieve >10X coverage per arm (see Length estimation section). Therefore, it is important to ensure that the input mass for the Telo-Seq protocol has been considered for the intended experimental objective.

Telo seq Know-how V2 Fig 4 Figure 4. An input titration of starting HMW DNA demonstrating Telo-Seq performance improves with increased input mass.


Length estimation

The Telo-Seq pipeline may be used to determine telomere lengths of each individual chromosomal arm, or in the absence of a genomic reference, report the global telomere length of the sample.

Where specific chromosomal arm coverage is required, we recommend a minimum of 10X coverage per telomere arm. 10X coverage yields an accurate median telomere length measurement, with increased coverage yielding increased precision (see Figure 5).

Spread across 92 telomere arms, 10X coverage may be achieved with 1 k telomeric reads, using a genome reference for mapping. Coverage may be uneven across chromosomal arms, so 1 k telomeric reads is the minimum recommendation for this application and increased coverage is recommended. With fewer than 1 k telomeric reads, the coverage of individual chromosomal arms starts to decrease and may impact the accuracy of the telomere length measurement.
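The arithmetic behind these numbers can be written out as follows. This is only a sketch: it assumes reads are spread evenly over all arms, which the text above notes is not guaranteed, and the function names are hypothetical.

```python
AUTOSOME_PAIRS = 22
SEX_CHROMOSOME_PAIRS = 1
ARMS_PER_CHROMOSOME = 2  # P and Q arms


def telomere_arm_count() -> int:
    """Number of individual telomere arms in a diploid human genome."""
    chromosomes = (AUTOSOME_PAIRS + SEX_CHROMOSOME_PAIRS) * 2  # maternal + paternal
    return chromosomes * ARMS_PER_CHROMOSOME


def mean_coverage_per_arm(telomeric_reads: int) -> float:
    """Average coverage per arm if reads were distributed evenly."""
    return telomeric_reads / telomere_arm_count()


print(telomere_arm_count())                   # 92
print(round(mean_coverage_per_arm(1000), 1))  # 10.9
print(round(mean_coverage_per_arm(400), 1))   # 4.3
```

With 1 k reads the average is just above 10X, which is why 1 k is a minimum rather than a comfortable target.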

Where global telomere measurement is required, 300–500 telomeric reads are sufficient for a representative measurement without the need of a genome reference. However, it is important to note that global telomeric length measurement will not represent all the chromosomal arms equally. This is demonstrated in Figure 5 where global mean and median telomere length diverge with decreased sample depth.

Telo seq Know-how V2 Fig 5 Figure 5. Violin plots of telomere length distributions of down-sampled read sets. Blue violins indicate the distribution of reads plotted against length. Median telomere length is plotted in orange and mean telomere length in grey. Data sets are plotted against single arm alignment on the left and global arm alignments on the right. Top) non-down-sampled 20 k telomeric reads aligned to each chromosomal arm at maximum coverage. Middle) 1 k down-sampled telomeric reads represent each chromosomal arm in a similar trend compared to the non-down-sampled data, with >10X coverage of each arm. Bottom) 400 down-sampled telomeric reads, where read coverage per arm is <10X and a global telomere length measurement is therefore more appropriate.
*In this example two chromosome arms are identical in sequence and therefore 91 chromosome arms are reported (Chr13_paternal_P = Chr13_paternal_P and Chr22_PATERNAL_P). The Chr13_paternal_P telomere length shows a large distribution because the two arms have distinct telomere lengths despite their sequence identity.


Sample input considerations

Fragment distribution

Optimal Telo-Seq performance is observed when most of the DNA fragments in the sample are longer than 10 kb. This is due to the inherent length of the telomere and sub-telomere. Sequencing long fragments allows the capture of chromosomal context for arm placement and alignment. To successfully align telomeric reads to a genomic reference, reads must contain sequence homology to the non-telomeric chromosomal sequence. For this reason, we recommend DNA inputs for Telo-Seq should not contain fragments shorter than 8 kb as shorter fragments will be less likely to map to chromosome arms, which may result in poor coverage of some chromosomes.

Several DNA extraction methods have been tested at Oxford Nanopore Technologies. Optimal fragment distributions for Telo-Seq performance have been observed in the following extraction methods:


Extraction methods shown to provide less appropriate DNA extractions are the QIAGEN DNeasy and QIAGEN Genomic-tip methods; use of these kits is not recommended for Telo-Seq.

If you do not have access to a means of assessing HMW gDNA fragment distributions, such as pulsed-field gel electrophoresis or an Agilent Femtopulse, consider performing an SQK-LSK114 library preparation using 1 μg of your HMW gDNA extract to assess the fragment distribution of the sample; this library may be sequenced on Flongle. Typically, samples that yield an LSK114 read N50 of >15 kb are appropriate for Telo-Seq.

If the sample you plan to perform Telo-Seq with has a large percentage of fragments below 10 kbp (or a read N50 <15 kb as determined by LSK114 sequencing), consider using the short fragment eliminator kit (EXP-SFE001). EXP-SFE001 can be used to size select HMW gDNA by depleting short fragments (<10 kb). The use of EXP-SFE001 on samples containing a high percentage of fragments below 10 kb has been shown to be beneficial for Telo-Seq performance (see Figure 6).

Telo seq Know-how V2 Fig 6 Figure 6. Telo-Seq performance of samples with sub-optimal fragment distributions before and after size selection using the EXP-SFE001 kit. Depletion of fragments <10 kb as measured by Agilent Femtopulse has a positive impact on Telo-Seq performance.


Sample origin

Telo-Seq development and validation at Oxford Nanopore Technologies has been performed using primarily HMW gDNA extracted from GM24385 cell culture, where the telomere and sub-telomere are an average of 8 kb long. Fundamentally Telo-Seq should be compatible with any DNA sample containing the repetitive telomeric n(TTAGGG) motif, although some organisms may have much longer telomeres or sub-telomeres which could impact chromosomal mapping. It is important to consider the cut site positions of the restriction enzyme. EcoRV is utilised in the protocol as most of the human chromosome cut site positions are 2 kb – 10 kb upstream from the telomeric 3’ overhang. We recommend carrying out an in silico digestion of the reference genome to determine theoretical cut sites and to check whether there is any cleavage within the telomere or sub-telomere.


Pre-hybridisation of the Telo-Adapters and Telo-Splint

Telomeres with the repetitive telomeric n(TTAGGG) motif may present a 3’ overhang in one of six different frames (see Figure 7). In humans there is evidence that there is a dominant frame of GGTTAG (Smoom et al. 2023).

Telo seq Know-how V2 Fig 7 Figure 7. The different frames of the n(TTAGGG) telomere 3' overhang. Here the first 7 bases of the 3’ overhang are highlighted in blue; this is where the complementary Telo-Adapter ligates during the Telo-Adapter ligation.


To account for this, Telo-Seq uses a mix of six Telo-Adapters that make up the ‘Telo-mix’. Each of the six adapters represents a different frame so that all possible telomeric overhangs may be adapted. Figure 8 shows how pre-hybridisation of the Telo-Splint to the Telo-Adapters improves Telo-Seq performance. The protocol also includes a downstream splint annealing step which follows the Telo-Adapter ligation to maximise splinting. The inclusion of this splinting step helps improve performance consistency.
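The six frames are simply the six rotations of the repeat unit, which can be enumerated as in the sketch below. The function name is hypothetical, and the code only lists the frames; it says nothing about the actual Telo-Adapter sequences.

```python
def motif_frames(motif: str = "TTAGGG"):
    """Return every rotation (frame) of a repeat motif."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


print(motif_frames())
# ['TTAGGG', 'TAGGGT', 'AGGGTT', 'GGGTTA', 'GGTTAG', 'GTTAGG']
```

The dominant human frame mentioned above, GGTTAG, appears as one of the six rotations.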

Telo seq Know-how V2 Fig 8 Figure 8. Pre-hybridisation of the Telo-Adapter with the Telo-Splint increases the enrichment of Telo-Seq.


Sequencing set-up and run parameters

We recommend the following parameters in MinKNOW:

  • Flow cell type: R10.4.1.
  • The latest release of MinKNOW (23.07.12 or newer).
  • Guppy (7.1.4) or Dorado (0.4.3 or newer).
  • Kit Selection: The standard Ligation sequencing kit script (SQK-LSK114).
  • Run options: Runtime limit of 48 hours, default minimum read length of 200 bp.
  • Analysis: HAC or ideally SUP basecalling (see the SUP vs HAC basecalling section of this document for more information).
  • Output: POD5 and FASTQ or BAM, default qscore threshold.

Example sequencing performance

Telo-Seq development and validation at Oxford Nanopore Technologies has been performed using primarily HMW gDNA extracted from GM24385 cell culture. Therefore, these expected outputs are based on the performance of Telo-Seq with this sample. A different output may be expected for alternative samples.

Telo seq Know-how V2 Table 2 Table 2. Representative outputs of Telo-Seq performed using HMW gDNA extracted from GM24385 cell culture.


Telo seq Know-how V2 Fig 9 Figure 9. A representative read length distribution for Telo-Seq.


Telo seq Know-how V2 Fig 10 Figure 10. The total Gb sequenced increases over time; output plateaus at 48 hours of sequencing.


Telo seq Know-how V2 Fig 11 Figure 11. Qscore distribution over 48 hours of sequencing.


Telo seq Know-how V2 Fig 12 Figure 12. The activity of the pores over 48 hours of sequencing. It is expected that a proportion of pores will remain ‘Open’ for the duration of the run.


Telo seq Know-how V2 Fig 13 Figure 13. The health of the flow cell deteriorates more rapidly than with non-Telo-Seq experiments.


Telo seq Know-how V2 Fig 14 Figure 14. The translocation speed and flow cell temperature over 48 hours of sequencing.


Flow cell washing and reloading

If only a global estimation of telomere length is required, it may not be necessary to run a sequencing experiment for as long as 48 hours. For example: 300–500 telomeric reads are required for global length; with 3% reads on target, 10–17 k raw reads would be required. However, as the percentage of reads on target cannot be ascertained during sequencing, we recommend gathering an excess of reads to mitigate low telomeric outputs (see Table 3). Once sufficient raw reads have been accumulated, the experiment may be stopped, and the flow cell can be washed for later use using this method.

Telo seq Know-how V2 Table 3 Table 3. Example outputs of nuclease flushed flow cells. Two 5 μg gDNA input Telo-Seq libraries were prepared. The first library was loaded onto the MinION flow cell and sequenced for 2 hours to collect sufficient reads for a global telomere length estimation. After stopping the run, the flow cell was nuclease flushed and re-primed. The second library was loaded onto the MinION flow cell and sequenced for 2 hours to collect sufficient reads for a global telomere length estimation.


If individual chromosome arm telomere length estimation is required, >1 k telomeric reads are required. For example, with 3% reads on target, 34 k raw reads would be required for >1 k telomeric reads. A higher precision median telomere length measurement may be achieved with higher coverage; therefore, it is strongly recommended that an excess of reads is gathered. Note that flow cell flushing becomes less effective the longer the flow cell is run. After 48 hours the flow cell will be exhausted and flow cell flushing for re-loading is not advised beyond this point.
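The raw read targets quoted in this section follow from dividing the required telomeric reads by the on-target percentage. A minimal sketch is given below; the function name is hypothetical, and the 3% on-target figure is the example value used in the text rather than a guaranteed yield.

```python
import math
from fractions import Fraction


def raw_reads_needed(target_telomeric_reads: int, on_target_percent) -> int:
    """Raw reads required to expect a given number of telomeric reads.

    Exact fractions are used so the result is not affected by
    floating-point rounding before the ceiling is taken.
    """
    fraction = Fraction(str(on_target_percent)) / 100
    if not 0 < fraction <= 1:
        raise ValueError("on_target_percent must be in the range (0, 100]")
    return math.ceil(target_telomeric_reads / fraction)


print(raw_reads_needed(300, 3))   # 10000
print(raw_reads_needed(500, 3))   # 16667
print(raw_reads_needed(1000, 3))  # 33334
```

These values match the 10–17 k range for global estimation and the roughly 34 k raw reads quoted for more than 1 k telomeric reads.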


Analysis

Pass vs failed reads

When using a Guppy basecaller, sequencing reads are binned according to their q-score. Reads with a qscore below 9 are filtered into a fastq_fail output folder. Due to the inherent nature of the telomeric reads, some may have lower quality scores than the q-score threshold. Therefore, to ensure all telomeric reads sequenced are considered in the analysis, we suggest concatenating all fastq_pass and fastq_fail files together. If using the dorado basecaller with all the POD5 through command line, there is no qscore threshold unless stipulated through an optional parameter.
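One way to concatenate the pass and fail outputs is sketched below. The folder names fastq_pass and fastq_fail come from the text above; the function name, the run directory layout and the output path are assumptions. Files are copied as raw bytes, which also yields a valid multi-member gzip file when every input is gzipped.

```python
import shutil
from pathlib import Path


def concatenate_fastq(run_dir, output_path, subdirs=("fastq_pass", "fastq_fail")):
    """Concatenate all FASTQ files found under the given sub-folders.

    Returns the number of files written to the output.
    """
    run_dir = Path(run_dir)
    files = []
    for sub in subdirs:
        folder = run_dir / sub
        if folder.is_dir():
            files.extend(sorted(f for f in folder.rglob("*.fastq*") if f.is_file()))

    # Mixing compressed and uncompressed inputs would corrupt the output.
    if len({f.name.endswith(".gz") for f in files}) > 1:
        raise ValueError("mix of gzipped and uncompressed FASTQ files")

    with open(output_path, "wb") as out:
        for fastq in files:
            with open(fastq, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return len(files)
```

A call such as `concatenate_fastq("my_run", "all_reads.fastq.gz")` would then gather both folders into a single file, where `my_run` and the output name are placeholders.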

SUP vs HAC basecalling

While the telomere itself is a highly repetitive polymer of n(TTAGGG), it contains minor variations within the repeating sequence. For this reason, we recommend using the SUP basecaller model for the greatest sequencing accuracy. Figure 15 demonstrates the gains in on-target telomeric reads that may be achieved through SUP basecalling vs HAC basecalling.

Telo seq Know-how V2 Fig 15 Figure 15. A comparison of SUP vs HAC basecalling. SUP basecalling may yield a higher percentage of telomeric reads than HAC basecalling.


Analysis pipeline

Telo-Seq is currently in registration-based early access; please register here to gain access to the Telo-Seq protocol and pipeline. Following registration, you will be provided with a link to the pipeline repository.

There are two pathways to utilise when analysing Telo-Seq data, based on the desired output:

  • Pathway 1: Samples only telomeric read counts and telomere length (‘Raw Reads’ - Unmapped). This takes ~5 minutes to run.
  • Pathway 2: Sample specific chromosome arm telomere read counts and telomere length. This takes ~1 – 3 hours to run (16 – 8 threads respectively). This pathway includes the pathway 1 results.

With pathway 2 there are three different filtering conditions reported in addition to sample only: ‘No Filter’, ‘Lenient’ and ‘Strict’. These filtering conditions are designed to support different user requirements when mapping to a reference. For instance, maternal and paternal arms in the reference may differ by as little as one variant, so to reduce mis-mapping in high coverage samples, removal of reads that are not full length is recommended (strict). However, if coverage is low and/or the sample is fragmented, a small number of mis-mapped reads may be tolerated for the gain in coverage from reads with shorter sub-telomere length (lenient).

To minimise mismapping of a sample, the ‘strict’ filter is recommended. This filter uses only full length reads that extend to the enzyme cut site, subsequently ensuring the reads span the sub-telomere. This removal of fragmented reads reduces mismapping and noise and provides a higher accuracy in chromosome arm length estimation. However, due to the strict nature of this filter it does remove some fragmented, yet potentially useful telomeric reads.

If output is limited, it is advisable to use the ‘lenient’ filter. This filter uses reads that contain a complete telomere and at least 80bp of the sub-telomere for the chromosome arm length estimation.

The ‘No filters’ setting will use all reads with a mapping quality score above the default of 10 for the chromosome arm length estimation; this threshold can be modified by the user.

  • Raw reads = unmapped telomere reads
  • Mapped (no filters) = no additional filters
  • Mapped (lenient) = keep reads where the end mapping position is at least 80 bases beyond last telomere motif.
  • Mapped (strict) = keep reads where the start mapping position is before last telomere motif identification and end mapping position is within 25 bases of cut site (with exception of cut sites beyond 45 kb).
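The lenient and strict definitions above can be expressed as two predicates. This is a sketch under stated assumptions, not the wf-teloseq implementation: the class and field names are hypothetical, coordinates are assumed to increase from the telomere end towards the cut site, and the exception for cut sites beyond 45 kb is not modelled.

```python
from dataclasses import dataclass

LENIENT_MIN_SUBTELOMERE_BP = 80  # bases required beyond the last telomere motif
STRICT_CUT_SITE_WINDOW_BP = 25   # allowed distance between read end and cut site


@dataclass
class MappedRead:
    start: int         # mapping start position on the arm reference
    end: int           # mapping end position on the arm reference
    telomere_end: int  # position of the last telomere motif
    cut_site: int      # restriction enzyme cut site position


def passes_lenient(read: MappedRead) -> bool:
    """Keep reads that extend at least 80 bases past the last telomere motif."""
    return read.end >= read.telomere_end + LENIENT_MIN_SUBTELOMERE_BP


def passes_strict(read: MappedRead) -> bool:
    """Keep reads that start within the telomere and end within 25 bases of the cut site."""
    starts_in_telomere = read.start < read.telomere_end
    reaches_cut_site = abs(read.cut_site - read.end) <= STRICT_CUT_SITE_WINDOW_BP
    return starts_in_telomere and reaches_cut_site


# Hypothetical reads on an arm with a 4 kb telomere and a cut site at 9 kb.
partial = MappedRead(start=0, end=5000, telomere_end=4000, cut_site=9000)
full_length = MappedRead(start=0, end=8990, telomere_end=4000, cut_site=9000)
print(passes_lenient(partial), passes_strict(partial))          # True False
print(passes_lenient(full_length), passes_strict(full_length))  # True True
```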

Pipeline output

Replicate Telo-Seq runs performed using HMW gDNA extracted from GM24385 cell culture have shown consistent coverage (see Figure 16) and telomere length distribution (see Figure 17) across the maternal and paternal chromosome arms using a sample with matching reference (HG002 v1.0 T2T). In this example two chromosome arms are identical in sequence, which is why 91 chromosome arms are used (Chr13_paternal_P = Chr13_paternal_P and Chr22_PATERNAL_P). The Chr13_paternal_P telomere length shows a large distribution because the two arms have distinct telomere lengths despite their sequence identity. Samples that do not match the reference will experience mis-mapping, so for chromosome arm analysis we recommend generating a sample-specific reference.

Telo seq Know-how V2 Fig 16 Figure 16. Chromosome arm (single haplotype) coverage.
For chr22 one haplotype chr arm coverage is included in chr13 due to sequence identity. Chr21 maternal and paternal P arm are highly repetitive sub-telomeres and have cut site of ~65 kb – 150 kb upstream of the telomeric overhang, which is limited within the input sample fragment distribution. Chr5 maternal and paternal P arm have very short sub-telomeres which are also limited in size selection. Generally, there is consistent capture across the maternal and paternal arms with these exceptions.


Telo seq Know-how V2 Fig 17 Figure 17. Chromosome arm (maternal/paternal) telomere length.
For chr22 one haplotype chr arm coverage is included in chr13 due to sequence identity. Chr21 maternal and paternal P arm are highly repetitive sub-telomeres and have cut site of ~65 kb – 150 kb upstream of the telomeric overhang, which is limited within the input sample fragment distribution. Chr5 maternal and paternal P arm have very short sub-telomeres which are also limited in size selection. Generally, there is consistent capture across the maternal and paternal arms with these exceptions.


Methylation

Telo-Seq is a native library preparation; as such, DNA modifications are retained on the nucleic acids that are sequenced. Sequencing data may be interrogated for these modifications, such as methylation. However, this is not currently supported as part of the Telo-Seq early access.


References

Schmidt, T.T., Tyer, C., Rughani, P. et al. High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer. Nat Commun 15, 5149 (2024). https://doi.org/10.1038/s41467-024-48917-7

Lulkiewicz, M., Bajsert, J., Kopczynski, P. et al. Telomere length: how the length makes a difference. Mol Biol Rep 47: 7181–7188 (2020). https://doi.org/10.1007/s11033-020-05551-y.

Smoom, R, et al. Telomouse—a mouse model with human-length telomeres generated by a single amino acid change in RTEL1. Nat Commun 14: 6708 (Oct 2023). https://doi.org/10.1038/s41467-023-42534-6

Change log

Version Change
v3, Jan 2025 Document revamped for the release of the Telomere multiplex sequencing (Telo-seq) from DNA using EXP-NBA114, EXP-ULA001, EXP-LFB001 and EXP-AUX003 protocol.
v2, Oct 2024 Addition of reference to article: High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer.
v1, Nov 2023 Initial publication

Last updated: 2/25/2025
