Blog: Pushing the boundaries of rare disease research
Tue 6th June 2023
In the 'Rare disease' Breakout session at London Calling 2023, we heard from three scientists on how nanopore sequencing has the potential to revolutionise their areas of clinical research. Spanning ophthalmology, neuropathies, and haemostatic and metabolic disorders, their research demonstrates how long nanopore reads can reveal disease-associated variants that are inaccessible to traditional technologies, furthering our understanding of the genetics of rare diseases.
Solving genetically undiagnosed inherited neuropathy families: long-read sequencing to the rescue
Marina Kennerson
The ANZAC Research Institute, Australia
Marina Kennerson began her talk by describing inherited neuropathies: diseases of the peripheral nervous system. The most common, Charcot-Marie-Tooth (CMT) neuropathy, impacts one in 2,500 people and leads to chronic, lifelong disability; there is currently no cure. Over 1,000 mutations across over 100 genes have been implicated in CMT, yet 38% of families remain genetically unsolved. These challenges are the focus of Marina’s research.
First, Marina introduced the SORD gene, biallelic mutations in which are a common cause of CMT type 2 and distal motor neuropathy. Affected subjects typically carry a homozygous or compound heterozygous deletion. Downstream of the SORD gene is a highly similar pseudogene; the mutations can also occur here, but are only pathogenic when present in the gene. Marina described a clinical research sample for which short-read sequencing had identified putative compound heterozygous mutations in SORD; however, distinguishing between reads representing the SORD gene and those from the pseudogene can be challenging with short-read technology. In collaboration with Ira Deveson (Garvan Institute of Medical Research, Australia), Marina and her colleagues tested the potential for long nanopore reads to characterise these mutations. With nanopore sequencing at ~30x depth of coverage and a median read length of ~13 kb, it was possible to clearly distinguish between the gene and pseudogene and identify that both SORD alleles carried a pathogenic variant.
‘Long-read sequencing accurately distinguishes genes and pseudogenes and phases biallelic mutations’
Next, Marina described the disease CANVAS, an ataxia which involves a biallelic repeat expansion in the RFC1 gene. Pathogenic variants comprise ~400–2,000 repeats of the motif AAGGG. Marina and her team had previously used PCR-based and Southern analyses to identify these repeat expansions. She shared an example in which these methods had suggested the presence of 1,325 AAGGG repeats in a sample, but this result could not fully explain the phenotype.
The team then utilised adaptive sampling, a bioinformatics-based, real-time, PCR-free method of targeted nanopore sequencing, to study this clinical research sample. Enriching for and sequencing the region of interest with long nanopore reads, they were able to see the previously identified repeat expansion, spanning 1,010 repeats. However, they also identified 980 copies of the canonical repeat AAAGG, which had not been found with traditional methods. Marina highlighted the significance of this finding: when RFC1 was first reported, this second repeat was considered not to be pathogenic, but its presence here indicated that it can be pathogenic at this length when coupled with the AAGGG repeat.
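To make the idea of reading repeat counts directly from a long read more concrete, here is a minimal Python sketch. It is not the team's analysis method: the function names and the toy read are hypothetical, and it assumes an error-free sequence, whereas real reads contain errors and interruptions that dedicated repeat-analysis tools are designed to handle.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        # Each match is a whole number of motif copies, so integer division is exact.
        best = max(best, len(match.group(0)) // len(motif))
    return best


def summarise_read(sequence: str, motifs=("AAGGG", "AAAGG")) -> dict:
    """Report the longest uninterrupted run of each motif within a single read."""
    return {motif: longest_repeat_run(sequence, motif) for motif in motifs}


# Hypothetical toy read: flanking sequence, 4 x AAGGG, a short spacer, 3 x AAAGG.
toy_read = "GATTACC" + "AAGGG" * 4 + "CT" + "AAAGG" * 3 + "TTGCA"
print(summarise_read(toy_read))
```

Because a single long read can span the whole expansion, both motifs can be counted on the same molecule, which is the property the PCR-based methods described above lacked.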
Finally, Marina described a study of clinical research samples from subjects within a family affected by a hereditary sensory neuropathy. In 2003, linkage analysis had identified a region on chromosome 3 but, despite extensive work including Sanger, whole-genome, and whole-exome sequencing, a genetic basis of disease had not been identified. When RFC1 was reported in 2019, the disease phenotype suggested its involvement, so the team used a PCR-based method to analyse a research sample from the affected family. This identified the canonical AAGGG expansion, but was not able to resolve a second expansion.
Twenty years after the initial study, the team used long nanopore reads to characterise this clinical research sample. The long, PCR-free reads enabled identification of both the canonical expansion and a second expansion (AGGGC), each spanning 800 repeats: ‘without having the long-read sequence, we would not have known what the second allele was’. Marina explained that this new finding, which had been missed with PCR-based methods, challenged the assumption that the disease was autosomal dominant. This data enabled further study, with repeat-primed PCR, which suggested that three different repeat expansions may be present across the research samples from this family.
Summarising her research, Marina highlighted the potential of whole-genome and targeted nanopore sequencing as ‘a powerful tool to discriminate the SORD and SORD pseudogene, discover novel RFC1 repeat expansions, phase expansion alleles, and define expansion repeat number’. Stressing the current difficulties of characterising pathogenic variants in this field with traditional methods, she described the potential of nanopore sequencing of structural variants and repeat expansions to ‘spearhead the discovery of mutations in non-coding DNA’.
Nanopore sequencing reveals retrotransposon insertions or complex genetic mechanisms in four rare disorders
Belén de la Morena-Barrio
University of Murcia, Spain
Belén de la Morena-Barrio began by putting ‘rare disease’ into context: though defined as those affecting fewer than one in 2,000 people, the 6,000 rare diseases so far described collectively impact 3.5–5.9% of the global population. However, diagnosis of rare diseases with a genetic basis presents multiple challenges, taking an average of 4–5 years. Even after this, 25–35% of cases remain unresolved, and 70% are misdiagnosed.
Belén and her team decided to investigate the potential of nanopore sequencing to characterise the genetics underpinning four rare disorders which, despite their analysis via multiple technologies, remained of unknown molecular basis. In this proof-of-concept study, Belén and her colleagues selected clinical research samples from large cohorts across four rare diseases, representing cases for which Sanger sequencing, whole-exome short-read sequencing, MLPA, and SNP arrays had proven ineffectual in resolving their genetic basis.
Belén first shared the team’s study of clinical research samples from subjects with antithrombin deficiency, a disease which increases the risk of thrombosis. For these, they performed whole-genome nanopore sequencing on PromethION, generating ~17x depth of coverage and an average read length of 10 kb, with reads spanning up to 2.5 Mb. Data analysis was performed using a multi-modal pipeline including alignment with minimap2 [2], SV calling with Sniffles [3], and identification of candidate variants involving the gene SERPINC1.
For three of the 12 samples sequenced, the long nanopore reads revealed — to the team’s surprise — a 2.4 kb retrotransposon insertion in SERPINC1. This SINE/variable number of tandem repeats/Alu (SVA) retrotransposon represented ‘the first insertion of an SVA element in a haemostatic disorder’. They then performed de novo assembly, enabling thorough characterisation of the variant — and identifying that it was a new type of retrotransposon. PCR analysis of this research sample and those from relatives validated the finding. Belén and her team were then able to identify a founder effect for this retrotransposon.
Next, Belén shared the results of sequencing two research samples from subjects with Glanzmann thrombasthenia, a disease which increases the risk of abnormal bleeding. For one sample, no mutation had previously been found; for the other, one pathogenic SNV had been found in ITGB3, one of the two genes responsible for the disease. As the disease is recessive, a genetic basis for the phenotype had not been found for either. The team therefore decided to utilise adaptive sampling to enrich the two genes of interest from these research samples and sequence them with long nanopore reads. Reads were aligned with minimap2 and SVs analysed with Sniffles2 [4] via an in-house pipeline.
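As a rough illustration of the align-then-call-SVs workflow mentioned above, the following Python sketch assembles the shell commands for a basic minimap2, samtools, and Sniffles2 run. The team's in-house pipeline was not described in detail, so this is only an assumption of what a minimal version could look like; the function name, file names, and thread count are hypothetical, and only basic, commonly used options of each tool are shown.

```python
import shlex


def build_sv_pipeline(reference: str, reads: str, prefix: str, threads: int = 8) -> list:
    """Return shell commands for a minimal long-read SV calling run (nothing is executed)."""
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.sv.vcf"
    ref_q, reads_q = shlex.quote(reference), shlex.quote(reads)
    bam_q, vcf_q = shlex.quote(bam), shlex.quote(vcf)
    return [
        # Align nanopore reads (map-ont preset) and coordinate-sort the alignments.
        f"minimap2 -t {threads} -ax map-ont {ref_q} {reads_q} "
        f"| samtools sort -@ {threads} -o {bam_q} -",
        # Index the sorted BAM so downstream tools can access regions quickly.
        f"samtools index {bam_q}",
        # Call structural variants with Sniffles2.
        f"sniffles --input {bam_q} --vcf {vcf_q} --threads {threads}",
    ]


# Hypothetical file names, for illustration only.
for command in build_sv_pipeline("reference.fa", "sample1.fastq.gz", "sample1"):
    print(command)
```

Printing the commands rather than running them keeps the sketch self-contained; in practice they would be executed in order, and candidate SVs in the genes of interest would then be reviewed and validated, as the team did by PCR.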
The long, PCR-free nanopore reads revealed the presence of a previously hidden complex SV in both Glanzmann thrombasthenia research samples. In sample one, in which no mutation had previously been identified, this variant was present in both alleles. In the second, for which only an SNV had been previously identified using standard methods, the SV was present on the alternative allele to the SNV. Belén described how ‘another strength’ of nanopore technology was that it was able to both identify and phase these two variants. All breakpoints were again validated by PCR and a founder effect was identified. This demonstrated that, where only a single SNV could be identified with the currently used workflow, nanopore sequencing was able to characterise biallelic mutations in both instances, showing potential to clearly resolve the variants behind this recessive disease which would otherwise be missed.
Belén then moved on to the sequencing of a glycogen storage disease clinical research sample. Here, an SNV had previously been identified in the gene GYS2, but as the disorder is recessive, this again had not been sufficient to resolve the genetic basis of the disease. With adaptive sampling on MinION, enrichment of this gene revealed the insertion of a 1.5 kb LINE element in the alternative allele, further showing the potential utility of nanopore sequencing to identify causative variants which are intractable to traditional technologies.
Finally, Belén shared the results of sequencing a research sample from a subject with peroxisomal disease. Once again, only one variant had been identified via the standard workflow — a duplication in PEX2, which was not sufficient to identify the cause of this recessive disease — and once again, nanopore sequencing revealed an SVA insertion, 2.6 kb in length, in the alternative allele.
In total, Belén highlighted, nanopore sequencing was able to identify pathogenic variants in eight of 17 clinical research samples for which conventional analysis could not find the genetic basis of the rare disease. Interestingly, six of these — across three different disorders — featured retrotransposon insertions that had been ‘hidden’ when using conventional methods.
The potential clinical utility of amplicon and targeted nanopore sequencing for rare disease diagnosis
Gavin Arno
University College London Institute of Ophthalmology, Moorfields Eye Hospital & North Thames Regional Genomics Laboratory Hub, UK
Gavin Arno kicked off his presentation by highlighting that ‘we are in the midst of a genomics revolution’. He emphasised the significance of next-generation sequencing in furthering understanding of the genetics of rare diseases, but noted that the diagnostic rate achieved with the current use of short-read whole-genome sequencing (WGS) is only ~25%. In Gavin’s field, ophthalmology, this reached 55% in the 100,000 Genomes Project Pilot [5], thanks to the good understanding of the genetic basis of rare eye diseases.
Inherited retinal dystrophy (IRD) is a leading cause of blindness globally; however, it presents with ‘a very broad range of phenotypic, genetic, and allelic heterogeneity’, making molecular characterisation challenging. Gavin explained that traditional short-read sequencing does not capture the genomic context of regions of interest, whilst the presence of repetitive sequences and pseudogenes results in mismapping, and some regions of the genome remain intractable to the technology. When it is possible to identify mutations, they may represent variants of uncertain significance (VUS), as determining whether they are pathogenic or benign can prove difficult. Intronic, missense, or synonymous variants may impact splicing, but transcriptomic analysis of genes expressed only in the retina is impracticable: ‘no-one wants to give a retinal sample in clinic’. Finally, phasing of distant variants in recessive disease is limited with short reads; this is a particular problem for the gene ABCA4, accounting for ~30% of all IRD cases.
Gavin described the structure of the opsin array, present on the X chromosome, which governs colour vision. Rearrangements and mutations in this array can result in red/green colour blindness, cone dysfunction syndrome, or cone dystrophy. The array consists of up to 10 opsin genes which, due to read mismapping, are a ‘complete black hole’ when sequenced with short-read technology, preventing analysis of this clinically significant region. He and his team therefore decided to investigate the potential of nanopore sequencing to overcome these difficulties.
Gavin introduced Cas9 Assisted Targeting of Chromosome segments (CATCH): a targeted nanopore sequencing method in which the region of interest — such as the opsin array — is first excised using Cas9, preserved as long fragments, and sequenced to produce ultra-long reads. Testing this method on two clinical research samples, they found that — in contrast to the results of short-read WGS — they obtained ‘full-length reads across the entire array’. Gaining even coverage across the opsin array with up to 120x depth in PCR-free reads, they were able to count the genes present and phase complex haplotypes in samples from female subjects, thoroughly characterising this previously inaccessible region.
Next, Gavin and his colleagues tested the use of this method to analyse the 130 kb ABCA4 gene in a clinical research sample. Again, they obtained full-length reads across the entire gene, enabling characterisation of >1,000 different mutations and phasing of variants. Gavin shared data from a research sample in which they identified and phased four rare variants in the gene, demonstrating the capacity to identify variants which were in trans without the need for trio sequencing.
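The reasoning behind read-based phasing can be sketched in a few lines of Python. This is a simplified illustration rather than the method used in the talk: the function name and the toy allele calls are hypothetical, and it assumes each read spanning both variant sites has already been classified as carrying the reference or alternative allele at each site.

```python
from collections import Counter


def phase_two_variants(read_alleles) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'undetermined'.

    `read_alleles` holds one (site_1, site_2) tuple per read spanning both sites,
    where each entry is either 'ref' or 'alt'.
    """
    counts = Counter(read_alleles)
    # Reads supporting both variants on the same haplotype (cis).
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # Reads supporting the variants on opposite haplotypes (trans).
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis == trans:
        return "undetermined"
    return "cis" if cis > trans else "trans"


# Hypothetical toy data: most spanning reads carry exactly one of the two variants.
toy_reads = [("alt", "ref")] * 9 + [("ref", "alt")] * 8 + [("alt", "alt")] * 1
print(phase_two_variants(toy_reads))
```

A simple majority vote like this tolerates a small number of miscalled reads; dedicated phasing tools handle sequencing errors and more than two sites far more rigorously. The key point is that reads long enough to cover both variants carry the phase information themselves, which is why parental samples are not required.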
Gavin then returned to variants of uncertain significance. He shared examples of VUS/intronic variants, characterised with nanopore sequencing, that they suspected would impact splicing, noting that a non-coding variant was found in >10% of research samples from unsolved cases. He explained that the traditional options for characterising their functional effects had several limitations: AI-based splice alteration prediction is not definitive evidence of functional effect, while assays can be complex and expensive.
The team tested the potential of nanopore sequencing to address these limitations, employing RT-PCR of genes with very low-level expression to identify possible functional effects. Gavin shared an example: the gene PDSS1, in which they identified that a mutation resulted in the production of a non-coding isoform. With long nanopore reads, they were able to phase this shift in isoform usage to the allele with the mutation, uncovering its functional impact. They then applied the same method to study a mutation in the gene TULP1 and discovered that it was possible to detect retina-specific splice alterations. Though both TULP1 and this mutation were discovered in 1998, only now could Gavin and his team identify its impact on splice alteration: ‘we were able to finally characterise the effect of that mutation using nanopore sequencing’.
1. Stevanovski, I. et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Sci. Adv. DOI: doi.org/10.1126/sciadv.abm5386 (2022)
2. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. DOI: doi.org/10.1093/bioinformatics/bty191 (2018)
3. Sedlazeck, F.J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods. DOI: doi.org/10.1038/s41592-018-0001-7 (2018)
4. Smolka, M. et al. Comprehensive Structural Variant Detection: From Mosaic to Population-Level. bioRxiv. DOI: doi.org/10.1101/2022.04.04.487055 (2022)
5. 100,000 Genomes Project Pilot Investigators et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N. Engl. J. Med. DOI: doi.org/10.1056/NEJMoa2035790 (2021).