Targeted long-read sequencing clarifies complex genetic results and identifies missing variants
In the first plenary of NCM 2020, Danny Miller (Seattle Children’s Hospital & The University of Washington, USA) began by describing how he believes long-read sequencing has the potential to increase the rate of genetic diagnoses and shorten the time it takes to reach them. He highlighted the importance of these two features. Identifying the genetic basis of a disease, he explained, reveals connections between diseases that could not otherwise be made, and this could in turn allow novel treatments to be offered to patients. He gave the example of a girl who presented with a range of symptoms including progressive weakness and weight loss, for whom exome sequencing had revealed a de novo variant of uncertain significance in the gene SPTLC1. The team suspected that she had early-onset ALS, and identified two further ALS patients with variants in this gene. However, the phenotype associated with variants in this gene was also found to be variable: rather than ALS, other individuals displayed the sensory neuropathy HSAN1A or the eye disease macular telangiectasia type 2. Danny stressed that connecting these three diseases would not have been possible without molecular information; here, the treatment given to patients displaying the two other phenotypes presented a potential novel treatment for Danny’s patient. Danny noted that identifying the molecular basis of a disease reduces hospitalisations and unnecessary testing and, as a result, reduces costs. Crucially, it enables families to make decisions about treatments and goals of care.
Current methods of clinical testing for genetic conditions are microarray, exome sequencing, and whole-genome sequencing; however, these steps can take months or years to complete, and after this full workup, 50% of children remain undiagnosed. Danny asked: would it be possible to replace these with a single long-read sequencing test, to both increase the rate of diagnosis and reduce the time to diagnosis? Danny and his team decided to investigate the potential of targeted nanopore sequencing for the analysis of copy number changes and missing variants. Exploring the targeted sequencing options available, they found that neither PCR nor Cas9 targeted sequencing would allow them to sequence regions of the size required, which can reach megabases; both approaches were also impractical because most of the team’s targets are unique to each sample. They were also concerned by the inconsistent success of amplification, and did not want to lose epigenetic information. Luckily, their project was perfectly timed with the release of a preprint by Payne et al. (https://doi.org/10.1101/2020.02.03.926956) describing readfish (Read Until): a computational method of selecting specific regions of the genome for nanopore sequencing. They tried the method on 7 Mb of the genome, and generated 20-40x coverage of the targets from a single MinION Flow Cell, representing a ~500% increase over no enrichment. Danny and his colleagues then used this method to enrich for the restriction sites that had been used to analyse XYLT1 in a patient and his parents via Southern blotting. The Southern blot had shown a triplet repeat expansion, which causes methylation and silencing of the gene, leading to Baratela-Scott syndrome. The PCR-free targeted sequencing was able to recapitulate the gel’s results, enable counting of the repeat, and simultaneously allow evaluation of methylation. This revealed that the son’s reads were methylated, whereas only some of the mother’s reads were methylated, indicating mosaicism: ‘this is a level of detail you never could have got using short reads.’
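The core idea of computational target selection described above can be illustrated with a minimal sketch. This is not the readfish implementation or its API: the `Target` class, the `decide` function, the contig name, and the coordinates are all hypothetical, and the three returned labels are illustrative names for "keep sequencing", "eject the molecule", and "wait for more signal". It assumes that the start of each read has already been basecalled and mapped to a reference position.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Target:
    """A region of interest (hypothetical structure, for illustration only)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def decide(chrom: Optional[str], pos: Optional[int],
           targets: Sequence[Target]) -> str:
    """Choose what to do with a read given where its first bases mapped."""
    if chrom is None or pos is None:
        # The read could not be placed yet: wait for more signal.
        return "proceed"
    for t in targets:
        if chrom == t.chrom and t.start <= pos < t.end:
            # On target: let the pore finish sequencing this molecule.
            return "stop_receiving"
    # Off target: eject the molecule so the pore is free for another one.
    return "unblock"


# Placeholder target of 7 Mb on a made-up contig name.
targets = [Target("chrA", 1_000_000, 8_000_000)]
print(decide("chrA", 2_500_000, targets))
print(decide("chrB", 2_500_000, targets))
```

Because the selection is made in software from a list of coordinates, changing targets between samples only requires changing that list, which fits the point made above that most targets are unique to each sample.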
Danny then moved on to investigating the potential for Read Until to resolve complex structural changes identified in clinical testing. He described an example of a newborn with multiple congenital anomalies, in which an array revealed three non-contiguous deletions within a 5 Mb region of chromosome 6. This bisected ARID1B, variants of which are associated with Coffin-Siris syndrome; this fitted the case well. However, the array could not discern whether the deletions were on the same or different chromosomes, whether there were any additional rearrangements, or whether any recessive disease genes carried deleterious variants. Targeting of ~15 Mb around the deletions via Read Until with nanopore sequencing produced ~20x average depth of coverage, spanning the three known deletions and revealing two new deletions. The long reads enabled thorough analysis of how these deletions were related, and Danny and his team were surprised by the complexity of the rearrangements. They were then able to reconstruct the segments to determine the new order of the genes, and call variants.
Next, the team investigated whether the method could identify variants missed by traditional approaches. They focused on cases in which a single variant was found in a gene associated with a recessive disease, or no variants were identified for an X-linked or dominant disorder. One case was of a boy with kidney disease and retinal dystrophy; here, exome sequencing revealed a paternally inherited stop variant in NPHP4. They targeted this gene with Read Until and nanopore sequencing, generating ~20x coverage, and called SNPs with Longshot, Clair, and Medaka, and structural variants (SVs) with Sniffles and SVIM. As well as identifying the known variant, they found a repeat in the gene that is known to be polymorphic in the human genome. Reads spanning the repeat revealed two haplotypes, both of which fell within the expected normal length range, enabling the team to exclude the repeat as the second variant; Danny noted that this would not have been possible via short reads. They also identified a candidate ‘second hit’ variant: an intronic variant predicted to affect splicing. The long nanopore reads enabled phasing, revealing that the two variants were present on different haplotypes.
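The phasing step can be sketched in a few lines. This is a simplified illustration and not the team's pipeline: it assumes that individual reads span both variant sites and that the allele each read carries at each site has already been determined. The function name, the input format, the `min_reads` threshold, and the example read counts are all hypothetical. Dedicated phasing tools handle harder cases, for example by chaining through intermediate heterozygous sites when no single read covers both variants.

```python
from collections import Counter
from typing import Iterable, Tuple


def phase_two_variants(read_alleles: Iterable[Tuple[str, str]],
                       min_reads: int = 4) -> str:
    """Classify two heterozygous variants as 'cis', 'trans' or 'ambiguous'.

    Each element is (allele_at_site_1, allele_at_site_2) for one read that
    spans both sites, with each allele given as 'ref' or 'alt'.
    """
    counts = Counter(read_alleles)
    # Reads supporting both variants on the same haplotype.
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # Reads supporting the variants on opposite haplotypes.
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis + trans < min_reads or cis == trans:
        return "ambiguous"
    return "cis" if cis > trans else "trans"


# Made-up read set: most reads carry exactly one of the two variants.
reads = [("alt", "ref")] * 6 + [("ref", "alt")] * 5 + [("alt", "alt")]
print(phase_two_variants(reads))
```

For a recessive disease, a "trans" result is the informative one: it means each copy of the gene carries one of the two variants, as reported for the NPHP4 case above.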
Danny described further examples in which missed second variants were identified. In one, an insertion was found in a patient with suspected Alström syndrome and a known stop variant; PCR and Sanger sequencing later confirmed that this was the pathogenic second hit. In another, involving a child with biochemically confirmed Lesch-Nyhan syndrome, an X-linked disease, nanopore sequencing showed a 17 Mb inversion or insertion, later confirmed via FISH as an inversion. In another case, nanopore sequencing revealed a 1.9 kb deletion which was not identified by array or exome sequencing. For his final example, Danny described a patient with Duchenne muscular dystrophy who had no molecular diagnosis following workup. Targeted nanopore sequencing showed a 300 bp GA-rich repeat expansion within DMD. To investigate how frequently this occurs in the population, the team studied short-read sequencing data from 9,000 whole genomes, and found the expansion in 72, 71 of which were from females. They next intend to validate some of these results using long sequencing reads; Danny highlighted that this is an interesting example as it ‘would be a huge violation of Hardy-Weinberg’. He noted that such findings from long-read sequencing will be difficult to validate, as the frequency in the population is unknown and they are difficult to study in the lab, but that these challenges are exactly why he does what he does.
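The Hardy-Weinberg remark can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions and not an analysis from the talk: it assumes the 9,000 genomes came from roughly equal numbers of males and females (the source does not give the sex ratio), treats the expansion as a single X-linked allele (DMD lies on the X chromosome), and uses a made-up allele frequency `q = 0.01`, to which the result is insensitive when the allele is rare. Function names are hypothetical.

```python
from math import comb


def expected_female_fraction(q: float) -> float:
    """Expected share of carriers who are female for an X-linked allele.

    Under Hardy-Weinberg, a male (one X) carries the allele with
    probability q, and a female (two X) carries at least one copy with
    probability 1 - (1 - q)**2. Equal numbers of each sex are assumed.
    """
    male = q
    female = 1 - (1 - q) ** 2
    return female / (female + male)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


q = 0.01  # assumed allele frequency, not a figure from the talk
p_female = expected_female_fraction(q)      # close to 2/3 for rare alleles
tail = binomial_upper_tail(71, 72, p_female)  # 71 or more of 72 in females
print(f"expected female share of carriers: {p_female:.3f}")
print(f"probability of 71 or more of 72 being female: {tail:.2e}")
```

For a rare X-linked allele about two thirds of carriers are expected to be female, so 71 of 72 is far outside that expectation under these assumptions, which is consistent with Danny's comment.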
Danny concluded that ‘targeted long-read sequencing can be used to clarify complex structural changes and identify missing variants’. He noted the variety of aberrations found using nanopore sequencing, and how this highlighted that structural variants are difficult to identify using short reads. On average, the team used a single MinION Flow Cell per experiment, with a materials cost of $600/sample. They were also able to generate results quickly, in one case in just 18 hours. He described the potential of nanopore sequencing in areas where few clinical testing options are currently available, including phasing of variants, identifying precise translocation breakpoints, and evaluating deletions and duplications. He suggested that his findings ‘show that long reads could be used as a single data source that replaces most of the testing that we do today’, with the possibility of reducing workups that currently take years to just days.