Understanding genetic variation in cancer, using targeted nanopore sequencing


Shruti Iyer (Cold Spring Harbor Laboratory & Stony Brook University) began her plenary talk by discussing how "cancer is a disease of the genome", in which the accumulation of genetic and epigenetic alterations results in a loss of control over normal cell growth. These genetic alterations, she described, can vary from single point mutations up to larger structural variants, which affect one or multiple genes: the field of cancer genomics aims to identify and characterise these variations.

Shruti noted how several thousand tumours have been sequenced via next-generation sequencing, enabling the discovery of different signatures and mutation rates across different cancer types, plus insights into the clonal structure and evolution of tumours. Malignant cells can comprise as little as 10% of a sample; furthermore, heterogeneity is "very much a part of cancer", with subpopulations within this exhibiting different alleles or genomic features. The combination of this intricacy and lack of depth, Shruti explained, means that the ability to detect these variants via whole genome sequencing to the typical depth of coverage of 30x is very low. Shruti described how targeted methods, such as exome capture, have helped this field to move forward by enriching for regions of interest and improving their coverage; this has enabled the detection of many small variants associated with cancers, but there has been a "blind spot" when it comes to detection of structural variants (SVs).
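To make the depth problem concrete, a rough back-of-envelope sketch follows. It assumes a simple binomial sampling model, a diploid locus and a heterozygous variant carried by every malignant cell; none of these assumptions come from the talk, and the function names and the three-read detection threshold are purely illustrative.

```python
from math import comb


def variant_allele_fraction(tumour_fraction: float, heterozygous: bool = True) -> float:
    """Expected fraction of reads carrying the variant.

    Assumes a diploid locus with no copy number change (an illustrative
    simplification, not a claim about real tumour genomes).
    """
    return tumour_fraction * (0.5 if heterozygous else 1.0)


def prob_at_least(min_reads: int, depth: int, vaf: float) -> float:
    """P(X >= min_reads) where X ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    vaf = variant_allele_fraction(0.10)  # 10% malignant cells
    for depth in (30, 100, 300):
        # The 3-read threshold is a hypothetical calling requirement.
        print(depth, round(prob_at_least(3, depth, vaf), 3))
```

Under these assumptions, a sample with 10% malignant cells sequenced to 30x yields on average only 1.5 variant-supporting reads (30 x 0.05), which is why greater depth over the regions of interest matters.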

SVs, Shruti explained, are defined as variants spanning over 50 bp; they encompass insertions, deletions, duplications and translocations. Due to their large size, these variants tend to be disruptive. SVs contribute to copy number changes, which can amplify or delete oncogenes and tumour suppressor genes. SVs can also lead to gene fusions, which can modify the sequence and function of the protein produced; for example, by fusing a highly expressed transcript to one with lower expression levels. SVs can therefore act as prognostic indicators: greater genome instability is generally associated with poorer patient outcome. However, Shruti described how, despite the significance of SVs, relatively little is known about any but the largest copy number variations, and that "this is largely because of the way these variants are studied".

Some methods of analysing SVs, such as cytogenetics and microarrays, can provide a "bird's eye view" of SVs, but lack resolution. High-resolution methods, on the other hand, generally involve short-read sequencing; short reads cannot span SVs, resulting in misalignments and low sensitivity - "up to 80% false positive rates". Shruti quoted that ~700 genes have been identified as "inaccessible to sequencing with short reads", with ~200 of these being medically relevant. An individual human genome, aligned to the human reference genome, has ~20,000 SVs; Shruti highlighted how we are "really missing a lot of things by not looking for them in the right way". How can these important variants be resolved? "Spoiler alert: long reads can help!" Shruti described how SVs can be detected using long reads with a sensitivity and specificity of over 95%.

Shruti described how analysis of the HER2-amplified breast cancer cell line SK-BR-3 has helped identify several thousand variants with short and long reads. The cell line was sequenced to high coverage using both a short-read technology and long nanopore reads; Shruti described how the long reads "helped identify tens of thousands of additional variants in the cancer". She and her team are now focusing their efforts on targeted, long-read sequencing to achieve the depth needed to identify rare variants.

Shruti noted how the use of targeting strategies enables higher-throughput sequencing of the targets of interest, improving their depth of coverage and allowing for the detection of rare alleles. As targeting avoids having to sequence a whole genome to generate sufficient coverage of the regions of interest, this approach is also more cost-effective. However, Shruti described how, until ~1.5-2 years ago, there wasn't really an effective method of long-read target enrichment: methods designed for short-read technologies tend to involve either PCR or target capture, leading to inherent bias and short fragments, meaning that long nanopore sequencing could not be used to its full potential.

Shruti then introduced the CRISPR/Cas9 method of PCR-free, long-read target enrichment. The method begins with dephosphorylation of all the DNA in the sample. Cas9 ribonucleoproteins, or RNPs, with the crRNAs (specific to the ends of the targets of interest) and tracrRNAs (which guide the Cas9 enzyme to this site), are then added. The Cas9 is then guided to the sites flanking the target loci, where it induces double-stranded cuts, excising the region. Sequencing adapters can then be ligated to these exposed phosphorylated ends, enabling sequencing of the target regions. Shruti pointed out that all the DNA - both on target and off target - remains present in the enriched sample through sequencing. The process is entirely PCR-free, and can be used to enrich for very large regions. For their first test, Shruti and her team decided to use this method to enrich the BRCA1 gene in the SK-BR-3 cell line, as the cell line had previously been studied in their lab and this would further help validate their findings. At the time, Shruti explained, groups using CRISPR/Cas9-mediated enrichment hadn't gone beyond targeting and sequencing regions of 5-10 kb; Shruti decided to see how far she could push this - "I started with 200 kb".
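The excision step can be pictured as a pair of cut sites bounding each target. The sketch below is illustrative only: the `Target` class, its field names and all coordinates are hypothetical placeholders rather than values from the talk, and real guide design involves considerably more than is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A region excised by two flanking Cas9 cuts (hypothetical model)."""

    name: str
    chrom: str
    upstream_cut: int    # position of the cut before the target
    downstream_cut: int  # position of the cut after the target

    def __post_init__(self) -> None:
        if self.downstream_cut <= self.upstream_cut:
            raise ValueError("downstream cut must lie after upstream cut")

    @property
    def fragment_length(self) -> int:
        """Length in bp of the fragment released between the two cuts."""
        return self.downstream_cut - self.upstream_cut


def longest_first(targets: list[Target]) -> list[Target]:
    """Order a panel by expected fragment length, longest first."""
    return sorted(targets, key=lambda t: t.fragment_length, reverse=True)


if __name__ == "__main__":
    # Placeholder coordinates; "chrN" is deliberately not a real contig.
    panel = [
        Target("target_small", "chrN", 5_000_000, 5_010_000),
        Target("target_large", "chrN", 1_000_000, 1_200_000),
    ]
    for t in longest_first(panel):
        print(t.name, t.fragment_length)
```

Because only the newly cut ends carry a phosphate after the initial dephosphorylation, the expected fragment length between the two cuts is also the read length one would hope to see from an end-to-end read of that target.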

Shruti then displayed the result: she successfully managed to capture and sequence the BRCA1 gene end-to-end - in one single, ultra-long nanopore read of 198 kb. As far as she is aware, this stands as the record for the longest read generated with CRISPR/Cas9 enrichment - however, Shruti noted that for this project, she was looking for more than a few very long reads, and described how this began the "chasing BRCA" phase of her dissertation.

Seeing poor enrichment of her target region, she noted how the prevalence of SVs in cancer genomes could mean that the loci she targeted with RNA probes could be affected, meaning that they could not be captured. Shruti then displayed enrichment data for the cell line MCF 10A: whilst this produced more on-target reads, they tended not to span the full length of the region. Next, to improve depth of coverage, she tested the preparation and pooling of multiple libraries - whilst this yielded good results, the reads were still not reaching across the full locus.
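The distinction drawn here between on-target reads and reads that span the full locus can be sketched as a simple interval check. This is a minimal illustration, assuming reads have already been aligned to the same chromosome as the target and are given as half-open (start, end) coordinates; the function names and the 500 bp end tolerance are hypothetical choices, not part of the method described in the talk.

```python
from collections import Counter


def classify_read(
    read_start: int,
    read_end: int,
    target_start: int,
    target_end: int,
    tolerance: int = 500,
) -> str:
    """Label one aligned read relative to a target interval.

    Returns "off_target" (no overlap), "spanning" (covers the target
    end to end, within `tolerance` bp at each end) or "partial".
    """
    if read_end <= target_start or read_start >= target_end:
        return "off_target"
    covers_start = read_start <= target_start + tolerance
    covers_end = read_end >= target_end - tolerance
    return "spanning" if covers_start and covers_end else "partial"


def summarise(reads, target_start: int, target_end: int, tolerance: int = 500) -> Counter:
    """Count reads in each class for one target."""
    return Counter(
        classify_read(start, end, target_start, target_end, tolerance)
        for start, end in reads
    )


if __name__ == "__main__":
    # Placeholder alignments against a hypothetical 200 kb target.
    reads = [(1_100, 200_900), (1_000, 100_000), (300_000, 310_000)]
    print(summarise(reads, target_start=1_000, target_end=201_000))
```

A run with many "partial" reads and few "spanning" reads would correspond to the situation described above: better on-target yield, but without reads reaching across the full locus.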

Shruti then asked: could the background DNA be competing and inhibiting the sequencing of the ultra-long fragments? To tackle this, she used the Circulomics Short Read Eliminator (SRE) Kit on the CRISPR/Cas9-enriched libraries, prior to preparation for sequencing. In an enriched sample of MCF 10A prepared using this method, one 142 kb read was observed, further bridging the target. However, the team's next step was to enrich multiple targets, some of which were below the length cut-off of the SRE Kit, meaning that they would be removed by the process. To enable the preservation of this enriched DNA whilst effectively removing the background DNA, Shruti and her team developed ACME: Affinity-based Cas9-Mediated Enrichment. This method makes use of the histidine tag present on the Cas9 enzyme, used to purify the protein in its production, enabling the capture of Cas9-bound regions on His Dynabeads and pulldown via magnets. Shruti displayed how libraries prepared using ACME performed better in terms of depth of coverage, but reads still did not span full, very large targets.

The team then designed a cancer gene panel, targeting multiple genes of different size ranges, to test the upper length limit of the enrichment method; the genes selected were those where SVs had been found in whole genome data. Shruti described how, with more targets, more DNA was pulled out using the ACME process. Shruti displayed alignments showing end-to-end coverage of the 90 kb region targeting the BRCA2 gene in both cell lines, with depth of coverage much improved by ACME. For SK-BR-3, 99-fold enrichment, to a depth of coverage of 100x, was achieved with ACME. Analysis of the different target lengths - spanning tens of kilobases and higher - determined that good end-to-end enrichment was seen with the method up to ~100 kb. Noting that there isn't always enough sample to enable multiple preps and pooling prior to sequencing, Shruti tested ACME using single-prep libraries; this demonstrated improved depth of coverage over the non-ACME libraries. In one example, Shruti displayed aligned data for the enriched TERT gene, a target of ~45-50 kb: though this has not been run through SV callers yet, a structural event appears visible even when viewing in IGV.
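The talk did not spell out how fold enrichment was calculated. One common way to express it, used here purely as an assumption, is the mean depth over the target divided by the mean depth over the rest of the genome. The sketch below follows that definition; the function names are illustrative.

```python
def mean_depth(aligned_bases: int, region_length: int) -> float:
    """Mean depth of coverage: aligned bases divided by region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return aligned_bases / region_length


def fold_enrichment(on_target_depth: float, background_depth: float) -> float:
    """Ratio of on-target to background depth (assumed definition)."""
    if background_depth <= 0:
        raise ValueError("background_depth must be positive")
    return on_target_depth / background_depth


if __name__ == "__main__":
    # Under the assumed definition, 100x on target with 99-fold enrichment
    # would imply a background depth of roughly 100 / 99, i.e. about 1x.
    implied_background = 100 / 99
    print(round(fold_enrichment(100.0, implied_background), 1))
```

If the figures were instead derived differently, for example relative to an unenriched control library, the same ratio would apply with the control's depth over the target in place of the background depth.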

Summarising, Shruti described how ACME helps to increase target coverage by ~2-fold, and helps to increase the target length to close to 100 kb. Furthermore, she noted how those using the "tiling" method, in which probes are tiled across very large regions to improve their coverage, could use ACME to reduce the number of probes needed and tile even larger loci.

In future, she and her team plan to apply this method to the second version of their cancer panel, encompassing BRCA1, ERBB2, APAF1 and other COSMIC genes with evidence of SVs in the cell line SK-BR-3. They also intend to compare the performance of SV detection between targeted and whole genome strategies. Along with testing other targeted approaches, they will also design a panel of genes from existing diagnostic panels to test on both organoids and tumour samples. Lastly, they would like to explore the use of native barcodes for PCR-free multiplexed sequencing of their enriched libraries.

Authors: Shruti Iyer