Targeted nanopore sequencing with Cas9 for studies of methylation, structural variants and mutations

Timothy Gilpatrick (Johns Hopkins University) began his talk by noting the benefits of CRISPR/Cas9 DNA sequence enrichment. By enriching for and sequencing regions of interest from the "soup of DNA molecules", higher depth of coverage can be obtained for targets, enabling easier generation of consensus sequences. Whilst most methods of sequence capture require an amplification step, CRISPR/Cas9 captures native DNA, which can be sequenced without PCR, allowing for subsequent methylation analysis. Furthermore, read length is not limited by a PCR step, and large regions of interest can be enriched, enabling examination of structural variation.

Timothy presented the workflow for CRISPR/Cas9 target enrichment. First, all free ends in the DNA sample are dephosphorylated. The region of interest (ROI) is then excised by Cas9/guide RNA complexes, which target sequences flanking either side of the ROI and introduce double-stranded cuts. This exposes new, phosphorylated ends at both ends of the ROI. Sequencing adapters are then ligated; because the pre-existing ends were dephosphorylated, ligation favours the newly cut ends, and the resulting library is enriched for the ROI before sequencing. Timothy showed aligned reads for the CRISPR/Cas9-enriched GSTP1 gene from a DNA sample of the non-tumorigenic MCF-10A breast cell line.
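The logic of these three steps can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a model of the real chemistry: fragments are coordinate intervals, each end is simply phosphorylated or not, and adapters are assumed to ligate only to phosphorylated ends. The `Fragment` class, the sample molecules and the cut positions are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    start: int
    end: int
    left_p: bool   # True if the left end carries a 5' phosphate
    right_p: bool  # True if the right end carries a 5' phosphate

def dephosphorylate(frags):
    """Step 1: strip phosphates from every pre-existing end."""
    return [Fragment(f.start, f.end, False, False) for f in frags]

def cas9_cut(frags, cut_sites):
    """Step 2: cut at each guide site; every newly created end is phosphorylated."""
    out = []
    for f in frags:
        cuts = sorted(c for c in cut_sites if f.start < c < f.end)
        bounds = [f.start, *cuts, f.end]
        for i in range(len(bounds) - 1):
            # Interior ends come from Cas9 cuts; outermost ends keep their old state.
            left_p = True if i > 0 else f.left_p
            right_p = True if i < len(bounds) - 2 else f.right_p
            out.append(Fragment(bounds[i], bounds[i + 1], left_p, right_p))
    return out

def ligate_adapters(frags):
    """Step 3 (assumed rule): adapters ligate only to phosphorylated ends."""
    return [f for f in frags if f.left_p or f.right_p]

# Hypothetical sample: one molecule spans the ROI (10_000-30_000), two do not.
sample = [Fragment(0, 50_000, True, True),
          Fragment(60_000, 90_000, True, True),
          Fragment(100_000, 140_000, True, True)]
cut_sites = [10_000, 30_000]  # hypothetical guide positions flanking the ROI

ligated = ligate_adapters(cas9_cut(dephosphorylate(sample), cut_sites))
for f in ligated:
    print(f.start, f.end, "both ends" if f.left_p and f.right_p else "one end")
```

Molecules that do not span a cut site are left with no ligatable ends and are never sequenced, which is where the enrichment comes from. Only the excised ROI carries cut ends on both sides; in this simplified model the flanking pieces still keep one cut end each, and incidental DNA breakage is ignored entirely.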

Timothy described how 10 clinically relevant regions of interest were selected for CRISPR/Cas9 enrichment from GM12878 genomic DNA (gDNA); this well-studied genome was chosen so that any SNPs and structural variants (SVs) detected could be confirmed. The regions were enriched together in a single multiplexed reaction from 3 μg of gDNA and sequenced on a MinION flow cell. Coverage of the regions of interest ranged from 20x for a 19 kb region of chromosome 5 containing a deletion to 175x for a 24 kb region of SLC12A4. Timothy also sequenced this panel on a Flongle flow cell, noting that they could "still generate a substantial amount of coverage" on this smaller flow cell.
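Coverage figures such as 20x and 175x are mean depths over a region. A minimal sketch of that arithmetic, assuming reads are represented as hypothetical `(start, end)` alignment intervals on the same chromosome as the region: the aligned bases that fall inside the region are summed and divided by the region's length.

```python
def mean_depth(reads, start, end):
    """Mean depth over the half-open region [start, end).

    reads: iterable of (read_start, read_end) alignment intervals (hypothetical
    representation), assumed to be on the same chromosome as the region.
    """
    covered = 0
    for r_start, r_end in reads:
        # Only the part of each read inside the region counts; non-overlapping reads add 0.
        covered += max(0, min(r_end, end) - max(r_start, start))
    return covered / (end - start)

# Hypothetical 20 kb target: 25 reads spanning it exactly, one read covering half of it,
# and one off-target read that does not overlap it at all.
reads = [(10_000, 30_000)] * 25 + [(5_000, 20_000), (50_000, 60_000)]
print(mean_depth(reads, 10_000, 30_000))  # 25.5
```

Here the 25 full-length reads contribute 25x, the partially overlapping read adds 0.5x, and the off-target read adds nothing.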

CRISPR/Cas9 enrichment was then applied to DNA from three breast cell lines: MCF-10A (non-tumorigenic), MDA-MB-231 (triple-negative breast cancer) and MCF-7 (ER+). Here, the on-target fraction varied between samples, ranging from ~2% to 7%. Timothy explained that the off-target reads did not result from off-target Cas9 cutting but from DNA breakage during the ligation step, which exposed phosphorylated ends away from the targets. Timothy showed data for an enriched region of chromosome 7 revealing an ~8 kbp deletion that was heterozygous in MDA-MB-231 and homozygous in MCF-7; structural variant calling with Sniffles confirmed this. He then showed data for "even larger" deletions of over 50 kbp; in GM12878, for which parental DNA is also available, these could also be phased. Noting that spanning a deletion on chromosome 8 would require reads of 155 kbp or longer, Timothy said he planned to use the Circulomics Short Read Eliminator kit to further enrich for very long reads.
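The on-target fraction is the share of sequencing output that overlaps the targeted regions. The talk did not specify whether it was computed over reads or over bases; the sketch below assumes a read-based definition, counting a read as on-target if its hypothetical `(chrom, start, end)` alignment interval overlaps any target interval.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def on_target_fraction(reads, targets):
    """Fraction of reads overlapping at least one target interval (read-based definition)."""
    if not reads:
        return 0.0
    on_target = sum(1 for r in reads if any(overlaps(r, t) for t in targets))
    return on_target / len(reads)

# Hypothetical data: one target, 5 reads inside it, 95 scattered off-target reads.
targets = [("chr7", 1_000_000, 1_030_000)]
reads = [("chr7", 1_005_000, 1_025_000)] * 5
reads += [("chr1", i * 100_000, i * 100_000 + 10_000) for i in range(95)]
print(on_target_fraction(reads, targets))  # 0.05
```

With 5 of 100 invented reads on target, this gives 5%, a value inside the ~2% to 7% range reported for the breast cell lines.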

Timothy then moved on to the identification of SNVs in the enriched data. He observed that mutation calling was "hugely improved" with the update to the Guppy v3.0.3 basecaller, although low-complexity homopolymeric regions still showed errors. To assess accuracy, Timothy and his team searched for the 176 known SNVs in a 140 kbp span of GM12878 DNA, using MinION data basecalled with the Flip-Flop model and calling variants with either samtools or nanopolish. Almost all SNVs were identified: samtools achieved a sensitivity of 0.97 with 12 false positives, and nanopolish a sensitivity of 0.96 with 17 false positives. In the lower-coverage Flongle dataset for the same sample, sensitivity was 0.78 with samtools but 0.91 with nanopolish, with 10 and 20 false positives, respectively.
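Sensitivity here is the fraction of known SNVs that were called, TP / (TP + FN), and false positives are calls absent from the truth set. A minimal sketch, assuming variants are compared as hypothetical `(chrom, pos, alt)` tuples. The call set is invented so that the arithmetic lands on the reported MinION samtools figures; the talk did not give raw counts, and 171 of 176 is just one count that rounds to 0.97.

```python
def score_calls(calls, truth):
    """Compare a call set with a truth set of (chrom, pos, alt) tuples."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)    # known SNVs that were called
    fn = len(truth - calls)    # known SNVs that were missed
    fp = len(calls - truth)    # calls not in the truth set
    return {"sensitivity": tp / (tp + fn), "false_positives": fp}

# Hypothetical truth set of 176 SNVs spread across a 140 kb span.
truth = [("chr1", 1_000 + 700 * i, "A") for i in range(176)]
# Hypothetical call set: 171 true calls plus 12 spurious ones at other positions.
calls = truth[:171] + [("chr1", 500 + 700 * i, "T") for i in range(12)]

result = score_calls(calls, truth)
print(round(result["sensitivity"], 2), result["false_positives"])  # 0.97 12
```

Using sets makes the comparison independent of call order and ignores duplicate calls at the same site.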

Investigating the false positive calls, the team found that "real" SNVs were supported by reads from both strands, whereas false positives were frequently caused by errors on only one of the two strands. Exploiting this difference between the forward- and reverse-strand error profiles, they implemented a filter requiring each variant call to be supported by reads from both strands. This "dual-strand filter" reduced sensitivity but increased specificity, to 0.98 (samtools) and 0.99 (nanopolish) for the MinION data and to 1.00 (samtools) and 0.99 (nanopolish) for the Flongle data. Timothy noted that he was also looking forward to sequencing with the new R10 pore.
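The dual-strand filter can be sketched as follows. The talk did not give the exact criterion, so this sketch assumes a hypothetical rule: a candidate variant is kept only if the alternative allele makes up at least `min_frac` (an assumed 0.2) of the reads on the forward strand and of the reads on the reverse strand. Per-read observations are represented as hypothetical `(strand, base)` pairs.

```python
def strand_alt_fraction(observations, alt, strand):
    """Fraction of reads on one strand ('+' or '-') whose base matches the alt allele."""
    bases = [base for s, base in observations if s == strand]
    if not bases:
        return 0.0  # no reads on this strand: treated as no support
    return sum(base == alt for base in bases) / len(bases)

def dual_strand_filter(candidates, min_frac=0.2):
    """Keep (chrom, pos, alt) candidates whose alt allele is supported on both strands.

    candidates: dict mapping (chrom, pos, alt) -> list of (strand, base) observations.
    min_frac: assumed minimum alt-allele fraction required on each strand.
    """
    kept = []
    for (chrom, pos, alt), obs in candidates.items():
        fwd = strand_alt_fraction(obs, alt, "+")
        rev = strand_alt_fraction(obs, alt, "-")
        if fwd >= min_frac and rev >= min_frac:
            kept.append((chrom, pos, alt))
    return kept

candidates = {
    # Hypothetical heterozygous SNV seen on both strands.
    ("chr7", 1000, "T"): [("+", "T")] * 5 + [("+", "C")] * 5
                        + [("-", "T")] * 4 + [("-", "C")] * 6,
    # Hypothetical error: the alt base appears only in forward-strand reads.
    ("chr7", 2000, "A"): [("+", "A")] * 6 + [("+", "G")] * 4 + [("-", "G")] * 10,
}
print(dual_strand_filter(candidates))  # [('chr7', 1000, 'T')]
```

The first site is kept; the second, supported only by forward-strand reads (the pattern the team associated with false positives), is removed. Requiring support on both strands also discards some genuine variants at sites with few reads on one strand, which is consistent with the drop in sensitivity described above.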

Timothy went on to describe how nanopore sequencing of native DNA molecules enables the evaluation of DNA methylation, which is important in regulating gene transcription. Methylation patterns can be indicative of disease state and predictive of outcome, and because native DNA is sequenced without amplification, base modifications such as methylation are preserved and can be detected directly. Timothy and his team compared the methylation patterns detected in native GM12878 DNA sequenced on the MinION and Flongle with those from a short-read sequencing method. Showing data for GSTP1, for which high methylation has been associated with poor outcomes in breast cancer, he described how a "very similar" methylation pattern was identified across 18 kbp by each method. Timothy noted that the low-coverage Flongle data were still sufficient to reveal the same pattern. Across the three breast cell lines, GSTP1 methylation was greatest in MCF-7, correlating with lower GSTP1 expression. Analysis of KRT19 showed hypomethylation in the cancer cell lines and hypermethylation in the non-tumorigenic cell line; expression of this gene has been found to correlate with poor prognosis in breast cancer.

Next, Timothy and his team plan to capture larger regions with the CRISPR/Cas9 method, aiming for regions >1 Mb by "tiling" probes across them to improve coverage. They also plan to improve the clean-up of longer fragments and to implement sample barcoding.

Authors: Timothy Gilpatrick