Targeted, amplification-free DNA sequencing using CRISPR/Cas

Overview

This info sheet highlights the application and advantages of the CRISPR/Cas system with nanopore sequencing. The technology provides high-coverage, low-cost, native strand sequencing without amplification.

For Research Use Only

Document version: ECI_S1014_v1_revF_11Dec2018

1. Getting started with Cas9 targeted sequencing

Important

Cas9 protocols

There are two protocols available for Cas9 targeted sequencing:

  1. Cas9 Sequencing Kit (SQK-CS9109): this is the protocol that uses the Oxford Nanopore Technologies Cas9 Sequencing Kit for Cas9 enrichment and library preparation
  2. Cas9 targeted sequencing: this is the protocol that uses the Ligation Sequencing Kit (SQK-LSK109) for library preparation after Cas9 enrichment. It requires more third-party reagents than library preparation with the Cas9 Sequencing Kit.

How Cas9 works

The CRISPR/Cas family of proteins comprises DNA- or RNA-cleaving enzymes that can be programmed to cut specific sequences using oligonucleotide-length RNA. Their high specificity makes them suitable candidates not only for genome editing but also for targeted sequencing.

CRISPR RNAs (crRNAs) program Cas9 to bind and cleave DNA at sites that are identical (or highly similar) to the crRNA sequence. Candidate crRNAs need to be determined by the user and can be found by bioinformatic searches of reference genomes (detailed below). When Cas9 is loaded with crRNA identical to the intended target sequence, together with trans-activating CRISPR RNA (tracrRNA - a structural RNA required for catalytic activity) it forms a ribonucleoprotein complex (RNP). This complex of RNA and protein searches the genomic DNA for the target region using the crRNA sequence. The complex attempts to seed the crRNA by melting duplex DNA, and, if the crRNA matches and base-pairs with the target sequence, Cas9 cuts both strands of the target sequence.

Cas9 requires a 20mer target site (the protospacer), plus the protospacer-adjacent motif (PAM - sequence NGG). The PAM is immediately next to the 3’ end of the target and defines the boundaries of the protospacer sequence. Cas9 cuts both strands of the target sequence 3 bp upstream of the PAM. The production of a newly-exposed end, what we call ‘deprotection’, at a defined site is the basis for the selectivity of the Cas-mediated enrichment method.
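The site geometry described above (20mer protospacer, NGG PAM at its 3’ end, cut 3 bp upstream of the PAM) can be sketched in a few lines of Python. This is a minimal illustration, not part of any Oxford Nanopore or CHOPCHOP tooling: the function names and the `cut_index` convention (the number of bases of the input sequence lying to the left of the cut) are assumptions made for this example, and no efficiency or off-target scoring is attempted.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# 20 nt protospacer followed by an NGG PAM; the lookahead reports overlapping sites.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_sites(seq):
    """List candidate Cas9 sites on both strands of `seq`.

    Each entry is (strand, protospacer, pam, cut_index), where cut_index is
    the number of bases of `seq` to the left of the cut (hypothetical
    convention chosen for this sketch).
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for m in _SITE.finditer(seq):
        # Cut lies between protospacer bases 17 and 18, i.e. 3 bp upstream of the PAM.
        sites.append(("+", m.group(1), m.group(2), m.start() + 17))
    for m in _SITE.finditer(reverse_complement(seq)):
        # Same rule on the reverse strand, converted back to top-strand coordinates.
        sites.append(("-", m.group(1), m.group(2), n - (m.start() + 17)))
    return sites


for site in find_cas9_sites("GTTAGTGTCCCCATACAACGGGG"):
    print(site)
```

Every candidate found this way still needs the efficiency, self-complementarity and off-target checks described in the Probe design section.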

For Cas9 targeted sequencing, we are able to selectively protect and deprotect ends using CRISPR/Cas and enrich for regions of interest in a background prior to long read nanopore sequencing of the native DNA molecule.

How to design a targeted sequencing approach using Cas9

There are several ways to design a Cas9 targeting experiment, and the method depends on the target region of interest. Multiple factors determine which method should be used:

  • Size of region of interest (ROI)
  • Length of input DNA
  • Are the sequences either side of the region of interest known?
  • How much coverage of region of interest is required?

There are three main options, detailed below.

Excision approach

crRNA probes are designed to target regions either side of the region of interest to excise that region (ideal for particular gene targets).

This method should be used:

  • When the region of interest is <20 kb (largest target region if multiple regions are being targeted)
  • When both ends of the region of interest are known, to be able to design probes on either side

Figure 1. Excising a region of interest using four cuts (two each side of the ROI). Cutting once on each side results in incomplete cleavage, as shown by the high coverage depth of the region behind the ROI after the first cut (bottom panel).

We generally recommend excising a ROI by making four cuts, two upstream of the ROI targeting the (+) strand, and two downstream targeting the (-) strand. Four cuts instead of two are for redundancy, in case one or more crRNAs yield incomplete cleavage. This method should provide the highest coverage of a target region, as on-target strands will have sequencing adapters ligated to both ends. Regions larger than 20 kb can be targeted using this approach, but this is limited by the length of the input DNA and users may experience drops in coverage towards the middle of the region of interest.

Single cut and read out

A crRNA probe is designed at one end of a target region to read into the unknown (ideal for characterisation of integration sites within a genome).

This method should be used:

  • When the region of interest is <20 kb (largest target region if multiple regions are being targeted)
  • When only one side of the region of interest is known

Figure 2. Excising a region of interest using a single cut, when only one side of the ROI is known. The coverage depth is high both for the ROI and the region downstream (bottom panel).

This method will give lower throughput due to only ligating sequencing adapters to one end of the strand. Regions larger than 20 kb can be targeted using this approach, but this is limited by the length of the input DNA and users may experience drops in coverage towards the end of the region of interest.

Comparison of the single cut and read-out vs excision approach

Figure 3. Example coverage plots generated using the Oxford Nanopore bioinformatics tutorial “Evaluation of read-mapping characteristics from a Cas-mediated PCR-free enrichment” for the HTT gene with alternative probe design methods. An enrichment experiment run on a single MinION R9.4.1 flow cell targeting the HTT gene with a single probe (panel 1), an excision approach using one probe on either side of the region of interest (panel 2) and the recommended excision approach using 4X probes for redundancy (panel 3). Red lines on the coverage plot highlight the range in the input BED file used in the bioinformatics tutorial. This example ranges from outer probe to outer probe for the 4X excision method for all plots. The HTT gene falls between these red lines, so the average coverage of the HTT gene is the highest coverage observed on these graphs. Therefore the average coverage of the HTT gene with a single probe is around 100X, 1600X for an excision approach with 2X probes and 2100X coverage with a 4X probe excision.

Tiling

crRNA probes are designed along a region of interest to cut it into overlapping 5-10 kb chunks, which allows for even coverage across the region of interest.

This method should be used:

  • When the region of interest is >20 kb (especially when limited by the length of the input DNA)

Figure 4. Probe design for a “bricklayer” tiling approach. Two sets of probes are designed covering a large region of interest (>20 kb). This breaks up the ROI into overlapping sections, ensuring good coverage of the whole length of the ROI.

This method involves designing two pools of probes, which should be prepared as two separate experiments, pooled in the final step (before the final clean-up) and loaded together onto the same flow cell. There are two options for designing the probes. For most use cases, we recommend a “bricklayer” approach, where each pool contains (+) and (–) probes that are designed to overlap each other. The second, “highway”, option is to have one pool of probes cutting in one direction (+) and one pool of probes cutting in the other (–) (probe directionality is explained below). The “bricklayer” approach has been shown to give higher coverage of the target region.
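As a rough planning aid, the two-pool layout can be sketched as two sets of intervals, with the second pool offset by half a tile so that every junction in one pool falls inside a tile of the other. This is only one possible reading of the “bricklayer” layout: the function names, the 10 kb default tile length and the half-tile offset are assumptions for illustration, and the actual cut positions depend on where suitable probes are found. The 3 kb minimum follows the guidance that excised fragments should not be smaller than 3 kb.

```python
def _tiles(start, end, tile_len, min_len):
    """Consecutive tiles over [start, end); a short tail is merged into the previous tile."""
    tiles = [(s, min(s + tile_len, end)) for s in range(start, end, tile_len)]
    if len(tiles) > 1 and tiles[-1][1] - tiles[-1][0] < min_len:
        tiles[-2] = (tiles[-2][0], end)
        tiles.pop()
    return tiles


def bricklayer_tiles(roi_start, roi_end, tile_len=10_000, min_len=3_000):
    """Return (pool_a, pool_b): two tile sets offset by half a tile (illustrative only)."""
    pool_a = _tiles(roi_start, roi_end, tile_len, min_len)
    pool_b = _tiles(roi_start + tile_len // 2, roi_end, tile_len, min_len)
    return pool_a, pool_b


pool_a, pool_b = bricklayer_tiles(0, 40_000)
print("pool A:", pool_a)
print("pool B:", pool_b)
```

Each interval boundary marks a region in which to search for probes; the two pools are then cut in separate reactions as described above.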

Decision tree for selecting the Cas9 targeting approach

Figure 5. Decision tree for designing the Cas9 targeting approach based on the size of the target region.

Enrichment experiment steps and associated instructions

Step: Instructions

  1. Extract and prepare DNA: Extraction methods
  2. Design probes: Targeted, amplification-free DNA sequencing using CRISPR/Cas (Probe design)
  3. QC input DNA: Input DNA/RNA QC
  4. Perform enrichment, and prepare sequencing library:
    • Cas9 Sequencing Kit (SQK-CS9109): this is the protocol that uses the Oxford Nanopore Technologies Cas9 Sequencing Kit for Cas9 enrichment and library preparation
    • Cas9 targeted sequencing: this is the protocol that uses the Ligation Sequencing Kit (SQK-LSK109) for library preparation after Cas9 enrichment. It requires more third-party reagents than library preparation with the Cas9 Sequencing Kit
  5. Sequence on device: Cas9 Sequencing Kit (SQK-CS9109) or Cas9 targeted sequencing
  6. Take basecalled FASTQ files into analysis pipeline: Cas9 Targeted Sequencing Tutorial (EPI2ME Labs)
  7. Assess success of experiment and feed back into probe design and quality of input

2. Probe design

crRNA probe design guidelines

Independent of which Cas9 cut method is used, the probe design process is the same, and the features that make a good probe apply to all methods. The diagram below shows the RNP complex bound to the target sequence (target in black and PAM in purple, Figure 6 (1) and (2)). Upon RNP binding, the duplex is locally melted, and the crRNA hybridises to the non-target DNA strand (the bottom strand in Figure 6 (2), which is complementary to the target and crRNA sequence). This means that the crRNA sequence is the same as the sequence of the ROI (Figure 6 (2), shown in red). Therefore, when designing a crRNA probe, the sequence should be the same as the region of interest.

Figure 6. (1) The RNP complex bound to the region of interest. (2) The RNP melts the DNA duplex upon hybridisation with the strand complementary to the target region. The crRNA sequence is identical to the target sequence upstream of the PAM.

The orientation of the probes is critical to designing a Cas9 cleavage experiment, the orientation being defined by the PAM and target sequence. Cas9 cuts at the PAM-proximal end of the protospacer 3 bp upstream of the PAM.

Figure 7 (3) shows details of the Cas9 cleavage reaction. Cas9 remains bound to the PAM-distal end and protects it, which provides the directionality for the strand, but this process is imperfect. Sometimes Cas9 can release the DNA, but our internal data suggest that the reads towards the PAM site outnumber the reads away from it by a factor of at least 3:1, and up to at least 10:1 (shown by the grey arrows in Figure 7 (3) and the small arrows in Figure 8).

Figure 7. (3) The Region of Interest is cleaved at the 5’ side, and the PAM-distal site is protected by the bound Cas9. (4) The cleavage and protection happens on both sides of the ROI. (5) The directionality of the read is determined by the protection of the PAM-distal site by Cas9. In most cases, the ROI is enriched for, and read by the nanopore. In a minority of cases, the DNA is read in the other direction, away from the ROI.

Figure 8. The directionality of the sequenced read when the Region of Interest is cleaved by the excision approach (twice on each side).

To design a crRNA probe, first search the region upstream of the region of interest (and downstream, if doing an excision approach) for a PAM sequence (NGG). We recommend a minimum of 1 kb flanking region between the region of interest and the probe target sequence. This is very important when looking at repeat expansions and regions of low complexity, to help with scaling of signal during basecalling. The flanking region can be expanded if the region of interest is very small (<3 kb) and could be lost in subsequent clean-ups. All crRNA probe sequences should be checked against the whole target genome for off-target alignments, which would result in cuts elsewhere in the genome. Assess whether there are any known SNPs in the crRNA probe target sequence, as SNP mismatches will reduce the efficiency of a probe, potentially causing the absence of a cut at the region of interest in non-reference samples.

Important

Except for reasons of experimental necessity, an ideal approach would be:

  • 5-10 kb is the ideal size for excision, but excising a larger region is possible if a drop in coverage in the centre of the ROI is acceptable
  • The excised region should not be smaller than 3 kb, even with a tiled approach
  • Flank repetitive regions with at least 1 kb of non-repetitive sequence on either side
  • Two probes are recommended per desired cut site: for excision approaches, this means four probes per region of interest
  • When tiling, design probes to be cut in two separate pools
  • Check the probe sequences for potential off target matches (e.g. BLAST, BLAT)
  • Check for common SNPs in the probe sequences

Panels of regions of interest

Cut kinetics are acceptable for panels of up to ~100 crRNAs. Single or multiple gene fragments may be targeted in an experiment. To excise a ROI, at least four crRNAs are required, but up to a hundred or more may be used. For simplicity, we advise users to pre-mix crRNAs at equimolar concentration and form all ribonucleoprotein complexes (RNPs) simultaneously in a single tube (unless performing a tiling experiment where each pool should be kept separate). The greater the number of crRNAs in a single cut reaction, the lower the concentration of each probe, and therefore the slower the cut kinetics. The reaction is performed at excess RNPs with respect to target. In-house data at Oxford Nanopore Technologies shows that the kinetic efficiency of the cut reaction is complete within ~15 min and acceptable for panels of up to ~40 crRNAs. Larger panels may require longer cut times.

Designing and ordering probes

We recommend that even users familiar with existing CRISPR design criteria read the section below carefully, because there are rules specific to nanopore target enrichment. Oxford Nanopore Technologies recommends the free crRNA probe design tool CHOPCHOP (chopchop.cbu.uib.no) and ordering IDT Alt-R™ Cas9 probes: idtdna.com/crispr-cas9.

CHOPCHOP

CHOPCHOP (chopchop.cbu.uib.no) identifies all possible target sites, based on the availability of PAM sequences, and evaluates each site for the efficiency of cutting and possible off-target effects (based on experimental data). A CHOPCHOP search would normally involve the steps below.

Figure 9. The CHOPCHOP homepage, with the fields selected for CRISPR/Cas-based nanopore enrichment.

Input parameters (user-defined):

  • A FASTA sequence or specific genomic coordinates. Remember to include 1 kb flanking regions, and a separate search is required for probes upstream and probes downstream of the region of interest (‘Target’ box on CHOPCHOP homepage)
  • The target genome to search for off-target sites (‘In’ box on CHOPCHOP homepage)
  • Which CRISPR/Cas system to use (‘Using’ and ‘For’ box on CHOPCHOP homepage. Select ‘CRISPR/Cas9’ and ‘nanopore enrichment’)
  • Manual parameters, e.g. the PAM, protospacer length, and the algorithms used to score the efficiencies of probes (select ‘Options’ and change Efficiency score to ‘Doench et al. 2014 – only for NGG PAM’)

Figure 10. Recommended parameters in CHOPCHOP for probe design.

The search, performed within CHOPCHOP:

  • Identifies all possible protospacer candidates based on the locations of PAM sequences
  • Scores and ranks them based on:
    • Predicted efficiency
    • Self-complementarity (how likely the crRNA is to form unwanted secondary structure)
    • Number of off-target sites in the genome bearing 0, 1, 2 or 3 mismatches relative to the candidate protospacer sequence, as judged by an alignment of each candidate protospacer sequence against the entire genome

Filtering the search results, user-performed: to remove candidates with significant secondary structure, off-target effects, or low efficiency.

Using CHOPCHOP to design probes for a tiling approach

CHOPCHOP will only allow for a search within 20 kb at a time. For loci >20 kb, multiple searches should be performed, but the results from multiple searches can be pooled. This is required if using a tiling approach.

CHOPCHOP returns the target sequence, i.e. the protospacer + PAM sequence. For S. pyogenes Cas9, this will be a 20mer sequence + NGG. Cas9 will not cleave the target if the PAM is included in the crRNA sequence.

Example for Cas9: if target = GTTAGTGTCCCCATACAACGGGG, the 20mer crRNA = GTTAGTGTCCCCATACAACG.
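The PAM-trimming rule above is simple enough to automate when handling many candidates. The sketch below assumes the target is given as a 23-base DNA string (20mer protospacer followed by an NGG PAM); the function name `crrna_from_target` is hypothetical and not part of CHOPCHOP or any IDT tool.

```python
def crrna_from_target(target, protospacer_len=20, pam_len=3):
    """Return the protospacer (crRNA sequence, written as DNA) without the PAM."""
    target = target.strip().upper()
    if len(target) != protospacer_len + pam_len:
        raise ValueError(f"expected {protospacer_len + pam_len} bases, got {len(target)}")
    pam = target[protospacer_len:]
    if pam[1:] != "GG":
        # S. pyogenes Cas9 PAM is NGG; anything else suggests the wrong input column.
        raise ValueError(f"PAM {pam!r} does not match NGG")
    return target[:protospacer_len]


print(crrna_from_target("GTTAGTGTCCCCATACAACGGGG"))
```

The length and PAM checks are there to catch the common mistake of passing in a sequence that has already been trimmed, or one from which the PAM was never reported.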

Probe ordering: IDT

Oxford Nanopore Technologies recommends ordering IDT Alt-R™ Cas9 probes: idtdna.com/crispr-cas9. Alt-R™ probes contain proprietary modifications that increase the stability of the crRNA and its resistance to nuclease-mediated degradation.

The crRNA requires a short sequence at the 3’ end of the protospacer sequence that hybridises to tracrRNA; however, this sequence is automatically applied to the crRNA sequence at the time of ordering; only the protospacer sequence (no PAM, as DNA provided by CHOPCHOP) is required. As an RNP complex is required to perform a Cas9 experiment, users must ensure that they also order tracrRNA (IDT Alt-R™, Cat # 1072532, 1072533 or 1072534) and S. pyogenes Cas9 (Alt-R® S. pyogenes HiFi Cas9 nuclease V3 IDT Cat #1081060 or 1081061).

Replacing IDT synthetic crRNA and tracrRNA with in vitro-transcribed single guide RNA (sgRNA) is not recommended by Oxford Nanopore Technologies for the following reasons:

  1. A 5' G is needed to make the in vitro transcript, which limits the candidates by 4-fold. Adding one where it does not already exist causes issues.
  2. Transcribing RNA in vitro can yield highly variable results.

If you would still like to try, skip the annealing step and add 500 nM final sgRNA during the RNP formation step.

3. Ordering probes: How to search for candidate crRNA sequences using CHOPCHOP for HTT

Download (or prepare) the relevant portion of target genome plus flanking sequence in FASTA format

For instance, for HTT, we have decided to target a small repeat region spanning Chr4: 3,074,877-3,074,934 in the GRCh38.p11 as the ROI. To be certain of being able to design probes against this region, we downloaded a 10 kb chunk spanning chr4:3,070,000-3,080,000. Advanced users can perform this step by extracting the FASTA from an indexed reference using samtools faidx.

To maximize the probability that suitable probes will be found, we recommend limiting the probes to highly conserved regions wherever possible.

Important

We strongly recommend at least 1 kb of flanking sequence either side of the ROI. This is particularly essential when looking at repeat expansions or regions of low complexity.

Search for crRNA probe sequences using CHOPCHOP.

Note: CHOPCHOP will only allow for a search within 20 kb at a time. For loci >20 kb, multiple searches should be performed, but the results from multiple searches can be pooled. It is possible to excise regions >20 kb using the Cas9 method, however the throughput will be significantly lower than for shorter regions. Generally, for regions >20 kb we recommend a tiling approach.

  • Open CHOPCHOP in a web browser.
  • Make sure the appropriate organism (in this case, H. sapiens) is selected in the central ‘In’ dropdown box. Regardless of the input sequence, the search for off-targets is performed against the selected organism. For example, to search for an E. coli sequence in a human background, where the human DNA is more abundant, H. sapiens should still be selected.
  • Select ‘CRISPR/Cas9’ for Cas9 from the ‘Using’ dropdown box.
  • To search within a FASTA sequence instead of genomic coordinates, select ‘FASTA target’ and paste the FASTA sequence (<20 kb) in the box; otherwise, input genomic coordinates spanning a region <20 kb. Coordinates should take the format chrC:start-end, where ‘C’ is the chromosome, and ‘start’ and ‘end’ are the start and end coordinates, respectively (not containing comma separators).
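Coordinates copied from a genome browser often contain comma separators and stray spaces. The helper below is a small sketch that normalises such a string to the chrC:start-end form described above and checks the <20 kb span limit; the function name and the accepted input pattern are assumptions made for this example.

```python
import re

_REGION = re.compile(r"\s*(chr\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*", re.IGNORECASE)


def normalise_region(text, max_span=20_000):
    """Turn e.g. 'Chr4: 3,070,000-3,080,000' into 'chr4:3070000-3080000'."""
    m = _REGION.fullmatch(text)
    if m is None:
        raise ValueError(f"cannot parse region: {text!r}")
    chrom = "chr" + m.group(1)[3:]          # lower-case the 'chr' prefix only
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if end <= start:
        raise ValueError("end must be greater than start")
    if end - start >= max_span:
        raise ValueError(f"region spans {end - start} bp; split it into searches under {max_span} bp")
    return f"{chrom}:{start}-{end}"


print(normalise_region("Chr4: 3,070,000-3,080,000"))
```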

Figure 12. Initialising a Cas9 search using CHOPCHOP with specific genomic coordinates.

Tip: We do not recommend inputting a gene name as target, because for such searches CHOPCHOP currently only returns the results for exons.

  • Select “nanopore-enrichment” under the ‘For’ dropdown to set the recommended preset. This will set the Efficiency score to be calculated using “Doench et al. 2014 - only for NGG PAM”. We do not consider other scoring factors that are relevant only to genome editing in vivo.
  • Click ‘Find Target Sites!’. The search may take a few minutes to complete.
  • Download the results table by choosing “Results table” from the dropdown list next to “Download results”.

Figure 13. CHOPCHOP GUI, showing results for the input region in graphical and tabular format. The table can be downloaded for further filtering (see Step 3 below).

Important

Note that the top-ranking probe options may change slightly depending on the version of CHOPCHOP being used.

Filter the results table according to the following criteria:

  • Retain GC between 40 and 80% for Cas9.
  • Retain only crRNAs with a self-complementarity score of zero.
  • Retain efficiency score > 0.3 (Cas9 only). This score predicts the efficiency of cutting based on the sequence rules (if “Doench et al., 2014” is selected above for “Efficiency score”).
  • Retain candidates with the following mismatches: MM0 = 0; MM1 = 0; MM2 = 0; MM3 ≤ 5. Candidates with many predicted mismatches are likely to yield more off-target cuts. The severity of the off-target cut corresponds to the number of mismatches in the predicted sites: single-mismatch (MM1) sites are more likely to be cut by a probe than double (MM2) or triple (MM3) mismatches. Large numbers of mismatches should be avoided, especially with large >100-probe panels. In general, we recommend, where possible, filtering even more stringently for candidates with MM3 ≤ 2.
  • Candidates should then be selected based on the desired cleavage location, the efficiency of the probe (higher is better), and the number of predicted mismatches.
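The filtering criteria above can be applied programmatically rather than by hand in a spreadsheet. The sketch below assumes each candidate is a dictionary keyed by the column names used in the example HTT probe table in this document (GC, self_comp, MM0 to MM3, efficiency); the headers in a downloaded CHOPCHOP results table may differ and would need mapping. The three candidate rows are invented purely to exercise the filter.

```python
def passes_filters(row, max_mm3=5):
    """Apply the GC, self-complementarity, efficiency and mismatch criteria."""
    return (
        40 <= float(row["GC"]) <= 80
        and int(row["self_comp"]) == 0
        and float(row["efficiency"]) > 0.3
        and int(row["MM0"]) == 0
        and int(row["MM1"]) == 0
        and int(row["MM2"]) == 0
        and int(row["MM3"]) <= max_mm3
    )


# Hypothetical candidates for illustration only (not real CHOPCHOP output).
candidates = [
    {"name": "cand_1", "GC": 45, "self_comp": 0, "efficiency": 0.66,
     "MM0": 0, "MM1": 0, "MM2": 0, "MM3": 1},
    {"name": "cand_2", "GC": 35, "self_comp": 0, "efficiency": 0.70,
     "MM0": 0, "MM1": 0, "MM2": 0, "MM3": 0},   # fails: GC below 40%
    {"name": "cand_3", "GC": 50, "self_comp": 0, "efficiency": 0.55,
     "MM0": 0, "MM1": 0, "MM2": 0, "MM3": 4},   # passes MM3 <= 5, fails MM3 <= 2
]

kept = [c["name"] for c in candidates if passes_filters(c)]
strict = [c["name"] for c in candidates if passes_filters(c, max_mm3=2)]
print(kept, strict)
```

Passing `max_mm3=2` corresponds to the more stringent filter recommended where enough candidates are available.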

If no probes are available in your target region, you may do the following:

  • Expand the target region to include more flanking sequence;
  • Slightly relax the filtering parameters, until sufficient candidates are found. We recommend relaxing the MM0 parameter to 1.

Split the results table in two, separating the results for the (+) and (-) strand (Column ‘Strand’).

Retain the results for the (+) strand upstream of your target region (i.e., < Chr4: 3,074,877) and the results for the (–) strand downstream (i.e., > Chr4: 3,074,934).

Select two crRNAs for each of the (+) and (-) strands for redundancy.
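The strand and position selection can be sketched as follows, using the four HTT probes listed later in this document as input, plus one invented candidate on the wrong strand to show what is discarded. The function name, the dictionary keys and the choice to rank by efficiency alone are assumptions for this example; in practice the desired cleavage location and mismatch counts should also be weighed. The 1 kb flank reflects the minimum flanking distance recommended above.

```python
def select_probes(candidates, roi_start, roi_end, flank=1_000, per_side=2):
    """Pick (+) probes upstream and (-) probes downstream of the ROI, best efficiency first."""
    upstream = [c for c in candidates
                if c["strand"] == "+" and c["location"] < roi_start - flank]
    downstream = [c for c in candidates
                  if c["strand"] == "-" and c["location"] > roi_end + flank]

    def best(rows):
        return sorted(rows, key=lambda c: c["efficiency"], reverse=True)[:per_side]

    return best(upstream), best(downstream)


candidates = [
    {"name": "HTT_2561", "strand": "+", "location": 3072436, "efficiency": 0.66},
    {"name": "HTT_2662", "strand": "+", "location": 3072537, "efficiency": 0.60},
    {"name": "HTT_7412", "strand": "-", "location": 3077287, "efficiency": 0.64},
    {"name": "HTT_9569", "strand": "-", "location": 3079444, "efficiency": 0.51},
    # Hypothetical candidate: upstream but on the (-) strand, so it is not retained.
    {"name": "example_wrong_strand", "strand": "-", "location": 3073000, "efficiency": 0.90},
]

up, down = select_probes(candidates, roi_start=3074877, roi_end=3074934)
print([c["name"] for c in up], [c["name"] for c in down])
```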

Ensure that there are no SNPs in the probe and PAM regions, as this will reduce yield.

Important

Excluding the PAM from the crRNA probe

CHOPCHOP returns the target sequence, i.e. the protospacer + PAM sequence. For S. pyogenes Cas9, this will be a 20mer sequence + NGG. Cas9 will NOT cleave the target if the PAM is included in the crRNA sequence. Create an additional column in the Excel spreadsheet that calculates the crRNA sequence from the target by removing the final three bases (the PAM), as in the example below:

Example for Cas9: if target = GTTAGTGTCCCCATACAACGGGG, the 20mer crRNA = GTTAGTGTCCCCATACAACG.

Order the crRNAs as Alt-R™ Cas9 crRNAs from IDT, following IDT’s online ordering instructions. The crRNA sequences should be provided as DNA sequences without PAMs using the instructions above.

The ordering page is presented here.

Note 1: The ordering page will convert the DNA sequence to RNA and will append any other relevant sequence automatically; this is required for the catalytic activity of Cas9.

Note 2: Please remember to order tracrRNA that will contain complementary regions to the crRNA - these are added by default in the IDT portal.

Examples of probes for HTT

Probe name HTT_2561 HTT_2662 HTT_7412 HTT_9569
gene HTT HTT HTT HTT
sense + + - -
xsome 4 4 4 4
allele maternal maternal maternal maternal
location_NA12878 3072436 3072537 3077287 3079444
mm_in_seed FALSE FALSE FALSE FALSE
GC 45 50 45 45
self_comp 0 0 0 0
MM0 1 1 1 1
MM1 0 0 0 0
MM2 0 0 0 0
MM3 1 2 2 1
efficiency 0.66 0.6 0.64 0.51
PAM AGG AGG AGG AGG

The crRNA sequences for each of these probes are:

  • HTT_2561 = TTTGCCCATTGGTTAGAAGC
  • HTT_2662 = TCTTATGAGTCTGCCCACTG
  • HTT_7412 = GGACAAAGTTAGGTACTCAG
  • HTT_9569 = CTAGACTCTTAACTCGCTTG

These can be ordered from IDT and used in the initial step of the sample prep protocol as a control experiment or as an in-run control with other targets.

4. Cas9 Targeted Sequencing analysis

Introduction to the Cas9 data analysis tutorial

A tutorial delivering tools for reviewing the performance of a Cas9 targeted sequencing experiment is provided by Oxford Nanopore Technologies. The tutorial guides the assignment of sequences that can be defined as on-target, off-target and background reads and presents both tabular data and graphical plots that can be used to assess the performance of an enrichment study.

This analysis tutorial can be run with two different methods, but both will generate similar statistics, graphs and outputs. The recommended method is using the EPI2ME Labs platform.

Input files and set-up

The tutorial requires a FASTQ format sequence file generated by MinKNOW and a BED file for target coordinates as input, along with pointers to download locations for the reference genome to be used. The EPI2ME Labs tutorial has an embedded example dataset to demonstrate the type of files needed and an example of the output.
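A BED file stores intervals as tab-separated chromosome, start and end columns, with a 0-based start and an exclusive end. The sketch below converts 1-based inclusive coordinates into a BED line. It assumes that the probe locations quoted in this document are 1-based, and that a fourth name column is acceptable to the tutorial; check the tutorial's own example file for the exact columns it expects. The function name is hypothetical.

```python
def bed_line(chrom, start, end, name):
    """Convert 1-based inclusive coordinates to a 0-based, half-open BED line."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


# HTT target spanning the outer probes of the 4X excision design (see the probe table).
print(bed_line("chr4", 3072436, 3079444, "HTT"))
```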

How to interpret the results

The tutorial aids with the quantification of the non-target depletion and provides information on mapping characteristics that highlight the protocol's performance. The executive summary generated at the end of the EPI2ME Labs tutorial provides the key metrics used to establish whether an experiment has been successful.

Figure: single HTT gene target in human.

Figure: 10 gene targets (5-15 kb) in human.

Coverage

The most important value to focus on is the “Target Coverage”, which will be an average value if multiple targets are being assessed in the same experiment. For a well-designed Cas9 targeted experiment, we would expect >200x coverage for a single target. The coverage of the region of interest is mostly dependent on probe design and input DNA quality.

To assess coverage in more detail, the coverage plots show overlapping directional coverage for the region of interest. For multiple targets in the same experiment, where the headline value is an average, these plots are useful for checking the coverage of individual regions. If the coverage is lower than 200x for a region of interest, the coverage plot can show where a crRNA probe may not be performing well. First, ensure that additional probes have been added to either side of the region of interest for redundancy, as explained in the Probe design section of this document; this can significantly boost coverage of the region of interest if a probe is not performing well.

When probes work equally well in both directions (+/-), the coverage plots for the two directions look similar, as shown in the HTT example below. In the SCA17 example below, the + probe has performed better, generating higher coverage than the probe in the – direction. In this case, we recommend reviewing the probes in the – direction to boost overall coverage.

Figure: coverage plots for the HTT and SCA17 examples.

Off-target/background

The other figures in the EPI2ME Labs tutorial relate to background reads and off-target regions. Background reads, off-target effects and the explanation for the % of reads on-target are described elsewhere in this document, and all of these statistics can be found in the executive summary of the EPI2ME Labs tutorial. If the total “Throughput” of the run is high (similar to that observed with a non-targeted sequencing run), it is likely that the dephosphorylation step has not worked in the preparation, and new reagents are recommended. Please note that the % of reads on-target is dependent on the size of the region of interest compared to the size of the genome.

As well as the headline statistics, the EPI2ME Labs tutorial provides an insight into possible off-target regions. These are areas of the genome that lie outside the region of interest but have higher coverage than the background coverage of the genome. They arise when crRNA probes cause Cas9 to cut at sequences very similar to those being cut around the region of interest. To reduce the number of off-target sites, and in turn boost the coverage of the region of interest by focusing Cas9 activity on it, we recommend reviewing the crRNA probe design.

Further analysis

Output from the EPI2ME Labs tutorial can be assessed in IGV for a more detailed visualisation of the region of interest. The output can also be further processed by other bioinformatic tools to assess SNPs and structural variants (SVs) and to count repeats. Refer to the Bioinformatics Resource page in the Nanopore Community for more information.

5. Expectations and guidance

The Cas9 targeted sequencing protocol depletes off-target DNA, therefore enriching for the region of interest.

As the target region is often a small part of the genome of interest, the overall throughput will be lower than a standard Ligation Sequencing Kit (SQK-LSK109) run, but more time is spent sequencing target DNA. Cas9 targeted sequencing experiments will therefore boost the coverage of the region of interest several hundred-fold, and users will see a reduction in coverage of the rest of the genome compared to a whole genome sequencing experiment.

Figure 14. Relative coverage of the whole genome and the Region of Interest with and without Cas9 targeting.

The main metrics used to show the efficiency of a Cas9 targeted experiment:

  • Overall throughput
  • % of reads on target
  • Coverage of region of interest
  • Depletion of the non-target DNA

These key metrics can easily be determined using the Oxford Nanopore Technologies Cas9 enrichment-specific data analysis tutorial, which will also generate coverage plots for the region(s) of interest and provide information about specific off-target cuts (details below).

For a well-designed panel looking at 10 target regions ranging from 5-15 kb in the human genome, for example, we would expect the following metrics (MinION R9.4.1 flow cell):

  • Overall throughput = >1 Gb
  • % of reads on target = 5-10%
  • Average coverage of regions of interest = >100x
  • Depletion of non-target DNA = ~3000-fold depletion

For larger or smaller genomes and regions of interest, coverage will decrease or increase respectively. Ploidy will also impact coverage, e.g. a haploid cell will have fewer copies per cell than a diploid or triploid cell. Users should calculate the levels of enrichment expected based on their target size, copy number, genome size, etc.

The % of reads that are on-target in a run (and the target coverage) is governed by the set-up of the experiment. Targeting multiple regions in a single sample will give a similar coverage for each target region (compared to sequencing just a single target region) but will increase the % of reads on target. This is because we are increasing the proportion of the genome being targeted and therefore increasing the % of input that corresponds to the target. If a user wants to look at a single gene target in multiple samples, which might be barcoded, then the overall yield of the experiment will be higher but the proportion of reads on-target will be similar to that of a single-gene, single-sample experiment. Coverage per target will also be lower for the multiple-sample option.

Theoretical relationships between target size and levels of enrichment


| Parameter | (A) No enrichment | (B) Single gene target, single sample | (C) Ten gene targets, single sample | (D) Single gene target, five pooled samples |
|---|---|---|---|---|
| Input per sample | 1 µg | 5 µg | 5 µg | 5 µg |
| Number of targets x target size | 1x 5 kb | 1x 5 kb | 10x 5 kb | 5x (1x 5 kb) |
| % of target in genome | 0.00017% | 0.00017% | 0.00167% | 0.00017% |
| % of input that is target, after enrichment | 0.00017% | ~0.4% | ~4% | ~0.4% |
| Total sequencing yield | ~10 Gb | ~1.1 Gb | ~1.1 Gb | ~3.7 Gb |
| Target yield (all targets, all samples) | 16.7 kb | 4.3 Mb | 42.5 Mb | 14.8 Mb |
| Target coverage/sample | 3x | 850x | 850x | 590x |
| Background coverage/sample | 3x | ~0.35x | ~0.35x | ~0.25x |

Table 1. Calculations of enrichment data with variation of several key input parameters. Example cases based on typical sequencing runs of control human reference samples.
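
The arithmetic behind Table 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not an Oxford Nanopore tool: the function name `enrichment_metrics`, its parameters, and the 3 Gb genome size are assumptions chosen to be consistent with the "% of target in genome" row. The total yield and on-target fraction are taken from the table as inputs rather than predicted.

```python
# Illustrative sketch of the Table 1 arithmetic. All names and the 3 Gb
# genome size are assumptions, not part of any Oxford Nanopore software.

GENOME_SIZE_BP = 3.0e9  # assumed human genome size, consistent with Table 1


def enrichment_metrics(n_targets, target_size_bp, total_yield_bp,
                       on_target_fraction, n_samples=1,
                       genome_size_bp=GENOME_SIZE_BP):
    """Return genome fraction targeted, target yield and per-sample coverage."""
    target_bp = n_targets * target_size_bp            # targeted bases per sample
    genome_fraction = target_bp / genome_size_bp      # share of genome targeted
    target_yield_bp = total_yield_bp * on_target_fraction
    background_yield_bp = total_yield_bp - target_yield_bp
    return {
        "genome_fraction": genome_fraction,
        "target_yield_bp": target_yield_bp,
        "target_coverage": target_yield_bp / (target_bp * n_samples),
        "background_coverage": background_yield_bp / (genome_size_bp * n_samples),
    }


# Column (C): ten 5 kb targets, one sample, ~1.1 Gb yield, ~4% on target.
case_c = enrichment_metrics(10, 5_000, 1.1e9, 0.04)
print(f"{case_c['genome_fraction']:.5%} of genome targeted")
print(f"{case_c['target_coverage']:.0f}x target coverage")
print(f"{case_c['background_coverage']:.2f}x background coverage")
```

Because the inputs are the rounded values from the table, the results only approximate the tabulated figures. Substituting your own genome size, target sizes and sample count gives a first estimate of what to expect.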

Sources of background and off-target effects

Non-target DNA observed during sequencing can be split into two categories: off-target and background. Each crRNA in a panel should direct Cas9 to cut genomic DNA at the site that perfectly matches its sequence, but Cas9 may also cut at sites bearing multiple mismatches, leading to adapter ligation at those sites and a reduced proportion of on-target reads. This “off-target” activity can be mitigated by carefully designing crRNAs to minimise predicted cutting at mismatched sites while maintaining cut efficiency. Background DNA can come from inefficiency in the dephosphorylation step or from ligation of sequencing adapters to non-cut DNA.

To analyse enrichment data and assess the level of off-target and background DNA sequenced, please refer to the Evaluation of read-mapping characteristics from a Cas-mediated PCR-free enrichment bioinformatics tutorial.

DNA input requirements

For optimal target coverage, Cas9 targeted sequencing experiments require at least ~5 pg of target material in the input. For example, enrichment of a 5 kb human gene would require 5 µg of input DNA.
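
As a rough sketch of this calculation, the mass of target material in an input can be estimated from the fraction of the genome that the target occupies. The helper names below and the 3 Gb genome size are assumptions made for illustration; adjust the genome size and target length for your own organism and panel.

```python
# Hypothetical helpers: estimate the mass of target DNA within a genomic
# DNA input. The 3 Gb default genome size is an assumption.

def target_mass_pg(input_ug, target_size_bp, genome_size_bp=3.0e9):
    """Mass (pg) of the input that derives from the target region."""
    input_pg = input_ug * 1e6  # 1 µg = 1,000,000 pg
    return input_pg * target_size_bp / genome_size_bp


def meets_guideline(input_ug, target_size_bp, min_target_pg=5.0,
                    genome_size_bp=3.0e9):
    """True if the input holds at least `min_target_pg` of target material."""
    return target_mass_pg(input_ug, target_size_bp, genome_size_bp) >= min_target_pg


# 5 µg of human genomic DNA with a single 5 kb target
print(round(target_mass_pg(5, 5_000), 1), "pg of target")
print(meets_guideline(5, 5_000))
print(meets_guideline(1, 5_000))
```

Under these assumptions, a 5 µg input contains roughly 8 pg of a 5 kb target, which satisfies the "at least ~5 pg" guideline, whereas a 1 µg input does not.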

| Input amount | Total throughput (Gb) | % of reads on target | % of bases on target | Total Mb on target | Mean number of reads per target (coverage) | Mean Mb per target |
|---|---|---|---|---|---|---|
| 500 ng | 0.06 | 5 | 4 | 2 | 37 | 0.2 |
| 1 µg | 0.12 | 5 | 4 | 4 | 75 | 0.4 |
| 2 µg | 0.37 | 5 | 4 | 13 | 240 | 1.3 |
| 5 µg | 1.00 | 6 | 5 | 43 | 880 | 4.3 |

Table 2. Experimental data from a DNA input titration of a human 10-gene panel using the Cas9 targeted sequencing protocol. The proportion of reads or bases on target remains roughly constant, while target coverage increases with input amount over the range 500 ng to 5 µg total input.

For an efficient enrichment experiment, high molecular weight DNA is required. The median length of molecules in a genomic DNA sample can greatly impact upon the efficiency of the enrichment in two main ways:

  • The shorter the median DNA fragment length, the greater the concentration of DNA ends that must be protected from ligation (assuming constant mass of DNA). Thus, in general, the shorter the DNA, the greater the background.
  • If pairs of crRNAs designed to excise a region of interest (ROI) are placed more than one median fragment length apart, the drop in coverage between the cut sites will be significant.

For these reasons, we recommend purifying genomic DNA to the highest possible length and quality and matching the spacing of crRNAs to the expected median fragment length.
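
A simple way to apply this recommendation is to compare the spacing of each crRNA pair with the median fragment length of the sample. The sketch below is hypothetical: the function name, the ROI names and the coordinates are invented for illustration, and the median fragment length would come from your own sample QC.

```python
# Hypothetical check: flag crRNA pairs whose cut sites are further apart
# than the median fragment length. Coordinates are illustrative only.

def flag_wide_pairs(cut_pairs, median_fragment_bp):
    """Return names of ROIs whose cut sites exceed one median fragment length."""
    flagged = []
    for name, (upstream_cut, downstream_cut) in cut_pairs.items():
        spacing = abs(downstream_cut - upstream_cut)
        if spacing > median_fragment_bp:
            flagged.append(name)
    return flagged


pairs = {"roi_1": (100_000, 108_000), "roi_2": (500_000, 540_000)}
print(flag_wide_pairs(pairs, median_fragment_bp=30_000))
```

Any ROI flagged by a check like this is a candidate for additional crRNAs placed within the region, or for a gentler DNA extraction that yields longer fragments.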

6. Further information regarding Cas9 and probes

Further experimental planning

The table below highlights the key features and requirements of Cas9 targeted sequencing using S. pyogenes Cas9. Although other CRISPR/Cas enzymes, such as Cas12a, are available, we recommend that new users begin developing their own targeted sequencing assay using S. pyogenes Cas9 for the following reasons:

  • Cas9 enzymes, including a wide variety of high-fidelity mutants, are more widely available from third parties;
  • A far wider library of verified probe designs is already available for Cas9 via the scientific literature;
  • The rules for predicting the efficiencies (i.e. the proportion of target predicted to be cut) and off-target effects of Cas9 crRNAs are better understood than for other enzyme crRNAs, i.e., the design criteria are better-defined;
  • The Cas9 cleavage reaction yields a blunt-ended double-stranded break, so the biochemistry required to generate an end to which an adapter can be ligated is simpler.

| Parameter | S. pyogenes Cas9 (1.) |
|---|---|
| Protospacer length | 20mer |
| RNAs required | crRNA and tracrRNA (or sgRNA) |
| Cost per crRNA at 2 nmol scale (2.) | $95 (US), £48 (UK) |
| Cost of tracrRNA at 5 nmol scale (3.) | $95 (US), £48 (UK) |
| PAM | NGG, 3' |
| Structure of double-strand break | blunt-ended |
| End-preparation method | dA-tailing |

Table footnotes:

  1. Streptococcus pyogenes species
  2. Based on pricing of Alt-R™ synthetic crRNA at the lowest synthesis scale in vials, provided by Integrated DNA Technologies (IDT). 2 nmol is sufficient for ~200 reactions. Workflows involving multiple crRNAs require preparing an equimolar mix of crRNAs. crRNAs may be ordered in bulk in multi-well format at lower cost. Pricing correct as of September 2018.
  3. Based on pricing of Alt-R™ synthetic tracrRNA from IDT. One-time purchase: the same tracrRNA is used for all Cas9 cleavage reactions. 5 nmol is sufficient for ~500 reactions. Pricing correct as of September 2018.

Last updated: 4/19/2023
