Nanopore sequencing accuracy
For many years Oxford Nanopore has continuously iterated our technology to improve its performance. We continue to improve the nanopore sensing system, through updates to analytical methods and new chemistries. This page guides you on what to expect from the nanopore sequencing system, and which tools to choose to achieve these results.
Nanopore DNA and RNA sequencing accuracy can be measured in a number of ways, and the relevant metric for a scientist will depend on the specific experiments being performed.
As with all systems, choosing the most up-to-date tools for the analysis that you are interested in is critical, and the quality of the sample can also influence the outcome. With so many relevant variables, clear guidelines are important. Below, we define the main types of accuracy measurement and include recommendations for best performance.
Raw read accuracy
Nanopore sequencing provides direct electronic analysis of the target molecule, rather than sequencing a synthetic copy or using surrogate markers such as fluorescence. Basecalling algorithms are then used to provide an interpretable output of the sequencing reads. Nanopore basecalling algorithms are continuously improved to enhance accuracy over time, also allowing new methods to be applied to previously sequenced raw data.
Direct sequencing avoids sources of bias such as PCR and gives native information about the target molecule. We define raw read accuracy as the accuracy achieved when reading a single DNA or RNA fragment/molecule once. Applications for which raw read accuracy is relevant include those where time-to-result may be critical, but at this time most applications are more likely to focus on variant calling, consensus accuracy or other metrics. Improvements in raw read accuracy can drive improvements in other accuracy metrics.
Single molecule accuracy is similar to raw read accuracy, but in the case of duplex reads it combines the basecalled data from the template and complement strands of a single DNA molecule into a higher-quality basecall. Duplex reads can exceed Q30 and can yield perfect reads from DNA molecules tens of kilobases in length.
Latest updates to nanopore sequencing achieve:
|Flow cell||Kit||Sequencing & basecalling parameters||Sample||Raw read accuracy||Output|
|R10.4.1||Ligation Sequencing Kit V14||400 bps, 5 kHz, HAC basecalling||Human HG002||99.0% (Q20)||●●●|
|R10.4.1||Ligation Sequencing Kit V14||400 bps, 5 kHz, SUP basecalling||Human HG002||99.5% (Q23)||●●●|
|R10.4.1||Ligation Sequencing Kit V14||400 bps, 5 kHz, Duplex basecalling||Human HG002||>99.9% (Q30)||●|
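The percentages and Q scores in the table are two views of the same quantity. As a minimal sketch, assuming the Q scores follow the standard Phred scale (Q = -10 log10 of the error rate), the conversion can be written as follows. The function names are illustrative and are not part of any Oxford Nanopore tool.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled Q score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def phred_to_accuracy(q_score: float) -> float:
    """Convert a Phred-scaled Q score back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q_score / 10.0)


# Accuracies quoted in the table above (HAC, SUP, duplex threshold).
for acc in (0.990, 0.995, 0.999):
    print(f"{acc:.1%} -> Q{accuracy_to_phred(acc):.1f}")
```

Because the scale is logarithmic, each step of 10 in Q score corresponds to a tenfold reduction in error rate: Q20 is one error in 100 bases, Q30 is one error in 1,000.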
Single nucleotide variants (SNVs), small indels and structural variants (SVs) are critical for our understanding of how genomic changes drive phenotypes. The ability of nanopore technology to sequence any length of nucleic acid molecule allows for unprecedented resolution of complex structural variants, as well as identification and haplotype phasing of single nucleotide alterations.
The ability to accurately call variants is often expressed as precision and recall values, generated from reads covering the position of interest multiple times. Precision is the proportion of calls in the call set that are correct, whereas recall is the percentage of variants present in the genome that are found in the call set.
The latest harmonic mean of precision and recall (F1 score) for nanopore chemistries can be found in Figure 1. The tool chain to achieve similar metrics is reported in the legend.
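To make the definitions concrete, here is a minimal sketch of how precision, recall and F1 relate to counts of true positive, false positive and false negative variant calls. The function names are illustrative, and real benchmarking tools apply additional rules (for example, how equivalent variant representations are matched) that are not modelled here.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall, F1) from variant-call counts."""
    called = true_pos + false_pos      # everything in the call set
    present = true_pos + false_neg     # everything truly in the genome
    precision = true_pos / called if called else 0.0
    recall = true_pos / present if present else 0.0
    return precision, recall, harmonic_mean(precision, recall)


def harmonic_mean(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

The harmonic mean is pulled towards the lower of the two values, so a call set cannot reach a high F1 score by maximising precision at the expense of recall, or the reverse.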
Building a consensus sequence involves combining multiple copies of a specific DNA/RNA region, sequenced in separate reads, into a single high-quality sequence. Because random errors rarely fall at the same position in every read, they are outvoted by the correct bases when the copies are combined, producing a more accurate ‘consensus’ sequence to work from.
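The idea of random errors being outvoted can be illustrated with a minimal sketch: a per-position majority vote over reads that are assumed to be already aligned and of equal length. This is a simplification for illustration only. Production assemblers and polishers use alignment and statistical or neural models rather than a plain vote, and the reads below are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Return the per-column majority base of equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    if len({len(read) for read in aligned_reads}) != 1:
        raise ValueError("reads must be pre-aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the most frequent base in this column;
        # ties are resolved in favour of the base seen first.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical reads of the same region, three carrying one random error.
reads = [
    "ACGTACGT",
    "ACGAACGT",  # error at position 3
    "ACGTACGT",
    "TCGTACGT",  # error at position 0
    "ACGTACCT",  # error at position 6
]
print(majority_consensus(reads))  # ACGTACGT
```

Each error appears in only one of the five reads at its position, so the correct base wins every column. Systematic errors, which recur at the same position across reads, would not be removed by this kind of vote.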
Latest updates to nanopore sequencing achieve:
|Flow cell||Kit||Consensus accuracy||Sequencing & basecalling parameters||Analysis tools||Sample|
|R10.4.1||Ligation Sequencing Kit V14, Ultra-long Sequencing Kit V14||Telomere-to-telomere (T2T): 99.994%*; 18 full chromosomes haplotype-resolved; N50 >135 Mb||400 bps, 5 kHz, simplex SUP, duplex||Assembly with Verkko, phasing with Gfase||Human HG002|
|R10.4.1||Ligation Sequencing Kit V14||Q50 at 10-20x||400 bps, 4 kHz, simplex SUP||Assembly with Flye||Zymo mock community (bacterial)|
*Generated by combining approx. 40x duplex, 40x ultra-long and 40x Pore-C
Single molecule consensus
Consensus generation can also be applied to specific regions of interest, by combining multiple exact copies of a single original fragment or molecule into a single high-quality sequence. These exact copies could be sequenced together in a single read, for example generated by circular or linear amplification, or could be associated by use of a unique identifier (UMI). Through combining multiple copies together, a higher confidence in accuracy is achieved.
Applications where single molecule consensus could be particularly useful include liquid biopsy low-frequency variant detection, or 16S sequencing.
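As a minimal sketch of the UMI approach described above, reads can be grouped by their identifier and each group collapsed into one consensus sequence. The example assumes the UMI has already been extracted from each read, that reads within a group are aligned to equal length, and that a simple majority vote is an adequate stand-in for a real consensus method. The `min_copies` threshold and all names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_consensus(aligned_reads: list[str]) -> str:
    """Per-column majority vote over equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def umi_consensus(tagged_reads, min_copies: int = 3) -> dict[str, str]:
    """Group (umi, sequence) pairs by UMI and build one consensus per group.

    Groups with fewer than `min_copies` reads are dropped, because too few
    copies give little power to outvote an error.
    """
    groups = defaultdict(list)
    for umi, sequence in tagged_reads:
        groups[umi].append(sequence)
    return {
        umi: majority_consensus(sequences)
        for umi, sequences in groups.items()
        if len(sequences) >= min_copies
    }


# Hypothetical reads: two original molecules with three copies each,
# plus one molecule that was only observed once.
tagged = [
    ("AAGT", "ACGTTGCA"), ("AAGT", "ACGTTGCA"), ("AAGT", "ACCTTGCA"),
    ("CCTA", "ACGTAGCA"), ("CCTA", "ACGTAGCA"), ("CCTA", "ACGTAGGA"),
    ("GGAC", "ACGTTGCA"),
]
print(umi_consensus(tagged))
```

Because every copy in a group derives from the same original molecule, a difference that survives consensus in one group but not another (position 4 in the two groups above) reflects a real difference between molecules rather than a sequencing error. This is what makes the approach useful for low-frequency variant detection.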
Covering all of the genome
To create an accurate picture of the genome, it is important for a sequencing technology to reach all parts of it, even the parts which are difficult to map. Genomes are littered with repetitive and low-complexity regions, which are difficult to sequence and align using traditional technologies. For example, it is estimated that short-read technology reaches only 92% of the human genome, leaving 8%, which contains many disease-relevant genes, excluded from the dataset. Nanopore technology has been shown to reduce these “dark” areas of the genome by 81%, shedding light on parts of the genome not sequenced by any other technology (Ebbert, 2019), and giving a more complete picture. Ultra-long nanopore sequencing reads were central to completing the human genome, allowing the resolution of repetitive regions that were unresolvable with other technologies (Nurk et al., Science, 2022).
Tuning accuracy for your experimental need
Want to fine-tune accuracy based on your needs? Choose between duplex and simplex basecalling models.
Simplex reads: generated by reading a single strand through a nanopore. Accuracy fine-tuned with basecalling models:
- Fast basecalling: fastest, least computationally intense, highest compatibility with real-time basecalling on device
- High accuracy basecalling (HAC): highly accurate, intermediate speed and computational requirement. Good compatibility with real-time basecalling on device
- Super accuracy basecalling (SUP): the most accurate, more computationally intense
Note: modified basecalling (e.g. 5mC and 5hmC) can be performed alongside any of the basecalling methods mentioned above.
Higher-quality reads are now available from the “squiggle”: sampling frequency has been increased from 4000 to 5000 samples per second (5 kHz) in the latest MinKNOW release, with more data points for basecalling. As a result, all read accuracies are enhanced for both duplex and simplex and all basecalling models.
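The effect of the higher sampling frequency can be estimated with simple arithmetic. The sketch below assumes a constant translocation speed of 400 bases per second, the figure quoted in the tables on this page. In practice the speed varies along a read, so the result is an average rather than a guarantee for every base.

```python
def samples_per_base(sample_rate_hz: float, speed_bases_per_s: float) -> float:
    """Average number of signal samples recorded per base."""
    return sample_rate_hz / speed_bases_per_s


SPEED_BPS = 400  # translocation speed quoted on this page (bases per second)

old = samples_per_base(4000, SPEED_BPS)  # 4 kHz sampling
new = samples_per_base(5000, SPEED_BPS)  # 5 kHz sampling

print(f"4 kHz: {old} samples per base")       # 10.0
print(f"5 kHz: {new} samples per base")       # 12.5
print(f"relative increase: {new / old - 1:.0%}")  # 25%
```

Under this assumption the basecaller receives about 25% more data points for each base, which is the extra information behind the accuracy gain described above.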
Our latest Q20+ chemistry enables duplex reads: the second strand can follow the first through the same nanopore, producing information from two orthogonal signals, merged into one consensus sequence. Single molecule accuracy of duplex is ~Q30 or higher. A specific basecaller for duplex reads is available.
Interested in accessing high-duplex flow cells? Register your interest here.
Fast analysis: genomics variants (SVs, SNVs, etc.), phasing, de novo assembly, etc.
High accuracy analysis: genomics variants (SVs, SNVs, etc.), phasing, de novo assembly, etc.
Super accuracy analysis: genomics variants (SVs, SNVs, etc.), phasing, de novo assembly, etc.
Highest quality and accuracy: de novo assembly, T2T
Table 1. Recommendations, output and computational requirements of simplex and duplex reads sequencing in combination with available basecalling models.
The four ‘canonical’ bases (A, C, G and T in DNA and A, C, G and U in RNA) can be biologically modified by the addition of a chemical group, such as a methyl group. These modifications can significantly alter gene expression and are implicated in a range of diseases including cancer. Scientists are only just beginning to scratch the surface of how newly recognised epigenetic changes impact function; for example, RNA is known to possess over 170 distinct modifications.
Oxford Nanopore’s technology sequences DNA or RNA molecules directly, enabling real-time detection of modifications such as 5mC, 5hmC and 6mA.
This allows for detection of these base modifications with no additional experiments or sample preparation steps required, and modification information is accessible through onboard software. In contrast, traditional technologies can require a separate process called bisulphite sequencing, which uses aggressive sample treatment and has a number of limitations.
Sequencing may be used to perform a certain biological test, for example presence or absence of a particular organism, species identification, testing for one or more genetic variants, or to perform multi-omics testing in one assay. Test accuracy can be defined as the ability of the technology to answer that question correctly every time, and this can be quantified by identifying the proportion of true and false positives and negatives among a total number of cases. Test accuracy is an important metric for areas such as food safety, and microbial surveillance. Nanopore sequencing has been shown to be effective at accurately performing many different types of tests. Browse the resource centre for examples.
For these examples, the analysis pipeline is specific to the test in question, but tool recommendations can be found in the protocol builder.
In 2020, the UK Government published a study of 23,000 samples showing that Oxford Nanopore’s first regulated test has gold-standard accuracy. Read the study.
Here you will find accuracy information relating to our older chemistry options.
|Flow cell||Kit||Accuracy type||Analysis tools||Sample|
|R9.4.1||SQK-LSK110||Raw read: 98.3% modal (Q18, simplex)||"Super accuracy" basecaller in MinKNOW||Zymo mock community|
|R9.4.1||SQK-LSK110||Circular genome consensus: Q50 at ~100X||Basecall with "super accuracy" in MinKNOW; assemble with Flye; polish with Medaka||Zymo mock community (bacterial)|
|R9.4.1||SQK-LSK109||Single molecule consensus: 99.995%, Q45||UMI||rRNA amplicons (25X)|
|R9.4.1||SQK-LSK110||SV variant calling at 50X: precision 95.5, recall 97.5, F1 96.5||EPI2ME workflow, GitHub pipeline, EPI2ME Labs tutorial using LRA & cuteSV||human, HG002|
|R9.4.1||SQK-LSK110||SNP variant calling at 50X: precision 99.9, recall 99.9, F1 99.9||DeepVariant||human, HG002|
Our goal is to enable the genetic analysis of anything, by anyone, anywhere, and as such we are pursuing constant iterative performance improvements. We continue to improve the nanopore sensing system, boosting accuracy through updates to analytical methods and new chemistries. Latest releases can be found in the Nanopore Community, or in the News section.