How basecalling works
Nanopore sequencing works by passing a single molecule through a nanopore across which an ionic current flows. As the molecule passes through the pore, it disrupts the current and changes the electrical signal in a characteristic way. During nucleic acid sequencing, basecalling algorithms decode these signal changes to determine the DNA or RNA sequence in real time.
Capturing the signal
When DNA or RNA is sequenced through nanopores, the characteristic electrical signals are recorded by MinKNOW, the software that controls Oxford Nanopore Technologies sequencing devices. The complete electrical signal is known as a ‘squiggle’. MinKNOW processes the squiggle into reads in real time, with each read corresponding to a single strand of sequenced DNA or RNA. Reads contain canonical bases and can also include base modifications such as methylation.
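To make the idea of turning a continuous signal into separate reads concrete, here is a toy sketch. It assumes that an occupied pore shows a lower current than an open pore and that a fixed threshold separates the two states. The function name, the threshold values, and the example trace are all hypothetical, and this is not a description of how MinKNOW segments reads.

```python
from typing import List, Tuple


def split_into_reads(
    current_pa: List[float],
    open_pore_pa: float = 200.0,   # assumed open-pore current level (hypothetical)
    drop_fraction: float = 0.7,    # assumed: below 70% of open pore means "strand present"
    min_samples: int = 3,          # ignore very short dips
) -> List[Tuple[int, int]]:
    """Return (start, end) sample index pairs, end exclusive, for each candidate read."""
    threshold = open_pore_pa * drop_fraction
    reads = []
    start = None
    for i, value in enumerate(current_pa):
        if value < threshold:
            if start is None:
                start = i          # a strand appears to have entered the pore
        elif start is not None:
            if i - start >= min_samples:
                reads.append((start, i))
            start = None           # the pore is open again
    # Close a read that is still in progress at the end of the trace.
    if start is not None and len(current_pa) - start >= min_samples:
        reads.append((start, len(current_pa)))
    return reads


trace = [200, 201, 90, 85, 110, 95, 199, 202, 100, 80, 75, 120, 200]
print(split_into_reads(trace))  # [(2, 6), (8, 12)]
```

Each returned index pair marks one stretch of signal that would be passed on for basecalling as a single read.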
The nanopore structure
The structure of the nanopore determines the information contained within a squiggle, the raw signal produced before basecalling by the molecules passing through the nanopore. Different nanopores contain different ‘readers’. The previous R9 nanopore had a single reader in the middle of the barrel, whereas the newer R10 nanopore has two readers spaced along its length, so more bases within a DNA or RNA strand contribute to the squiggle at any one time. This improves signal capture around homopolymer regions, where the same nucleotide appears several times in a row on a DNA or RNA strand.
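For readers who want to see what a homopolymer region looks like in sequence data, the following minimal sketch lists runs of a repeated base in a string. The function name and the minimum run length of 3 are illustrative choices, not values taken from any Oxford Nanopore tool.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=3):
    """Return (start, base, length) for each run of one base at least min_length long."""
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


print(homopolymer_runs("ACGTTTTTGCAAAC"))  # [(3, 'T', 5), (10, 'A', 3)]
```

Runs like the five consecutive T bases above are the regions where having more than one reader in the pore is described as helping.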
Basecallers
Basecalling algorithms process the raw signal to decode the sequence of bases within strands of DNA or RNA, and the resulting data is stored in BAM or FASTQ files. Dorado, the default basecaller integrated within MinKNOW, can perform basecalling during or after sequencing, depending on experimental needs. All basecalling software and base modification models are first released as open-source tools on the Oxford Nanopore GitHub to provide the latest features and accuracy improvements as early as possible. This open access to the newest software tools lets researchers provide feedback and help shape the progress of nanopore technology whilst benefitting from new performance features before they are integrated into MinKNOW.
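As an illustration of what basecalled output looks like, the sketch below reads FASTQ records and decodes their quality strings. It assumes the common four-line record layout and the Phred+33 quality encoding. The read name and sequence are made up, and the parser is deliberately minimal rather than a replacement for an established FASTQ library.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break                      # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()              # the '+' separator line
        quality = handle.readline().rstrip("\n")
        # Phred+33: each character encodes a quality score as ASCII code minus 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


example = "@read_1\nACGT\n+\nII5+\n"   # hypothetical record for illustration
records = list(read_fastq(io.StringIO(example)))
print(records)  # [('read_1', 'ACGT', [40, 40, 20, 10])]
```

Each base in the sequence line has one quality character, so the list of scores is always the same length as the sequence.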
Neural networks
The basecalling algorithms currently deployed by Oxford Nanopore Technologies are based on neural networks (a type of machine learning model) that predict base sequences from the raw signal. These computational neural networks are loosely modelled on biological neural networks within the human brain, with layers of ‘nodes’ (equivalent to neurons) passing data between themselves to arrive at a predicted base sequence. Crucially, these neural networks can learn, so their predictions improve with training. Oxford Nanopore Technologies uses a variety of neural network architectures, including transformer models and recurrent neural networks, to develop basecaller algorithms. This variety of architectures allows information from across the entire raw signal to better inform the basecaller output. Alternative basecalling algorithms are continuously being developed and assessed by Oxford Nanopore to improve the accuracy and speed of basecalling models.
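The step from network output to a base sequence can be sketched with a toy decoder. The example assumes the network emits one score per class for each slice of signal, with a ‘blank’ class alongside the four bases, and applies a simple greedy decode: pick the best class per slice, collapse repeats, and drop blanks. This is one common scheme for sequence models in general. The text above does not say which decoding scheme Oxford Nanopore basecallers use, so the alphabet, the scores, and the function are illustrative assumptions only.

```python
ALPHABET = ["-", "A", "C", "G", "T"]   # "-" is an assumed blank (no new base) class


def greedy_decode(frame_scores):
    """Pick the top class per frame, merge consecutive repeats, and drop blanks."""
    bases = []
    previous = None
    for frame in frame_scores:
        best = frame.index(max(frame))
        # Emit a base only when the class changes and is not the blank.
        if best != previous and best != 0:
            bases.append(ALPHABET[best])
        previous = best
    return "".join(bases)


# Made-up scores for nine signal slices; columns follow ALPHABET order.
scores = [
    [0.10, 0.80, 0.05, 0.03, 0.02],  # A
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A (same base still in the reader)
    [0.90, 0.05, 0.02, 0.02, 0.01],  # blank separates two A bases
    [0.20, 0.60, 0.10, 0.05, 0.05],  # A
    [0.10, 0.10, 0.70, 0.05, 0.05],  # C
    [0.10, 0.10, 0.60, 0.10, 0.10],  # C (repeat, merged)
    [0.10, 0.05, 0.05, 0.70, 0.10],  # G
    [0.80, 0.05, 0.05, 0.05, 0.05],  # blank
    [0.10, 0.05, 0.05, 0.10, 0.70],  # T
]
print(greedy_decode(scores))  # AACGT
```

The blank class is what lets this scheme represent two identical bases in a row, which is why the output contains two A bases but only one C.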
Outstanding accuracy improvements with basecalling updates
Advancements in both the Oxford Nanopore Technologies platform and machine learning have consistently improved basecalling accuracy, including raw-read single-molecule accuracy, consensus accuracy, and the completeness of genome assemblies.
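Accuracy figures are often quoted as Phred-style quality scores. The short sketch below converts between the two using the standard relation Q = -10 × log10(1 - accuracy). The function names are illustrative, and the input values are round numbers chosen for the example rather than measured results.

```python
import math


def accuracy_to_q(accuracy):
    """Convert a fractional accuracy (0 <= accuracy < 1) to a Phred-style Q-score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def q_to_accuracy(q_score):
    """Convert a Phred-style Q-score back to a fractional accuracy."""
    return 1.0 - 10.0 ** (-q_score / 10.0)


print(round(accuracy_to_q(0.99), 2))   # 20.0
print(round(accuracy_to_q(0.999), 2))  # 30.0
print(round(q_to_accuracy(20), 4))     # 0.99
```

Because the scale is logarithmic, each step of 10 in Q-score corresponds to a tenfold reduction in the error rate.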
Basecalling acceleration
During sequencing experiments, MinKNOW streams signal data in real time, meaning basecalling can begin even before a DNA or RNA strand has finished passing through a nanopore. To keep up with sequencing, basecallers use graphics processing units (GPUs), which calculate many values in parallel, to provide real-time data. Devices with integrated compute (for example, GridION and PromethION 24) feature onboard GPUs to enable real-time basecalling. This setup is compatible with modification calling, barcode demultiplexing, and alignment to a reference genome during live sequencing.
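One way to picture how a long signal can be processed in parallel is to cut it into fixed-size, overlapping chunks that could be handled as a batch. The sketch below shows only that chunking step. The chunk size and overlap are small, hypothetical values chosen for readability, and the function does not reflect the actual parameters or internals of any Oxford Nanopore software.

```python
def chunk_signal(signal, chunk_size=8, overlap=2):
    """Split a signal into chunks of up to chunk_size samples that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(signal), step):
        chunks.append(signal[start:start + chunk_size])
        if start + chunk_size >= len(signal):
            break                      # this chunk already reaches the end of the signal
    return chunks


signal = list(range(20))               # stand-in for 20 raw current samples
for chunk in chunk_signal(signal):
    print(chunk)
# [0, 1, 2, 3, 4, 5, 6, 7]
# [6, 7, 8, 9, 10, 11, 12, 13]
# [12, 13, 14, 15, 16, 17, 18, 19]
```

The overlap gives each chunk some context from its neighbour, so bases near a chunk boundary are seen in full by at least one chunk.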