Analysis solutions for nanopore sequencing data
Nanopore sequencing presents a number of significant advantages which allow the sequencing process to be tailored to your requirements:
- Real-time basecalling, enabling immediate access to results
- Stop sequencing as soon as sufficient data has been obtained
- Stop, wash and reuse a flow cell
- Onboard basecalling with Guppy means that neither local infrastructure nor a stable internet connection is needed
The nanopore sequencing analysis workflow is simple and easy to follow, with five steps from raw data acquisition to analysis completion and experimental interpretation. From the moment data acquisition begins, analysis can be performed in real time. As detailed on this page, Oxford Nanopore provides solutions at each stage, accommodating all user needs, applications, and levels of bioinformatics expertise.
Primary data acquisition with MinKNOW
MinKNOW, the operating software that drives nanopore sequencing devices, carries out several core tasks, including data acquisition, real-time analysis and run feedback, local basecalling, and data streaming. Adaptive sampling is also incorporated into MinKNOW; this is a novel method of targeted sequencing requiring no additional library prep – regions of interest are rejected or accepted in real time by the software itself.
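The accept-or-reject idea behind adaptive sampling can be illustrated with a toy decision function. This is only a sketch and not MinKNOW's implementation: it assumes the start of a read has already been mapped to a reference position, and the names `decide`, `in_targets`, and the `enrich`/`deplete` mode labels are hypothetical choices for this example.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # half-open interval [start, end) on a contig


def in_targets(targets: Dict[str, List[Region]], contig: str, pos: int) -> bool:
    """Return True if (contig, pos) falls inside any region of interest."""
    return any(start <= pos < end for start, end in targets.get(contig, []))


def decide(hit: Tuple[str, int], targets: Dict[str, List[Region]],
           mode: str = "enrich") -> str:
    """Return 'accept' or 'reject' for a read whose start mapped to `hit`.

    Assumption for this sketch: 'enrich' keeps on-target reads,
    'deplete' keeps off-target reads.
    """
    on_target = in_targets(targets, hit[0], hit[1])
    if mode == "enrich":
        return "accept" if on_target else "reject"
    if mode == "deplete":
        return "reject" if on_target else "accept"
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical region of interest, invented for illustration.
targets = {"chr1": [(100, 200)]}
on_target_decision = decide(("chr1", 150), targets)
off_target_decision = decide(("chr1", 250), targets)
```

In this sketch a rejected read would be ejected from the pore so that the pore is free to capture another strand; how the software maps the partial read and enforces the decision is outside the scope of the example.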
MinKNOW produces FAST5 (HDF5) files and/or FASTQ files, according to your preference. FAST5 files contain raw signal data that can be used for basecalling and calling base modifications, such as methylation.
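As an illustration of what a FASTQ file holds, the following minimal sketch reads records using only the Python standard library. It assumes the common layout of four lines per record and Phred+33 quality encoding; the names `FastqRecord`, `read_fastq`, and `phred_scores` are hypothetical, and the two example reads are invented. FAST5 files are HDF5 containers and need an HDF5 library to read, so they are not covered here.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, content ignored
        quality = handle.readline().strip()
        tokens = header[1:].split()
        if not header.startswith("@") or not tokens or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # The read ID is the first whitespace-delimited token after '@'.
        yield FastqRecord(tokens[0], sequence, quality)


def phred_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


# Two invented reads, for illustration only.
example = io.StringIO(
    "@read1 runid=abc\nACGT\n+\n!5?I\n"
    "@read2\nGG\n+\n++\n"
)
records = list(read_fastq(example))
first_scores = phred_scores(records[0].quality)
```

For real files, replace the `io.StringIO` object with an opened file handle, for example `open(path)` for plain text or `gzip.open(path, "rt")` for compressed output.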
Basecalling and primary data analysis
Basecalling can be defined as the process of converting the electrical signals generated by a DNA or RNA strand passing through the nanopore into the corresponding base sequence of the strand.
A choice of basecalling tools is available, some of which are fully supported and some of which are in development. Guppy, an example of the former, is a data processing toolkit that contains Oxford Nanopore’s basecalling algorithms, and several bioinformatic post-processing features, such as barcoding/demultiplexing, adapter trimming, and alignment. The Guppy toolkit also performs modified basecalling (5mC, 6mA and CpG) from the raw signal data, producing an additional FAST5 file of modified base probabilities. Guppy is integrated into MinKNOW and is also available as a standalone version.
Research basecallers, such as Bonito, are available on Oxford Nanopore’s GitHub, giving users access to the latest high-performance algorithms currently in development.
A range of approaches are available for downstream analysis of nanopore sequencing data, to suit all requirements and levels of bioinformatics expertise.
Oxford Nanopore offers EPI2ME, a cloud-based analysis platform providing real-time analysis workflows, with no command line experience required. EPI2ME Labs Notebooks deliver post-run analysis workflows in a tutorial format designed for developing skills and confidence in nanopore sequence data analysis and exploration. EPI2ME Labs Workflows automate the tutorial data flow, to enable high-throughput and more “hands-off” sequence analysis. The standard data output from nanopore sequencing devices can also be used with a variety of research software tools that are continually being developed and released by the teams at Oxford Nanopore. Lastly, a range of tools developed by the user community is available for a wide variety of research applications.
Which analysis approach should I use?
|EPI2ME||EPI2ME Labs Notebooks||EPI2ME Labs Workflows||Community-developed tools||Research software and custom analysis|
|Bioinformatic capability needed|
|How||Use the cloud-based EPI2ME platform for real-time analysis workflows.||Use EPI2ME Labs for local, post-run analysis and data exploration.||Access EPI2ME Labs tutorials via simplified, automated Nextflow workflows.||Run open-source tools written and developed by the Nanopore Community.||Access the latest research algorithms from Oxford Nanopore|
|Where||EPI2ME Labs Notebooks||EPI2ME Labs Workflows||Community-developed tools||Oxford Nanopore GitHub|
EPI2ME: real-time analysis workflows
EPI2ME is a cloud-based data analysis platform, offering easy access to several workflows for end-to-end analysis of nanopore data in real-time. An intuitive graphical interface facilitates the interpretation of individual or multiple barcoded samples. Full QC metrics give feedback on run performance and include number of reads, read length distribution and quality scores.
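The QC metrics named above (number of reads, read length distribution, and quality scores) can be approximated offline with a short sketch. This is not the EPI2ME implementation; `qc_summary`, `mean_read_quality`, and the 1,000-base histogram bin are hypothetical choices. One assumption worth stating: the per-read mean quality is computed by averaging error probabilities rather than Phred scores, which gives a lower and more realistic value when a read mixes good and poor bases.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def mean_read_quality(quality: str) -> float:
    """Mean Phred score of one read, averaged in error-probability space."""
    if not quality:
        raise ValueError("empty quality string")
    # Phred+33 decoding: Q = ord(char) - 33, error probability = 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - 33) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def qc_summary(reads: List[Tuple[str, str]], bin_size: int = 1000) -> Dict[str, object]:
    """Summarise (sequence, quality) pairs into simple run-level metrics."""
    lengths = [len(seq) for seq, _ in reads]
    # Bin each read length down to the start of its histogram bin.
    histogram = Counter((n // bin_size) * bin_size for n in lengths)
    return {
        "n_reads": len(reads),
        "total_bases": sum(lengths),
        "length_histogram": dict(sorted(histogram.items())),
        "mean_quality": [round(mean_read_quality(q), 2) for _, q in reads],
    }


# Two invented reads: 1,200 bases at Q20 and 200 bases at Q10.
reads = [("ACGT" * 300, "5" * 1200), ("AC" * 100, "+" * 200)]
summary = qc_summary(reads)
```

A read with one base at Q0 and one at Q40 has an arithmetic mean of Q20, but `mean_read_quality("!I")` returns roughly 3.0, reflecting that half of its bases are expected to be wrong.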
Analysis workflows provided on the EPI2ME platform are described in the table below. For more detailed descriptions and tutorial videos for each workflow, please follow the “Analysis workflow” link.
|Rapid species identification of fungi, bacteria, viruses, or archaea, based on execution of the Centrifuge classification engine. Output: Real-time building taxonomic tree. Watch WIMP video|
|Rapid, real-time organism identification: similar to FASTQ WIMP, but uses a database restricted to the human genome and complete virus genome sequences (NCBI RefSeq). Output: Real-time building taxonomic tree. Watch WIMP video|
|Real-time identification of bacteria and archaea at genus level using the 16S rRNA gene; with nanopore long reads, sequencing of the full-length 16S rRNA gene is achieved. The experimental workflow involves 16S gene amplification from a biological sample and nanopore sequencing. The 16S Rapid Sequencing Kit is available to buy on the nanopore store. Output: Classification report, including the number of reads analysed and their taxonomic classification. Watch 16S video|
|Comprehensive analysis of antimicrobial resistance (AMR) in individual and metagenomic samples. The ARMA workflow identifies the genes responsible for antibiotic resistance from FASTQ data, using species identification with WIMP and AMR identification via ARMA, which is integrated with the CARD database. Output: Report detailing the resistance genes found and gene overviews (depth of coverage, alignment details, and the resistance profile from the CARD database for specific genes). Watch ARMA video|
|Production of high-quality SARS-CoV-2 consensus sequences, based on ARTIC FieldBioinformatics software. Consensus sequences are evaluated using the Nextclade software to visualise genetic variants and for clade assignment. Output: Consensus sequence FASTA file and Nextclade report.|
Human genome analysis
|Thorough exome alignment and analysis. Can be used for a variety of analyses, including amplicon sequencing, sequence capture and sequence enrichment. Reads are aligned to the human exome using the minimap2 aligner. Output: Report providing sequencing and alignment metrics, listed on a per-gene basis.|
|One-click analysis of structural variation (deletions, insertions and duplications) within a human whole-genome dataset versus the reference genome. Output: Report providing a searchable list of variants and their genomic location, and a VCF file of the results. Watch SV caller video|
|FASTQ alignment of reads in real-time against the GRCh38 human reference using minimap2. Output: Alignment report displaying depth of coverage across each chromosome, and accuracy of alignment.|
|Custom reference FASTA file upload to EPI2ME for subsequent read alignment using the Custom Alignment workflow. Output: Report of alignment success.|
|Tailor sequencing analysis to your specific requirements without the need for complex bioinformatics pipelines, by uploading and aligning to a custom FASTA reference. Reads are aligned to an uploaded custom FASTA reference using the minimap2 aligner. Output: Report stating the success of the alignment, including depth of coverage across the reference, alignment accuracies, and number of reads analysed per barcode.|
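Several of the alignment reports above include depth of coverage across a reference. As a rough sketch of how that figure can be derived once reads have been aligned, the function below counts how many alignments cover each reference position. It assumes alignments are given as half-open `(start, end)` intervals on a single reference; `depth_per_position` and the example intervals are hypothetical and are not taken from the EPI2ME workflows.

```python
from typing import List, Tuple


def depth_per_position(alignments: List[Tuple[int, int]], ref_length: int) -> List[int]:
    """Per-base depth from half-open [start, end) alignment intervals."""
    # Difference array: +1 where an alignment starts, -1 just past its end.
    delta = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(ref_length, end)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    depth = []
    running = 0
    # A running sum over the difference array gives the depth at each base.
    for change in delta[:ref_length]:
        running += change
        depth.append(running)
    return depth


# Three invented alignments on a 10-base reference.
depth = depth_per_position([(0, 5), (3, 8), (4, 10)], ref_length=10)
mean_depth = sum(depth) / len(depth)
```

The difference-array approach touches each alignment once and each reference position once, which keeps the calculation linear even for long references.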
Quality control and raw data processing
|Demultiplexing of barcodes in sequencing data; the barcoding option can be selected within other workflows, such as WIMP or Human Exome. Output: Demultiplexed reads returned in individual subfolders; a barcoding report is also produced, including the number of reads per barcode.|
|DNA QC. It is advised to run a Lambda control experiment to try out your nanopore sequencing platform before sequencing your own samples. Output: Report detailing sequence length, accuracy, quality score, and the amount of data generated.|
|RNA QC. It is advised to run an RNA control experiment to try out your nanopore sequencing platform before sequencing your own samples. Output: Report detailing sequence length, accuracy, quality score, and the amount of data generated.|
|A check to ensure that the EPI2ME cloud-based analysis is functioning correctly, and that basecalled reads can be uploaded and downloaded. Output: Confirmation of the success of read uploading and downloading.|
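The demultiplexing step listed in the table above can be pictured with a deliberately simplified sketch. It assigns each read to a barcode by exact match at the start of the sequence and reports the number of reads per barcode. This is an assumption made for brevity: production demultiplexers score barcode alignments and tolerate sequencing errors. The function name `demultiplex` and the barcode sequences are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def demultiplex(reads: List[Tuple[str, str]],
                barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group read IDs by the barcode found at the start of each sequence."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, sequence in reads:
        label = "unclassified"  # used when no barcode matches
        for name, barcode_seq in barcodes.items():
            if sequence.startswith(barcode_seq):
                label = name
                break
        bins[label].append(read_id)
    return dict(bins)


# Invented barcodes and reads, for illustration only.
barcodes = {"barcode01": "AAGG", "barcode02": "CCTT"}
reads = [
    ("read1", "AAGGACGTACGT"),
    ("read2", "CCTTGGGGAAAA"),
    ("read3", "AAGGTTTTCCCC"),
    ("read4", "GGGGACGTACGT"),
]
bins = demultiplex(reads, barcodes)
reads_per_barcode = {name: len(ids) for name, ids in bins.items()}
```

Each key in `bins` corresponds to one output subfolder in the workflow description, and `reads_per_barcode` mirrors the per-barcode counts given in the barcoding report.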
EPI2ME Labs Notebooks: guided bioinformatics workflows
EPI2ME Labs tutorials are notebook-based bioinformatics solutions, designed to assist you in developing your skills and confidence in the analysis of nanopore sequencing data. The tutorials provide best practice examples of how to analyse and explore nanopore sequencing data, using both open-source software and our own research tools.
EPI2ME Labs tutorials are based on the Jupyter notebook environment; notebooks contain the code needed to run the analysis, and are pre-configured with sensible default parameters and example datasets. Notebooks are therefore an accessible option for those new to sequencing analysis, as well as for more experienced bioinformaticians who are comfortable working with Python but new to the analysis of nanopore sequence data. EPI2ME Labs runs on Windows, macOS, and Linux operating systems.
What’s the difference between EPI2ME and EPI2ME Labs?
EPI2ME is a cloud-based platform, with a graphical interface and simple, point-and-click solutions: no bioinformatics experience is needed. The platform provides pre-configured analysis workflows and is focused on the real-time analysis of your data and its presentation.
EPI2ME Labs is local, currently supported on the GridION nanopore sequencing platform. It is customisable, with the freedom to develop your own workflows and databases, and to modify outputs according to your preferences. EPI2ME Labs Notebooks are designed to advance your bioinformatics skills and assist you in tailoring analysis to your individual requirements. EPI2ME Labs is not designed to be a real-time analysis solution.
As well as the best practice tutorials, EPI2ME Labs provides these structured analysis workflows in a simplified format (see EPI2ME Labs Workflows tab).
|EPI2ME||EPI2ME Labs||EPI2ME Labs Workflows|
|Location||Cloud-based||Local||Local and distributed (cluster and/or cloud)|
|Aim||Provide simple, one-click analysis solutions||Provide bioinformatics best practices and training||Provide formalised workflows for higher throughput analyses|
|Focus||Simple, rapid, real-time analysis||Customisable, exploratory, post-run analysis||Standardised, high-throughput analysis|
EPI2ME Labs tutorials
The EPI2ME Labs notebook tutorials require no additional installation and provide step-by-step support with dynamic, interactive outputs. Notebook tutorials are linked to from the EPI2ME Labs landing page, and also via Oxford Nanopore’s GitHub; a list of the current notebook tutorials can be found here.
The following tutorials are available, covering a wide variety of applications, with more in development:
- Basic tasks – tutorials introducing file formats (e.g. FASTQ, VCF, BAM, and FAST5); workflows on simple quality control; adaptive sampling input file requirements
- Assembly – a workflow applying Flye and Medaka for high-quality assembly of small to mid-sized genomes
- Metagenomics – metagenomic classification and targeted 16S analysis tutorials
- Transcriptomics (RNA and cDNA) – quality control workflow, isoform detection, and differential gene expression and transcript usage analyses with long nanopore reads
- Variant calling – structural variant and small variant calling analysis tutorials
- Modified base detection – using Medaka to process and summarise the modified base output of Guppy
- SARS-CoV-2 analysis – a notebook based around the ARTIC pipeline for the analysis of SARS-CoV-2 multiplexed amplicon datasets
- Cas9 targeted enrichment – a tutorial to complement the Cas9 targeted enrichment and nanopore sequencing experimental workflow
- Clone validation – validation of synthetic biological constructs using nanopore sequencing
Structural variation pipeline tutorial: Analysis plots
EPI2ME Labs Workflows: simplified, automated analysis
EPI2ME Labs Workflows provide users with the structured analysis workflows introduced in EPI2ME Labs tutorials, but in a simplified format intended for high-throughput and automated analyses. EPI2ME Labs Workflows are built using the Nextflow language; these Workflows are therefore also ideal for bioinformaticians familiar with Nextflow solutions, but new to the analysis of nanopore sequence data.
EPI2ME Labs Workflows use bioinformatics packages that have been installed in Docker containers to simplify and streamline complex analyses. We also support the installation of software using Conda to accommodate a range of different use-cases.
The following workflows are currently available, with more in preparation:
- ‘Basic tasks’ package; this includes mapping sequences to a reference and preparing summary statistics, and deriving species abundance (wf-alignment)
- Assembly of small plasmid sequences for verification of molecular cloning experiments (wf-clone-validation)
- Structural variant calling in human whole-genome sequencing data (wf-human-sv)
- Small variant calling and annotation in haploid samples (wf-hap-snps)
- Metagenomic taxonomic classification of reads from mixed samples (wf-metagenomics)
- ARTIC SARS-CoV-2 analysis workflow (wf-artic)
- Coming soon: Exome analysis
Community-developed tools
To customise your analysis, FAST5 and FASTQ files produced by MinKNOW can be taken forward into a variety of analysis tools developed by users of nanopore technology. These tools are designed both to work with reads of any length produced by nanopore sequencing, from short to ultra-long, and to use real-time analysis wherever it is needed.
Such tools can be found in our Resource Centre, and have a wide variety of applications, from data processing (e.g. demultiplexing and filtering), to assembly, variant calling, gene expression analysis, and fusion detection.
Research software: the latest algorithms
The bioinformatic analysis of nanopore sequencing data is a rapidly evolving and continually advancing area of research. The latest algorithms from the research teams at Oxford Nanopore are available for users to test and incorporate into custom pipelines tailored to their specific applications. Because they evolve rapidly, these research algorithms are less supported than the EPI2ME and EPI2ME Labs options outlined on this page.
Research software can be accessed via the Oxford Nanopore GitHub page, and includes the latest research basecallers, such as Bonito, and Taiyaki, an algorithm that can be used to train neural network models for basecalling of nanopore sequencing reads.
Other key tools available from the Oxford Nanopore GitHub page include:
- Medaka variant caller: a tool for calling small variants in nanopore data
- Megalodon: high accuracy modified base calling and variant detection