Analysis solutions for nanopore sequencing data
Nanopore sequencing presents a number of significant advantages which allow the sequencing process to be tailored to your requirements:
- Real-time basecalling, enabling immediate access to results
- Stop sequencing as soon as sufficient data has been obtained
- Stop, wash and reuse a flow cell
- Onboard basecalling with Guppy means that neither local infrastructure nor a stable internet connection is needed
The nanopore sequencing analysis workflow is simple and easy to follow, with five steps from raw data acquisition to analysis completion and experimental interpretation. From the moment data acquisition begins, analysis can be performed in real time. As detailed on this page, Oxford Nanopore provides solutions at each stage.
Primary data acquisition with MinKNOW
MinKNOW, the operating software that drives nanopore sequencing devices, carries out several core tasks: data acquisition, real-time analysis and feedback, local basecalling, and data streaming. It also provides device control, including selection of run parameters, sample identification and tracking, and checks that the platform chemistry is performing correctly to run the samples. MinKNOW produces FAST5 (HDF5) files and/or FASTQ files, according to your preference. FAST5 files contain raw signal data that can be used for basecalling.
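As an illustration of working with the basecalled output, the sketch below reads FASTQ records and reports the length and mean quality of each read. It assumes the common four-lines-per-read FASTQ layout and Phred+33 quality encoding; the names `FastqRecord`, `parse_fastq` and `mean_qscore` are hypothetical helpers written for this example and are not part of MinKNOW or Guppy.

```python
import io
import math
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        fields = header[1:].split()
        yield FastqRecord(fields[0] if fields else "", sequence, quality)


def mean_qscore(quality: str) -> float:
    """Mean Phred score, averaged over error probabilities (assumes Phred+33)."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(char) - 33) / 10) for char in quality]
    return -10 * math.log10(sum(probs) / len(probs))


# Small in-memory example standing in for a FASTQ file on disk.
example = io.StringIO("@read1 runid=demo\nACGT\n+\nIIII\n@read2\nAC\n+\n+5\n")
demo_records = list(parse_fastq(example))
for record in demo_records:
    print(record.read_id, len(record.sequence), round(mean_qscore(record.quality), 2))
```

Averaging in error-probability space rather than averaging the Phred values directly stops a few high-quality bases from masking many poor ones. Whether this matches the mean quality reported by a given tool is an assumption that should be checked against that tool's documentation.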
Basecalling and primary data analysis with Guppy
Guppy is a data processing toolkit that contains Oxford Nanopore Technologies' basecalling algorithms and several bioinformatic post-processing features. It is provided as binaries to run on Windows, OS X and Linux platforms, and is also integrated with MinKNOW, the Oxford Nanopore device control software.
Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy. Furthermore, Guppy now performs modified basecalling (5mC, 6mA and CpG) from the raw signal data, producing an additional FAST5 file of modified base probabilities.
Which analysis approach should I use?
| |EPI2ME|Protocol & analysis tutorials|Community developed tools|Custom analysis pipelines|
|Bioinformatic capability needed| | | | |
|How|Use the cloud-based EPI2ME platform for real-time analysis workflows.|Get analysis recommendations and clear tutorials on the use of open-source tools.|Run open-source tools written and developed by the Nanopore Community.|All the data, raw or basecalled, can be used in custom analysis pipelines written by the user for specific applications.|
EPI2ME: real-time analysis workflows
EPI2ME is a cloud-based data analysis platform, offering easy access to several workflows for end-to-end analysis of nanopore data in real-time. An intuitive graphical interface facilitates the interpretation of individual or multiple barcoded samples. Full QC metrics give feedback on run performance and include number of reads, read length distribution and quality scores.
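The run-level metrics mentioned above (number of reads and read length distribution) can be approximated offline from a list of read lengths. The following is a minimal sketch and not the EPI2ME implementation; the function names, the 1,000-base bin width and the choice to report N50 are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Smallest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads supplied


def length_histogram(lengths: List[int], bin_width: int = 1000) -> Dict[int, int]:
    """Count reads per length bin; each key is the lower edge of its bin."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))


def qc_summary(lengths: List[int], bin_width: int = 1000) -> dict:
    """Collect basic run metrics from per-read lengths."""
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
        "histogram": length_histogram(lengths, bin_width),
    }


# Example with four made-up read lengths.
demo_summary = qc_summary([500, 1500, 2500, 8000])
print(demo_summary)
```

N50 is often more informative than the mean for long-read data, because a small number of very long reads can carry most of the sequenced bases.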
Analysis workflows provided on the EPI2ME platform are described in the table below. For more detailed descriptions and tutorial videos for each workflow, please follow the “Analysis workflow” link.
|Rapid species identification of fungi, bacteria, viruses, or archaea, based on execution of the Centrifuge classification engine. Output: Taxonomic tree, built in real time|
|Real-time identification of bacteria and archaea at genus level using the 16S rRNA gene; with nanopore long reads, sequencing of the full-length 16S rRNA gene is achieved. The experimental workflow involves 16S gene amplification from a biological sample and nanopore sequencing. The 16S Rapid Sequencing Kit is available to buy on the nanopore store. Output: Classification report, including the number of reads analysed and their taxonomic classification|
|Comprehensive analysis of antimicrobial resistance (AMR) in individual and metagenomic samples. The ARMA workflow identifies the genes responsible for antibiotic resistance from FASTQ data, using species identification with WIMP and AMR identification via ARMA, which is integrated with the CARD database. Output: Report detailing the resistance genes found and gene overviews (depth of coverage, alignment details, and the resistance profile from the CARD database for specific genes)|
Human genome analysis
|Thorough exome alignment and analysis. Can be used for a variety of analyses, including amplicon sequencing, sequence capture and sequence enrichment. Reads are aligned to the human exome using the minimap2 aligner. Output: Report providing sequencing and alignment metrics, listed on a per-gene basis.|
|One-click analysis of structural variation (deletions, insertions and duplications) within a human whole-genome dataset versus the reference genome. Output: Report providing a searchable list of variants and their genomic location, and a VCF file of the results.|
|FASTQ alignment of reads in real-time against the GRCh38 human reference using minimap2. Output: Alignment report displaying depth of coverage across each chromosome, and accuracy of alignment.|
|Custom reference FASTA file upload to EPI2ME for subsequent read alignment using the Custom Alignment workflow. Output: Report of alignment success.|
|Tailor sequencing analysis to your specific requirements without the need for complex bioinformatics pipelines, by uploading and aligning to a custom FASTA reference. Reads are aligned to an uploaded custom FASTA reference using the minimap2 aligner. Output: Report stating the success of the alignment, including depth of coverage across the reference, alignment accuracies, and number of reads analysed per barcode.|
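To make the alignment metrics above more concrete, the sketch below estimates mean depth of coverage per reference sequence and an approximate alignment identity from minimap2 output in PAF format. It is an illustration under stated assumptions, not the method used by the EPI2ME workflows: it relies on the standard PAF column order (target name, length, start and end in columns 6 to 9, matching bases and alignment block length in columns 10 and 11), and `coverage_from_paf` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def coverage_from_paf(lines: Iterable[str]) -> Tuple[Dict[str, float], float]:
    """Return (mean depth per target, overall matches / alignment block length)."""
    aligned_bases: Dict[str, int] = defaultdict(int)
    target_length: Dict[str, int] = {}
    matches = 0
    block_length = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        target, t_len = fields[5], int(fields[6])
        t_start, t_end = int(fields[7]), int(fields[8])
        aligned_bases[target] += t_end - t_start
        target_length[target] = t_len
        matches += int(fields[9])
        block_length += int(fields[10])
    depth = {name: aligned_bases[name] / target_length[name] for name in aligned_bases}
    identity = matches / block_length if block_length else 0.0
    return depth, identity


# Three made-up PAF records (12 mandatory columns each).
demo_paf = [
    "r1\t1000\t0\t1000\t+\tchr1\t10000\t0\t1000\t950\t1000\t60",
    "r2\t2000\t0\t2000\t-\tchr1\t10000\t500\t2500\t1900\t2000\t60",
    "r3\t500\t0\t500\t+\tchr2\t5000\t100\t600\t450\t500\t60",
]
demo_depth, demo_identity = coverage_from_paf(demo_paf)
print(demo_depth, round(demo_identity, 3))
```

The identity figure here is only a rough proxy for alignment accuracy. It also counts secondary and supplementary alignments if they are present in the input, so filtering those first may be appropriate.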
Quality control and raw data processing
|Demultiplexing of barcodes in sequencing data; the barcoding option can be selected within other workflows, such as WIMP or Human Exome. Output: Demultiplexed reads returned in individual subfolders; a barcoding report is also produced, including the number of reads per barcode.|
|DNA QC. It is advised to run a Lambda control experiment to try out your nanopore sequencing platform before sequencing your own samples. Output: Report detailing sequence length, accuracy, quality score, and the amount of data generated.|
|RNA QC. It is advised to run an RNA control experiment to try out your nanopore sequencing platform before sequencing your own samples. Output: Report detailing sequence length, accuracy, quality score, and the amount of data generated.|
|A check to ensure that the EPI2ME cloud-based analysis is functioning correctly, and that basecalled reads can be uploaded and downloaded. Output: Confirmation of the success of read uploading and downloading.|
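The per-barcode read counts in a barcoding report can be cross-checked locally once demultiplexed reads have been returned in individual subfolders. The sketch below assumes one subfolder per barcode, each holding uncompressed `.fastq` files with four lines per read. The folder names and the `reads_per_barcode` helper are hypothetical, and the actual layout of your output may differ.

```python
import tempfile
from pathlib import Path
from typing import Dict


def reads_per_barcode(demux_dir: str) -> Dict[str, int]:
    """Count FASTQ reads in each immediate subfolder of demux_dir."""
    counts: Dict[str, int] = {}
    for folder in sorted(Path(demux_dir).iterdir()):
        if not folder.is_dir():
            continue
        n_lines = 0
        for fastq in sorted(folder.glob("*.fastq")):
            with fastq.open() as handle:
                n_lines += sum(1 for line in handle if line.strip())
        counts[folder.name] = n_lines // 4  # four lines per FASTQ record
    return counts


# Build a throwaway directory tree to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_reads in (("barcode01", 2), ("barcode02", 1)):
        folder = Path(tmp) / name
        folder.mkdir()
        records = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(n_reads))
        (folder / "reads.fastq").write_text(records)
    demo_counts = reads_per_barcode(tmp)
print(demo_counts)
```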
Protocol builder and analysis tutorials
The Protocol Builder provides recommended extraction protocols, library preparation methods, and downstream analysis workflows, enabling you to build a bespoke end-to-end protocol to suit your specific requirements.
Analysis tutorials are now available for the tools used to analyse your nanopore sequencing data. Each tutorial provides clear step-by-step instructions and example data. Current tutorials available include:
- Differential transcript usage and gene expression analyses using pipeline-transcriptome-de
- Annotation of gene transcripts and novel gene isoform discovery with pinfish
- Differential gene expression analysis with DESeq2
- Review of sequencing summary statistics with the BasicQC tutorial, based on the sequencing_summary.txt file produced by the Guppy basecalling software
- Structural variation calling with pipeline-structural-variation
- Evaluation of read-mapping characteristics from a Cas-mediated PCR-free enrichment
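As a flavour of the kind of review covered by a basic QC step, the sketch below summarises a tab-separated sequencing_summary.txt file. It is not the BasicQC tutorial code. The column names `passes_filtering`, `sequence_length_template` and `mean_qscore_template` are assumptions about the summary file layout and should be checked against the header of your own file; they can be overridden through the function arguments.

```python
import csv
import io
from typing import TextIO


def summarise_sequencing_summary(
    handle: TextIO,
    length_col: str = "sequence_length_template",  # assumed column name
    qscore_col: str = "mean_qscore_template",      # assumed column name
    pass_col: str = "passes_filtering",            # assumed column name
) -> dict:
    """Summarise read count, pass count, yield and mean per-read quality."""
    reader = csv.DictReader(handle, delimiter="\t")
    reads = passed = bases = 0
    qscore_sum = 0.0
    for row in reader:
        reads += 1
        bases += int(row[length_col])
        qscore_sum += float(row[qscore_col])
        if row[pass_col].strip().upper() == "TRUE":
            passed += 1
    return {
        "reads": reads,
        "passed": passed,
        "total_bases": bases,
        "mean_qscore": qscore_sum / reads if reads else 0.0,
    }


# In-memory stand-in for a sequencing_summary.txt file.
demo_file = io.StringIO(
    "read_id\tpasses_filtering\tsequence_length_template\tmean_qscore_template\n"
    "r1\tTRUE\t1000\t12.0\n"
    "r2\tFALSE\t500\t6.0\n"
    "r3\tTRUE\t1500\t10.5\n"
)
demo_stats = summarise_sequencing_summary(demo_file)
print(demo_stats)
```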
Oxford Nanopore Technologies also has its own GitHub page featuring a wide variety of analysis tools, including those featured in our analysis tutorials, tailored specifically to the analysis of nanopore long-read sequencing data. Please refer to the “Custom analysis pipelines” section for further information regarding the GitHub page.
To customise your analysis, FAST5 and FASTQ files produced by MinKNOW can be taken forward into a variety of analysis tools developed by users of nanopore technology. These tools are designed both to work with the long reads produced by nanopore sequencing, and to use real-time analysis wherever it is needed.
Such tools are available in the resources section and have a wide variety of applications, from data processing (e.g. demultiplexing and filtering) to assembly and variant calling.
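As a simple example of the data processing step mentioned above, the sketch below filters reads by minimum length and minimum mean quality before they are passed to downstream tools. It does not reproduce any particular community tool. The thresholds and the function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
import math
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (read_id, sequence, quality string)


def mean_qscore(quality: str) -> float:
    """Mean Phred score, averaged over error probabilities (assumes Phred+33)."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(char) - 33) / 10) for char in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(
    reads: Iterable[Read], min_length: int = 1000, min_qscore: float = 7.0
) -> Iterator[Read]:
    """Yield only reads that meet both the length and the quality threshold."""
    for read_id, sequence, quality in reads:
        if len(sequence) >= min_length and mean_qscore(quality) >= min_qscore:
            yield read_id, sequence, quality


def to_fastq(reads: Iterable[Read]) -> str:
    """Serialise reads back to four-line FASTQ text."""
    return "".join(f"@{rid}\n{seq}\n+\n{qual}\n" for rid, seq, qual in reads)


# Made-up reads; thresholds are lowered so the short example sequences qualify.
demo_reads = [
    ("long_good", "ACGT" * 3, "I" * 12),
    ("short", "ACGT", "IIII"),
    ("long_poor", "ACGT" * 3, "#" * 12),
]
demo_kept = list(filter_reads(demo_reads, min_length=10, min_qscore=7.0))
print(to_fastq(demo_kept))
```

Writing the filter as a generator means reads can be processed one at a time as they arrive, which fits the real-time style of analysis described on this page.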
Custom analysis pipelines
The standard FAST5/FASTQ data output from nanopore sequencing devices can be used in a variety of downstream analysis platforms and custom user-developed pipelines, tailored to your specific application.
Achieve the greatest flexibility by writing your own custom scripts that work from either FAST5 or FASTQ sequencing data, and explore new routes of analysis tailored to your unique requirements. The research software Taiyaki can be used for training neural network models for basecalling of nanopore sequencing reads. This software is available from the Oxford Nanopore GitHub page.
Other key tools available on our GitHub page include:
- Medaka consensus tool: a tool for base-space consensus polishing, used by the team at Oxford Nanopore
- Medaka variant caller: a tool for calling small variants on nanopore data, used by the team at Oxford Nanopore
- Research basecallers: enabling users to deploy the latest algorithms from the research teams at Oxford Nanopore