A MinION™‐based pipeline for fast and cost‐effective DNA barcoding

DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well‐equipped molecular laboratory and is time‐consuming, and/or expensive. We here address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrating that one flow cell can generate barcodes for ~500 specimens despite the high basecall error rates of MinION™ reads. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are overall mismatch‐free but retain indel errors that are concentrated in homopolymeric regions. They are addressed with an optional error correction pipeline that is based on conserved amino acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. They generated data for (i) 511 specimens of a mixed Diptera sample, (ii) 575 specimens of ants and (iii) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION™ barcodes for 490 of the 511 specimens which were assessed against reference Sanger barcodes (N = 471). Overall, the MinION™ barcodes have an accuracy of 99.3%–100% with the number of ambiguous bases after correction ranging from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that it requires ~2 hr of sequencing to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1,000 barcodes can be generated in one flow cell and that the cost per barcode can be <USD 2.

Authors: Amrita Srivathsan, Bilgenur Baloğlu, Wendy Wang, Wei Xin Tan, Denis Bertrand, Amanda Hui Qi Ng, Esther Jia Hui Boey, Jayce Jia Yu Koh, Niranjan Nagarajan, Rudolf Meier