25th January 2018 - bioRxiv
DNA barcodes are useful for species discovery and species identification, but obtaining them currently requires a well-equipped molecular laboratory and is time-consuming and/or expensive. Here we address these issues by developing a barcoding pipeline for the Oxford Nanopore MinION and demonstrate that one flowcell can generate barcodes for ~500 specimens despite the high base-call error rates of MinION. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. These consensus barcodes are overall mismatch-free but retain indel errors that are concentrated in homopolymeric regions. We therefore complement the barcode caller with an optional error correction pipeline that uses conserved amino-acid motifs from publicly available barcodes to correct the indel errors. We document the effectiveness of this pipeline by analysing reads from three MinION runs that represent three different stages of MinION development. The runs generated data for (1) 511 specimens of a mixed Diptera sample, (2) 575 specimens of ants, and (3) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION barcodes for 490 specimens, which were assessed against reference Sanger barcodes (N=471). Overall, the MinION barcodes have an accuracy of 99.3% to 100%, and the proportion of ambiguities ranges from <0.01% to 1.5%, depending on which correction pipeline is used. We demonstrate that only 2 hours of sequencing are required to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1000 barcodes can be generated in one flowcell and that the cost of a MinION barcode can be <USD 2.
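
The idea of summarizing all reads for one tagged amplicon as a consensus barcode can be illustrated with a minimal sketch. It assumes the reads for a single specimen have already been demultiplexed by tag and aligned to equal length, with `-` marking alignment gaps. The function name `consensus_barcode` and the `min_fraction` threshold are hypothetical choices made for illustration, and the column-wise majority vote shown here is a generic technique, not necessarily how the pipeline itself calls barcodes.

```python
from collections import Counter


def consensus_barcode(aligned_reads, min_fraction=0.5):
    """Column-wise majority-rule consensus of pre-aligned reads.

    aligned_reads: equal-length strings over A/C/G/T/-, one per read
                   (assumed to be aligned beforehand; '-' is a gap).
    min_fraction:  a symbol must occur in strictly more than this
                   fraction of reads to be called (hypothetical default).
    """
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")

    n_reads = len(aligned_reads)
    consensus = []
    for column in zip(*aligned_reads):
        symbol, count = Counter(column).most_common(1)[0]
        if count / n_reads <= min_fraction:
            # No clear majority: record an ambiguous base.
            consensus.append("N")
        elif symbol != "-":
            # Majority base: keep it. Majority gaps are dropped.
            consensus.append(symbol)
    return "".join(consensus)


# Toy reads: one substitution error (read 3) and one insertion (read 2).
reads = ["ACGT-A", "ACGTTA", "ACCT-A", "ACGT-A"]
barcode = consensus_barcode(reads)
```

In this toy case the isolated substitution and insertion are both outvoted. A systematic error, such as an indel in a homopolymeric region that recurs across many reads, would survive a vote of this kind, which is consistent with the need for the separate indel correction step described above.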