Linked machine learning classifiers improve species classification of fungi when using error-prone long-reads on extended metabarcodes

The increased use of long-read sequencing for metabarcoding has not been matched by public databases suited to error-prone long reads. We address this gap and present a proof-of-concept study for classifying fungal species using linked machine learning classifiers. We demonstrate the capability of this approach for accurate classification using labelled and unlabelled fungal sequencing datasets.

We show the advantage of our approach over current alignment and k-mer methods for closely related species, and suggest a confidence threshold of 0.85 to maximise accurate identification of target species in complex samples of unknown composition. We suggest future use of this approach in medicine, agriculture, and biosecurity.
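As an illustration of how a confidence threshold of 0.85 might be applied to per-read classifier output, the following is a minimal sketch and not the authors' implementation. The function name `call_species`, the dictionary layout, the read identifiers, and the example scores are all hypothetical; treating a score exactly equal to the threshold as accepted is also an assumption.

```python
from typing import Mapping, Optional

# Threshold value suggested in the text above.
CONFIDENCE_THRESHOLD = 0.85


def call_species(
    scores: Mapping[str, float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return the top-scoring species if its score meets the threshold.

    `scores` maps species names to classifier confidence values in [0, 1].
    Returns None (read left unclassified) when no species is confident enough.
    Hypothetical helper: the name and interface are assumptions.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Assumption: a score equal to the threshold counts as confident.
    return best if scores[best] >= threshold else None


# Invented scores for two reads, for illustration only.
reads = {
    "read_1": {"Candida albicans": 0.93, "Candida dubliniensis": 0.07},
    "read_2": {"Candida albicans": 0.55, "Candida dubliniensis": 0.45},
}

calls = {read_id: call_species(s) for read_id, s in reads.items()}
```

In this sketch, a read whose best score falls below the threshold is left unclassified rather than assigned to the nearest species, which trades some sensitivity for fewer false identifications among closely related species.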

Authors: Tavish G. Eenjes, Yiheng Hu, Laszlo Irinyi, Minh Thuy Vi Hoang, Leon M. Smith, Celeste C. Linde, Wieland Meyer, Eric A. Stone, John P. Rathjen, Benjamin Mashford, Benjamin Schwessinger