Semi-quantitative characterisation of mixed pollen samples using MinION sequencing and Reverse Metagenomics (RevMet)

Peel and colleagues describe their RevMet (Reverse Metagenomics) pipeline that enables reliable and semi-quantitative characterisation of mixed eukaryote samples, such as mixed pollen samples. This pipeline has the potential to help us understand plant-pollinator interactions, which in turn could inform strategies for mitigating the drivers of pollinator decline.

The ability to identify and quantify the constituent plant species that make up a mixed-species sample of pollen has important applications in ecology, conservation, and agriculture. Recently, metabarcoding protocols have been developed for pollen that can identify constituent plant species, but there are strong reasons to doubt that metabarcoding can accurately quantify their relative abundances. A PCR-free, shotgun metagenomics approach has greater potential for accurately quantifying species relative abundances, but applying metagenomics to eukaryotes is challenging due to low numbers of reference genomes. We have developed a pipeline, RevMet (Reverse Metagenomics), that allows reliable and semi-quantitative characterisation of the species composition of mixed-species eukaryote samples, such as bee-collected pollen, without requiring reference genomes. Instead, reference species are represented only by 'genome skims': low-cost, low-coverage, short-read sequence datasets. The skims are mapped to individual long reads sequenced from mixed-species samples using the MinION, a portable nanopore sequencing device, and each long read is uniquely assigned to a plant species. We genome-skimmed 49 wild UK plant species, validated our pipeline with mock DNA mixtures of known composition, and then applied RevMet to pollen loads collected from wild bees. We demonstrate that RevMet can identify plant species present in mixed-species samples at proportions of DNA >1%, with few false positives and false negatives, and reliably differentiate species represented by high versus low amounts of DNA in a sample. The RevMet pipeline could readily be adapted to generate semi-quantitative datasets for a wide range of mixed eukaryote samples, which could include characterising diets, quantifying allergenic pollen from air samples, quantifying soil fauna, and identifying the compositions of algal and diatom communities.
Our per-sample costs were GBP 90 per genome skim and GBP 60 per pollen sample, and newer sequencer versions that are now available will reduce these costs further.
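
The read-assignment step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the mapping step has already been summarised, for each long read, as the fraction of that read covered by each species' skim. The function names, the `min_coverage` value of 0.15, and the tie-handling rule are hypothetical choices made for illustration; the `min_proportion` value of 0.01 simply echoes the >1% detection level stated in the abstract.

```python
from collections import Counter


def assign_read(coverage_by_species, min_coverage=0.15):
    """Assign one long read to the species whose skim covers it best.

    coverage_by_species maps species name -> fraction of the read (0..1)
    covered by that species' skim reads. Returns None when no species
    reaches min_coverage (an assumed threshold) or when the top two
    species tie, so every assigned read has exactly one species.
    """
    if not coverage_by_species:
        return None
    ranked = sorted(coverage_by_species.items(), key=lambda kv: kv[1], reverse=True)
    best_species, best_cov = ranked[0]
    if best_cov < min_coverage:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_cov:
        return None  # ambiguous: cannot be assigned uniquely
    return best_species


def species_proportions(reads, min_coverage=0.15, min_proportion=0.01):
    """Summarise a sample as the proportion of assigned reads per species.

    reads maps read ID -> {species: coverage fraction}. Species below
    min_proportion of the assigned reads are dropped from the result.
    """
    counts = Counter()
    for coverage in reads.values():
        species = assign_read(coverage, min_coverage)
        if species is not None:
            counts[species] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        species: n / total
        for species, n in counts.items()
        if n / total >= min_proportion
    }


if __name__ == "__main__":
    # Toy coverage values, invented purely to exercise the functions.
    toy_reads = {
        "read_1": {"species_A": 0.60, "species_B": 0.10},
        "read_2": {"species_A": 0.50, "species_B": 0.20},
        "read_3": {"species_A": 0.05, "species_B": 0.40},
        "read_4": {"species_A": 0.05, "species_B": 0.02},  # left unassigned
    }
    print(species_proportions(toy_reads))
```

The proportions here are fractions of assigned reads, which is why the output is best read as semi-quantitative: it separates species with high read shares from those with low shares rather than giving exact DNA amounts.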

Authors: Ned Peel, Lynn Dicks, Matthew D. Clark, Darren Heavens, Lawrence Percival-Alwyn, Chris Cooper, Richard G. Davies, Richard M. Leggett, Douglas W. Yu