porefile: automatic profiling of microbial communities using full-length 16S rRNA gene sequencing data


The 16S rRNA gene is a widely used taxonomic marker whose sequences have been compiled in quality-controlled databases such as SILVA. High-throughput sequencing of the full or near-full-length 16S rRNA gene with third-generation sequencing approaches has been shown to increase species-level resolution in complex microbial communities. Here we present porefile, a workflow that gathers different tools for read pre-processing and taxonomic profiling of 16S rRNA gene sequencing data generated with third-generation sequencing platforms, such as Oxford Nanopore Technologies (ONT). Porefile sub-workflows are managed with the Nextflow system. The workflow maps reads against the latest SILVA database and uses the lowest common ancestor (LCA) algorithm implemented in MEGAN6 to classify reads at the major taxonomic ranks. After a species-level polishing step, porefile recovers the composition of synthetic microbial communities generated with simulated ONT 16S rRNA gene sequencing data.
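The idea behind LCA-based read classification can be illustrated with a minimal sketch. This is not the MEGAN6 implementation, which applies additional score and coverage filters to the hits. It is a naive version that assigns a read to the deepest taxon shared by all of its database hits. The function name `naive_lca`, the `RANKS` tuple, and the example lineages are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Major taxonomic ranks, ordered from most general to most specific.
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def naive_lca(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest lineage prefix shared by all hits of one read.

    Each lineage lists taxon names from the most general rank down.
    An empty input yields an empty (unclassified) result.
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip(*lineages) walks the ranks in parallel across all hits.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # hits disagree at this rank, so stop here
        shared.append(taxa_at_rank[0])
    return shared


# Illustrative hits for a single read (made-up example, not real output).
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia fergusonii"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Salmonella", "Salmonella enterica"],
]

assignment = naive_lca(hits)
# Report the rank reached alongside the assigned taxon.
if assignment:
    print(RANKS[len(assignment) - 1], assignment[-1])
```

In this example the three hits agree down to the family rank and diverge at the genus rank, so the read would be assigned at family level. A read whose hits all agree on one species would keep its full lineage.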
