Long Read Annotation (LoReAn): automated eukaryotic genome annotation based on long-read cDNA sequencing

Single-molecule full-length cDNA sequencing can aid genome annotation by revealing transcript structure and alternative splice-forms, yet current annotation pipelines do not incorporate such information. Here we present LoReAn (Long Read Annotation) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal and two plant genomes, we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA sequencing data generated from either the PacBio or MinION sequencing platforms, and correctly predicting gene structure and capturing genes missed by other annotation pipelines.

Authors: David Cook, Jose Espejo Valle-Inclan, Alije Pajoro, Hanna Rovenich, Bart Thomma, Luigi Faino