TALC: Transcription-Aware Long Read Correction

Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures but are error-prone. Numerous hybrid correction algorithms have been developed for genomic data; they correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited to correcting more complex transcriptome sequencing data.

We have created a novel algorithm called TALC (Transcription-Aware Long Read Correction), which models changes in RNA expression and isoform representation in a weighted de Bruijn graph to correct long reads from transcriptome studies.
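To make the notion of a weighted de Bruijn graph concrete, the sketch below builds one from short reads, using the number of times each k-mer occurs as the node weight. It is an illustration only, not the TALC implementation: the names `KmerCounts`, `count_kmers` and `successors` are hypothetical, and how TALC uses such weights to account for expression and isoform changes is not described here, so that logic is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names for illustration; none are taken from the TALC code base.
// Each node is a k-mer; its weight is the k-mer's occurrence count in the short reads.
using KmerCounts = std::unordered_map<std::string, std::uint32_t>;

// Count every k-mer found in the short reads. Reads shorter than k are skipped.
KmerCounts count_kmers(const std::vector<std::string>& short_reads, std::size_t k) {
    assert(k > 0);
    KmerCounts counts;
    for (const std::string& read : short_reads) {
        if (read.size() < k) continue;
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            ++counts[read.substr(i, k)];
        }
    }
    return counts;
}

// Edges are implicit: a k-mer's successors are the observed k-mers that
// overlap its last k-1 bases. Each successor is returned with its weight.
std::vector<std::pair<std::string, std::uint32_t>>
successors(const KmerCounts& counts, const std::string& kmer) {
    assert(!kmer.empty());
    std::vector<std::pair<std::string, std::uint32_t>> out;
    const std::string suffix = kmer.substr(1);
    const std::string bases = "ACGT";
    for (char base : bases) {
        const std::string next = suffix + base;
        const auto it = counts.find(next);
        if (it != counts.end()) {
            out.emplace_back(it->first, it->second);
        }
    }
    return out;
}
```

For example, calling `count_kmers` on the reads `ACGTA` and `CGTAC` with `k = 3` yields four distinct k-mers, and `successors` of `CGT` returns the single k-mer `GTA` with weight 2. In transcriptome data these weights reflect transcript abundance as well as sequencing depth, which is why the graph carries them rather than treating every k-mer equally.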

We tested TALC on a dataset of short and long reads generated for this study. TALC correction results in more accurate reads with fewer structural errors than existing methods.

TALC is implemented in C++ and available at https://gitlab.igh.cnrs.fr/lbroseus/TALC.

Authors: Lucile Broseus, Aubin Thomas, Andrew Oldfield, Dany Severac, Emeric Dubois, William J Ritchie