Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures, but they are error-prone. Numerous hybrid correction algorithms have been developed for genomic data; they correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited to correcting the more complex data produced by transcriptome sequencing.
We have created a novel algorithm called TALC (Transcription-Aware Long Read Correction) that models changes in RNA expression and isoform representation in a weighted de Bruijn graph to correct long reads from transcriptome studies.
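To illustrate the general idea of a weighted de Bruijn graph, the following minimal Python sketch builds one from short reads, storing for each edge the number of times the corresponding (k+1)-mer was observed. It is not TALC's implementation: the function name `build_weighted_dbg`, the toy reads and the choice of k = 3 are assumptions made purely for illustration. The toy data mimic two isoforms that share a prefix and are expressed at different levels.

```python
from collections import Counter, defaultdict


def build_weighted_dbg(reads, k):
    """Build a weighted de Bruijn graph as {kmer: {next_kmer: count}}.

    Each edge joins two k-mers that overlap by k-1 bases; its weight is
    the number of times that adjacency was seen across all reads.
    """
    edge_counts = Counter()
    for read in reads:
        # Slide over every pair of consecutive k-mers in the read.
        for i in range(len(read) - k):
            src = read[i:i + k]
            dst = read[i + 1:i + k + 1]
            edge_counts[(src, dst)] += 1

    graph = defaultdict(dict)
    for (src, dst), weight in edge_counts.items():
        graph[src][dst] = weight
    return dict(graph)


# Hypothetical short reads: isoform A is sampled three times,
# isoform B once. Both start with the same sequence "ACGTT".
isoform_a = "ACGTTGCA"
isoform_b = "ACGTTCAG"
reads = [isoform_a] * 3 + [isoform_b]

graph = build_weighted_dbg(reads, k=3)

# The shared region carries the summed coverage of both isoforms.
print(graph["ACG"])  # {'CGT': 4}

# At the branching k-mer, edge weights reflect each isoform's abundance.
print(graph["GTT"])  # {'TTG': 3, 'TTC': 1}
```

In this toy graph, the k-mer `GTT` has two outgoing edges whose weights differ with isoform abundance. A transcriptome-aware corrector can use this kind of coverage information when choosing between alternative paths, instead of always following the heaviest edge.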
We tested TALC on a dataset of short and long reads generated for this study. TALC correction results in more accurate reads with fewer structural errors than existing methods.
TALC is implemented in C++ and available at https://gitlab.igh.cnrs.fr/lbroseus/TALC.