The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read tools

Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to small library sizes and high sequence error rates, which decrease quantification accuracy and reduce power for statistical testing.

Here, we report the analysis of two nanopore sequencing RNA-seq datasets with the goal of obtaining gene-level and isoform-level differential expression information.

We analysed a dataset of synthetic, spliced, spike-in RNAs (“sequins”) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1, using a mix of long-read-specific tools for preprocessing together with established short-read RNA-seq methods. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results from the mouse dataset to a previous short-read study on equivalent samples.

Overall, our work shows that transcriptomic analysis of long-read nanopore data using short-read software and methods that are already in wide use can yield meaningful results.

Authors: Xueyi Dong, Luyi Tian, Quentin Gouil, Hasaru Kariyawasam, Shian Su, Ricardo De Paoli-Iseppi, Yair David Joseph Prawer, Michael B. Clark, Kelsey Breslin, Megan Iminitoff, Marnie E. Blewitt, Charity W. Law, Matthew E. Ritchie