LIQA: Long-read Isoform Quantification and Analysis

Long-read RNA sequencing (RNA-seq) technologies have made it possible to sequence full-length transcripts, facilitating the exploration of isoform-specific gene expression beyond what conventional short-read RNA-seq allows. However, long-read RNA-seq suffers from a high per-base error rate, the presence of chimeric reads and alternative alignments, and other biases, all of which require different analysis methods than short-read RNA-seq.

Here we present LIQA (Long-read Isoform Quantification and Analysis), an Expectation-Maximization-based statistical method to quantify isoform expression and detect differential alternative splicing (DAS) events using long-read RNA-seq data. Rather than summarizing isoform-specific read counts directly, as is done in short-read methods, LIQA incorporates base-pair quality scores and isoform-specific read length information to assign different weights across reads, so that each read's contribution reflects its alignment confidence.
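To make the idea of weighting reads inside an Expectation-Maximization procedure concrete, the following is a minimal sketch and not LIQA's actual implementation. It assumes a hypothetical 0/1 read-to-isoform compatibility matrix and a per-read weight in (0, 1] that has already been derived elsewhere (for example from quality scores and read length); the function name `weighted_em` and its parameters are illustrative only.

```python
def weighted_em(compat, weights, n_iter=500, tol=1e-10):
    """Estimate relative isoform abundances with a read-weighted EM.

    compat  : list of rows, compat[i][k] = 1 if read i is compatible
              with isoform k, else 0 (hypothetical input format).
    weights : one positive weight per read (assumed precomputed).
    Every read must be compatible with at least one isoform.
    """
    n_iso = len(compat[0])
    theta = [1.0 / n_iso] * n_iso      # start from uniform abundances
    total_w = sum(weights)

    for _ in range(n_iter):
        acc = [0.0] * n_iso
        for row, w in zip(compat, weights):
            # E-step: posterior probability that this read came from
            # each isoform, given the current abundance estimates.
            joint = [c * t for c, t in zip(row, theta)]
            norm = sum(joint)
            for k in range(n_iso):
                acc[k] += w * joint[k] / norm
        # M-step: weighted average of the posteriors, so low-confidence
        # reads influence the estimate less than high-confidence reads.
        new_theta = [a / total_w for a in acc]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy example: two reads unique to isoform 0, one unique to isoform 1,
# and one ambiguous read that is down-weighted.
reads = [[1, 0], [1, 0], [0, 1], [1, 1]]
read_weights = [1.0, 1.0, 1.0, 0.5]
print(weighted_em(reads, read_weights))
```

In this sketch the ambiguous read is split between the two isoforms in proportion to the current estimates, and its lower weight limits how much it can shift them. A real method would also need to model read-length and coverage biases, which are omitted here.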

Moreover, given isoform usage estimates, LIQA can detect DAS events between conditions. We evaluated LIQA’s performance on simulated data and demonstrated that it outperforms other approaches in rare isoform characterization and in detecting DAS events between two groups. We also generated one direct mRNA sequencing dataset and one cDNA sequencing dataset using the Oxford Nanopore long-read platform, both with paired short-read RNA-seq data and qPCR data on selected genes, and we demonstrated that LIQA performs well in isoform discovery and quantification.

Finally, we evaluated LIQA on a PacBio dataset from esophageal squamous epithelial cells and demonstrated that LIQA recovered DAS events in FGFR3 that were not detected in short-read data. In summary, LIQA leverages the power of long-read RNA-seq and achieves higher accuracy in estimating isoform abundance than existing approaches, especially for isoforms with low coverage and biased read distribution.

Authors: Yu Hu, Li Fang, Xuelian Chen, Jiang F. Zhong, Mingyao Li, Kai Wang