Epstein-Barr virus long non-coding RNA RPMS1 full-length spliceome in transformed epithelial tissue

Epstein-Barr virus is associated with two types of epithelial neoplasms: nasopharyngeal carcinoma and gastric adenocarcinoma. The viral long non-coding RNA RPMS1 is the most abundantly expressed polyadenylated viral RNA in these malignant tissues. The RPMS1 gene is known to contain two cassette exons, exon Ia and exon Ib, and several alternative splicing variants have been described in low-throughput studies. To characterize the entire RPMS1 spliceome, we combined long-read sequencing data from the nasopharyngeal carcinoma cell line C666-1 and a primary gastric adenocarcinoma with complementary short-read sequencing datasets.

We developed FLAME, a Python-based bioinformatics package that generates a complete, high-resolution characterization of RNA splicing at full transcript length. Using FLAME, we identified 32 novel exons in the RPMS1 gene, primarily within the large constitutive exons III, V and VII. Two of the novel exons included the retained intron between exon III and exon IV, and a novel cassette exon was identified between exon VI and exon VII. All previously described RPMS1 transcript variants containing putative ORFs were detected at varying levels of abundance. Similarly, native transcripts with the potential to form previously reported circular RNA elements were detected.

Our work illuminates the multifaceted nature of viral transcriptional repertoires. FLAME provides a comprehensive overview of the relative abundance of alternative splice variants and reveals a wealth of previously unknown splicing events.
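As a rough illustration of what a relative abundance overview of splice variants involves, the sketch below turns each long read's aligned blocks into a chain of exon names and tallies the fraction of reads per chain. It is a minimal sketch under stated assumptions and is not FLAME's interface or algorithm: the function names, the boundary tolerance, and the exon coordinates are all hypothetical, and the coordinates are not real RPMS1 positions.

```python
from collections import Counter

# Hypothetical exon annotation: name -> (start, end), inclusive coordinates.
# These values are invented for illustration; they are NOT RPMS1 coordinates.
EXONS = {
    "I": (100, 200),
    "Ia": (300, 350),
    "II": (500, 600),
    "III": (800, 1000),
}


def classify_read(blocks, exons=EXONS, tolerance=5):
    """Translate one read's aligned blocks into a tuple of exon names.

    A block matches an annotated exon when both of its boundaries lie
    within `tolerance` bases of the exon boundaries (an assumed rule).
    Unmatched blocks are labelled by coordinate as candidate novel exons.
    """
    chain = []
    for start, end in blocks:
        for name, (e_start, e_end) in exons.items():
            if abs(start - e_start) <= tolerance and abs(end - e_end) <= tolerance:
                chain.append(name)
                break
        else:
            # No annotated exon matched this block.
            chain.append(f"novel:{start}-{end}")
    return tuple(chain)


def variant_abundance(reads):
    """Return the fraction of reads supporting each exon chain."""
    counts = Counter(classify_read(blocks) for blocks in reads)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


# Toy input: each read is a list of aligned (start, end) blocks.
reads = [
    [(100, 200), (500, 600), (800, 1000)],
    [(101, 200), (500, 598), (800, 1000)],              # same variant, noisy ends
    [(100, 200), (300, 350), (500, 600), (800, 1000)],  # includes cassette exon Ia
    [(100, 200), (500, 600), (850, 900)],               # unannotated block
]
abundance = variant_abundance(reads)
```

In this toy input, two of the four reads collapse into the same I-II-III chain despite slightly different alignment boundaries, which is why a tolerance window is used in place of exact coordinate matching. The block at 850-900 falls inside the annotated exon III but matches neither of its boundaries, so it is reported as a candidate novel exon.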

Authors: Isak Holmqvist, Alan Bäckerholm, Guojiang Xie, Yarong Tian, Kaisa Thorell, Ka-Wei Tang