isoCirc catalogs full-length circular RNA isoforms in human transcriptomes


Yi Xing began his talk by describing how human genes can generate distinct isoforms through alternative splicing of precursor mRNA (pre-mRNA), which usually joins exons in a linear order from the 5’ end towards the 3’ end. In some cases, however, splicing occurs in a non-linear fashion through a process called ‘backsplicing’, in which a downstream 5’ splice site is joined back to an upstream 3’ splice site. This generates a distinct class of RNA molecules known as circular RNAs (circRNAs).
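
To make the geometry of backsplicing concrete, the following Python sketch builds the sequence of a circRNA from a list of exon sequences and shows the junction it creates. It is a toy model, not part of isoCirc: the functions `circularize` and `backsplice_junction` are hypothetical, and the sketch assumes every exon between the two backspliced exons is retained.

```python
# Toy model of backsplicing (hypothetical helpers, not part of isoCirc).
# Exons are given 5'->3' in transcript order as plain strings.

def circularize(exons: list, upstream: int, downstream: int) -> str:
    """Return the circRNA formed when the 5' splice site of exon `downstream`
    is joined back to the 3' splice site of exon `upstream`.
    Assumes every exon between them is retained (no internal splicing)."""
    if not 0 <= upstream <= downstream < len(exons):
        raise ValueError("need 0 <= upstream <= downstream < len(exons)")
    return "".join(exons[upstream:downstream + 1])


def backsplice_junction(exons: list, upstream: int, downstream: int,
                        flank: int = 4) -> str:
    """Sequence across the back-spliced junction: the 3' end of the
    downstream exon followed by the 5' start of the upstream exon."""
    return exons[downstream][-flank:] + exons[upstream][:flank]


exons = ["AAACCC", "GGGTTT", "CATCAT"]
print(circularize(exons, 0, 1))                   # AAACCCGGGTTT
print(backsplice_junction(exons, 0, 1, flank=3))  # TTTAAA
```

Reading the circle twice (`circ * 2`) exposes the junction `TTTAAA` as a contiguous substring, whereas linear splicing of the same two exons would place `CCCGGG` at the exon boundary instead.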

In the last decade, short-read sequencing has been the primary approach for discovering circRNAs. The general strategy is to deplete ribosomal and linear RNAs from the total RNA pool, sequence the remaining RNA, and map the reads back to the reference genome or transcriptome to identify and quantify the back-spliced junctions that are characteristic of circRNAs.
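
The core test behind back-spliced junction detection can be sketched as follows: when a read is split into two local alignments, a linear splice places the second piece downstream of the first, while a back-splice places it upstream (a ‘head-to-tail’ arrangement). This is a minimal Python illustration under stated assumptions (both pieces on the same chromosome, a + strand transcript, 0-based half-open coordinates); the `Segment` class and `call_junction` function are hypothetical, and real circRNA callers apply many additional filters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One local alignment of a read (hypothetical type), listed in read
    order. Coordinates are 0-based, half-open, on the genomic + strand."""
    chrom: str
    start: int
    end: int


def call_junction(first: Segment, second: Segment) -> Optional[tuple]:
    """Classify the junction between two consecutive pieces of a read from a
    + strand transcript. Returns (kind, donor, acceptor) or None."""
    if first.chrom != second.chrom:
        return None  # inter-chromosomal: not a simple splice junction
    donor, acceptor = first.end, second.start
    if second.start >= first.end:
        return ("linear", donor, acceptor)      # second piece lies downstream
    if second.start <= first.start:
        return ("backsplice", donor, acceptor)  # head-to-tail: second piece upstream
    return None  # partially overlapping pieces: ambiguous, left uncalled


print(call_junction(Segment("chr1", 1000, 1100), Segment("chr1", 2000, 2100)))
# ('linear', 1100, 2000)
print(call_junction(Segment("chr1", 2000, 2100), Segment("chr1", 1000, 1100)))
# ('backsplice', 2100, 1000)
```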

A number of publications over the past few years have documented that circRNAs are abundant in mammalian transcriptomes and play roles in regulating transcription, in protein translation, and in regulating the targets of microRNAs (miRNAs) or RNA-binding proteins. Identifying the full-length sequences of circRNAs is therefore important for understanding their function, for example to study the proteins translated from circRNAs, or to identify circRNAs that act as ‘molecular sponges’ for miRNAs or RNA-binding proteins. This requires knowing both the backsplice junction sequence and the internal sequence of the circRNA. Traditional short-read sequencing cannot determine this directly, because a short read typically covers only the junction or a small part of the circle; this limitation is highlighted by a number of publications that have attempted to reconstruct full-length circRNAs from short-read data.
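
As one illustration of why the full circular sequence matters, the sketch below counts candidate miRNA binding sites in a circRNA, including sites that straddle the backsplice junction, which a search of the sequence as a linear string would miss. It assumes a simplified seed-match model (a 7-nt site complementary to miRNA positions 2 to 8, RNA alphabet only) and uses hypothetical function names; practical target prediction considers further site types and sequence context.

```python
# Simplified seed-match model (hypothetical helpers): a 7-nt site that is
# complementary to miRNA positions 2-8. RNA alphabet (A, C, G, U) only.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Target-site sequence (5'->3') pairing with miRNA nucleotides 2-8."""
    return mirna[1:8].translate(COMPLEMENT)[::-1]


def count_sites_circular(circ: str, site: str) -> int:
    """Count occurrences of `site` in a circular sequence, including sites that
    span the backsplice junction (the join between the last and first base)."""
    k = len(site)
    wrapped = circ + circ[:k - 1]  # extend by k-1 bases so junction-spanning sites appear
    # One window per circular start position, so no site is counted twice.
    return sum(1 for i in range(len(circ)) if wrapped[i:i + k] == site)


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = seed_site(let7a)                  # "CUACCUC"
circ = "ACCUC" + "G" * 9 + "CU"          # site split across the junction
print(site in circ)                      # False: a linear search misses it
print(count_sites_circular(circ, site))  # 1
```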

Yi Xing explained that addressing this critical gap was the motivation for developing isoCirc, which uses nanopore sequencing to define full-length circRNA isoforms. He then presented the general workflow. Starting with total RNA, ribosomal and linear RNA transcripts are first depleted, and a random primer is then used to initiate reverse transcription on the circRNA template. The product is a ligated circular DNA, which is amplified by rolling circle amplification and sequenced on a nanopore platform. Each resulting read contains multiple concatemeric copies of a given circRNA template, which are used to derive a consensus sequence. These consensus sequences are then mapped back to the reference genome using stringent criteria to identify high-confidence backsplice junctions and full-length spliced circRNA isoforms.
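
The consensus step can be illustrated with a deliberately simplified Python sketch: estimate the length of the repeating unit from how well the read matches a shifted copy of itself, then take a per-position majority vote across the copies. This is an assumption-laden toy rather than isoCirc's actual algorithm; it handles substitutions but not the insertions and deletions common in nanopore reads, which alignment-based consensus tools are designed for. The function names and the `min_identity` threshold are hypothetical.

```python
from collections import Counter


def estimate_period(read: str, min_len: int, max_len: int,
                    min_identity: float = 0.8) -> int:
    """Smallest shift p at which the read agrees with itself shifted by p
    on at least `min_identity` of overlapping positions (hypothetical helper)."""
    for p in range(min_len, min(max_len, len(read) // 2) + 1):
        overlap = len(read) - p
        matches = sum(a == b for a, b in zip(read, read[p:]))
        if matches / overlap >= min_identity:
            return p
    raise ValueError("no tandem repeat period found in the given range")


def consensus(read: str, period: int) -> str:
    """Per-position majority vote across the tandem copies of length `period`.
    Handles substitutions only; real reads also need indel-aware alignment."""
    columns = [Counter() for _ in range(period)]
    for i, base in enumerate(read):
        columns[i % period][base] += 1
    return "".join(col.most_common(1)[0][0] for col in columns)


unit = "ACGTTGCAAG"                  # stand-in for a circRNA sequence
bases = list(unit * 4 + unit[:5])    # four full copies plus a partial one
bases[13] = "A"                      # simulated sequencing error in copy 2
read = "".join(bases)
p = estimate_period(read, min_len=5, max_len=20)
print(p, consensus(read, p))         # 10 ACGTTGCAAG
```

In this toy model the consensus starts wherever the read happens to start, so it is a rotation of the circular sequence; a mapper would need to handle a backsplice junction that falls inside the consensus.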

As a proof of concept, Yi Xing applied isoCirc to multiple biological replicates of the HEK293 cell line, using reads mapped to the KDM1A gene as an example. isoCirc identified multiple circRNA isoforms from KDM1A, and he highlighted the four most abundant. Comparing these isoforms, he showed that some used distinct backsplice junctions, whereas others shared the same backsplice junction but differed by an alternative splicing event involving exon 2 within the circRNA. This dataset was used to comprehensively evaluate the reproducibility of isoCirc for detecting and quantifying circRNA isoforms.
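
Reproducibility between replicates is often summarized by correlating isoform abundances on a log scale. The sketch below computes a Pearson correlation of log-transformed counts for isoforms detected in either replicate. The talk summary does not say which statistic was used, so the metric, the pseudocount of 1, the function name and the isoform IDs are all assumptions for illustration.

```python
import math


def log_pearson(rep1: dict, rep2: dict, pseudocount: float = 1.0) -> float:
    """Pearson correlation of log(count + pseudocount) between two replicates,
    over isoforms detected in either one (missing isoforms count as 0)."""
    ids = sorted(set(rep1) | set(rep2))
    x = [math.log(rep1.get(i, 0) + pseudocount) for i in ids]
    y = [math.log(rep2.get(i, 0) + pseudocount) for i in ids]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant abundances")
    return cov / (sx * sy)


# Made-up isoform IDs and counts, for illustration only.
rep1 = {"circ_A": 120, "circ_B": 15, "circ_C": 3, "circ_D": 0}
rep2 = {"circ_A": 98, "circ_B": 22, "circ_C": 5, "circ_E": 1}
print(round(log_pearson(rep1, rep2), 3))
```

The log transform with a pseudocount keeps a few highly abundant isoforms from dominating the correlation and lets isoforms missing from one replicate contribute as zeros.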

Next, isoCirc was applied to a broad panel of 12 human tissues. It identified tens of thousands of circRNA isoforms in each tissue, with considerably more in tissues such as testis and brain, which is consistent with prior knowledge of circRNA sequence complexity in these tissues. Aggregating the isoforms across tissues yielded 107,147 circRNA isoforms in total, of which 40,628 were longer than 500 nucleotides and 8,601 longer than 1,000 nucleotides.

He stated that the isoCirc data could also be used to study alternative splicing within circRNAs. Focusing on the linear part of the circRNA isoforms, isoCirc identified over 5,000 alternative splicing events, covering all types of alternative splicing patterns. Compared with previous short-read reconstructions, isoCirc found a disproportionately high number of retained intron events. Exon-intron circRNAs have been studied previously and shown to interact with chromatin to regulate transcription; however, Yi Xing explained that it was difficult for these earlier studies to comprehensively identify the exons and introns of circRNAs using short-read sequencing, because human introns are large. In his dataset, isoCirc identified 804 exon-intron circRNAs. For example, for the PRPSAP1 gene, the predominant circRNA in tissues such as brain contained the retained intron, whereas in other tissues such as blood the intron was spliced out, indicating tissue-specific regulation. Yi Xing highlighted that isoCirc could also identify other alternative splicing events, such as exon skipping or cryptic exon inclusion.
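
Tissue-specific intron retention, as in the PRPSAP1 example, can be quantified with a percent-spliced-in (PSI) style ratio: the fraction of informative reads that retain the intron. The following sketch assumes read counts for the retained and spliced forms are already available per tissue; the function names are hypothetical, and this is a common generic metric, not necessarily the one used in the talk.

```python
def retention_psi(retained: int, spliced: int) -> float:
    """Fraction of informative reads supporting intron retention:
    retained / (retained + spliced). Hypothetical helper."""
    total = retained + spliced
    if total == 0:
        raise ValueError("no reads are informative for this event")
    return retained / total


def delta_psi(tissue_a: tuple, tissue_b: tuple) -> float:
    """Difference in retention PSI between two tissues, each given as
    (retained_count, spliced_count)."""
    return retention_psi(*tissue_a) - retention_psi(*tissue_b)


# Made-up counts, not data from the talk.
print(retention_psi(45, 5))  # 0.9
print(delta_psi((45, 5), (5, 45)))
```

A large positive `delta_psi` between two tissues corresponds to the pattern described above, where one tissue predominantly keeps the intron and the other splices it out.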

Yi Xing concluded that short-read RNA sequencing cannot experimentally determine the full-length sequence of circRNAs. isoCirc addresses this by sequencing full-length circRNA isoforms using rolling circle amplification followed by nanopore sequencing. Applied to 12 human tissues and the HEK293 cell line, it catalogued 107,147 full-length circRNA isoforms and revealed widespread alternative splicing events within circRNAs.

Authors: Yi Xing