Powered by long nanopore reads, liver transcriptome analysis reveals new clues about cancer

Just as short-read RNA-seq technology provided gene expression data that were not obtainable with earlier methods such as microarrays, long nanopore reads are now unlocking deeper insights into transcriptomics that were previously out of reach.

The assembly process required to stitch short reads back into full-length transcripts leads to data that does not always accurately capture the native biology, precluding investigation of the potential significance of alternative splicing and ratios of specific isoforms. Scientists looking for a more reliable representation of full-length transcripts for cDNA analysis are now turning to nanopore sequencing technology for its capacity to produce long reads capable of spanning entire isoforms, without the need for assembly. This is allowing researchers to reveal novel disease mechanisms that would be missed by traditional techniques, providing new insights into biology and health.

This approach was used in a liver cancer research study that offers a compelling view of how long nanopore reads can overcome the limitations associated with conventional RNA-seq analysis. Scientists in Japan used a MinION sequencing device from Oxford Nanopore Technologies to analyse 84 liver research samples: half from cases of hepatocellular carcinoma and a matched set from livers without cancer¹. The liver cancer research samples included cancers driven by hepatitis B and hepatitis C, and all samples had previously been analysed with short-read RNA-seq workflows.

‘Previous studies strongly suggest that splicing variants have important roles that are not recognized without an analysis of full-length transcripts’¹

The team embarked on this project to assess the value of long nanopore reads for transcriptomic analysis. ‘The majority of previous transcriptome studies have used microarrays or short-read sequencing technologies, and therefore lacked information on full-length transcripts that may assist in the detection of splicing variants expressed from each gene’, Kiyose et al. reported in a paper published in PLoS Genetics.

The team synthesised cDNA and created libraries for nanopore sequencing on a MinION Flow Cell, with a single run for each sample; data analysis involved the use of SPLICE, an analysis pipeline developed by the scientists for long nanopore reads.

Novel findings

A comparison of gene expression levels measured in the nanopore data and in the prior short-read data showed a strong correlation, indicating that nanopore sequencing captures the biological signals reported by standard RNA-seq workflows. Beyond this concordance, however, the nanopore data revealed a wealth of transcriptomic information that traditional technology had missed.
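As a rough illustration of what a cross-platform concordance check can involve, the sketch below computes a Pearson correlation between gene-level expression values from two platforms after a log2(x + 1) transform. The gene names and values are made up, and the choice of transform and statistic are assumptions for illustration; the study's exact normalisation and correlation metric are not described here.

```python
import math


def log_expr(values, pseudocount=1.0):
    """log2-transform expression values; the pseudocount avoids log(0)."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene expression for the same sample on each platform.
genes = ["GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"]
short_read = {"GENE_1": 5, "GENE_2": 50, "GENE_3": 500, "GENE_4": 5000, "GENE_5": 20}
long_read = {"GENE_1": 4, "GENE_2": 60, "GENE_3": 450, "GENE_4": 5200, "GENE_5": 25}

r = pearson(
    log_expr([short_read[g] for g in genes]),
    log_expr([long_read[g] for g in genes]),
)
print(f"Pearson r on log2 expression: {r:.3f}")
```

The log transform matters because expression spans several orders of magnitude; on raw values, a handful of very highly expressed genes would dominate the correlation.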

The analysis identified a large number of novel transcripts. Importantly, nearly 62% of reads mapping to protein-coding genes spanned the entire coding sequence, supporting full-length isoform analysis. Of a non-redundant set of more than 60,000 transcripts in the liver cancer research samples and more than 51,000 transcripts in the controls, thousands were novel, including 5,366 protein-coding transcripts. The researchers noted that novel transcripts had lower average isoform expression levels, which may help explain why traditional RNA-seq studies had not detected them. The team also identified hundreds of novel exons in transcripts from protein-coding genes; nearly all of them were the first or last exon in the transcript.
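The idea of a read that "spans the entire coding sequence" can be made concrete with a simple coordinate check. The sketch below assumes each read has already been aligned to the genome and assigned to a gene. It compares the read's outermost aligned coordinates with that gene's CDS start and end. The `Interval` type, the half-open coordinate convention and the toy values are hypothetical and are not taken from the SPLICE pipeline. A real check would also confirm that the read's exon chain agrees with the annotated splice sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome (assumed convention)."""
    start: int
    end: int


def spans_full_cds(read: Interval, cds: Interval) -> bool:
    """True if the read's outermost aligned bases cover the whole CDS."""
    return read.start <= cds.start and read.end >= cds.end


def fraction_full_length(pairs):
    """Fraction of (read, cds) pairs in which the read spans the full CDS."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(spans_full_cds(read, cds) for read, cds in pairs) / len(pairs)


# Hypothetical reads, each paired with the CDS of the gene it was assigned to.
cds_a = Interval(1_000, 4_000)
cds_b = Interval(20_000, 21_500)
pairs = [
    (Interval(900, 4_200), cds_a),      # covers the whole CDS
    (Interval(1_500, 4_200), cds_a),    # starts inside the CDS, so it misses the CDS start
    (Interval(19_800, 21_500), cds_b),  # ends exactly at the CDS end, so it counts
    (Interval(20_100, 20_900), cds_b),  # internal fragment
]
print(fraction_full_length(pairs))  # 0.5
```

Working in genomic coordinates keeps the check strand-agnostic: whichever end is 5′, the read must reach past both CDS boundaries.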

Next, the scientists focused on differentially expressed genes, which have proven valuable in prior studies for detecting key cancer-associated genes and pathways. Comparing expression levels between liver cancer samples and control samples, they found that the gene-level analysis commonly used in RNA-seq workflows missed changes that an isoform-level analysis captured. The authors noted that ‘a comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including the LINE1-MET transcript, were not found by a gene-level analysis’. The LINE1-MET transcript, highly upregulated in cancer samples, had not been well characterised before. The researchers performed functional experiments and found that its overexpression was associated with increased cell proliferation, suggesting a link to cancer growth. This finding supported the authors’ conclusion that ‘the analysis of transcripts benefits the detection of novel driver genes’.
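Isoform switching is one way a gene-level analysis can miss transcript-level changes. If one isoform rises while another falls, the gene's summed expression may barely move. The toy sketch below shows this aggregation effect using made-up values and a simple fold-change cutoff. Real differential expression analyses use statistical tests across replicate samples rather than a fixed threshold. The sketch also illustrates only one possible mechanism: the text does not say which patterns account for the 746 genes.

```python
from collections import defaultdict

# Hypothetical transcript-level expression, keyed by (gene, transcript).
normal = {("GENE_A", "A-201"): 100.0, ("GENE_A", "A-202"): 10.0,
          ("GENE_B", "B-201"): 50.0}
tumour = {("GENE_A", "A-201"): 10.0, ("GENE_A", "A-202"): 100.0,
          ("GENE_B", "B-201"): 150.0}


def sum_by_gene(expr):
    """Collapse transcript-level values to gene level by summing isoforms."""
    totals = defaultdict(float)
    for (gene, _transcript), value in expr.items():
        totals[gene] += value
    return dict(totals)


def changed_keys(before, after, min_fold=2.0, pseudocount=1.0):
    """Keys whose fold change (after / before) is at least min_fold in either direction."""
    hits = set()
    for key in before:
        fold = (after[key] + pseudocount) / (before[key] + pseudocount)
        if fold >= min_fold or fold <= 1.0 / min_fold:
            hits.add(key)
    return hits


gene_hits = changed_keys(sum_by_gene(normal), sum_by_gene(tumour))
transcript_hits = changed_keys(normal, tumour)
genes_with_dets = {gene for gene, _transcript in transcript_hits}

print(sorted(gene_hits))                    # ['GENE_B']
print(sorted(genes_with_dets))              # ['GENE_A', 'GENE_B']
print(sorted(genes_with_dets - gene_hits))  # ['GENE_A']
```

GENE_A has two strongly changed transcripts, yet its gene-level total is identical in both conditions. Only the transcript-level comparison flags it.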

Additionally, the study explored cancer-specific fusion transcripts. Six genes were found to be fused with a variety of partner genes, leading to 164 fusion transcripts linked to cancer. Many of these were expressed at low levels, which likely explains why so few had been detected in prior analyses of these samples. When the team compared the short- and long-read data, 41 of 50 fusions (82.0%) were not detected in the short reads, and only 9 fusions (18.0%) were detected by both technologies.
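Platform comparisons like this one come down to set arithmetic over the fusion calls from each technology. The sketch below shows how the percentages are derived, with both taken relative to the 50 fusions in the comparison. The fusion IDs and the function name are placeholders. Only the counts (50 fusions, 9 detected by both) come from the study.

```python
def compare_calls(long_read_calls, short_read_calls):
    """Summarise which long-read fusion calls were also found with short reads.

    Percentages are relative to the long-read call set; short-read-only
    calls are ignored in this summary.
    """
    long_set, short_set = set(long_read_calls), set(short_read_calls)
    if not long_set:
        raise ValueError("no long-read calls to compare")
    shared = long_set & short_set
    long_only = long_set - short_set
    total = len(long_set)
    return {
        "total": total,
        "long_only": len(long_only),
        "shared": len(shared),
        "pct_long_only": round(100 * len(long_only) / total, 1),
        "pct_shared": round(100 * len(shared) / total, 1),
    }


# Placeholder IDs: 50 long-read fusions, 9 of which also appear in the short-read set.
long_calls = [f"fusion_{i:02d}" for i in range(50)]
short_calls = long_calls[:9]
print(compare_calls(long_calls, short_calls))
# {'total': 50, 'long_only': 41, 'shared': 9, 'pct_long_only': 82.0, 'pct_shared': 18.0}
```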

The authors concluded that their results ‘strongly suggest that the direct observation of transcripts with [long-read] sequencing contributes to understanding the true picture of transcript aberration in cancer’.

1. Kiyose, H. et al. Comprehensive analysis of full-length transcripts reveals novel splicing abnormalities and oncogenic transcripts in liver cancer. PLoS Genet. 18(8): e1010342 (2022). DOI: https://doi.org/10.1371/journal.pgen.1010342