Native RNA sequencing of human polyadenylated transcripts

According to Dr. Angela Brooks of the University of California, Santa Cruz: ‘Short-read sequencing has revolutionised our understanding of the transcriptome […] but there is a huge limitation’ 1. Short-read methods require the RNA molecules to be fragmented, which loses positional information, and amplified, which loses base modifications. Amplification also introduces bias, which can negatively impact results. Angela and her colleagues at the University of California, Santa Cruz are part of the Nanopore RNA Consortium, which comprises laboratories from six leading universities. The Consortium aims to generate a reference dataset for the human transcriptome, sequenced in its native form, and to share its methods and data with the scientific community.


The Consortium sequenced mRNA from the GM12878 cell line using both native nanopore RNA sequencing and cDNA sequencing, generating ~13 million and ~24 million reads respectively 28.

Initial analysis showed the median native RNA read length to be longer than the median cDNA read length, which may be due to PCR bias in the cDNA preparation.

Gene expression levels correlated well between the two techniques, and also with data obtained from short-read sequencing of the same cell line, confirming the validity of the nanopore dataset. The reproducibility of the native RNA sequencing technique was also demonstrated by the highly concordant data delivered across all Consortium laboratories.
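As a rough illustration of how expression levels from two techniques might be compared, the sketch below scales per-gene read counts to counts per million, log-transforms them and computes a Pearson correlation. It is not the Consortium's analysis pipeline: the function names, the choice of Pearson correlation on log-scaled values and the gene counts are all assumptions made up for this example.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def expression_correlation(counts_a, counts_b):
    """Correlate log2(CPM + 1) values over the genes present in both datasets."""
    cpm_a = counts_per_million(counts_a)
    cpm_b = counts_per_million(counts_b)
    genes = sorted(set(cpm_a) & set(cpm_b))
    log_a = [math.log2(cpm_a[g] + 1) for g in genes]
    log_b = [math.log2(cpm_b[g] + 1) for g in genes]
    return pearson(log_a, log_b)


# Hypothetical read counts per gene (illustrative values, not Consortium data).
native_rna = {"GENE_A": 120, "GENE_B": 15, "GENE_C": 900, "GENE_D": 40}
cdna = {"GENE_A": 260, "GENE_B": 25, "GENE_C": 1700, "GENE_D": 95}

r = expression_correlation(native_rna, cdna)
print(f"Pearson r on log2(CPM + 1): {r:.3f}")
```

Scaling to counts per million before comparing matters here because the two runs differ in total read number; a rank-based measure such as Spearman correlation would be a reasonable alternative.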

The team are now using orthogonal data to build a high-confidence set of full-length isoforms. The longest isoform detected by Angela and the Consortium was for SORL1, a gene that has been implicated in Alzheimer’s disease; the read was >10 kb long and spanned 48 exons.

The Consortium also demonstrated the potential of long-read RNA sequencing to detect allele-specific expression. Examining the coverage data for a number of nucleotide positions across the XIST gene, which is located on the X chromosome, allowed the identification of paternal expression bias (Figure 1).
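One simple way to quantify this kind of bias, sketched below, is to count the reads supporting each parental allele at a heterozygous position and test the split against the 50:50 ratio expected without bias. This is an illustrative sketch rather than the Consortium's method: it assumes the heterozygous sites are already phased to a parent, and the site names and read counts are invented.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties with the observed outcome are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allele_bias(paternal_reads, maternal_reads):
    """Return (paternal read fraction, p-value against balanced expression)."""
    total = paternal_reads + maternal_reads
    fraction = paternal_reads / total
    return fraction, binomial_two_sided_p(paternal_reads, total)


# Hypothetical (paternal, maternal) read counts at two phased heterozygous sites.
site_counts = {"site_1": (18, 2), "site_2": (10, 10)}

results = {site: allele_bias(pat, mat) for site, (pat, mat) in site_counts.items()}
for site, (fraction, p_value) in results.items():
    print(f"{site}: paternal fraction = {fraction:.2f}, p = {p_value:.4g}")
```

Sites are tested separately here because a single long read can cover several positions in the same gene, so pooling counts across positions would count the same molecule more than once.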

Another area of interest for the Consortium is the estimation of poly-A tail length, which has been shown to play a role in post-transcriptional regulation. As the nanopore sequencing adapter sits at the 3’ end of the poly-A tail, it is possible to use specific signals in the raw data, such as dwell times, to estimate the poly-A tail length.

Figure 1: Nanopore sequencing allows identification of allele-specific expression. Figure courtesy of the Nanopore RNA Consortium.

The accuracy of the poly-A tail length estimates was confirmed using spike-in controls with known tail lengths.
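The idea of converting dwell time into tail length can be sketched as follows, under the simplifying assumption that RNA passes through the pore at a roughly constant rate, so that the time spent in the poly-A segment is proportional to its length. The rate is calibrated from spike-ins of known length. The function names and all numbers are hypothetical, and this is not presented as the Consortium's estimator.

```python
from statistics import median


def calibrate_rate(spike_ins):
    """Estimate the translocation rate (nt/s) from spike-in controls.

    spike_ins: list of (known_tail_length_nt, poly_a_dwell_seconds) pairs.
    The median keeps a single outlying read from dominating the estimate.
    """
    return median(length / dwell for length, dwell in spike_ins)


def estimate_tail_length(dwell_seconds, rate_nt_per_s):
    """Estimate poly-A tail length (nt), assuming a constant translocation rate."""
    return dwell_seconds * rate_nt_per_s


# Hypothetical spike-ins: (known tail length in nt, dwell time in seconds).
spike_ins = [(30, 0.5), (60, 1.0), (100, 1.6)]

rate = calibrate_rate(spike_ins)
tail = estimate_tail_length(2.0, rate)
print(f"Calibrated rate: {rate:.1f} nt/s; estimated tail length: {tail:.0f} nt")
```

In practice the translocation rate varies between reads, so a per-read rate estimate would be expected to work better than the single global rate assumed here.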

Direct RNA sequencing also allows the analysis of base modifications which are lost when using alternative sequencing approaches. By synthesising and sequencing RNA transcripts containing only a specific, known modification and comparing these with sequences from molecules without this modification, the team were able to show clear shifts in the nanopore signal. The team are now using these model training datasets to enhance basecalling algorithms, allowing the detection of both the position and type of modification in native RNA molecules.
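A signal shift of this kind can be summarised by comparing the current levels recorded at the same sequence position in modified and unmodified molecules. The sketch below reports the difference in mean current and a standardised effect size (Cohen's d). The choice of statistic and the current values in picoamperes are assumptions for illustration only, not the team's analysis or data.

```python
import math
from statistics import mean, stdev


def signal_shift(unmodified_pa, modified_pa):
    """Return (difference in mean current, Cohen's d) for one sequence position.

    Each argument is a list of per-read mean current levels (pA) at that position.
    """
    diff = mean(modified_pa) - mean(unmodified_pa)
    n1, n2 = len(unmodified_pa), len(modified_pa)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * stdev(unmodified_pa) ** 2 + (n2 - 1) * stdev(modified_pa) ** 2
    ) / (n1 + n2 - 2)
    return diff, diff / math.sqrt(pooled_var)


# Hypothetical current levels (pA) at one position, five reads per condition.
unmodified = [100.0, 101.0, 99.0, 100.5, 99.5]
modified = [104.0, 105.0, 103.0, 104.5, 103.5]

diff, effect_size = signal_shift(unmodified, modified)
print(f"Mean shift: {diff:.1f} pA, Cohen's d: {effect_size:.2f}")
```

Standardising by the pooled spread distinguishes a consistent shift from a difference that is small relative to read-to-read noise.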

The Nanopore RNA Consortium data is available at: github.com/nanopore-wgs-consortium/NA12878

This case study is taken from the human white paper.

1. Brooks, A. Native RNA sequencing of polyadenylated transcripts. Available at: https://nanoporetech.com/resource-centre/native-rnasequencing-human-polyadenylated-transcripts [Accessed: 1 August 2018].