Long-fragment, strand-specific PCR–cDNA libraries and bioinformatics tools for transcriptome analysis

Poster

Date: 22nd May 2019

Strand-specific, full-length cDNA reads are ideal for isoform reconstruction, differential gene expression analysis, identification of fusion transcript breakpoints and characterisation of lncRNAs.

Fig. 1 Isoform reconstruction using stranded reads a) IGV plot b) read length c) performance

New PCR–cDNA protocol gives strand-specific sequences

We generated a transcriptome-wide dataset for Drosophila melanogaster, mapped the reads using minimap2 and visualised the results using IGV. The transcript strand is preserved (Fig. 1a). Individual reads cover a good fraction of their reference transcripts, for transcripts up to ~5 kb in length (Fig. 1b). Because the reverse transcription and strand-switching primer sequences are present in the cDNA sequences, we can select full-length reads from the dataset (52.2% of total reads). We used the pinfish analysis pipeline (http://bit.ly/ont_pinfish) to reconstruct isoforms and compared the reconstructed isoform set to the Ensembl gene annotation using gffcompare. We saw excellent base- and exon-level precision, along with good transcript-level precision (Fig. 1c).
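The primer-based selection of full-length reads can be illustrated with a minimal sketch. It assumes a read is full-length when one primer is found near its start and the reverse complement of the other near its end, and that the order of the two primers gives the strand. The primer sequences, the function names and the 20-base search window are hypothetical placeholders, not the kit sequences or the method used to produce the figures; real reads would also need error-tolerant matching.

```python
# Placeholder primer sequences for illustration only (NOT the real kit primers).
SSP = "ACGTTGCA"  # strand-switching primer, marks the transcript 5' end
VNP = "GGATCCAT"  # reverse-transcription primer, marks the transcript 3' end

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_read(seq, window=20):
    """Return '+' or '-' for a full-length read, or None otherwise.

    A read counts as full-length when one primer is found within `window`
    bases of its start and the reverse complement of the other primer is
    found within `window` bases of its end (exact matching only).
    """
    head, tail = seq[:window], seq[-window:]
    if SSP in head and reverse_complement(VNP) in tail:
        return "+"  # read is in transcript (sense) orientation
    if VNP in head and reverse_complement(SSP) in tail:
        return "-"  # read is the reverse complement of the transcript
    return None


def full_length_fraction(reads):
    """Fraction of reads that carry both primers in a consistent order."""
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if classify_read(seq) is not None)
    return hits / len(reads)
```

Reads classified as '-' can be reverse-complemented so that every retained read is in transcript orientation before mapping.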

Fig. 2 Differential gene expression a) MA plot b) largest differences c) SELO transcript usage

Measuring response to radiation exposure by differential gene expression analysis

Blood was drawn from three healthy volunteers and half of each sample was irradiated with a dose of 2 Gy. A minimum of 40 million aligned reads per sample was analysed using a pipeline (http://bit.ly/ont_trs_de) based on Love et al. (2018, DOI: 10.12688/f1000research.15398.2). Briefly, reads were mapped to the transcriptome using minimap2 and per-transcript read counts were estimated with salmon. Differential gene expression was detected using the quasi-likelihood method provided by the edgeR Bioconductor package. Fig. 2a shows an MA plot of the results. Fig. 2b shows the top 15 hits, with purple indicating published radiation-response genes. Differential transcript usage was detected using the DEXSeq Bioconductor package (Fig. 2c).
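For readers unfamiliar with MA plots, the sketch below shows how the two coordinates can be derived from per-gene read counts: M is the log2 fold change between conditions and A is the mean log2 expression. It is a simplified illustration using counts-per-million scaling and a pseudocount; the function name and the pseudocount value are assumptions, and it does not reproduce the normalisation or the quasi-likelihood test performed by edgeR.

```python
import math


def ma_values(control_counts, treated_counts, pseudocount=0.5):
    """Compute (M, A) per gene from two dicts of raw read counts.

    Counts are scaled to counts per million (CPM) within each condition.
    M = log2(treated CPM) - log2(control CPM)   (log fold change)
    A = mean of the two log2 CPM values         (average expression)
    """
    total_control = sum(control_counts.values())
    total_treated = sum(treated_counts.values())
    result = {}
    for gene, control in control_counts.items():
        treated = treated_counts.get(gene, 0)
        log_c = math.log2((control + pseudocount) / total_control * 1e6)
        log_t = math.log2((treated + pseudocount) / total_treated * 1e6)
        result[gene] = (log_t - log_c, (log_t + log_c) / 2)
    return result
```

Plotting M against A for every gene gives the familiar funnel shape, with differentially expressed genes lying far from M = 0.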

Fig. 3 Targeted detection a) chr22 translocations b) detection protocol c) fusion breakpoint

Sequencing semi-specific RT-PCR products enables characterisation of fusion genes

The q12 region of human chromosome 22 can be involved in several different translocation events (Fig. 3a). In each of these, a fusion gene is formed by addition of chromosomal material from one of the fusion partners onto exon 7/8 of the EWSR1 gene on derived chromosome 22. These fusion genes lead to different types of cancer. We reverse-transcribed total RNA from a patient with a known translocation affecting the EWSR1 gene, using a poly-TVN primer, and amplified the cDNA semi-specifically (Fig. 3b). The wild-type EWSR1 amplicon and the fusion amplicon were 2.2 and 3.1 kb, respectively. Sequencing the amplicons and aligning them to the EWSR1 reference allowed us to pinpoint the position of the breakpoint and to identify the fusion partner (Fig. 3c), which we found to have arisen from a common Ewing’s sarcoma translocation.
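One way to locate a breakpoint from such alignments is sketched below. It assumes each read is aligned in forward orientation to the EWSR1 reference, so that the fusion-partner sequence appears as a long clipped tail in the CIGAR string and the last aligned reference base marks the candidate breakpoint. The function names and the 100-base clip threshold are hypothetical, and this is not necessarily the analysis behind Fig. 3c.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def alignment_end_and_clip(pos, cigar):
    """Return (last aligned reference base, trailing clip length).

    `pos` is the 1-based leftmost reference position of the alignment,
    as in the SAM POS field.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    ref_span = sum(length for length, op in ops if op in REF_CONSUMING)
    trailing_clip = ops[-1][0] if ops and ops[-1][1] in "SH" else 0
    return pos + ref_span - 1, trailing_clip


def candidate_breakpoints(alignments, min_clip=100):
    """Count alignment end positions among reads with a long clipped tail.

    `alignments` is an iterable of (pos, cigar) pairs.
    """
    ends = Counter()
    for pos, cigar in alignments:
        ref_end, clip = alignment_end_and_clip(pos, cigar)
        if clip >= min_clip:
            ends[ref_end] += 1
    return ends
```

A position supported by many clipped reads is a candidate breakpoint, and the clipped sequence can then be aligned to the genome to identify the fusion partner.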

Fig. 4 Characterising lncRNAs a) laboratory workflow b) detected lncRNA isoforms

Investigating a poorly characterised long non-coding RNA with long cDNA reads

Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. lncRNAs regulate gene expression, and there are thought to be ~30,000 different lncRNA transcripts in humans, making them the major constituent of the non-coding transcriptome. We sought to characterise a family of lncRNA isoforms whose knockdown is suspected to produce a cancer-killing phenotype. However, only a short section of the sequence was known. We used rounds of semi-specific PCR to identify the 5’ and 3’ ends of the lncRNA, followed by further rounds of semi-specific PCR to establish the correct 5’ and 3’ pairing (Fig. 4a). We identified many different complete isoforms of this lncRNA, including several that were previously unknown. Fig. 4b shows the most abundant isoforms.
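Ranking isoforms by abundance can be sketched as follows, assuming each aligned read has already been reduced to a list of exon intervals. Reads are grouped by their intron chain (the ordered set of splice junctions), so small differences at the read ends do not split an isoform. The function names and the input format are illustrative assumptions rather than the workflow used for Fig. 4b.

```python
from collections import Counter


def intron_chain(exons):
    """Return the splice junctions of a read as a tuple of (donor, acceptor).

    `exons` is a list of (start, end) reference intervals for one read.
    """
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)
    )


def rank_isoforms(reads):
    """Group reads by intron chain and return chains ordered by read support."""
    counts = Counter(intron_chain(exons) for exons in reads)
    return counts.most_common()
```

Real nanopore alignments often need some tolerance around junction positions before grouping, which this exact-match sketch leaves out.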