Transposable element expression at unique loci in single cells with CELLO-seq

Rebecca (Cambridge University, CRUK-CI) stated that ‘the role of Transposable Elements (TEs) in regulating diverse biological processes, from early development to cancer, is becoming increasingly appreciated’. The challenge in studying them is that TEs are highly repetitive, so cDNA reads from short-read sequencing cannot be unambiguously mapped to specific genomic loci.

As a result, Rebecca’s team have developed CELLO-seq, a plate-based, single-cell, long-read RNA sequencing method. Rebecca described the end-to-end workflow, including both the laboratory-based steps and their analytical pipeline (Sarlacc pipeline).

To validate the performance of CELLO-seq, 96 human induced pluripotent stem cells (iPSCs) and 6 mouse 2-cell blastomere cells were isolated, and each set was sequenced on a single MinION Flow Cell. Short-read data for each library were also produced for comparison. Rebecca explained that, from ~100 iPSCs sequenced on a single flow cell, ‘we can evaluate 1,000 genes with around 10,000 reads’; ~10 million reads were obtained per flow cell, of which around 75% were demultiplexed. If 6 cells were loaded, they could therefore identify 5,000 genes with ~200,000 reads each. Rebecca further stated that, with nanopore sequencing, ‘we were able to get full-length transcripts’.

Regarding allele-specific expression, Rebecca showed how ‘the long-read data has less bias than the short-read data’. Moreover, both known and novel isoforms could be identified, as well as isoforms derived from TEs.

Transposable element (TE) expression at unique loci

Rebecca explained that ‘we wanted to use CELLO-seq to look at transposable element expression at unique loci’. She therefore focused next on the expression of LINE transposable elements in the mouse and human genomes, showing how nanopore reads spanned the full length of these elements. The team also analysed allelic TE expression. Overall, they analysed 10,000 TEs per cell in mouse 2-cell blastomeres and 100 TEs per human iPSC.

They wanted to know more about the mappability of ‘young’ TEs (i.e. those with very high sequence identity, having arisen more recently in evolutionary history) at unique loci. Young elements from each of three TE classes (LINE, LTR, and SINE) could be seen; however, the youngest TEs could not be detected. They investigated whether this was due to the method itself, or the data produced, using transposable element simulations. They found that the problem lay with PCR duplicates not being grouped together and consequently not being mapped to the genome. Perfect grouping of the reads before mapping helped to resolve the issue, greatly increasing mappability (by 20% for mouse reads) and read identity.
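
The effect of grouping PCR duplicates on read identity can be illustrated with a small sketch. The summary does not say how grouped reads are used downstream; the example below assumes, purely for illustration, that reads sharing a UMI are collapsed into a per-position majority-vote consensus, and that errors are substitutions only (real nanopore reads also contain insertions and deletions). The function names are hypothetical and are not taken from the Sarlacc pipeline.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs into a dict mapping each UMI to its sequences."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def consensus(seqs):
    """Per-position majority vote over equal-length sequences.

    Assumption: substitution errors only, so all reads have the same length.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    truth = "ACGTACGTAC"
    # Three PCR duplicates of the same molecule, each carrying one error.
    reads = [
        ("UMI1", "TCGTACGTAC"),
        ("UMI1", "ACGAACGTAC"),
        ("UMI1", "ACGTACGTAG"),
    ]
    for umi, seqs in group_by_umi(reads).items():
        merged = consensus(seqs)
        print(umi, [identity(s, truth) for s in seqs], identity(merged, truth))
```

In this toy case each individual read differs from the true sequence at one position, while the consensus of the group recovers it, which is one way in which correct grouping could raise read identity before mapping.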

Rebecca described how they then performed a UMI simulation to work out how to achieve perfect grouping of the reads. This involved simulating 10,000 UMIs and evaluating UMI length, UMI coverage, pre-grouping vs. no pre-grouping of reads, and different Levenshtein distance thresholds between UMIs. Based on these simulations, Rebecca advocated that ‘to improve mapping of the youngest transposable elements, 50nt UMIs are necessary’.
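
A simulation of this kind can be sketched in a few lines. The version below is a minimal illustration, not the team's actual simulation: it draws random UMIs of a given length, adds substitution errors at an assumed rate, and counts how often a noisy UMI is uniquely closest (by Levenshtein distance) to its true UMI within a chosen threshold. The error rate, the substitution-only noise model, the much smaller number of UMIs (chosen so that pure Python runs quickly), and all function names are assumptions made for the example.

```python
import random


def levenshtein(a, b):
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def random_umi(length, rng):
    """Draw a uniformly random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def add_errors(seq, error_rate, rng):
    """Substitute each base independently with probability error_rate (assumed noise model)."""
    noisy = []
    for base in seq:
        if rng.random() < error_rate:
            noisy.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            noisy.append(base)
    return "".join(noisy)


def fraction_correctly_grouped(umi_length, n_umis, error_rate, max_distance, seed=0):
    """Fraction of noisy UMIs whose unique nearest true UMI, within max_distance, is their own.

    n_umis must not exceed 4 ** umi_length, otherwise unique UMIs cannot be drawn.
    """
    rng = random.Random(seed)
    unique = set()
    while len(unique) < n_umis:
        unique.add(random_umi(umi_length, rng))
    umis = sorted(unique)
    correct = 0
    for true_index, umi in enumerate(umis):
        read = add_errors(umi, error_rate, rng)
        distances = [levenshtein(read, candidate) for candidate in umis]
        best = min(distances)
        if (best <= max_distance
                and distances.count(best) == 1
                and distances.index(best) == true_index):
            correct += 1
    return correct / n_umis


if __name__ == "__main__":
    # Compare a short and a long UMI at an assumed 10% substitution rate.
    for length in (10, 50):
        frac = fraction_correctly_grouped(length, n_umis=20, error_rate=0.1,
                                          max_distance=length // 4)
        print(f"UMI length {length} nt: {frac:.2f} correctly grouped")
```

Varying `umi_length`, `error_rate` and `max_distance` in such a sketch mirrors, in miniature, the parameters the talk describes evaluating; it is not expected to reproduce the reported conclusions quantitatively.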

Authors: Rebecca Berrens