Transposable elements in human dopaminergic neurons: the impact of Parkinson’s mutations on the retrotransposome

Natalia began her talk by discussing the rationale for studying transposable elements (TEs) in the nervous system. She noted that TE expression has been shown to be elevated in the nervous system and is higher still in ageing nervous tissue. Natalia said that the overexpression of TEs observed in many neurological disorders is a benign hallmark of the condition rather than the driving mechanism. However, she pointed out that in ALS, TEs have been established as the cause underlying pathogenesis. Natalia explained that TE transcripts can have negative impacts on cells, for example, by binding to gene transcripts that contain antisense sequences, forming dsRNA species that subsequently trigger RNA silencing systems.

Natalia explained that characterising the expression of TEs is not a trivial task: there are many copies of TEs that are highly similar to one another, and this is compounded by the fact that they are found all across the genome. Natalia described the difficulties of using short reads to characterise the expression of TEs, saying that ‘it is really hard to discern whether a signal is coming from gene transcripts or transposable element transcripts, and which transposable element transcripts are expressed’. Natalia noted that the solution to this problem is to use long-read sequencing, which enables whole transcripts to be sequenced.

Natalia outlined the aims of her project, which were to characterise active TEs and their basal expression in human iPSC-derived dopaminergic neurons, and to detect candidate TE expression patterns connected to Parkinson’s disease. To do so, Natalia used a data set from the FOUNDIN PD initiative, in which, she mentioned, the vast majority of cell lines were iPSCs derived from healthy controls or from patients with Parkinson’s disease. Using 10 cell lines, both short-read and long-read data were generated, and the number of TEs detected was then compared between the two technologies.

To generate long-read data, PCR-cDNA sequencing was performed on the PromethION. For transcript isoform annotation and quantification, Natalia used FLAIR. Using this pipeline, she identified 9,640 transcripts coming from 9,638 TE loci, meaning that two loci each gave rise to two transcripts, with both the shorter and the longer isoform expressed across multiple samples. Compositionally, most of the elements were ALUs, so she was curious to see whether they were enriched in her data set or whether this was simply a consequence of ALUs dominating the repeatome of the human genome in general. Natalia confirmed that ALUs were indeed enriched in her data set, and that SVAs were enriched as well.
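
The enrichment question raised here (is a TE family over-represented among expressed loci relative to its share of the genome's repeats?) can be illustrated with a minimal sketch. This is not the method used in the talk, which the summary does not specify; it assumes a simple hypergeometric model, and every count below is a made-up placeholder rather than a value from the data set. The function names `fold_enrichment` and `hypergeom_sf` are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Fraction of the family among expressed loci divided by its
    fraction among all annotated loci (the background)."""
    return (k / n) / (K / N)


def hypergeom_sf(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: annotated TE loci in the background
    K: background loci belonging to the family of interest
    n: expressed TE loci
    k: expressed loci belonging to the family of interest
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Placeholder counts only (NOT from the talk): 1,000 annotated loci,
# 300 of them in the family; 100 expressed loci, 45 of them in the family.
N, K, n, k = 1000, 300, 100, 45
print("fold enrichment:", fold_enrichment(k, n, K, N))
print("p-value:", hypergeom_sf(k, n, K, N))
```

A fold enrichment above 1 with a small p-value would indicate that the family is over-represented among expressed loci beyond what its genomic abundance alone would predict. For genome-scale counts, a library routine would be a more practical choice than exact integer arithmetic.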

Natalia went on to talk about TE expression patterns, stating that expression at individual loci is highly variable between individuals and cell lines. She applied clustering to the expression of TEs; however, no major clustering patterns were detected, although a small subset of TEs was elevated in LRRK2-positive samples. Natalia stated that, of the 9,640 transcripts, only around 1,000 were expressed across all the samples. She also mentioned that these commonly expressed TEs were enriched in L1P and L1H elements. Natalia stated that the enrichment in L1H was particularly interesting, as these elements provide the machinery for the transposition of ALUs and SVAs. Natalia discussed her interest in determining the relationship between these commonly expressed TEs and the expression of the genes detected in the data set. Natalia explained that many of these commonly expressed TEs were intronic to a subset of genes, whilst others were antisense to these genes. Picking out these genes, Natalia performed GO enrichment analysis, with the antisense set returning terms related to neuron development.

Natalia moved on to discuss group-specific TEs, which are not very abundant. Natalia pointed out that, unsurprisingly, they were most common in the LRRK2-positive samples. She then compared the TE quantification obtained from the long-read and short-read data sets. Interestingly, fewer than half of the TE transcripts detected by nanopore cDNA sequencing were detected by short-read sequencing. Natalia also pointed out that, because of the pitfalls of short-read sequencing, there is potential for overestimation of expressed loci at the expense of TEs exapted by genes.
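
The long-read versus short-read comparison described above amounts to asking how many loci each technology detects and how many they share. The following is a minimal sketch of that bookkeeping, assuming each data set has already been reduced to a list of detected locus identifiers; the function name `detection_overlap` and the identifiers are hypothetical, and the summary does not describe how the comparison was actually carried out.

```python
def detection_overlap(long_read_loci, short_read_loci):
    """Summarise which TE loci are detected by each technology."""
    long_set = set(long_read_loci)
    short_set = set(short_read_loci)
    shared = long_set & short_set
    return {
        "shared": len(shared),
        "long_only": len(long_set - short_set),
        "short_only": len(short_set - long_set),
        # Fraction of long-read detections that short reads also recover
        "fraction_of_long_in_short": (
            len(shared) / len(long_set) if long_set else 0.0
        ),
    }


# Placeholder identifiers only (NOT loci from the talk).
long_read = ["te_a", "te_b", "te_c", "te_d"]
short_read = ["te_c", "te_d", "te_e"]
print(detection_overlap(long_read, short_read))
```

Loci counted under `short_only` would be the natural place to look for the overestimation mentioned above, since a short-read signal there may originate from a gene transcript that contains the TE rather than from an independently expressed element.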

Authors: Natalia Savytska