Retrotransposon variation in human genome and tumorigenesis
Retrotransposons are genetic sequences which transpose throughout the genome via an RNA intermediate, using reverse transcription (like "copy and paste"). Almost 50% of the human genome is derived from transposable sequences, although Kimmo said that this is "a fishy estimate" as many are mutated and so difficult to identify. Only a fraction of these are retrotransposons. Long Interspersed Nuclear Element-1s (LINE1s) are capable of retrotransposition; these are autonomous retrotransposons, encoding the proteins necessary for their insertion. LINE1 element activity is associated mostly with tumours and neurones.
Using short-read sequencing to map and genotype retrotransposons is difficult due to their length and high copy number in the reference genome. Moreover, transposable sequences are easy to mistake for translocations in short sequencing reads. Kimmo described how long nanopore reads enable accurate mapping of retrotransposons as well as analysis of their internal structure, including their methylation profile.
Kimmo applied long-read nanopore sequencing to detect LINE1 retrotransposition in colorectal cancer tumour samples. To identify the insertion sites, a targeted analysis was performed using "old-fashioned" inverse-PCR, followed by MinION sequencing of full-length fragments. Kimmo suggested that this could probably now be done with Cas9 enrichment and nanopore sequencing, which would additionally provide methylation data because PCR would not be required. High sensitivity was achieved thanks to a very high depth of coverage across the target region, and twenty-five novel, highly subclonal insertions were identified using this methodology.
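The link between depth of coverage and sensitivity for subclonal insertions can be illustrated with a simple calculation. The sketch below is not from the talk: it assumes that each read covering a site independently carries the insertion with probability equal to the insertion's allele fraction (a binomial model), and it uses a hypothetical threshold of three supporting reads as its default. The function name and parameters are illustrative only.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_reads=3):
    """Probability of seeing at least `min_reads` supporting reads.

    Assumes binomial sampling: each of `depth` reads independently
    carries the insertion with probability `allele_fraction`.
    Both the model and the default threshold are illustrative assumptions.
    """
    # Probability of observing fewer than min_reads supporting reads.
    p_fewer = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Compare a modest depth with a very high targeted depth for a
    # hypothetical insertion present in 1% of reads.
    for depth in (30, 1000):
        print(depth, detection_probability(depth, 0.01))
```

Under this model, an insertion present in a small fraction of cells is very unlikely to reach the threshold at typical whole-genome depth, but becomes detectable once depth across the target region is high enough.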
PromethION whole-genome sequencing of colorectal tumours and uterine leiomyomas has also been performed, and continues to be used, to investigate the variability in retrotransposon insertions within humans, including during tumourigenesis. Phasing of heterozygous Alu variants has been performed; Kimmo stated that with short-read sequencing data such phasing is less clear. He described his sequencing and analysis workflow, which involved read alignment to GRCh38 with minimap2, structural variation (SV) calling with Sniffles, and alignment of SV sequences to the RepeatMasker database using mappy. Kimmo suggested that at least three reads containing an SV should be present to confirm its detection.
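The read-support rule at the end of this workflow can be sketched as a small filter over SV calls. This is a minimal illustration, not the speaker's pipeline: it assumes the SV caller writes a standard tab-separated VCF with `SVTYPE` and a read-support count in the INFO column, and it assumes that count is stored under a tag named `SUPPORT` (the tag name differs between caller versions, so it is exposed as a parameter). The function names and the example records are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=INS;SUPPORT=4' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        # Flag-style entries (no '=') are stored as True.
        info[key] = value if value else True
    return info


def supported_insertions(vcf_lines, min_support=3, support_tag="SUPPORT"):
    """Keep insertion calls backed by at least `min_support` reads.

    `support_tag` is an assumption about the caller's output format.
    Returns a list of (chrom, pos, alt_sequence, support) tuples.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed records
        chrom, pos, _id, _ref, alt, _qual, _filter, info_field = fields[:8]
        info = parse_info(info_field)
        if info.get("SVTYPE") != "INS":
            continue  # only insertions are candidate retrotransposon events
        support = int(info.get(support_tag, 0))
        if support >= min_support:
            kept.append((chrom, int(pos), alt, support))
    return kept


if __name__ == "__main__":
    # Toy records for illustration only; not real data.
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t100\tsv1\tN\tACGT\t60\tPASS\tSVTYPE=INS;SUPPORT=5",
        "chr1\t200\tsv2\tN\tACGT\t60\tPASS\tSVTYPE=INS;SUPPORT=2",
    ]
    print(supported_insertions(example))
```

The inserted sequences that pass the filter would then be the input for the repeat-annotation step, for example by mapping each one against a repeat library with mappy; that step is omitted here because it depends on the chosen reference files.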