Into the unknown: the epigenetics of repetitive DNA
Ariel Gershman (Johns Hopkins University, USA) began her plenary talk with an introduction to DNA CpG methylation. She explained how CpG methylation is typically associated with the silencing of transcription via promoter hypermethylation, whilst hypomethylation of the same region plus hypermethylation of the gene body is associated with activation of transcription. Furthermore, methylation is associated with the repression of transposable elements (TEs): hypermethylation of TEs is thought to repress their activity. However, Ariel highlighted that CpG methylation is only one component of the epigenome – ‘the multitude of chemical compounds that tell the genome what to do’. The epigenome also includes nucleosome occupancy and chromatin accessibility, histone modifications, chromatin looping, and protein binding events, which together make up the higher order structure of nuclear DNA organisation. Together, these features influence the expression of genes and the activity of other DNA elements.
Using nanopore to reveal genome-wide methylation patterns
Ariel and her colleagues use PCR-free nanopore sequencing to detect CpG methylation. She described how cytosine and 5-methylcytosine bases produce different current distributions as they pass through nanopores during native DNA sequencing, which can be analysed to determine methylation state. They detect these differences using the tool Nanopolish, producing results which are both highly accurate and highly concordant with bisulphite sequencing. Ariel described the advantages of nanopore methylation calling compared with bisulphite sequencing. First, sequencing native DNA molecules reduces the PCR and GC bias seen in bisulphite data. Second, the longer nanopore reads increase mapping certainty for repetitive and GC-rich regions. Comparing the methods, Ariel showed how data is lost for CpG islands, LINE elements, and satellites with bisulphite sequencing, where reads cannot be mapped with the same certainty. Finally, she noted that long reads provide insights into long-range epigenetic patterns, spanning multiple genomic elements. However, there are gaps in the human reference genome (GRCh38) in which the methylome has never been assessed, which limits whole-genome methylation analysis.
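As a rough illustration of how per-read methylation calls can be summarised per CpG site, the sketch below aggregates log-likelihood ratios into a methylation frequency. The input layout (chromosome, position, log-likelihood ratio), the function name `methylation_frequency`, and the confidence threshold of 2.0 are assumptions made for this example; it is not the team's pipeline or the exact Nanopolish output format.

```python
from collections import defaultdict


def methylation_frequency(calls, threshold=2.0):
    """Summarise per-read calls into a per-site methylation frequency.

    calls: iterable of (chrom, pos, log_lik_ratio) tuples, one per read
           covering a CpG site (hypothetical, simplified layout).
    threshold: assumed cut-off; calls with |log_lik_ratio| below it are
               treated as ambiguous and skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated, confident calls]
    for chrom, pos, llr in calls:
        if abs(llr) < threshold:
            continue  # ambiguous call, ignore
        site = counts[(chrom, pos)]
        site[1] += 1
        if llr > 0:  # positive ratio favours the methylated model
            site[0] += 1
    return {site: meth / total for site, (meth, total) in counts.items()}


example_calls = [
    ("chrX", 100, 5.0),   # confidently methylated
    ("chrX", 100, -3.0),  # confidently unmethylated
    ("chrX", 100, 0.5),   # ambiguous, skipped
    ("chrX", 200, -4.0),  # confidently unmethylated
]
frequencies = methylation_frequency(example_calls)
```

Here `frequencies` maps each site to the fraction of confident reads called as methylated, which is the kind of per-site value that can then be compared with bisulphite data.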
To address the gaps in the human reference genome, Ariel and her colleagues have been working with the Telomere-to-Telomere Consortium, which successfully generated the first complete assembly of a human genome (CHM13) with only five remaining gaps – the most complete assembly ever produced. The new assembly features all centromeric regions, and fills in most satellites and gaps that were previously present. Using this, Ariel was able to assess genome-wide methylation patterns. Displaying nanopore sequencing data for a TE, she illustrated how TEs were largely CG-dense and hypermethylated. Gene promoters, meanwhile, were largely CG-dense and hypomethylated; these findings were generally expected. The nanopore sequencing data also enabled the team to reveal the methylome of regions that had never been explored before – regions composed largely of satellite repeats, present in the telomeres, centromeres, and some arrays across the chromosome arms. Ariel showed the rich repeat structure making up the centromeric region.
Exploring the unknown regions
Ariel then presented the results of going ‘into the unknown’: analysing centromeric higher order repeats (HORs), human satellite 2 (HSAT2), human satellite 1 (HSAT1), and macrosatellite repeats.
HORs are tandem arrays built from 171 bp alpha-satellite monomers, providing binding sites for the centromere-associated histone variant CENPA. Analysis of the HOR array revealed distinctive methylation patterns: most regions were consistently hypermethylated, with a distinct hypomethylated region present in every centromere across the human genome. Ariel noted that these represented ‘epigenetic events that have never been previously probed’. As CENPA marks the site of kinetochore attachment, Ariel wanted to further understand the role of methylation in kinetochore attachment and chromosome segregation. Her team found that the highest CENPA binding overlapped the regions of hypomethylation.
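One simple way to locate a hypomethylated dip inside an otherwise hypermethylated array is to scan for the window with the lowest mean methylation. The following is a minimal sketch under stated assumptions: `freqs` is a hypothetical list of per-site methylation frequencies ordered along the array, and the function name and window size are illustrative, not taken from the talk.

```python
def lowest_methylation_window(freqs, window):
    """Return (start_index, mean) of the window with the lowest mean methylation.

    freqs: per-site methylation frequencies in genomic order (hypothetical input).
    window: number of consecutive sites per window (illustrative parameter).
    """
    if window <= 0 or window > len(freqs):
        raise ValueError("window must be between 1 and len(freqs)")
    current = sum(freqs[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(freqs) - window + 1):
        # Slide by one site: add the entering value, drop the leaving one.
        current += freqs[start + window - 1] - freqs[start - 1]
        if current < best_sum:
            best_sum, best_start = current, start
    return best_start, best_sum / window


array_freqs = [0.9, 0.9, 0.8, 0.2, 0.1, 0.2, 0.9, 0.9]
dip_start, dip_mean = lowest_methylation_window(array_freqs, window=3)
```

In this toy array the dip begins at index 3. The reported window could then be compared with a CENPA binding track, in the spirit of the overlap described above.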
Secondly, Ariel described their findings regarding HSAT2 and HSAT1 methylation patterns. HSAT2 regions comprise 1.5% of the genome and occupy the largest gaps in the GRCh38 reference genome assembly – meaning that they have not been interrogated before. Ariel explained how their findings were unexpected – the regions were mostly hypomethylated in the chromosomes investigated, which is unusual for highly repetitive regions. However, they did show a periodicity of methylation state throughout the array. HSAT1 regions were also highly hypomethylated; in contrast to HSAT2 regions, HSAT1 arrays were CpG-poor, composed predominantly of AT-rich repeats, and showed no clear periodicity of methylation state.
Lastly, Ariel discussed macrosatellite repeats, focusing on the DXZ4 array on the X chromosome. This is the anchoring site for the ‘superlooping’ domains of the X chromosome during X chromosome inactivation. ‘Very intense’ periodicity could be seen in methylation state throughout the entire array. Such methylation patterns enable differential binding of the CTCF protein – binding at hypomethylated regions enables the looping seen in X chromosome inactivation and formation of the Barr body. As CHM13 cells have both inactive and active X chromosomes, the team could determine allele-specific methylation patterns in the DXZ4 array, to a level of detail that had not been observed before.
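The periodicity of methylation state described for arrays such as HSAT2 and DXZ4 can be quantified in several ways. One common option is autocorrelation, sketched below. The function names, the toy signal, and the choice of estimator are assumptions for illustration only; the talk summary does not state which method the team used.

```python
def autocorrelation(values, lag):
    """Autocorrelation of a sequence at a given lag (biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance_sum = sum(d * d for d in deviations)
    if variance_sum == 0:
        raise ValueError("signal has no variation")
    overlap = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
    return overlap / variance_sum


def dominant_period(values, max_lag):
    """Return the lag (1..max_lag) with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(values, lag))


# Toy methylation signal: one methylated bin followed by three unmethylated
# bins, repeated six times (period of 4 bins).
signal = [1.0, 0.0, 0.0, 0.0] * 6
period = dominant_period(signal, max_lag=8)
```

For the toy signal the strongest autocorrelation falls at a lag of 4 bins, matching the repeat length built into it. On real data, the bins would be per-site or per-window methylation frequencies along the array.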
Ariel concluded that long nanopore reads allow probing of epigenetic regulation in large repetitive arrays, and that this work revealed a higher order pattern of methylation in such repeats. She stated that ‘we are just scratching the surface about unveiling epigenetic control of these regions’. In future, the team plans to phase the modifications using diploid genome assemblies, and to profile other epigenetic regulatory events using other methods, such as exogenous DNA labelling and nanopore sequencing.