Nanopore sequencing unveils unexpected variability in ribosomal RNA gene repeats


In the breakout session Resolving repeats, Emiliana Weiss presented her work on resolving rDNA arrays in the human genome using ultra-long nanopore reads.

Emiliana began by providing the background to her work. She explained that ribosomal RNA genes (rDNA) encode the RNA components of the cell’s ribosomal machinery and that, to meet the cell’s high metabolic demands, this machinery is produced in extremely large numbers. To support this, rDNA is arranged in repeated arrays on the five acrocentric human chromosomes. Emiliana then went on to explain the organisation of each rDNA unit. She highlighted the key features, such as the promoter, the 18S, 5.8S, and 28S coding regions, and the intergenic spacers between adjacent rDNA units. However, she noted that rDNA repeats have so far resisted complete assembly, which is attributed to their inherently repetitive nature. Alluding to the drawbacks of short-read sequencing, Emiliana described the challenges of mapping repetitive rDNA reads and tracing them back to a specific unit.

To overcome the limitations of short-read technologies, Emiliana described the utility of ultra-long nanopore reads for characterising the rDNA repeat locus. In particular, she emphasised the importance of traversing rDNA tandem arrays in single reads. To this end, Emiliana noted that the main aim of her project was to develop a robust methodology for characterising the rDNA locus, which would, in turn, contribute to the study of rDNA variants and their impact on cell phenotype. Forming the foundation of her work was an ultra-long GridION dataset (~15x of 100 kb+ reads) from the Human Pangenome Reference Consortium.

Kicking off with the pipeline, Emiliana mapped the ultra-long reads (in FASTQ format) to the human reference genome using minimap2. Further downstream in her pipeline, she discussed the use of Megalodon and BLAST to investigate the methylation status and the sequence variation of the rDNA reads, respectively. Using this pipeline, Emiliana was able to identify 918 reads longer than 100 kb that contained a large number of rDNA candidates.
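
As a rough illustration of the length filter mentioned above, the following Python sketch keeps only reads longer than 100 kb from a FASTQ stream. It is not Emiliana's code: the function names `iter_fastq` and `ultra_long_read_names` are hypothetical, the parser assumes a simple four-line-per-record FASTQ layout, and the toy reads are made up.

```python
import io

MIN_LENGTH = 100_000  # 100 kb cut-off quoted in the talk summary


def iter_fastq(handle):
    """Yield (read_name, sequence) from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        if not header.strip():    # tolerate stray blank lines
            continue
        sequence = handle.readline().strip()
        handle.readline()         # '+' separator line
        handle.readline()         # quality string (unused here)
        yield header[1:].split()[0], sequence


def ultra_long_read_names(handle, min_length=MIN_LENGTH):
    """Return names of reads strictly longer than min_length bases."""
    return [name for name, seq in iter_fastq(handle) if len(seq) > min_length]


# Toy input (made-up reads); a small threshold keeps the demo readable.
toy_fastq = io.StringIO(
    "@read_a\nACGT\n+\n!!!!\n@read_b extra\nACGTACGTAC\n+\n!!!!!!!!!!\n"
)
print(ultra_long_read_names(toy_fastq, min_length=5))
```

In a real analysis the filter would more likely be applied to the alignments produced by minimap2 rather than to the raw FASTQ, so that only long reads overlapping rDNA are retained.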

Emiliana then proceeded to present her findings in more granular detail. By recording the number of bases between adjacent 28S coding regions, she found that rDNA units have highly conserved sizes, and surmised that the majority of rDNA units have full coding potential. She then went on to explain that around half of the rDNA gene copies are transcribed into rRNA, while the other half are transcriptionally silent. This prompted further investigation; in particular, she wanted to look at how methylation plays a role in gene silencing.
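
The unit-size measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 28S start coordinates are invented, the helper names `unit_sizes` and `conserved_fraction` are hypothetical, and the 5% tolerance around the median is an arbitrary choice rather than a value from the talk.

```python
from statistics import median


def unit_sizes(starts_28s):
    """Distances (in bases) between consecutive 28S start positions on one read."""
    ordered = sorted(starts_28s)
    return [right - left for left, right in zip(ordered, ordered[1:])]


def conserved_fraction(sizes, tolerance=0.05):
    """Fraction of unit sizes lying within +/- tolerance of the median size."""
    if not sizes:
        return 0.0
    mid = median(sizes)
    within = [size for size in sizes if abs(size - mid) <= tolerance * mid]
    return len(within) / len(sizes)


# Made-up 28S start coordinates along a single ultra-long read.
starts = [1_000, 46_000, 90_500, 135_500, 200_000]
sizes = unit_sizes(starts)
print(sizes)
print(conserved_fraction(sizes))
```

Here three of the four toy units fall close to the median size, while the fourth is an outlier of the kind that would merit a closer look.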

To do so, the raw nanopore signal was analysed, and the CpG methylation of the rDNA reads was predicted using the software Megalodon. She was able to distinguish between methylated and unmethylated CpG regions in the rDNA reads, with around a 50:50 split. She explained that the hypomethylated rDNA units were methylated in the intergenic spacers only, while their coding regions remained largely unmethylated, and she postulated that these units are transcriptionally active. Conversely, the hypermethylated rDNA genes were methylated in the promoter region and coding sequences, which she speculated was indicative of transcriptional repression.
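
To make the distinction between the two methylation states concrete, here is a minimal Python sketch that labels a single rDNA unit from per-CpG methylation fractions. It does not reproduce Megalodon's output format or Emiliana's analysis: the input structure, the region coordinates, the 0.5 threshold, and the function names are all assumptions chosen for illustration.

```python
def mean_methylation(cpg_calls, start, end):
    """Mean methylated fraction of CpGs with start <= position < end, or None if no CpGs."""
    values = [fraction for position, fraction in cpg_calls if start <= position < end]
    return sum(values) / len(values) if values else None


def classify_unit(cpg_calls, regions, threshold=0.5):
    """Label a unit from the mean methylation of its promoter and coding regions."""
    promoter = mean_methylation(cpg_calls, *regions["promoter"])
    coding = mean_methylation(cpg_calls, *regions["coding"])
    if promoter is None or coding is None:
        return "undetermined"
    if promoter >= threshold and coding >= threshold:
        return "hypermethylated"   # pattern the talk linked to repression
    if promoter < threshold and coding < threshold:
        return "hypomethylated"    # pattern the talk linked to active units
    return "mixed"


# Hypothetical coordinates within one unit and made-up per-CpG fractions.
regions = {"promoter": (0, 200), "coding": (200, 1000), "spacer": (1000, 2000)}
active_like = [(50, 0.1), (300, 0.0), (700, 0.2), (1500, 0.9)]
silent_like = [(50, 0.9), (300, 0.8), (700, 1.0), (1500, 0.9)]
print(classify_unit(active_like, regions))
print(classify_unit(silent_like, regions))
```

Note that the spacer CpGs are deliberately ignored by the classifier, since the summary above reports spacer methylation in both classes of unit.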

She also discovered structural rearrangements that affected the orientation of rRNA genes. In particular, she found inverted units, transcribed in the inverse direction, which, interestingly, were highly methylated. Emiliana also identified units with convergent and divergent orientations, although she noted that the methylation pattern was not modified, and the rDNA could therefore still be transcribed.
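
One simple way to describe such arrangements is to compare the strand of each unit with that of its neighbour along a read. The Python sketch below assumes the strand of every unit has already been determined (for example from alignments) and is given as a list of "+" and "-" symbols; the function name `pair_orientations` and the example list are hypothetical.

```python
def pair_orientations(strands):
    """Label each adjacent pair of units by their relative transcription direction.

    Same strand       -> "tandem"     (head-to-tail, the canonical arrangement)
    "+" followed by "-" -> "convergent" (transcription directions point towards each other)
    "-" followed by "+" -> "divergent"  (transcription directions point away from each other)
    """
    labels = []
    for left, right in zip(strands, strands[1:]):
        if left == right:
            labels.append("tandem")
        elif left == "+":
            labels.append("convergent")
        else:
            labels.append("divergent")
    return labels


# Made-up strand calls for five consecutive units on one read.
print(pair_orientations(["+", "+", "-", "-", "+"]))
```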

To further interrogate the sequence variation within the rDNA genes, Emiliana used an alternative long-read sequencing technology. In doing so, she identified 2,924 variants, of which 1,661 were located in transcribed regions and 1,263 in regulatory regions. Next, to assess the variability of rDNA regions, Emiliana used the MinION from Oxford Nanopore to perform RNA sequencing. Using the cytosolic fraction of the same cell line, she found that many of the rDNA variants were also expressed in the cytoplasm as rRNA. Moreover, many of the variants present at low frequency in rDNA were also found in rRNA at high relative abundance.
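
The comparison between rDNA and rRNA variant frequencies can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the read counts are invented, the field names and the function `enriched_in_rna` are hypothetical, and the 5% and 20% cut-offs are arbitrary rather than values reported in the talk.

```python
def allele_frequency(alt_reads, total_reads):
    """Fraction of reads supporting the variant allele (0.0 when there is no coverage)."""
    return alt_reads / total_reads if total_reads else 0.0


def enriched_in_rna(variants, max_dna_freq=0.05, min_rna_freq=0.20):
    """Names of variants that are rare in rDNA reads but abundant in rRNA reads."""
    flagged = []
    for variant in variants:
        dna = allele_frequency(variant["dna_alt"], variant["dna_total"])
        rna = allele_frequency(variant["rna_alt"], variant["rna_total"])
        if dna <= max_dna_freq and rna >= min_rna_freq:
            flagged.append(variant["name"])
    return flagged


# Made-up read counts for three illustrative variants.
toy_variants = [
    {"name": "var_1", "dna_alt": 2, "dna_total": 100, "rna_alt": 30, "rna_total": 100},
    {"name": "var_2", "dna_alt": 50, "dna_total": 100, "rna_alt": 45, "rna_total": 100},
    {"name": "var_3", "dna_alt": 1, "dna_total": 100, "rna_alt": 0, "rna_total": 100},
]
print(enriched_in_rna(toy_variants))
```

Only the first toy variant is flagged: it is rare among the rDNA reads but well represented among the rRNA reads, which is the pattern described in the paragraph above.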

Overall, nanopore sequencing was instrumental in Emiliana’s work, aiding the characterisation of rDNA repeat arrays in the human genome. Through her analysis of an ultra-long nanopore human genome dataset, Emiliana was able to call the CpG methylation status of rDNA repeat units and gain a unique insight into the mechanisms that underpin rDNA transcriptional activity. Finally, her work revealed an unexpected degree of variation across rDNA repeats, including sequence substitutions that are expressed in rRNA molecules.

Authors: Emiliana Weiss