Discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiome using nanopore sequencing


Alan Tourancheau began his talk by introducing the prokaryote methylome, which differs from the better-characterised human methylome in that it is highly motif-driven and features three types of methylation: 4mC, 5mC and 6mA. In a 2016 survey of the epigenomic landscape of prokaryotes, 93% of the organisms tested were shown to have methylated motifs, highlighting the need for detection techniques.

These methylated motifs, Alan explained, have clear functions, being implicated in DNA repair mechanisms, gene regulation pathways and the restriction-modification system. Alan noted that many tools currently exist for methylation detection, and that these can largely be classified into tools that compare to a reference and tools that compare to the raw signal data. The former group is more accurate but depends on knowledge of the motifs that should be present, whereas the latter group is the only option when that information is lacking, but has reduced detection accuracy. For these reasons, Alan and his team decided to develop a new method, implemented in their software nanodisco, to identify both the modification type and the position of the modified base. From their initial investigations it became clear that the methylation signature varies with the sequence context, the specific motif and the DNA modification type, meaning that determining both the type and the position should be possible.

To test this theory, the team gathered a large dataset of 7 different bacteria with 46 distinct methylation motifs present across the 3 modification types. They sequenced those samples on MinION, generating on average 117X coverage. From the resulting data, the signatures from all 46 motifs were aggregated and represented on a t-SNE plot. Each point on the plot represented a current difference for one motif at one site, demonstrating clearly that the methylation signature is complex and motif-specific. Alan went on to explain that if this plot is then coloured by modification type (4mC, 5mC and 6mA), very clear clusters emerge, further indicating that it should be possible to train a classifier to identify the modification type.

Zooming in on a specific motif shows that there are multiple sites in the sequence at which the current differs from the expected level, on either side of the modified base. This also implies that it should be possible to train a classifier to look for these sections. However, Alan noted, if the modification position is unknown, it is impossible to examine a set number of sites up- and downstream of that point, so the classifier must be trained using multiple offsets in order to help determine the location.
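
The multiple-offset idea can be illustrated with a minimal Python sketch. It is not nanodisco's implementation: the function name `offset_windows`, the window size, the offsets and the toy signal values are all hypothetical, chosen only to show how one fixed-length window of current differences could be cut at each candidate offset around a suspected site.

```python
from typing import List, Sequence

def offset_windows(current_diff: Sequence[float], center: int,
                   offsets: Sequence[int] = (-2, -1, 0, 1, 2),
                   half_width: int = 3) -> List[List[float]]:
    """Cut one fixed-length window of current differences per candidate offset.

    Hypothetical helper: each window is centred on `center + offset`, so a
    classifier sees the same signal under several assumed modification positions.
    """
    windows = []
    for off in offsets:
        start = center + off - half_width
        end = center + off + half_width + 1
        if start < 0 or end > len(current_diff):
            continue  # skip windows that would run off the sequence
        windows.append(list(current_diff[start:end]))
    return windows

# Toy current differences along a sequence (illustrative values only).
signal = [0.0, 0.1, -0.2, 1.5, 2.0, -1.1, 0.3, 0.0, 0.1, -0.1, 0.0]
windows = offset_windows(signal, center=5)
```

Each window has the same length, so windows taken at different offsets can be fed to the same classifier; the offset whose window is classified most confidently would then hint at the modified position.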

Alan then used the sequenced training set to examine performance against validation data. Using leave-one-out cross-validation, he demonstrated that nanodisco was highly reliable for typing and fine-mapping: 98% of the motifs from the sequenced strains were accurately typed and fine-mapped, with only one suboptimal result. When data from two additional bacteria were added, every motif present in these was also typed and fine-mapped.
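
As a rough illustration of leave-one-out cross-validation, the Python sketch below holds out one sample at a time, trains on the rest and checks the prediction for the held-out sample. The nearest-centroid classifier, the two-dimensional feature vectors and the choice to hold out single samples are assumptions made for brevity; the talk summary does not say which classifier or hold-out unit nanodisco uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature vector, modification type label)

def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest_centroid_predict(train: Sequence[Sample], x: Sequence[float]) -> str:
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    by_label: Dict[str, List[List[float]]] = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def leave_one_out_accuracy(samples: Sequence[Sample]) -> float:
    """Hold out each sample in turn, train on the others, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = list(samples[:i]) + list(samples[i + 1:])
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(samples)

# Toy, well-separated feature vectors (illustrative values only).
toy = [
    ([2.0, 0.1], "6mA"), ([2.2, -0.1], "6mA"), ([1.9, 0.0], "6mA"),
    ([-1.0, 1.5], "5mC"), ([-1.2, 1.4], "5mC"), ([-0.9, 1.6], "5mC"),
    ([0.1, -2.0], "4mC"), ([0.0, -2.2], "4mC"), ([-0.1, -1.9], "4mC"),
]
accuracy = leave_one_out_accuracy(toy)
```

Because every sample is predicted by a model that never saw it, the resulting accuracy estimates how well the approach generalises to unseen data.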

The software is available at github.com/fanglab for installation and trial by the Nanopore Community.

In the final section of his talk, Alan moved on to discuss mixed metagenomic samples. In nature, he highlighted, bacterial samples are very rarely isolated, and the microbiome is becoming more and more relevant for both human health and environmental surveillance. The microbiome, though, is incredibly complex, and metagenomic assemblies can be highly fragmented.

A key part of the metagenomic analysis process is assigning reads to the appropriate genome bin, and Alan explained that the methylation profile can aid this binning process. Previously, Alan’s team had published a method for methylation-based metagenomic binning using an alternative long-read sequencing platform. The principle of this method is that contigs from the same genome will have the same modified motifs, so by identifying which motifs are modified in each contig, contigs can be grouped together. This method has the additional benefit of being able to associate mobile genetic elements, such as plasmids, with their host genome.
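
The binning principle can be sketched in a few lines of Python. This is a deliberate simplification and not the published method: the motif names, the per-motif scores, the 0.5 threshold and the exact-match grouping are all hypothetical, and a real pipeline would more likely cluster continuous scores than require identical profiles.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def methylation_profile(scores: Dict[str, float], motifs: Sequence[str],
                        threshold: float = 0.5) -> Tuple[bool, ...]:
    """Binary profile: is each motif called methylated in this contig?"""
    return tuple(scores.get(m, 0.0) >= threshold for m in motifs)

def bin_contigs(contig_scores: Dict[str, Dict[str, float]],
                motifs: Sequence[str],
                threshold: float = 0.5) -> List[List[str]]:
    """Group contigs that share the same set of methylated motifs."""
    bins = defaultdict(list)
    for contig, scores in contig_scores.items():
        bins[methylation_profile(scores, motifs, threshold)].append(contig)
    return list(bins.values())

# Hypothetical motifs and methylation scores (illustrative values only).
motifs = ["GATC", "CCWGG", "GANTC"]
contig_scores = {
    "contig_1": {"GATC": 0.9, "CCWGG": 0.8},
    "contig_2": {"GATC": 0.95, "CCWGG": 0.7},
    "plasmid_A": {"GATC": 0.85, "CCWGG": 0.9, "GANTC": 0.1},
    "contig_3": {"GANTC": 0.9},
}
bins = bin_contigs(contig_scores, motifs)
```

In this toy input, `plasmid_A` shares its profile with `contig_1` and `contig_2` and so lands in the same bin, which mirrors how a mobile element could be linked to its host genome.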

Now, Alan explained, they have developed and extended this method for nanopore sequencing, which can distinguish more types of modification. To do this, they class signatures into what they call “methylation features”, which are then used for binning. Features are quite consistent for a given genome, and so should work well on real samples, which led Alan and his team to trial this method on the mouse gut microbiome.

After computing methylation features for over 200,000 motifs and plotting them on a t-SNE plot, it can be seen that contigs from the same genome bin very well together, and mobile elements can be binned too, showing how well the approach can perform in real, complex settings. A more refined approach using multiple rounds of binning resulted in the identification of 13 bins with over 80 motifs discovered, 45 of which were 5mC and so would have been missed with the alternative long-read sequencing approach.

To conclude, Alan reiterated that more information is available, both on GitHub and in their pre-print, on using nanodisco to characterise the diversity of methylation signals and motifs, including the position and type of modification, as well as to perform accurate metagenomic binning.

Authors: Alan Tourancheau