Genomes from metagenomes: assembling bacterial genomes with nanopore sequencing

Dylan opened her talk by exploring how the human gut microbiome impacts health and disease. The microbiome has an array of functions: it extracts energy from food, regulates our metabolism, protects us from disease-causing organisms, regulates our immune system, and produces essential vitamins. Nonetheless, despite its clear importance and extensive involvement in our physiology, our understanding of the microbiome is incomplete - "we know very few species present, let alone what function they might be having".

Existing methods for species identification, such as isolation and culture and shotgun metagenomics, provide only part of the picture. This is because they have limited throughput and are restricted to those organisms that can be cultured.

Dylan explored how microbial genomes can be assembled de novo directly from sequencing of stool samples, providing a more holistic understanding of which bacteria make up the gut microbiota. She discussed how a lot of effort has gone into sequencing metagenomic samples from individuals with a diverse range of lifestyles, ages and other characteristics. This has greatly contributed to our understanding of the human gut microbiome.

However, metagenome-assembled genomes (MAGs) obtained using short-read sequencing often miss mobile genetic elements, such as phages and transposons, because they tend to be repetitive. These elements can also be repeated multiple times throughout the genome, and they may transpose to other bacteria via horizontal transfer; therefore, many genomes are incomplete or contaminated with genomic sequences from other bacterial species. These mobile elements are important for carrying cargo or influencing gene expression, and they mediate diverse phenotypes "that are very important for our health", such as virulence and antibiotic resistance. Therefore, their analysis is fundamental to understanding the microbiome, but in order to resolve these elements, Dylan explained, we need to have complete genomes.

Dylan next discussed how traditional methods of DNA extraction in gut microbiome labs can yield very fragmented DNA. This is because bacterial cells are very hard to lyse, and there is lots of other material present in stool which needs to be removed using vigorous treatment of the sample. Therefore, her team employed a new method of DNA extraction to extract highly intact DNA from stool samples, for subsequent nanopore long-read sequencing.

Their method involved enzymatic digestion of the microbial cell wall (using a mixture of different lytic enzymes), phenol-chloroform extraction, proteinase K and RNase A digestion, gravity column purification, and SPRI bead size selection of high molecular weight DNA. Dylan said that with this method they could get DNA fragments up to 49 kb in length. They also developed a new analysis method for assembling and processing the long nanopore reads which incorporated assembly tools such as Canu and Flye.

For initial validation of their approach, they performed nanopore sequencing of a mock microbial community; this yielded many contiguous genomes. Displaying circular plots of these individual bacterial assemblies, Dylan pointed out that the "nanopore assemblies were much more contiguous than the corresponding short read assemblies." In some instances, the genomes were assembled into single contigs. Next, Dylan described how nanopore sequencing of human stool also closed several bacterial genomes that were originally constructed from short-read sequencing. She stated that their method consistently outperformed other methods used for genome closure. In particular, insertion sequences, which are typically challenging for short-read assembly, were resolved with nanopore long reads. These insertion sequences were found to vary both in abundance across the genome and over time. As an example, Dylan looked more closely at the assembled genome of Prevotella copri - one of the most abundant species in our gut. For this bacterium, the nanopore-based genome assembly was much more contiguous than the original short-read assembly. The genome contained over 100 insertion sequences, which were in fact 5 different forms repeated numerous times throughout the genome. This was a significant part of the problem in assembling the genome of this species.
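The summary does not say which metric was used to compare contiguity. One common choice is N50: the contig length at which contigs of that size or larger cover at least half of the assembly. The sketch below is only an illustration of that idea; the function name and the contig lengths are hypothetical and are not taken from the talk.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in base pairs, for illustration only.
fragmented = [400_000] * 5 + [100_000] * 10  # many contigs, 3 Mb in total
closed = [3_000_000]                         # one contig, 3 Mb in total

print(n50(fragmented), n50(closed))  # prints: 400000 3000000
```

Both toy assemblies have the same total length, but the single-contig assembly has a far higher N50, which is the sense in which a closed genome is "more contiguous".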

In terms of the insertion sequences changing over time, Dylan described how they observed insertion sequences changing in a bacterial genome in one individual over 15 months of study. Why is this interesting? One of the insertion sequences detected was next to a gene involved in multidrug export, and another was next to a gene involved in capsule biosynthesis (a mechanism for antibiotic resistance); this means that the insertion sequences might be, for example, impacting drug metabolism over time. As another example, insertion sequences next to genes related to the utilisation of complex sugars, such as beta-galactosidase, SusC and SusE, were found to change over time, and this provides us with an idea of what the bacteria are being exposed to in the gut at different time points.

As well as closing known bacterial genomes, nanopore sequencing data closed several novel bacterial genomes. Dylan described how a genome was assembled that seemed to be a species of Cibiobacter. These species are notoriously hard to culture. Nanopore sequencing confirmed that the organism was similar to Cibiobacter; its genome contained many insertion sequences and 5 phage regions. Dylan discussed how such phage genomes present within assembled bacterial genomes reveal adaptive mechanisms of a bacterium, such as nutrient metabolism, and so indicate what the organisms have been exposed to.

Dylan turned her attention to discussing the "dark matter" of microbial genomes - the uncharacterised parts of genomes. She discussed how nanopore sequencing has been used to illuminate the genomes of uncharacterised non-Western microbiomes, such as the genomes of novel Treponema species in South Africa. This region is transitioning into a developed state, so it is particularly interesting for investigating how microbiomes are evolving. Previous short-read sequencing was unable to classify a large proportion of the genomic reads for this species, in comparison to similar species isolated in the Western world. They decided to perform a pilot investigation and applied nanopore sequencing to these metagenomic samples to see if they could gain more insight. What was particularly interesting was that the genome of this Treponema species was less repetitive, with fewer insertion sequences, and so Dylan questioned why the corresponding short-read assembly was so poor. Perhaps it was due to other sequences that could not be resolved? They have yet to determine the reason(s).

In the last part of her talk, Dylan asked why some highly abundant organisms still evade genome circularisation. Stating her frustration at this, she said that "this is annoying for us, we got greedy and we really like circular genomes!" She said that they have observed an inverse correlation between the quality of a genome assembly and the relative abundance of organisms present within a sample. Sometimes extensive gene transfer occurs between closely related species, and this makes it particularly challenging to assemble their genomes.
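The summary does not state how assembly quality was scored or how the correlation was measured. As a rough illustration of what an inverse correlation between two per-genome measurements looks like, the sketch below computes a Spearman rank correlation in plain Python. The function names, the abundance values and the quality scores are all made up for the example and do not come from the talk.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up values for four hypothetical genomes, for illustration only.
relative_abundance = [0.30, 0.20, 0.10, 0.05]
quality_score = [1, 2, 3, 4]  # hypothetical score, higher = better assembly

print(spearman(relative_abundance, quality_score))  # -1.0 for these toy values
```

A value near -1 means that the quality score tends to fall as abundance rises; a value near 0 would mean no monotonic relationship.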

In summary, Dylan discussed how new methods for DNA extraction and metagenomic assembly with nanopore long reads have enabled the completion of many bacterial genomes, both known and novel, and have provided greater insight into mobile genetic elements.

Authors: Dylan Maghini