Metagenomics of South African gut microbiomes reveal a transitional composition and novel taxa
Dylan Maghini began by introducing her team’s ongoing collaboration to study the gut metagenomes of South African individuals. Like most STEM subjects, Dylan explained, microbiome research is concentrated in regions at one extreme of lifestyle and resource access, i.e. the global west, where individuals are part of industrialised communities and have unrestricted access to antibiotics, fatty foods, and other similar hallmarks of resource-rich countries.
Showing data for the quantity of metagenomic studies across the globe, Dylan highlighted that the US, China, and parts of Europe are highly represented in the distribution of research, but there are very limited numbers of studies in the opposite extremes of society. In particular, hardly any studies have been conducted in Africa, and such small numbers cannot reasonably be used to reflect a continent of over 1.2 billion people.
This lack of research, Dylan stressed, becomes a significant problem when we acknowledge the direct relationship between the microbiome and human health. As communities undergo industrialisation, they see a decrease in the risk of infectious disease and an increase in the risk of non-communicable diseases such as heart disease or obesity. We know that the microbiome has a strong link to conditions like these, and that it is also related to outcomes such as response and non-response to vaccines.
Taken together, the lack of characterisation of the gut microbiome across Africa really compromises the ability to assess relevance and applicability of therapeutics across a large proportion of the planet.
This context, Dylan explained, was the motivation behind the team’s collaboration with a group in Johannesburg, South Africa, with the goal of studying the adult microbiomes of the population. The study, Dylan noted, was made possible by the H3Africa consortium (a continent-wide group involved in a range of projects studying genetic and environmental risk factors influencing disease) and in particular was part of the wider AWI-Gen project.
To begin the project, the team sought involvement from two South African communities, one in urban Soweto and another in rural Bushbuckridge. Soweto is a much denser community, with a significantly higher proportion of households connected to piped water and with access to flush toilets. A cohort of 190 adult women was assembled, a stool sample was collected from each, and the samples were assessed with both 16S and metagenomic sequencing. The 16S work was conducted by Dylan’s collaborator in South Africa, Ovokeraye Oduaran, and a QR code linking to this work can be found in the presentation recording.
Focussing on the metagenomic analysis, Dylan explained that the microbiomes from rural Bushbuckridge had a significantly higher alpha-diversity than their counterparts in Soweto. Dylan outlined that this was to be expected, as prior studies had shown a decrease in alpha-diversity with an increase in industrialisation. In addition, the sequencing identified a number of genera that were significantly enriched in one community over the other, some of which were also concordant with previous work showing an increase or decrease in prevalence with industrialisation.
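For readers unfamiliar with alpha-diversity, the idea can be illustrated with the Shannon index, one common within-sample diversity measure. The talk summary does not say which metric the team used, so the choice of Shannon here is an assumption, and the read counts below are invented purely for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


# Hypothetical genus-level read counts (not data from the study)
even_sample = [25, 25, 25, 25]   # four genera, evenly represented
skewed_sample = [85, 5, 5, 5]    # four genera, one dominant

print(shannon_index(even_sample))
print(shannon_index(skewed_sample))
```

A sample dominated by a single genus scores lower than an evenly distributed one with the same number of genera, which is the sense in which a community can be described as less diverse.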
So, Dylan explained, the next step was to contextualise these findings with communities at more extreme ends of the lifestyle spectrum. In order to do this, they took publicly available data from communities in the US and Sweden as well as some from a rural community in Madagascar and hunter-gatherer groups from Tanzania.
Comparing these datasets compositionally revealed that the South African metagenomes were placed between those of the more industrial and rural communities, but didn’t cluster along a simple continuum, instead occupying their own unique position in multi-dimensional space. However, a straightforward classification of taxa may not be the complete picture, and so Dylan moved on to explain how they sought to obtain a measure of how much unclassified diversity might be present in the South African cohorts when compared to the other datasets.
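A compositional comparison of this kind typically starts from a pairwise dissimilarity between samples, which is then projected into a few dimensions. As a rough sketch, the Bray-Curtis dissimilarity on relative abundances is shown here; the summary does not state which distance or ordination method the team applied, so this is an assumed stand-in, and the taxon names and counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw per-taxon counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical profiles (placeholder taxa, not data from the study)
sample_x = relative_abundance({"genus_a": 50, "genus_b": 50})
sample_y = relative_abundance({"genus_a": 50, "genus_c": 50})

print(bray_curtis(sample_x, sample_y))
```

Computing this value for every pair of samples gives a distance matrix, which is the usual input to an ordination that places samples in a shared low-dimensional space.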
Quantifying the proportion of classified reads showed that the more rural the community, the lower the percentage of reads that classify to the existing reference databases. This, Dylan indicated, implies that the taxa in these communities are not well represented in the databases, which raised the question of what was missing. To determine this, the team sought to perform de novo genome assembly. For the highest chance of generating complete genomes, they needed high-quality DNA to generate long reads for metagenomic assembly.
Previous work by Eli Moss in the team had established a process for this, beginning with a high-molecular-weight DNA extraction, followed by nanopore sequencing, basecalling with Guppy and assembly with Flye, before final polishing and circularisation steps. By applying this process to the South African samples, the team were able to circularise many genomes, including the first circular genomes for many taxa with very few or no public references.
In particular, the team was very excited to generate a complete genome of Treponema, a hallmark organism for non-industrialised communities, as it is extremely difficult to isolate and culture, meaning this genome assembly represents the first of its kind. In addition, many genomes could be assembled that either had highly repetitive genome structure or very low GC content, which were hugely underrepresented in matched short-read sequencing data. In fact, when comparing short reads to nanopore reads on these metagenomic samples, Dylan showed that nanopore data contained a much higher proportion of reads with low GC content, representing a fraction of data that would be completely missed if the study had been performed with short reads only.
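The GC comparison described above amounts to computing a GC fraction per read and then asking what share of reads falls below some cut-off in each dataset. A minimal sketch follows; the 30% threshold and the toy reads are assumptions for illustration and do not come from the talk.

```python
def gc_fraction(seq):
    """Fraction of G and C among called bases (A, C, G, T); ambiguous bases are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def low_gc_share(reads, threshold=0.30):
    """Share of reads whose GC fraction is below `threshold` (hypothetical cut-off)."""
    if not reads:
        return 0.0
    low = sum(1 for read in reads if gc_fraction(read) < threshold)
    return low / len(reads)


# Toy reads for illustration only
toy_reads = ["ATAT", "GGCC", "ATGC", "AATTAATTGC"]
print(low_gc_share(toy_reads))
```

Running the same calculation on matched short-read and long-read data from one sample would show whether either platform under-samples the low-GC end of the distribution.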
Having these complete assemblies, Dylan described, gives unprecedented insight into genome structure and function as part of these communities. In particular, nanopore sequencing also placed mobile genetic elements correctly into genomic context, identifying and locating many more phages, recombinases, transposases, and antibiotic resistance genes than the short-read counterpart assemblies – highlighting the increased importance of long-read sequencing to move beyond classification to function.