Automated strain separation in low-complexity metagenomes using long reads
Opening the Metagenomics assembly breakout, Riccardo Vicedomini from the Institut Pasteur described a method his team recently developed to assemble individual strains in low-complexity metagenomes using long-read technology. Riccardo explained that conspecific strains (strains of the same species) can perform different functions in a microbiome, and that precise characterisation of strains can have many biotechnological applications.
According to Riccardo, ‘thanks to long-read sequencing, standard metagenome assembly methods are usually able to assemble complete bacterial genomes, yet they are usually limited to the species level’. Current approaches to strain-level metagenome assembly rely on either reference genomes or short-read data. To address this, the team at the Institut Pasteur developed the Strainberry pipeline, which combines state-of-the-art tools for variant calling, haplotype phasing and genome assembly to assemble individual strain genomes from long-read data alone.
The pipeline starts by generating a ‘strain-oblivious’ metagenome assembly, for which metaFlye is commonly used. Next, single-nucleotide variants (SNVs) are identified and phased into haplotypes, each corresponding to an individual strain. The resulting groups of strain-specific reads are assembled independently using standard genome assembly tools, and low-quality regions are then removed from these assemblies. Finally, scaffolding links the strain-resolved contigs together.
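To make the SNV-identification and phasing steps more concrete, here is a toy sketch in Python. It is not Strainberry's implementation, which relies on dedicated variant-calling and phasing tools. It assumes that reads have already been aligned to the strain-oblivious assembly and are represented as simple position-to-base dictionaries, that exactly two strains are present, and that they differ at every called SNV. The function names (`call_snvs`, `encode_read`, `phase_reads`) and the thresholds `min_depth` and `min_alt_frac` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Toy sketch, NOT Strainberry's code. Assumptions: each read is a dict mapping
# a position on the strain-oblivious assembly to the base the read shows there,
# only two strains are present, and they differ at every called SNV.


def call_snvs(reads, min_depth=5, min_alt_frac=0.2):
    """Return {position: (major_base, minor_base)} for bi-allelic SNV candidates."""
    pileup = {}
    for read in reads:
        for pos, base in read.items():
            pileup.setdefault(pos, Counter())[base] += 1
    snvs = {}
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        top_two = counts.most_common(2)
        # keep sites with enough coverage and a second allele above the threshold
        if depth >= min_depth and len(top_two) == 2 and top_two[1][1] / depth >= min_alt_frac:
            snvs[pos] = (top_two[0][0], top_two[1][0])
    return snvs


def encode_read(read, snvs):
    """Encode a read as {position: 0 (major allele) or 1 (minor allele)} at SNV sites."""
    encoded = {}
    for pos, (major, minor) in snvs.items():
        base = read.get(pos)
        if base == major:
            encoded[pos] = 0
        elif base == minor:
            encoded[pos] = 1
        # other bases (e.g. sequencing errors) and uncovered sites are skipped
    return encoded


def phase_reads(reads, snvs, max_iter=10):
    """Split read indices into two haplotype groups plus a list of unphased reads."""
    encoded = [encode_read(read, snvs) for read in reads]
    # Seed haplotype 0 with the read spanning the most SNVs.
    seed = max(encoded, key=len, default={})
    hap = {pos: seed.get(pos, 0) for pos in snvs}
    groups, unphased = ([], []), []
    for _ in range(max_iter):
        groups, unphased = ([], []), []
        for i, enc in enumerate(encoded):
            agree = sum(1 for pos, allele in enc.items() if allele == hap[pos])
            disagree = len(enc) - agree
            if agree > disagree:
                groups[0].append(i)
            elif disagree > agree:
                groups[1].append(i)
            else:
                unphased.append(i)  # no SNVs covered, or no clear majority
        # Re-estimate haplotype 0: group-0 reads vote for their own allele,
        # group-1 reads vote for the opposite allele (XOR with the group index).
        new_hap = {}
        for pos in snvs:
            votes = Counter()
            for g in (0, 1):
                for i in groups[g]:
                    if pos in encoded[i]:
                        votes[encoded[i][pos] ^ g] += 1
            if votes[0] == votes[1]:
                new_hap[pos] = hap[pos]  # tie: keep the current allele
            else:
                new_hap[pos] = 0 if votes[0] > votes[1] else 1
        if new_hap == hap:
            break  # assignments are stable
        hap = new_hap
    return groups, unphased
```

In a full pipeline, each read group would then be passed to a standard assembler, which mirrors the per-haplotype assembly step described above. Seeding with the read that spans the most SNVs reflects why long reads help: a single read that links many variant sites already provides a partial haplotype.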
Riccardo then discussed results obtained from both a mock community and ‘real-world’ metagenomic samples. The mock community comprised nine bacterial strains, including two Staphylococcus aureus strains, and the initial metagenome assemblies produced with metaFlye and Canu could not completely resolve the two S. aureus strains. With the Strainberry pipeline, however, the genomes of both S. aureus strains were fully reconstructed, and Strainberry achieved an average nucleotide identity (ANI) of more than 99.9%. According to Riccardo, metaFlye provided the better input assembly, as Canu over-assembled the dataset and produced a large number of duplicated sequences.
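Average nucleotide identity is commonly computed by dedicated tools that compare genomes in fragments and average the per-fragment identity; the talk did not say which tool or settings were used. The Python sketch below only illustrates the idea. It assumes a pairwise alignment is already available as two equal-length strings with `-` marking gaps, ignores gap columns, and uses an arbitrary 1,000-bp fragment length. The function names `aligned_identity` and `average_nucleotide_identity` are hypothetical.

```python
from typing import List, Optional


# Simplified illustration, not a reimplementation of any ANI tool.
# Assumes the two genomes are already aligned (equal-length strings, "-" = gap).


def aligned_identity(seq_a: str, seq_b: str) -> Optional[float]:
    """Fraction of identical bases over columns where neither sequence has a gap."""
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # gap columns are ignored in this simplified definition
        compared += 1
        matches += x == y
    return matches / compared if compared else None


def average_nucleotide_identity(aln_a: str, aln_b: str, fragment_len: int = 1000) -> float:
    """Mean identity over consecutive fragments of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    identities: List[float] = []
    for start in range(0, len(aln_a), fragment_len):
        ident = aligned_identity(aln_a[start:start + fragment_len],
                                 aln_b[start:start + fragment_len])
        if ident is not None:  # skip fragments made up entirely of gaps
            identities.append(ident)
    if not identities:
        raise ValueError("no gap-free columns to compare")
    return sum(identities) / len(identities)
```

For example, two 2,000-bp aligned sequences that differ at a single position in the first 1,000-bp fragment give per-fragment identities of 0.999 and 1.0, and therefore an ANI of 0.9995. An ANI above 99.9% means that, on average, fewer than about one in every 1,000 compared bases differs.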
Moving on to ‘real-world’ samples, the team used a dataset obtained from natural whey starter cultures (Somerville et al. 2019), a low-complexity microbiome with four dominant strains. Again, Strainberry was able to completely recover all four strains, whereas the input metaFlye assembly failed to resolve one of them.
Closing his presentation, Riccardo showed results from a much higher-complexity microbiome, human stool, for which long-read nanopore sequencing data had been generated by Moss et al. (2020). After applying Strainberry, the assembly size for many strains increased substantially, confirming the utility of the pipeline.
Riccardo noted that the Strainberry pipeline is freely available on GitHub (github.com/rvicedomini/strainberry) and that the team is currently preparing a manuscript describing this work.