Long-read metagenomics reveals cryptic and abundant marine viruses

Marine bacteriophages impact global biogeochemical cycles via their influence on host community structure and function, yet our understanding of viral ecology is constrained by limitations in culturing important hosts and the lack of a 'universal' gene to facilitate community surveys. While recent advances in short-read viral metagenomics offer solutions, they are confounded by microdiversity and fail to assemble across genomic islands (GIs). Single-virus genomics can overcome such issues by targeted sequencing of single virus particles, but is technically challenging and costly to implement. Another approach would be to leverage long-read sequencing technologies to bridge GIs and connect SNPs within populations, but current technologies require far too much input material (micrograms) and have high error rates. Here we sought to establish a generalizable, long-read, low-input metagenomic sequencing approach (VirION) to survey viruses in nature and applied it to nanogram concentrations of DNA from mock and natural viral communities. These experiments showed that our VirION method (i) was as quantitative, in relative terms, as short-read methods, (ii) captured many abundant and ubiquitous viral genomes that short reads did not, (iii) significantly increased median contig lengths and captured a complete 316 kbp viral genome, 100 kbp longer than the current longest genome from Global Ocean Viromes (GOV), (iv) overcame issues of microdiversity in viral assemblies, and (v) captured more genomic islands than short-read assemblies, providing insight into the gene content of whole genomic islands from metagenomic data. Thus, long-read viral metagenomics adds another tool to the maturing viral survey toolkit, recovering ubiquitous and abundant viral taxa missed by short-read sequencing.

Authors: Joanna Warwick-Dugdale, Natalie Solonenko, Karen Moore, Lauren Chittick, Ann C Gregory, Michael J Allen, Matthew B Sullivan, Ben Temperton