Using nanopore reads to recover the complete genomes of ocean viruses without need for assembly


Date: 3rd December 2020

Over 1,800 high-quality full-length virus genome sequences obtained in single reads from a single MinION run using an assembly-free bioinformatics analysis pipeline

Fig. 1 End-to-end workflow for obtaining complete viral genome sequences without assembly

Direct recovery of complete viral genome sequences from environmental samples

The inherent genetic complexity of virus populations poses technical difficulties for recovering complete virus genomes from the environment. To address these challenges, we developed an assembly-free, single-molecule nanopore sequencing approach that enables direct recovery of complete viral genome sequences from environmental samples. Water was collected at three different depths from the Pacific Ocean near Hawaii, and viral particles were enriched from each sample (Fig. 1a). A library was prepared from each sample and sequenced, generating tens of thousands of sequences (Fig. 1b), all containing the hallmark of complete dsDNA tailed bacteriophage genomes: direct terminal repeats (DTRs). We designed a custom assembly-free bioinformatic pipeline to cluster and polish these reads to produce novel viral genomes (Fig. 1c).
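To illustrate what a DTR looks like in a single read, here is a minimal sketch that checks whether the start of a read is repeated exactly at its end. The function name `find_dtr` and its thresholds are hypothetical and are not taken from the pipeline described here. Raw nanopore reads contain errors, so a real implementation would presumably compare the read ends by alignment rather than by exact string matching.

```python
def find_dtr(read, min_len=20, max_len=2000):
    """Return the length of the longest exact direct terminal repeat, or 0.

    A direct terminal repeat means the first k bases of the read occur
    again, in the same orientation, as the last k bases.
    Assumption: exact matching only; sequencing errors are ignored.
    """
    # The two repeat copies must not overlap, so k is at most half the read.
    upper = min(max_len, len(read) // 2)
    # Try the longest candidate first so the first hit is the longest repeat.
    for k in range(upper, min_len - 1, -1):
        if read[:k] == read[-k:]:
            return k
    return 0


# Toy read: a 22-base repeat flanking a 100-base body (illustrative data only).
toy_repeat = "ACGTTGCAAGGCTTAACCGGTA"
toy_read = toy_repeat + "G" * 100 + toy_repeat
dtr_length = find_dtr(toy_read)
```

A read that returns a non-zero length from a check like this is a candidate for spanning a complete genome; reads returning 0 would be set aside.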

Fig. 2 Binning by k-mer frequencies using UMAP

Reference-free read binning resolves microdiversity in phage genomes

The dimensionality-reduction tool UMAP was used to create a two-dimensional embedding of 5-mer count features for each read in the 250 m seawater sample and read bins were called (Fig. 2a). Bin 75 is representative of many other bins in that the read-length distribution revealed enrichment for reads of a specific length, suggesting that these reads fully span a virus genome (Fig. 2b). The genome-scale reads within Bin 75 were further clustered by pairwise alignment scores to reveal strain-level differences in virus reads (Fig. 2c). Polished draft genomes from each alignment cluster share large regions of high sequence identity (>98%), although several regions are significantly diverged (Fig. 2d).
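The 5-mer features used as input to the embedding can be sketched as follows. This is an illustrative version only: normalising counts to frequencies, skipping windows that contain ambiguous bases, and not merging reverse-complement k-mers are all assumptions, and the names `KMERS`, `INDEX` and `kmer_profile` are hypothetical.

```python
from itertools import product

K = 5
# All 4**5 = 1024 possible 5-mers in a fixed order, so every read maps to
# a vector with the same layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_profile(read):
    """Return the 5-mer frequency vector of a read as a list of floats."""
    counts = [0] * len(KMERS)
    read = read.upper()
    for i in range(len(read) - K + 1):
        idx = INDEX.get(read[i:i + K])
        # Windows containing N or other ambiguous bases are skipped.
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


# Toy example: six overlapping 5-mer windows in a 10-base read.
profile = kmer_profile("ACGTACGTAC")
```

Stacking one such vector per read gives a reads-by-1024 matrix. With the umap-learn package installed, that matrix could be reduced to two dimensions with `umap.UMAP(n_components=2).fit_transform(matrix)`; the subsequent bin-calling step is not shown here.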

Fig. 3 Comparing polished genomes with available references

Environmental phage genome validation and functional annotation

The genomes produced by the bioinformatic pipeline were assessed for quality by comparing the lengths of annotated coding sequences (CDS) with gene annotations of available marine phage reference genomes. Clusters with more reads available for polishing resulted in higher-quality genomes, and additional gains can be made by short-read polishing from a matching sample (Fig. 3a). Furthermore, lambda phage DNA was spiked into one of the samples to provide another validation of genome quality, resulting in a reconstructed lambda genome at 99.92% accuracy from just 22 reads. Fig. 3b shows the proportion of polished genomes obtained from each depth with PFAM annotations (bit score > 30) to common virus and prophage marker genes.

Fig. 4 Phage-inducible chromosomal islands (PICIs) observed directly from nanopore reads

Long reads capture concatemeric sequences of unique mobile elements

Several thousand concatemeric sequences (Fig. 4a) were also observed in these samples, with a length distribution similar to that of the complete phage genome reads (Fig. 4b). Gene annotations suggested a role as putative mobile elements. The concatemer repeat copy number (4-7 complete copies) and the length of each repeat appear to be calibrated so that the overall product length remains under 40 kbp (Fig. 4c), matching the size of the most commonly observed phage genomes. These observations are consistent with phage-inducible chromosomal islands (PICIs), which are mobile elements that can hijack the phage machinery to mediate horizontal gene transfer (Fig. 4d). PICIs have never before been observed in an environmental sample.
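The relationship between repeat length, copy number and total read length can be made concrete with a small sketch that estimates the repeat period of a concatemeric read by comparing the read with shifted copies of itself. The function `repeat_period` and its thresholds are hypothetical, and this is not the method used to produce Fig. 4. The comparison is position by position, so it tolerates substitutions but not insertions or deletions; real nanopore reads would need an alignment-based self-comparison, and the quadratic running time is only acceptable for short toy inputs.

```python
def repeat_period(read, min_period=10, min_identity=0.9):
    """Return the smallest shift at which the read matches itself, or 0.

    Assumption: repeat copies differ only by substitutions, so position i
    can be compared directly with position i + shift.
    """
    n = len(read)
    for shift in range(min_period, n // 2 + 1):
        overlap = n - shift
        # Count positions where the read agrees with its shifted copy.
        matches = sum(a == b for a, b in zip(read, read[shift:]))
        if matches / overlap >= min_identity:
            return shift
    return 0


# Toy concatemer: five copies of a 30-base unit (illustrative data only).
toy_unit = "ACGTTGCAAGGCTTAACCGGTATCGATCCA"
toy_concatemer = toy_unit * 5
period = repeat_period(toy_concatemer)
copies = len(toy_concatemer) / period if period else 0.0
```

Multiplying the estimated period by the copy number recovers the total product length, which is the quantity reported as staying under 40 kbp.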
