From flu to you: sequencing at any scale


Dan Turner, VP Applications, introduced a central focus of the Oxford Nanopore Apps team over the past 18 months: using nanopore sequencing technology to help mitigate the spread of the SARS-CoV-2 virus and to understand its genetics. This has included the development of Oxford Nanopore-supported versions of well-known virus sequencing protocols, such as the SISPA protocol for metagenomic analysis of respiratory samples and the ARTIC multiplex PCR method for SARS-CoV-2 genome sequencing. The latter includes the most recent ‘Midnight’ version of the ARTIC protocol, which uses longer amplicons, and Spike-seq for targeted, highly multiplexed analysis of the spike gene to identify, for example, variants of concern.

Dan highlighted a collaboration with researchers at the Mount Sinai School of Medicine, with whom the team have been investigating host responses to COVID-19 vaccination. They have been applying adaptive sampling for real-time, targeted sequencing of the host genome and haplotype-resolved assembly of the IGH locus, paired with single-cell transcriptome analysis of B-cells (see the Applications poster for more information).

Beyond the pandemic

Going ‘beyond the COVID pandemic’, Dan described how they have been looking to ‘get a head start’ on what the media have warned could be a severe flu season this winter. His team have been developing whole-genome sequencing protocols able to precisely identify influenza type, sub-type, and genomic variants. Such information could assist genomic epidemiology efforts and vaccine selection.

Dan explained that the focus here has been on developing novel library prep methods that combine the advantages of rapid and ligation-based preparation, to produce a method that is fast, gives good control over fragment length, and is easy to automate. To this end, Dan introduced the vaccinia virus, a double-stranded DNA virus whose topoisomerase enzyme cuts and unwinds DNA to remove supercoiling; it is this enzyme that the team are exploiting. By preloading the topoisomerase with sequencing adapters, the enzyme can be used to rapidly add adapters to amplicons, with no fragmentation and in few steps. This approach could be applied not just to influenza genome sequencing, but also as a general library preparation method for whole-genome sequencing, as it is simple, quick, gives good control over fragment length, and should eliminate any chimeric reads. Dan explored how topo library prep could be ideal for sequencing ultra-long reads, potentially making a 4.4 Mb read seem ‘lacklustre’ in the not-too-distant future.

Haplotype-resolved assembly

Dan next introduced developments in constructing haplotype-resolved assemblies, explaining that most assemblers produce collapsed haploid assemblies from diploid data, in which each heterozygous site is represented by a randomly selected allele. ‘In the ideal world’ we want haplotype-resolved assemblies, with chromosome-length scaffolds each consisting of a single haplotype, which would allow us to understand cis interactions and how variants are influenced by their genomic context. Long reads are ‘a very good place to start’ here; Dan showed graphs demonstrating how longer reads give rise to longer phase blocks, with a 30X ultra-long dataset producing a phase block length of 12.61 Mb for chromosome 21.
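
To illustrate why longer reads give longer phase blocks, here is a minimal sketch. It is not the method used in the talk: it assumes a toy model in which two neighbouring heterozygous sites can be phased together whenever at least one read spans both, and the function name `phase_blocks`, the site positions, and the read intervals are all hypothetical.

```python
def phase_blocks(het_sites, reads):
    """Group heterozygous sites into phase blocks (toy model).

    het_sites: sorted list of heterozygous site positions.
    reads: list of (start, end) intervals, inclusive.
    Assumption: adjacent sites are linked if one read spans both.
    Real phasing tools also weigh allele evidence and read errors.
    """
    if not het_sites:
        return []
    blocks = []
    block_start = het_sites[0]
    prev = het_sites[0]
    for site in het_sites[1:]:
        linked = any(start <= prev and site <= end for start, end in reads)
        if not linked:
            # No read connects prev and site, so the block ends at prev.
            blocks.append((block_start, prev))
            block_start = site
        prev = site
    blocks.append((block_start, prev))
    return blocks


# Hypothetical data: the same five sites covered by short or long reads.
het_sites = [100, 400, 900, 1500, 2600]
short_reads = [(0, 500), (350, 950), (1400, 1600), (2500, 2700)]
long_reads = [(0, 1600), (1200, 2800)]

print(phase_blocks(het_sites, short_reads))
print(phase_blocks(het_sites, long_reads))
```

In this toy case the short reads leave three separate blocks, while the two long reads join all five sites into a single block.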

What about phasing the whole genome? Ultra-long reads help, but they only get us ‘some of the way’. We could use other sources of haplotype information, such as parent or population genome data, but such data may not be available. Another approach is to use orthogonal methods, such as chromatin conformation capture methods that uncover the three-dimensional organisation of the genome, e.g. Pore-C. As there is no need to amplify the DNA in Pore-C sample preparation, longer reads are more easily produced, revealing multi-way contacts, whilst also retaining base modifications for analysis as and when desired. Individual long Pore-C reads contain different segments from the same homologous chromosome, so this information can be used to help resolve haplotypes. Dan displayed graphs demonstrating increased haploid human genome contiguity with the addition of Pore-C data: from an N50 of 45.3 Mb to an N50 of 152.2 Mb. ‘The longest scaffold that we’ve made is…over 230 megabases…which corresponds to… the entire chromosome 2’. When haplotypes are scaffolded separately, the addition of Pore-C data similarly increased the contiguity of each haplotype. Dan suggested that, as far as they are aware, ‘this is the most contiguous haplotype-resolved human assembly to date’. He emphasised that all the information presented here comes from just two PromethION Flow Cells: one to produce the ultra-long reads, and one for the Pore-C data. This represents a ‘great leap forward’ when it comes to population-scale generation of haplotype-resolved assemblies.
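
For readers unfamiliar with the N50 metric quoted above, the sketch below shows how it is commonly calculated: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the scaffold lengths are made up for illustration and are not data from the talk.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that takes the running total past half
        # of the assembly is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical scaffold lengths in Mb, before and after two scaffolds
# are joined by additional long-range information.
before_joining = [80, 70, 50, 30, 20]
after_joining = [150, 50, 30, 20]

print(n50(before_joining))
print(n50(after_joining))
```

Joining the two largest scaffolds leaves the total length unchanged at 250 Mb but raises the N50 from 70 Mb to 150 Mb, which is the kind of contiguity gain that scaffolding aims for.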

Authors: Dan Turner