Nanopore Community Meeting 2018 - Day 1 writeup
Wed 28th November 2018
After the conclusion of the action from day one of the Nanopore Community Meeting 2018 in San Francisco, here's a detailed roundup of the talks!
For more information on the talk by Oxford Nanopore CTO, Clive Brown, please see this Twitter moment summary.
Plenary: Ami Bhatt, Stanford University
Ami Bhatt, Assistant Professor of Medicine (Hematology) and Genetics at Stanford University, kicked off proceedings with a highly entertaining presentation describing her team’s latest research utilising nanopore sequencing to assemble complete genomes from complex microbial communities. In addition to understanding microbial pathogenicity, Ami is particularly interested in how microorganisms may impact the outcomes of patients with cancer or cardiac disease. Ami explained that the human body contains approximately 10 trillion microbes – as many as there are cells in the human body. The sheer number, variety and complexity of microbes, combined with the high level of host DNA, makes accurate genomic analysis of human microbiomes extremely challenging. Ami explained how this situation limits our understanding of gene content and regulation. Traditional techniques such as 16S rRNA gene sequencing and shotgun sequencing provide pieces of the puzzle, but large gaps still remain.
To support their aim of generating complete genome assemblies from human metagenomic samples, the team at Stanford University evaluated a number of approaches, including a novel DNA partitioning approach using the Chromium platform combined with a bespoke assembly platform called Athena. In practice, the genomes of some microbial species assembled well with high contig N50 values; however, others, such as Prevotella spp. (a dominant constituent of the gut microbiome of the Hadza community in Tanzania), assembled poorly. With coverage at 4800x, Ami stated that sequencing depth was not the issue and instead suggested that the problem lay in the high level of genomic repeats found in these species.
Ami described how Eli Moss, a graduate student in her lab whom she credits with the majority of this work, came to her with the idea of using long-read nanopore sequencing to provide more complete genome assemblies. However, citing the challenge of obtaining high-molecular-weight DNA from stool samples, Ami rejected his proposal. Undeterred, Eli spent 6 months testing and optimising various sample preparation methodologies – which Ami humorously described as “not a deeply pleasant task”. His perseverance paid off, though, as his technique, which comprises enzymatic cell wall degradation, phenol:chloroform extraction, gravity column purification and bead-based size selection, delivered both the yield and the molecular weight needed to generate long sequencing reads.
Following library preparation using the Rapid Sequencing Kit, the DNA was sequenced using a MinION. A number of assembly tools were evaluated, with optimal results provided by Flye and Canu. Both of these tools allowed the assembly of the highly repetitive Prevotella copri genome in a single contig of 3.8 Mb – a result that Ami described as ‘breathtaking’. Not only was this achieved from metagenomic sequencing of stool samples – a significant challenge in itself – but it also represents the first full-length P. copri genome. This genome significantly improves on previous short-read-based assemblies of the organism.
When assessed with the CheckM tool, genic completeness was initially low; however, after the team refined their computational workflow and added sequence polishing with Pilon, a high level of completeness was obtained. The full computational workflow is freely available online for use by other researchers. In summarising her presentation, Ami commented that their method ‘represents an effective, straightforward solution for the complete and efficient de novo characterization of structurally complex bacterial genomes within the gut microbiome’. The team now plan to assemble further microbial genomes from the metagenomic data and also to explore methylation profiles (which are provided as standard in the raw nanopore sequencing data) to predict plasmid-chromosome pairing.
Plenary: Eoghan Harrington, Oxford Nanopore Technologies
Eoghan Harrington, a Senior Applications Bioinformatician, took to the stage to give an update on the work being carried out by the Applications department at Oxford Nanopore Technologies. He started by explaining the role of both the U.K. and U.S. contingents, describing the former as the sample technology specialists and the latter as the customer-like users that showcase the technology by using it to answer biological research questions. Giving an overview of the theme of his presentation, Eoghan stated that the majority of his talk would revolve around applications that were once difficult but have now become possible through the rapid platform improvements seen over the last few years (video: http://bit.ly/2DGSENr). Furthermore, Eoghan stressed that the majority of the technology showcased in his talk is currently available to customers.
Eoghan cited the assembly of large vertebrate genomes with long nanopore reads as a procedure that was, until recently, considered a significant challenge but that now, with the increase in throughput and single-molecule accuracy obtained by Oxford Nanopore sequencing, has become “almost routine”. Eoghan described the overarching goals of large genome sequencing as obtaining resolved haplotypes and telomere-to-telomere contigs with high accuracy in order to generate accurate gene models. Using nanopore long reads allows scientists to get closer to these goals by providing solutions to a number of key challenges: large genome size, repeat regions, heterozygosity and sample heterogeneity. As an example of sample heterogeneity, Eoghan described some “unfinished business” raised by Clive Brown in his London Calling 2018 talk: the extensive terminal blocking caused by DNA libraries made using chicken blood as a source of DNA. The solution to this problem was a platform fix involving a nuclease wash that would “refresh” the blocked pores. Eoghan mentioned that “depending upon who you talk to, this solution could be viewed as either a scalpel or sledgehammer”.
Showing how effective this was on a PromethION, Eoghan described how multiple nuclease washes took the overall sequence yield from a respectable 50 Gb to just under 100 Gb from a single flow cell. Eoghan then went on to discuss how this data was then used to create a genome assembly totalling 960 contigs, with a contig N50 of 18.5 Mb in under a week. He noted that there have been recent improvements in algorithms that have significantly reduced this computation time from days to hours.
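For readers less familiar with the contig N50 figure quoted here, the statistic can be sketched in a few lines of Python. This is only an illustration of the standard definition; the function name and the toy contig lengths are assumptions for the example and are not taken from Eoghan's data.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths in bases (illustrative only).
toy_contigs = [50, 40, 30, 20, 10]
print(contig_n50(toy_contigs))
```

A higher N50 for the same total assembly size means the assembly is split into fewer, longer pieces.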
Putting this into context, Eoghan showed that previous attempts to generate assemblies of this genome began in 2004 and only came close to having this level of completeness once long reads were used across multiple sequencing runs by other long and short-read technologies.
To demonstrate how these advances in large genome assembly can be easily transferred to Oxford Nanopore users, Eoghan described the new protocol builder (poster: http://bit.ly/2FHi4ND, protocol builder: http://bit.ly/2QfVHmx), which guides users through bespoke protocols and suggests data analysis routes one might use to answer the proposed research question. He then went on to discuss nanopore chromatin conformation capture, named "Pore-C" (poster: http://bit.ly/2QgcgyE). Pore-C is a way of obtaining information regarding the spatial localisation of specific regions of DNA and of generating long-range information that helps provide better scaffold contiguity when assembling large genomes. Moving on, Eoghan described new updates to the cDNA and direct RNA kits that allow for accurate detection of strand-specific isoforms (poster: http://bit.ly/2R80J1s), in combination with a number of Nanopore-made bioinformatic tools that have been written to aid users in doing this (pinfish software link: http://bit.ly/2RdSQYq). Generating full-length reads spanning whole transcripts provides further information to help generate accurate gene models, especially in non-model organisms. Finishing the first section of his talk, Eoghan touched on the new sequence reader head developed by Oxford Nanopore, R10, and its ability to generate higher accuracy consensus assemblies by dealing with homopolymeric regions. This will be expanded upon in Clive’s talk later.
The next section revolved around the “holy grail” of microbial genomics: assembling full genomes from metagenomic samples. One of the common aims of metagenomics is to understand the interplay between different microbial taxa in a given ecological niche. In order to achieve this, genomes must be as complete as possible so that the metabolic pathways involved in this interplay can be accurately inferred. Along with the long-read nature of nanopore sequencing data, drastic improvements in throughput over the last few years have allowed numerous researchers to attempt to assemble full genomes from these complex systems. Eoghan explained that there are four main ways to partition out sequences belonging to different taxa in a mixed sample in order to aid assembly. These methods are: reference-based approaches, where sequences are aligned to known genomes; binning by sequence composition; exploiting changes in abundance patterns; and using sequence length variability (poster: http://bit.ly/2E0vKBA). Eoghan touched on a project with Ed DeLong, whereby three sea water samples were depleted of non-phage sequences by a reference-based approach prior to sequence clustering based upon sequence composition. Read length variability was then used to help identify reads spanning full-length phage genomes (poster: http://bit.ly/2P4Q7ih). This showed how read lengths could be used as a taxonomic marker in themselves.
While whole phage genomes may be captured in single reads, a suggestion from Mads Albertsen from Aalborg University showed that using different DNA extraction methods on the same sample can cause differential lysis of organisms (poster: http://bit.ly/2E0vKBA). The differences in abundance of specific sequences between the different extraction methods can be exploited to infer bins of taxonomic significance. This expands upon a previously suggested method that utilises changes in abundance across a time series, but it can be performed on a single sample, negating the requirement for complex time series experimental designs. Finishing this section, Eoghan described something “hot off the pores”: a new version of Pore-C, called metaPoreC, in which the Pore-C method was applied to a model system of two bacterial species. Results showed that individual genomes, and their associated plasmids, could be differentiated and resolved, suggesting that this type of interaction measurement could further aid metagenomic assembly.
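To make "binning by sequence composition" a little more concrete, here is a minimal sketch that turns a sequence into a tetranucleotide frequency profile and compares two profiles. It is an assumption-laden illustration, not the method used in any of the projects mentioned: real binning tools typically also merge reverse-complement k-mers and combine composition with abundance, and the function names below are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalised k-mer frequencies (A/C/G/T only) for one sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering of all 4**k possible k-mers so profiles are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)  # k-mers containing N are ignored
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def profile_distance(a, b):
    """Euclidean distance between two frequency profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy sequences (illustrative only): the first two share composition.
p1 = kmer_profile("ACGTACGTACGTACGT")
p2 = kmer_profile("ACGTACGTACGT")
p3 = kmer_profile("GGGGCCCCGGGGCCCC")
print(profile_distance(p1, p2) < profile_distance(p1, p3))
```

Sequences from the same genome tend to have similar profiles, so clustering on this distance is one way reads or contigs can be grouped before assembly.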
The final section of the talk involved the titular “magic bullet”, a reference to the film JFK, in which a single bullet takes a convoluted path and causes much more destruction than one would think possible. Here it was used as a metaphor for a complex structural variation involving a duplication-inverted triplication-duplication with loss of heterozygosity. Detecting long, complex structural variations is very difficult and can only really be achieved using long-read technology. The mechanism behind this structural variation involves fork stalling and template switching with microhomology-mediated break-induced replication (poster: http://bit.ly/2RdnVeR). The whole structural variant spans approximately 2 Mb and is thus a prime example of a complex structural variant that can only be resolved with long reads. A PromethION was used to generate 16x coverage of the 2 Mb region, which was enough to resolve this rearrangement. Eoghan explained, step by step, how each section of the duplication-inverted triplication-duplication event could be distinguished using the sequencing data. Across the triplicated section of the structural variant, the copy number could be seen to increase from 1 to 4, representing counts from both the triplicated allele and the wild-type allele. On interrogating individual reads, those spanning the duplicated sections were shown to align to the reference in a strand-specific manner, highlighting the predicted breakpoints and thus providing further evidence that the hypothesised mechanism was correct.
Finally, Eoghan showed how loss of heterozygosity could be detected from that point in the chromosome onwards due to a switch to the homologous chromosome, and he explained how analysis of absence of heterozygosity (AOH) could reveal recessive mutations through loss of a dominant allele. On top of this, chromosome 14 contains imprinted loci that overlap with the region of AOH, so the parent of origin for this region is important for the phenotype. As a future step, Eoghan suggested investigating the methylation patterns present in the same set of nanopore data to confirm the proposed mechanism of variation.
To see all of the apps posters demonstrating these platform improvements and how they have helped answer specific research questions see this link http://bit.ly/2AoieUb.
Breakout: Targeted sequencing
Martin Elferink from the University Medical Center in Utrecht spoke about using CRISPR-Cas9 to enrich for genomic repeat structures in a number of diseases. Martin explained that there are over 40 neurological and neuromuscular diseases caused by repeat expansions. The repeat units range in size from approximately 2 to 60 bp and can propagate through replication errors, resulting in repeat structures up to kilobases in length. The number of repeats is often indicative of disease severity, yet structures of this length are difficult to resolve using traditional or short-read sequencing solutions.
Long reads that span the entire repeat expansion “have the potential to solve this problem” and could be used to accurately count and categorise the types and number of expansion events. Martin said that his overall aims were to use nanopore long-read sequencing to generate reads spanning entire loci containing repeat expansions and, ideally, to determine epigenetic modifications.
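As a rough illustration of why a read spanning the whole expansion makes counting straightforward, the sketch below finds the longest uninterrupted run of a repeat unit in a single read. It is deliberately naive: it assumes an error-free read, whereas real nanopore reads contain errors and are normally analysed with dedicated repeat-calling tools. The function name and the "CAG" unit are illustrative assumptions, not details from Martin's talk.

```python
import re


def longest_repeat_run(read, unit):
    """Return the copy count of the longest uninterrupted run of `unit` in `read`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    # One or more back-to-back copies of the repeat unit.
    pattern = re.compile("(?:%s)+" % re.escape(unit))
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(read)]
    return max(runs, default=0)


# Toy read with flanking sequence around three copies of the unit.
print(longest_repeat_run("TTCAGCAGCAGGG", "CAG"))
```

A short read that ends inside the repeat can only give a lower bound on the count, which is the limitation long reads are meant to remove.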
In order to do this Martin explained a targeted enrichment approach using CRISPR-Cas9 where all native DNA ends are dephosphorylated preventing ligation of sequencing adapters. Next, cleavage of the target sites is performed using an RNA guide and Cas9 protein. This cleavage leaves phosphorylated ends to which adapters can be preferentially ligated. The advantage of this approach is that multiple target sites can be excised in one step providing a multiplex solution to targeted sequencing. Due to this, cost prohibitive WGS sequencing is not required and more information can be obtained in a single workflow than just targeting a single locus. Furthermore, this protocol does not use PCR and thus it is the actual native DNA, with preserved modifications, that will go through to sequencing.
Looking at 10 specific loci, Martin undertook a project in collaboration with Oxford Nanopore Technologies. The main aims of this collaboration were to determine whether all 10 loci could be accurately distinguished, along with the number of repeat expansion events in each one. Using DNA from a healthy control, Wigard Kloosterman, both short- and long-read sequencing were performed, to 30x and 70x respectively, and the number of repeat copies was determined for 6 of the 10 loci. With the CRISPR enrichment approach, the Cas9 protocol yielded a median of just under 600x coverage of the loci under study, while the Cas12a protocol yielded a median coverage of approximately 200x, each on a single flow cell on a GridION. Both protocols gave over 100x coverage of the target sequences. Examining the read coverage and alignment plots, Martin stated that cleavage sites were very specific and there was no evidence of allelic dropout across the multiplex panel. Expanding upon this, he stated that for seven of the loci all alleles were detected, as were the relevant SNPs; the remaining three loci were detected, but no informative variants were seen.
Next, Martin moved on to detecting and counting the repeat expansion events in six patients with known pathogenic repeat expansions in different alleles. Except for one patient sample with degraded DNA, all samples processed using the Cas9 protocol had over 100x coverage of the target regions. Using the remaining patients as controls for each disease-associated allele, the correct diagnosis was determined for each patient, and the repeat expansion counts broadly agreed with traditional diagnostic assays.
Thomas Couvreur, from the Institut de Recherche pour le Developpement, opened his talk by discussing how global plant biodiversity is huge and how many of the taxa present in the tropics are threatened by global climate change and by pressures such as deforestation. Other areas that Thomas and his colleagues are interested in are the evolutionary history of important crop species and the conservation and dynamics of tropical ecosystems. Because of this, Thomas stated that rapid species identification is paramount in order to detect and analyse genomic data from novel, non-model organisms.
A number of challenges surround this aim, as plants typically have large complex genomes with large repeat regions and multiple copies of chromosomes. There is a distinct lack of reference genomes available and all these facts together make assembling these genomes a significant challenge.
In an attempt to overcome this, and provide rapid species identification, Thomas spoke of a method they had developed to target specific regions of the plant's genetic material using short oligonucleotide probes hybridised to magnetic beads. This was performed “in solution” in order to target chloroplast DNA, generally regarded as the best genetic barcode for plants. The reasoning behind using this approach was that it could increase multiplexing ability and read depth while reducing the analysis complexity. The benefits of using the chloroplast genome as a species ID tool are that it is a moderately long target (approximately 150,000 bp), circular and haploid (so there is no phasing), and that it provides a good level of phylogenetic resolution for species typing. The downsides are that it is in relatively low abundance compared with nuclear DNA and that there are difficulties in de novo assembly.
Proposing some research questions, Thomas aimed to determine: how long the captured sequences were; what the size ranges of these captured sequences are; what the coverage of the target was; and whether long reads improve plastome assembly.
Thomas and his team used the model grass species Oryza sativa to develop their sequence capture approach before moving on to use this method in 6 non-model grasses and palms. At this point Thomas pointed out that the DNA from the model organism was fresh; however, it is common for plant samples to be shipped in a silica-dried form. Therefore, a number of comparisons between silica-dried and fresh DNA would be made. Summarising the method, Thomas said probes specific to plastid DNA were generated from fragmented long-range PCR products originating from a single plant species. These probes could be used to hybridise to chloroplast DNA and, via attached magnetic particles, pull down and enrich for chloroplast DNA.
When discussing the results, Thomas started by saying “It works!” and qualified this with the fact that 70% of reads were of plastome origin (5-fold over control), the longest read was 26 kb and the median length of all reads was 4.2 kb. When examining coverage using this protocol on the model organism Oryza sativa, average coverage was 364x, with 100% of the target covered at 10x and 99.9996% covered at 50x. In the non-model organisms, the proportion of sequences of plastid origin increased from 15% to 98% when compared with controls, and fold coverage increased from 12x to 156x. Median read lengths in the non-model organisms ranged from 3.4 kb to 4.6 kb using fresh DNA extracts but were shorter when silica gel preserved DNA was used. These non-model organisms had an average coverage of 500x; although some gaps were present, 96% was covered to 10x depth and 91% to 50x. As a result, each fresh DNA sample produced a plastid assembly in two contigs, while silica gel dried samples produced 10 to 17 contigs.
Concluding, Thomas summed up the findings from the sequence capture protocol they developed: capturing long fragments from plants was possible and, although the generated assemblies were good, manual finishing would significantly improve them.
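The "percentage covered at 10x" style figures above are breadth-of-coverage values, which can be sketched as follows. This assumes per-base depths have already been extracted into a plain list of integers (for example, from an alignment depth report); the function names and toy depths are illustrative and are not from Thomas's analysis.

```python
def breadth_at_depth(depths, min_depth):
    """Percentage of positions whose read depth is at least `min_depth`."""
    if not depths:
        raise ValueError("no depth values given")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def mean_depth(depths):
    """Average read depth across all positions."""
    if not depths:
        raise ValueError("no depth values given")
    return sum(depths) / len(depths)


# Toy per-base depths for an 8 bp target (illustrative only).
toy_depths = [0, 12, 55, 60, 9, 10, 50, 100]
print(breadth_at_depth(toy_depths, 10))  # share of positions at >= 10x
print(breadth_at_depth(toy_depths, 50))  # share of positions at >= 50x
print(mean_depth(toy_depths))
```

Reporting breadth alongside mean depth matters because a high average can hide gaps, as with the non-model samples described here.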
Ravi Sachidanandam, from Girihlet Inc., spoke about mitochondrial genetics and how accounting for heteroplasmy (having multiple types of a cellular organelle within an individual cell) is important for accurately identifying haplotypes. Furthermore, Ravi spoke of how targeted sequencing of mitochondria has significant clinical applications in terms of identifying and interpreting low-abundance pathogenic variants.
Starting his talk, Ravi gave some background on how mtDNA is passed down on the maternal side and how heteroplasmy patterns can change during this transmission. Next, he described how heteroplasmy occurs in disease systems, whereby a continuous gradient of disease severity is related to the uneven distribution of mitochondria in daughter cells.
Moving on, Ravi said that the main confounding factors when trying to measure heteroplasmy in mitochondria are that mtDNA is typically of low abundance and that copy number varies between cells and individuals. All the common methods used to analyse mtDNA suffer from significant contamination from genomic material and, to avoid this, complex organelle purifications must be used. Ravi spoke about a method he and his team were developing, called Mseek, for target enrichment of small circular genomes from cellular organelles. Similar to enzymatic plasmid purification methods, nucleases were used to degrade linear and nicked mtDNA, leaving only fully circularised mtDNA.
As the mitochondrial genome is only 16 kb long, Ravi said that you don’t need much sequence data to get great coverage. Moving on, Ravi spoke about how heteroplasmy seems to be consistently passed from a parent cell to a daughter cell when in pure culture, and he suggested that because of this, heteroplasmy could potentially be used for identification methods.
Moving on to discuss some experimental results from the Oxford Nanopore platform, Ravi explained that the purified mtDNA was cut using a single restriction endonuclease and used as a template for long-range PCR amplification prior to library prep and sequencing. Ravi noted that the PCR did not cause chimeric molecules. In terms of on-target percentages, across the replicate samples 19.6% to 78% of reads aligned to the mitochondrial genome, with most samples giving over 50% of reads aligned. Examining the variants called in both short- and long-read sequencing data generated from the same samples, there was good concordance between the two platforms, but the long-read nanopore data picked up more variants and some of the frequencies were different. Examining the technical variability between a number of different samples, Ravi said it was remarkably consistent; to convince himself there was no significant barcode misclassification, he included an extra barcoded, non-mitochondrial sample as a control.
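The two kinds of number in this part of the talk, on-target percentage and variant frequency (heteroplasmy level), are simple ratios. The sketch below shows one plausible way to compute them from read counts; the function names and the counts are made up for illustration, and the source does not state exactly how Ravi's team calculated their figures.

```python
def on_target_percent(aligned_reads, total_reads):
    """Percentage of all reads that aligned to the target (here, mtDNA)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= aligned_reads <= total_reads:
        raise ValueError("aligned_reads must be between 0 and total_reads")
    return 100.0 * aligned_reads / total_reads


def heteroplasmy_fraction(variant_reads, reads_at_site):
    """Fraction of reads covering a site that carry the variant allele."""
    if reads_at_site <= 0:
        raise ValueError("reads_at_site must be positive")
    if not 0 <= variant_reads <= reads_at_site:
        raise ValueError("variant_reads must be between 0 and reads_at_site")
    return variant_reads / reads_at_site


# Hypothetical counts, for illustration only.
print(on_target_percent(5000, 10000))
print(heteroplasmy_fraction(25, 1000))
```

Because the heteroplasmy fraction is a ratio of read counts, low-abundance variants need deep coverage at the site before the estimate is stable.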
Breakout: Clinical microbiology
Matthew Keller, PhD, of the CDC Influenza Genomics Team (IGT), presented his team’s development of a rapid, portable, end-to-end influenza A virus sequencing pipeline using Oxford Nanopore’s MinION device.
Matthew began by describing the scale of seasonal influenza, which causes ~290,000-650,000 deaths a year; in 2017-18, it was responsible for ~80,000 deaths in the United States alone, with influenza A virus (IAV) the primary cause. The RNA virus is highly variable “to the point that virions are essentially unique”; it is able to mutate rapidly via antigenic drift and reassortment of its segmented genome with that of other coinfecting viruses. He stressed the need for a concerted surveillance effort of this process: this is the main goal of the IGT. The team, together with the NIRC, have sequenced over 25,000 influenza samples from across the world using their high-throughput Global Surveillance pipeline: RNA is extracted, reverse transcribed and amplified via M-RTPCR, barcoded and prepared for sequencing in a total of ~11.5 hours, then sequenced for 24 hours using a short-read technology; sequencing data is then analysed and curated in ~2 hours. Matthew explained that, whilst most of the samples sequenced represent seasonal influenza, the team also look for novel viruses which could cause future pandemics.
Matthew described the potential for zoonoses, viruses which can transmit from animal to human hosts, and reassortment of viruses, to cause pandemics. He noted that the Spanish flu virus, which caused more deaths than there were casualties in World War One, transferred to humans from wild water fowl, whilst the virus behind the 2009 H1N1 outbreak had complicated origins, from four viruses in three species, with pigs acting as the “mixing vessel”. With transfer of zoonoses from swine to humans posing an important source of potential pandemics, Matthew asked: where could swine-human contact occur? One answer is exhibition swine shows: with pigs and humans travelling from far and wide, and with lots of contact between the two, these could pose a zoonosis risk: “sick pigs and transmission everywhere.”
Matthew then introduced MIA: IGT’s Mobile Influenza Analysis pipeline for rapid, portable IAV sequencing and analysis using nanopore sequencing. To enable faster, on-site sequencing, the team redesigned the Global Surveillance pipeline from start to finish, cutting RNA extraction, M-RTPCR and barcoded sequencing library prep down to ~3.5 hours and sequencing samples in multiplex on the MinION device for 1-6 hours; analysis was run in real-time alongside sequencing for 1-12 hours.
Matthew noted that the high-powered laptop used for analysis was “actually the least portable part of our pipeline.” The team then took the MIA pipeline on the road, in two small suitcases and a cool box, to a swine show, for on-site testing. They initially screened ~100 pigs using rapid tests, noting that these are known to frequently produce false negatives; that night, they returned to the 7 pigs that had tested positive for influenza, taking nasal wipes from these and their neighbours. They then set up in the barn and, overnight, extracted RNA from the samples, sequenced the 24 samples in multiplex on the MinION and ran analysis. Data was basecalled, demultiplexed and then mapped via IRMA.
By the afternoon of the next day, they were able to report virus coverage for every barcoded sample and construct a phylogenetic tree of the 13 influenza A viruses identified in the samples. 11 of these clustered as an outbreak of H1N2, which Matthew noted had originated from a human H1N1 virus (for which a vaccine was made), passed into swine and subsequently evolved to the H1 delta 2 lineage; this data indicated that the strain was circulating in the swine population. He described how children <10, of which there were many at the swine show, could be particularly vulnerable to the strain. Could the H1N2 identified pose a risk of a future zoonosis event? The data was then used to search for candidate vaccines; in the absence of a good match with any current candidates, the data was emailed to the CDC, from which a synthetic vaccine could be synthesised – “18 hours after unpacking MIA.”
The samples were subsequently run through the IGT Global Analysis pipeline and produced the same 13 genomes, with 99.3% consensus accuracy between MinION-generated data and that of the short-read technology. A follow-up phylogenetic tree constructed with collaborators at USDA allowed for tracking of the evolution of H1 delta 2 through swine until detection. Since the study, two cases of H1N2 have been identified in humans, whose ZIP codes matched the locations of the pigs; the strain matched that identified by MIA.
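As a rough guide to what a figure like 99.3% means, the sketch below computes percent identity between two consensus sequences that have already been aligned to the same length. This is an assumption about the kind of metric involved: the source does not say exactly how the team calculated consensus accuracy, and the function name and toy sequences are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where two pre-aligned sequences match.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy aligned sequences with one mismatch in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

In practice the two consensus genomes would first be aligned with a dedicated aligner before a comparison of this kind.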
Matthew concluded that MIA allows for the successful on-site sequencing of swine influenza virus isolates. The pipeline demonstrates the ability of rapid sequencing to improve surveillance of influenza and potential zoonosis risks. The team plan to deploy the pipeline again in Thailand and Puerto Rico, and hope to further improve the screening process for an even faster sample-to-answer time.
Ganna Kovalenko, of The National Academy of Agrarian Sciences of Ukraine, presented her work using Oxford Nanopore technology to sequence two lethal viruses that recently emerged in Ukraine.
The African swine fever virus (ASFV) causes lethal hemorrhagic disease in wild boar and pigs in both backyard and commercial pig farms, presenting a “significant threat to the global pig industry”; outbreaks have spread through Eastern Europe, Russia and China, and there is currently no effective vaccine or antiviral therapy available. The virus genome is 170-190 kbp (“this is huge for a virus”) and there are 24 genotypes; the virus uses the argasid tick as a vector (though this is not seen in Europe) and can infect macrophages and monocytes.
Ganna and her team selected two ASFV genomes, collected in 2014 and 2017 from outbreaks in widely separated regions of Ukraine, for nanopore sequencing. For their pilot experiment, loci providing a “genomic signature” to determine the epidemiological origin of the 2017 sample were enriched via targeted PCR; the library was then prepared using the 1D^2 sequencing kit (SQK-LSK308) and sequenced on the MinION device. The amplicon data was mapped and consensus regions extracted; analysis via BLAST identified the sample as belonging to the virulent ASFV/Georgia/2007 lineage.
Next, the team prepared the 2014 ASFV DNA sample for whole genome sequencing using the Rapid Sequencing Kit (SQK-RAD004), sequencing on the MinION device. Reads were basecalled via Albacore, trimmed with PoreChop, aligned to the 2007 reference genome using Minimap2 and variant-called with Nanopolish. This generated an average of 30x coverage of the ASFV genome. Ganna displayed the few variant substitutions in the data: “ASFV replication is high fidelity!”. The team used these variant substitutions in their annotation of the genome. Construction of a phylogenetic tree showed that the two nanopore-sequenced samples clustered closely, despite their geographic and temporal separation. Ganna showed the multiple clusters of ASFV across Ukraine, and described how the virus appears to have been introduced into the country in several distinct events at different locations.
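For a sense of scale, average coverage is simply the total number of sequenced bases divided by the genome length. The sketch below uses the upper end of the ASFV genome size quoted earlier (190 kbp) and a made-up total yield chosen to give 30x; the function name and the yield figure are illustrative assumptions, not values reported by Ganna.

```python
def average_coverage(total_bases, genome_length):
    """Mean depth of coverage: sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if total_bases < 0:
        raise ValueError("total_bases must not be negative")
    return total_bases / genome_length


# A 190 kbp genome needs about 5.7 Mb of aligned sequence for 30x.
genome_length = 190_000
aligned_bases = 5_700_000  # hypothetical yield, for illustration
print(average_coverage(aligned_bases, genome_length))
```

The small genome size is why a modest amount of on-target data from a single MinION run can be enough for variant calling on a virus like this.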
Ganna then went on to describe the team’s sequencing of the avian influenza A virus (AIV) in Ukraine. AIV is carried by wild birds; however, it can cause “lethal, highly pathogenic outbreaks in poultry (HPAIV).” Over 7,500 samples were collected in the field; initial PCR-based testing suggested the presence of the H5 or H7 subtype of AIV. Viral RNA collected from environmental samples and necropsy tissue from poultry and wild waterfowl was then reverse transcribed and amplified via MS-RTPCR to generate cDNA libraries; these were then barcoded and sequenced in 1D on the MinION. Each of the eight segments of the viral genome was assembled via Geneious R11; Ganna displayed the alignment of data from one sample to the Eurasian LPAIV M gene with 99% identity, revealing a low pathogenicity subtype. This approach was used to genotype six avian influenza virus samples from mute swans, ducks and chickens in Ukraine via reference-based assemblies and to identify reassortment in the segmented genomes.
Ganna concluded that the use of nanopore sequencing enabled “point-of-outbreak” diagnosis of a rapidly spreading, virulent disease. She added that the method that she and her team have developed could be scaled up and applied more widely: “there are many pathogens to sequence in wildlife, livestock and humans.” Their work represents the first time the full genome of an ASFV of the Georgia/2007 strain and full genomes of avian influenza viruses have been sequenced on the MinION, helping further understanding of the evolution and emergence of these viruses.
Julie Karl, Senior Research Specialist at the University of Wisconsin-Madison, opened the Assembly breakout session by describing her recent work applying long-read nanopore sequencing to create a complete de novo assembly of the Mauritian macaque MHC haplotype. She explained that the 5 Mb macaque MHC genomic region is currently unexplored, despite its importance in transplantation and immunological research. The MHC region is gene-dense, highly complex and exhibits a high level of repetitiveness - caused by the presence of multicopy genes and pseudogenes. Furthermore, many copy number variants are evident, arising from segmental duplication events. Julie stated that this high level of genome complexity precludes the accurate assembly and analysis of the MHC region from current macaque reference genomes which have been created using short-read sequencing technology.
The long reads provided by nanopore sequencing are able to span complete regions of repetitive DNA, allowing complete genomic resolution. In order to generate ultra-long nanopore sequencing reads, Julie adopted a phenol:chloroform sample preparation approach previously demonstrated by Nick Loman, Matt Loose and co-workers to deliver high-molecular weight DNA. In order to simplify analysis, the team obtained their macaque DNA from an isolated population from which animals with homozygous MHC can be obtained.
Julie revealed that this approach provided approximately 53x genome coverage, with the longest read in excess of 1.4 Mb. The reads were mapped to the human hg38 reference genome and sequences mapping to the MHC region were extracted and de novo assembled using Canu. The longest read in the MHC region spanned 613 kb. Julie showed how the long reads provided by nanopore sequencing allowed the entire MHC region to be resolved in a single contig.
The team took an initial look at where the longest reads (>250 kb) were localised in the MHC regions and, as expected, the majority of reads spanned multiple genes, which Julie stated: ‘provided confidence that resultant assembly would be highly contiguous’. Polishing using short sequencing reads was used to further enhance the consensus sequence. The resulting, highly accurate MHC contig spanned 5 Mb, with genomic regions exhibiting 100% sequence accuracy with cDNA sequence. Comparing this contig with existing BAC-based MHC contigs revealed many similarities but also identified a large insertion in the new data set, which, based on initial analysis, the team believe to be caused by a duplication event.
Closing her presentation, Julie listed a number of future research activities that they are planning, including analysing the dataset for other immune gene regions of interest and assembly of the full macaque genome. Additional work will also include characterising additional macaque genomes/MHC regions and exploring the feasibility of using target capture technologies for ultra-long reads.
Zhijian Jake Tu
Zhijian Jake Tu introduced his presentation with the startling fact that mosquitoes are the world’s deadliest animal, with mosquito-borne diseases (such as malaria and dengue) responsible for 725,000 human deaths per year.
The Tu lab at Virginia Tech study sex determination in mosquitoes to answer a range of questions – from basic biology of sex determination and evolution of the sex chromosomes to potential mosquito control applications. Zhijian described how, in some mosquito species, a dominant male-determining factor (M-factor) on the Y chromosome provides the primary signal that initiates male development. However, studying M-factors is complicated by the repeat-rich nature of the Y chromosome.
In order to study the mosquito Y chromosome, the Tu lab applied nanopore technology, which delivers long sequencing reads that can span large repetitive regions. In the case of Anopheles albimanus, the team used the MinION to generate 120x genome coverage. Following Canu assembly, sequence polishing using short-read sequencing data and Hi-C scaffolding, a contig N50 of approximately 10 Mb was obtained. Furthermore, a BUSCO score of 99 indicated a highly complete assembly. The assembly comprised 6 Mb of Y chromosome sequence. While the Y chromosome was not incorporated in a single contig, the assembly significantly improved on existing references and allowed the team to identify another potential M-factor. The team are now planning to explore the generation of ultra-long reads to complete the Y chromosome assembly.
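For readers less familiar with the contig N50 metric quoted above, here is a minimal sketch of how it is usually defined: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembled bases. The contig lengths in the example are made up for illustration.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (arbitrary units): the total is 30,
# and the two longest contigs (10 + 8) already cover half of it.
print(n50([10, 8, 6, 4, 2]))  # 8
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is commonly reported alongside completeness scores such as BUSCO.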
The team at Virginia Tech are also applying nanopore sequencing technology to study the Anopheles merus Y chromosome. As the X and Y chromosomes of this species share a number of repeats, which can confound analysis, the team sequenced an F1 hybrid from male A. merus and female A. coluzzii mosquitoes, allowing easier differentiation of the sex chromosomes. Initial analysis indicated that the data are encouraging and, Zhijian suggested, will allow more contiguous assemblies of the Y chromosome.
Zhijian’s work on sex determination could have a profound impact on mosquito control measures. As only female mosquitoes bite and are therefore responsible for mosquito-borne disease, the manipulation of sex ratios through the use of Y-linked ‘maleness’ genes could be a novel and effective form of disease control.
Raja Shekar Varma Kadumuri, from Indiana University, shared his work utilising direct RNA sequencing to identify transcriptome-wide 5-methylcytosine (m5C) modification events at single molecule resolution in human cell lines. Unlike alternative sequencing technologies that require additional sample processing to capture base modifications, direct RNA sequencing using nanopore technology allows simultaneous detection of nucleotide sequence and base modifications. The m5C RNA modification is known to regulate RNA processing, stability, mRNA export and translational processes. Raja and the team at Indiana University developed RAVEN, a deep neural network-based framework for the detection of m5C modifications from nanopore sequencing data. RAVEN utilises known m5C genomic locations and their modification frequency to support base modification detection. The framework has been validated on approximately 10,000 known m5C and unmodified cytosine signals from HeLa cells. Raja presented data showing that RAVEN predicts the m5C modification at single base resolution with an accuracy of 85%. A key feature of the framework is that it provides a read-level probability score for each modification locus and the extent of methylation at each locus based on read depth. Raja now plans to test RAVEN on additional human cell lines, validate results with bisulfite sequencing and examine other RNA modification types. RAVEN, which utilises Albacore-basecalled fast5 files, is freely available to download from GitHub.
Alison Tang of the University of California, Santa Cruz presented her research on using long nanopore reads to characterise full-length RNA isoforms. Long nanopore reads can span entire RNA molecules, thereby facilitating the identification of exon-exon connectivity and allowing discovery and quantification of novel isoforms. Existing short-read RNA analysis tools are ill-equipped to work with long nanopore reads. To address this, Alison developed the analysis tool FLAIR (full-length alternative isoform analysis of RNA). FLAIR contains two alignment steps to produce an accurate, nanopore-specific reference. It also incorporates promoter chromatin states to distinguish between 5’ truncations and true transcription start sites. Alison presented data showing how the tool works with both cDNA and native RNA sequencing data, allowing the detection of many novel isoforms (65% using native RNA sequencing of GM12878).
Using FLAIR on cancer samples with splicing factor mutations, the team at UC, Santa Cruz demonstrated that the tool is able to detect subtle splicing aberrations. In closing her presentation, Alison stated that this is ‘one of the many ways in which nanopore sequencing can expand our understanding of the transcriptome’.
Blake Billmyre from Stowers Institute for Medical Research showcased his work characterising the WTF gene in the yeast Schizosaccharomyces pombe (fission yeast). He first described how alleles are evolutionary competitors. This is particularly true for the WTF gene, for which specific alleles enhance their own transmission to the next generation by producing a poison that kills offspring that do not contain that allele. WTF genes are part of a large and rapidly evolving family; however, Blake described that the repetitive nature of these genes and proximity to long terminal repeats makes the assembly of WTF loci particularly challenging when using traditional short-read sequencing technology. To overcome these issues, the team at Stowers are now using the long sequencing reads provided by nanopore technology to fully characterise these complex regions. Initial results have shown that copy number and sequence identity can vary greatly between strains, resulting in reproductive barriers between different strains. Concluding his talk, Blake stated that nanopore sequencing is allowing assembly of the WTF region and delineation of WTF genes with accuracy comparable to alternative sequencing approaches.
A new addition to the agenda this year was the Spotlight Session, where early career researchers get the opportunity to pitch their research to the audience before a vote, after which the winner stays on to deliver a plenary talk on the main stage. Everyone still gets the chance to present though, with the runners up delivering a mini theatre talk immediately afterwards.
The three researchers to pitch on day one were Tom Nieto, Natalie Ring and Nada Kubikova. First up to the stage was Tom Nieto, a surgical registrar as well as researcher from Birmingham. Tom pitched his talk on how sequencing is changing the field of renal transplantation. Solid organ transplantation, Tom explained, is arguably the medical marvel of the twentieth century, and near-patient DNA sequencing may well be the marvel of the twenty-first. But there are significant barriers to solid organ transplantation globally, particularly in low-income countries, where cost and logistics inhibit progress. Due to this, Tom and his colleagues are developing a tissue typing laboratory in a single suitcase, removing the need for specialised equipment or specialist staff and improving global access to transplantation. Broadening access to HLA-typing is an application ideally suited to long reads delivered in real time, and could reduce a process that takes days down to just hours.
Tom finished by defining the goal of the project: deliver high resolution, accurate, affordable, near-patient tissue typing, globally.
Following Tom was Natalie Ring from the University of Bath. Natalie began her pitch by telling the audience that her application to the Spotlight Session involved her sending in a video of herself advertising her talk to Oxford Nanopore, but she refused to show the video to anyone. However, she made a bet with a friend that if she made it through the application phase to go to San Francisco she would recreate that video for the audience and deliver her pitch in poem format. The live blog will almost certainly fail to do justice to the rhyming section, but during it Natalie explained how she loves talking about the history of Bordetella pertussis and gaining the understanding we all lack of the disease by using long and ultra-long reads to unlock the secrets of the bacterium.
The third and final pitch came from Nada Kubikova, on how sequencing is changing the present and future of reproductive medicine. Unlike Natalie's, the pitch didn’t come with added rhyming, but focused on preimplantation genetic testing and its link to IVF. More specifically, Nada posed the questions: Why is success of IVF so heavily linked to maternal age? And can we avoid disease transmission from high-risk couples to their children?
After Nada’s pitch the session moved to the vote, where a tense minute or two concluded with Tom winning with 42%, Natalie just a tiny bit behind on 39%, and Nada pulling in 19% of the vote.
Beginning his talk proper, Tom outlined that his application is one particularly suited to expansion to the developing world, where tissue typing labs simply do not exist. But more than that, how can we take portable tissue typing into clinical practice, and how do we validate it?
Tom explained that kidney disease affects 1 in 10 adults worldwide, and that transplantation is the best possible form of treatment for kidney disease, improving quality and length of life of the patient versus the alternative, haemodialysis. Safe transplantation requires tissue matching to avoid rejection, and the better the tissue match, the longer the kidney lasts before further intervention is required.
To accommodate this need for rapid tissue typing for organ matching, Tom introduced the “mark one” tissue typing laboratory in a suitcase, featuring such equipment as the miniPCR, MinION, and a laptop to which just pipettes and DNA could be added. Tom then asked whether such a setup could be validated quickly in Birmingham, which hosts the largest solid organ transplant system in the UK. Once validated, the equipment could be taken on the road with Transplant Links Community, a charity which takes transplantation to the developing world – in particular Ghana, the Caribbean and Papua New Guinea. In these regions, Tom explained, the problem is not a lack of patients or clinicians, merely that the labs for them to use don’t exist. Tissue samples, instead of being processed in-country, have to be sent back to the UK to establish whether they are a match to the patient or not. This clearly isn’t a sustainable method and demonstrates an urgent need for technology near the location of tissue sampling.
In more detail, the process for nanopore sequencing of the tissue samples involved use of the Ligation Sequencing Kit with barcoding, before using Porechop for demultiplexing, alignment with minimap and samtools, and finally HLA-PRG-LA for HLA typing. All software is open-source and so easy for other users to replicate.
For the pilot scheme, Tom described their analysis of 11 samples from kidney donors, extracting DNA from patient blood and typing with short-read sequencing, SSP (sequence-specific primer) techniques, and nanopore long read sequencing. Tom detailed that traditional SSP methods type to single field accuracy and overall the process carries approximately 1% error – not least because technicians must manually transfer typing results across to a large spreadsheet, often late at night.
While the short-read sequencing yielded 4-field accuracy, MinION comfortably obtained 3-field accuracy, with further optimisation yet to be done. Combined with this result, MinION also provides the advantage of being suitable for overnight provision of results, taking just 9 hrs in comparison to significantly extended timelines for short read assays. This rapid turnaround also brings closer the possibility of deceased organ transplant.
To conclude, Tom listed the advantages of nanopore sequencing for HLA typing for solid organ transplant: improved access to typing globally; removing logistical barriers to renal transplantation; lower costs; faster results; more robust protocols; and greater accuracy than commonly-used SSP techniques.
Plenary: Edward DeLong
Edward DeLong, from the University of Hawaii, gave a talk expanding upon the work referenced by Eoghan Harrington of the Oxford Nanopore Applications department earlier in the day. Edward began his talk by describing the research station in Aloha where the samples analysed later in the talk were taken. To give an insight into the types of research this station facilitates, Edward spoke about an ongoing 30 year time series experiment aiming to track changes in ocean microbial populations. Using the example of ocean acidification, Edward spoke of how one of the interests of microbial ecologists is to determine how microbial populations change over time in response to varying environmental variables. In the case of ocean acidification, this is directly linked to climate change as CO2 dissolves in sea water, reducing the pH and potentially affecting microbial organisms such as coccolithophores whose outer protective plates are made of calcium carbonate. Segueing into a description of ocean microbes as the “forests of the sea”, Edward mentioned that ocean microbial populations fulfil many essential biogeochemical processes, and that organisms such as the cyanobacteria act as primary producers. However, the focus of this talk was ocean phages, as they affect, maintain, and alter prokaryotic populations through, among other things, infection and release of nutrients through cellular lysis. Due to the interrelated nature of phage and prokaryotic ocean populations, understanding not only which phage taxa are present in sea water, but how these populations change over environmental and temporal gradients, is important to understand how changes in environmental conditions may alter ecosystem function. In order to do this there are a number of challenges, not least of which is a lack of good viral reference genomes.
Moving on to describe his study of viruses in the oceans, Edward described how sea water from the study site in Aloha was taken at 15 m, 117 m and 250 m. As an overview of the methods used, water was put through filtration systems in order to select for virions, and nucleic acids were then extracted from each sample using a Qiagen genomic tip 20/G to produce 2-5 µg of DNA. This was then used as the input material for the standard Ligation Sequencing Kit (SQK-LSK109) by David Dai at the New York branch of Oxford Nanopore's Applications department. Samples were sequenced on a GridION using 9.4.1 flow cells and the resultant data was put through an analysis pipeline constructed by Oxford Nanopore's Applications bioinformatician, John Beaulaurier. First, the raw 1D reads were put through Kaiju to generate taxonomic bins of known viruses. The remaining un-binned reads were filtered for known cellular fractions to remove sequences belonging to known non-viral organisms. K-mer clustering was performed in order to generate k-mer bins alongside the viral taxonomic bins. The k-mer bins were further processed by Canu for error correction and, using a read length filter, reads that spanned whole potential viral genomes were isolated. Next, these reads were clustered based on nucleotide frequency using PyANI and polished using Racon and Nanopolish to produce draft phage genomes.
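To give a feel for the k-mer binning step, the sketch below turns a read into a normalised k-mer frequency vector, the kind of per-read profile that can then be clustered. The choice of k=4, the plain (non-canonical) k-mer counting and the function name are assumptions made for illustration; the talk did not specify how the pipeline computes its profiles.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return normalised k-mer frequencies in a fixed lexicographic order.

    k-mers containing characters other than A, C, G or T are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical read; with k=2 the vector has 4**2 = 16 entries.
profile = kmer_profile("ACGTACGT", k=2)
print(len(profile))
```

Reads from the same genome tend to share a similar composition, so their vectors sit close together, which is what makes reference-independent binning possible.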
Expanding upon each section of the data analysis, Edward explained in more detail how t-distributed stochastic neighbour embedding (t-SNE) plots were constructed and the density of points on the graph was used to delineate taxonomic bins in a reference-independent fashion. Then, using Kaiju, each taxonomic bin was filtered for cellular DNA “noise” and then screened for unique properties of phage genomes to determine which could be used for correction and draft genome construction. Edward said that initially they were having difficulty assembling the reads from each bin, but then upon examining the read length distribution he proposed that this may be due to the fact that whole viral genomes were captured in single reads. The next task was to convince himself this was true. In order to do this, read length distributions of each bin were compared with the overall distribution of sequence lengths in the sample, suggesting that, indeed, whole viral genomes were covered in single reads. Furthermore, using pairwise comparisons of average nucleotide identity within each bin, Edward showed that many contained single viral genomes while others contained multiple, closely related viral genomes. In the case of the former, one read was picked and polished using the rest of the reads within the cluster, while in the latter, reads from each average nucleotide sub-cluster were chosen as references and polished.
In order to validate these draft viral genomes, a number of checks were performed. A program called VirSorter was used to determine how many of the reads within each cluster were predicted to be of viral origin and this resulted in 100% concordance. Furthermore, 95% of the polished reads had 200-2000 bp direct terminal repeats, a feature very common in viral genomes. Pulsed-field gel electrophoresis profiles of known oceanic phages were compared with the read length distributions of the suspected phage bins, and the polished read lengths matched closely, suggesting that whole viral genomes had actually been caught in single nanopore reads. Next, homology between the polished reads and a number of environmental viral genome databases was calculated, showing that the majority of proposed genomes had significant homology to known marine viral genomes. However, Edward pointed out that a number of novel viral genomes were detected in just these three samples and, although homology with known viral genomes was low, they had many of the characteristics of a complete phage genome.
Discussing the results of the depth study, Edward showed that as depth increased, the proportion of phage genomes of unknown origins increased, but many at the 15 m depth were reasonably well characterised taxonomically. Furthermore, many of the marker genes expected to be seen in viral populations could be detected in these draft genomes.
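The direct terminal repeat check can be pictured with a very simple sketch: look for the longest stretch at the start of a polished read that is repeated exactly at its end. This is only an illustration under the assumption of exact matches; the talk did not describe how the repeats were detected, and real reads with residual errors would need an alignment-based comparison. The function name and the shortened thresholds in the example are hypothetical.

```python
def find_dtr(read, min_len=200, max_len=2000):
    """Return the length of the longest exact direct terminal repeat, or 0.

    A direct terminal repeat here means the first n bases of the read are
    identical to its last n bases, with min_len <= n <= max_len.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # The two copies cannot overlap, so n is capped at half the read length.
    upper = min(max_len, len(read) // 2)
    for n in range(upper, min_len - 1, -1):
        if read[:n] == read[-n:]:
            return n
    return 0


# Hypothetical toy read: an 8-base repeat flanking a 20-base body,
# using shortened thresholds so the example stays readable.
repeat = "ACGTTGCA"
toy_read = repeat + "C" * 20 + repeat
print(find_dtr(toy_read, min_len=4, max_len=10))  # 8
```

A read that carries such a repeat at both ends is good evidence that it covers a genome from one terminus to the other.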
Edward then showed some short-read phage data and how these populations change across time and depth. This highlighted both dynamic and predictable patterns of ocean phage communities with some taxa appearing and disappearing sporadically while ecological patterns in others were relatively predictable. Finishing this section Edward compared short read sequencing with nanopore sequencing on the exact same samples and showed that nanopore reads appear to recover rare phage types very efficiently.
In his closing remarks, Ed said that nanopore sequencing of viral metagenomic samples required no assembly, with many reads spanning whole viral genomes. In addition, nanopore sequencing efficiently recovers whole virus sequences from complex environmental samples and, compared with standard short read assembly methods, novel viral types seem to be recovered using a nanopore-based approach.
Plenary: Crystal Gigante
Crystal Gigante, an ORISE fellow from the Centers for Disease Control and Prevention (CDC), opened her plenary presentation by outlining how rabies is a global burden. It is present on all continents (except Antarctica) and kills nearly 60,000 people every year. While post-bite vaccination is 100% effective if given early, once symptoms present, the disease is fatal. Crystal further revealed that >95% of rabies deaths occur in Africa and Asia. In order to combat this disease, the World Health Organization, together with other leading healthcare bodies, has initiated the ‘Zero by 30’ campaign, an effort to eliminate dog-mediated rabies – the cause of 99% of human rabies cases – by 2030. However, Crystal explained that, while regions with endemic canine rabies are enthusiastic and motivated to meet this goal, they may not have the resources to monitor and track rabies. As a result, rapid, portable and affordable sequencing could provide the critical information required to tackle the disease, allowing the identification of outbreaks, monitoring of elimination efforts and investigation of viral evolution.
To support the goals of the Zero by 30 initiative, the project team evaluated the ability of nanopore sequencing, using the MinION, to provide low-cost, in-field analysis of rabies virus. The rabies virus (Rabies lyssavirus) is a negative-sense, single-stranded RNA virus with a 12 kb genome. The team performed PCR to amplify the full-length coding sequence of two (N and G) of the five R. lyssavirus genes. The amplicons were combined and each sample was given a unique barcode before pooling and sequencing.
At as little as $8 per sample when barcoding 96 samples, Crystal revealed that the cost of nanopore sequencing was significantly lower than that of other sequencing technologies, which ranged from $23-151 per sample. For comparison, it costs approximately $40 to type for rabies using antibodies – the current gold standard test.
Comparison of the nanopore sequencing data with that provided by Sanger sequencing, for samples representing 15 rabies variants, showed a 0.048% consensus sequence error rate. Although some deletions and insertions were observed, these could be removed through manual correction. After sequence polishing and correction, complete concordance was obtained with the Sanger sequencing results.
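One simple way to arrive at a consensus error rate like the one quoted above is to count the aligned positions at which the nanopore consensus differs from the Sanger sequence and express that as a percentage. The sketch below assumes the two sequences have already been aligned to the same length, with "-" marking gaps; the talk did not state how the figure was calculated, and the sequences shown are made up.

```python
def consensus_error_rate(consensus, reference):
    """Return the percentage of aligned columns where the sequences differ.

    Both inputs must be pre-aligned to the same length; a gap character
    ("-") in either sequence counts as an error at that column.
    """
    if len(consensus) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    errors = sum(1 for a, b in zip(consensus, reference) if a != b)
    return 100.0 * errors / len(reference)


# Hypothetical 10-column alignment with a single deletion in the consensus.
print(consensus_error_rate("ACGT-ACGTA", "ACGTTACGTA"))  # 10.0
```

At the scale reported in the talk, 0.048% corresponds to roughly one discrepancy in every 2,000 aligned bases.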
The regional veterinary diagnostic laboratories that typically receive rabies samples often lack the resources and expertise for sequencing. To determine whether nanopore sequencing can be carried out in such labs, the CDC team set up pilot projects in Guatemala, India, Kenya and Vietnam – countries with endemic canine rabies. These partner labs, who had not previously performed rabies sequencing, were each asked to collect approximately 40 confirmed positive rabies samples. Using the workflows provided, the labs were able to sequence 10-100 RNA extracted samples per week. Crystal suggested that the latest workflows could shorten the entire sequencing workflow, from sample to result, to a single day. She also shared initial unpublished results showing how nanopore sequencing allowed the generation of distinct clades.
Summarising her presentation Crystal commented that this evaluation: ‘suggests the MinION can produce informative rabies sequences from clinical and field samples’. The team are currently working to finish validating the workflow and publish the resulting data.
Stay tuned for the action from day two tomorrow!