London Calling: Day 1 (morning) writeups
Thu 24th May 2018
The annual London Calling conference kicked off today at Old Billingsgate in the City of London. This blog contains writeups of many of the talks, or you can follow the action on Twitter using #nanoporeconf. The page will be updated frequently over the coming days, so do check back if you can't see the talk you were interested in.
Plenary: Angela Brooks, UCSC
Angela Brooks from UCSC, a member of the Nanopore Human RNA Consortium, gave the opening plenary talk at London Calling. Angela detailed the work she and the consortium have been undertaking to sequence native RNA from the well-studied GM12878 human cell line. Angela started by talking about how it amazed her that cells, each containing the same genome, can have such different phenotypes and functions. Using this as a segue into a description of RNA and its function, Angela described how RNA can have many isoforms, modifications and non-coding regions. Because traditional short-read sequencing techniques make these properties hard to define accurately, Angela said that in this talk she would focus on sequencing GM12878 polyadenylated RNA both directly, using Oxford Nanopore’s Direct RNA Sequencing Kit, and after PCR amplification, using the PCR-cDNA Sequencing Kit.
The main aims of the Nanopore Human RNA Consortium were: to generate a community resource of human polyA RNA sequenced in its native form; to assess technical reproducibility by sequencing the same sample to high depth across many labs; and to share both wet-lab and data analysis methods. Detailing the methods used by the many labs associated with the consortium, Angela described how RNA extraction was followed by polyA RNA enrichment. Spike-in control RNAs were also added to the sample as an internal control.
After the different labs had sequenced their respective samples using both the direct RNA and PCR-cDNA kits, 13 million and 24 million reads were generated by the two methods, respectively. Comparing the two, direct RNA gave longer alignment lengths, while both methods showed comparable accuracies and good correlations in terms of gene abundance. Both also correlated well with externally generated short-read data. Angela said that the full dataset generated by the Nanopore Human RNA Consortium is available for download.
Focusing on the technical reproducibility of the direct RNA data, a principal coordinate analysis showed that reproducibility was high when the same tissue culture sample was sequenced across all labs. A separate aliquot, by contrast, separated from the rest on the plot, highlighting that sample origin explained more of the variance within the data than technical replication.
Examining read length over time revealed no correlation, suggesting that no detectable RNA degradation was occurring on the flow cell during sequencing and that the 30% of reads with partial alignments must be caused by another factor. Delving deeper, it became apparent that a number of these partial reads were due to easily fixable software artefacts; Angela hypothesised that the remaining partial reads may result from degradation during sample processing or from relevant biological processes.
Angela showed that orthogonal data, such as promoter-targeted ChIP-seq or comparison with current transcript annotations, could be used to aid the assembly and validation of transcript isoforms. The longest high-confidence isoform detected by Angela and the consortium was from Sorl1, a gene implicated as a determinant of Alzheimer’s disease: a >10 kb read spanning 48 exons. Angela also demonstrated that allele-specific expression could be detected; for example, Xist, a gene located on the X chromosome, showed a paternal bias in expression.
Estimating polyA tail length was the topic of the next section of the talk. As the adapter used for direct RNA sequencing sits at the 3’ end of the polyA tail, specific features in the raw signal, such as dwell times, can be used to estimate polyA tail length. When the tail lengths estimated from sequencing were compared with the known tail length range of human RNA and with the spike-in controls, matching length distributions were observed.
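As a rough illustration of the dwell-time idea, the sketch below converts the duration of the polyA segment in the raw signal into an estimated length in nucleotides. It assumes the tail boundaries have already been located (as raw-sample indices), that the tail moves through the pore at a constant rate, and that this rate (`nt_per_second`) is known; tools that do this in practice generally estimate the translocation rate per read rather than using a fixed value. The function name, parameters and numbers are hypothetical and chosen for illustration only.

```python
def estimate_polya_length(tail_start: int, tail_end: int,
                          sample_rate_hz: float, nt_per_second: float) -> float:
    """Estimate polyA tail length (nt) from the dwell time of the tail segment.

    Assumes (hypothetically) that tail_start/tail_end are raw-sample indices
    bounding the homopolymer segment and that translocation speed is constant.
    """
    if tail_end <= tail_start:
        raise ValueError("tail_end must be greater than tail_start")
    dwell_seconds = (tail_end - tail_start) / sample_rate_hz  # samples -> seconds
    return dwell_seconds * nt_per_second                      # seconds -> nucleotides


# Illustrative values only: a 9,000-sample tail at 3 kHz, moving at 70 nt/s.
print(estimate_polya_length(0, 9000, sample_rate_hz=3000.0, nt_per_second=70.0))  # 210.0
```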
Angela finished her talk by discussing the detection of modifications in native RNA. By synthesising and sequencing RNA transcripts containing only specific, known modifications and comparing these with sequences from molecules lacking the modification, model training datasets showing shifts in signal were generated. As the position and type of each modification were known, these datasets can be used to train basecalling algorithms to detect both the position and type of modification in a native RNA molecule.
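To make the idea of a "shift in signal" concrete, here is a minimal sketch that compares mean current levels per k-mer between reads from modified and unmodified synthetic transcripts, and reports k-mers whose difference exceeds a threshold. The input layout (a dict mapping each k-mer to its current measurements in pA), the threshold and the function name are assumptions for illustration; this is not the consortium's training procedure.

```python
from statistics import mean
from typing import Dict, List


def kmer_shifts(control: Dict[str, List[float]],
                modified: Dict[str, List[float]],
                threshold_pa: float) -> Dict[str, float]:
    """Return k-mers whose mean current (pA) differs between modified and
    control reads by more than threshold_pa, mapped to the signed shift."""
    shifted = {}
    # Only k-mers observed in both datasets can be compared.
    for kmer in sorted(control.keys() & modified.keys()):
        delta = mean(modified[kmer]) - mean(control[kmer])
        if abs(delta) > threshold_pa:
            shifted[kmer] = delta
    return shifted


# Toy measurements (hypothetical): only the first 5-mer shows a clear shift.
control = {"AACGA": [80.0, 82.0], "AATTA": [90.0, 90.0]}
modified = {"AACGA": [85.0, 87.0], "AATTA": [90.0, 91.0]}
print(kmer_shifts(control, modified, threshold_pa=2.0))  # {'AACGA': 5.0}
```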
Plenary: Dan Turner, Oxford Nanopore
Dan Turner, Vice President of Applications at Oxford Nanopore Technologies, gave an update on work being carried out by the Oxford, New York and San Francisco groups. A major focus of Dan’s talk this year was helping customers choose the correct experimental path to answer their research question, guiding users from sample extraction right through to data analysis. This concept was first introduced at London Calling 2017, and since then considerable work has gone into transforming the protocol selector currently available on the Nanopore Community website into a full, bespoke, end-to-end protocol builder, designed to help customers obtain both high-quality input material and high-quality output data with a downstream analysis pipeline in mind. To demonstrate this, Dan showed the protocol builder in action, starting with a broad research question and working from there. The aim of the hypothetical experiment was to detect structural variation; through a series of simple mouse clicks denoting whether the approach was amplicon-based or whole-genome-based, Dan landed on a page which allowed him to reorder the priorities for the output data. Choosing read length and maximum data over speed and simplicity, and selecting a GridION as the sequencing platform, Dan was presented with three library prep options, together with predicted output metrics such as throughput, run time and read length distributions, and a recommendation of which to choose. Having chosen a sequencing approach, the protocol builder moved on to tailoring a nucleic acid extraction method to the sample type under study, in this case a mammalian blood sample. Again, several options were presented, this time for DNA extraction, along with their predicted read N50s, difficulty and throughput metrics.
Once the recommended method had been selected, the data analysis section of the protocol builder suggested two bioinformatics pipelines for structural variant calling. Alongside both the recommended and alternative methods were the required input file formats, the output file formats and a brief overview of the programs used for each step. The overall output was a PDF file containing a detailed explanation of DNA extraction, library preparation and data analysis, along with a list of third-party consumables and links to ONT reagents. The protocol builder is scheduled for full release later this year; in the meantime, the information used in each step is available on the Community website, with more being added continuously.
Dan moved on to talk about removing short DNA fragments from nanopore libraries prior to sequencing. Some DNA fragmentation during extraction is largely unavoidable, and a method currently under development at Oxford Nanopore aims to remove these short fragments using cheap, disposable reagents. With this method, which is based on DNA precipitation and filtration through a membrane, a drastic shift in read length distribution was seen in favour of longer fragments, with the main peak moving from around 20 kb in the control library to 30–40 kb in the processed library. Dan pointed out that although this can increase read length, DNA fragments of this length must be present in the sample to begin with.
Next, work identifying the impact of common library preparation contaminants on sequencing was presented, showing that current kit iterations are surprisingly resilient to contaminant carryover. For example, up to 20% ethanol, 10% isopropanol, 1% phenol, 100 mM sodium chloride, 100 mM guanidine HCl, 50 mM guanidine SCN or 10 mM EDTA can be tolerated in the final library using the ligation-based sequencing approach, while the rapid-based methods were slightly less resilient.
Dan announced a live sequencing demo of Sarisin’s minnow on Oxford Nanopore's GridION X5 and PromethION sequencing platforms, taking place during the conference. With the aim of generating >100 Gb of data at a read N50 of 15–20 kb, Dan said that sequencing data from this project would be made available to delegates at the end of London Calling. Library preparation for this sample will also be demonstrated on VolTRAX in the Live Lounge, alongside real-time alignment through an EPI2ME workflow, providing an end-to-end example of a “soup to nuts” sequencing project.
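Read N50, quoted throughout these talks, is the read length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal sketch of the calculation (the function name is our own):

```python
def n50(read_lengths):
    """Return the read N50: the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)  # longest reads first
    if not lengths:
        raise ValueError("no reads supplied")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # first read that takes the total past halfway
            return length


print(n50([20_000, 15_000, 5_000, 1_000]))  # 15000
```

Note that N50 is weighted by bases, not by read count, so a handful of very long reads can raise it substantially even when most reads are short.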
In further VolTRAX news, Dan showed a video of the device being used entirely in a field setting to identify an unknown organism by sequencing a deposited faecal sample. With minimal lab equipment, a VolTRAX-prepared library run on a MinION identified the organism as the common rat. The V2 iteration of VolTRAX, scheduled for release in late summer, was then discussed. Upgrades in this version of the device include: more accurate control of heaters; more input ports and greater sample volume; optical detection and quantification; fine control of droplet movement; simplified bead washes through custom magnetics; and more power, provided through a USB-C supply.
With the release of Oxford Nanopore’s Direct RNA Sequencing Kit and the associated Nature Methods paper, Dan described how native RNA can be sequenced without amplification or conversion to cDNA. One of the main benefits is that RNA modifications are preserved, and these cause a shift in the raw signal. Using Tombo, a software package developed by Oxford Nanopore, these shifts can be used to infer the position of modifications without a priori knowledge. Using native rRNA as an example of a highly modified RNA molecule, Tombo detected the majority of known modifications with very few false positives. Described as the only sequencing technology able to accurately detect m5C, nanopore sequencing of experimentally modified RNA with a 25% spike-in of m5C showed shifts only in C-containing 5-mers. As a result, Tombo now supports m5C detection in native RNA, and efforts are underway to extend this to m6A. Tombo can also detect modifications in DNA: both m5C and m6A were detected in E. coli genomes, with AUC scores of 0.99 and 0.98, respectively. Performance on single reads alone was also decent, and optimisation for PromethION data is underway using CpG methylation in NA12878 DNA as a test case.
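The AUC figures above summarise how well modification scores separate modified from unmodified sites. As a sketch of what the metric means (not Tombo's own evaluation code), ROC AUC equals the probability that a randomly chosen modified site receives a higher score than a randomly chosen unmodified one, with ties counted as half. The quadratic loop below keeps that definition explicit; it assumes scores are plain floats where higher means "more likely modified".

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC via its Mann-Whitney interpretation: the fraction of
    (modified, unmodified) pairs in which the modified site scores higher,
    counting ties as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("need both modified and unmodified scores")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: one unmodified site outscores one modified site.
print(roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.75]))  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; for large datasets a rank-based O(n log n) formulation is preferable to this pairwise loop.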
Moving towards the final section of his talk, Dan spoke about the detection of structural variation (SV) in human DNA. Short reads are unable to resolve complex and repetitive regions, and SV accounts for more genetic variability than SNPs in humans, so the ability of Oxford Nanopore’s long reads to span variable regions makes them well suited to SV detection. However, an initial analysis of SV in human DNA produced numerous false positives. Examining the raw data, it became apparent that these false positives were not due to sequence inaccuracies or library prep, but were instead a by-product of the basecalling process. Increasing the chunk size (the amount of raw data basecalled in one go) from 1 kb to 10 kb reduced the false positive rate to levels similar to those of other sequencing technologies: 92.2% of deletions and 91.6% of insertions were supported by other sequencing technologies. The long-term fix is to remove chunking altogether to obtain the maximum possible sensitivity for SV detection.
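The talk did not detail exactly why chunk boundaries led to false SV calls, but one way to picture chunking is as splitting each read's raw signal into fixed-size windows that are basecalled separately and then joined. The hypothetical helper below computes those windows (optionally overlapping) and shows that a 10x larger chunk size means roughly 10x fewer joins per read. Sizes are in arbitrary raw-signal units and are illustrative; they are not the basecaller's actual settings.

```python
def chunk_signal(n_samples: int, chunk_size: int, overlap: int = 0):
    """Split a raw signal of n_samples into (start, end) windows of at most
    chunk_size samples; consecutive windows overlap by `overlap` samples."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, n_samples)
        chunks.append((start, end))
        if end == n_samples:  # last window reaches the end of the signal
            break
        start += step
    return chunks


# Number of joins between consecutive chunks for a 25,000-unit read.
print(len(chunk_signal(25_000, 1_000)) - 1)   # 24
print(len(chunk_signal(25_000, 10_000)) - 1)  # 2
```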
At the end of his talk, Dan introduced a collaboration with the Imielinski lab at NYGC, termed Pore-C. This is a long-read chromatin conformation capture method in which chromatin is crosslinked to DNA and the nucleic acid is then fragmented. DNA molecules that are close together in 3D space, due to attachment to chromatin, are ligated together in a process called proximity ligation. The DNA is then purified from the chromatin, size selected and PCR amplified before sequencing. Dan explained that regions of DNA that align to non-sequential regions of the genome, but are in proximity due to interaction with chromatin, are potentially interacting with one another. Thanks to the long-read nature of Oxford Nanopore sequencing, multiple potential interaction events can be detected in a single stretch of Pore-C processed DNA. Although these potential interactions could have a high false positive rate, pairwise interaction maps give higher confidence in the findings, and higher coverage would further increase confidence for long-range interactions. Furthermore, the interaction map can be used to help scaffold contigs and verify genome assemblies by finding regions which do not fall on the diagonal of the plot. Concluding, Dan said that by the next Community meeting he hoped to be able to talk about a more interesting sample, possibly in collaboration with London Calling delegates.
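Because a single Pore-C read can contain several ligated fragments, one straightforward way to build a pairwise interaction map is to expand each read's set of aligned fragments into all pairs. The sketch below does this after binning positions; it assumes fragment alignment has already produced one `(chrom, position)` tuple per fragment, and the bin size and names are illustrative rather than part of the Pore-C pipeline described in the talk.

```python
from collections import Counter
from itertools import combinations


def pairwise_contacts(reads, bin_size):
    """Expand each read's aligned fragments into pairwise contacts between
    genomic bins.

    reads: iterable of lists of (chrom, position) tuples, one list per read.
    Returns a Counter keyed by ((chrom, bin), (chrom, bin)) pairs.
    """
    contacts = Counter()
    for fragments in reads:
        # Fragments falling in the same bin collapse to one entry per read.
        bins = sorted({(chrom, pos // bin_size) for chrom, pos in fragments})
        for a, b in combinations(bins, 2):  # every pair of distinct bins
            contacts[(a, b)] += 1
    return contacts


# Two hypothetical reads: the first is a three-way contact.
reads = [
    [("chr1", 100), ("chr1", 5_000), ("chr1", 12_000)],
    [("chr1", 150), ("chr1", 12_500)],
]
for pair, count in sorted(pairwise_contacts(reads, bin_size=1_000).items()):
    print(pair, count)
```

Counts far from the diagonal (pairs of distant bins) are the long-range contacts; in an assembly context, strong off-diagonal signal between contigs expected to be far apart is what can flag a misjoin or misordering.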
The Applications group released a series of posters today on diverse subjects including Tombo, RNA, Pore-C and SVs. More to follow soon.
Breakout 1: Sequencing viruses
Guillaume Croville, from the University of Toulouse, spoke about the use of Oxford Nanopore sequencing on the MinION device to generate avipoxvirus (APV) genomes from clinical avian samples. He described how outbreaks of avian pox affect both commercial poultry and wild birds, including endangered species, and so may be of great economic and environmental importance. He described how there is currently little genetic information on poxviruses, with fewer than 10 genomes available in GenBank; the viruses have large (200–300 kbp), repetitive genomes and are highly diverse, making assembly from short-read data difficult.
Guillaume outlined his goal: to sequence APV genomes directly from avian lesions, without the need for isolation, purification or enrichment (to avoid bias). To this end, Guillaume and his team developed a phenol:chloroform-based protocol using a homemade lysis buffer to extract high molecular weight DNA from lesions for sequencing.
The protocol was first validated using APV propagated on the chorioallantoic membrane of host eggs; the extracted DNA was then prepared for sequencing on the MinION device using the Ligation Sequencing Kit 1D. This produced 100% coverage of the APV genome at a depth of 638x. The validated protocol was then applied to clinical samples, obtaining DNA from three lesions of cutaneous and tracheal origin in chickens. Sequencing generated 19–30x coverage of the APV genome, enabling identification, typing and assembly of the virus genome from clinical samples without the need for pathogen enrichment. Guillaume noted that the long reads made whole genome assembly of APV easy, with 99.7–99.8% identity to the available reference genome.
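The coverage figures above follow from simple arithmetic: mean depth is total sequenced (aligned) bases divided by genome size, and under the idealised Lander-Waterman (Poisson) model the expected fraction of the genome covered at least once is 1 - e^(-depth). A minimal sketch, using made-up read lengths rather than data from the talk:

```python
import math


def mean_depth(read_lengths, genome_size):
    """Mean depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def expected_fraction_covered(depth):
    """Fraction of the genome expected to be covered at least once,
    assuming reads land uniformly at random (Poisson / Lander-Waterman)."""
    return 1.0 - math.exp(-depth)


# Hypothetical: 60 reads of 100 kb against a 300 kbp genome.
depth = mean_depth([100_000] * 60, 300_000)
print(depth)                             # 20.0
print(expected_fraction_covered(depth))  # very close to 1
```

Real coverage is less uniform than the Poisson model assumes, so the expected-fraction figure is best read as an optimistic upper bound.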
He then described an ongoing project for his team: studying suspected adenoviral pancreatitis in guinea fowl. Employing the same protocol, viral DNA was extracted directly from a pancreatic sample, sequenced via MinION and assembled de novo to produce an assembly including a 43 kbp contig; BLAST analysis indicated very good mapping of the assembly to Fowl adenovirus A.
Ben Temperton, from the University of Exeter, presented his group's work using Oxford Nanopore’s MinION device to develop a long-read viral metagenomics pipeline: VirION. He described the importance of studying the viruses within microbial communities, often neglected in favour of studying bacteria: "it's like trying to understand the Serengeti without knowing what the lions are doing."
Ben noted that the use of short-read technology in viral metagenomics presents two major problems. Firstly, highly abundant (“there is the same amount of carbon in all SAR11 viruses as 45 million blue whales”), highly microdiverse viruses do not assemble when using short reads; he provided an example short-read assembly comprising 2,005 scaffolds, the longest of which was 3.5 kbp. Secondly, the hypervariable regions of viruses, which are of particular importance for elucidating host-virus interactions, do not assemble. Ben described how Oxford Nanopore’s long-read technology can solve these problems, producing longer contigs, spanning gaps between them and enabling the assembly of regions of high microdiversity.
Ben and his team investigated the viral metagenome of seawater samples using a linker-amplified shotgun approach: viruses were concentrated from seawater, then the viral DNA was extracted and amplified to overcome the low availability of input material (20 litres of seawater yields ~20–100 ng). The viral DNA was then sequenced using the MinION device. Several bioinformatics approaches were used to generate assemblies. Oxford Nanopore’s long-read data was used to generate an Overlap Layout Consensus (OLC) assembly, which was error-corrected with Pilon using data from a short-read technology. The long-read data was also used alongside short-read data to construct hybrid De Bruijn Graph (DBG) assemblies. Ben demonstrated that the long-read OLC assembly enabled the assembly of a putative new pelagiphage in a single scaffold of 28.9 kbp, with the long reads able to cover regions of microdiversity. He described how the hybrid DBG assemblies produced longer contigs, spanning the gaps seen in assemblies using short reads alone and allowing the analysis of critical hypervariable regions. He also noted that the unassembled reads alone were long enough for the analysis of encoded genes and synteny.
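The "overlap" step of an OLC assembler looks for reads whose end matches the start of another read. The toy sketch below finds the longest exact suffix-prefix overlap between two reads and merges them; real long-read assemblers use error-tolerant approximate overlaps, so treat this as an illustration of the idea only. Function names are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`,
    or 0 if no such overlap of at least min_len exists."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Try the longest possible overlap first and shrink.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def merge_reads(a: str, b: str, min_len: int = 3):
    """Merge b onto the end of a using their overlap; None if they don't overlap."""
    k = suffix_prefix_overlap(a, b, min_len)
    return a + b[k:] if k else None


print(suffix_prefix_overlap("ACGTTGCA", "TGCAAAC", 3))  # 4
print(merge_reads("ACGTTGCA", "TGCAAAC"))               # ACGTTGCAAAC
```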
Ben reported that long-read viromes revealed the presence of viruses at some sites that had been "entirely missed" in previous short-read assemblies, and were able to "capture significant novel diversity" in these marine environments.
Luke Meredith, based at the University of Cambridge, spoke about sequencing Hepatitis B Virus in Sierra Leone on Oxford Nanopore’s MinION device. He described how healthcare is limited in Sierra Leone, one of the world’s poorest nations, with 4 hospital beds available per 10,000 people; despite infectious disease being the most common cause of death, little is known of the pathogens circulating.
The University of Makeni Infectious Disease Research Laboratory was set up in Makeni, the largest city in Sierra Leone's Northern Province, in response to the West African Ebola outbreak, which heavily affected the country, and has since become a permanent research and training facility. Luke detailed a collaborative patient screening facility set up with Magbenteh Community Hospital to identify potential pathogens in over 7,000 patients across two years. Blood samples were taken from patients presenting with symptoms of infection, after first ruling out hemorrhagic fever; qPCR and RT-qPCR were then used to test for the presence of pathogens including HIV, Zika, Ebola and many others. Replicating Hepatitis B Virus (HBV) was detected in 8.2% of these samples, which were then taken forward for next-generation sequencing.
Luke explained that HBV is prevalent in Sierra Leone, with previous studies reporting prevalence of 8.5–32.5% in the population. He explained how sequencing the virus would allow a greater understanding of the geographical distribution of this diverse virus, and provide valuable information on the genotype, serotype and presence of drug-resistance and latency markers in isolates, to guide treatment and vaccination strategies. Furthermore, the ability to perform the sequencing within Sierra Leone would be beneficial for many reasons, including simpler data management, ensuring that data ownership stays with the country and enabling knowledge-building within local communities.
The team designed a protocol for enrichment of the HBV genome, first reverse transcribing to ensure complete coverage of the 3.2–3.4 kbp DNA/RNA hybrid genome, then amplifying using primers generated by PRIMAL (Quick, Loman, Pybus et al.). Samples were then barcoded using Oxford Nanopore’s Native Barcoding Kit and 96-well PCR Expansion Kit, pooled and prepared for multiplexed sequencing on the MinION device with the Ligation Sequencing Kit 1D. Luke reported that samples were sequenced to very high coverage, with an average depth of 32,823x and the longest single aligned read spanning most of the genome, at 3.14 kbp.
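Pooling barcoded samples means each read must later be assigned back to its sample. As a simplified sketch of that demultiplexing step (not the algorithm used by any particular tool), the code below searches the start of each read for each barcode using semi-global edit distance, so the barcode may sit anywhere in the search window, and rejects reads that are too distant or tied between barcodes. The barcode sequences, `search_len` and `max_dist` are made-up placeholders, not the kit's actual barcodes or recommended thresholds.

```python
def best_match_distance(barcode: str, window: str) -> int:
    """Smallest edit distance between `barcode` and any substring of `window`.

    Semi-global alignment: unaligned read bases before and after the barcode
    cost nothing, so the barcode may sit anywhere in the window.
    """
    prev = [0] * (len(window) + 1)  # row 0: empty barcode, free leading gaps
    for i in range(1, len(barcode) + 1):
        curr = [i] + [0] * len(window)  # column 0: all barcode bases deleted
        for j in range(1, len(window) + 1):
            mismatch = barcode[i - 1] != window[j - 1]
            curr[j] = min(prev[j - 1] + mismatch,  # match / substitution
                          prev[j] + 1,             # barcode base missing in read
                          curr[j - 1] + 1)         # extra base in read
        prev = curr
    return min(prev)  # free trailing gaps: best end position in the window


def assign_barcode(read, barcodes, search_len=100, max_dist=3):
    """Return the best-matching barcode name, or None if too distant or tied."""
    window = read[:search_len]
    scored = sorted((best_match_distance(seq, window), name)
                    for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous between two barcodes
    return best_name


barcodes = {"BC01": "AAGGTTAA", "BC02": "CCTTGGCC"}  # placeholder sequences
print(assign_barcode("TTTAAGCTTAAGATTACA", barcodes))  # BC01 (one mismatch)
```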
The team were able to identify both the genotype and serotype of isolates from the sequencing data: 100% of samples were identified as genotype GtE, the genotype about whose clinical outcome least is known, and 97.5% were identified as serotype ayw4, for which mutations associated with vaccine escape are currently being investigated. A phylogenetic tree was constructed with MEGA7 using both Sierra Leonean samples and GtE genomes randomly selected from NCBI, indicating clustering of isolates by region and demonstrating circulation of strains within communities. Furthermore, sequencing identified drug resistance markers in 27/68 patients, a surprisingly high proportion given the limited availability of antiviral treatment, demonstrating the potential of long-read sequencing to guide targeted treatment.
Luke concluded that the study has “potential long-term health consequences on a country that has a struggling health system”, and presents a strong case for the use of a vaccination program (not currently freely available for adults), with the goal of eliminating Hepatitis B along with Hepatitis C by the year 2030.
Breakout 2: Cancer & Oncology
Adam Burns from the University of Oxford began the Cancer and Oncology breakout talking about detecting clinically relevant molecular alterations in Chronic Lymphocytic Leukaemia (CLL). Adam began by introducing CLL, the most common form of leukaemia in the Western world. CLL is a heterogeneous disease, and the course of treatment is influenced by the mutational status of the immunoglobulin heavy chain (IgHV) and TP53, as well as by deletions of 17p, where the TP53 gene is located.
Current mutation screening for CLL involves multiple platforms, including FISH analysis, Sanger sequencing and/or sequencing by synthesis. These methods are labour intensive, and the typical turnaround time for test results is currently 4 weeks. The aim was therefore to simplify this pipeline into a single combined test that could be used before beginning treatment for CLL. The team designed the test to assess the mutation status of all of these loci using a combination of amplicons (for SNVs) and low-coverage WGS (for 17p deletion), prepared with the 1D Ligation Sequencing Kit and the Rapid Barcoding Sequencing Kit, respectively. To test the approach, they used 4 CLL Biobank samples that had previously been characterised by NGS, SNP array and WGS. IgHV mutation status showed high homology after Canu was used to build an error-corrected consensus. All productive rearrangements were found after the data was further polished with nanopolish and, importantly, the team found 6 SNVs in the TP53 gene across the 4 CLL patients. Two indels were missed by the analysis pipeline, although they could be identified on inspection of the BAM files. QDNAseq was used to call deletions from the WGS data, successfully calling loss of 17p in one case, although it missed a minor 17p deletion. Adam concluded by saying the test showed promise and that further work will focus on improving the analysis pipeline.
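Calling a deletion such as 17p loss from low-coverage WGS rests on a simple idea: count reads in fixed genomic bins and look for bins where the sample has proportionally fewer reads than a reference. The sketch below shows only that core step; QDNAseq additionally corrects for GC content and mappability and segments the profile, none of which is done here. The -0.3 log2 threshold is an arbitrary assumption (a pure single-copy loss in a diploid genome would give a ratio near -1, and subclonal losses less).

```python
import math


def call_losses(sample_counts, reference_counts, threshold=-0.3):
    """Per bin, return True if the library-size-normalised log2(sample/reference)
    read-count ratio is below threshold, False if not, None if uninformative."""
    s_total = sum(sample_counts)
    r_total = sum(reference_counts)
    calls = []
    for s, r in zip(sample_counts, reference_counts):
        if s == 0 or r == 0:
            calls.append(None)  # cannot take a log ratio of an empty bin
            continue
        ratio = math.log2((s / s_total) / (r / r_total))
        calls.append(ratio < threshold)
    return calls


# Hypothetical binned counts: the last two bins have half the expected reads.
print(call_losses([100, 100, 50, 50], [100, 100, 100, 100]))
# [False, False, True, True]
```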
Isac Lee from Winston Timp's group at Johns Hopkins University spoke about his research using nanopore reads to simultaneously profile methylation and chromatin accessibility in breast cancer. Isac has adapted NOMe-seq (Nucleosome Occupancy and Methylome sequencing) for use with nanopore sequencing (NanoNOMe). The method involves treating nuclei with a GpC methyltransferase prior to DNA purification; the methyltransferase can only methylate GpC sites that are not bound by nucleosomes (open chromatin), providing a footprint of nucleosome positions. The resulting DNA is then sequenced and analysed with nanopolish to simultaneously identify CpG methylation and nucleosome occupancy (GpC methylation).
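Because CpG methylation is endogenous and GpC methylation is the accessibility footprint, each cytosine must be assigned to one readout or the other. In NOMe-seq analyses it is common to keep CpGs in an HCG context and GpCs in a GCH context (H = not G) and to skip GCG sites, where the two signals cannot be told apart. The sketch below applies that convention to one strand of a sequence; whether NanoNOMe uses exactly these rules was not stated in the talk, and the function name is hypothetical.

```python
def nome_contexts(seq: str):
    """Classify forward-strand cytosines into NOMe-seq contexts.

    Returns 0-based positions of C in 'HCG' (CpG, endogenous methylation)
    and 'GCH' (GpC, accessibility) contexts. GCG sites are ambiguous and
    skipped, as are the first and last bases (incomplete context).
    """
    seq = seq.upper()
    hcg, gch = [], []
    for i in range(1, len(seq) - 1):
        if seq[i] != "C":
            continue
        prev_g = seq[i - 1] == "G"
        next_g = seq[i + 1] == "G"
        if next_g and not prev_g:
            hcg.append(i)  # HCG: C is in a CpG but not a GpC
        elif prev_g and not next_g:
            gch.append(i)  # GCH: C is in a GpC but not a CpG
        # prev_g and next_g (GCG) is ambiguous; neither means uninformative
    return {"HCG": hcg, "GCH": gch}


print(nome_contexts("ACGTGCATGCGA"))  # {'HCG': [1], 'GCH': [5]}
```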
The open chromatin state and CpG methylation pattern were shown to correlate well with previously published NOMe-seq data at CTCF sites. Isac then went on to show data at the MALAT1 locus, a gene known to be downregulated in breast cancer, in two breast cell lines: MCF-10A (non-tumorigenic) and MCF-7 (breast cancer). In MCF-7 cells, where MALAT1 is downregulated, the promoter is methylated and inaccessible, whereas in MCF-10A cells, where MALAT1 is expressed, the promoter is unmethylated and the DNA is accessible in an allele-specific manner. In conclusion, NanoNOMe builds on earlier technologies, extending the ability to study accessibility and CpG methylation simultaneously without bisulfite treatment. Using long reads with NOMe-seq enables the study of heterogeneity in chromatin states, as well as allele-specific chromatin states.
Cecilia Yeung and Olga Sala-Torra from the Fred Hutchinson Cancer Research Center presented their research using nanopore sequencing to identify fusion transcripts and mutations in leukaemia. Cecilia spoke about clonal heterogeneity in leukaemia and how this can affect response to treatment and relapse. Rapid detection of fusions is important for providing the correct therapy, as fusion transcript leukaemias are frequently resistant to standard chemotherapy but highly responsive to specific treatment. Current methods for identifying fusions (karyotyping, FISH, qPCR) are either too slow, lack sensitivity or are limited to specific fusion events.
Olga went on to discuss their initial work on an assay to identify multiple fusion transcripts in a single experiment with nanopore sequencing, adapting the ArcherDX NGS kit to make it compatible. These changes involved increasing the amplicon size and making the library prep quicker. Their analysis workflow used minimap2 to align the reads and nanoSV to call the fusion events. When tested on known cell lines and patient samples, 6/10 fusion transcripts were identified correctly; in 3 of the 4 cases where the known fusion was not found, a different translocation was detected. Further work is being carried out to optimise and streamline the library prep and analysis pipelines to improve the assay. Finally, Olga showed data on FLT3 tyrosine kinase mutations in Acute Myeloid Leukaemia (AML), which are associated with poor prognosis and higher relapse rates. Here they used long-range PCR to amplify a 2.4 kb region of the FLT3 gene, using minimap2 and Canu to align the reads and build a consensus. In all cases the assay identified exon 14 duplications, and work on SNP calling is ongoing.
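A fusion transcript shows up as a read whose segments align to two different genes (for example as primary plus supplementary alignments from an aligner such as minimap2). The sketch below counts reads supporting each gene pair once those alignments have been annotated with gene names; it is a simplified illustration, not nanoSV's algorithm, and the input layout, `min_support` threshold and function name are assumptions. The gene pairs in the usage example are well-known fusion partners used purely as toy input.

```python
from collections import Counter


def count_fusions(read_genes, min_support=2):
    """Count candidate gene fusions from split reads.

    read_genes: dict of read_id -> list of gene names hit by that read's
    alignment segments, in read order (assumed already annotated).
    Returns {(gene_a, gene_b): supporting_reads} for pairs with enough support.
    """
    support = Counter()
    for genes in read_genes.values():
        # Collapse consecutive segments that hit the same gene.
        collapsed = [g for i, g in enumerate(genes) if i == 0 or genes[i - 1] != g]
        # Count each adjacent gene pair at most once per read.
        for pair in set(zip(collapsed, collapsed[1:])):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


reads = {
    "r1": ["BCR", "ABL1"],
    "r2": ["BCR", "BCR", "ABL1"],
    "r3": ["KMT2A"],            # single-gene read: no fusion evidence
    "r4": ["ETV6", "RUNX1"],    # only one supporting read
}
print(count_fusions(reads))  # {('BCR', 'ABL1'): 2}
```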
Breakout 3: Environmental biosurveillance
MinION in the marine environment: from identifying tetrodotoxin producers to tracing sharks and rays using eDNA - Reindert Nijland
Reindert Nijland, Assistant Professor in the Marine Animal Ecology group at Wageningen University, Netherlands, opened the session by explaining that, since 2015, levels of the neurotoxin tetrodotoxin (TTX) in Dutch mussels and oysters have exceeded those allowed for human consumption. The peak in TTX levels correlates with water temperatures rising above 18 °C and also coincides with a diatom bloom, which typically occurs in the second week of June. TTX is produced by bacteria and accumulates in the host organism; however, the gene cluster responsible for TTX biosynthesis is unknown, and the microbe responsible for the Dutch TTX outbreak had not been identified. Many species of bacteria have been suggested in the literature as the cause of TTX production. To understand this situation, the team at Wageningen University used the MinION to sequence metagenomic DNA taken from TTX-positive shellfish. They aim to use both whole metagenome and 16S amplicon sequencing approaches to characterise the TTX gene cluster and to identify microbial species and their abundance. A significant challenge with the whole metagenome approach on mussel gill-derived samples was the high level of contaminating host DNA (99%). This was not resolved by changing the sample type to faeces. However, having the MinION allowed rapid testing of different sample sources and preparation protocols, without the wait associated with sending DNA to a sequencing centre. Using 16S analysis on the same low-quality samples allowed the identification of all potential TTX-producing genera. In anticipation of the next TTX outbreak, the team are now testing alternative sample/tissue sources, such as mussel and oyster intestines, with the aim of reducing host contamination.
In the second part of his talk, Reindert described how his team are planning to use the MinION as part of a mobile sequencing laboratory, with the intention of amplifying and sequencing environmental DNA (eDNA) to investigate the presence of large, hard-to-spot marine animals such as sharks and rays around offshore wind farms in the North Sea. Wind farms are assumed to attract such large animals, as they provide a diverse habitat with increased biodiversity and shelter. The hope is that analysing eDNA from a water sample could replace classical, more expensive surveillance methods such as dredging or using divers. In a pilot experiment, the team isolated DNA from water taken from the aquarium at Burgers’ Zoo in the Netherlands. Within seconds of starting the sequencing run on the MinION, reads matching the scalloped hammerhead shark were identified. The team are aiming to optimise their protocol for use on board a research vessel, with the goal of going from sample to result within 4 hours.
According to Reindert, both of the applications he described in his presentation ‘demonstrate the opportunities of mobile, real-time, long-read sequencing enabled by the MinION’.
How MinION sequencing is helping us solve real-life environmental biotechnology issues - Aleida Hommes - de Vos van Steenwijk
Aleida spoke about how the team at Orvion B.V. have implemented the MinION to provide insight into two real-world environmental biosurveillance applications. The first case study centred on a waste landfill near Rotterdam harbour which has been found to be leaching the commonly used herbicide Mecoprop (MCPP) into the groundwater. As this herbicide doesn’t degrade under naturally occurring conditions, it is gradually spreading over a wider geographical area and may start to pose risks to drinking and surface water quality. To remedy this, the team investigated whether MCPP degrades under stimulated conditions using oxygen, nutrients and bacteria (waste water treatment sludge). Under these conditions MCPP degrades extremely well (99.95% degradation). To identify the bacteria and relevant genes responsible for this degradation, they undertook whole genome sequencing using the MinION. Pseudomonas species were found to be the most abundant bacteria. The team are now culturing the MCPP-degrading bacteria to increase their concentration prior to, hopefully, performing a pilot bioremediation project to determine the feasibility of this technique. Aleida stated that nanopore sequencing has had an amazing impact on their research in terms of speed and cost.
In a second study, the team at Orvion B.V., together with a leading water company, are using the MinION to determine the microorganisms present in a number of different waste water treatment plants (WWTPs). In these plants, a plethora of bacteria help to purify the water; however, little is currently known about the composition of these bacterial communities. The team are monitoring three WWTPs monthly using the MinION in order to develop a microbial baseline and a series of microbial indicators that can act as an early warning system for sub-optimal WWTP performance (for example, inhibited nitrification and sludge bulking). Initial results showed good correlation of biodiversity between the three treatment plants. The team now plan to expand this to other treatment plants to generate a reference database.
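Comparing biodiversity between plants usually starts from per-taxon read counts. The writeup does not say which diversity metric the team used; the Shannon index below is an assumption chosen for illustration, and the genus names and counts are hypothetical placeholders.

```python
import math


def shannon_index(counts: dict) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical per-genus read counts for one monthly sample from each plant
samples = {
    "plant_1": {"genus_1": 120, "genus_2": 80, "genus_3": 40},
    "plant_2": {"genus_1": 100, "genus_2": 90, "genus_3": 50},
    "plant_3": {"genus_1": 140, "genus_2": 60, "genus_3": 30},
}
for plant, counts in samples.items():
    print(plant, round(shannon_index(counts), 3))
```

Tracking a value like this month by month for each plant is one simple way a "microbial baseline" could be expressed; deviations from the baseline would then be candidates for early-warning indicators.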
Closing her presentation, Aleida listed a number of future research areas in which microbes and nanopore sequencing may make water treatment processes more effective, including analysis of antibiotic resistance, pathogen removal and the degradation of specific compounds (including compounds of emerging concern such as medicines and pesticides).
Bacterial species level identification and strain level differentiation using repetitive extragenic palindromic based amplicon sequencing with Oxford Nanopore technology - Lukasz Krych
In the final talk of the environmental biosurveillance breakout session, Lukasz Krych, a postdoctoral researcher at the University of Copenhagen, described a new method for fast and cost-effective bacterial species-level identification and strain-level differentiation. Lukasz first described repetitive extragenic palindromic PCR (rep-PCR), a DNA enrichment method that has in the past been used in combination with pulsed-field gel electrophoresis (PFGE) for bacterial strain-level discrimination. While this technique can provide strain-level resolution, it is particularly laborious and time-consuming. Lukasz told the audience of his surprise when he learnt that a colleague was using this antiquated method to analyse over 700 samples, primarily due to its low cost. This got Lukasz thinking, and he soon came up with a modification of the technique, named ON-rep-seq, that allows cost-effective strain-level sequencing using the MinION in less than 5 hours.
Lukasz presented results demonstrating that the technique provides highly reproducible peak profiles (instead of the fingerprints generated by the original electrophoresis-based analysis), reflecting the size and abundance of the sequenced DNA fragments. These peaks act like molecular fingerprints which, using a novel analysis algorithm, allow visual strain-level differentiation. The optimised analysis pipeline also generates high-quality reads (>99% accuracy) that can be used for species-level identification. The technique requires significantly less data for strain-level identification than whole genome sequencing, increasing throughput while decreasing costs and time to result. In fact, Lukasz stated that up to 1200 isolates could be run on a single flow cell. The entire ON-rep-seq analysis pipeline will soon be available on GitHub.
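The idea of comparing isolates by their peak profiles can be sketched by turning each isolate's read-length distribution into a normalised histogram and comparing histograms with a Pearson correlation. This is an illustrative assumption, not the ON-rep-seq algorithm Lukasz described: the bin width, maximum length, similarity measure and the read lengths are all hypothetical.

```python
import math


def length_profile(read_lengths, bin_width=20, max_len=4000):
    """Normalised histogram of read lengths: a crude stand-in for a 'peak profile'."""
    n_bins = max_len // bin_width
    hist = [0] * n_bins
    for length in read_lengths:
        if 0 <= length < max_len:
            hist[int(length // bin_width)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins


def pearson(a, b):
    """Pearson correlation between two equal-length profiles (1.0 = same shape)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


# Synthetic amplicon lengths: two runs of one isolate, and a different isolate
isolate_1 = length_profile([500] * 50 + [1200] * 30)
isolate_1_repeat = length_profile([505] * 48 + [1205] * 31)
isolate_2 = length_profile([800] * 40 + [2000] * 40)

print(round(pearson(isolate_1, isolate_1_repeat), 3))
print(round(pearson(isolate_1, isolate_2), 3))
```

Replicate runs of the same isolate produce peaks in the same bins and correlate strongly, while an isolate with different amplicon sizes does not; only fragment lengths are needed for this comparison, which fits the point that strain typing needs far less data than whole genome sequencing.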
Breakout 4: Clinical microbiology
Amy Gargis, from the BioDefense Research and Development Laboratory section of the U.S. Centers for Disease Control and Prevention, spoke about her work using the MinION to perform whole genome sequencing (WGS) and antibiotic resistance detection in Yersinia pestis (the causative organism of plague) and Bacillus anthracis (the causative organism of anthrax). (Content available only within the Nanopore Community.)
Blake Hanson, from the University of Texas Health Sciences Center, spoke about how he and his team have used nanopore sequencing to elucidate complex resistance mechanisms employed by a range of pathogenic organisms. Starting his presentation, Blake outlined the global threat antibiotic resistance poses to human health. He stated that antimicrobial resistance in Gram-negative bacteria is a serious concern, giving the examples of organisms such as Klebsiella and Escherichia, which cause thousands of infections and hundreds of deaths each year in the US alone.
Moving on, Blake discussed how resistance mechanisms, such as beta-lactamase-encoding genes, disseminate through microbial populations. As well as horizontal transfer of resistance genes on mobile elements such as plasmids, small transposable elements can “jump”, carrying one or more resistance determinants with them.
Using long-read nanopore sequencing on the MinION, Blake and his team set out to identify the resistance mechanism in a clinical E. coli isolate with an abnormal resistance profile to tazobactam/piperacillin (TZP). Interestingly, this isolate had previously been incorrectly classified as susceptible by a common clinical typing method. The unusual resistance profile manifested as an increasing ability to withstand the antibiotic as its concentration increased. Resolving the transposon repeat units was thought to be key to understanding this phenomenon, and using MinION sequencing Blake and his team found evidence for a single large plasmid carrying multiple 10 kb repeats that conferred resistance.
Blake moved on to another example of a plasmid-encoded resistance mechanism, this time in Klebsiella. Pulsed-field gel electrophoresis suggested that either a single plasmid or multiple circular or linear plasmids were present in these resistant strains. Using just two reads, Blake was able to span the majority of the plasmid, suggesting that a single large circular plasmid was conferring resistance.
In the final section of his talk, Blake described the case of a patient who developed septic shock following transurethral resection. Blood and abdominal fluid cultures grew multidrug-resistant Pseudomonas aeruginosa; although treatment was successful, this strain had previously been detected only in neighbouring Mexico. This was an excellent example of how this approach can be used to monitor and track the dissemination of antimicrobial resistance.
Blake finished by describing future projects aiming to undertake large-scale surveillance of antimicrobial resistance (AMR), stating, “these are mechanisms we would not have been able to find with any other sequencing technology”.
Simon Grandjean Lapierre and Niaina Rakotosamimanana
Simon Grandjean Lapierre and Niaina Rakotosamimanana spoke about their recent efforts to develop real-time TB diagnostics at the point of care in low resource and high burden settings in Madagascar. – More to follow
Breakout 5: Data analysis (Assembly)
Antony Bolger: LOGAN: Lossless graph-based analysis of NGS datasets
Tony began by outlining the sequencing projects his group is involved in, starting with short-read sequencing of Solanum pennellii (wild tomato), and moving on to using nanopore sequencing for assemblies of engineered bacterial strains and for resequencing S. pennellii.
The S. pennellii nanopore dataset was released last year; its assembly reached a similar N50 to the previous assembly, but much more quickly with the introduction of long reads!
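N50 is the contiguity metric quoted throughout these assembly talks. As a reminder of what it measures, here is a minimal Python version of the standard definition; it is a sketch for illustration, and the example contig lengths are made up.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs (or reads) of length >= L
    together contain at least half of all bases. Returns 0 for empty input."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical contig lengths (bp): 40 + 30 = 70 >= 50, so N50 is 30
print(n50([10, 20, 30, 40]))  # 30
```

Because N50 only depends on how bases are distributed across contigs, two assemblies can share an N50 while differing in accuracy, which is why the later talks pair it with quality measures such as BUSCO.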
Tony went on to display the metrics of multiple runs, including average read lengths and yields per run. Citing the variability between runs, Tony described the challenges of using nanopore sequencing for large genomes, including tricky DNA extraction from plants due to a high prevalence of secondary metabolites. Having compared successful and unsuccessful runs, he advised tuning each run by adjusting a single parameter at a time, so that you learn how to optimise your process for your genome of choice.
Moving on to LOGAN, Tony explained that highly specialised bioinformatic tools don’t meet current analysis needs, as assemblies can now incorporate a range of sequencing data from multiple technologies, as well as supplementary scaffolding data such as optical maps. Taking this idea further, Tony posed the question: how do we use a “less-than-optimal” reference to help us?
All of these ideas contributed to the conception of LOGAN, which is based on a memory-efficient sparse de Bruijn graph with routing information. Unlike established overlap-layout-consensus (OLC) approaches, LOGAN is suitable for both long- and short-read data.
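For readers unfamiliar with de Bruijn graphs, the toy below shows the basic structure: nodes are (k-1)-mers, and every k-mer seen in a read adds an edge from its prefix to its suffix, so overlapping reads merge automatically without pairwise overlap computation. This is a deliberately simple, dense sketch under stated assumptions; it is not LOGAN's sparse, routing-aware graph, it ignores reverse complements and sequencing errors, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    observed in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unitig(graph, start):
    """Follow edges while each node has exactly one successor (no branching),
    spelling out the contig as we go."""
    path, seen, node = [start], {start}, start
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    # Each step adds one new base: the last character of the next node
    return path[0] + "".join(n[-1] for n in path[1:])


reads = ["ACGTTGCA", "GTTGCATC"]  # two overlapping toy reads
graph = de_bruijn_graph(reads, k=4)
print(walk_unitig(graph, "ACG"))  # ACGTTGCATC
```

The memory cost of a graph like this grows with the number of distinct k-mers, which is why production tools store a sparse subset of nodes; the point here is only that short and long reads feed the same structure in the same way.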
Presenting some initial results, Tony illustrated the computational efficiency of the LOGAN graph approach across a wide range of datasets.
Finally, Tony explained that the team’s short-term target is hybrid error correction: correcting long nanopore reads using the LOGAN graph.
Todd Michael: The complex architecture of plant T-DNA transgene insertions
Dr Michael began by explaining the significance of Arabidopsis thaliana: a model plant with a 150 Mb transformable genome organised into 5 chromosomes. Over 1000 A. thaliana lines have been re-sequenced, platinum-grade, hand-curated assemblies are available for comparative analyses, and it is probably the most studied plant to date.
However, this extensive resequencing, combined with optical maps, has recently revealed errors in the platinum-grade TAIR10 assembly. Work therefore still needed to be done to improve the best existing assembly, using single-molecule technologies that could span problematic regions. For this, Dr Michael’s team decided to use nanopore sequencing.
Dr Michael went on to describe his workflow for assembling the A. thaliana genome: a single-day extraction of high molecular weight (HMW) DNA, followed by preparation of rapid and ligation libraries, taking between 10 and 60 minutes. Sequencing, basecalling and high-quality assembly were completed within one week. The longest read the team have obtained so far is 1.1 Mb.
The assembly process followed two different paths. The first was an initial overlap and layout with minimap and miniasm, followed by three rounds of consensus building and polishing with minimap and racon, and finally a round of Pilon polishing. The second was an initial assembly with Canu, followed by the same downstream process: three rounds of minimap/racon and a single round of Pilon. The Canu route takes much longer, particularly for highly repetitive genomes. Further details on these methods can be found in Michael et al., Nature Communications (https://www.nature.com/articles/s41467-018-03016-2).
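The first path can be written out as a dry-run command list. This is a sketch under explicit assumptions, not the authors' exact commands: it uses minimap2 (the successor of the minimap tool named in the talk), assumes BWA-MEM and samtools to produce the short-read alignments Pilon needs, and the file names, the `polishing_plan` helper and the `pilon.jar` path are hypothetical. Check each tool's documentation before running anything.

```python
def polishing_plan(reads="reads.fq", short_r1="short_1.fq", short_r2="short_2.fq",
                   racon_rounds=3):
    """Return the minimap2/miniasm -> racon x N -> Pilon workflow as a list of
    (command, stdout_file) pairs. Dry run only: nothing is executed."""
    steps = [
        # Overlap: all-vs-all mapping of the nanopore reads against themselves
        (f"minimap2 -x ava-ont {reads} {reads}", "overlaps.paf"),
        # Layout: miniasm builds an unpolished assembly graph (GFA), -f adds sequences
        (f"miniasm -f {reads} overlaps.paf", "asm.gfa"),
        # Convert GFA segment ('S') lines to FASTA for polishing
        ("""awk '$1=="S" {print ">"$2; print $3}' asm.gfa""", "racon_0.fa"),
    ]
    current = "racon_0.fa"
    # Consensus: each round maps reads to the current draft, then racon polishes it
    for i in range(1, racon_rounds + 1):
        paf = f"map_{i}.paf"
        polished = f"racon_{i}.fa"
        steps.append((f"minimap2 -x map-ont {current} {reads}", paf))
        steps.append((f"racon {reads} {paf} {current}", polished))
        current = polished
    # Final round: align short reads and polish with Pilon (writes pilon.fasta)
    steps += [
        (f"bwa index {current}", None),
        (f"bwa mem {current} {short_r1} {short_r2}", "short.sam"),
        ("samtools sort -o short.bam short.sam", None),
        ("samtools index short.bam", None),
        (f"java -jar pilon.jar --genome {current} --frags short.bam --output pilon", None),
    ]
    return steps


for cmd, out in polishing_plan():
    print(cmd if out is None else f"{cmd} > {out}")
```

For the Canu path, the first three steps would be replaced by a Canu assembly whose contigs become the starting draft; everything from the first racon round onwards stays the same, which is why the two routes share their downstream polishing.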
Using these assembly workflows, the team were able to resolve the genome into 40 contigs, comparing favourably with the 104 contigs in the current platinum-grade assembly, and 83% of the non-centromeric misassemblies were corrected. Dr Michael also explained how nanopore assemblies can reach high per-base quality after consensus building and polishing, bringing the genome close to the quality of gold-standard references.
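Per-base quality of a polished assembly is commonly expressed on the Phred scale, QV = -10 log10(error rate). The talk did not quote QV values; the sketch below only illustrates the standard conversion, with example accuracies chosen for illustration.

```python
import math


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality: QV = -10 * log10(error rate)."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.999) to a Phred QV."""
    return phred_qv(1 - accuracy)


for acc in (0.99, 0.999, 0.9999):
    print(acc, round(accuracy_to_qv(acc), 1))
```

Each additional "9" of accuracy adds 10 to the QV, so the step from a draft to a polished assembly often looks more dramatic on this scale than as a percentage.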
Turning his attention to the utility of long reads, Dr Michael described how even small inbred plants have microvariation that cannot be captured by short read sequencing. There are dramatic differences between individual accessions, and as an example, Dr Michael pointed to a nested translocation of a gene cluster that had been unresolvable using Sanger and BAC sequencing.
Moving on to T-DNA insertions, Todd described how plant transformation relies on Agrobacterium, whose native tumour-inducing plasmid elements are replaced with customisable cassettes. This allows random integration of a transfer DNA (T-DNA) into any plant genome, but the size and complexity of the resulting insertions has proven difficult to resolve.
By sequencing just a handful of random T-DNA lines, Todd and his group leveraged long reads to identify hemizygous reads that don’t assemble, and demonstrated that long reads enable identification of genomic regions that are segregating in the background.
SALK line inserts do not assemble well due to multiple, highly repetitive copies of the T-DNA and vector, and even with fewer and less complicated inserts, most SAIL T-DNA cassettes cannot be resolved. Single-molecule technologies, i.e. nanopore sequencing and optical mapping, revealed T-DNA counts far beyond what was previously known.
Taking this one step further, Dr Michael was also able to gain unprecedented insight into T-DNA-associated epigenetic changes, such as an inversion resulting in a gained histone domain.
In conclusion, Dr Michael described how nanopore sequencing can deliver a reference-quality genome for a small plant in just a week, better than 20 years of hand curation, and that long reads are essential to resolve even the microvariation missed by short-read sequencing. Long reads are also essential to identify structural variation and hemizygous changes, making them ideal for characterising highly repetitive T-DNAs that can span over 200 kb. Finally, reference-quality genomes constructed with nanopore sequencing allow a new level of epigenetic analysis, to truly visualise the impact of T-DNA insertions on the wider genome.
Danny Miller: Genome assembly in Drosophila using nanopore
Danny Miller began by explaining that what we learn from working with small organisms translates well to larger genomes, so can we begin to answer questions about the human genome from the Drosophila genome?
Following this, Danny posed two questions: can we use nanopore sequencing to generate a reference-quality Drosophila melanogaster genome assembly? And can this be extended to other species of Drosophila at low cost?
The 12 core Drosophila genomes cover about 50-70 million years of evolution, so they are a perfect test case to establish the utility of nanopore sequencing, including whether these genomes can be improved with longer reads.
To find out whether a reference-quality genome assembly was possible using nanopore sequencing, Danny broke the process down into three steps: DNA isolation, library preparation and sequencing, and assembly. Starting with the Drosophila melanogaster reference strain, DNA was isolated by homogenising tissue from 75 flies and extracting with a Qiagen kit, and the resulting library was run for 24 hours on an R9.5 flow cell, giving an average read length of over 7 kb.
Following this, Dr Miller tested three different assembly approaches: minimap/miniasm, Canu and DBG2OLC. All assemblies appeared highly contiguous, with similar total genome sizes and comparable numbers of contigs, but contig N50 ranged from 3.0 to 9.9 Mb. Interestingly, the DBG2OLC assembly had both the highest number of contigs and the longest contig N50. Minimap/miniasm and Canu gave nearly equal contiguity, but minimap/miniasm is much faster.
Looking to improve this further, Danny asked whether combining two assemblies would result in a better overall assembly: would merging higher- and lower-contiguity assemblies give better overall statistics?
In this case, that appears to be so: merging the Canu and DBG2OLC assemblies gave fewer contigs than either individual assembly and doubled the contig N50. BUSCO assessment rated the initial assemblies as low quality, but polishing with either nanopore-only or short-read data significantly improved this score. The addition of optical mapping data improved the assembly even further.
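BUSCO scores an assembly by searching for conserved single-copy orthologs and classifying each as complete (single-copy or duplicated), fragmented or missing. The sketch below reproduces the familiar one-line summary format from a list of per-gene statuses; it assumes one status per gene with the labels shown, does not parse real BUSCO output files, and the example counts are hypothetical.

```python
from collections import Counter


def busco_summary(statuses):
    """Summarise one status per gene in a BUSCO-style one-line format.
    'Complete' here means complete single-copy; C = S + D."""
    counts = Counter(statuses)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no genes to summarise")

    def pct(status):
        return 100.0 * counts[status] / n

    single, dup = pct("Complete"), pct("Duplicated")
    return (f"C:{single + dup:.1f}%[S:{single:.1f}%,D:{dup:.1f}%],"
            f"F:{pct('Fragmented'):.1f}%,M:{pct('Missing'):.1f}%,n:{n}")


# Hypothetical statuses for 100 marker genes
statuses = ["Complete"] * 90 + ["Duplicated"] * 5 + ["Fragmented"] * 3 + ["Missing"] * 2
print(busco_summary(statuses))  # C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:100
```

Indel errors typical of unpolished long-read assemblies break open reading frames, so genes show up as fragmented or missing; that is why polishing moves the score so much.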
Generating a high-quality assembly revealed novel structural variation, including transposable element insertions and losses, duplications, and copy number increases in tandem arrays.
Dr Miller went on to explain how he extended this process to more Drosophila genomes while keeping costs low. This time, tissue was homogenised with a pestle or a Dounce tissue grinder, followed by phenol/chloroform extraction, before the DNA was prepared with the 1D ligation kit. Sequencing yielded nearly 100 billion base pairs (100 Gb) in total, approximately 30x coverage for each genome.
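The relationship between yield and depth is simple arithmetic: mean coverage is total sequenced bases divided by genome size. The genome size below is a hypothetical round number for illustration, not a figure from the talk.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


def bases_needed(target_depth: float, genome_size: float) -> float:
    """Bases required to reach a target mean depth."""
    return target_depth * genome_size


GENOME_SIZE = 200e6  # hypothetical 200 Mb genome, for illustration only
print(bases_needed(30, GENOME_SIZE) / 1e9, "Gb for 30x")  # 6.0 Gb for 30x
```

Planning backwards from a target depth like this is how a fixed total yield gets divided between several genomes.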
Dr Miller reminded the audience that small labs have limited resources and computational resources are expensive, so the ultimate aim is a simplified, streamlined assembly process that still produces a high-quality assembly. Exploring several polishing approaches, Danny found that with both racon and Pilon the BUSCO score increases before plateauing, and that the best results were obtained by combining racon and Pilon.
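The "polish until the score plateaus" observation suggests a simple stopping rule, which matters when compute is the limiting resource. In this sketch `polish_round` and `score` are hypothetical callables (for example, wrappers that run a racon or Pilon round and a BUSCO assessment), and the `min_gain` threshold is an assumption, not a value from the talk.

```python
def polish_until_plateau(assembly, polish_round, score, min_gain=0.1, max_rounds=10):
    """Apply polish_round repeatedly, keeping each result only while the score
    (e.g. BUSCO % complete) improves by at least min_gain. Returns the last
    kept assembly and the history of scores seen."""
    best = score(assembly)
    history = [best]
    for _ in range(max_rounds):
        candidate = polish_round(assembly)
        new = score(candidate)
        history.append(new)
        if new - best < min_gain:
            break  # plateau reached: further rounds are not worth the compute
        assembly, best = candidate, new
    return assembly, history


# Toy stand-ins: the "assembly" is a round counter and the score saturates at 95
final, history = polish_until_plateau(0, lambda a: a + 1, lambda a: min(95, 60 + 10 * a))
print(final, history)  # 4 [60, 70, 80, 90, 95, 95]
```

The same loop can run a racon-only rounds first and then a Pilon-only phase, each stopping independently once its gains flatten.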
But could the reference genomes be improved with nanopore data? On a first pass, nanopore assemblies closed 64,000 of the 81,00 gaps in the reference genomes, indicating the utility of Danny’s approach.
In conclusion, Danny and his team were able to assemble a reference-quality Drosophila melanogaster genome and apply this approach to several other Drosophila genomes. Could the team create a process for small labs to produce high-quality genomes? The answer: yes.