London Calling: Day 2 (morning) writeups
Fri 25th May 2018
Day 2 of London Calling was opened by CEO Gordon Sanghera, who showed Matt Loose's entire 2.3Mb read on the giant screen. He reminded the audience that long read nanopore sequencing is now evolving to high throughput formats GridION and PromethION. Gordon introduced to the stage Professor Dan Branton from Harvard University, and Professor David Deamer from UCSC, who spoke about the conception of nanopore sensing as far back as 1989.
Timothy Gilpatrick: Cas9 targeted enrichment for nanopore profiling of methylation at known cancer drivers
After introducing his genetically inherited moustache, Timothy began his talk by discussing the advantages of combining a Cas9 targeted enrichment approach with nanopore sequencing. Sequencing a large complex genome on the MinION device may produce lots of sequencing data, but restricting sequencing resource to specific regions of interest allows much higher coverage to answer specific research questions. Furthermore, the ability to multiplex many enriched samples and run them concurrently significantly reduces cost for an end user while maintaining the ability to interrogate large genomes.
Timothy started by posing the question: why are we interested in doing this? The goal of the work is to enrich for specific loci using the CRISPR/Cas9 system and generate higher coverage at selected loci using just a single flow cell. Mapping out the structure of the talk, Timothy told the audience he would explain how the approach works, how the assay is currently performing, and how the methylation analysis works, before finishing with a glimpse into how this work could be brought into a clinical setting.
The Cas9 enzyme introduces double strand breaks into DNA, and uses a guide RNA comprising a crRNA and a tracrRNA to target a particular location. The crRNA binds to the DNA and distorts it, allowing Cas9 to bind. After the break, the section of DNA containing the PAM site is less deeply integrated with the Cas9 enzyme and so is more likely to dissociate. As a final step, biotinylated capture oligos are used to pull the Cas9-bound section out and into solution.
Examining some initial results, Timothy presented their foray into enrichment for the prokaryotic 16S rRNA gene – this time in E. coli. The 16S rRNA gene is very slow to evolve and so is useful for characterising microbes, as well as being well-studied, making it an ideal choice for primary approach validation.
The resulting coverage graphs displayed 7 clear peaks, equal for both the positive and negative strands, and corresponding to the 7 predicted binding sites present in the E. coli genome. In total, the coverage was significantly increased. Looking at the graph in closer detail, Timothy pointed out that most reads initiate, then terminate, at the location where Cas9 has introduced the double strand break. However, some do not terminate at this point, instead producing a characteristic “double bubble” in the curve by dipping in the middle of the expected peak region. The reason for this is unclear, and is the subject of further study. Crucially, this unusual shape to the curve is reproducible across multiple samples.
The E. coli genome is only approximately 4 Mb long, so having successfully tested the approach, Timothy and team moved onto a more complex sample: a Zymo mock community containing 8 prokaryotic and 2 eukaryotic species. Timothy noted that in this case, pulling their target out with equal fidelity in each genome is not predicted, as each genome is composed differently and contains a different number of potential target binding sites.
The DNA was sheared to 7 kb but interestingly the resulting read lengths were closer to 2 kb in length. Timothy suggested that this was perhaps due to compounding factors including Cas9 breaks, mechanical shearing during the library preparation steps and also a possible bias in the method towards shorter reads. The Zymo mock community experiment generated strong peaks at higher coverage, including 1000x on-target coverage for Salmonella. Interestingly, in species that didn’t have a predicted binding site for the crRNA, reads were still produced, but the curve had a different appearance. Instead of clear initiation and termination sites, the curve was symmetrical, indicating the possibility that the Cas9 enzyme was binding to the DNA, but the sequence match was imperfect so the double strand break was not introduced.
Moving on to the work on human telomerase, Timothy gave a quick refresher that telomeres are positioned at the end of the chromosomes and shorten with each cell division, causing cells to senesce after a finite number of divisions. Telomerase remains active in many forms of cancer, allowing cells to resist apoptosis. Methylation is frequently disrupted at hTERT, causing variable expression of the gene.
Starting with 2 µg of DNA from thyroid cancer cell lines, a 50-fold increase in coverage was achieved, but background DNA and off-target material was still coming through. To define “on-target”, the team selected any reads that fell within a 30 kb window of the intended target site, and anything beyond this window was deemed off-target.
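The on-target definition above can be illustrated with a short sketch. This is not the team's actual pipeline: the function names, the example coordinates, and the reading of the "30 kb window" as 30 kb on either side of the target are all assumptions made for illustration.

```python
def is_on_target(read_start, read_end, target_pos, window=30_000):
    """Return True if a read overlaps the window around the target site.

    Assumption: the window extends `window` bases either side of the
    target; the talk did not specify whether 30 kb was the total width.
    """
    return read_end >= target_pos - window and read_start <= target_pos + window


def fold_enrichment(enriched_coverage, background_coverage):
    """Ratio of coverage at the target to background coverage."""
    return enriched_coverage / background_coverage


# Hypothetical read coordinates (start, end) on one chromosome.
reads = [(1_000, 9_000), (95_000, 102_000), (140_000, 150_000)]
target = 100_000

labels = [is_on_target(start, end, target) for start, end in reads]
print(labels)
```

Only the second read overlaps the assumed window here; the other two would be counted as off-target.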
The second approach, devised by Oxford Nanopore, uses an inactivated version of the Cas9 molecule that does not cut the DNA. This is so that single guide RNAs, targeting sequences internal to the region of interest, can be used with a fragmented library. Testing enrichment with this catalytically inactive Cas9 gave some enrichment centred around the position of the crRNA, but this wasn’t consistent, highlighting the need to find a good dead Cas9 that gives reproducible results.
Timothy moved on to discuss work to characterise the methylation patterns of the hTERT promoter region, reminding the audience that cytosine methylation is part of a complex organisational system that regulates transcriptional activity. The classical thinking about cytosine methylation is that it is a silencing modification, but this isn’t necessarily the case in all situations. Long range methylation patterns are discernible with nanopore reads, and SNPs can be used to phase maternal and paternal alleles. To do this, the SNP of interest needs to be in the region for which you are profiling the methylation pattern – very hard to do with short reads.
Finally, Timothy described how this work could be taken to a clinical setting, multiplexing promoters on a single flow cell and using this to screen cancers and perhaps even to guide therapeutic strategies for cancer. The team’s next project is investigating FSHD, caused by shortening or hypomethylation of a 3.2 kb tandem repeat on chr4.
Breakout room 1: Human genomics
Christos Proukakis: Detection of GBA missense mutations and other variants using the Oxford Nanopore MinION
Christos Proukakis began his talk by explaining why GBA is an important gene, detailing how homozygous or biallelic mutations in GBA result in Gaucher disease. Gaucher disease is a lysosomal storage disorder where macrophages become engorged with lipids, affecting the liver, spleen, bones and haematopoiesis; it can also be neuronopathic. The reason for studying GBA comes from astute observations by people working with Gaucher patients: that relatives of the patients (so carriers of the disease) were getting Parkinson’s disease more often than expected.
Heterozygous mutation carriers are at higher risk of Parkinson’s disease, the second most common neurodegenerative disorder. Decreased activity of the associated enzyme possibly leads to Parkinson’s – but this is a topic for a different day.
Christos went on to explain why GBA is challenging: it is an 8 kb gene with 11 exons, but crucially there is a pseudogene just 20 kb away which is virtually identical in parts. The pseudogene displays 96% overall homology, with the highest homology found between exons 8 – 11, where the most pathogenic mutations are found. Missense mutations are often a result of recombination or integration of the pseudogene sequence, rather than single nucleotide variation.
GBA is usually sequenced via Sanger sequencing of 10 exons or short targeted sequencing, but this can fail, miss variants, and not provide all the information you need. A recent paper by Christos’ team describes the problem: too many short reads align to the pseudogene to make analysis clear. Exons that have the common mutations are tricky to align correctly.
Christos explained that this work indicates a new method of sequencing is required that can characterise a full length amplicon and detect all known mutations. For their workflow, Christos’ team used low quality DNA from saliva or frozen brain tissue, using phenol:chloroform for DNA extraction or spin column for the saliva. This was followed by PCR of the whole gene and library preparation, in some cases multiplexing 12 samples on a single flow cell.
GraphMap or NGMLR were used for alignment to the gene, and NanoOK was used for metrics that gave reassurances that the sequencing was returning good base identity. Variant calling was performed with nanopolish and Sniffles, and WhatsHap was used for phasing, which is crucial for clinical genetics.
Christos went on to describe how early work with R7.3 flow cells was sufficient to detect known missense mutations in two samples. Happy that the proof of concept was successful, the team have moved on to using the newer chemistry to examine a larger number of cases that have known mutations.
Blinded analysis of 1D libraries of 10 Gaucher patients, carriers and other samples on R9.4 flow cells detected all mutations successfully at correct zygosity. Christos explained that while they see false positives, these are easy to filter out so far. NGMLR data allowed detection of a 55 bp coding deletion in RecNciI, a recombinant variant containing pseudogene sequence.
In conclusion, Dr Proukakis noted that his team’s protocol allowed detection of all mutations in the samples tested, with correct zygosity and phasing. Intronic SNPs were able to be called and assigned to common haplotypes, at a recommended coverage of 200-300x.
Graham Taylor: Nanopore sequencing in clinical diagnostics
Graham Taylor started by explaining that this talk was one on the application of nanopore sequencing in clinical diagnostics, describing the difference between research and clinical users as similar to the difference between fighter and commercial airline pilots. Diagnostics requires reliability and simplicity, where research is more cutting edge and can cope with less usability.
Clinical genomics presents several challenges to short read technologies, such as identifying translocations in cancer, phasing and haplotype analysis, trinucleotide repeat expansion resolution and identification of pseudogenes and paralogues.
The objectives of Graham’s work are to develop accredited testing which is cost efficient, easy to access and close to the patient or clinic. Exploring the idea of sequencing in a clinical setting further, Graham explained that staff expertise is often focussed towards following standard procedure with suboptimal IT facilities available, meaning analyses often need to be scaled down in order to be feasible. Scaling down to meet local capacity requirements means MinION is an attractive prospect, and has more than adequate capacity for 16S sequencing, AMR profiling, gene panels and targeting structural genomic changes.
This brought Graham to question how suitable MinION is for use in a diagnostic setting.
Graham outlined how long reads certainly have utility, with very high mappability due to their length. As an example, Graham demonstrated that 90% of the reads in the public Nanopore Consortium human dataset had a mapping score of 60. Although basecall quality was perhaps lower than one might hope for, Graham noted that this does not decline with read length and remains consistent across a given read.
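As a rough illustration of the mappability figure quoted above, the fraction of reads at a given mapping score can be computed as follows. This is a minimal sketch with made-up values, not the analysis from the talk; it assumes the scores are per-read MAPQ values, where 60 is the maximum reported by several commonly used aligners.

```python
def fraction_at_mapq(mapqs, threshold=60):
    """Fraction of reads whose mapping quality is at least `threshold`."""
    if not mapqs:
        return 0.0
    return sum(1 for q in mapqs if q >= threshold) / len(mapqs)


# Hypothetical per-read mapping qualities, for illustration only.
mapqs = [60, 60, 60, 12, 60, 60, 0, 60, 60, 60]
frac = fraction_at_mapq(mapqs)
print(f"{frac:.0%} of reads have MAPQ >= 60")
```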
Examining read lengths versus read counts further, Graham posed the question: is there a sweet spot for mapping quality and read counts? For read quality and abundance, the sweet spot appears to be between 1 and 30 kb reads. This is a useful area for Graham’s team to work in, and he described how they are less interested in one-off spectacular long reads, and more influenced by consistent mapping of medium-length reads.
An area where Graham sees nanopore sequencing going “straight into service” is to elucidate repeat expansion in the HTT gene that plays a crucial role in Huntington’s disease. Using the RepeatHMM tool published in 2017, Graham noted how they could start to match reported repeat length by PCR to the nanopore result. The concordance was extremely high, and Graham plans to take this data to ISO accreditation. The next steps for the team cover applications in microbiology, cancer and rare disease.
In the final part of his talk, Graham described how he is using restriction enzymes for enrichment instead of Cas9 pulldown, chopping away the bits you don’t need to keep the bits that you do. Digesting most of the genome away into small fragments and isolating the larger fragments gave 15% on target reads that mapped exactly where they were supposed to, with the added advantage of being cheap to produce.
Initially, Graham sees that clinical labs will use long reads to supplement short reads, but only long read sequencing can deliver true whole genome sequencing. There is a place for rapid low cost sequencing too. His main aim is to try to understand to what extent nanopore technology can meet these needs now, and the team is having fun doing it.
Alba Sanchez-Juan: Complex structural variants resolved by short-read and long-read whole genome sequencing in Mendelian disorders
Alba Sanchez-Juan initiated her section by reminding the audience that structural variants can be classified into canonical types, but also non-canonical or complex types typically involving 3 or more breakpoints. There are 16 subclasses of non-canonical variations, ranging from relatively simple to extreme rearrangements. Clinically relevant complex structural variants (cxSVs) are not typically considered in research and clinical diagnostic pipelines.
The aims of Alba’s team's projects were to identify cxSVs in 124 individuals with Mendelian disease from the NIHR BioResource project and resolve their variant configuration.
The general workflow was to perform initial short read sequencing, followed by alignment using Isaac and SV calling with Manta. After identifying possible cxSVs and subsetting into clinically interesting variants, Sanger sequencing, microarray analysis and nanopore sequencing were performed to help understand the cxSV.
Long read whole genome sequencing was used for cases that could not be resolved by any other method. Of four complex cases, three cases had their pathogenicity determined by short read and Sanger sequencing, but the fourth case was classified as a variant of uncertain significance (VUS).
The team determined two alternative models for the structural variation present in case 4, with alternative breakpoints in each. Understanding which mechanism had caused the variant was essential to understanding the pathogenicity the variant was causing.
The workflow for this case was to sequence 1D ligation libraries on an R9 flow cell, followed by Albacore basecalling and analysis with NGMLR, LAST and Sniffles. The read length distribution peaked at 8 kb, but coverage was low, so manual analyses had to be performed. Phasing was confirmed, as well as the presence of an intact copy of CDKL5 alongside a disrupted copy, indicating the first proposed model of variation was probably correct. Nanopore sequencing confirmed all novel breakpoints of the model that could not be confirmed by Sanger sequencing. The sequences surrounding the breakpoints were too repetitive for PCR amplification, so analysis by another method was tricky.
To verify their observations, the team also performed gene expression analysis, demonstrating that the child was expressing both alleles of the gene when compared to the parental expression patterns.
The team hypothesise that multiple mutational mechanisms could mediate formation of cxSVs, and Alba explained that looking at breakpoints gave rise to an understanding of which is the most likely mechanism to cause complex rearrangements, and so the most likely model of rearrangement.
In conclusion, Alba described that long read whole genome sequencing is the best way to understand genetic architecture in frequently misdiagnosed cases.
Breakout room 2: Metagenomic studies
Jangsup Moon: Rapid Bacterial Identification from Clinical Samples using Nanopore Sequencing
Jangsup Moon, based at Seoul National University Hospital, discussed how 16S rRNA gene sequencing of clinical samples on Oxford Nanopore’s MinION device enables the rapid identification of bacterial pathogens in clinical samples. He explained the advantages of a 16S sequencing-based method of identification: diagnoses can be made for unculturable bacteria from small amounts of specimen, including those obtained after antibiotic use. MinION sequencing is also considerably faster than currently used culture-based methods, and enables metagenomic sequencing for the identification of polymicrobial infection. Jangsup demonstrated these advantages with multiple examples from his research experience using clinical samples from patients with bacterial infections.
First, Jangsup recounted the case of a patient with meningitis caused by Campylobacter fetus, for which blood culture was positive. gDNA was extracted from a subcultured colony, 16S amplified and sequenced on a MinION; after less than an hour, Campylobacter fetus was successfully identified. Jangsup and his team then set about testing whether culturing was necessary for MinION-based identification; the process of growing bacteria, harvesting a single colony, re-plating and growing, then 16S amplifying is much slower and more labour-intensive, whilst amplification directly from clinical samples is markedly faster, capable of metagenomic analysis and enables sequencing of unculturable bacteria. To test this, gDNA from cerebrospinal fluid (CSF) of a patient with Listeria monocytogenes (confirmed by culture) underwent 16S targeted amplification. After 1.5 hours, the data was analysed and L. monocytogenes successfully identified. Jangsup concluded that culturing is not necessary for identification of bacterial pathogens via MinION.
Several cases were then presented in which 16S sequencing on the MinION was shown to be more sensitive than culture-based methods in identifying bacterial pathogens. In one example, a patient tested positively for Streptococcus agalactiae by blood culture, but negatively via CSF culture. gDNA was extracted from the CSF sample and S. agalactiae was identified via 16S sequencing on MinION, despite producing a negative result in culturing. In another example, a Streptococcus oralis infection was confirmed via 16S MinION sequencing from a blood sample, despite producing a negative result for blood culturing, a result supported by a positive result from CSF culture. Jangsup concluded: “MinION 16S sequencing is more sensitive than culture studies!” To further demonstrate the sensitivity of the sequencing method, gDNA obtained from the CSF of a patient that had tested negatively for Klebsiella pneumoniae via both blood and CSF cultures following 3 days’ antibiotics, having previously tested positive, was sequenced; the pathogen was still detected via MinION.
Jangsup went on to discuss the use of Oxford Nanopore sequencing in prospective post-neurosurgical infection cases. This time, sequencing was performed concurrently with culture studies for suspected infections. In one case, 16S MinION sequencing of DNA from the CSF of a patient with a VP shunt infection identified Staphylococcus aureus two days before this was confirmed by a culture study. In another instance, the rare, difficult-to-culture bacterium Aggregatibacter aphrophilus was identified via sequencing of abscess drainage; confirmation via culturing took another six days. In the final case Jangsup presented, MinION 16S sequencing enabled the identification of a polymicrobial infection, detecting the presence of anaerobic bacteria that were not possible to culture.
Jangsup summarised that 16S sequencing of clinical samples using Oxford Nanopore’s MinION provides faster, more sensitive results than culturing and makes possible the identification of anaerobic and unculturable bacteria and polymicrobial infections, even in samples that have been treated with antibiotics. He concluded that this demonstrates its potential for use in easy, accessible, point-of-care testing, and anticipated that metagenomic strategies will change future diagnostic practices and that "the MinION will play a leading role in that change".
Rob James: Understanding transmission dynamics of TB and AMR in the environment
Rob James, based at the University of Warwick, spoke about the use of Oxford Nanopore’s MinION in two novel methods: firstly, in the strain typing of Mycobacterium bovis, and secondly in comparing the presence of antimicrobial resistance genes within and between an agricultural and a residential area in Karachi, Pakistan.
Rob highlighted how the spread of Mycobacterium bovis, the pathogen that causes bovine tuberculosis, is an issue of “national and global concern”, with the number of cattle testing positive for bovine TB in the UK increasing over 100-fold from 1986 to 2010. The bacterium has multiple domestic and wildlife hosts, can infect humans, can survive independently of hosts and has a low infective dose. In Africa, the BCG vaccine is not used, leading to a higher risk of transmission to humans. Analysis of M. bovis transmission presents several challenges: it is present in low abundance in samples (whole genome sequencing without enrichment yields less than 0.02% M. bovis reads) and isolation and culture of the slow-growing bacterium, currently the only method of strain typing, is a difficult process.
Rob described his team’s aim: to develop a rapid strain-typing technique that does not require culture. The novel method is based on spoligotyping - strain-typing via identification of variable repeat regions of the genome. Traditionally resolved via amplicon length on gels to give only a presence/absence test, this method allows for greater resolution using long-range PCR and subsequent multiplexed long-read sequencing of the full region of interest to provide more information - “sub-spoligotyping”. DNA samples were extracted from several samples, including directly from badger faeces and from cultures, barcoded and pooled using the Native Barcoding Kit, then prepared for sequencing in multiplex on MinION using the Ligation Sequencing Kit 1D ("after leaving it for an hour, I suddenly realised I had more reads than I knew what to do with"). Alignments of long read data mapped to M. bovis generated full coverage of the region of interest with most data mapping end-to-end, allowing the identification of VNTRs and deleted regions in samples and successfully enabling high-resolution strain typing of samples without the slow process of culturing. A phylogeny was constructed, indicating that BCG strains clustered separately from wildtype strains along with their respective reference strains.
Rob then went on to describe an investigation into the diversity and selective drivers of antimicrobial resistance (AMR) within and between two regions in Karachi, Pakistan: a livestock-producing area and a residential settlement. Mobile- and chromosomally-incorporated class I integron gene cassettes were differentially amplified from samples of slurry (livestock area) and sediment (residential area) via long-range endpoint PCR. These were then natively barcoded and sequenced in multiplex on MinION; basecalling and analysis were performed in real time using Albacore and the EPI2ME Antimicrobial Resistance workflow, resulting in over 20,000x coverage across genes of interest. An NMDS plot was shown to visually demonstrate that samples from the livestock-producing area clustered separately from those from the residential settlement due to the differential abundance of AMR genes, whilst chromosomally-incorporated genes clustered separately from mobile elements. Furthermore, more similarity was seen between the mobile elements present in the two areas than between the chromosomally-incorporated genes in the two regions. A box plot displayed the differences in abundance of specific AMR genes between the sites: samples from the livestock-producing area showed a higher abundance of the carbenicillin-resistance gene CARB.9 (often associated with cholera), whilst extended-spectrum β-lactamase-encoding genes were more abundant in residential samples.
Yu Xia & Tong Zhang: Application of Nanopore-based Metagenomic Sequencing in the Rapid Detection of Resistomes in Wastewater Treatment Plants and Impacted Environments
Yu Xia and Tong Zhang, who have been users of Oxford Nanopore's MinION device since the MinION Early Access Program in 2014, each discussed their use of the MinION in metagenomic analyses.
Tong described the use of sequencing on Oxford Nanopore’s MinION device in the rapid detection of antimicrobial resistance in wastewater treatment plants (WWTPs) and environments into which treated water is released. He noted that the emergence and spread of antibiotic resistance genes (ARGs) are “of serious concern for public health”, and can be detected in many environments. He showed a plot that demonstrated the diversity in abundance of resistance genes in eight sludge samples from a wastewater treatment plant in Hong Kong, with genes conferring resistance to aminoglycoside and tetracycline the most abundant.
Tong detailed the reasons that WWTPs are considered "hotspots" of ARGs: highly diverse bacteria originating from environments including hospitals, agriculture and industry are close to each other and in high density ("the only other places with such a high biomass are the animal and human gut"), enabling transfer of ARGs. Selective pressure is high, with bacteria being exposed to “almost all the antibiotics” and other substances such as heavy metals. He described how these resistant bacteria can be isolated, amplified and cloned; analysis is possible via qPCR, whilst metagenomics tools enable thorough analysis of known and novel ARGs.
Tong described how he and his team sequenced resistomes from samples of influent, activated sludge and effluent in WWTPs on the MinION device; the data enabled the rapid identification and analysis of ARGs and simultaneous host tracking. He reported that 13 types and 97 subtypes of ARGs of varying abundance were detected, whilst the long reads generated enabled host tracking, important in assessing risk. He concluded that long reads put ARGs back into a “genetic context which could provide valuable insight into the evolution and transmission mechanisms underlying AMR development”.
Yu described the use of long read sequencing via MinION in the analysis of plasmids from a multi-drug resistant strain of coliform bacteria and a comparison with the results of a short-read technology. Whilst the long and short read data generally correlated for the abundance of ARGs, the subtype prediction via MinION was more conservative: the long reads aligned to fewer antibiotic resistance genes, whilst reads from the short read technology would map to multiple genes, indicating the higher specificity of long reads to reference genes.
Yu then asked: what causes resistance dissemination from WWTPs? Here, she investigated resistome presence in ShenZhen Bay, Hong Kong, an area of high population density which receives treated wastewater from a nearby WWTP. Yu described how the bay is in a populated area next to a park and is abundant in fish, birds and other wildlife, posing the potential of bioaccumulation of resistomes in the food chain. To assess the prevalence of resistance genes, bacteria were cultured from samples from the bay and subjected to one or multiple antibiotics; colonies were produced even when samples were exposed to many antibiotics. Samples were then taken five hours apart and sequenced via MinION. 201 ARGs, conferring resistance to 20 different antibiotics, were identified, with a 1.7x higher abundance of ARGs seen from samples taken at the later time point, demonstrating that ARGs were present in an environment exposed to receiving waters even following wastewater treatment.
Breakout room 3: Transcriptomics & gene expression
Chris Vollmers: Improving ONT MinION read accuracy to enable the high-throughput analysis of single cell transcriptomes
Chris Vollmers and his team from the University of California Santa Cruz spoke about their new method termed R2C2, which allows for the accurate demultiplexing of full length, single-cell cDNA pools on Oxford Nanopore’s MinION sequencing platform.
Chris began his talk by discussing the various methods commonly used to generate cDNA molecules, all of which conclude in a fragmentation step to allow for sequencing by technologies with limited read length abilities. When comparing these technologies with full-length cDNA sequencing on Oxford Nanopore’s MinION platform, Chris suggested that the latter is “…Quantitative, has little length bias and that isoform identification is straightforward(ish!)”.
Next, Chris displayed some alignments of full-length cDNAs sequenced by MinION, showing clear and consistent isoform breakpoints along the length of the reference transcript. Using their in-house developed isoform detection software called Mandalorion (to be released by Chris later this week on GitHub), Chris stated that the full-length cDNA reads provided by the MinION identify widespread isoform diversity between single cells. Moving forward, Chris went on to talk in more detail about the method they developed to increase the accuracy of full-length cDNA reads so that short (10–20 nt) cellular identifiers can be used to demultiplex single-cell cDNA pools. An evolution of the method detailed in their recent paper, “Nanopore Long-Read RNAseq Reveals Widespread Transcriptional Variation Among the Surface Receptors of Individual B cells” (doi: https://doi.org/10.1101/126847), their method allows for approximately 75 % of reads to be assigned to a cell of origin with 99.8 % accuracy. Furthermore, single cell gene expression profiles of human B cells generated using this method showed a high correlation with other contemporary short read technologies, and Chris demonstrated that a machine learning based clustering approach (t-SNE) successfully grouped cells, highlighting differences in expression patterns of Jchain, an important gene involved in antibody secretion.
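To see why read accuracy matters for short cellular identifiers, consider a minimal demultiplexing sketch. This is not the R2C2 implementation: the identifier sequences, the one-mismatch tolerance and the function names are hypothetical, and Hamming distance is used here as a simple stand-in for whatever matching the team applies.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_cell(observed, known_ids, max_mismatches=1):
    """Assign an observed identifier to the closest known cell identifier.

    Returns None when the best match has too many mismatches, or when two
    known identifiers are equally close (an ambiguous assignment).
    """
    scored = sorted((hamming(observed, k), k) for k in known_ids)
    best_dist, best_id = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return best_id


# Hypothetical 10 nt cellular identifiers.
known_ids = ["ACGTACGTAC", "TTGCAAGGCC", "GGATCCTTAA"]

print(assign_cell("ACGTACGTAC", known_ids))  # exact match
print(assign_cell("ACGTACGAAC", known_ids))  # one sequencing error
print(assign_cell("CCCCCCCCCC", known_ids))  # too many errors to assign
```

With only 10–20 bases to work with, each extra error in the identifier makes an unassigned or misassigned read more likely, which is the motivation for raising per-read accuracy before demultiplexing.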
Chris concluded his talk by showing that the majority of the reads generated by R2C2 cover transcripts completely and thus they were able to identify CD19 isoforms which lack the specific epitopes in single cells.
Nathan Roach: Measuring the transcriptome of the C. elegans lifecycle using direct RNA sequencing
Nathan Roach, from Johns Hopkins University, spoke about how he is using direct RNA sequencing to measure changes in isoform abundance, polyA tail length and changes in the UTR regions of C. elegans across different life cycle stages.
Nathan began by describing his model organism, saying it was perfect for life cycle linked transcriptome analysis. This is because of the large brood size, short generation time and compact, well annotated genome. C. elegans also has very distinct developmental changes, making biological and technical reproducibility easier in a complex developmental based transcriptomic study.
Nathan spoke about the amount of data his direct RNA sequencing project runs were generating, and although there was a high degree of variability, each sample studied had more than enough data for the analysis required and the mean QScore was around 12.
Nathan then looked at comparing results of short read sequencing technology with Nanopore Direct RNA sequencing from replicate sample across different life cycle stages. Sensitivities and specificities of Nanopore direct RNA sequencing were then calculated using these comparisons with specificities in “..the high 80s – 90 %”, and sensitivities “… ranging from 60% to low 80s”.
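For readers unfamiliar with the two metrics, the following Python sketch shows how sensitivity and specificity can be computed when one method is treated as the reference. The talk did not describe the exact procedure, so treating the short-read calls as the reference, working at the level of detected transcripts, and the identifiers used are all assumptions.

```python
def confusion_counts(reference, test, universe):
    """Compare detections from a test method against a reference method.

    reference: items detected by the reference method (assumed: short reads)
    test:      items detected by the test method (assumed: direct RNA)
    universe:  every item that could have been detected
    """
    true_pos = len(reference & test)
    false_neg = len(reference - test)
    false_pos = len(test - reference)
    true_neg = len(universe - reference - test)
    return true_pos, false_neg, false_pos, true_neg


def sensitivity(true_pos, false_neg):
    """Fraction of reference-positive items that the test also detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference-negative items that the test also leaves out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical transcript identifiers, invented for this example.
universe = {f"t{i}" for i in range(1, 11)}
short_read_calls = {"t1", "t2", "t3", "t4", "t5"}
direct_rna_calls = {"t1", "t2", "t3", "t4", "t6"}

tp, fn, fp, tn = confusion_counts(short_read_calls, direct_rna_calls, universe)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Note that a "false positive" under this scheme may be a real transcript that the reference method missed, so the figures describe agreement with the reference rather than absolute correctness.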
Exon-specific splicing of Unc-32 was seen in the young adult stage. Additionally, stage-specific 3’ UTR isoforms were seen, with the 3’ UTR of the Ubc-7 RNA being shorter in males than in hermaphrodites.
Moving on to the final sections of his talk, Nathan spoke about estimating polyA tail length using software currently under development by Jared Simpson. The resulting length distribution was very similar to that obtained with short-read sequencing methods that target the polyA tail. PolyA tail length was also observed to change across developmental stages, with the variability in length increasing over time.
Drawing to a close, Nathan spoke about the future directions for direct RNA sequencing in his lab. Using a tagged polyA binding protein, the aim will be to pull down and sequence tissue-specific transcripts at different developmental stages.
Miranda Pitt: Evaluating the Extensively Drug-Resistant Klebsiella pneumoniae Resistome via MinION Direct RNA Sequencing
Miranda Pitt, a final year Ph.D. student from the University of Queensland, Australia, spoke about detecting and analyzing drug resistance pathways in Klebsiella pneumoniae using direct RNA sequencing and whole genome sequencing on the MinION platform.
Miranda began by showing the scale of the antibiotic resistance problem, stating that it is one of the leading threats to human health, contributing to over 700,000 deaths per year and costing in excess of $950 million in Australia alone. Furthermore, these figures are estimated to increase to 10 million deaths per year and a cost in excess of US$100 trillion globally by the year 2050. Talking about her study organism, Klebsiella pneumoniae, Miranda said that it is a common cause of hospital-acquired infection worldwide; that it frequently harbors multidrug resistance; and that pan-drug resistance, meaning resistance to a number of frequently used and last-line antimicrobial agents, is currently emerging.
Miranda moved on to discuss the aims of her study. Miranda and her team selected four isolates analysed in a previous study where the antimicrobial resistance patterns, gene profiles, phylogenetic lineages and plasmid prevalence had been highly characterised. Their aims were to attempt to assemble the genomes, determine the time required to detect the “resistome”, and evaluate differential expression of resistance genes between isolates.
Using a rapid barcoding approach to sequence multiple K. pneumoniae isolates concurrently, Miranda found that, after assembly, > 75 % of resistance genes resided on plasmids and that > 70 % of the known resistance genes could be detected within 2 hours of starting the DNA sequencing run. When the mRNA from each of the bacterial isolates was sequenced directly on the MinION platform, the gene expression profiles allowed a pan-resistant isolate to be differentiated from the other three. Furthermore, the expression profiles of the most abundant resistance genes were independently confirmed by qPCR.
Mini-theatre Plant Genomics
Richard Finkers: Know your onions
Richard Finkers from Wageningen University and Research, Netherlands, spoke about his group's research using nanopore reads to improve the assembly of the highly repetitive 16 Gb genome of the onion (Allium cepa). Richard started by introducing the cultivation of plants: there are 300,000 known plant species on the planet, 500 of which are cultivated, and only 20 species cover 90% of the fields used for farming. Plant genomes vary in size, from 100 Mb up to the 152 Gb genome of Paris japonica, and in ploidy, from diploid through tetraploid and hexaploid to decaploid. Plant genomes can also be very repetitive, with the maize genome containing >50% repetitive elements. Richard asked the question: can we easily improve existing assemblies with long reads? Using the GridION to generate data to improve the assembly of Solanum habrochaites (tomato), they produced 4.8 Gb of data (5x coverage) on one flow cell; even though the sample was old and slightly degraded, the reads reduced the number of contigs in the assembly by 20%. This result provides confidence that low-level nanopore coverage can improve the assembly of existing plant genomes.
Richard then moved on to the 16 Gb Allium cepa (onion) genome. The previous genome assembly comprised 90K scaffolds; Richard's lab generated 62 Gb of data (3.5x), 19 Gb of which was in reads of 50 Kb or longer (~1x). Having used the Rapid kit, LSK108 and a pre-release of LSK109 to generate the data, Richard's group found that the LSK109 kit improved the proportion of long reads. Due to the massive size of the genome, the analysis has been running for several weeks and is still ongoing. Richard finished by admitting he doesn’t know his onions that well yet, but should do by the next meeting.
Alexander Wittenberg: Sequencing and assembly of lettuce genomes
The second talk in the plant genomics session from Alexander Wittenberg was a little gem. Alexander presented the latest research from Keygene, where they have been sequencing and assembling the 2.7Gb lettuce (Lactuca sativa) genome using the PromethION. Alexander began by showing the research that Keygene do to enable crop innovation through trait discovery and precision breeding. Such breeding has enabled aphid-resistant lettuce varieties, reducing the need for pesticides. Lettuce is an important crop, with 74 billion lettuces harvested globally every year.
Alexander went on to highlight the challenges that come with extracting high molecular weight plant DNA cos [sic] of the cell wall, polysaccharides and secondary metabolites present in plant cells. The team have found isolating clean nuclei to be an important step in extracting high quality DNA, and have had great success with the plant extraction kit from Circulomics. For all their PromethION runs, they first run a MinION library before loading on to PromethION. The team sequenced two lettuce lines on PromethION, generating 100x coverage of both lines; the highest yield from a single PromethION flow cell was 76Gb, representing approximately 30x coverage of the genome with an N50 of 29Kb. Using a subset of the data (to make sure the assembly finished in time), they ran a de novo assembly approach with minimap2 and miniasm; the team found the genome to be 2.56 Gb, with a contig N50 of 7.3Mb (1,169 contigs). The de novo assembly showed high collinearity with a 2017 publication of the lettuce genome, whilst showing a significant improvement on the 2017 assembly, which had a 2.21Gb genome size and a contig N50 of 36Kb (21,116 contigs). Using a hybrid approach with optical maps from Bionano, they were able to generate a scaffold N50 of 145Mb, a near chromosome-level assembly in just 34 contigs.
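The coverage figures quoted here follow from dividing sequencing yield by genome size. As a quick check, the Python sketch below redoes that arithmetic with the numbers from the talk (a 76 Gb flow cell, a 2.7 Gb genome and a 100x target); the function names are made up for this example.

```python
GB = 1e9  # bases per gigabase


def fold_coverage(yield_bases, genome_size_bases):
    """Average number of times each base of the genome is sequenced."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return yield_bases / genome_size_bases


def yield_for_coverage(target_coverage, genome_size_bases):
    """Total bases needed to reach a target fold coverage."""
    return target_coverage * genome_size_bases


# Best single flow cell from the talk: 76 Gb over a 2.7 Gb genome.
best_flow_cell = fold_coverage(76 * GB, 2.7 * GB)

# Data needed for 100x coverage of one lettuce line.
target_yield = yield_for_coverage(100, 2.7 * GB)

print(f"{best_flow_cell:.1f}x from the best flow cell")
print(f"{target_yield / GB:.0f} Gb needed for 100x")
```

The best flow cell works out at roughly 28x, in line with the "approximately 30x" quoted, and 100x coverage of one line corresponds to about 270 Gb of data.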
Sequencing the lettuce genome is just the tip of the iceberg of the upcoming projects to be run on PromethION at Keygene.
Jean-Marc Aury: De novo sequencing and assembly of plant genomes using nanopore long reads
Jean-Marc Aury presented work from the team at Genoscope, where they have de novo sequenced and assembled several plant genomes (banana, citrus and Brassicaceae) using data from MinION and PromethION.
Jean-Marc began by reminding the audience of the challenges for genome assembly: (1) repetitive regions lead to fragmented assemblies and underestimation of repeat content; (2) heterozygous regions on different chromosomes lead to fragmented assemblies and over-estimation of the haploid genome size. Using the yeast genome, Jean-Marc then showed why long reads matter: even with the small yeast genome, you need 30x coverage of 25Kb reads to resolve the genome (single chromosome contigs). Next, he described how Genoscope have used >1,000 MinION flow cells since 2014, sequencing >50 different organisms and generating >1.5Tb of nanopore reads, much of it for de novo assembly projects. He then showed a boxplot of throughput against flow cell type from R7.3 to R9.4.1, illustrating the increase in throughput over this time, and commented that their best MinION run on R9.4.1 thus far has been 17.6Gb. Jean-Marc showed one yeast MinION run where they had sequenced 2Gb of data with an N50 of 50Kb, including one read aligning telomere to telomere on chromosome 1!
Moving on to plant genomes, Jean-Marc told us that a lot of plant genomes have already been sequenced, but many are highly fragmented; only 6 plant species have an assembly with a contig N50 > 5Mb. Using 30x MinION nanopore reads, short reads and Bionano data, they have been able to add Brassica rapa, Brassica oleracea and Musa schizocarpa (wild banana) to the contig N50 > 5Mb club, with most of the chromosomes in <= 3 contigs and contigs successfully spanning centromeres. Jean-Marc then discussed the use of PromethION at Genoscope to sequence the banana genome, requiring only 1 flow cell of data to create an assembly of similar quality to a previous one that took 18 MinION flow cells.
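Contig N50, the metric behind the "club" mentioned above, is the length such that contigs of that length or longer contain at least half of all assembled bases. The Python sketch below shows the calculation on invented contig lengths, chosen only to show how fragmentation lowers the value.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Two hypothetical assemblies with the same total size (300 units).
contiguous = [80, 70, 50, 40, 30, 20, 10]
fragmented = [30] * 10

print(n50(contiguous), n50(fragmented))
```

Both toy assemblies contain the same number of bases, but the one made of many equal small pieces has a much lower N50, which is why the metric is used as a shorthand for contiguity.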
Jean-Marc finished with a slide showing the samples that Genoscope have sequenced with the SQK-LSK109 kit (Yeast, Citrus, Carex and Banana), commenting that the kit had provided higher throughputs with less DNA and that they had observed equivalent N50s to LSK108 when they had performed a size selection protocol. Jean-Marc concluded by discussing the large benefit of including nanopore reads in producing high quality assemblies, although the initial DNA extraction optimisation from a new species is still a challenge.
Data analysis tools: Data handling
Matthew Links: Interactive exploration of base calls in nanopore data
In the first presentation from the data handling breakout session, Matthew Links from the University of Saskatchewan showed the audience an audio signal that he created especially for London Calling 2018. He described how one of the conventional ways to look at analysing a signal is through the use of sine waves (a mathematical curve that describes a smooth periodic oscillation); however, because the sine wave varies continuously over time, there is no ability to localise changes in frequency.
He then analysed the same audio file using wavelets – which he described as an odd mathematical function in which the area under the curve sums to 0. The actual analysis involves scaling the wavelet (expanding and contracting) and shifting (taking the wavelet and scanning across time) – the importance is that you can see changes in frequency localised in time. Wavelets have potential as a way to segment nanopore data.
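The scaling and shifting described above can be sketched in a few lines of Python. This is an illustration only: the talk did not say which wavelet was used, so the Haar wavelet (+1 over its first half, -1 over its second half, so that its weights sum to zero) is an assumption, as are the function names and the toy signal.

```python
def haar_response(signal, scale, shift):
    """Response of a Haar wavelet of half-width `scale` placed at `shift`.

    The weights are +1 over the first half and -1 over the second half,
    so the response is zero wherever the signal is flat.
    """
    left = signal[shift:shift + scale]
    right = signal[shift + scale:shift + 2 * scale]
    return (sum(left) - sum(right)) / (2 * scale) ** 0.5


def scan(signal, scale):
    """Shift the wavelet across the whole signal at one scale."""
    last_shift = len(signal) - 2 * scale
    return [haar_response(signal, scale, s) for s in range(last_shift + 1)]


# Toy trace: one level for 20 samples, then a jump to a second level.
signal = [80.0] * 20 + [100.0] * 20
scale = 4

responses = scan(signal, scale)
best_shift = max(range(len(responses)), key=lambda s: abs(responses[s]))
change_point = best_shift + scale  # the wavelet's centre sits on the jump
print(change_point)
```

Repeating the scan at several values of `scale` corresponds to expanding and contracting the wavelet: small scales pick out sharp, closely spaced changes, while large scales respond to slower ones.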
Another way wavelets could be used is to remove noise from nanopore signals. Matthew then proceeded to show a raw nanopore signal where the peaks seem unstable and of varying widths. The denoised nanopore signal provided a much more uniform width for each peak, and the signal level at any point in time was much more stable.
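As a rough illustration of wavelet denoising, the Python sketch below applies a multi-level Haar transform, shrinks the detail coefficients with a soft threshold and reconstructs the signal. The choice of wavelet, the thresholding rule, the threshold value and the toy signal are all assumptions; the talk did not specify the method Matthew used.

```python
import math
import random


def haar_forward(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    root2 = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / root2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / root2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Undo one level of haar_forward."""
    root2 = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / root2, (a - d) / root2])
    return out


def soft_threshold(value, threshold):
    """Shrink a coefficient towards zero by `threshold`."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def denoise(signal, levels, threshold):
    """Shrink detail coefficients at every level, then reconstruct.

    The signal length must be divisible by 2 ** levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([soft_threshold(v, threshold) for v in detail])
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


def rmse(a, b):
    """Root mean square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy trace: three current levels plus Gaussian noise (all values invented).
random.seed(0)
clean = [80.0] * 16 + [100.0] * 16 + [90.0] * 16
noisy = [v + random.gauss(0.0, 2.0) for v in clean]
denoised = denoise(noisy, levels=3, threshold=6.0)

print(rmse(noisy, clean), rmse(denoised, clean))
```

In this toy case the level changes line up with the transform's block boundaries, so the clean signal has no detail coefficients and shrinking them removes only noise. Real nanopore signals will not be so tidy, and the threshold would need tuning.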
He then presented nanoView, which is a web-based way of interacting with the data.
He showed a reference plot for some lambda DNA in both nanoraw and nanoView, confirming consistency between the tools. After processing with wavelets, there is a pronounced change, which he argued was more indicative of how you would expect the signal to behave as the DNA ratchets through the nanopore.
Matthew believes that analysing nanopore signals with wavelets will provide a clearer view of single nucleotide accuracy.
Andreas Hauser: PromethION: an alpha machine in production
The last speaker for this session, Andreas Hauser from the Laboratory for Functional Genome Analysis (LAFUGA) in Munich, presented his lab’s experience of using the PromethION. He explained how they are a relatively small group but with significant experience of running high-throughput short-read sequencing. The lab operates like a core facility and, as such, is highly experienced in sequencing a wide variety of organisms.
The lab was one of the first to receive the alpha version of the PromethION, and Andreas explained that, since early this year, they have been getting highly productive yields. So far, they have sequenced human, raven, cow, pig and sponge using the PromethION. He pointed to the example of their highest flow cell yield of 88.6 Gb for a cancer genome. In addition, Andreas mentioned that the highest read N50 delivered so far is for their pig genome, which was 36 kb.
He then described a new quality control tool that he has developed called LongRead Plots (or lrplot), which combines a number of QC metrics including sequence length, channel activity, cumulative nucleotides and many others. The tool is now freely available on GitHub, and Andreas welcomes contributions of additional plots.