Nanopore Community Meeting 2019: Day 2
Fri 6th December 2019
The second day of the Nanopore Community Meeting 2019 in New York was no less packed than the first, with tonnes of great talks from a huge range of speakers. Here are just a few of the highlights, but all of the recordings will be online shortly for you to catch up on. We'll be adding more to this write up shortly!
Plenary: Steven Salzberg
Sequencing and assembling the mega-genomes of mega-trees: the giant sequoia and coast redwood genomes
Opening day two of the Nanopore Community Meeting in New York, we were delighted to welcome back Steven Salzberg, Professor and Director of the Centre for Computational Biology at Johns Hopkins University. At the Nanopore Community Meeting in 2017, Professor Salzberg detailed his involvement in an ambitious project to sequence the genomes of the giant sequoia (Sequoiadendron giganteum) and coast redwood (Sequoia sempervirens). Now, just two years later, he returned to share the results.
Not only are the sequoia and redwood trees two of the largest living organisms on the planet, they also possess extremely large genomes. At 8.2 Gb for sequoia and 26.5 Gb for redwood, the genomes of these organisms are, respectively, approximately 2.6 and 8.3 times larger than that of humans — making their assembly a truly monumental undertaking. To tackle this challenge, the team deployed a ‘hybrid’ genome assembly strategy, utilising both short-read sequencing technology and long-read nanopore sequencing. Professor Salzberg described how, at over 10 kb in length, nanopore sequencing reads could span nearly all common repeats, simplifying the assembly process.
The team deployed the MaSuRCA hybrid assembler, an open source tool developed in Professor Salzberg’s lab. Briefly, this uses a k-mer lookup to extend short sequencing reads base by base, at both the 5’ and 3’ ends (as long as the extension is unique), to form much longer ‘super-reads’. The combination of super-reads and long nanopore sequencing reads then enables the generation of even larger ‘mega-reads’.
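The unique-extension idea behind super-reads can be illustrated with a toy sketch. This is not MaSuRCA's implementation: the function names `build_kmer_set` and `extend_right`, the tiny k of 4, and the example reads are all assumptions made for illustration. The sketch extends a read at its 3’ end one base at a time, and stops as soon as zero or more than one next base is supported by the k-mer table.

```python
def build_kmer_set(reads, k):
    """Collect every k-mer seen in the reads (a stand-in for a k-mer lookup table)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def extend_right(read, kmers, k, max_steps=1000):
    """Extend a read at its 3' end while exactly one next base is supported."""
    seq = read
    for _ in range(max_steps):  # cap the loop so circular paths cannot run forever
        suffix = seq[-(k - 1):]
        candidates = [base for base in "ACGT" if suffix + base in kmers]
        if len(candidates) != 1:  # dead end or ambiguous branch: stop extending
            break
        seq += candidates[0]
    return seq


# Hypothetical overlapping short reads, for illustration only.
reads = ["AACCG", "CCGGT", "GGTTA"]
kmers = build_kmer_set(reads, k=4)
print(extend_right("AACCG", kmers, k=4))
```

Adding a read such as "CCGTA" to the toy set creates two possible continuations after "CCG", so the extension of "AACCG" would stop immediately. That is the kind of ambiguity, typically caused by repeats, that long nanopore reads help to resolve.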
Starting with the sequoia, Professor Salzberg described how the short-read sequencing was performed on DNA obtained from a single seed (or pine nut) taken from the tallest known sequoia in the world, which stands 93.3 meters high. Importantly, the seed is haploid, making the subsequent assembly much easier. In total, the team generated 135x genome coverage using short-read sequencing technology. For the long-read nanopore sequencing component, they obtained the DNA from needle tissue taken from the same tree, which in itself was fraught with challenges, not least the requirement to get 100 feet in the air to access the first branch of the tree. Using 13 MinION Flow Cells, Winston Timp’s lab at Johns Hopkins University, which undertook all of the sequencing work, generated over 182 Gb of data across approximately 24 million reads (equating to 22x genome coverage), with a read N50 of 9.5 kb.
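As a quick check of the figures above, depth of coverage is simply the number of sequenced bases divided by the genome size. The short sketch below uses only the numbers quoted in the talk; the function names are illustrative, and the inputs are rounded, so the results are approximate.

```python
def fold_coverage(total_bases, genome_size):
    """Mean depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def mean_read_length(total_bases, read_count):
    """Average read length in bases."""
    return total_bases / read_count


# Figures quoted in the talk for the giant sequoia nanopore data set.
sequoia_genome = 8.2e9   # 8.2 Gb genome
nanopore_bases = 182e9   # "over 182 Gb" of nanopore data
nanopore_reads = 24e6    # "approximately 24 million reads"

print(f"coverage: {fold_coverage(nanopore_bases, sequoia_genome):.1f}x")
print(f"mean read length: {mean_read_length(nanopore_bases, nanopore_reads) / 1e3:.1f} kb")
```

This gives roughly 22x coverage, matching the quoted figure, and a mean read length of about 7.6 kb. A mean below the 9.5 kb read N50 is expected, because N50 is weighted towards the longer reads.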
Assembly using the short-read data alone provided a contig N50 of 12 kb across 2,507,175 contigs; however, addition of the long-read nanopore data increased the contig N50 to 360 kb, while reducing the number of contigs to fewer than 50,000 (a 30-fold reduction). Professor Salzberg referred to this genome assembly as Sequoia v1.0 and went on to explain how they have now integrated a Hi-C chromosome conformation capture technique, using the HiRise assembly algorithm, to generate the Sequoia v2.0 assembly. The team at Johns Hopkins University recently applied this technique to generate a chromosome-level assembly of the walnut (Juglans regia L) genome. Comparing the walnut and sequoia nanopore sequencing reads, Professor Salzberg noted that the more recent sequoia reads were significantly longer, reflecting the rapid development of both the nanopore technology and their laboratory’s sequencing workflow.
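For readers unfamiliar with the N50 metric used throughout this talk: it is the contig (or read) length at which, working down from the longest sequence, the running total first reaches half of all bases. A minimal sketch follows; the function name `n50` and the example lengths are illustrative and are not taken from the talk.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 for an empty input)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half of all bases
            return length
    return 0


# Illustrative contig lengths only.
print(n50([10, 20, 30, 40]))
```

Because the metric is weighted by length, merging many short contigs into a few long ones raises N50 sharply, which is why adding long reads has such a visible effect on this figure.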
Assembly using the HiRise algorithm generated 11 ‘enormous’ chromosome-size scaffolds of between 171 Mb and 985 Mb in size. Describing such large scaffolds as ‘spectacular’ and ‘transformative’, Professor Salzberg noted that these are the largest scaffolds ever assembled for any genome. He went on to joke that ‘if your genome doesn’t have a chromosome larger than 1 Gb you can’t break this record’. Recent gene annotation of the Sequoia v2.0 genome has identified 37,963 protein coding genes.
Moving on to the coast redwood, Professor Salzberg explained how, at 27 Gb, this organism not only possesses a much larger genome, but is also hexaploid (i.e. six copies of each chromosome), providing an even sterner computational challenge. In total, the team generated 3.2 trillion bases of short-read data and 582 billion bases of nanopore sequencing data, representing 122x and 21x genome coverage respectively. Sequencing of this giant genome was completed in October 2018, and the subsequent assembly took 5-6 months (or approximately 700,000 CPU hours post error correction). Sharing some of the metrics for this redwood v1.0 assembly, Professor Salzberg stated that the largest contig is 2.4 Mb and the N50 contig size is 110 kb. The Hi-C assembly is still ongoing; however, in closing his presentation, Professor Salzberg suggested that using the resulting data it may be possible to split the redwood genome into its three sub-genomes - and welcomed any help with this challenge!
The redwood sequencing also resulted in the identification of a novel fungal genome (Pestalotiopsis sempervirens), which was covered to 26x depth; however, with the sheer volume of data being generated in the lab, the team have not yet had time to write this up.
It is anticipated that this ground-breaking research will significantly enhance our knowledge of these astounding organisms and support conservation and breeding efforts, preserving them for future generations to enjoy.
Chenchen Zhu – Single-molecule, full-length transcript isoform sequencing reveals disease-associated RNA isoforms
Chenchen opened his talk by discussing how alternative splicing generates increased transcriptional complexity and diversity. In his research, he focuses on the association between aberrant transcription and gene expression in dilated cardiomyopathy. This disease is associated with heart failure, and about 3% of individuals affected have aberrant RBM20 splicing, which is associated with a severe form of the disease.
For his research, Chenchen studied iPS-derived cardiomyocytes. Full-length transcript isoform sequencing using nanopore technology enabled him to identify aberrant splicing at genome scale, identifying transcripts originating from both the positive and negative DNA strands. He also investigated exon skipping and intron retention.
Chenchen next described his FulQuant analysis method of isoform detection, which he said was highly specific and sensitive. He explained how he had first noticed difficulties with alignment, and instances of template truncation, sometimes arising from issues with reverse transcriptase processivity. Problems were particularly notable for very short exons. His algorithm helped to identify false positive isoform detections, achieving an accuracy of 98%. Chenchen stated that, on average, he found two transcript isoforms per gene.
Next, he displayed plots of differentially-expressed full-length isoforms in the RBM20 mutant R643Q, compared to wild-type cells. Chenchen said that he identified 38 mis-spliced isoforms in genes associated with cardiac function, and there was a significant difference in the level of expression of some genes that are targets of RBM20, such as TPM1 and TPM2. Two IMMT isoforms were also differentially expressed. Long-read nanopore sequencing allowed him to pinpoint exact transcripts.
Seda Mirzoyan - Using SIP and MinION sequencing to uncover active bacterial and eukaryotic microbial communities in blueberry farm and forest soil systems
Seda Mirzoyan (Rutgers University) began her talk by describing how blueberry production is a $700 million industry and that the United States is the world's largest producer. Blueberry plants grow in acidic, well-drained soils, which usually have high organic content.
The aim of this research was to uncover the differences between low- and high-productivity soils in terms of microbial composition. In order to identify active microbes, the team at Rutgers employed stable isotope probing (SIP) using carbon-13 (13C) with a negative carbon-12 (12C) control. Following DNA extraction from a caesium chloride gradient, full-length rRNA operons were amplified and screened against a comprehensive 16S database. In total, 19 major eukaryotic clades were detected. Virtually all of the active eukaryotic community and 70% of the resident community were identified as fungi. In forest soil controls it was observed that the resident and active communities differed. High-productivity soils were enriched with Glomeromycotan species, beneficial fungi that help provide nutrients to plants. Low-productivity soils were found to be enriched with members of the fungal pathogen phylum Rozellomycota. Examining the bacterial composition of the soils revealed that, in low-productivity soils, the active microbial community was enriched with members of the phylum Firmicutes.
In summary, Seda suggested that SIP combined with ribosomal operon profiling using the MinION can differentiate between resident and active microbial communities in agricultural and forest soil systems.
Olufunmilola Ibironke - Species-level evaluation of the human respiratory microbiome
According to Olufunmilola Ibironke from Rutgers University, changes to the lung microbiome are associated with respiratory disease; however, lung microbiome studies traditionally only resolve to the level of phylum or family, obscuring the relative abundance of different bacterial species.
The aim of Olufunmilola’s study was to evaluate the abundance of microbial species in four different regions of the human respiratory tract, namely the throat, nasal cavity, mouth, and bronchial lavage (lung). DNA samples collected at each site from five individuals were subjected to ribosomal operon sequencing using the MinION. Screening against a 16S database allowed the identification of almost 3,600 bacterial species, with the dominant phyla being Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes, and Fusobacteria.
Examining the relative abundance of bacterial species along the respiratory tract indicated that most microbes (95%) were being passively transported from outside of the body (mouth/nose samples) and into the lung. However, a small percentage (<5%) of bacterial species were at higher abundance within the lung samples. Of the 100 bacterial species that were enriched within the lung, the most predominant were Veillonella dispar and Veillonella atypica. Olufunmilola suggested that these lung-enriched organisms may play a significant role in lung health. The most abundant mouth-associated bacterial species were found to be Streptococcus infantis and Streptococcus mitis.
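The comparison described above can be sketched as follows: convert read counts per species into relative abundances at each site, then flag species whose share is higher in the lung than in the upper airway. This is only an illustration of the idea and not the analysis used in the study; the function names, the placeholder species labels and the counts are all invented for the example.

```python
def relative_abundance(counts):
    """Convert per-species read counts into fractions of the sample total."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def lung_enriched(upper_counts, lung_counts):
    """List species with a higher relative abundance in lung than in upper airway."""
    upper = relative_abundance(upper_counts)
    lung = relative_abundance(lung_counts)
    return sorted(s for s in lung if lung[s] > upper.get(s, 0.0))


# Invented counts with placeholder names; not data from the talk.
upper_airway = {"species_A": 80, "species_B": 15, "species_C": 5}
lung_sample = {"species_A": 60, "species_B": 10, "species_C": 30}
print(lung_enriched(upper_airway, lung_sample))
```

A real analysis would also need replicates and a statistical test before calling a species enriched; the simple comparison here assumes non-empty samples and ignores sampling noise.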
Summarising the application of nanopore sequencing to this project, Olufunmilola stated that the MinION can be used to provide species-level resolution of the respiratory microbiome.
David VanHoute – Rapid adventitious agent detection using nanopore technology
David opened his talk by stating that, at Regeneron Pharmaceuticals, they have to ensure that the drug products they manufacture are free from any biological contaminants. Discussing how nanopore sequencing fits into the context of QC virology, David stated that it has the potential to dramatically decrease the screening time for potential bioreactor contaminants, from up to ~28 days for culture-based screening to 1 day with MinION de novo sequencing. The aim is to replace the slow screening methods that are typically employed with these new methods, reducing both testing time and cost.
David shared that there are challenges to QC testing with sequencing, such as contamination with host nucleic acids and cell culture reagents, low viral titres, and residual DNA from the bioreactors. Everything sequenced must be accounted for. His team have been working for a while on how to enrich for viral nucleic acids, and how to characterise everything seen in the background signal.
As an example, David shared a case of identifying intact Bosavirus in gamma-irradiated foetal bovine serum, a cell-culture reagent. Using an enrichment protocol and removing host nucleic acids, followed by library prep with the Rapid Sequencing Kit, and 24 hours of MinION sequencing, the ratio of Bosavirus reads to non-viral reads was greatly increased. However, in this case, the virus would have to be reported as a true positive result, despite effectively being a false positive.
David summarised that nanopore sequencing can greatly improve detection in QC virology as it is fast, quantitative, accurate, and inexpensive, in contrast to current testing methods. In future, he suggested, RNA sequencing could be used to investigate differential gene expression between healthy and infected cells. He is also exploring how machine learning algorithms could be used to detect infected samples, which would streamline the process, as ultimately "we want something that is faster".
He concluded by saying something that he has "wanted to talk about for a while" - that QC virology detection is advancing as Oxford Nanopore technology itself advances and increases its throughput. He said that he hopes to get a PromethION, and his ideal would be to be able to sequence proteins with nanopore technology.
Chris Mason - Diplotype-resolved, single-molecule telomere sequencing
Chris's studies into twin astronauts started about five years ago, with NASA's longest human twin flight mission. Chris remarked that he was honoured to be able to scientifically contribute to this mission.
As a side note, he introduced "chromosome 3 and Brexit" and genetic consequences of social stratification in Great Britain, by discussing how a recent publication related SNPs to election "phenotype". This showed that two SNPs (in two different genes) on chromosome 3 and one on chromosome 20 were found to be related to Brexit voting. In another example, a positive correlation was found between the percentage of fish and chip shops in an area and the percentage of voters for Brexit.
Chris then started to explore his work with NASA on The Twin Study, investigating an extensive range of phenotypes and their relationship with genotype - from vasculature, to cognition, to immune functioning and the microbiome - "some of the best places to look are everywhere you possibly can"! The Kelly twins were the subjects of this study - Scott had been on the ISS for one year, and Mark was on Earth, acting as the "control twin".
Chris shared one of their interesting results, which was that longer telomeres were observed in CD4+ and CD8+ T-cells, detected by qPCR, and this was only at the in-flight timepoint, not upon return to Earth or prior to flight.
Chris then described "breaking out the [nanopore] long reads", explaining how read mapping can be a challenge in short-read sequencing, especially when there are repeat sequences, which are characteristic of telomeres (TTAGGG). He introduced EdgeCase - an open source analysis software that he used to quantify the length of telomeres, and map them to the genome. Haplotypes could also be distinguished thanks to the long reads. The results were verified against Genome in a Bottle (GIAB) data.
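To give a feel for why telomeric repeats are easy to recognise but hard to place with short reads, the toy sketch below measures the longest uninterrupted run of the TTAGGG motif in a read. It is not how EdgeCase works; the function name `longest_telomeric_run` and the example read are assumptions for illustration. It also ignores sequencing errors and the reverse-complement motif, both of which a real tool would need to handle.

```python
import re


def longest_telomeric_run(read, motif="TTAGGG"):
    """Return the length in bases of the longest uninterrupted tandem run of the motif."""
    runs = re.findall(f"(?:{motif})+", read)
    return max((len(run) for run in runs), default=0)


# Hypothetical read: flanking sequence, five repeats, an interruption, two repeats.
read = "ACGT" + "TTAGGG" * 5 + "CCA" + "TTAGGG" * 2
print(longest_telomeric_run(read))
```

A short read that falls entirely inside such a run looks identical at every chromosome end, whereas a long read that carries both the repeat tract and some unique flanking sequence can be anchored to a specific location.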
He then displayed a line up of 5-8 kb telomere sequences, that had been measured in the long-read sequencing data, showing how telomeres were longer for the in-flight timepoint.
In fact, these observations were published by media outlets, stating how space had made Scott "taller and younger"!
So what now? Chris shared that he will be looking everywhere at telomeres - investigating telomere sequence length in those involved in climbing Everest, going to Mars, and in the extreme microbiome project.
Plenary: Shruti Iyer
Shruti Iyer (Cold Spring Harbor Laboratory & Stony Brook University) began her plenary talk by discussing how "cancer is a disease of the genome", in which the accumulation of genetic and epigenetic alterations results in a loss of control over normal cell growth. These genetic alterations, she described, can vary from single point mutations up to larger structural variants, which affect one or multiple genes: the field of cancer genomics aims to identify and characterise these variations.
Shruti noted how several thousand tumours have been sequenced via next-generation sequencing, enabling the discovery of different signatures and mutation rates across different cancer types, plus insights into the clonal structure and evolution of tumours. Malignant cells can comprise as little as 10% of a sample; furthermore, heterogeneity is "very much a part of cancer", with subpopulations within this exhibiting different alleles or genomic features. The combination of this intricacy and lack of depth, Shruti explained, means that the ability to detect these variants via whole genome sequencing to the typical depth of coverage of 30x is very low. Shruti described how targeted methods, such as exome capture, have helped this field to move forward by enriching for regions of interest and improving their coverage; this has enabled the detection of many small variants associated with cancers, but there has been a "blind spot" when it comes to detection of structural variants (SVs).
SVs, Shruti explained, are defined as variants spanning over 50 bp; they encompass insertions, deletions, duplications and translocations. Due to their large size, these variants tend to be disruptive. SVs contribute to copy number changes, which can amplify or delete oncogenes and tumour suppressor genes. SVs can also lead to gene fusions, which can modify the sequence and function of the protein produced; for example, by fusing a highly expressed transcript to one with lower expression levels. SVs can therefore act as prognostic indicators: greater genome instability is generally associated with poorer patient outcome. However, Shruti described how, despite the significance of SVs, relatively little about all but large copy number variations is known, and that "this is largely because of the way these variants are studied".
Some methods of analysing SVs, such as cytogenetics and microarrays, can provide a "bird's eye view" of SVs, but lack resolution. High-resolution methods, on the other hand, generally involve short-read sequencing; short reads cannot span SVs, resulting in misalignments and low sensitivity - "up to 80% false positive rates". Shruti quoted that ~700 genes have been identified as "inaccessible to sequencing with short reads", with ~200 of these being medically relevant. An individual human genome, aligned to the human reference genome, has ~20,000 SVs; Shruti highlighted how we are "really missing a lot of things by not looking for them in the right way". How can these important variants be resolved? "Spoiler alert: long reads can help!" Shruti described how SVs can be detected using long reads with a sensitivity and specificity of over 95%.
Shruti described how analysis of the HER2-amplified breast cell line SK-BR-3 has helped identify several thousand variants with short and long reads. The cell line was sequenced to high coverage using both a short-read technology and long nanopore reads; Shruti described how the long reads "helped identify tens of thousands of additional variants in the cancer". She and her team are now focusing their efforts on targeted, long-read sequencing to achieve the depth needed to identify rare variants.
Shruti noted how the use of targeting strategies enables higher-throughput sequencing of the targets of interest, improving their depth of coverage and allowing for the detection of rare alleles. As targeting avoids having to sequence a whole genome to generate sufficient coverage of the regions of interest, this approach is also more cost-effective. However, Shruti described how, until ~1.5-2 years ago, there wasn't really an effective method of long-read target enrichment: methods designed for short-read technologies tend to involve either PCR or target capture, leading to inherent bias and short fragments, meaning that long nanopore sequencing could not be used to its full potential.
Shruti then introduced the CRISPR/Cas9 method of PCR-free, long-read target enrichment. The method begins with dephosphorylation of all the DNA in the sample. Cas9 ribonucleoproteins, or RNPs, with the crRNAs (specific to the ends of the targets of interest) and tracrRNAs (which guide the Cas9 enzyme to this site), are then added. The Cas9 is then guided to the sites flanking the target loci, where it induces double-stranded cuts, excising the region. Sequencing adapters can then be ligated to these exposed phosphorylated ends, enabling sequencing of the target regions. Shruti pointed out that all the DNA - both on target and off target - remains present in the enriched sample through sequencing. The process is entirely PCR-free, and can be used to enrich for very large regions. For their first test, Shruti and her team decided to use this method to enrich the BRCA1 gene in the SK-BR-3 cell line, as the cell line had previously been studied in their lab and this would further help validate their findings. At the time, Shruti explained, groups using CRISPR/Cas9-mediated enrichment hadn't gone beyond targeting and sequencing regions of 5-10 kb; Shruti decided to see how far she could push this - "I started with 200 kb".
Shruti then displayed the result: she successfully managed to capture and sequence the BRCA1 gene end-to-end - in one single, ultra-long nanopore read of 198 kb. As far as she is aware, this stands as the record for the longest read generated with CRISPR/Cas9 enrichment - however, Shruti noted that for this project, she was looking for more than a few very long reads, and described how this began the "chasing BRCA" phase of her dissertation.
Seeing poor enrichment of her target region, she noted how the prevalence of SVs in cancer genomes could mean that the loci she targeted with RNA probes could be affected, meaning that they could not be captured. Shruti then displayed enrichment data for the cell line MCF 10A: whilst this produced more on-target reads, they tended not to span the full length of the region. Next, to improve depth of coverage, she tested the preparation and pooling of multiple libraries - whilst this yielded good results, the reads were still not reaching across the full locus.
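The distinction drawn here, between reads that are merely on target and reads that span the whole locus, can be made concrete with a small sketch. The function name `classify_read`, the half-open coordinate convention and the example coordinates are assumptions for illustration; they are not taken from Shruti's analysis.

```python
from collections import Counter


def classify_read(read_start, read_end, target_start, target_end):
    """Classify an aligned read relative to a target interval (half-open coordinates)."""
    if read_end <= target_start or read_start >= target_end:
        return "off-target"
    if read_start <= target_start and read_end >= target_end:
        return "spanning"  # covers the target end-to-end
    return "partial"       # overlaps the target but does not cover all of it


# Hypothetical target and alignments, for illustration only.
target = (1_000, 201_000)
alignments = [(900, 201_500), (5_000, 60_000), (150_000, 201_000), (300_000, 320_000)]
summary = Counter(classify_read(start, end, *target) for start, end in alignments)
print(summary)
```

Tallying reads this way separates two different goals: depth of coverage, which partial reads still contribute to, and end-to-end capture of the locus, which only spanning reads provide.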
Shruti then asked: could the background DNA be competing and inhibiting the sequencing of the ultra-long fragments? To tackle this, she used the Circulomics Short Read Eliminator (SRE) Kit on the CRISPR/Cas9-enriched libraries, prior to preparation for sequencing. In an enriched sample of MCF 10A prepared using this method, one 142 kb read was observed, further bridging the target. However, the team's next step was to enrich multiple targets, some of which were below the length cut-off of the SRE Kit, meaning that they would be removed by the process. To enable the preservation of this enriched DNA whilst effectively removing the background DNA, Shruti and her team developed ACME: Affinity-based Cas9-Mediated Enrichment. This method makes use of the histidine tag present on the Cas9 enzyme, used to purify the protein in its production, enabling the capture of Cas-bound regions on His Dynabeads and pulldown via magnets. Shruti displayed how libraries prepared using ACME performed better in terms of depth of coverage, but reads still did not span full, very large targets.
The team then designed a cancer gene panel, targeting multiple genes of different size ranges, to test the upper length limit of the enrichment method; the genes selected were those where SVs had been found in whole genome data. Shruti described how, with more targets, more DNA was pulled out using the ACME process. Shruti displayed alignments showing end-to-end coverage of the 90 kb region targeting the BRCA2 gene in both cell lines, with depth of coverage much improved by ACME. For SK-BR-3, 99-fold enrichment, to a depth of coverage of 100x, was achieved with ACME. Analysis of the different target lengths - spanning tens of kilobases and higher - determined that good end-to-end enrichment was seen with the method up to ~100 kb. Noting that there isn't always enough sample to enable multiple preps and pooling prior to sequencing, Shruti tested ACME using single-prep libraries; this demonstrated improved depth of coverage over the non-ACME libraries. In one example, Shruti displayed aligned data for the enriched TERT gene, a target of ~45-50 kb: though this has not been run through SV callers yet, a structural event appears visible even when viewing in IGV.
Summarising, Shruti described how ACME helps to increase target coverage by ~2-fold, and helps to increase the target length to close to 100 kb. Furthermore, she noted how those using the "tiling" method, in which probes are tiled across very large regions to improve their coverage, could use ACME to reduce the number of probes needed and tile even larger loci.
In future, she and her team plan to apply this method to the second version of their cancer panel, encompassing BRCA1, ERBB2, APAF1 and other COSMIC genes with evidence of SVs in the cell line SK-BR-3. They also intend to compare the performance of SV detection between targeted and whole genome strategies. Along with testing other targeted approaches, they will also design a panel of genes from existing diagnostic panels to test on both organoids and tumour samples. Lastly, they would like to explore the use of native barcodes for PCR-free multiplexed sequencing of their enriched libraries.
Plenary: Glennis Logsdon
Plenary: Dylan Maghini
Dylan opened her talk by exploring how the human gut microbiome impacts health and disease. The microbiome has an array of functions: it extracts energy from food, it regulates our metabolism, it protects us from disease-causing organisms, it regulates our immune system, and it produces essential vitamins. Nonetheless, despite its clear importance and extensive involvement in our physiology, our understanding of the microbiome is incomplete - "we know very few species present, let alone what function they might be having".
Methods involving isolation and culture, and shotgun metagenomics, for species identification only provide part of the picture. This is because they have limited throughput, and are restricted to those organisms that can be cultured.
Dylan explored how microbial genomes can be de novo assembled directly from sequencing stool samples, providing a more holistic understanding of what bacteria comprise the gut microbiota. She discussed how a lot of effort has been made to increase our understanding of the microbiome by sequencing metagenomic samples from individuals with a diverse range of lifestyle, age etc. This has greatly contributed to our understanding of the human gut microbiome.
However, metagenome-assembled genomes (MAGs) obtained using short-read sequencing often miss mobile genetic elements, such as phages and transposons, because they tend to be repetitive. These elements can also be repeated multiple times throughout the genome, and they may transpose to other bacteria via horizontal transfer; therefore, many genomes are incomplete or contaminated with genomic sequences from other bacterial species. These mobile elements are important for carrying cargo or influencing gene expression, and they mediate diverse phenotypes "that are very important for our health", such as virulence and antibiotic resistance. Therefore, their analysis is fundamental to understanding the microbiome, but in order to resolve these elements, Dylan explained, we need to have complete genomes.
Dylan next discussed how traditional methods of DNA extraction in gut microbiome labs can yield very fragmented DNA. This is because bacterial cells are very hard to lyse, and there is lots of other material present in stool which needs to be removed using vigorous treatment of the sample. Therefore, her team employed a new method of DNA extraction to extract highly intact DNA from stool samples, for subsequent nanopore long-read sequencing.
Their method involved enzymatic digestion of the microbial cell wall (using a mixture of different lytic enzymes), phenol-chloroform extraction, proteinase K and RNase A digestion, gravity column purification, and SPRI bead size selection of high molecular weight DNA. Dylan said that with this method they could get DNA fragments up to 49 kb in length. They also developed a new analysis method for assembling and processing the long nanopore reads which incorporated assembly tools such as Canu and Flye.
For initial validation of their approach, they performed nanopore sequencing of a mock microbial community; this yielded many contiguous genomes. Displaying circular plots of these individual bacterial assemblies, Dylan pointed out that clearly the "nanopore assemblies were much more contiguous than the corresponding short read assemblies." In some instances, the genomes were assembled in single contigs. Next, Dylan described how nanopore sequencing of human stool also closed several bacterial genomes that were originally constructed from short-read sequencing. She stated that their method consistently outperformed other methods used for genome closure. In particular, insertion sequences, which were typically challenging for short-read assembly, were resolved with nanopore long reads. These insertion sequences were found to vary both in abundance across the genome and over time. As an example, Dylan looked closer at the assembled genome of Prevotella copri - one of the most abundant species in our gut. For this bacterium, the nanopore-based genome assembly was much more contiguous than the original short-read assembly. Focusing on insertion sequences, there were over 100 of them within this genome, which were in fact 5 different forms repeated numerous times throughout the genome. This was a significant part of the problem in assembling the genome of this species.
In terms of the insertion sequences changing over time, Dylan described how they observed that insertion sequences changed in a bacterial genome in one individual over 15 months of study. Why is this interesting? One of the insertion sequences detected was next to a gene involved in multidrug export, and another was next to a gene involved in capsule biosynthesis (a mechanism for antibiotic resistance); this means that the insertion sequences might be, for example, impacting drug metabolism over time. As another example, insertion sequences next to genes related to the utilisation of complex sugars, such as Beta-galactosidase, SusC and SusE, were found to change over time, and this provides us with an idea of what the bacteria are being exposed to in the gut at different time points.
As well as closing known bacterial genomes, nanopore sequencing data closed several novel bacterial genomes. Dylan described how a genome was assembled that seemed to be a species of Cibiobacter. These species are notoriously hard to culture. Nanopore sequencing confirmed that the organism was similar to Cibiobacter; its genome contained many insertion sequences and 5 phage regions. Dylan discussed how such phage genomes present within assembled bacterial genomes reveal adaptive mechanisms of a bacterium, such as nutrient metabolism, revealing what the organisms have been exposed to.
Dylan turned her attention to discussing the "dark matter" of microbial genomes - the uncharacterised parts of genomes. She discussed how nanopore sequencing has been used to illuminate the genomes of uncharacterised non-Western microbiomes, such as the genomes of novel Treponema species in South Africa. As this region is transitioning into a developed state, it is a particularly interesting place to investigate how microbiomes are evolving. Previous short-read sequencing was unable to classify a large proportion of the genomic reads for this species, in comparison to similar species isolated in the Western world. They decided to perform a pilot investigation and applied nanopore sequencing to these metagenomic samples to see if they could gain more insight. What was particularly interesting was that the genome of this Treponema species was less repetitive, with fewer insertion sequences, and so Dylan questioned why the corresponding short-read assembly was so poor. Perhaps it was due to other sequences that could not be resolved? They have yet to determine the reason(s).
In the last part of her talk, Dylan asked why do some highly abundant organisms still evade genome circularisation? Stating her frustration at this, she said that "this is annoying for us, we got greedy and we really like circular genomes!" She said that they have observed an inverse correlation between the quality of a genome assembly and the relative abundance of organisms present within a sample. Sometimes extensive gene transfer occurs between closely-related species, and this makes it particularly challenging to assemble their genomes.
In summary, Dylan discussed how new methods for DNA extraction and metagenomic assembly with nanopore long reads have enabled the completion of many genomes, both known and novel, and have provided greater insight into mobile genetic elements.