London Calling: Day 1 (afternoon) writeups

A blog summarising presentations from the London Calling Conference, May 2018.

Thanks for visiting our London Calling writeup page. We'll continue to update it over the coming days so please check back!

Plenary: The beauty and the beast - Hans Jansen

Since its introduction in the sixteenth century, the tulip has become synonymous with the Netherlands, and tulip growing remains economically important. However, according to Hans Jansen, ‘tulip breeding isn’t without its problems’. Hans explained that it takes 25 years to go from seed to commercial product, and that identifying traits that confer disease resistance is also important for reducing the growing use of pesticides.

Hans and the team at Future Genomics Technologies aim to assist tulip breeding efforts by sequencing the tulip genome, but this is no small feat. With a genome of approximately 34 Gb, which Hans describes as a ‘beast’, it is intractable for existing short-read sequencing technologies.

In his presentation, Hans discussed how they are using the advantages of long-read nanopore sequencing to tackle this massive genome. The team sequenced on both the MinION and the PromethION, generating 203 Gb of data (approximately 6x genome coverage).
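The quoted ~6x figure is simply total sequenced bases divided by genome size. A minimal Python sketch of that arithmetic, assuming the approximate 34 Gb genome size given in the talk:

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size

GB = 1e9  # bases per gigabase
coverage = fold_coverage(203 * GB, 34 * GB)
print(f"{coverage:.1f}x")  # prints "6.0x"
```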

Obtaining high-quality DNA from plants can prove challenging, and the team tried a number of different sample types, with the best yields obtained from freshly formed shoots.

Hans also presented short-read sequencing data revealing significant variation in repeat content between different tulip species. The long-read nanopore data showed that most repeats are separated by stretches of unique sequence, which gave the team hope that they could assemble the complete genome using nanopore sequencing.

Hans then turned his attention to the challenge of assembling the sequencing reads into a complete genome. He stated that the larger the genome, the more difficult it is to assemble, because most genome assemblers align every read against all other reads. To address this, the team designed Tulipa-julia, the successor to the long-read scaffolding assembler Tulip, which uses only a few informative parts of the long reads for assembly, or, as Hans put it, works by ‘dividing the assembly challenge into several smaller, less complex assemblies’. The team tested this new assembler on the NA12878 nanopore data set and are now using it to assemble the genome of Tulipa gesneriana (Orange Sherpa). In conclusion, Hans stated that ‘there is still a lot of analysis optimisation to be done and this needs to be performed for each different genome but the PromethION is really needed to access any genome’.
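To see why genome size matters, note that the number of read pairs an all-vs-all approach must consider grows quadratically with the number of reads. The Python sketch below contrasts that with keeping only a small set of "informative" k-mers: those shared between reads but too rare to come from repeats. The frequency-based filter, the k-mer length and the `max_count` threshold are illustrative assumptions, not how Tulipa-julia actually chooses its informative read parts.

```python
from collections import Counter

def all_vs_all_comparisons(n_reads: int) -> int:
    """Number of read pairs an all-vs-all overlapper must consider."""
    return n_reads * (n_reads - 1) // 2

def informative_kmers(reads, k, max_count):
    """Hypothetical seed filter: keep k-mers found in at least two reads
    (so they can link reads) but seen no more than max_count times in
    total (so they are unlikely to come from repeats)."""
    total = Counter()         # occurrences across all reads
    read_support = Counter()  # number of distinct reads containing the k-mer
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        total.update(kmers)
        read_support.update(set(kmers))
    return {km for km in total if read_support[km] >= 2 and total[km] <= max_count}

reads = ["ACGTACGTTTGCA", "GTTTGCAAGGC", "CCCCCCCCCC"]
print(all_vs_all_comparisons(1_000_000))  # 499999500000
print(sorted(informative_kmers(reads, k=5, max_count=3)))  # ['GTTTG', 'TTGCA', 'TTTGC']
```

The low-complexity third read contributes only a repetitive k-mer, so it is filtered out rather than compared against everything else.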

Plenary: Nanopore sequencing of cancer genomes - Wigard Kloosterman

Wigard Kloosterman from the University Medical Center, Netherlands, opened his talk with an overview of his team’s work on structural variation (SV) and how the long reads from nanopore sequencing are making its detection feasible. His first case study was an individual with a congenital disease (an insertion-translocation in chromosome 9), whose genome was sequenced using the MinION. The team observed a very complex pattern of de novo genome rearrangement known as chromothripsis. He directed delegates to his recent publication for more information on this work.

Wigard stated that one question he often hears is ‘what coverage do you need to understand SV in genomes?’ In response, he showed data indicating that at 14x coverage they reached nearly 100% sensitivity for identifying de novo pathogenic breakpoints.

Another question he addressed is how best to capture SV from long-read nanopore sequencing data. Wigard noted that a number of published tools are available, to which his team has contributed NanoSV. This analysis tool uses split-read mappings of long reads to identify breakpoint junctions. His lab evaluated four mappers (LAST, BWA, Minimap2 and NGMLR) and found LAST to be the slowest and Minimap2 the fastest; however, LAST provided the best mapping accuracy. Based on their analyses, the combination of LAST and NanoSV is very robust for SV detection.
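NanoSV's actual implementation is not described here; as a hedged illustration of the split-read idea, the Python sketch below orders the aligned segments of one read along the read and reports a candidate junction wherever consecutive segments do not continue each other on the reference. The `Segment` type and the 50 bp `min_gap` tolerance are assumptions for the example; a real caller would also cluster junctions across many reads before reporting a breakpoint.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One aligned piece of a split long read (0-based, half-open coordinates)."""
    chrom: str
    query_start: int
    query_end: int
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"

def breakpoint_junctions(segments, min_gap=50):
    """Return (left, right) reference points joined by consecutive segments
    of one read, skipping pairs that simply continue along the reference."""
    ordered = sorted(segments, key=lambda s: s.query_start)
    junctions = []
    for a, b in zip(ordered, ordered[1:]):
        # Where segment a leaves the reference and segment b enters it,
        # taking alignment orientation into account.
        left = (a.chrom, a.ref_end if a.strand == "+" else a.ref_start, a.strand)
        right = (b.chrom, b.ref_start if b.strand == "+" else b.ref_end, b.strand)
        contiguous = (a.chrom == b.chrom and a.strand == b.strand
                      and abs(right[1] - left[1]) < min_gap)
        if not contiguous:
            junctions.append((left, right))
    return junctions

# A read whose first 4 kb maps to chr1 and remaining 3 kb to chr9.
read = [Segment("chr9", 4000, 7000, 20000, 23000, "+"),
        Segment("chr1", 0, 4000, 1000, 5000, "+")]
print(breakpoint_junctions(read))
```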

Long nanopore reads enable the use of phase-informative genetic variation to improve SV calling. Wigard presented data showing that 70% of nanopore reads could be phased using overlapping phase-informative SNPs. The team used this method to phase de novo chromothripsis breakpoints and discovered that all of them occurred on the paternal chromosome.
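Phasing a read in this way amounts to checking which haplotype its alleles agree with at heterozygous SNP positions. A minimal Python sketch, assuming the SNPs have already been phased into two haplotypes and that a read needs at least two informative SNPs (a hypothetical threshold) to be assigned:

```python
def assign_haplotype(read_alleles, hap1, hap2, min_snps=2):
    """Assign a read to haplotype 1 or 2 by majority vote over the
    phase-informative (heterozygous) SNPs it overlaps.

    read_alleles: {position: base observed in the read}
    hap1, hap2:   {position: base on each phased haplotype}
    Returns 1, 2, or None if the read cannot be phased confidently.
    """
    votes1 = votes2 = 0
    for pos, base in read_alleles.items():
        if pos not in hap1 or pos not in hap2 or hap1[pos] == hap2[pos]:
            continue  # not a phase-informative site
        if base == hap1[pos]:
            votes1 += 1
        elif base == hap2[pos]:
            votes2 += 1
    if votes1 + votes2 < min_snps or votes1 == votes2:
        return None
    return 1 if votes1 > votes2 else 2

hap1 = {100: "A", 250: "C", 400: "G"}
hap2 = {100: "T", 250: "G", 400: "A"}
print(assign_haplotype({100: "A", 250: "C", 400: "A"}, hap1, hap2))
```

A read that carries an SV breakpoint inherits the haplotype of the read, which is how a breakpoint can be placed on a parental chromosome.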

More recently, the team have been using the PromethION; their first project was sequencing and analysing Wigard’s own genome. The first run provided 76 Gb of data; however, when they aimed for longer reads using phenol-chloroform extraction, the yield fell. In total, they obtained 75x coverage of the genome, which they are now starting to analyse. They first assembled the genome using miniasm, which gave an N50 of 3.5 Mb. Wigard then presented his CNV profile, which, as he was no doubt relieved to find, revealed no aneuploidies in his genome. He briefly highlighted the SVs in his genome and said that they can now compare long-read nanopore genomes to evaluate SVs and other differences between individuals.
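The N50 quoted for the miniasm assembly is a standard contiguity statistic: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. A short Python sketch of that definition (the contig lengths in the example are made up):

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```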

In the final section of his talk, he discussed looking at SV in cancer genomes.

They are involved in a project with the Hartwig Medical Foundation to study genomes from metastatic cancer patients and are looking to capture SV in these genomes. They started by sequencing a reference melanoma cell line and a normal cell line using the MinION. The genome of the reference melanoma cell line is very heavily rearranged. Comparing these data with those derived from short-read technology revealed substantial overlap, but also many variants that could only be captured using the nanopore data. They are now in the process of analysing these data. Wigard also explained how mixing the data from these two samples allowed them to assess the effect of tumour purity on SV detection power. They found that below 50% tumour content there is a dramatic drop in SV detection.
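One way to see why detection power falls with tumour content: in a mixture, only a fraction of the reads covering a somatic breakpoint come from tumour cells, so fewer reads support it. The binomial model below is a simplifying assumption for illustration (a heterozygous SV in every tumour cell, a hypothetical minimum number of supporting reads), not the analysis presented in the talk, and it is not expected to reproduce the reported threshold.

```python
from math import comb

def detection_probability(coverage, purity, min_support, tumour_vaf=0.5):
    """P(at least min_support reads carry the SV) under a simple binomial model.

    Each of `coverage` reads comes from the tumour with probability `purity`
    and, if so, carries a heterozygous SV with probability `tumour_vaf`.
    """
    p = purity * tumour_vaf
    return sum(comb(coverage, k) * p**k * (1 - p)**(coverage - k)
               for k in range(min_support, coverage + 1))

for purity in (1.0, 0.75, 0.5, 0.25):
    print(purity, round(detection_probability(30, purity, min_support=5), 3))
```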

For rapid detection of somatic SV, they are using low-pass long-read nanopore sequencing. They developed the SHARC pipeline, which is designed to identify tumour-specific breakpoints. They tested the pipeline on the melanoma reference cell line by subsampling the data to 1x, 2x, 3x and 4x coverage as well as using the total coverage, demonstrating that SHARC accurately predicts somatic SV breakpoints from low-coverage nanopore data.
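Subsampling of this kind can be sketched as drawing reads at random until a target number of bases is reached. A minimal Python version, assuming only read lengths are needed and using a fixed random seed for reproducibility; it is an illustration, not code taken from the SHARC pipeline:

```python
import random

def subsample_to_coverage(read_lengths, genome_size, target_coverage, seed=0):
    """Randomly pick reads until their summed length reaches
    target_coverage * genome_size. Returns indices of the chosen reads."""
    target_bases = target_coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)
    chosen, bases = [], 0
    for idx in order:
        if bases >= target_bases:
            break
        chosen.append(idx)
        bases += read_lengths[idx]
    return chosen

lengths = [1000] * 100  # toy data: 100 reads of 1 kb each
for cov in (1, 2, 3, 4):
    print(cov, len(subsample_to_coverage(lengths, genome_size=10_000, target_coverage=cov)))
```

If the requested coverage exceeds what was sequenced, the function simply returns every read.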

Applying this pipeline to an ovarian cancer tumour biopsy demonstrated proof of principle for the rapid identification of somatic breakpoints as tumour-specific biomarkers for tracking cancer from blood.

Clive Brown's talk will be posted separately.