The beauty and the beast — assembling the tulip genome

Since its introduction in the sixteenth century, the tulip has become synonymous with the Netherlands, where tulip growing remains economically important: over 2 billion tulip bulbs are exported each year. However, according to Dr. Hans Jansen, Chief Technology Officer at Future Genomics Technologies, ‘tulip breeding isn’t without its problems and it can take 25 years to go from seed to a commercial product’1. One of the aims of Dr. Jansen’s team is to identify traits that confer disease resistance, in order to reduce the growing use of pesticides. A whole-genome sequence of the tulip would enhance trait identification and significantly speed up breeding. Specific markers could then be identified in the seeds of crossed lines, rather than waiting over a year for traditional phenotype-based selection.

At approximately 34 Gb (ten times larger than the human genome) and with highly repetitive content, the tulip genome is described by Dr. Jansen as a ‘beast’ that is intractable to existing short-read sequencing technologies1. To tackle this challenge, the team at Future Genomics Technologies used the long sequencing reads delivered by the MinION and PromethION to sequence the genome of Tulipa gesneriana (Orange Sherpa). In total, they generated 203 Gb of data across both devices, equivalent to approximately 6x genome coverage1.
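The coverage figure follows from dividing sequence yield by genome size. Below is a minimal Python sketch of that arithmetic using the 34 Gb and 203 Gb values quoted above. The helper names are hypothetical. The 15x line is only an illustrative extrapolation of the human result discussed later; the team did not report it as a tulip target.

```python
def fold_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Mean fold coverage = total sequenced bases / genome size."""
    return total_bases_gb / genome_size_gb


def bases_needed_gb(target_coverage: float, genome_size_gb: float) -> float:
    """Sequence yield (Gb) needed to reach a target mean coverage."""
    return target_coverage * genome_size_gb


tulip_genome_gb = 34.0  # approximate genome size quoted in the text
yield_gb = 203.0        # combined MinION + PromethION yield quoted in the text

print(f"{fold_coverage(yield_gb, tulip_genome_gb):.1f}x")   # 6.0x (203 / 34 = 5.97)
# Hypothetical: yield a 15x target would imply for a 34 Gb genome.
print(f"{bases_needed_gb(15, tulip_genome_gb):.0f} Gb")     # 510 Gb
```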

The next challenge was to assemble the sequencing reads into a complete genome, as most existing assemblers are not designed for genomes this large.


Figure 1: When Tulipa-julia was run on the human NA12878 nanopore data set, assembly metrics were optimal at approximately 15x sequence coverage. Figure courtesy of Dr. Hans Jansen, Future Genomics Technologies, Netherlands.

Traditional genome assembly tools align every read with every other read, which for large genomes vastly increases the number of calculations and CPU hours required. To address this, the team designed ‘Tulipa-julia’, the successor to the long-read scaffolding assembler ‘TULIP’. Tulipa-julia uses only a few unique, informative parts of each long nanopore read for alignment. In Dr. Jansen’s words, it works by ‘dividing the assembly challenge into several smaller, less complex assemblies’1.
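The text does not describe Tulipa-julia's internals. The following is therefore only a toy illustration of the general idea: find candidate read pairs through a small set of informative seeds instead of comparing all reads against all others. It assumes exact, error-free k-mer matches, although real nanopore reads contain errors. It also assumes a hypothetical repeat cut-off, `max_occ`. The function names are made up for this example and are not Tulipa-julia or TULIP code.

```python
from collections import defaultdict
from itertools import combinations


def all_vs_all_pairs(n_reads: int) -> int:
    """Comparisons performed by a naive all-vs-all overlap step: n(n-1)/2."""
    return n_reads * (n_reads - 1) // 2


def kmers(seq: str, k: int):
    """Yield every k-length substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def seed_candidate_pairs(reads: list[str], k: int, max_occ: int) -> set[tuple[int, int]]:
    """Toy seed filter (hypothetical): keep k-mers present in 2..max_occ reads.

    Assumes exact matches. A k-mer found in only one read links nothing;
    one found in more than max_occ reads is treated as a repeat and ignored.
    Only reads sharing a retained seed become candidates for alignment.
    """
    occurrences = defaultdict(set)  # k-mer -> indices of reads containing it
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            occurrences[km].add(idx)
    pairs = set()
    for read_ids in occurrences.values():
        if 2 <= len(read_ids) <= max_occ:
            pairs.update(combinations(sorted(read_ids), 2))
    return pairs


# Three overlapping reads plus three copies of a simple repeat.
reads = [
    "ACGTTGCAAT",
    "GCAATCCGTA",
    "CCGTAGGTCA",
    "TTTTTTTTTT",
    "TTTTTTTTTT",
    "TTTTTTTTTT",
]
print(all_vs_all_pairs(len(reads)))                         # 15
print(sorted(seed_candidate_pairs(reads, k=5, max_occ=2)))  # [(0, 1), (1, 2)]
```

With the repeat filter relaxed (for example `max_occ=10`), the shared poly-T k-mer adds three more candidate pairs. This hints at why highly repetitive genomes are costly for assemblers that compare everything with everything.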

The team tested the new assembler on the human NA12878 nanopore data set. Assembly metrics, including N50, were optimal at approximately 15x sequence coverage (Figure 1), and assembly could be completed in approximately 1 to 4 hours (Figure 2)1.
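N50, the contiguity metric mentioned above, is the contig length at which contigs of that length or longer together cover at least half of the assembly. Below is a minimal sketch of the standard definition. The text does not say which tool computed the reported values, and the contig lengths here are invented purely for illustration.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted longest first and their lengths summed; the N50 is
    the length of the contig at which the running total first reaches at
    least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("n50() needs at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("not reached for non-empty input")


contigs = [100, 80, 60, 40, 20]  # made-up contig lengths (total 300)
print(n50(contigs))              # 80, because 100 + 80 = 180 >= 150
```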

The team plan to further optimise Tulipa-julia by applying it to the tulip genome before making the assembler freely available to other researchers. Although the tulip genome is still being assembled, the initial results have been sufficient to convince Dr. Jansen of the value of long nanopore reads for sequencing large plant genomes. Commenting on the performance of the PromethION, Dr. Jansen concludes:

‘The PromethION is needed. It generates such a lot of data in such a consistent way that we can more easily access any genome.’1

Figure 2: Tulipa-julia requires significantly fewer CPU hours than the Marvel and Canu assemblers. Figure courtesy of Dr. Hans Jansen, Future Genomics Technologies, Netherlands.

This case study is taken from the plant white paper.

  1. Jansen, H. The beauty and the beast. Presentation. Available at: https://nanoporetech.com/resource-centre/talk/beauty-and-beast. [Accessed: 15 June 2018]