Whole-genome assembly of guar (Cyamopsis tetragonoloba [L.] Taub.) by second- and third-generation sequencing approaches
Closing the Assembling complex genomes breakout, Elizaveta Grigoreva (Saint-Petersburg State Forestry University) introduced us to the guar plant (Cyamopsis tetragonoloba [L.] Taub.), a legume native to India and Pakistan. Its seeds are a rich source of galactomannan polysaccharide (known as guar gum), which is used as a stabilizer (E412) in the food industry and has further widespread uses across the cosmetics, textile, oil, and gas industries. The guar genome is estimated to be approximately 480 Mb in length but, to date, its sequence has yet to be published. Because guar requires a short photoperiod, crop production is unsuitable for countries in northern latitudes with long daylight hours during the growing season (e.g. Russia). As such, molecular markers are required to help breeders; however, this task is complicated by the lack of a published genome, which is where Elizaveta’s work comes in. Her team aims to produce a whole-genome assembly of guar. There are three strands to their work: genome-guided transcriptome assembly; an epigenetics study of guar; and genotyping-by-sequencing of the guar population.
Elizaveta first turned to DNA extraction and library preparation for genome assembly, for which the team used both nanopore sequencing on MinION and a short-read sequencing technology. DNA extraction for nanopore sequencing was performed using a custom method developed for the palm tree, with additional cleaning steps to account for the high phenol content of guar. In total, five MinION Flow Cell runs were performed, using approximately 2,600 ng of DNA for each library prep; the DNA was fragmented, using a Covaris instrument, for just one of the preps. As anticipated, the reads from the fragmented prep displayed a shorter N50 than those of the unfragmented preps. Each flow cell was run for between 50 and 60 hours and yielded between 9 and 13 Gb of data. In total, the team generated 101X genome coverage of nanopore sequencing reads and an additional 250X coverage of short reads.
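As a rough sanity check on the figures above, sequencing depth can be estimated as total sequenced bases divided by genome size. The sketch below is illustrative only: it uses the approximate 480 Mb genome estimate and the reported per-flow-cell yield range, and the function and variable names are hypothetical rather than taken from the talk.

```python
# Back-of-the-envelope coverage estimate (illustrative sketch, not the team's pipeline).
GENOME_SIZE_BP = 480e6  # approximate guar genome size quoted in the talk


def coverage(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Return mean sequencing depth: total bases divided by genome size."""
    return total_bases / genome_size


FLOW_CELLS = 5
low = coverage(FLOW_CELLS * 9e9)    # every run at the low end of the yield range
high = coverage(FLOW_CELLS * 13e9)  # every run at the high end of the yield range

print(f"Expected nanopore coverage: {low:.0f}X to {high:.0f}X")
```

Under these assumptions, the reported 101X nanopore coverage falls inside the range implied by five runs of 9 to 13 Gb each.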
Moving on to their bioinformatics approach, Elizaveta described how they tested three assembly methods: short-read only; long-read only; and hybrid assembly. Their aim was to identify the most accurate algorithm and assembler. According to Elizaveta, ‘assemblers that are just based on short reads have a lot of gaps’, while assemblers based on long reads can have issues with homopolymers. Their solution in this case was to choose a hybrid assembly approach. She showed a detailed table of assembly metrics derived from a range of different assembly algorithms and approaches. The team found that the most informative parameters for their research were N50 length, total assembly length, and the percentage of the assembly made up of scaffolds larger than 200,000 bp and larger than 300,000 bp. Based on these parameters, the assembly generated by the hybrid assembler MaSuRCA was selected. The next phase of research includes reassembly with repeat masking and repeat classification, genome annotation, and optical mapping.
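The assembly metrics named above are simple to compute from a list of scaffold lengths. The following is a minimal sketch, assuming the lengths have already been extracted from an assembly file; the function names and the toy lengths are hypothetical and are not data from the talk.

```python
# Minimal sketch of the assembly metrics discussed above (toy data, hypothetical names).
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


def fraction_above(lengths: Sequence[int], min_len: int) -> float:
    """Fraction of assembled bases found in scaffolds longer than min_len."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    return sum(l for l in lengths if l > min_len) / sum(lengths)


# Toy scaffold lengths in bp, invented purely for illustration.
scaffolds = [500_000, 350_000, 250_000, 150_000, 50_000]

print("Total length:", sum(scaffolds))
print("N50:", n50(scaffolds))
print("Share in scaffolds > 200 kb:", round(fraction_above(scaffolds, 200_000), 3))
print("Share in scaffolds > 300 kb:", round(fraction_above(scaffolds, 300_000), 3))
```

Comparing these values across candidate assemblies gives a quick view of contiguity, although N50 alone says nothing about correctness, which is one reason to consider several metrics together.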
In the final part of her talk, Elizaveta described how they have used the genome to inform transcriptome assembly of guar, which identified over 100,000 transcripts. Differential gene expression analysis was then performed for guar plants with different flowering times, revealing 1,067 differentially expressed transcripts. The team are using these data to find biomarkers for flowering time. They are also using the direct nanopore sequencing reads to investigate the methylation profile of guar plants exposed to different environmental conditions, on which Elizaveta commented, ‘this is fantastic that nanopore sequencing technology gave us all these possibilities’. Concluding her presentation, Elizaveta stated that ‘this [nanopore] sequencing data was used for whole-genome assembly, transcriptome assembly, and we were able to perform additional research for methylation profiling. It is very inspiring’.