Beware of Ogres: grass pea and the challenges of assembling large legume genomes

Grass pea (Lathyrus sativus) is exceptionally resilient to drought, flooding, and salinity. However, it contains a neurotoxin, β-ODAP, which can cause paralysis from the waist down when the plant is eaten in large quantities over several months.

Plant breeding could reduce the toxin content. That requires understanding, for example, the genetics of toxin synthesis, which in turn requires a genome assembly.

The grass pea genome is 6.3 Gb and highly repetitive. It contains many Ogre elements, a family of retrotransposons that can span up to 25 kbp.

A short-read assembly of grass pea came to 6.2 Gb, split across 1.6 million contigs.
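A back-of-the-envelope check shows why repeats this long are a problem. Dividing the assembly size by the contig count gives the mean contig length, which is several times shorter than a full-length Ogre element. This uses only the figures quoted above. The mean is a crude summary and says nothing about how contig lengths are distributed.

```python
# Figures quoted in the text above.
SHORT_READ_ASSEMBLY_BP = 6.2e9   # 6.2 Gb short-read assembly
SHORT_READ_CONTIGS = 1_600_000   # 1.6 million contigs
OGRE_MAX_BP = 25_000             # Ogre elements span up to 25 kbp

# Mean contig length = total assembled bases / number of contigs.
mean_contig_bp = SHORT_READ_ASSEMBLY_BP / SHORT_READ_CONTIGS
print(f"Mean contig length: {mean_contig_bp:,.0f} bp")
print(f"Longest Ogre element vs mean contig: {OGRE_MAX_BP / mean_contig_bp:.1f}x")
```

An average contig of under 4 kbp cannot span a 25 kbp repeat, so the assembler has to break at such elements.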

Scaffolding this assembly with paired-end short reads increased contiguity but introduced 2 billion Ns (placeholder bases marking gaps of unknown sequence) into the assembly.
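When a scaffolder joins two contigs, it fills the gap between them with a run of N characters, often sized to the estimated gap length. The following is a minimal sketch that counts N bases and N runs in one sequence. The helper name `gap_stats` is hypothetical, and this is not the tool used for the grass pea assembly. A real FASTA file would first need parsing into individual sequences.

```python
import re

def gap_stats(seq: str) -> tuple[int, int]:
    """Return (total N bases, number of N runs) in a scaffold sequence.

    Matching is case-insensitive, since some files store gaps as lowercase 'n'.
    """
    runs = re.findall(r"[Nn]+", seq)
    return sum(len(r) for r in runs), len(runs)

# Toy scaffold: three contigs joined by two gaps of 5 and 3 Ns.
scaffold = "ACGTACGT" + "N" * 5 + "GGCCTTAA" + "NNN" + "TTGACA"
n_bases, n_gaps = gap_stats(scaffold)
print(n_bases, n_gaps)  # 8 2
print(f"{n_bases / len(scaffold):.1%} of bases are N")  # 26.7% of bases are N
```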

Long-read nanopore sequencing on a PromethION to 36x coverage, followed by polishing with short reads, produced a 6.2 Gb assembly ‘with no Ns’ in 163 contigs. Its contig N50 was almost 3-fold higher than that of the scaffolded short-read assembly.
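Contig N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A higher N50 therefore means more of the genome sits in long contigs. Below is a minimal sketch of that standard definition, plus the raw sequencing yield implied by 36x coverage of a 6.3 Gb genome, where coverage is total sequenced bases divided by genome size. The helper names `n50` and `sequenced_bases` are hypothetical, and the contig lengths in the example are invented rather than grass pea data.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs >= L hold at least half the total bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running reaches total on the last contig")

def sequenced_bases(genome_size_bp: float, coverage: float) -> float:
    """Raw bases needed for a given mean coverage depth."""
    return genome_size_bp * coverage

# Hypothetical contig lengths: total 300; 100 + 80 = 180 >= 150, so N50 = 80.
print(n50([100, 80, 50, 40, 30]))  # 80
print(f"{sequenced_bases(6.3e9, 36) / 1e9:.0f} Gb of reads")  # 227 Gb of reads
```

Sorting longest-first and stopping at the halfway point is the usual way to compute N50. It runs in O(n log n) time for n contigs.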

Gene annotation identified about 45,000 protein-coding genes and more than 75,000 transcripts. BUSCO completeness was 82–90%.
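BUSCO searches for a set of n expected single-copy orthologs. It reports as "complete" both those found once (S) and those found more than once (D), so completeness is C = (S + D) / n as a percentage. This is a small sketch of that arithmetic. The class name `BuscoCounts` and the counts in the example are hypothetical and are not taken from the grass pea annotation.

```python
from dataclasses import dataclass

@dataclass
class BuscoCounts:
    single: int      # complete, single-copy (S)
    duplicated: int  # complete, duplicated (D)
    fragmented: int  # only partially recovered (F)
    missing: int     # not found (M)

    @property
    def total(self) -> int:
        # n: every ortholog searched falls into exactly one category.
        return self.single + self.duplicated + self.fragmented + self.missing

    def completeness(self) -> float:
        """Complete BUSCOs (S + D) as a percentage of all searched."""
        return 100 * (self.single + self.duplicated) / self.total

# Hypothetical counts for illustration only.
counts = BuscoCounts(single=700, duplicated=150, fragmented=50, missing=100)
print(f"C: {counts.completeness():.1f}%")  # C: 85.0%
```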

Authors: Peter Emmrich