Improvements in the sequencing and assembly of plant genomes

Background Advances in DNA sequencing have reduced the difficulty of sequencing and assembling plant genomes. A range of long-read sequencing and assembly methods was recently compared; we now extend that study and report a comparison with more recent methods.

Results Updated Oxford Nanopore Technologies software supported improved assemblies. More accurate sequences, produced by repeatedly sequencing the same molecule (PacBio HiFi), resulted in much less fragmented assemblies. Using more data to increase genome coverage resulted in longer contigs (higher N50) and a reduced total assembly length, while also improving genome completeness (BUSCO).
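Contig N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal sketch below computes it from a list of contig lengths. The function name `n50` and the contig lengths in the usage lines are hypothetical and are not data from this study. They only show how an assembly can have a higher N50 and a shorter total length at the same time.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig lengths (in bp).

    Contigs are sorted longest first, and their lengths are summed until
    the running total reaches half of the total assembly length. The
    length of the contig at which this happens is the N50.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Hypothetical assemblies, for illustration only.
fragmented = [100, 80, 50, 30, 20]  # total 280 bp
contiguous = [150, 90]              # total 240 bp
print(n50(fragmented), sum(fragmented))  # 80 280
print(n50(contiguous), sum(contiguous))  # 150 240
```

In this toy case the second assembly is shorter overall but has the higher N50, because its bases are concentrated in fewer, longer contigs.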

The original model species, Macadamia jansenii, a basal eudicot, was also compared with the three other Macadamia species, with avocado (Persea americana), a magnoliid, and with jojoba (Simmondsia chinensis), a core eudicot. In these phylogenetically diverse angiosperms, increasing sequence data volumes also produced a highly linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity appeared to influence how successfully each species could be assembled.
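Genome size affects how much data a given coverage requires. Under the simple assumption that mean coverage is the total number of sequenced bases divided by the genome size, a minimal sketch follows. The function names and all numbers are hypothetical illustrations, not values from this study. Real requirements also depend on sequence complexity, which this model ignores.

```python
import math


def mean_coverage(n_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Approximate mean depth: (reads x mean read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


def reads_needed(target_coverage: float, mean_read_length: float, genome_size: int) -> int:
    """Smallest number of reads whose total bases reach the target coverage."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical example: 15 kb reads, 500 Mb versus 1 Gb genomes.
print(mean_coverage(1_000_000, 15_000, 500_000_000))  # 30.0
print(reads_needed(30, 15_000, 500_000_000))          # 1000000
print(reads_needed(30, 15_000, 1_000_000_000))        # 2000000
```

Under this model, reaching the same depth in a genome twice as large takes twice as many reads of the same length.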

Conclusions Advances in long-read sequencing technology have continued to significantly improve the sequencing and assembly of plant genomes. Results were also consistently improved by greater genome coverage (using more reads), although the amount of data needed to reach a given level of assembly was species-dependent.

Authors: Priyanka Sharma, Othman Aldossary, Bader Alsubaie, Ibrahim Al-Mssallem, Onka Nath, Neena Mitter, Gabriel Rodrigues Alves Margarido, Bruce Topp, Valentine Murigneux, Ardy Kharabian Masouleh, Agnelo Furtado, Robert J Henry