Characterising the chloroplast genome

Through their role in photosynthesis, whereby carbon dioxide and water are converted into carbohydrates and oxygen, chloroplasts are essential not only for plants but for all life on Earth. Chloroplasts contain their own genome, comprising approximately 130 genes involved in photosynthesis and other important metabolic processes1,2. Chloroplast genomes vary considerably both within and between species, providing insights into phylogeny and evolutionary adaptation2. More recently, transgenic chloroplasts have been used to enhance plant agronomic traits and to produce high-value agricultural or biomedical products2.


Although the chloroplast genome is relatively small at 120–160 kb, it contains a pair of long inverted repeats (10–30 kb). These repeats can confound sequencing and assembly with short-read technologies, because reads shorter than a repeat cannot be placed unambiguously1. Further, assembling chloroplast genomes by alignment to published references may give inaccurate results if the genome structure is not conserved (e.g. the chickpea chloroplast genome contains only one inverted repeat region) or if the reference contains errors. To address these challenges, an international research team compared short- and long-read sequencing approaches for assembling the chloroplast genome of Eucalyptus pauciflora (snow gum)1.
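A toy model shows why read length matters here. Plant chloroplast genomes usually have a quadripartite layout: a large single-copy region (LSC), one inverted repeat (IRb), a small single-copy region (SSC) and a second repeat (IRa) that is the reverse complement of the first. The sketch below builds such a layout from made-up sequences and checks whether a read is long enough to span a whole repeat plus some unique flanking sequence on both sides. The function names and the `flank` threshold are illustrative assumptions, not part of the study's pipeline.

```python
"""Toy model of the inverted repeats in a chloroplast genome (hypothetical data)."""

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_toy_genome(lsc: str, ir: str, ssc: str) -> str:
    """Lay out LSC-IRb-SSC-IRa, where IRa is the reverse complement of IRb."""
    return lsc + ir + ssc + reverse_complement(ir)


def read_spans_repeat(read_start: int, read_len: int,
                      ir_start: int, ir_len: int, flank: int = 100) -> bool:
    """True if the read covers the whole repeat plus `flank` unique bases on each side.

    Reads lying entirely inside a repeat match both copies equally well, so they
    cannot show which unique region each copy is joined to.
    """
    read_end = read_start + read_len
    return read_start <= ir_start - flank and read_end >= ir_start + ir_len + flank


if __name__ == "__main__":
    print(build_toy_genome("AAAA", "GGAC", "TTTT"))  # AAAAGGACTTTTGTCC
    # A 150 bp short read cannot span a 25 kb repeat; a 30 kb long read can.
    print(read_spans_repeat(0, 150, 1_000, 25_000))     # False
    print(read_spans_repeat(0, 30_000, 1_000, 25_000))  # True
```

In a real assembly graph the same limitation appears as an unresolved loop at each repeat, which only reads spanning the repeat (or external information) can untangle.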

E. pauciflora, which is indigenous to Australia, is of particular interest because of its drought and cold tolerance. The team examined how sequencing coverage and long-read length affect assembly quality, in order to establish an optimised method for chloroplast genome assembly.
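The source does not say how the different coverage levels and read lengths were produced. One common way to run this kind of comparison is to discard reads below a minimum length and then randomly subsample the remainder down to a target depth. The sketch below shows that idea with a hypothetical helper, `subsample_reads`, that works on read lengths only; it is an assumption about method, not the team's procedure.

```python
import random


def subsample_reads(read_lengths, genome_size, target_coverage,
                    min_length=0, seed=0):
    """Return indices of randomly chosen reads totalling ~target_coverage x genome_size.

    Reads shorter than `min_length` are discarded first, so the same helper can
    probe both coverage and read-length effects. If too little data remains,
    every eligible read is returned.
    """
    target_bases = target_coverage * genome_size
    eligible = [i for i, n in enumerate(read_lengths) if n >= min_length]
    random.Random(seed).shuffle(eligible)  # fixed seed gives reproducible subsets

    chosen, total = [], 0
    for idx in eligible:
        if total >= target_bases:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return sorted(chosen)
```

For example, `subsample_reads(lengths, 159_942, 20, min_length=5_000)` would pick roughly 20x of reads of at least 5 kb from a list of read lengths; the 5 kb cut-off is purely illustrative.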

Long-read sequencing was performed on the MinION using high-molecular-weight DNA prepared with a rapid extraction protocol based on that described by Mayjonade et al.3

The team found that, for assemblies built from long nanopore reads alone, the Hinge assembler combined with polishing by Racon and Nanopolish at 500x coverage delivered the best result1. The optimal assembly overall, however, came from a hybrid approach combining at least 20x coverage each of long- and short-read data, which delivered a single contig spanning the entire chloroplast genome with few or no detectable errors. Using this assembly, the team determined that the E. pauciflora chloroplast genome is 159,942 bp long and contains 131 genes of known function (Figure).
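The coverage figures translate into sequence yield through the usual relation: mean coverage = total bases / genome size. The sketch below applies it to the 159,942 bp genome reported above. It counts only reads derived from the chloroplast; because a total-DNA extract also contains nuclear and mitochondrial DNA, the overall yield required would be higher by an amount that depends on the chloroplast fraction, which is not reported here.

```python
# E. pauciflora chloroplast genome length reported in the study
GENOME_SIZE_BP = 159_942


def bases_for_coverage(genome_size: int, coverage: float) -> int:
    """Sequence yield (bp) needed for a given mean depth: coverage x genome size."""
    return round(genome_size * coverage)


def coverage_from_bases(total_bases: int, genome_size: int) -> float:
    """Mean depth achieved by a given yield of on-target bases."""
    return total_bases / genome_size


if __name__ == "__main__":
    for label, depth in [("long reads only", 500), ("each read type, hybrid", 20)]:
        needed = bases_for_coverage(GENOME_SIZE_BP, depth)
        print(f"{label}: {depth}x needs about {needed:,} bp of chloroplast reads")
    # long reads only: 500x needs about 79,971,000 bp of chloroplast reads
    # each read type, hybrid: 20x needs about 3,198,840 bp of chloroplast reads
```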

Summarising their results, the researchers conclude that their method, which combines the simple and cost-effective generation of long-read data with short-read data, provides a ‘clear path towards producing multiple highly-accurate chloroplast genome assemblies for very low cost’.


Figure: Annotated E. pauciflora chloroplast genome. The grey region in the inner circle shows the GC content across the chloroplast genome. Image courtesy of Wang et al.1
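GC-content tracks such as the one in the figure are typically computed over sliding windows along the sequence. The sketch below assumes the genome is available as a plain sequence string; the window and step sizes are free parameters, and this is not the tool used to draw the figure. Because chloroplast genomes are circular, windows are allowed to wrap past the end by default.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int, step: int, circular: bool = True):
    """Return (start, GC fraction) pairs for windows along the sequence.

    With circular=True the first window-1 bases are appended to the end, so
    windows near the end wrap around, as they would on a circular genome.
    """
    if not 0 < window <= len(seq) or step <= 0:
        raise ValueError("need 0 < window <= len(seq) and step > 0")
    if circular:
        padded = seq + seq[:window - 1]
        starts = range(0, len(seq), step)
    else:
        padded = seq
        starts = range(0, len(seq) - window + 1, step)
    return [(s, gc_fraction(padded[s:s + window])) for s in starts]
```

Larger windows give a smoother track at the cost of positional detail; the right trade-off depends on how the plot will be read.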

This case study is taken from the plant white paper.

  1. Wang, W. et al. Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case. bioRxiv 320085 (2018).
  2. Daniell, H., Lin, C.-S., Yu, M. and Chang, W.-J. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biology 17:134 (2016).
  3. Mayjonade, B. et al. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. BioTechniques 61(4):203–205 (2016).