Long-reads reveal that the chloroplast genome exists in two distinct versions in most plants

The chloroplast genome usually has a quadripartite structure consisting of a large single copy region and a small single copy region separated by two long inverted repeats. It has been known for some time that a single cell may contain at least two structural haplotypes of this structure, which differ in the relative orientation of the single copy regions. However, the methods required to detect and measure the abundance of the structural haplotypes are labour-intensive, and this phenomenon remains understudied.

Here we develop a new method, Cp-hap, to detect all possible structural haplotypes of chloroplast genomes of quadripartite structure using long-read sequencing data.

We use this method to conduct a systematic analysis and quantification of chloroplast structural haplotypes in 61 land plant species across 19 orders of Angiosperms, Gymnosperms and Pteridophytes. Our results show that there are two chloroplast structural haplotypes which occur with equal frequency in most land plant individuals. Nevertheless, species whose chloroplast genomes lack inverted repeats or have short inverted repeats have just a single structural haplotype. We also show that the relative abundance of the two structural haplotypes remains constant across multiple samples from a single individual plant, suggesting that the process which maintains equal frequency of the two haplotypes operates rapidly, consistent with the hypothesis that flip-flop recombination mediates chloroplast structural heteroplasmy.

Our results suggest that previous claims of differences in chloroplast genome structure between species may need to be revisited.

The Cp-hap pipeline, and detailed instructions for running it, are available on GitHub at https://github.com/asdcid/Cp-hap.

Authors: Weiwen Wang, Robert Lanfear