Duplications drive diversity in Bordetella pertussis on an underestimated scale

Bacterial genetic diversity is often described solely in terms of base pair changes, although a wide variety of other mutation types are likely to be major contributors. Tandem duplications of genomic loci are thought to be widespread among bacteria, but because they are often intractably large and unstable, comprehensive studies of their range and genome dynamics are rare.

We define a methodology to investigate duplications in bacterial genomes based on read depth of genome sequence data as a proxy for copy number.
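As an illustration of the general idea only (not the authors' pipeline), the following minimal Python sketch treats read depth as a proxy for copy number: per-base depths are averaged in fixed windows, each window is normalised to the genome-wide median, and runs of consecutive windows with elevated depth are reported as putative duplications. The function names, the 1,000 bp window, the 1.5x ratio threshold and the two-window minimum run length are all assumptions chosen for the example; per-base depths are assumed to have been computed beforehand by a separate tool.

```python
from statistics import median


def window_means(depths, window):
    """Mean per-base depth in consecutive, non-overlapping windows."""
    means = []
    for i in range(0, len(depths), window):
        chunk = depths[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def call_duplications(depths, window=1000, min_ratio=1.5, min_windows=2):
    """Return (start, end, mean_ratio) for runs of windows with elevated depth.

    Coordinates are 0-based and end-exclusive. All thresholds are
    illustrative assumptions, not values taken from the study.
    """
    means = window_means(depths, window)
    if not means:
        return []
    baseline = median(means)  # assumed to represent single-copy depth
    if baseline == 0:
        return []
    ratios = [m / baseline for m in means]

    calls = []
    run_start = None
    # A trailing sentinel below the threshold closes a run that reaches the end.
    for idx, ratio in enumerate(ratios + [float("-inf")]):
        if ratio >= min_ratio:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            if idx - run_start >= min_windows:
                run = ratios[run_start:idx]
                start = run_start * window
                end = min(idx * window, len(depths))
                calls.append((start, end, sum(run) / len(run)))
            run_start = None
    return calls


if __name__ == "__main__":
    # Toy genome: 30x baseline with a 3 kb region at 60x (two copies).
    toy = [30] * 4000 + [60] * 3000 + [30] * 3000
    print(call_duplications(toy))
```

On the toy input, the 3 kb region at twice the baseline depth is reported as a single call with a depth ratio of 2.0, consistent with one extra copy. Real data would additionally need handling of GC bias, repeats such as insertion sequence elements, and uneven coverage near the replication origin, none of which this sketch attempts.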

We demonstrate the approach with Bordetella pertussis, whose insertion sequence element-rich genome provides extensive scope for duplications to occur. Analysis of genome sequence data for 2430 B. pertussis isolates identified 272 putative duplications, of which 94% were located at 11 hotspot loci. We demonstrate limited phylogenetic connection among occurrences of duplications, suggesting that they are unstable and sporadic. Genome instability was further described in vitro using long read sequencing via the Nanopore platform. Clonally derived laboratory cultures produced heterogeneous populations containing multiple structural variants. Short read data were used to predict the 272 duplications, whilst long reads generated on the Nanopore platform enabled in-depth study of the genome dynamics of tandem duplications in B. pertussis.

Our work reveals the unrecognised and dynamic genetic diversity of B. pertussis and, as the complexity of the B. pertussis genome is not unique, highlights the need for a holistic and fundamental understanding of bacterial genetics.

Authors: Jonathan S Abrahams, Michael R Weigand, Natalie A Ring, Iain MacArthur, Scott Peng, Margaret M Williams, Barrett Bready, Anthony P Catalano, Jennifer R Davis, Michael D Kaiser, John Oliver, Jay M Sage, Stefan Bagby, M Lucia Tondella, Andrew R Gorringe, Andrew Preston