Ultra-long reads and ultra-long duplications: deciphering the mysteries of the Bordetella pertussis genome

Opening the breakout session on Bacterial genomics, Natalie Ring from the University of Bath presented her research on Bordetella pertussis, the causative agent of whooping cough. Natalie showed a chart revealing that whooping cough has undergone a widespread resurgence in recent years. The first vaccine against the disease, developed in the 1940s from whole, killed B. pertussis cells, proved very effective. In the 1990s it was replaced by a new acellular vaccine comprising up to five of the most important antigens. Since then, the number of whooping cough cases has risen steadily. This could be due to increased awareness or to reduced vaccine efficacy, but another possible explanation is change in the B. pertussis genome itself. That latter possibility is the subject of Natalie’s research.

Numerous short-read sequencing studies have been performed on this pathogen, and they have shown little variation between strains. The B. pertussis genome is known to be highly repetitive (some strains carry up to 300 copies of a 1,000 bp insertion sequence), and short reads cannot span repeats of this length, so short-read data cannot be assembled into complete genomes for whole-genome analysis. Believing that a whole-genome approach might reveal hidden variation responsible for the increased infection rate, the team at Warwick performed long-read nanopore sequencing on five B. pertussis strains. The samples were barcoded and sequenced together on a single MinION flow cell. A hybrid assembly strategy (combining short and long reads) using the Unicycler tool allowed four of the five strains to be assembled into single contigs.

Comparing the structure of the assemblies, the team found three different genome arrangements. Two strains also had longer genomes than the others; mapping the reads back to the reference genome attributed the extra length to duplicated regions. By comparing their data with all of the B. pertussis strains in the NCBI database, the team found that these regions corresponded to motility genes. One of the strains also had a region with 4-5x additional coverage. Re-examining the raw reads for this strain, the team noticed that individual reads contained different numbers of copies of this region. This had not been seen before in this organism and helps to explain why some B. pertussis genome assemblies do not resolve into a single contig. Subsequent examination of other strains that showed duplications revealed the same phenomenon. In addition, individual cells of the same strain showed different genome arrangements. Putting this into context, Natalie stated that they found ‘crazy levels of variation within the same strain in an organism where we are not supposed to see differences between strains’.
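Relative read depth is a common way to estimate how many copies of a region a genome carries: if the background is single-copy, a region sequenced at five times the background depth suggests about five copies. The Python sketch below illustrates that arithmetic under stated assumptions; it is not the team's pipeline. It assumes per-base depths in the three-column `chrom<TAB>pos<TAB>depth` layout that `samtools depth` prints, and the helper names and toy depth values are hypothetical, invented for illustration rather than taken from the study's data.

```python
from statistics import median


def read_depths(lines):
    """Parse per-base depths from 'chrom<TAB>pos<TAB>depth' lines.

    Assumes the three-column layout printed by `samtools depth` for a single
    contig; blank lines are skipped.
    """
    return [int(line.split("\t")[2]) for line in lines if line.strip()]


def estimate_copy_number(depths, start, end):
    """Hypothetical helper: estimate copies of region [start, end) (0-based,
    end-exclusive) as its median depth divided by the genome-wide median depth.

    Assumes most of the genome is single-copy, so the genome-wide median
    approximates the depth of one copy.
    """
    genome_median = median(depths)
    region_median = median(depths[start:end])
    return region_median / genome_median


# Toy data, invented for illustration: 30x background with a 20-base region
# sequenced at 150x.
depths = [30] * 100 + [150] * 20 + [30] * 100
print(estimate_copy_number(depths, 100, 120))  # 5.0
```

Because individual cells can carry different numbers of copies, a depth ratio like this reports only a population average; counting copies within individual long reads, as the team did, reveals the cell-to-cell variation that coverage alone hides.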

The next question the team wanted to address was how quickly these structural variants arise. Presenting data obtained just this week, Natalie reported that genome variation could be found among the cells of a single B. pertussis colony after just four days of growth. The team now plans to explore what effect these rearrangements and duplications have on phenotype.

Authors: Natalie Ring