Ultra-long reads and ultra-long duplications: What nanopore sequencing is revealing about Bordetella pertussis

The B. pertussis genome is highly repetitive. On average, it contains 280 copies of insertion sequence (IS) elements, each longer than 1,000 bp, which together account for ~7% of the 4.1 Mb genome. Because each IS element is longer than a short read, short-read sequencing (e.g. Illumina) cannot span these repeats and so cannot produce closed genome assemblies. B. pertussis is traditionally described as a monomorphic species: very few base-level differences exist between strains. The presence of so many mobile IS elements, however, means that genome-level differences, such as rearrangements, deletions and duplications, are possible. We are using long-read sequencing to identify genome-level differences between otherwise highly similar B. pertussis strains.
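
As a quick sanity check on the ~7% figure, the arithmetic can be sketched in a few lines of Python. The function and constant names are illustrative only, and the calculation assumes every IS copy is exactly 1,000 bp, which is the lower bound quoted above, so the result is a minimum estimate rather than a measured value.

```python
def repeat_fraction(copies: int, element_bp: int, genome_bp: int) -> float:
    """Return the fraction of a genome occupied by a repeated element."""
    return copies * element_bp / genome_bp


# Figures taken from the text; the per-element length is the stated
# lower bound (>1,000 bp), so this is an assumption, not an exact length.
IS_COPIES = 280
IS_MIN_LENGTH_BP = 1_000
GENOME_BP = 4_100_000

fraction = repeat_fraction(IS_COPIES, IS_MIN_LENGTH_BP, GENOME_BP)
print(f"IS elements cover at least {fraction:.1%} of the genome")
```

Under that assumption the lower bound comes out just under 7%, consistent with the approximate figure quoted in the text.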
