Sequencing of E. coli strain UTI89 on multiple sequencing platforms

Objectives

The availability of matched sequencing data for the same sample across different sequencing platforms is necessary for validation and effective comparison of those platforms. A commonly sequenced sample is the lab-adapted MG1655 strain of Escherichia coli; however, this strain is not fully representative of the more complex and dynamic genomes of pathogenic E. coli strains.

Data description

We present six new sequencing data sets for another E. coli strain, UTI89, which is an extraintestinal pathogenic strain isolated from a patient suffering from a urinary tract infection. We now provide matched whole genome sequencing data generated using the PacBio RSII, Oxford Nanopore MinION R9.4, Ion Torrent, ABI SOLiD, and Illumina NextSeq sequencers. Together with other publicly available data sets, UTI89 has a nearly complete suite of data generated on most second- and third-generation sequencers.

These data can be used as an additional validation set for new sequencing technologies and analytical methods. More than being another E. coli strain, however, UTI89 is pathogenic, with a 10% larger genome, additional pathogenicity islands, and a large plasmid, features that are common among other naturally occurring and disease-causing E. coli isolates. These data therefore provide a more medically relevant test set for development of algorithms.

Objective

Control sequencing data across different sequencing platforms are extremely important for validation and effective comparison of those platforms. A commonly sequenced sample that has been extensively used for these purposes is the MG1655 strain of E. coli. However, the MG1655 genome is smaller and less complex than those of some pathogenic E. coli strains. As part of control experiments, we have sequenced UTI89, a uropathogenic E. coli (UPEC) strain originally isolated from a patient suffering from an acute bladder infection, using several different sequencing technologies, including ABI SOLiD, Ion Torrent, PacBio, Oxford Nanopore, and Illumina.

Our new data supplement previously published sequencing data generated using the Roche 454, Illumina HiSeq, and the original Oxford Nanopore Technologies MinION. With the inclusion of these new data sets, E. coli strain UTI89 now has a nearly complete set of raw sequence data generated using most second- and third-generation sequencers. For some technologies we have multiple data sets. The PacBio data, for example, span the first iteration of the RSII sequencing chemistry (XL/C2) in 2012 up to the P6-C4 chemistry (which was current in 2018), a period over which mean read length increased more than fivefold.
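
A comparison of mean read length between two data sets can be sketched as follows. This is a minimal illustration, not the procedure used to produce the data: it assumes the reads have already been exported as plain four-line FASTQ, and the function names (read_lengths, mean_read_length, fold_change) and the toy records are hypothetical.

```python
import io
from statistics import mean


def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality string
        yield len(seq)


def mean_read_length(handle):
    """Return the mean read length over all records in the stream."""
    return mean(read_lengths(handle))


def fold_change(old_mean, new_mean):
    """Return the ratio of the newer mean read length to the older one."""
    return new_mean / old_mean


# Toy records invented for illustration; these are not real UTI89 reads.
older = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
newer = io.StringIO(
    "@r3\n" + "A" * 20 + "\n+\n" + "I" * 20 + "\n"
    "@r4\n" + "C" * 30 + "\n+\n" + "I" * 30 + "\n"
)

ratio = fold_change(mean_read_length(older), mean_read_length(newer))
print(ratio)
```

For real data, the in-memory streams would be replaced by open file handles, one per sequencing run being compared.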

Authors: Shannon N. Fenlon, Yuemin Celina Chee, Jacqueline Lai Yuen Chee, Yeen Hui Choy, Alexis Jiaying Khng, Lu Ting Liow, Kurosh S. Mehershahi, Xiaoan Ruan, Stephen W. Turner, Fei Yao, Swaine L. Chen