Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome

Background: Basenjis are considered an ancient dog breed of central African origin, and they still live and hunt with tribesmen in the African Congo. Nicknamed the barkless dog, the Basenji has a unique phylogeny, geographical origin and set of traits that make understanding its genome structure relative to more modern dog breeds of great interest.

Here, we report the de novo assemblies of two Basenjis: a female, China, and a male, Wags. We conduct pairwise comparisons and report structural variations between assembled genomes of three dog breeds: Basenji (CanFam_Bas), Boxer (CanFam3.1) and German Shepherd Dog (GSD) (CanFam_GSD). We then align representative whole genome sequences from 58 dog breeds and show the importance of the reference genome when assessing variation among dog breeds.

Results: We present two high-quality Basenji genome assemblies, CanFam_Bas (China) and Wags. CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high-quality CanFam_GSD assembly. The increasing number of available canid reference genomes allows us to examine the impact of the choice of reference genome with regard to reference genome quality and breed relatedness.

By aligning short-read data from 58 representative dog breeds to three reference genomes, we demonstrate how the choice of reference genome significantly impacts both read mapping and variant detection. Further, we generate a conservative list of structural variant calls using a consensus of both Pacific Biosciences and Oxford Nanopore long reads to identify large structural breed differences. Collectively, this work highlights the importance of the choice of reference genome in canid variation studies.

Conclusions: The growing number of high-quality canid reference genomes means the choice of reference genome is an increasingly critical decision in subsequent canid variant analyses. The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds. However, we believe that more comprehensive analyses across the entire family of canids are better suited to a pangenome approach, as is increasingly being employed in other model organisms.

Authors: Richard J. Edwards, Matt A. Field, James M. Ferguson, Olga Dudchenko, Jens Keilwagen, Benjamin D. Rosen, Gary S. Johnson, Edward Rice, LaDeanna Hillier, Jillian M. Hammond, Samuel G. Towarnicki, Arina Omer, Ksenia Skvortsova, Ozren Bogdanovic, Robert A. Zammit, Erez Lieberman Aiden, Wesley C. Warren, J. William O. Ballard