Selective single-molecule sequencing and the assembly of human Y chromosomes - Tomas Marques-Bonet


The Mini Theatre sessions are designed to engage and inspire the audience, and the opening talk delivered by Tomas Marques-Bonet certainly didn’t disappoint. Tomas, Principal Investigator of the Comparative Genomics group at the University Pompeu Fabra and Institute of Evolutionary Biology, presented his team’s research using long-read nanopore sequencing to characterise the human Y chromosome. Describing it as the ‘forgotten chromosome’, Tomas explained how mammalian Y chromosomes are often left out of genomic analyses because they are inherently difficult to assemble. Challenges include high palindromic repeat content, indels, and translocations, which have been linked to medically relevant phenotypes. Traditional approaches to sequencing the Y chromosome have included flow sorting and amplification-based short-read sequencing; however, these techniques don’t provide the whole picture, because amplification introduces bias and removes epigenetic modifications. As a result, to date, only a handful of species have had their Y chromosome fully characterised. Indeed, only a single human reference-quality Y chromosome, which is of European ancestry, is currently available.

To redress this gap in our genomic knowledge, Tomas and his colleagues developed a new methodology to sequence native, unamplified, flow-sorted DNA using long-read nanopore sequencing. In a proof-of-concept experiment, approximately 9 million Y chromosomes were sorted from a lymphoblastoid cell line (HG02982), whose haplogroup (A0) represents one of the earliest known human lineages. Using a single MinION sequencing run, the team were able to generate 25x coverage of the Y chromosome.
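To put the 25x figure in context, mean coverage is simply the number of sequenced bases divided by the length of the target. The short Python sketch below is illustrative only and is not the team's own calculation: the function names are hypothetical, and the Y chromosome length of roughly 57 Mb is an assumption based on the approximate size of chrY in the GRCh38 reference. The portion of the chromosome that can actually be sequenced and assembled is smaller, so treat the result as a rough upper estimate.

```python
def mean_coverage(total_bases: int, target_length: int) -> float:
    """Mean depth of coverage: sequenced bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return total_bases / target_length


def bases_needed(coverage: float, target_length: int) -> float:
    """Number of on-target bases required to reach a given mean coverage."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return coverage * target_length


# Assumption (not from the talk): chrY is roughly 57 Mb in GRCh38.
ASSUMED_Y_LENGTH = 57_000_000

required = bases_needed(25, ASSUMED_Y_LENGTH)
print(f"Roughly {required / 1e9:.1f} Gb of Y-derived sequence for 25x coverage")
```

Under that assumption, 25x coverage corresponds to a little under 1.5 Gb of on-target sequence.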

After basecalling with Guppy, sequence assembly was performed using Canu, with subsequent polishing using Medaka, Racon, and Pilon. The final Y chromosome assembly showed a significant improvement over previous, comparable short-read sequencing methods, increasing contiguity by 800%. This technique produced the first ever highly contiguous assembly of a Y chromosome of African origin. The contig N50 for this assembly is 1.5 Mb, which Tomas described as ‘really good, considering the complications [inherent with this chromosome]’.
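For readers unfamiliar with the metric, contig N50 is the length of the contig at which, working down from the longest, the running total first reaches half of the total assembly length. The Python sketch below shows the standard calculation; the function name and the toy contig lengths are hypothetical and are not data from the talk.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50 for a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


# Toy example (hypothetical lengths, arbitrary units): total is 20, half is 10.
# 8 alone is below 10; 8 + 5 = 13 reaches it, so the N50 is 5.
print(n50([8, 5, 3, 2, 2]))  # prints 5
```

A higher N50 means that more of the assembly sits in long contigs, which is why it serves here as the headline measure of contiguity.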

Based on these successful results, the team proceeded to apply the same methodology to ten additional, genetically diverse human lineages, the majority of which are of African descent. Work is ongoing, but initial results from eight of the samples revealed similarly highly contiguous assemblies, with a median contig N50 of 1.34 Mb and a range of 0.91 to 4.51 Mb. Further, two of the assemblies contained almost the entire Y chromosome p-arm in a single contig, which, according to Tomas, will facilitate much more in-depth analysis of this previously neglected chromosome.