Efficient de novo assembly of telomere-to-centromere human genomes

A 50% improvement in the NG50 of the nanopore human genome assembly was achieved with the current Shasta v0.4 compared with the original Shasta version, from ~20 Mb to ~30 Mb; with ultra-long reads, this almost doubled to ~58 Mb.
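
For context, NG50 is the contig length at which contigs, taken from longest to shortest, first cover half of the estimated genome size. The following is a minimal Python sketch of that calculation; the function name ng50 and the toy contig lengths are illustrative assumptions and are not taken from the talk or from Shasta itself.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are sorted longest first and their lengths accumulated;
    NG50 is the length of the contig at which the running total first
    reaches half of the estimated genome size. Returns 0 if the
    assembly covers less than half of the genome.
    """
    half_genome = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return 0


# Toy example (made-up values, in Mb): four contigs, 120 Mb genome.
toy_contigs = [40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=120)
print(toy_ng50)
```

Unlike N50, which uses the total assembly length as its reference, NG50 uses the estimated genome size, so values remain comparable between assemblies of the same genome.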

Human genome assembly time was reduced to ~3 h with the current Guppy basecaller and Shasta v0.4, compared with the 6 h originally described in the team's Nature Biotechnology publication.

Benedict: ‘with Shasta and PromethION sequencing, we think that we are achieving efficient, cost-effective, highly contiguous de novo assembly, and making that a practical reality’.

With ultra-long nanopore sequencing, telomere-to-centromere chromosome arm assembly is possible for the majority of chromosome arms.

With the team's diplotyping pipeline, SNV calling performance ‘was actually better than [on] short-read [data]...which is really exciting’.

Benedict: ‘in regions that are defined as low mappability, we clearly have an advantage’; short-read data map less well than nanopore data in these regions, which explains the poorer short-read SNV calling.

This is the first demonstration that long-read diplotyping can outperform short-read genotyping.

Introducing the Human Pangenome project, Benedict explained: ‘Genomics is failing on diversity – we need to increase the number of complete genomes that we have from a diversity of different human populations, to more fully understand our genetic heritage’.

Authors: Benedict Paten