ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs
The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advances in sequencing technologies, many genome assemblies still fall short of reference-grade quality.
Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter.
Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species.
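To make the idea of an ordered minimizer sketch concrete, the following Python sketch picks the smallest-hash k-mer in every window of w consecutive k-mers, then links minimizers that are adjacent in each sketch into an undirected graph. It is a simplified illustration under stated assumptions and not ntJoin's implementation: the function names, the BLAKE2b hash, the parameters `k` and `w`, and the edge labeling are all choices made here for the example.

```python
import hashlib
from collections import defaultdict


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (illustrative choice of hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def ordered_minimizers(seq: str, k: int, w: int) -> list[tuple[int, int]]:
    """Return (position, hash) pairs: the smallest-hash k-mer of every window
    of w consecutive k-mers, in sequence order, without consecutive repeats."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    sketch: list[tuple[int, int]] = []
    for start in range(len(hashes) - w + 1):
        # Leftmost k-mer with the minimum hash in this window.
        pos = min(range(start, start + w), key=hashes.__getitem__)
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, hashes[pos]))
    return sketch


def minimizer_graph(
    sketches: dict[str, list[tuple[int, int]]]
) -> dict[frozenset, set[str]]:
    """Map each undirected edge {hash_a, hash_b} to the names of the inputs
    in which those two minimizers are adjacent in the ordered sketch."""
    edges = defaultdict(set)
    for name, sketch in sketches.items():
        for (_, a), (_, b) in zip(sketch, sketch[1:]):
            if a != b:  # skip self-loops from repeated k-mers
                edges[frozenset((a, b))].add(name)
    return dict(edges)
```

In this toy version, an edge supported by both a draft and a reference sketch indicates that the two sequences share the same local minimizer order, which is the kind of signal a mapping step can use in place of base-level alignment.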
When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using less than 11 GB of RAM.
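For readers unfamiliar with the contiguity metric, the following minimal Python sketch computes NG50 from a list of sequence lengths and an assumed genome size. NGA50, as commonly defined (for example by assembly evaluation tools such as QUAST), applies the same calculation to the lengths of aligned blocks after sequences are split at misassemblies, which requires alignments that this example does not perform. The function name `ng50` and the example values are hypothetical.

```python
def ng50(lengths: list[int], genome_size: int) -> int:
    """Largest length L such that sequences of length >= L together cover
    at least half of genome_size; returns 0 if that is never reached."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


# Toy example: joining contigs into longer scaffolds raises the metric.
before = ng50([40, 30, 20, 10], genome_size=100)
after = ng50([90, 10], genome_size=100)
```

In this toy example the statistic rises from 30 to 90, a 3-fold improvement; the fold changes reported above are ratios of this kind, computed on aligned block lengths.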
Compared to existing reference-guided assemblers, ntJoin generates highly contiguous assemblies faster and with less memory.
Availability and implementation: ntJoin is written in C++ and Python, and is freely available at https://github.com/bcgsc/ntjoin.