npInv: accurate detection and genotyping of inversions using long read sub-alignment

Background
Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with non-repetitive breakpoints, leaving inversions that arise from inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) largely unexplored.

Results
We present npInv, a novel tool designed to detect and genotype NAHR inversions using sub-alignments of long-read sequencing data. We benchmarked npInv against other tools on both simulated and real data. We used npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (15 of which are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IRs shorter than 7 kb. Genotyping accuracy on this dataset was 94%. We confirmed the presence of two of the novel inversions by PCR. We show that there is a near-linear relationship between the length of the flanking IR and the minimum inversion size (measured without the inverted repeats).

Conclusion
npInv achieves high accuracy on both simulated and real data. The results provide deeper insight into NAHR inversions, including how the length of the flanking IR relates to the minimum inversion size.

Authors: Haojing Shao, Devika Ganesamoorthy, Tania Duarte, Minh Duc Cao, Clive Hoggart, Lachlan Coin