VIRUSBreakend: viral integration recognition using single breakends

Integration of viruses into infected host cell DNA can cause DNA damage and disrupt genes. Recent cost reductions and the growth of whole genome sequencing have produced a wealth of data in which viral presence and integration can be detected. While key research and clinically relevant insights can be uncovered, existing software has not achieved widespread adoption, held back in part by high computational costs, an inability to detect a wide range of viruses, and inadequate precision and sensitivity.

Here, we describe VIRUSBreakend, a high-speed tool that identifies viral DNA presence and genomic integration using single breakend variant calling. Single breakends are breakpoints in which only one side can be unambiguously placed. We show that by using a novel virus-centric single breakend variant calling and assembly approach, viral integrations can be identified with high sensitivity and a near-zero false discovery rate, even when integrated in regions of the host genome with low mappability, such as centromeres and telomeres, which cannot be reliably called by existing tools.

Applying VIRUSBreakend to a large metastatic cancer cohort, we demonstrate that it can reliably detect the presence and integration of clinically relevant viruses, including HPV, HBV, MCPyV, EBV, and HHV-8.

Authors: Daniel L. Cameron, Anthony T. Papenfuss