stLFRsv: a germline SV analysis pipeline using co-barcoded reads

Co-barcoded reads, which originate from barcoded long DNA fragments (mean length greater than 50 kbp), retain both single-base accuracy and long-range genomic information. We propose stLFRsv, a pipeline that detects structural variations (SVs) from co-barcoded reads. stLFRsv identifies abnormally large gaps between co-barcoded reads to detect potential breakpoints and to reconstruct complex structural variations.
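The gap-based idea can be illustrated with a minimal sketch. It is not the stLFRsv implementation: the function names, the input layout (barcode, chromosome, position) and the 20 kbp threshold are assumptions chosen for illustration only. Reads that share a barcode are sorted by position, and any gap wider than the threshold is reported as a candidate breakpoint pair.

```python
from collections import defaultdict


def split_by_gap(positions, max_gap):
    """Split the read positions of one barcode into segments at gaps wider than max_gap."""
    ordered = sorted(positions)
    segments = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            segments.append([cur])      # gap too large: start a new segment
        else:
            segments[-1].append(cur)    # still within the same segment
    return segments


def candidate_breakpoints(reads, max_gap=20000):
    """Return (barcode, chrom, left_end, right_start) for every abnormally large gap.

    reads   -- iterable of (barcode, chrom, pos) tuples (hypothetical input layout)
    max_gap -- illustrative threshold in bp, not a value taken from the paper
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    candidates = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        segments = split_by_gap(positions, max_gap)
        # Each pair of adjacent segments brackets one large gap.
        for left, right in zip(segments, segments[1:]):
            candidates.append((barcode, chrom, left[-1], right[0]))
    return candidates


reads = [
    ("bc1", "chr1", 1000), ("bc1", "chr1", 3000),
    ("bc1", "chr1", 90000), ("bc1", "chr1", 92000),
    ("bc2", "chr1", 5000), ("bc2", "chr1", 7000),
]
print(candidate_breakpoints(reads))
```

A large gap within a single barcode can also arise when two unrelated fragments happen to carry the same barcode, so a candidate from one barcode is only a hint; support from several barcodes at the same locus would be needed before calling a breakpoint.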

Barcode-enabled phasing of co-barcoded reads increases the signal-to-noise ratio, and barcode-sharing profiles are used to filter out false positives. We integrate the short-read SV caller smoove with stLFRsv to cover smaller variations. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385 and achieved a precision of 74.2% and a recall of 22.3% for deletions across the whole genome. stLFR also found some large variations that are not included in the benchmark set but were verified by long reads or assembly.
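As a rough illustration of how a barcode-sharing profile might separate true breakpoints from noise, the sketch below scores two genomic windows by the Jaccard index of their barcode sets. The choice of the Jaccard index, the function names and the 0.1 cutoff are assumptions for illustration; the abstract does not specify the statistic stLFRsv uses.

```python
def barcode_sharing(barcodes_a, barcodes_b):
    """Jaccard index of the barcode sets observed in two genomic windows."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        return 0.0  # no barcodes in either window: no evidence of sharing
    return len(a & b) / len(union)


def passes_sharing_filter(barcodes_a, barcodes_b, min_sharing=0.1):
    """Keep a candidate only if its two flanking windows share enough barcodes.

    min_sharing is a hypothetical cutoff, not a value taken from the paper.
    """
    return barcode_sharing(barcodes_a, barcodes_b) >= min_sharing


left_window = ["b1", "b2", "b3"]
right_window = ["b2", "b3", "b4"]
print(barcode_sharing(left_window, right_window))
print(passes_sharing_filter(left_window, right_window))
```

Two windows joined by a real rearrangement are expected to share far more barcodes than two distant, unrelated windows, which is why a sharing score of this kind can serve as a false-positive filter.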

Our work indicates that co-barcoded read technology has the potential to improve genome completeness.

Authors: Junfu Guo, Chang Shi, Xi Chen, Ou Wang, Ping Liu, Huanming Yang, Xun Xu, Wenwei Zhang, Hongmei Zhu