GASOLINE: Germline And SOmatic structuraL varIants detectioN and gEnotyping
Identifying structural variants (SVs) from long-read data requires complex computational methods based on intra- and inter-alignment SV signatures (gapped and split-read alignments). These methods can detect deletions, inversions and translocations of any size, while insertions and duplications are limited by read length. SV signatures are clustered by reciprocal overlap of their genomic coordinates, which can lead to partial recovery of signatures, underestimation of allelic fraction, and genotyping errors. Furthermore, owing to high error rates, alignment of long reads can produce imprecise genomic coordinates for SV signatures, with variances of tens of bp, which may prevent the identification and clustering of signatures generated by small events (50-500 bp). Large SVs (tens or hundreds of kb) are less affected by the error rate, yet they need a large reciprocal overlap to exclude signatures from other events, which can in turn cause signatures of small SVs to be underestimated or lost. A further limitation of most currently available tools is their poor flexibility for analyses of sample pairs. Because they are designed to detect germline variants from single samples, identifying somatic variants, for example, requires analysing the paired samples separately and then discarding the SVs found in the germline sample. To overcome these limitations, we developed a novel tool, GASOLINE (Germline And SOmatic structuraL varIants detectioN and gEnotyping), which groups SV signatures with a clustering procedure based on a modified reciprocal overlap criterion (Normalized Reciprocal Overlap, NRO) and detects both large and small SVs with high accuracy. We extensively tested the new tool on simulated and real cancer datasets and demonstrated that it outperforms NanomonSV in the detection of small and large somatic variants.
Notably, when applied to the COLO829 cell line and its matched normal sample, GASOLINE identified 6 genuine somatic SVs that were missed by Valle-Inclan et al., who used five different sequencing technologies and state-of-the-art SV calling approaches.
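
The size dependence of the reciprocal overlap criterion described above can be illustrated with a minimal sketch. It implements the conventional reciprocal overlap between two intervals (the smaller of the two overlap fractions), not GASOLINE's NRO, whose definition is not given here. The function name, the half-open interval convention, and the example coordinates are assumptions chosen only for illustration.

```python
from typing import Tuple

# Hypothetical representation of an SV signature: (start, end), half-open, in bp.
Interval = Tuple[int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions between a and b."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0  # disjoint or merely touching intervals
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


# Two signatures of the same 60 bp deletion, shifted by 30 bp of alignment noise.
small_a, small_b = (1_000, 1_060), (1_030, 1_090)
# Two signatures of the same 100 kb deletion, with the same 30 bp shift.
large_a, large_b = (1_000, 101_000), (1_030, 101_030)

small_ro = reciprocal_overlap(small_a, small_b)
large_ro = reciprocal_overlap(large_a, large_b)

print(f"small event: {small_ro:.4f}")  # small event: 0.5000
print(f"large event: {large_ro:.4f}")  # large event: 0.9997
```

With the same 30 bp of coordinate noise, the two signatures of the small event share only half of their length, whereas those of the large event overlap almost completely. A single fixed threshold strict enough to keep unrelated large events apart would therefore split the small event into separate clusters, which is the problem a size-aware criterion is meant to address.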