
Towards detecting low-frequency variants in highly fragmented circulating cell-free DNA

Poster

Date: 5th December 2019

Using a combination of hybrid target enrichment and unique molecular identifiers (UMIs) to enrich and detect low-frequency variants in short fragments of DNA


Fig. 1 Cell-free DNA a) entering the bloodstream b) laboratory workflow

Short cell-free fragments of DNA from lysed cells are present in the bloodstream

Blood plasma contains short fragments of cell-free DNA, thought to be released through apoptotic cell cleavage. Cell-free DNA is typically highly fragmented, in the range of 100–200 bp. In cancer patients, circulating DNA from lysed tumour cells (ctDNA) is also present (Fig. 1a). In patients with early-stage disease, ctDNA can make up as little as 0.1% of all cell-free DNA. Targeting specific oncogenes in cell-free DNA from patients allows mutations to be detected and tracked, helping to understand disease progression. To help detect low-frequency variants, UMIs can be added to DNA fragments prior to sequence capture using biotinylated baits. After PCR and sequencing, the UMIs can be used to cluster and polish the reads, resulting in a high-accuracy single-molecule consensus sequence (Fig. 1b).
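To give a feel for why a 0.1% variant fraction is demanding, the following minimal sketch estimates the chance of sampling at least a given number of variant molecules at a position. It assumes each sampled molecule independently carries the variant with probability equal to the variant fraction (a simple binomial model) and ignores sequencing error; the depths used are illustrative values, not figures from the poster.

```python
from math import comb


def prob_at_least(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    n: number of unique molecules sampled at the position (assumed value)
    p: variant fraction, e.g. 0.001 for 0.1%
    k: minimum number of variant molecules required to call the variant
    """
    below_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    variant_fraction = 0.001  # 0.1% ctDNA, as quoted in the text
    for depth in (1_000, 5_000, 20_000):  # hypothetical unique-molecule depths
        p_one = prob_at_least(1, depth, variant_fraction)
        p_three = prob_at_least(3, depth, variant_fraction)
        print(f"depth={depth}: P(>=1)={p_one:.3f}, P(>=3)={p_three:.3f}")
```

Under this model the relevant depth is the number of distinct parent molecules, not raw reads, which is why collapsing reads by UMI matters for quantification.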

Fig. 2 Assay performance a) read length b) mean depth of targeted regions c) specificity

Capturing oncogene-specific fragments using the Roche Avenio ctDNA kit

We mixed fragmented genomic DNA from the human sample NA14097, which contains an A > C SNP in exon 4 of the BRCA1 gene, with NA12878 to give a final variant frequency of 5%. The DNA was fragmented to around 160 bp to mimic the fragment-length distribution of cell-free DNA (Fig. 2a) and UMIs were added. We then enriched the sample for specific oncogenes using the Roche Avenio ctDNA targeted kit, prior to sequencing on a PromethION. We aligned reads to the NA12878 reference using minimap2 and filtered for primary alignments. A slight shift towards shorter reads can be seen, as a result of the hybrid capture and PCR processes (Fig. 2a). We achieved high read depth for each targeted region within the 17 oncogenes in the panel (Fig. 2b) and alignments showed high specificity for the enriched regions (Fig. 2c).
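The primary-alignment filter can be illustrated with the standard SAM FLAG bits. This is a sketch, not the pipeline used for the poster: it assumes "primary" means mapped and neither secondary (0x100) nor supplementary (0x800), and it works on plain SAM text lines such as those minimap2 writes with SAM output enabled. The function names are hypothetical.

```python
# SAM FLAG bits defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_primary(flag):
    """True if the record is mapped and is neither secondary nor supplementary."""
    return not (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY))


def filter_primary(sam_lines):
    """Yield header lines unchanged and only the primary alignment records."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if is_primary(int(fields[1])):  # column 2 of a SAM record is FLAG
            yield line


if __name__ == "__main__":
    # Made-up records for illustration only
    example = [
        "@HD\tVN:1.6",
        "read1\t0\tchr17\t100\t60\t5M\t*\t0\t0\tACGTA\t*",
        "read1\t256\tchr13\t500\t0\t5M\t*\t0\t0\tACGTA\t*",
        "read2\t2064\tchr17\t900\t60\t5M\t*\t0\t0\tACGTA\t*",
    ]
    for kept in filter_primary(example):
        print(kept)
```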

Fig. 3 UMIs a) workflow b) % of reads assigned a UMI c) sequence accuracy and cluster size

Obtaining high single-molecule consensus accuracy using unique molecular identifiers

Following quality-score filtering, we ran raw nanopore reads through a custom UMI-clustering and polishing pipeline (Fig. 3a). In brief, reads are aligned to the human reference genome Hg38 and separated by target regions. UMIs are identified and extracted from the 5' and 3' ends of reads, allowing for a maximum edit distance of three when compared to known regions in the semi-random pattern used to design the UMIs (Fig. 3b). The extracted UMIs are clustered using vsearch. Reads with more than three differences to their respective cluster consensus UMI are discarded. Subsequently, a high-accuracy consensus read is computed for each cluster using Racon and Medaka. From a cluster size of 8, the median consensus read accuracy exceeds 99%, and from a cluster size of 18 upwards it reaches 100% (Fig. 3c).
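The step that discards reads whose UMI differs too much from the cluster consensus can be sketched as below. This is an illustration only: the poster does not state how "differences" are counted, so Levenshtein edit distance is assumed here, the UMI strings are invented, and the clustering itself (done with vsearch in the pipeline) is not reproduced.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def filter_cluster(consensus_umi, read_umis, max_diff=3):
    """Keep UMIs within max_diff edits of the cluster consensus UMI."""
    return [u for u in read_umis if edit_distance(u, consensus_umi) <= max_diff]


if __name__ == "__main__":
    consensus = "ACGTACGTACGT"  # hypothetical cluster consensus UMI
    members = ["ACGTACGTACGT", "ACGTACGAACGT", "ACGACGTACGT", "TTTTTTTTTTTT"]
    print(filter_cluster(consensus, members))
```

Using edit distance rather than a simple mismatch count tolerates the insertions and deletions that can occur in raw reads, at the cost of a quadratic-time comparison per UMI pair.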

Fig. 4 BRCA1 exon 4 a) alternative base frequency b) consensus accuracy c) aligned reads

Identifying low-frequency variants from high-accuracy collapsed consensus reads

The clustered, polished and collapsed consensus reads are each derived from single parent molecules. The process removes random errors and allows quantification of low-frequency variants without amplification bias. We aligned these collapsed consensus reads to the human reference genome to identify the expected low-allele-frequency SNP in NA14097. The majority of sequenced bases matched the reference sequence across the BRCA1 exon except at the location of the expected A > C substitution (Fig. 4a). The median accuracy of the polished reads was in excess of 99.9% (Fig. 4b) and the expected A > C substitution could be clearly seen in IGV with the allele frequency threshold for the coverage track set to 4% (Fig. 4c). Future work will focus on detecting lower levels of variants across a wider selection of loci.
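The alternative-base frequency at a single position, and the effect of a 4% display threshold, can be illustrated with a simple count over the bases that collapsed consensus reads place at that position. This is a sketch with made-up base counts, not data from the poster, and the function names are hypothetical; it does not reproduce how IGV computes its coverage track.

```python
from collections import Counter


def alt_allele_frequencies(bases, ref_base):
    """Return {alt_base: frequency} for non-reference bases at one position.

    bases: one base call per collapsed consensus read covering the position
    ref_base: the reference base at that position
    """
    counts = Counter(b for b in bases if b in "ACGT")  # ignore gaps and N
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {b: c / depth for b, c in counts.items() if b != ref_base}


def above_threshold(freqs, threshold=0.04):
    """Keep only alternative bases at or above the frequency threshold."""
    return {b: f for b, f in freqs.items() if f >= threshold}


if __name__ == "__main__":
    # Hypothetical pileup: 95 reference A calls and 5 alternative C calls
    pileup = ["A"] * 95 + ["C"] * 5
    freqs = alt_allele_frequencies(pileup, ref_base="A")
    print(freqs, above_threshold(freqs))
```

Because each consensus read represents one parent molecule, the frequency here is a count of molecules rather than of PCR copies.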

© 2019 Oxford Nanopore Technologies. All rights reserved. Oxford Nanopore Technologies' products are currently for research use only. 
