Nanopore sequencing of the CYP2D6 pharmacogene

Yusmiati Liau (University of Otago) began her talk by introducing CYP2D6, an important pharmacogene encoding a liver enzyme responsible for the metabolism and bioactivation of ~25% of drugs, including codeine and the cancer drug tamoxifen. Variation in this gene between individuals helps explain why some people experience adverse reactions to certain drugs, and why those drugs don't work for others. She described how the 4.4 kb gene is highly polymorphic despite its small size, and shares >90% sequence similarity with the pseudogene CYP2D7. Yusmiati also showed how the CYP2D6 gene can be deleted or duplicated; furthermore, gene conversion with CYP2D7, hybrids of the two genes, and tandem rearrangements can occur. All of these factors make genotyping very challenging. Current CYP2D6 genotyping methods rely either on short-read sequencing, which can result in reads misaligning to the pseudogene CYP2D7, or on qPCR assays, which target only pre-selected variants and can therefore result in mis-genotyping and, in turn, mis-phenotyping. Yusmiati noted that these methods also lack variant phasing.

Yusmiati set out her team's three aims for nanopore sequencing of CYP2D6: to accurately detect all variants, to determine haplotypes, and to detect duplicated alleles. Seven reference samples from Coriell and 25 clinical samples from an adverse drug reaction cohort were selected for sequencing, covering a broad range of CYP2D6 haplotypes and including four samples with known gene duplications. Long-range PCR was used to enrich for CYP2D6; the amplified libraries were then barcoded and sequenced in multiplex on a GridION device. Reads were basecalled using Flip-Flop, demultiplexed using Porechop, and filtered with NanoFilt. The first run produced 3 million reads, but only 4% were on target because of non-specific PCR products. Filtering raised the proportion of on-target reads to 30%, and the lowest depth of coverage was 200x, "more than enough to genotype." Different combinations of mapping tools (minimap2 and NGMLR) and variant calling tools (nanopolish and Clairvoyante) were tested on the seven reference samples: minimap2 with nanopolish was found to be the best combination and was used for the entire cohort. Yusmiati described how nanopolish quality scores were found to be a good indicator of whether a variant call was genuine, and were used to filter out false positives.
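The quality-based filtering step can be illustrated with a minimal Python sketch. It assumes the calls are written as a plain-text VCF with the call score in the QUAL column (column 6); the threshold `MIN_QUAL` and the example records are hypothetical, not values or data from the talk, and in practice a cut-off would need to be tuned against the reference samples.

```python
"""Minimal sketch: drop low-confidence variant calls using the VCF QUAL column.

Assumptions: calls are in a plain-text VCF; MIN_QUAL is a hypothetical
threshold, not the value used by the team.
"""

MIN_QUAL = 20.0  # hypothetical cut-off; would need tuning on reference samples


def filter_vcf_lines(lines, min_qual=MIN_QUAL):
    """Yield header lines unchanged and data lines whose QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column-header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF column 6 is QUAL
        if qual == ".":
            continue  # missing quality: treat the call as untrusted
        if float(qual) >= min_qual:
            yield line


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr22\t100\t.\tC\tT\t45.2\tPASS\t.\n",
        "chr22\t200\t.\tG\tA\t8.1\tPASS\t.\n",
    ]
    for kept_line in filter_vcf_lines(example):
        print(kept_line, end="")
```

Using a generator keeps memory use flat even for large VCFs, and passing header lines through unchanged means the output remains a valid VCF.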

The analysis identified 70 true variants across the 32 samples, and 21 samples were assigned to known alleles at the subvariant level. Yusmiati called it "quite amazing" that they were able to match samples completely to specific subvariants using the nanopore data. Variant phasing was also achieved using WhatsHap; Yusmiati described the phasing as "very clean, and we are very confident". The remaining 11 samples were found to carry a novel allele or novel subvariants of known alleles, which have been submitted to PharmVar. Lastly, duplicated alleles were also detected. Traditionally, duplications are detected by duplex PCR targeting a 3.5 kb region specific to the duplicated copy alongside the 6.6 kb target, but this approach can't distinguish which allele is duplicated, which is important in determining phenotype. With nanopore sequencing, this was easily achieved by looking at the allelic ratio between the two haplotypes: a 2:1 ratio identifies the duplicated allele. Yusmiati also noted that some variants that created or disrupted homopolymeric regions were missed, but these could be accurately detected by visually inspecting the BAM files. She also noted that whole-gene deletions cannot be detected with traditional methods that rely on PCR.
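The allelic-ratio idea can be sketched in Python as below. This is an illustration under stated assumptions, not the team's pipeline: it assumes reads have already been phased and counted per haplotype at each heterozygous site, and that PCR amplification is roughly proportional to copy number. The helper `duplicated_haplotype` and its `tolerance` parameter are hypothetical names chosen for illustration.

```python
"""Minimal sketch: flag which haplotype looks duplicated from allelic read ratios.

Assumptions (hypothetical, for illustration only): reads are already phased
and counted per haplotype at heterozygous sites; amplification is roughly
proportional to copy number; the tolerance is an arbitrary choice.
"""
from statistics import median


def duplicated_haplotype(site_counts, tolerance=0.5):
    """Return 1 or 2 for the haplotype that looks duplicated, or None.

    site_counts: list of (reads_on_hap1, reads_on_hap2) tuples, one per
    heterozygous site. A ratio near 1:1 suggests two single-copy alleles;
    a ratio near 2:1 (in either direction) suggests one allele is present twice.
    """
    # Skip sites where either haplotype has no reads (ratio undefined).
    ratios = [h1 / h2 for h1, h2 in site_counts if h1 > 0 and h2 > 0]
    if not ratios:
        return None
    r = median(ratios)  # median limits the influence of a few noisy sites
    if abs(r - 2.0) <= tolerance:
        return 1  # haplotype 1 has roughly twice the reads of haplotype 2
    if abs(1 / r - 2.0) <= tolerance:
        return 2  # haplotype 2 has roughly twice the reads of haplotype 1
    return None  # close to 1:1, or ambiguous
```

For example, `duplicated_haplotype([(400, 210), (390, 190), (410, 205)])` returns `1`, while roughly balanced counts such as `[(300, 310), (290, 300)]` return `None`. With the default tolerance, the "haplotype 1 duplicated" and "haplotype 2 duplicated" ranges do not overlap, so a single site set cannot be assigned to both.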

Looking ahead, Yusmiati and her team plan to apply this method to further samples from a new cohort. They also intend to explore CRISPR-Cas9 enrichment of native DNA to sequence particularly complex alleles.

Authors: Yusmiati Liau