Nanopore sequencing of cervical cancers uncovers novel genomic, epigenomic, and transcriptomic features associated with HPV integration events

Vanessa (Canada’s Michael Smith Genome Sciences Centre, Canada) explained that oncogenic HPV types are necessary but not sufficient for the development of cervical cancer. HPV infections are typically cleared within 1-2 years, but persistent, latent infection with a high-risk HPV type over several years can eventually lead to the accumulation of mutations associated with high-grade lesions.

The HPV genome comprises early and late genes, named for the stage of the viral life cycle in which they are expressed. E6 and E7 are key oncoproteins. Integration of the viral genome into the host genome occurs in 70% of HPV-16-infected tumours and in all HPV-18-infected tumours, and is associated with increased expression of these oncogenes.

Cervical cancer remains endemic in resource-limited countries; among women in sub-Saharan Africa, it is the second most prevalent cancer and the leading cause of cancer mortality.

HPV integration into the host genome

HPV can integrate near cancer genes (e.g. TP53 and MYC) in the host genome, often causing dysregulation of host gene expression. Vanessa’s team are using nanopore sequencing to characterise these integration sites in HPV-positive tumours. The complexity of integration has been challenging to resolve accurately with short-read sequencing. Nanopore sequencing provides longer reads, which span more of each integration locus and can be phased into haplotypes, and it also supports per-read DNA methylation calling.

The objective of Vanessa’s project is to use nanopore sequencing to identify the genomic and epigenomic correlates of human and viral gene dysregulation around HPV integration events. To achieve this, her team aim to generate high-quality whole-genome and transcriptome data from more than 50 cervical cancer samples. From these data, they will identify and analyse HPV integrations and their associated structural variants (SVs), and assess the impact of integrations on local methylation profiles. They will also verify the expression and structure of gene fusions resulting from the SVs.

To address these aims, Vanessa’s team is using the PromethION platform for both whole-genome and whole-transcriptome (PCR-cDNA) nanopore sequencing of HPV-positive tumours. Their data analysis pipeline comprises mapping, structural variant and copy number variant (CNV) calling, de novo assembly, methylation calling, haplotype phasing, and fusion calling.

The cervical tumour research samples for her project come from two cohorts: the HIV-Tumour Molecular Characterisation Project (HTMCP) in Uganda and The Cancer Genome Atlas (TCGA) in the USA. Her team had already acquired short-read data for a number of samples in these cohorts and aimed to use these data to supplement the nanopore data. So far, four samples have been sequenced and analysed, two from each cohort. One of these four has also undergone transcriptome analysis with nanopore cDNA sequencing, obtaining a ‘particularly impressive yield of over 100 gigabases, and 111 million reads’.
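As a rough check on the quoted run, dividing total yield by read count gives the implied mean read length. This is a minimal arithmetic sketch. It treats ‘over 100 gigabases’ as exactly 100 Gb, so the result is a lower bound on the mean.

```python
# Implied mean read length for the cDNA run quoted above.
# Assumption: "over 100 gigabases" is treated as exactly 100 Gb (a lower bound),
# with 1 gigabase = 1e9 bases.
YIELD_BASES = 100e9   # total bases sequenced (lower bound)
READ_COUNT = 111e6    # number of reads


def mean_read_length(total_bases: float, n_reads: float) -> float:
    """Return the average number of bases per read."""
    if n_reads <= 0:
        raise ValueError("read count must be positive")
    return total_bases / n_reads


if __name__ == "__main__":
    print(f"{mean_read_length(YIELD_BASES, READ_COUNT):.0f} bases per read")
```

This works out to roughly 900 bases per read. Because the yield is a lower bound, the true mean is at least this long.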

Results: HPV integration and structural variants

Vanessa described how her team identified HPV integration sites and events. First, Sniffles was used to call translocations between HPV and human chromosomes (≥ 5 supporting reads). Next, custom code grouped HPV sites that co-occurred on the same read into multi-site integration events. Finally, the HPV-containing reads for each event were extracted for downstream assembly, methylation calling, and transcript mapping.
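The grouping step is described only as custom code, so the following is not the team’s implementation. It is a minimal sketch that treats the step as a connected-components problem: any two HPV-human breakpoints seen on the same read are placed in the same event, transitively. The sketch assumes each read has already been annotated with the breakpoints it spans. It also assumes that nearby breakpoint calls from different reads have already been clustered to a shared position. `Site`, `UnionFind` and `group_integration_events` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical representation: (human chromosome, breakpoint position).
Site = tuple[str, int]


class UnionFind:
    """Disjoint-set structure used to merge sites that share reads."""

    def __init__(self) -> None:
        self.parent: dict[Site, Site] = {}

    def find(self, x: Site) -> Site:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Site, b: Site) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_integration_events(
    read_sites: dict[str, list[Site]],
) -> list[tuple[set[Site], set[str]]]:
    """Merge HPV integration sites that co-occur on any single read.

    read_sites maps a read ID to the (already clustered) HPV-human
    breakpoints observed on that read. Returns one (sites, read IDs)
    pair per integration event; the read IDs are the HPV-containing
    reads to extract for downstream analysis.
    """
    uf = UnionFind()
    for sites in read_sites.values():
        for site in sites:
            uf.find(site)  # register every site, including singletons
        for other in sites[1:]:
            uf.union(sites[0], other)  # sites on one read belong to one event

    event_sites: dict[Site, set[Site]] = defaultdict(set)
    event_reads: dict[Site, set[str]] = defaultdict(set)
    for read_id, sites in read_sites.items():
        for site in sites:
            root = uf.find(site)
            event_sites[root].add(site)
            event_reads[root].add(read_id)
    return [(event_sites[root], event_reads[root]) for root in event_sites]


if __name__ == "__main__":
    reads = {
        "r1": [("chr3", 100), ("chr3", 5_000)],
        "r2": [("chr3", 5_000), ("chr17", 200)],
        "r3": [("chr12", 700)],
    }
    for sites, read_ids in group_integration_events(reads):
        print(sorted(sites), sorted(read_ids))
```

A union-find keeps the merge close to linear in the number of read-site observations. That matters when a tumour has dozens of integration sites and many chimeric reads. Running the example prints one three-site event spanning chromosomes 3 and 17, and one single-site event on chromosome 12.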

Cervical tumours tend to have either a single clonal integration event or multiple integrations across the genome, one or more of which can be subclonal. The data suggested that two samples had a multi-integration phenotype and two had a single-integration phenotype. Vanessa explained that the rest of her talk would focus primarily on one of the two multi-integration samples, which displayed 9 HPV integration events and 40 HPV integration sites.

The first integration event of note involved at least three chromosomes, including translocations between ERBB2 and UPK1B. The UPK1B and B4GALT4 genes on chromosome 3 were heavily amplified and had the highest density of HPV integration sites. Translocations connected this region to ERBB2 on chromosome 17, which carried two HPV integration sites. From there, a subclonal translocation led to a small intronic region on chromosome 7 that also contained an HPV integration site. Vanessa described a second complex integration event involving three genes on chromosome 12. In this event, a rearrangement produced an inversion and duplication of the region, resulting in an aberrant gene fusion. The third example involved two integration sites 50 Mbp apart on chromosome 3: ‘this type of phenomenon would be difficult to find if not for long-read sequencing and our integration site probing technique’. This event was found to be extrachromosomal circular DNA.

Another question of interest was whether HPV integration events occurred only on one haplotype. For most of the events, integrations were found to be mono-allelic.
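One way to make the mono-allelic call concrete is to look at the haplotype assignments of the HPV-containing reads supporting each event. This sketch assumes reads have been haplotagged beforehand; for example, WhatsHap `haplotag` writes an `HP` tag of 1 or 2 on phased reads. The `min_reads` and `min_fraction` thresholds and the function name are hypothetical, not values from the talk.

```python
from collections import Counter
from typing import Optional


def assign_haplotype(
    read_haplotypes: list[Optional[int]],
    min_reads: int = 3,        # hypothetical: minimum phased reads to make a call
    min_fraction: float = 0.9, # hypothetical: share needed on one haplotype
) -> str:
    """Classify an integration event from the haplotypes of its supporting reads.

    read_haplotypes holds one entry per HPV-containing read: 1 or 2 for a
    phased read (e.g. its HP tag), or None if the read is unphased.
    """
    phased = [hp for hp in read_haplotypes if hp in (1, 2)]
    if len(phased) < min_reads:
        return "unresolved"  # too few phased reads to decide
    hp, count = Counter(phased).most_common(1)[0]
    if count / len(phased) >= min_fraction:
        return f"mono-allelic (HP{hp})"
    return "bi-allelic"


if __name__ == "__main__":
    print(assign_haplotype([1, 1, 1, 1, None, 1]))
    print(assign_haplotype([1, 2, 1, 2, 2]))
```

Unphased reads are ignored rather than counted against either haplotype, so an event is only called when enough reads carry phasing information.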

DNA methylation at HPV integration events

Vanessa’s team profiled methylation at HPV integration events. In one sample with two HPV integration sites on opposite haplotypes, they observed contrasting methylation patterns between the alleles. These integration sites occurred at the TP63 locus. An intragenic site on one allele was highly methylated, and an intergenic site on the other allele was unmethylated. This unmethylated integration site overlapped with several known TP63 enhancers. Interestingly, HPV-containing reads that mapped to the TP63 enhancer were less methylated than human reads mapping to the same location. The team therefore hypothesised that HPV integration contributes to demethylation of regulatory regions in the human DNA, thereby increasing activation of the region.
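The comparison between HPV-containing and human-only reads over the same locus can be sketched as follows. The sketch assumes per-read CpG calls have already been extracted as (reference position, methylation probability) pairs, for reads restricted to one chromosome. It also assumes the human portion of each chimeric read maps to the same coordinates. The `ReadCalls` class, the 0.5 threshold and the function names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ReadCalls:
    read_id: str
    has_hpv: bool  # True if the read also contains HPV sequence
    # (reference position, methylation probability) for each CpG on the read
    cpg_calls: list[tuple[int, float]] = field(default_factory=list)


def _fraction(methylated: int, total: int) -> float:
    return methylated / total if total else float("nan")


def region_methylation(reads, start, end, threshold=0.5):
    """Fraction of CpG calls in [start, end) called methylated, per read group.

    Reads are split into HPV-containing and human-only groups; a call counts
    as methylated when its probability is at least `threshold`.
    """
    counts = {True: [0, 0], False: [0, 0]}  # has_hpv -> [methylated, total]
    for read in reads:
        for pos, prob in read.cpg_calls:
            if start <= pos < end:
                counts[read.has_hpv][1] += 1
                if prob >= threshold:
                    counts[read.has_hpv][0] += 1
    return {
        "hpv_reads": _fraction(*counts[True]),
        "human_reads": _fraction(*counts[False]),
    }
```

Pooling calls across reads in each group, rather than averaging per-read fractions, weights each CpG call equally. That is a design choice; a per-read average would instead weight each molecule equally.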

The team were also interested in how the HPV genome itself is methylated at integration sites. Specifically, they analysed methylation of CpGs in the long control region (LCR) of the HPV genome. The LCR contains enhancers and promoters that control expression of the early genes, including E6 and E7. Across the different integration events, the LCRs were less methylated than the E6 and E7 genes.
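A per-region summary of the viral genome can follow a similar pattern. HPV genomes are circular, so a region such as the LCR can straddle the coordinate origin of a linearised reference. This sketch therefore supports intervals that wrap. The region coordinates in the example are toy values, not real HPV annotations, and the function names and threshold are hypothetical.

```python
def in_circular_interval(pos: int, start: int, end: int, genome_length: int) -> bool:
    """True if pos lies in [start, end) on a circular genome.

    When start > end the interval wraps past the origin.
    """
    pos %= genome_length
    if start <= end:
        return start <= pos < end
    return pos >= start or pos < end


def methylation_by_region(cpg_calls, regions, genome_length, threshold=0.5):
    """Fraction of methylated CpG calls in each named region of a circular viral genome.

    cpg_calls: iterable of (position, methylation probability) on the viral reference.
    regions:   dict of name -> (start, end), 0-based half-open; start > end wraps.
    """
    totals = {name: [0, 0] for name in regions}  # name -> [methylated, total]
    for pos, prob in cpg_calls:
        for name, (start, end) in regions.items():
            if in_circular_interval(pos, start, end, genome_length):
                totals[name][1] += 1
                totals[name][0] += int(prob >= threshold)
    return {name: (m / n if n else float("nan")) for name, (m, n) in totals.items()}


if __name__ == "__main__":
    # Toy 1,000 bp circular genome; coordinates are illustrative only.
    regions = {"LCR": (900, 50), "E6": (60, 200), "E7": (200, 300)}
    calls = [(950, 0.1), (10, 0.2), (100, 0.9), (150, 0.8), (250, 0.7), (260, 0.3)]
    print(methylation_by_region(calls, regions, genome_length=1000))
```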

Transcriptional activity at HPV integration events

In the last section of her talk, Vanessa explained how multiple expressed HPV transcript species were identified by de novo assembly of HPV polycistronic transcripts. Five major versions of the early transcript were observed, each with different coding potential and different splice variants of the HPV genes. The most highly expressed transcript is one commonly seen in cancers. To Vanessa’s knowledge, the second most highly expressed transcript had not been described before.
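The talk reconstructed transcripts by de novo assembly. A simpler, different technique for tallying isoforms from long cDNA reads is to collapse reads that share the same splice-junction chain and rank those chains by read count. The sketch below does that, snapping each junction to the nearest known splice site within a tolerance to absorb small alignment errors. The intron-chain representation, the 5 bp tolerance and the toy coordinates are assumptions.

```python
from collections import Counter

# Hypothetical representation: (donor, acceptor) positions on the HPV reference.
Intron = tuple[int, int]


def snap(pos: int, known: list[int], tol: int) -> int:
    """Move pos to the nearest known splice site within tol bp; otherwise keep it."""
    nearest = min(known, key=lambda k: abs(k - pos), default=None)
    if nearest is not None and abs(nearest - pos) <= tol:
        return nearest
    return pos


def collapse_isoforms(
    read_introns: dict[str, tuple[Intron, ...]],
    known_donors: list[int],
    known_acceptors: list[int],
    tol: int = 5,
) -> list[tuple[tuple[Intron, ...], int]]:
    """Group reads by splice-junction chain and rank the chains by read count.

    Junctions are first snapped to known sites, so reads that differ by a
    few bp at a junction are merged into the same isoform.
    """
    chains: Counter = Counter()
    for introns in read_introns.values():
        chain = tuple(
            (snap(d, known_donors, tol), snap(a, known_acceptors, tol))
            for d, a in introns
        )
        chains[chain] += 1
    return chains.most_common()
```

Unspliced reads collapse to an empty chain, so they are counted as their own group rather than silently dropped.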

They also investigated expression of human gene fusions in the sample. Nearly half of the expressed gene fusions (322/686) also involved HPV integration. One fusion, B4GALT4:ERBB2, was expressed at levels as high as those of some housekeeping genes.
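The talk does not state how a fusion was judged to involve HPV integration. As a hypothetical proxy, the sketch below flags a fusion when either breakpoint lies within a chosen window of an HPV integration site on the same chromosome. The window size, data shapes and names are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

Breakpoint = tuple[str, int]            # (chromosome, position)
Fusion = tuple[Breakpoint, Breakpoint]  # the two partner breakpoints


def near_site(bp: Breakpoint, sites_by_chrom: dict[str, list[int]], window: int) -> bool:
    """True if an integration site lies within `window` bp of the breakpoint."""
    chrom, pos = bp
    positions = sites_by_chrom.get(chrom, [])
    i = bisect_left(positions, pos - window)  # first site at or after pos - window
    return i < len(positions) and positions[i] <= pos + window


def count_hpv_fusions(
    fusions: list[Fusion], integration_sites: list[Breakpoint], window: int
) -> tuple[int, int]:
    """Return (fusions with a breakpoint near an HPV site, total fusions)."""
    sites_by_chrom: dict[str, list[int]] = defaultdict(list)
    for chrom, pos in integration_sites:
        sites_by_chrom[chrom].append(pos)
    for positions in sites_by_chrom.values():
        positions.sort()  # bisect requires sorted positions
    flagged = sum(
        1
        for left, right in fusions
        if near_site(left, sites_by_chrom, window) or near_site(right, sites_by_chrom, window)
    )
    return flagged, len(fusions)
```

With the reported counts, 322 of 686 expressed fusions is about 47%.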

Conclusions

Vanessa concluded, firstly, that HPV integration sites often involve complex structural variants that can be delineated using nanopore sequencing. Secondly, measuring DNA methylation across HPV integration events reveals how HPV affects the epigenome of the affected haplotype. Thirdly, nanopore cDNA sequencing can identify novel HPV and human transcripts by de novo assembly, and detect expressed HPV-containing gene fusions.

In future, the team plan to sequence more cervical cancer research samples from the two cohorts using nanopore technology. With these data, they will investigate overall patterns in how integration of different HPV types affects the structure and regulation of cervical cancer genomes.

Authors: Vanessa Porter