Characterising rAAV integration in human genomic DNA with long nanopore reads

Adeno-associated virus (AAV) is a non-enveloped, single-stranded DNA virus that can be used as a vector because it is non-pathogenic and can deliver genetic material (its ‘viral payload’) into a host genome. In gene therapy, recombinant AAV (rAAV) vectors carrying engineered transgene cassettes can be used to introduce genetic material into a host genome for the expression of therapeutic genes. On-target integration is essential both for achieving stable correction of genetic disorders and for preventing the unwanted mutations that may arise from off-target integration.

After transduction, viral DNA can be found integrated into the host genome via homologous recombination (HR) and in extra-chromosomal episomes, both of which express the delivered genes. However, episomal DNA does not replicate as transduced cells divide; for expression to persist, it is therefore imperative that the AAV vector is also integrated into the host genome. To characterise all integration sites in the host genome and to determine whether they are on target, sequencing must cover both the viral payload and the unique genomic DNA sequence adjacent to the integration event.

A range of methods is available to evaluate AAV integration, including FISH, plasmid rescue, and LAM-PCR. However, detecting AAV integration events with these methods and with short sequencing reads can be difficult, because the high background of non-integrated viral genomes can produce artefacts that are incorrectly identified as integration events. Cells transduced with AAV also contain very high numbers of episomes carrying the viral genome. Because the episomal DNA and the host genome share long stretches of identical sequence, short reads often cannot tell the two apart, and true integration events in the human genome are hard to detect. The excess of episome copies per cell also means that episomal DNA is sequenced far more often than AAV integrated in the host genome. Furthermore, these methods only reveal part of the picture, such as the integration junction sequence or the full-length integrated viral DNA. To thoroughly characterise HR-driven rAAV integration, sequencing reads must span the viral payload, homology arm, and unique genomic sequence adjacent to the integration event.
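The spanning requirement can be illustrated with a small sketch. The region names, coordinates, and the 20-base minimum overlap below are hypothetical values chosen for illustration, not figures from the study; the point is only that a read lying entirely within the homology arm or the payload could equally have come from an episome, whereas a read touching all three regions cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# Hypothetical layout of one side of an integrated allele (illustrative sizes).
LAYOUT = [
    Region("unique_genomic", 0, 500),
    Region("homology_arm", 500, 1500),
    Region("viral_payload", 1500, 3500),
]

REQUIRED = {"unique_genomic", "homology_arm", "viral_payload"}


def regions_touched(read_start, read_end, layout=LAYOUT, min_overlap=20):
    """Names of regions that the read overlaps by at least min_overlap bases."""
    touched = []
    for region in layout:
        overlap = min(read_end, region.end) - max(read_start, region.start)
        if overlap >= min_overlap:
            touched.append(region.name)
    return touched


def is_informative(read_start, read_end, layout=LAYOUT, min_overlap=20):
    """True only if the read links unique genomic DNA, the arm, and the payload."""
    return REQUIRED <= set(regions_touched(read_start, read_end, layout, min_overlap))


# A 150-base read inside the homology arm: identical in episome and genome.
short_read = (900, 1050)
# A 150-base read across the arm/payload boundary: also present in episomes.
junction_read = (1400, 1550)
# A 1.4 kb read reaching from unique genomic DNA into the payload.
long_read = (400, 1800)

for start, end in (short_read, junction_read, long_read):
    print(start, end, regions_touched(start, end), is_informative(start, end))
```

In this toy layout only the long read is informative; neither short read is anchored in unique genomic sequence.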

'proper characterization requires long-read sequencing to ensure that integrated viral DNA is examined and not the more prevalent episomes.'

Phenylketonuria (PKU) is a rare inherited disorder caused by the loss of phenylalanine hydroxylase (PAH), an enzyme that breaks down the amino acid phenylalanine. Without functional PAH, phenylalanine builds up in the blood and brain, leading to brain damage. In a study by Prout et al., the use of rAAV to correct PAH gene expression was investigated [1]. Codon-optimised human PAH with homology arms was packaged into an AAV vector. The rAAV was then administered to a humanised-liver mouse xenograft model to induce HR-mediated integration at the human PAH locus and expression of the enzyme.

Using long nanopore sequencing reads, the team developed a genome-wide integration assay to accurately assess and characterise HR-driven rAAV integration in the xenograft model. Splinter primers and nested PCR were used to selectively amplify both the wild-type genomic DNA and the integrated DNA, generating amplicons of 1.4 kb: long enough to span both the viral payload on one end of the homology arm and the non-viral genomic DNA on the other end. These amplicons were sequenced using nanopore technology to generate full-length reads, which were compared between integrated and wild-type genomic DNA to detect integration events, overcoming the challenges of short-read sequencing.
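The comparison between integrated and wild-type amplicons can be sketched as a simple read classifier. This is a toy illustration, not the authors' pipeline: the marker sequences and function names are invented, and exact substring matching stands in for the alignment step that real, error-containing nanopore reads would need.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contains(read, marker):
    """True if the marker occurs in the read on either strand."""
    return marker in read or reverse_complement(marker) in read


def classify_read(read, genomic_marker, payload_marker, wildtype_marker):
    """Label a full-length amplicon read by the markers it carries."""
    if not contains(read, genomic_marker):
        # No unique genomic anchor: could be episomal, so it is not counted.
        return "unanchored"
    if contains(read, payload_marker):
        return "integrated"
    if contains(read, wildtype_marker):
        return "wild-type"
    return "ambiguous"


# Hypothetical toy sequences (16 bases each), for illustration only.
GENOMIC = "ACGTTGCAAGGCTTAC"   # unique genomic DNA outside the homology arm
ARM = "GGATCCGATTACAGGA"       # homology arm, shared by vector and genome
PAYLOAD = "TTGACCGTAGCTAGCA"   # sequence found only in the viral payload
WILDTYPE = "CCATGGTTAACCGGTA"  # sequence found only in the unedited locus

integrated_read = GENOMIC + ARM + PAYLOAD
wildtype_read = GENOMIC + ARM + WILDTYPE
episomal_read = ARM + PAYLOAD

for read in (integrated_read, wildtype_read, episomal_read):
    print(classify_read(read, GENOMIC, PAYLOAD, WILDTYPE))
```

Requiring the unique genomic marker before calling a read "integrated" is what keeps episomal copies, which carry the arm and payload but no flanking genomic DNA, out of the count.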

'We used Oxford Nanopore to characterize integrated DNA and scan the whole genome for off-target integrations.'

With long nanopore sequencing reads, Prout et al. demonstrated that both on- and off-target AAV integration events can be detected; in this study, no evidence of off-target integration was observed. The team were able to sequence across the whole viral payload, the complete homology arm, and the unique genomic sequence, allowing integration events in the host genome to be distinguished and characterised regardless of genomic location. False positives arising from artefacts, a common issue with short-read sequencing due to the high background of non-integrated AAV genomes, were also not observed above the limit of detection with long nanopore reads.

The authors concluded that ‘the use of long-read sequencing is critical in these studies to ensure that genomic DNA is being analysed and not the much higher copy number [of] episomal DNA. Confirming the desired integration is a crucial safety assessment for the integrating vectors.’

1. Prout, J. et al. Genome-wide assays to characterize rAAV integration into human genomic DNA in vivo. bioRxiv (2023). DOI: https://doi.org/10.1101/2023.08.22.554338