In cervical tumours the integration of human papilloma virus (HPV) genomes often results in the generation of transcripts that consist of hybrids of viral and cellular sequences. Mapping data obtained with a variety of techniques have demonstrated that HPV integration occurs without obvious specificity in the human genome. However, these techniques could not demonstrate whether integration resulted in the generation of transcripts encoding viral or viral-cellular sequences. The aim of this work was to map the integration sites of HPV DNA and to analyse the adjacent cellular sequences.
The integration sites (INTs) were amplified by the APOT (amplification of papillomavirus oncogene transcripts) technique. The APOT products were sequenced according to standard protocols. The sequences were analysed using the BLASTN program and public databases. To localise the INTs, PCR-based screening of the GeneBridge4 RH-panel was used.
Twelve cellular sequences adjacent to integrated HPV16 (INT markers) expressed in squamous cell cervical carcinomas were isolated. For 11 INT markers homologous human genomic sequences were readily identified, and 9 of these showed significant homologies to known genes/ESTs. Using the known locations of homologous cDNAs and RH-mapping, we showed that the INTs are distributed among different human chromosomes for each tumour sample and are located in regions with high levels of expression.
Integration of HPV genomes occurs in different human chromosomes, but in regions that contain highly transcribed genes. One interpretation of these results is that HPV integrates into decondensed regions, which are more accessible to foreign DNA.
Cervical cancer is the second most common cause of cancer-related mortality in women worldwide. Most cervical cancers are squamous cell carcinomas that develop through a distinct pattern of morphological progression. Cervical tumours are frequently associated with infection by human papilloma viruses (HPV), and sequences of the so-called high-risk HPV (types 16 and 18 and related types) are detected in nearly every tumour examined. Viral DNA persists in tumour cells in episomal and/or integrated forms, with retention of the two viral transforming genes (E6 and E7) in all tumours analysed. The expression of viral sequences is controlled by sequences within the upstream regulatory region (URR), which is located upstream of the E6 gene. Viral transcripts include a variety of spliced RNAs. In cases where the episomal form of HPV predominates, full expression of the E6 and E7 genes occurs (few splicing forms), while in the case of the integrated form, expression of cellular sequences located downstream (3') of the viral sequences can also be detected in the form of fused viral-cellular RNAs. It is also possible that, in some cases, viral sequences become "silent" after integration [3, 4].
Analysis of integration sites by different techniques, and their mapping in the human genome, revealed that DNA of different HPV types integrates at different chromosomal sites without visible specificity [5–12]. It was demonstrated that in some cases the sites of viral genome integration mapped to regions of the human genome that often undergo chromosomal rearrangements and deletions. In other cases HPV integration sites were mapped to so-called fragile sites, or to regions where genes that are directly or indirectly involved in the control of cell proliferation have been localised. Not all of these methods give precise and adequate data, and in addition they cannot discriminate between "silent" and integrated viral DNA. Furthermore, in many cases these methods do not allow precise physical mapping of integrated viral sequences. The use of the so-called APOT technique has greatly simplified the analysis of expressed integration sites and has allowed characterisation of a large number of integration sites through analysis of expressed joint viral-cellular sequences. The major conclusion from these studies was that integration is non-specific.
Although the analysis of many integration sites has already been described, a detailed examination using different techniques seemed important, since it provides additional information concerning the interaction of the viral and host genomes and the role of this process in the genetic program of the cancer cell.
The primary aim of this work was the physical mapping of HPV 16 DNA integration sites in the chromosomes of human cervical squamous cell carcinomas, using APOT technology to isolate the integration sites. The expressed viral-cellular sequences generated by integration were then mapped by PCR screening of a panel of radiation hybrids of somatic cells, together with database analysis of the cellular sequences located adjacent to the integration sites.
All tumour samples were collected during surgery in the clinics of the Cancer Research Centre (Moscow) and were kept frozen in liquid nitrogen. DNA and RNA isolation and HPV typing were performed according to techniques described earlier. Only squamous cell carcinomas containing HPV 16 sequences were analysed.
Amplification of fusion transcripts
Human genomic sequences adjacent to integrated HPV 16 DNA (HPV integration sites, INTs) were isolated from squamous cell cervical carcinomas using the APOT technique, which is based on reverse transcription followed by two-step amplification of the RT product. This technique has been described earlier in detail [2, 3].
Briefly, reverse transcription was performed using an adapter-linked oligo(dT) primer, followed by semi-nested PCR using E7-specific 5' primers and oligo(dT)/adapter-specific 3' primers [2, 3, 12]. PCR products were transferred onto nylon membranes and hybridised with HPV E7- and E4-specific probes to discriminate episome-derived from integrate-derived transcripts. PCR products containing integration sites were excised from the gel and extracted using the QIAGEN Gel extraction kit (Qiagen, Hilden, Germany). Sequencing reactions were performed using the Big-Dye terminator DNA-sequencing Kit (Perkin-Elmer, Boston, USA) and an ABI Prism 310 Genetic analyzer (Applied Biosystems, Foster City, USA). Sequencing results were analysed using the BLASTN program provided by the National Cancer Institute. The sequences of the 12 studied INTs were submitted to the EMBL database (AccN: AJ431608 – INT254, AJ431609 – INT259, AJ431610 – INT431, AJ431611 – INT407, AJ631612 – INT290, AJ431614 – INT505, AJ431615 – INT477, AJ431616 – INT467, AJ431617 – INT466, AJ431618 – INT421, AJ431619 – INT475, AJ431620 – INT423).
Physical mapping of HPV 16 DNA on human chromosomes
To localise INTs, PCR-based screening of a somatic cell radiation hybrid (human/hamster) GeneBridge4 RH-panel (Research Genetics Inc., USA) was applied as described earlier.
PCR was performed in 12.5 μl reactions, using a PTC-100™ thermocycler (MJ Research Inc., USA). Some modifications were introduced, including the "hot start" technique, as recommended in the instructions for Maxi-Taq™, manufactured by Biokom Inc. (Russia). The amount of DNA per tube was 8–10 ng. The nucleotide sequences of the primers used in these studies are presented in Table 1. Conditions for annealing, priming, and other parameters were optimised using total human and hamster DNAs (Table 1). The analysis of PCR products using agarose and polyacrylamide gels was performed following standard protocols.
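The idea behind PCR-based screening of a radiation hybrid panel can be illustrated with a short sketch. Each hybrid in the panel is scored for presence or absence of the marker-specific PCR product, giving a retention vector, and the marker is placed near the framework markers whose retention vectors it resembles most. The example below is only a simplified illustration and not the procedure used in this work: the vector encoding ("0" absent, "1" present, "2" ambiguous), the marker names and the toy eight-hybrid panel are all assumptions, and a plain discordance count stands in for the statistical linkage scoring that RH-mapping software normally applies.

```python
def discordance(a: str, b: str) -> int:
    """Count hybrids scored differently in two retention vectors.

    Vectors are strings with one character per hybrid:
    '0' = PCR product absent, '1' = present, '2' = ambiguous (assumed encoding).
    Positions that are ambiguous in either vector are ignored.
    """
    if len(a) != len(b):
        raise ValueError("retention vectors must cover the same hybrids")
    return sum(1 for x, y in zip(a, b) if x != "2" and y != "2" and x != y)


def closest_framework_marker(query: str, framework: dict) -> str:
    """Return the name of the framework marker with the fewest discordant hybrids."""
    return min(framework, key=lambda name: discordance(query, framework[name]))


# Hypothetical toy data: eight hybrids, three made-up framework markers.
int_marker_vector = "01100101"
framework_vectors = {
    "FW_A": "01100111",
    "FW_B": "10011010",
    "FW_C": "01200101",
}

nearest = closest_framework_marker(int_marker_vector, framework_vectors)
print(nearest, discordance(int_marker_vector, framework_vectors[nearest]))
```

In this toy panel the INT vector differs from FW_A in one hybrid and from FW_C in none once the ambiguous call is skipped, so FW_C is reported as the nearest framework marker.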
The RH-mapping data are presented as of March 2001, and the data on the mapping of markers to chromosomes and genomic contigs as of September 2001. The homology searches correspond to the state of the public databases in January 2002.
Detection of HPV 16 DNA integration sites
The APOT technique was used to isolate human genomic sequences adjacent to integrated HPV 16 DNA (HPV integration sites, INTs) expressed in squamous cell cervical carcinomas. This technique is based on a modification of RT-PCR methods using specific primers, one of which is located in the E7 region of the viral genome, while the second contains specific adapter sequences joined to oligo(dT) [2, 3]. As a result we generated fusion transcripts that encompassed parts of the human genomic DNA. Some of the transcripts may be spliced into non-coding sequences within the 3' untranslated part of the respective genomic mRNAs and others into intronic regions close to the transcription initiation site of the respective gene, as was recently shown by Wentzensen et al. We have not analysed these types of fusion transcripts in detail, but in all cases our methods allowed discrimination between expressed episomal and integrated forms of HPV DNA.
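The discrimination step mentioned above relies on hybridisation of the APOT products with E7- and E4-specific probes, as described in the Methods. A minimal sketch of that decision logic is given below. It rests on an assumption that is not spelled out in this text: that episome-derived transcripts retain E4 sequences, whereas integrate-derived fusion transcripts hybridise with the E7 probe only. The function name and the returned labels are hypothetical.

```python
def classify_apot_product(hybridises_e7: bool, hybridises_e4: bool) -> str:
    """Classify an amplified product from its probe hybridisation pattern.

    Assumption (labelled, not taken from the text): E7+/E4+ products are
    episome-derived, E7+/E4- products are integrate-derived fusion transcripts.
    """
    if not hybridises_e7:
        # No E7 signal: the product is not treated as a viral oncogene transcript.
        return "not HPV-derived"
    return "episome-derived" if hybridises_e4 else "integrate-derived"


for e7, e4 in [(True, True), (True, False), (False, False)]:
    print(e7, e4, classify_apot_product(e7, e4))
```

Only products falling into the integrate-derived class would then be excised and sequenced as candidate INTs.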
Twelve INT sequences have been characterised from different individual squamous cell cervical carcinomas. The length of the cellular sequences containing polyA tails and fused to viral sequences varied from 140 to 450 nucleotides. The sequences were submitted to the EMBL database (see Methods section for AccN) and used to search for homologies to human genomic sequences, genes and ESTs. Primers were designed for the cellular sequences at the integration sites and were used for screening of the RH-panel of somatic cell hybrids (Table 1).
Analysis of nucleotide sequences homologous to integration sites
Searches for human nucleotide sequences homologous to INT markers were carried out using the BLASTN program and public databases as described in Materials and Methods. Homologies greater than 90% over sequences of at least 100 nucleotides were considered significant.
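The significance criterion stated above can be written as a simple filter over alignment results. The sketch below is an illustration only: the `BlastHit` record, its field names and the example hits are hypothetical and do not correspond to any particular BLAST output format or to the data of this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical minimal record for one alignment."""
    subject_id: str
    percent_identity: float  # homology over the aligned region, in percent
    alignment_length: int    # aligned length in nucleotides


MIN_IDENTITY = 90.0  # homology must be greater than 90%
MIN_LENGTH = 100     # aligned sequence must be at least 100 nucleotides


def is_significant(hit: BlastHit) -> bool:
    """Apply the two thresholds stated in the text."""
    return hit.percent_identity > MIN_IDENTITY and hit.alignment_length >= MIN_LENGTH


def significant_hits(hits):
    """Keep only the hits that pass both thresholds, preserving order."""
    return [h for h in hits if is_significant(h)]


# Made-up example hits for illustration.
example = [
    BlastHit("cDNA_1", 97.5, 240),     # passes both thresholds
    BlastHit("cDNA_2", 99.0, 59),      # too short
    BlastHit("genomic_1", 88.0, 310),  # homology too low
]
print([h.subject_id for h in significant_hits(example)])
```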
Homologies with cDNA clones were found for only 5 of the 12 analysed INT clones, which may in part be due to the relatively small size of the markers examined (150–350 base pairs). Homologous genomic sequences were found for 11 INT markers (Table 2, see Additional file 1). This allowed us to extend the search for EST homologies significantly, to longer cellular sequences (up to 2,000 bp) flanking the integrated viral genome. For almost all markers (excluding INT431, INT254 and INT407), homologies were found with cDNAs that are highly similar to cDNAs of known genes or with non-identified cDNAs.
In the cases where homology with the same gene or EST was detected both for the INT marker and for the adjacent cellular sequences (markers INT290, INT505, INT466 and INT423), one may conclude that HPV DNA had integrated into a human gene in the respective cervical carcinoma. For marker INT423, viral DNA integrated into the terminal exon of the GLS gene, which provides a clear example of integration into an actively transcribed gene.
The integration site marked as INT466 is of special interest, as it is highly likely that viral DNA is incorporated into exon 5 of the interferon/beta receptor like gene (LOC152028). One part of the marker is homologous to exon 5 and the other part is homologous to exon 6 of this gene.
Marker INT290 was found to be homologous to two genes: the WASF2 gene and the gene for a protein similar to the WASF2 protein (LOC158537), which are located on different chromosomes (1p36.11-34.3 and Xp11.3, respectively).
The corresponding human genomic sequence was not found for one of the markers (INT467) although a high level of homology was found with a cDNA similar to mRNA of multicopy gene 40S ribosomal protein S27 (MPS1), which is localised on several human chromosomes (1, 2, 3, 4, 5, 6, 7, 11, 12, 15, 18, 19).
In four cases (INT259, INT477, INT421 and INT475) no homologies were found between the INT markers and any genes or ESTs, but the sequences flanking these markers at the 5'- or 3'-ends were homologous to ESTs (Table 2, see Additional file 1). For three INT markers (INT254, INT407 and INT431) and their adjacent cellular sequences, no homologies to genes or ESTs were found. For this reason, the physical locations of these markers were determined using RH-mapping. All variants of cDNAs that are homologous to INT markers and flanking genomic sequences are represented by many clones in expression databases (10–50 clones), indicating a high level of expression. These clones were obtained from different normal human organs and tissues, as well as from different pathologies (including tumours).
RH-mapping of INT markers
The known locations of the cDNAs and human genomic sequences identified above allowed us to localise nine INT markers, which are highly homologous to these cDNAs (Table 2, see Additional file 1). Two INT markers (INT467 and INT290) were found to have multiple chromosomal localisations. In seven cases the locations of the integration sites are specific for each tumour and the markers are present as a single copy in one of the chromosomes. RH-mapping was used to localise INT254, INT431 and INT407, for which there were no homologies to genes or ESTs, as well as INT290. In addition, the location of INT259 was also determined using RH-mapping because the homologous sequences were too short (104 bp and 59 bp).
Altogether, five INT markers were localised on the radiation hybrid map of the Whitehead Institute (WI-RH-Map) as a result of our screening (Table 3, see Additional file 2). To allow a convenient comparison of our data with the public databases, we converted the INT marker positions in the WI-RH-Map into GeneMap99-GB4 (GM99-GB4) positions (Table 3, see Additional file 2). This recalculation should not generate any errors because the physical locations of the majority of framework markers are known for both RH-maps. This procedure allows a more complete analysis of information from the public databases. The physical intervals and chromosomal subloci in which the INT sequences are located were determined. The YAC-contigs and genomic contigs to which the mapped markers belong were identified, and their physical localisation on chromosomes was defined (Table 3, see Additional file 2). The physical locations of the five INT markers mapped by us were in good agreement with the locations of the corresponding contigs of human genomic sequences. The locations of these integration sites are also specific for each tumour, and all of the markers are present as a single copy in one of the chromosomes, including INT290 located on Xp11.21. Maps of the chromosomes showing the positions of the INT markers and of genes located in the same chromosomal regions are presented in Fig. 1.
The analysed HPV integration sites were found to be located on different human chromosomes, with no obvious specificity of the integration sites. Nevertheless, it is interesting to note that one of the INT sites (INT431) mapped to the region 13q14-q21, which is saturated with tumour suppressor genes such as CKAP2, LEU1 and CLL-4. Some other genes located around other INT positions may also belong to the group of tumour suppressors. Genes that participate in the development and differentiation of tissues (NUMB, LTBP2, PGF and EDG2), genes encoding signalling proteins (similar to WASF2, ZNF174) and genes encoding cytoskeletal proteins (PCH8, CTNNAL1) are also present in these loci (Fig. 1).
The analysis of the physical locations of integrated HPV 16 DNA expressed in squamous cell carcinomas of the cervix using RH-technology, together with examination of the cellular sequences adjacent to the INTs, allows us to confirm previous studies indicating that viral integration sites are randomly distributed in the human genome [11, 12]. Integration of viral DNA occurs in different regions of chromosomes and does not seem to be site-specific. Interestingly, many important genes participating in cellular growth and differentiation were found to be located around these sites of viral integration (Fig. 1). In addition, three markers (INT254, INT431 and INT505) lie in the area of known fragile sites mapped to 14q23 (INT254 – 14q23.2), 13q21.2 (INT431 – 13q21.23) and 10q23.3 or 10q24.2 (INT505 – 10q23.32).
Analysis of ESTs homologous to INT sequences allowed us to obtain additional information about the putative genes into which viral DNA is incorporated. As a rule, these sequences are similar to genes encoding proteins that are important for cell division, differentiation and cell viability. For instance, the membrane protein myoferlin (gene MYOF – marker INT505) participates in the development and differentiation of muscle tissue [OMIM: 604603]. Ribosomal protein S27 (marker INT467) contains a predicted zinc finger domain of the C4 type and can bind to DNA [OMIM: 603702]. The protein responsible for Wiskott-Aldrich syndrome (mRNA similar to WASF2 gene – marker INT290) belongs to the family of GTP-ases that transduce signals to actin of the cytoskeleton. It was shown that expression of the WASF2 gene induces abnormal accumulation of actin [OMIM: 605875]. Interferon (marker INT466) belongs to the protein factors that are associated with the cellular response to viral infections [OMIM: 107450]. Glutaminase (GLS – marker INT423) participates in the synthesis of glutamate, which appears to be a neurotransmitter [20, 21; OMIM: 138280]. The nuclear gene RTN4IP1 (marker INT475) encodes reticulon 4 interacting protein 1, whose function is unknown [LocusLink ID: 84816].
All these data are in good agreement with other recently published studies [10, 11]. The conclusions of these studies, which used other techniques for the analysis of integration sites, indicated that the sites of integration may be associated with fragile sites as well as with a different spectrum of genes or EST sequences. It is also necessary to point out that we, like Wentzensen et al., analysed only expressed sequences from integration sites in which the cellular sequences are located downstream of the viral sequences.
These data address the question of the possible role of HPV DNA integration in tumour development. Around 50% of cervical tumours contain viral DNA in episomal form [3, 22, 23]. This may indicate that the persistence of the products of the viral transforming genes E6 and E7, which inactivate the products of the tumour-suppressor genes p53 and Rb105 and some cyclins [24–26], is important, and that integration does not play a crucial role in cervical tumour progression.
In our study we also demonstrated that cDNAs that are homologous to INT markers have high levels of expression in cells. These data were also confirmed by our additional experiments, not presented in this manuscript, in which we analysed amplified sequences transcribed from total RNA isolated from different normal and tumour cells of epithelial origin. This suggests that regions with actively transcribed genes are in a decondensed form and are therefore accessible for integration of foreign genetic material. It is possible that integration of viral DNA into actively transcribed regions of the cellular genome is a safety mechanism that secures the viral genetic information.
Twelve human genomic sequences adjacent to integrated HPV 16 DNA (HPV integration sites – INTs) expressed in squamous cell cervical carcinomas have been characterised. A BLASTN homology search was performed for the viral DNA integration sites and their surrounding sequences against the HGMT and EST databases. Eleven INT markers were found to be homologous to human genomic sequences, and 9 of them had significant homologies to known genes or ESTs. The locations of 6 INTs were determined on the basis of the known locations of the corresponding cDNAs. The RH-mapping technique was used to physically localise five HPV INTs: INT254, INT431 and INT407 (no homologies to genes or ESTs), INT290 (homologies to two genomic sequences localised on different chromosomes) and INT259 (short length of homologous sequences).
The physical locations of the five INT markers mapped by us were in good agreement with the locations of the corresponding contigs of human genomic sequences. The locations of these integration sites are also specific for each tumour, and all of the markers are present as a single copy in one of the chromosomes, including INT290 located on Xp11.21. All of the INTs (both those mapped by us and those localised using the genomic positions of homologous cDNAs) are distributed in regions with a high level of expression.
Integration of the HPV genome occurs in different human chromosomes, but in regions that contain highly transcribed genes important for cell viability. One possible interpretation of this phenomenon is that regions with actively transcribed genes are in an extended chromatin configuration and are therefore accessible for integration of foreign genetic material. It is also necessary to point out that we analysed only expressed sequences from integration sites in which the cellular sequences are located downstream of the viral sequences. We cannot exclude the possibility that after integration certain viral sequences become "silent"; this proposal has been confirmed by Kiselev et al. and Van Tine et al.
De Villiers E: Human pathogenic papillomavirus types: an upgrade. Curr Top Microbiol Immunol. 1994, 186: 1-12.
Klaes R, Woerner S, Ridder R, Wentzensen N, Duerst M, Schneider A, Lotz B, Melscheimer P, von Knebel Doeberitz M: Detection of high-risk cervical intraepithelial neoplasia and cervical cancer by amplification of transcripts derived from integrated papillomavirus oncogenes. Cancer Res. 1999, 59: 6132-6136.
Koopman L, Szuhai K, van Eendenburg J, Bezrookove V, Kenter G, Schuuring E, Tanke H, Fleuren G: Recurrent integration of human papillomavirus 16, 45 and 67 near translocation breakpoints in new cervical cancer cell lines. Cancer Res. 1999, 59: 5615-5624.
Sastre-Garau X, Schneider-Maunoury S, Couturier J, Orth G: Human papillomavirus type 16 DNA is integrated into chromosome region 12q14-q15 in a cell line derived from a vulvar intraepithelial neoplasia. Cancer Genet Cytogenet. 1990, 44: 243-251. 10.1016/0165-4608(90)90053-D.
Wilke C, Hall B, Hoge A, Paradee W, Smith D, Glover T: FRA3B extends over a broad region and contains a spontaneous HPV16 integration site: direct evidence for the coincidence of viral integration sites and fragile sites. Hum Mol Genet. 1996, 5: 187-195. 10.1093/hmg/5.2.187.
Cannizzaro L, Durst M, Mendez M, Hecht B, Hecht F: Regional chromosome localization of human papillomavirus integration sites near fragile sites, oncogenes, and cancer chromosome breakpoints. Cancer Genet Cytogenet. 1988, 33: 93-98. 10.1016/0165-4608(88)90054-4.
Thorland E, Myers S, Persing D, Sarkar G, McGovern RM, Gostout B, Smith DI: Human papillomavirus type 16 integrations in cervical tumors frequently occur in common fragile sites. Cancer Res. 2000, 60: 5916-5921.
Wentzensen N, Ridder R, Klaes R, Vinokurova S, Schaefer U, von Knebel Doeberitz M: Characterization of viral-cellular fusion transcripts in a large series of HPV16 and 18 positive anogenital lesions. Oncogene. 2002, 21: 419-426. 10.1038/sj.onc.1205104.
Udina IG, Baranova AV, Kompaniitsev AA, Sulimova GE: Evolutionarily-conserved gene CKAP2, located in region 13q14.3 of the human genome, is frequently rearranged in various tumors. Genetika. 2001, 37: 120-123.
Suetsugu S, Miki H, Takenawa T: Identification of two human WAVE/SCAR homologues as general actin regulatory molecules which associate with the Arp2/3 complex. Biochem Biophys Res Commun. 1999, 260: 296-302. 10.1006/bbrc.1999.0894.
Peitsaro P, Johansson B, Syrjanen S: Integrated human papillomavirus type 16 is frequently found in cervical cancer precursors as demonstrated by a novel quantitative real-time PCR technique. J Clin Microbiol. 2002, 40: 886-891. 10.1128/JCM.40.3.886-891.2002.
Watts K, Thompson C, Cossart Y, Rose B: Sequence variation and physical state of human papillomavirus type 16 cervical cancer isolates from Australia and New Caledonia. Int J Cancer. 2002, 97: 868-874. 10.1002/ijc.10103.
Klimov E. carried out the RH-mapping for INT markers, the bioinformatics part of studies and drafted the manuscript. Vinokourova S. carried out the APOT experiments and sequencing. Moisjak E. participated in the RH-mapping. Rakhmanaliev E. participated in the bioinformatics part of studies. Kobseva V. carried out the sequencing of the integration sites. Laimins L. participated in design of the study and writing of manuscript. Kisseljov F. and Sulimova G. conceived of the study, participated in design of the study and writing of manuscript.
All authors read and approved the final manuscript.
Eugene Klimov and Svetlana Vinokourova contributed equally to this work.
Klimov, E., Vinokourova, S., Moisjak, E. et al. Human papilloma viruses and cervical tumours: mapping of integration sites and analysis of adjacent cellular sequences. BMC Cancer 2, 24 (2002). https://doi.org/10.1186/1471-2407-2-24