Detection of HPV 16 DNA integration sites
The APOT technique was used to isolate human genomic sequences adjacent to integrated HPV 16 DNA (HPV integration sites, INTs) expressed in squamous cell cervical carcinomas. This technique is a modification of RT-PCR that uses two specific primers: one located in the E7 region of the viral genome and a second that contains specific adapter sequences joined to oligo(dT) [2, 3]. As a result we generated fusion transcripts that encompassed parts of the human genomic DNA. Some of the transcripts may be spliced into non-coding sequences within the 3' untranslated part of the respective genomic mRNAs, and others into intronic regions close to the transcription initiation site of the respective gene, as was recently shown by Wentzensen et al. [12]. We have not analysed these types of fusion transcripts in detail, but in all cases our method allowed discrimination between expressed episomal and integrated forms of HPV DNA.
Twelve INT sequences have been characterised from different individual squamous cell cervical carcinomas. The length of the cellular sequences containing polyA tails and fused to viral sequences varied from 140 to 450 nucleotides. The sequences were submitted to the EMBL database (see Methods section for accession numbers) and used to search for homologies to human genomic sequences, genes and ESTs. Primers were designed against these cellular sequences at the integration sites and used to screen a radiation hybrid (RH) panel of somatic cell hybrids (Table 1).
Analysis of nucleotide sequences homologous to integration sites
Searches for human nucleotide sequences homologous to the INT markers were carried out using the BLASTN program and public databases as described in Materials and Methods. Homologies greater than 90% over sequences of at least 100 nucleotides were considered significant.
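The significance criterion above can be expressed as a simple filter over BLASTN hits. The following is a minimal sketch, not the procedure used in this study: it assumes the hits are available in BLAST+ tabular output (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity and alignment length), and the names `Hit`, `parse_tabular`, `is_significant` and the example subject identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int


def parse_tabular(lines):
    """Parse BLAST tabular lines (assumed -outfmt 6 column order) into Hit objects."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), int(fields[3])))
    return hits


def is_significant(hit, min_identity=90.0, min_length=100):
    """Criterion from the text: identity above 90% over at least 100 nucleotides."""
    return hit.percent_identity > min_identity and hit.alignment_length >= min_length


# Hypothetical example lines; the subject IDs and values are made up for illustration.
example = [
    "INT423\tsubjA\t98.5\t240\t3\t0\t1\t240\t1001\t1240\t1e-120\t430",
    "INT423\tsubjB\t92.0\t60\t5\t0\t1\t60\t500\t559\t1e-10\t80",
    "INT259\tsubjC\t88.0\t150\t18\t0\t1\t150\t10\t159\t1e-30\t150",
]
significant = [h for h in parse_tabular(example) if is_significant(h)]
```

Only the first example hit passes: the second is too short and the third falls below the identity threshold.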
Homologies with cDNA clones were found for only 5 of the 12 analysed INT clones, which may in part be due to the relatively small size of the markers examined (150–350 base pairs). Homologous genomic sequences were found for 11 INT markers (Table 2, see Additional file 1). This allowed us to extend the search for EST homologies significantly, to longer cellular sequences (up to 2,000 bp) that flanked the integrated viral genome. For almost all markers (excluding INT431, INT254 and INT407), homologies were found with cDNAs that are highly similar to cDNAs for known genes or with non-identified cDNAs.
In the cases where homology with the same gene or EST was detected both with the INT marker and with the adjacent cellular sequences (markers INT290, INT505, INT466 and INT423), one may conclude that HPV DNA had integrated into a human gene in the corresponding cervical carcinoma. Integration of viral DNA into the terminal exon of the GLS gene (marker INT423) provides a clear example of integration into an actively transcribed gene.
The integration site marked as INT466 is of special interest, as it is highly likely that viral DNA is incorporated into exon 5 of the interferon/beta receptor like gene (LOC152028). One part of the marker is homologous to exon 5 and the other part is homologous to exon 6 of this gene.
Marker INT290 was found to be homologous to two genes: the WASF2 gene and the gene for a protein similar to the WASF2 protein (LOC158537), located on different chromosomes (1p36.11-34.3 and Xp11.3, respectively).
The corresponding human genomic sequence was not found for one of the markers (INT467), although a high level of homology was found with a cDNA similar to the mRNA of the multicopy gene for 40S ribosomal protein S27 (MPS1), which is localised on several human chromosomes (1, 2, 3, 4, 5, 6, 7, 11, 12, 15, 18, 19).
In four cases (INT259, INT477, INT421 and INT475) no homologies were found between the INT markers and any genes or ESTs. However, the sequences flanking these markers at their 5' or 3' ends were homologous to ESTs (Table 2, see Additional file 1). For three INT markers (INT254, INT407 and INT431) and their adjacent cellular sequences, no homologies to genes or ESTs were found. For this reason, the physical locations of these markers were determined using RH-mapping. All variants of cDNAs that are homologous to INT markers and flanking genomic sequences are present at high levels in expression databases (10–50 clones), indicating a high level of expression. These clones were obtained from different normal human organs and tissues, as well as from different pathologies (including tumours).
RH-mapping of INT markers
The known locations of the cDNAs and human genomic sequences identified above allowed us to localise nine INT markers, which are highly homologous to these cDNAs (Table 2, see Additional file 1). Two INT markers (INT467 and INT290) were found to have multiple chromosomal locations. In seven cases the locations of the integration sites are specific for each tumour and the markers are present as a single copy on one of the chromosomes. RH-mapping was used to localise INT254, INT431 and INT407, for which no homologies to genes or ESTs were found, as well as INT290. In addition, the location of INT259 was also determined using RH-mapping because the lengths of the homologous sequences were too short (104 bp and 59 bp).
Altogether five INT markers were localised on the radiation hybrid map of the Whitehead Institute (WI-RH-Map) as a result of our screening (Table 3, see Additional file 2). To allow a convenient comparison of our data with the public databases, we converted the INT marker positions on the WI-RH-Map to GeneMap99-GB4 (GM99-GB4) positions (Table 3, see Additional file 2). This recalculation should not introduce errors because the physical locations of the majority of framework markers are known for both RH-maps. This procedure allows a more complete analysis of information from the public databases. The physical intervals and chromosomal subloci in which the INT sequences are located were determined. The YAC-contigs and genomic contigs to which the mapped markers belong were identified, and their physical localisation on the chromosomes was defined (Table 3, see Additional file 2). The physical locations of the five INT markers mapped by us were in good agreement with the locations of the corresponding contigs of human genomic sequences. The locations of these integration sites are also specific for each tumour, and all of the markers are present as a single copy on one of the chromosomes, including INT290, located on Xp11.21. Fig. 1 presents maps of the chromosomes, indicating the sites of the INT markers and the genes located in the same chromosomal regions.
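The text does not specify how positions were recalculated between the two RH-maps. One plausible approach, shown here only as an illustrative sketch, is linear interpolation between the two framework markers that flank a position and have known coordinates on both maps. The function name `convert_position` and all coordinate values below are hypothetical.

```python
from bisect import bisect_right


def convert_position(pos, framework):
    """Convert a position from a source map to a target map.

    `framework` is a list of (source_pos, target_pos) pairs for markers placed
    on both maps, sorted by source_pos with distinct source positions.
    Assumption: linear interpolation between the two flanking framework markers.
    """
    if len(framework) < 2:
        raise ValueError("need at least two framework markers")
    xs = [src for src, _ in framework]
    if not xs[0] <= pos <= xs[-1]:
        raise ValueError("position lies outside the framework interval")
    i = bisect_right(xs, pos)
    if i == len(xs):  # pos coincides with the last framework marker
        i -= 1
    (x0, y0), (x1, y1) = framework[i - 1], framework[i]
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical framework markers: (position on source map, position on target map).
framework = [(0.0, 0.0), (10.0, 25.0), (30.0, 45.0)]
converted = convert_position(20.0, framework)
```

Because the conversion is anchored to markers shared by both maps, its accuracy depends on how densely those framework markers cover the interval of interest.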
The analysed HPV integration sites were found to be located on different human chromosomes, and no obvious site specificity of integration was observed. Nevertheless, it is interesting to note that one of the INT sites (INT431) mapped to the region 13q14-q21, which is rich in tumour suppressor genes such as CKAP2, LEU1 and CLL-4 [16]. Some other genes located around other INT positions may also belong to the group of tumour suppressors. Genes that participate in the development and differentiation of tissue (NUMB, LTBP2, PGF and EDG2), genes encoding signalling proteins (similar to WASF2, ZNF174) and genes encoding cytoskeletal proteins (PCH8, CTNNAL1) are also present in these loci (Fig. 1).