Characterization of human mesothelin transcripts in ovarian and pancreatic cancer

Background: Mesothelin is an attractive target for cancer immunotherapy due to its restricted expression in normal tissues and high level expression in several tumor types including ovarian and pancreatic adenocarcinomas. Three mesothelin transcript variants have been reported, but their relative expression in normal tissues and tumors has been poorly characterized. The goal of the present study was to clarify which mesothelin transcript variants are commonly expressed in human tumors.
Methods: Human genomic and EST nucleotide sequences in the public databases were used to evaluate sequences reported for the three mesothelin transcript variants in silico. Subsequently, RNA samples from normal ovary, ovarian and pancreatic carcinoma cell lines, and primary ovarian tumors were analyzed by reverse transcription-polymerase chain reaction (RT-PCR) and nucleotide sequencing to directly identify expressed transcripts.
Results: In silico comparisons of genomic DNA sequences with available EST sequences supported expression of mesothelin transcript variants 1 and 3, but there were no sequence matches for transcript variant 2. Newly-derived nucleotide sequences of RT-PCR products from tissues and cell lines corresponded to mesothelin transcript variant 1. Mesothelin transcript variant 2 was not detected. Transcript variant 3 was observed as a small percentage of total mesothelin amplification products from all studied cell lines and tissues. Fractionation of nuclear and cytoplasmic RNA indicated that variant 3 was present primarily in the nuclear fraction. Thus, mesothelin transcript variant 3 may represent incompletely processed hnRNA.
Conclusion: Mesothelin transcript variant 1 represents the predominant mature mRNA species expressed by both normal and tumor cells. This conclusion should be important for future development of cancer immunotherapies, diagnostic tests, and gene microarray studies targeting mesothelin.


Background
Mesothelin is a glycosylphosphatidylinositol (GPI)-anchored cell-surface glycoprotein expressed at low levels by a restricted set of normal adult tissues but aberrantly expressed by ~70% of human ovarian epithelial tumors including up to 100% of serous papillary ovarian cancers, as well as by significant proportions of pancreatic adenocarcinomas, endometrioid uterine adenocarcinomas, mesotheliomas, and squamous cell carcinomas of the esophagus, lung and cervix [1-7]. Full length mesothelin (~69 kD) can be proteolytically cleaved to release a ~33 kD soluble protein corresponding to megakaryocyte potentiating factor (MPF) [8,9]. The biologic functions of mesothelin and MPF remain speculative. Mutant mice with targeted mesothelin gene inactivation are normal, exhibiting no apparent anatomic, hematologic or reproductive abnormalities [10]. Analysis of the mesothelin protein sequence yields no strong homologies to known protein functional domains, beyond a C-terminal GPI-anchor motif. Mesothelin has been very recently reported to bind to CA125/MUC16, which is also commonly expressed on the surface of ovarian tumor cells [11], suggesting that mesothelin might play a role in heterotypic cell adhesion and metastatic spread of ovarian cancer, but data to support this idea are lacking for most human tumors.
Despite limited understanding of mesothelin's biological function, its restricted expression by normal tissues combined with frequent abundant expression by tumors has suggested several applications in clinical oncology. Circulating mesothelin/MPF may have diagnostic potential in mesothelin-positive malignancies [12], and tumor cell-associated mesothelin is being used as a target in ongoing clinical trials of passive immunotherapy using immunotoxins [13,14]. Mesothelin may also be a viable antigen for tumor vaccine therapies (unpublished data presented by E. Jaffee, 10th Annual SPORE Investigators Workshop, July 15, 2002). Further, mesothelin appears to be an important target in the evolving arena of cancer genetic profiling for several major tumor types [4,5,15-17]. Rational design and development of these potential clinical applications would be facilitated by a clear understanding of mesothelin gene and protein expression by both normal cells and tumors.
To date, three variants of human mesothelin transcripts have been reported: variant 1 (GenBank accession NM_005823) encoding MPF [8,9]; variant 2 (NM_013404) encoding mesothelin [18]; and variant 3 (AF180951), a partial alternatively spliced cDNA with a disrupted GPI-anchor motif [19]. Of note, each transcript variant has regions of unique oligonucleotide sequence that could produce differing results in genetic-based expression studies and that encode proteins with unique peptides. It is of particular relevance for the present study that all three mesothelin transcript variants have been reported to be expressed by human cancer cells. Nevertheless, there is a remarkable lack of studies directly investigating relative expression levels of mesothelin transcript variants in normal and malignant tissues, even though multiple gene expression studies of tumor expression profiles conducted to date have included putative mesothelin-specific cDNA or oligonucleotide probes [4,5,15-17]. The present study provides evidence for expression of human mesothelin transcript variants 1 and 3, but not variant 2, in a variety of human tissues and cell lines. Moreover, variant 1 appears to be the predominant mature mesothelin transcript in all studied specimens, including ovarian and pancreatic adenocarcinomas.

Tumor specimens
Ovarian surgical specimens were obtained after informed consent under protocols approved by the Institutional Review Board for Human Experimentation of the University of Alabama at Birmingham. Normal ovary was collected at the time of hysterectomy and bilateral salpingo-oophorectomy. All tissues were collected under sterile conditions and immediately frozen at -70°C. The 7 ovarian tumor samples used in the present study included 6 designated as endometrioid/papillary serous and 1 as papillary serous.

RNA isolation
Total RNA was isolated from frozen tissues and cultured cell lines by RNA STAT-60 reagent (TEL-Test, Friendswood, TX) according to the manufacturer's protocol. Contaminating DNA was removed with RQ RNase-free DNase I (Promega, Madison, WI) and RNA concentration measured spectrophotometrically (GeneQuant II RNA/DNA Calculator, Pharmacia Biotech Ltd., Cambridge, United Kingdom). RNA integrity and quality were analyzed by electrophoresis on a 1.2% denaturing agarose gel. For some experiments, nuclear and cytoplasmic RNA were isolated separately. Briefly, cultured cells were suspended in TKM buffer (10 mM Tris-HCl pH 7.4, 1 mM KCl, 1 mM MgCl2). After 5 min on ice, 10% Triton-X 100 was added and the suspension mixed gently. Nuclear and cytoplasmic fractions were separated by centrifugation. The nuclear pellet was lysed by suspension in RNA STAT-60 and processed as above. Cytoplasmic RNA was isolated from the supernatant fraction by phenol/chloroform extraction and ethanol precipitation.

Oligonucleotides and RT-PCR
RT-PCR was performed using the Gene Amp PCR kit (Roche Molecular Systems, Branchburg, NJ) as described in the manufacturer's protocol, using 1 µg of total RNA as template with random hexamers for RT priming. Negative controls without the addition of RT enzyme were performed for each RNA sample. For PCR, oligonucleotide primers were added at a concentration of 0.3 µM each and amplification was carried out for 35 cycles with an annealing temperature of 65°C. Negative controls without template were performed for each PCR amplification. RT-PCR products were analyzed by electrophoresis on a 1.5% NuSieve GTG (Cambrex Bio Science Rockland, Rockland, ME) plus 1% agarose gel, and products were visualized by ethidium bromide staining. PCR primers (Table 1) were synthesized by Invitrogen Life Technologies (Carlsbad, CA).

DNA sequencing
Amplified PCR products were either directly sequenced or first cloned into the pCR4-TOPO vector (Invitrogen) and subjected to automated dideoxy nucleotide sequencing in the UAB Center for AIDS Research Sequencing Core Facility, using fluorescence-based cycle sequencing and the ABI BigDye Terminator V 3.1 Cycle Sequencing kit with the ABI Prism 3100 Genetic Analyzer (Applied Biosystems, Foster City, California, USA). T3 and T7 primers were used for sequencing genes cloned into pCR4-TOPO: 5'-ATTAACCCTCACTAAAGGGA-3' and 5'-TAATACGACTCACTATAGGG-3'. Sequence data were analyzed by BLAST nucleotide-nucleotide search (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/) against non-redundant and expressed sequence tag (EST) databases.
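As a simple consistency check on the T3 and T7 sequencing primers quoted above, primer length and GC content can be computed directly from the printed sequences. The following Python sketch is illustrative only and was not part of the original methods; the helper names `normalize_primer` and `gc_count` are hypothetical, and the normalization step simply discards the 5'/3' labels and any stray hyphens or spaces.

```python
def normalize_primer(raw: str) -> str:
    """Keep only the bases A, C, G and T, dropping 5'/3' labels, hyphens and spaces."""
    return "".join(ch for ch in raw.upper() if ch in "ACGT")


def gc_count(seq: str) -> int:
    """Number of G and C bases in a normalized primer sequence."""
    return seq.count("G") + seq.count("C")


# Primer sequences as quoted in the text.
t3 = normalize_primer("5'-ATTAACCCTCACTAAAGGGA-3'")
t7 = normalize_primer("5'-TAATACGACTCACTATAGGG-3'")

for name, seq in (("T3", t3), ("T7", t7)):
    print(name, seq, "length:", len(seq), "GC bases:", gc_count(seq))
```

Both primers come out as 20-mers with 8 G/C bases each, which is a quick way to confirm that a sequence has been transcribed without gaps or inserted characters.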

In silico analysis of mesothelin transcripts
The NCBI database contains three reported transcripts for mesothelin/MPF: variant 1 (GenBank accession NM_005823) encoding MPF/mesothelin [8,9], variant 2 (NM_013404) encoding mesothelin [18], and variant 3 (AF180951) partial cDNA encoding mesothelin with an alternatively spliced C-terminus [19] (Figure 1). Transcript variant 1 was originally reported as MPF derived from a human pancreatic tumor cell line; the published sequence is 2085 bp with an open reading frame of 1869 bp [8,9]. Transcript variant 2 was isolated from a HeLa cDNA library [18], and the reported sequence is 2107 bp long with an open reading frame of 1887 bp. The sequences of mesothelin variants 1 and 2 are 98% identical at the nucleotide level and 95% identical at the amino acid level. Differences include several amino acid variations near the N-terminus (amino acids 4-56) and a small insert in variant 2 compared to variant 1 after amino acid 410 of MPF (NP_005814). Full-length mesothelin/MPF protein is predicted to be encoded by 17 exons, occupying approximately 8 kb of human chromosome 16p13.3.
In silico comparison of the reported mesothelin transcripts with genomic sequences (contig clone NT_037887.3 and genomic clone AL031258) suggests that the differences in the N-terminal region between reported transcripts 1 and 2 may have resulted from sequencing errors. Specifically, differences between amino acids 4-56 are most readily explained by incorrect sequences generated in GC rich areas (Figure 2A). The available genomic sequence matches variant 1, suggesting this variant is correct. Consistent with this, 25 of 25 EST sequences obtained by BLAST search matched variant 1 in this region, whereas no EST sequences were found to be identical to variant 2 in this region.
Figure 1. Amino acid sequences of mesothelin transcript variants 1, 2 and 3. Amino acids in single-letter code are displayed for maximal sequence alignment of translated mesothelin variant 1 (tr1, GenBank accession NM_005823), variant 2 (tr2, NM_013404) and variant 3 (tr3, AF180951). (.) indicates amino acid identity, and (-) represents gaps for optimal sequence alignment. Note that tr3 is a partial cDNA lacking N-terminal sequence.
Figure 2. Human genomic DNA sequence alignment with mesothelin transcript variants. Selected portions of human genome sequences corresponding to the mesothelin gene (AL031258) were aligned with corresponding sequences of reported mesothelin transcript variants. (-) represents sequence gaps for optimal alignment. A. The 5'-ends of mesothelin transcript variants 1 and 2 (tr1, NM_005823; tr2, NM_013404) aligned with the genomic DNA sequence. Lower case letters represent presumed untranslated nucleotide sequences. B. Nucleotide sequence at the exon 13 5'-boundary using the same reference database sequences as in (A). Lower case letters in the genomic sequence indicate presumed introns for transcript variant 1. Splice acceptor sites potentially utilized by transcript variants 1 and 2 are underlined, and the arrowhead indicates a third splice acceptor detected by sequence analysis in this study. C. Genomic sequence for the intron 16 region is aligned with sequences for transcript variants 1 (tr1, NM_005823) and 3 (tr3, AF180951). Lower case letters represent nucleotides presumed to be untranslated for tr1. Expected amino acid sequences are shown above each transcript, highlighting the intron 16 insert and reading frame shift predicted for tr3.
The other difference between variants 1 and 2 is an eight amino acid insertion following amino acid 410 (NP_005814). Analysis of the genomic sequence is consistent with the use of an alternative splice acceptor site in exon 13 (Figure 2B), resulting in an additional 24 bp insert in variant 2. Based on BLAST searches of expressed sequences, the 24 bp insert does not appear to be highly expressed in cancer cells, as only one perfect match was found in the dbEST database (CB266931); this contrasts with the adjacent sequence in exon 13 which was identical to more than 50 ESTs. Thus, analysis of existing sequence data is consistent with the existence of variant 1, and with the very rare use of an alternative splice acceptor site in exon 13 that corresponds to variant 2.
The third reported variant of mesothelin is a partial cDNA that includes an 82 bp insert near the C-terminus (AF180951). This sequence, which has not yet been reported as a full length transcript, is predicted to encode an alternative C-terminus with a hydrophilic tail that may produce a secreted form of mesothelin [19]. Genomic sequence data suggest that this 82 bp insert represents an unspliced intron 16 (Figure 2C). BLAST search of the dbEST database with the 82 bp insert sequence indicates that it is rarely expressed, because no ESTs were found that were identical to the entire 82 bp sequence.
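The different coding consequences of the 24 bp and 82 bp inserts described above follow from their lengths alone. The short Python sketch below is an illustration added for clarity, not part of the original analysis; the function name `insert_effect` is hypothetical. It shows that a 24 bp insert is a whole number of codons (8), so the downstream reading frame is preserved, whereas an 82 bp insert is not a multiple of three and therefore shifts the frame.

```python
def insert_effect(insert_bp: int) -> dict:
    """Describe how an insertion of insert_bp base pairs affects a reading frame."""
    whole_codons, remainder = divmod(insert_bp, 3)
    return {
        "insert_bp": insert_bp,
        "whole_codons": whole_codons,   # complete codons contributed by the insert
        "frame_offset": remainder,      # 0 means the downstream frame is unchanged
        "in_frame": remainder == 0,
    }


# Insert sizes taken from the text: exon 13 alternative acceptor (variant 2)
# and retained intron 16 (variant 3).
for label, size in (("variant 2, 24 bp insert", 24), ("variant 3, 82 bp insert", 82)):
    print(label, insert_effect(size))
```

This length check says nothing about whether an in-frame insert contains a stop codon; it only distinguishes frame-preserving from frame-shifting insertions.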

RT-PCR analysis of mesothelin transcripts
Mesothelin/MPF transcripts expressed in human tissues and cell lines were analyzed by RT-PCR. Initial studies used PCR primers corresponding to the 5'-untranslated region and the 3' stop codon region of both transcript variants 1 and 2 (Table 1, full length), and a single PCR amplification product was detected in all studied cell lines and tissue samples (Figure 3). Complete nucleotide sequences were obtained for amplification products from the OVCAR-3 cell line, and partial nucleotide sequences obtained for the 5' regions (≥ 650 nucleotides) of products from HeLa, SKOV-3 and AsPC-1 cells, one normal ovary and two ovarian tumor samples. These demonstrated sequence identity to transcript variant 1 (NM_005823), with the exception of a single nucleotide change (A to G at nucleotide 222, conservative for amino acid) in HeLa, SKOV-3 and one ovarian tumor sample. In addition, for HeLa cells and one of the ovarian tumor samples, nucleotide sequences were also obtained on amplification products generated using a different set of mesothelin-specific primers, with identical results. None of the amplification products corresponded to transcript 2, consistent with the in silico analysis.
To further investigate possible use of the alternative splice acceptor site in exon 13 of mesothelin, as reported in variant 2, we designed primers in exons 12 and 13 (Table 1, 24 bp insert) and amplified RNA from various cell lines and tissues. Amplification of a transcript that utilizes the alternative splice acceptor site (variant 2) should result in a PCR product of 201 bp. However, only a 177 bp amplification product consistent with the size expected from variant 1 was detected (Figure 4B). Nucleotide sequence analysis of amplification products confirmed use of the exon 13 splice acceptor site corresponding to transcript variant 1. We conclude that utilization of the alternative splice acceptor site, as reported for variant 2, is rare in both normal ovary and in pancreatic and ovarian cancer. Interestingly, we also detected a minor amplification product that was smaller than expected (147 bp, Figure 4B). Cloning and sequence analysis demonstrated that this transcript, estimated to comprise less than 5% of the total mesothelin message, utilized an internal splice acceptor site in exon 13 (indicated by the arrowhead in Figure 2B), predicted to encode a protein with an in-frame deletion of 10 amino acids (VATLIDRFVK).
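The three product sizes discussed above are related by simple arithmetic on the 177 bp variant 1 amplicon. The following Python sketch is an added illustration under the assumption that the same primer pair flanks all three splice forms; the function name `expected_amplicon` is hypothetical.

```python
VARIANT1_AMPLICON_BP = 177  # major product reported in the text


def expected_amplicon(reference_bp: int, inserted_bp: int = 0, deleted_bp: int = 0) -> int:
    """Expected product size when a splice form adds or removes sequence between the primers."""
    return reference_bp + inserted_bp - deleted_bp


# Variant 2: the alternative acceptor site adds a 24 bp insert.
variant2_bp = expected_amplicon(VARIANT1_AMPLICON_BP, inserted_bp=24)

# Minor product: internal acceptor site removes 10 codons (VATLIDRFVK).
minor_bp = expected_amplicon(VARIANT1_AMPLICON_BP, deleted_bp=len("VATLIDRFVK") * 3)

print("variant 2 product:", variant2_bp, "bp")
print("minor product:", minor_bp, "bp")
```

The computed sizes agree with the 201 bp product expected for variant 2 and the 147 bp minor product observed on the gel.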
We next investigated expression of mesothelin transcript variant 3 (AF180951), predicted to encode a form of mesothelin with a disrupted GPI-anchor region (exons 16 and 17). PCR primers corresponding to sequences in exons 15 and 17 (Table 1,  Because only a partial cDNA sequence for mesothelin variant transcript 3 has been submitted to the public database, RT-PCR analysis was performed using a 3' primer specific for the 82 bp insert of transcript 3 (intron 16) and a 5' primer located immediately before the start codon of transcript 1 (Table 1,

To test this possibility, cultured cell lines were fractionated to obtain cytoplasmic RNA (representing fully processed mature transcripts) and nuclear RNA (which should include both mature and incompletely processed transcripts). RT-PCR amplification with three cell lines, using the above primers for mesothelin transcripts containing intron 16 (variant 3), indicated that the intron 16-containing transcripts were primarily amplified from the nuclear pool of RNA, and only small amounts of product were recovered from the cytoplasmic RNA (Figure 5C). Thus, although transcript variant 3 appears to be present at low levels in both normal and tumor cells, it seems to be very rare in the cytoplasmic fraction of mature, completely processed RNA.

Discussion
Mesothelin is a promising target for cancer diagnostics and therapy [12-14,19]. In normal cells, expression is primarily restricted to mesothelial cells in the pleural, pericardial, and peritoneal membranes. Limited mesothelin immunoreactivity has also been reported in trachea, tonsil, fallopian tube and kidney [1]. In contrast, mesothelin is abundantly expressed in certain tumors, including up to 100% of serous papillary ovarian tumors [5] and up to 100% of pancreatic tumors [4]. Frequent and high level expression by tumors suggests that mesothelin might play a role in tumorigenesis. Immunohistochemical studies demonstrate that mesothelin expression is directly correlated with progressive development of metastatic pancreatic cancer [20]. Recent evidence that mesothelin may promote cell-cell adhesion in ovarian cancer through heterotypic binding to CA125/MUC16 further supports a role in tumor progression [11]. Circulating mesothelin may serve as a serum marker for early detection and monitoring of mesothelin-positive tumors [12]. However, in order to fully exploit mesothelin as a target for clinical applications, it is important to understand mesothelin expression with regard to the transcript variants that have been reported to date.
Various investigators have referred to this gene and its products as mesothelin [18], MPF [8,9], soluble mesothelin [19], mesothelin family members or soluble mesothelin related proteins (SMR) [12]. The in silico and in vitro studies reported here support mesothelin variant 1 as the major transcript expressed by both normal and malignant cells. Mesothelin variant 2 (NM_013404) was not detected. While it remains possible that nucleotide differences in variant 2 are the result of allelic variation, alignment of the transcript variant 2 sequence with available genomic sequences suggests that the differences in amino acids 4-56 between variants 1 and 2 may be due to sequencing errors. Another feature of variant 2, a 24 bp insertion after base pair 1313 (transcript 2 NM_013404), appears to arise from use of an alternative splice acceptor site upstream of the usual splice acceptor site, and was very rare in our studies. The third reported mesothelin transcript variant, encoding an alternative C-terminus, appears to be largely restricted to the nuclear fraction of RNA and to be infrequent in the cytoplasmic pool of mature mRNA, suggesting that this variant may represent incompletely processed hnRNA. Although definitive studies to detect the soluble isoform of mesothelin protein predicted to be encoded by variant 3 will be needed to rule out the possibility of low level expression, data presented here suggest that variant 1 represents the predominant form of mesothelin expressed by both normal and tumor cells.
Despite the presence of a single major transcript for mesothelin in the samples evaluated here, immunohistochemical studies of mesothelin protein have reported distinct expression patterns in tumor samples. In human pancreatic tumors, some samples exhibited focal staining, and mesothelin was commonly detected in single tumor cells surrounded on all sides by stroma; malignant glands were generally less well stained, but expression was accentuated at luminal borders and luminal contents were frequently positive, suggesting secretion or shedding of mesothelin from tumor cells [4]. Other analyses of a variety of tumors, including ovarian cancer, report mesothelin staining at the cell membrane, in the cytoplasm, or both, depending on the sample; staining was sometimes accentuated at apical cell surfaces and in intracytoplasmic lumens, with extracellular luminal contents again noted to be frequently labeled [5,6,19]. Our own immunohistochemical studies of ovarian tumors (unpublished data) are consistent with the available literature and are notable in that the pattern of localization of mesothelin is consistent within a tumor specimen, with some showing apical localization and apparent secretion of mesothelin, while cytoplasmic staining is prominent in others.
Given that mesothelin transcript variant 1 is widely and highly expressed, the differential localization of mesothelin protein in different tumor samples is most likely a consequence of post-translational processing. Mesothelin variant 1 encodes a GPI anchor motif, and proteins attached to cell plasma membranes by GPI anchors can be readily released in vivo. A relevant example is carcinoembryonic antigen (CEA), a tumor-associated GPI-anchored glycoprotein that is commonly shed by CEA-positive malignant cells [21]. A furin-like protease cleavage site within mesothelin that is probably responsible for the release of the N-terminal MPF portion [9] could conceivably contribute to mesothelin-immunoreactive extracellular contents observed in tumor tissue samples and in serum from tumor patients, but evidence to date suggests that mesothelin-specific antibodies used for both types of studies react with epitopes in the membrane-proximal C-terminal portion of mesothelin. Additional studies delineating the post-translational processing and secretion of mesothelin are warranted.
The basis of the reported cross-reactivity of mesothelin antibodies with other proteins in serum or ascites [19] is not clear, but is not likely to be due to cross-reactivity with endogenous homologous proteins. Current databases contain only one vertebrate gene with significant homology to mesothelin, otoancorin (NM_170664), which has been implicated in autosomal recessive deafness and has 21-22% amino acid identity to mesothelin in the C-terminal region [22]. Interestingly, otoancorin is also located on human chromosome 16p, but its specific function is unknown and its expression is apparently limited to the inner ear.

Conclusions
Mesothelin appears to be predominantly encoded by a single transcript corresponding to the previously reported mesothelin/MPF variant 1 [8,9]. Although alternative splicing of the transcript may occur at low levels, the vast majority of message in primary ovarian tumors, ovarian and pancreatic tumor cell lines, and normal ovary appears to be represented by variant 1. Targeting mesothelin transcript variant 1 products for future clinical applications is predicted to be the most valid approach for developing novel therapies and diagnostic assays for mesothelin-expressing malignancies.