Epidermal growth factor receptor gene polymorphisms are associated with prognostic features of breast cancer

Background The epidermal growth factor receptor (EGFR) is differentially expressed in breast cancer, and its presence may favor cancer progression. We hypothesized that two EGFR functional polymorphisms, a (CA)n repeat in intron 1 and a single nucleotide polymorphism, R497K, may affect EGFR expression and the clinical profile of breast cancer. Methods The study population consisted of 508 Brazilian women with unilateral breast cancer and no distant metastases. Patients were genotyped for the (CA)n and R497K polymorphisms, and the associations between the (CA)n polymorphism and EGFR transcript levels (n = 129), or between either polymorphism and histopathological features (n = 505), were evaluated. The REMARK criteria for tumor marker evaluation were followed. Results (CA)n lengths ranged from 14 to 24 repeats, comprising 11 alleles and 37 genotypes. The most frequent allele was (CA)16 (0.43; 95% CI = 0.40-0.46), which was set as the cut-off length to define the Short allele. Variant (CA)n genotypes had no significant effect on tumoral EGFR mRNA levels, but patients with two (CA)n Long alleles were less likely to be negative for progesterone receptor (ORadjusted = 0.42; 95% CI = 0.19-0.91). The evaluation of the R497K polymorphism indicated a frequency of 0.21 (95% CI = 0.19-0.24) for the variant (Lys) allele. Patients with variant R497K genotypes presented a lower proportion of worse lymph node status (pN2 or pN3) when compared to the reference genotype Arg/Arg (ORadjusted = 0.32; 95% CI = 0.17-0.59), which resulted in lower tumor staging (ORadjusted = 0.34; 95% CI = 0.19-0.63) and lower estimated recurrence risk (ERR) (OR = 0.50; 95% CI = 0.30-0.81). The combined presence of both EGFR polymorphisms (Lys allele of R497K and Long/Long (CA)n) resulted in lower TNM status (ORadjusted = 0.22; 95% CI = 0.07-0.75) and lower ERR (OR = 0.25; 95% CI = 0.09-0.71). When tumors were stratified according to biological classification, the favorable effects of variant EGFR polymorphisms were preserved for luminal A tumors, but not for other subtypes. Conclusions The data suggest that the presence of the variant forms of EGFR polymorphisms may lead to better prognosis in breast cancer, especially in patients with luminal A tumors.


Background
Breast cancer is the most frequent type of cancer in women in both the developed and the developing world [1]. It is a very heterogeneous disease with regard to its molecular profile [2] and its clinical course, which shows great interpatient variability. Although conventional histopathological characteristics remain the most important prognostic determinants of survival [3], there is a continuous search for new biomarkers or staging models that could help predict clinical evolution [4] or improve therapy selection. In this regard, genetic variations in carcinogenesis-related processes are natural candidates for exploring new prognostic factors or potential targets for specific therapies [5,6].
The epidermal growth factor receptor (EGFR) is a transmembrane tyrosine kinase (TK) receptor of the ErbB family, whose activation leads to mitogenic signaling [7]. EGFR is frequently overexpressed in many tumors, including breast cancer, and its activation contributes to unrestricted proliferation, advanced stages of disease, resistance to conventional treatments, and poor prognosis [8]. Despite the recognition that EGFR overexpression in breast tumors may affect disease progression [8], the responses to anti-EGFR therapies in breast cancer are not fully satisfactory [9], and the reasons for this clinical variability are not fully understood.
The EGFR gene, located at 7p12.3-p.1, contains multiple polymorphisms [10], two of which are recognized for their functional effects: a dinucleotide (CA)n repeat sequence polymorphism in intron 1 (rs72554020) affects gene transcription [11], and appears to modulate EGFR expression in breast tumors [12], and a single nucleotide change (G → A) in exon 13 leads to an Arginine (Arg) → Lysine (Lys) substitution in codon 497 (rs11543848), resulting in attenuated TK activity, with consequent reductions in ligand binding, growth stimulation, and induction of proto-oncogenes myc, fos, and jun [13].
In the present work, we aimed to describe the frequency of these two EGFR polymorphisms among Brazilian breast cancer patients, and to evaluate their impact on breast cancer prognosis, exploring the effects of (CA)n polymorphism on EGFR transcript levels, and the associations of both polymorphisms with histopathological features and prognostic estimates.

Subjects and study design
The study population consisted of a prospective cohort of Brazilian women with first diagnosis of unilateral breast cancer and no distant metastases, admitted at the Brazilian National Cancer Institute (INCA) during the period from February 2009 to April 2011, and who were assigned for tumor resection as their first therapeutic approach. The recruitment occurred before surgery, but the inclusion was only completed after diagnosis confirmation by histopathological evaluation of the resected tumor. The study protocol was approved by the Ethics Committee of the Brazilian National Cancer Institute (INCA #129/08), and all patients gave written consent to participate. The REMARK guidelines (REporting recommendations for tumor MARKer prognostic studies) were followed [14].

Histopathological characterization
The histopathological evaluation of resected tumors was performed following institutional routine procedures, and all individual data were obtained from electronic medical records. The histopathological characterization was based on the TNM classification by the American Joint Committee on Cancer [15] and on the Elston-Ellis histological grading system [16].
The data on hormone receptors, i.e. Estrogen Receptor (ER) and Progesterone Receptor (PR), and on the Human Epidermal growth factor Receptor 2 (HER2) status were used for biological classification of the tumors, as proposed by Huober et al. [17]. The Estimated Recurrence Risk (ERR) was inferred by a combination of all histopathological features, as proposed by the Early Breast Cancer Trialists' Collaborative Group [18], with the following categories: "Low Risk", characterized by the presence of [age ≥ 35 years, N0 (absence of tumor cells in lymph nodes), G1 (histological grade 1), T1 (tumor size lower than 2 cm), (ER+ or PR+), HER2-], and absence of peritumoral vascular invasion; "Intermediate Risk", characterized by N0 in the presence of [age < 35 years, or T ≥ 2, or G ≥ 2, or (ER- and PR-), or HER2+], or by N1 (presence of tumor cells in 1 to 3 lymph nodes) in the presence of [HER2-, and (ER+ or PR+)]; and "High Risk", characterized by N1 in the presence of [HER2+, or (ER- and PR-)], or by N ≥ 2 (presence of tumor cells in more than 3 lymph nodes).
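The ERR categories above can be read as a small decision rule. The following Python sketch is only an illustration of that rule, not the procedure actually used in the study; the `Tumor` record and its field names are hypothetical. One assumption is made explicit: an N0 tumor that fails any of the "Low Risk" conditions (including one that fails only because of peritumoral vascular invasion, a case the text does not list under "Intermediate Risk") is assigned to "Intermediate Risk".

```python
from dataclasses import dataclass


@dataclass
class Tumor:
    """Hypothetical record holding the features used by the ERR rule."""
    age: int                 # age at diagnosis, in years
    n_stage: int             # pN category: 0, 1, 2 or 3
    t_stage: int             # pT category: 1, 2, 3 or 4
    grade: int               # Elston-Ellis histological grade: 1, 2 or 3
    er_positive: bool
    pr_positive: bool
    her2_positive: bool
    vascular_invasion: bool  # peritumoral vascular invasion


def estimated_recurrence_risk(t: Tumor) -> str:
    """Return 'Low', 'Intermediate' or 'High' following the categories in the text."""
    hormone_positive = t.er_positive or t.pr_positive

    # N >= 2 is high risk regardless of any other feature.
    if t.n_stage >= 2:
        return "High"

    # N1: high risk if HER2+ or (ER- and PR-), otherwise intermediate.
    if t.n_stage == 1:
        if t.her2_positive or not hormone_positive:
            return "High"
        return "Intermediate"

    # N0: low risk only if every favorable condition holds.
    low = (
        t.age >= 35
        and t.t_stage == 1
        and t.grade == 1
        and hormone_positive
        and not t.her2_positive
        and not t.vascular_invasion
    )
    # Assumption: any N0 tumor that is not low risk is intermediate.
    return "Low" if low else "Intermediate"
```

For example, a 60-year-old patient with a pT1, pN0, grade 1, ER+, HER2- tumor and no vascular invasion would be classified as "Low", whereas the same tumor with one to three positive nodes would be "Intermediate".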

Genotyping analyses
Peripheral blood samples (3 mL) were collected from the subjects, and DNA was extracted using the Blood Genomic Prep Mini Spin Kit (GE Healthcare, Buckinghamshire, UK), following the procedures recommended by the manufacturer. The genotyping analyses were performed using PCR-RFLP for the SNP R497K (rs11543848) or by capillary electrophoresis for the (CA)n repeat polymorphism in intron 1 (rs72554020). The PCR amplifications were performed with the following primers (Life Technologies, Carlsbad, CA, USA): 5′-AGGTCTGCCATGCCTTGT-3′ (sense) and 5′-CAACGCAAGGGGATTAAAGA-3′ (antisense) for R497K; or 5′-TTCTCCTCAAAACCCGGAGAC-3′ labeled with 6-FAM™ (sense) and 5′-GTCACGAAGCCAGACTCGCT-3′ (antisense) for the (CA)n repeat. The R497K PCR products (5 μL) were digested with 5 U of BstNI restriction enzyme (New England BioLabs, Northbrook, IL, USA) at 60°C for 3 hours, and the digestion products were resolved on a 2% agarose gel and stained with ethidium bromide for visualization under UV light. The digestion of the homozygous G alleles (Arginine) produced two fragments (100 bp and 56 bp), whereas the homozygous A alleles (Lysine) remained intact (156 bp). The method was validated by direct sequencing of four samples of each genotype.
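The fragment pattern described above maps directly onto the three R497K genotypes. The Python sketch below illustrates that mapping under two assumptions that are not stated in the text: heterozygous samples show all three bands (156, 100 and 56 bp), and a band is matched to its expected size within a hypothetical tolerance of 5 bp. The function name and parameters are illustrative only.

```python
def call_r497k_genotype(fragments_bp, tolerance=5):
    """Infer the R497K genotype from the fragment sizes seen after BstNI digestion.

    fragments_bp: iterable of observed band sizes in base pairs.
    tolerance: hypothetical sizing tolerance (bp) for matching a band.
    Returns 'Arg/Arg', 'Arg/Lys', 'Lys/Lys', or None if the pattern is not recognized.
    """
    sizes = list(fragments_bp)

    def has_band(expected):
        return any(abs(size - expected) <= tolerance for size in sizes)

    uncut = has_band(156)                   # intact product: A (Lys) allele
    cut = has_band(100) and has_band(56)    # digested product: G (Arg) allele

    if uncut and cut:
        return "Arg/Lys"   # assumption: heterozygotes show all three bands
    if cut:
        return "Arg/Arg"
    if uncut:
        return "Lys/Lys"
    return None
```

A band pattern alone cannot distinguish a true heterozygote from an incompletely digested Arg/Arg sample, so a rule of this kind still needs an independent check such as the direct sequencing used here for validation.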
The (CA)n repeat PCR products (0.5 μl) were denatured at 95°C for 3 min in the presence of 0.5 μl of the GeneScan™ 400HD ROX molecular weight standard (Applied Biosystems, Foster City, CA, USA) and 9.0 μl of Hi-Di™ Formamide (Life Technologies, Carlsbad, CA, USA), cooled to 4°C for 2 min, and then separated by capillary electrophoresis in an ABI Prism® 3130 Genetic Analyzer, using POP7™ polymer (Applied Biosystems, Foster City, CA, USA). The analyses were performed using the GeneMapper® Software v.3.7 (Applied Biosystems, Foster City, CA, USA). The PCR products identified as homozygous, i.e. those presenting a single retention time in the capillary electrophoresis, were submitted to direct sequencing, using the BigDye® Terminator Kit (Applied Biosystems, Foster City, CA, USA), in order to establish a correspondence between each retention time and the respective number of CA repeats (or allele length).

Quantification of EGFR mRNA
Fresh specimens of breast tumors were dissected by clinical pathologists after tumor resection, frozen in liquid N2, and stored at the Brazilian National Bank of Tumors (BNT-INCA). Frozen sections of breast specimens (approximately 2 mm) were used for RNA isolation, which was performed using the RNeasy Mini Kit (Qiagen, Valencia, CA, USA), following the manufacturer's instructions. The RNA samples were stored in RNAse-free distilled water at -80°C, and the corresponding cDNA was synthesized from 2 μg of RNA with the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, USA), according to the manufacturer's instructions.
The relative quantification of EGFR transcripts was performed using quantitative real-time RT-PCR (TaqMan) assays, in an ABI PRISM 7500 Sequence Detector System (Applied Biosystems, Foster City, CA, USA). Each reaction contained: cDNA templates (approximately 40 ng), 10 μl of reaction mix containing 5 μl Taqman® Gene Expression Master Mix, and Taqman® probes, which were as follows: EGFR Hs01076078_m1 (with FAM), PPIA 4326316E (with VIC) (Applied Biosystems, Foster City, CA, USA). The thermal cycling conditions comprised an initial denaturation step at 95°C for 10 min, followed by 40 cycles of denaturation at 95°C for 15 sec and annealing at 60°C for 1 min. The experiments were carried out in 96-well plates, including a nontemplate control and a reference control consisting of cDNA obtained from a commercial Human Mammary Gland (HMG) total RNA (Clontech Laboratories, Mountain View, CA, USA). The relative quantification of EGFR mRNA was calculated as the average $2^{-\Delta\Delta C_t}$, where $\Delta\Delta C_t = \Delta C_{t,\mathrm{EGFR}} - \Delta C_{t,\mathrm{HMG}}$, $\Delta C_{t,\mathrm{EGFR}} = C_{t,\mathrm{EGFR}} - C_{t,\mathrm{PPIA}}$, and $\Delta C_{t,\mathrm{HMG}} = C_{t,\mathrm{HMG}} - C_{t,\mathrm{PPIA}}$. All data were generated in triplicate and expressed as median ± SD with the 25th-75th percentiles.
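The comparative Ct calculation described above can be sketched in a few lines of Python. This is an illustration only: the function and argument names are hypothetical, the Ct values in the usage example are invented, and the sketch averages the triplicate Ct values before computing the fold change, which is one possible reading of how the triplicates were combined.

```python
from statistics import mean


def relative_expression(ct_egfr_tumor, ct_ppia_tumor, ct_egfr_hmg, ct_ppia_hmg):
    """Relative EGFR mRNA level by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values (e.g. triplicates):
    EGFR and PPIA measured in the tumor sample, and EGFR and PPIA measured
    in the HMG reference sample.
    """
    # dCt normalizes EGFR to the endogenous control (PPIA) within each sample.
    delta_ct_tumor = mean(ct_egfr_tumor) - mean(ct_ppia_tumor)
    delta_ct_hmg = mean(ct_egfr_hmg) - mean(ct_ppia_hmg)

    # ddCt compares the tumor with the HMG reference.
    delta_delta_ct = delta_ct_tumor - delta_ct_hmg
    return 2 ** (-delta_delta_ct)


# Invented Ct values: the tumor dCt is one cycle higher than the reference dCt,
# so the relative expression is half that of the reference.
fold_change = relative_expression(
    ct_egfr_tumor=[26.0, 26.0, 26.0],
    ct_ppia_tumor=[20.0, 20.0, 20.0],
    ct_egfr_hmg=[25.0, 25.0, 25.0],
    ct_ppia_hmg=[20.0, 20.0, 20.0],
)
```

A value of 1 means expression equal to the HMG reference; values below 1 indicate lower, and values above 1 higher, relative EGFR mRNA levels.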

Statistical analyses
A descriptive study of the cohort was conducted, presenting measures of central tendency and dispersion for continuous variables, or relative frequencies for each categorical variable. Allelic and genotypic frequencies were derived by gene counting. The histopathological features were dichotomized for better and worse prognostic values, and their associations with EGFR genotypes were evaluated by the Chi-square or Fisher's exact tests. In the cases of significant associations between EGFR genotypes and independent histopathological variables, the odds ratios (OR) and their respective 95% confidence intervals (95% CI) were tested for linear-by-linear associations, with calculation of trend significances (Ptrend), and definition of phenotypic inheritance models. The odds ratios between EGFR phenotypic groups and histopathological categorical features were adjusted for all other independent clinical variables (ORadjusted) using multiple regression analyses. The comparison of the relative quantities of EGFR mRNA as a function of histopathological features or EGFR genotypes was performed with the GraphPad Prism 5.0 software (GraphPad Software, La Jolla, CA, USA), using the non-parametric Mann-Whitney U-test for comparison of two groups, or the Kruskal-Wallis test for comparison of multiple groups. All other statistical analyses were conducted using SPSS 13.0 for Windows (SPSS Inc., Chicago, Illinois). The threshold for significance was set at P < 0.05.
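For readers less familiar with the measure, an unadjusted odds ratio and its 95% CI can be computed from a 2 x 2 table as sketched below in Python. The sketch assumes the log-based (Woolf) interval, which may differ from the method implemented in the statistical package used in this study; the function name and the counts are hypothetical and do not come from the study data. Adjusted odds ratios require a regression model and are not covered by this sketch.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a log-based (Woolf) confidence interval.

    a: variant genotype, worse prognostic category
    b: variant genotype, better prognostic category
    c: reference genotype, worse prognostic category
    d: reference genotype, better prognostic category
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=10, b=90, c=20, d=80)
print(f"OR = {or_value:.2f}; 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a significant association at the 0.05 level for this unadjusted comparison.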

Characterization of the cohort
A total of 576 patients were recruited when admitted for surgery, and 528 had the diagnosis and inclusion criteria confirmed after pathological evaluation of their resected tumor. Blood samples were available for 511 cases, and 508 of them had good DNA quality for genotyping assays. Table 1 presents the main clinical and histopathological characteristics, as well as the genotypic distribution of EGFR polymorphisms for the 508 patients evaluated. The median age was 59 years old, ranging from 27 to 92.

Characterization of EGFR polymorphisms
The genotyping of the R497K polymorphism was obtained for 505 patients, whereas the characterization of the number of (CA)n repeats by electrophoresis was conclusive in 477 cases (Table 1). The frequency of the variant R497K allele (Lys) was 0.21 (95% CI = 0.19-0.24).
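Allele frequencies of this kind follow from gene counting: each patient contributes two alleles, so the variant frequency is the number of variant alleles divided by twice the number of genotyped patients. The Python sketch below illustrates the calculation with a normal-approximation (Wald) confidence interval, which is an assumption, since the interval method is not specified here. The function name and the genotype counts are hypothetical and are not the study data.

```python
import math


def allele_frequency_ci(n_hom_ref, n_het, n_hom_var, z=1.96):
    """Variant allele frequency by gene counting, with a Wald confidence interval.

    n_hom_ref: patients homozygous for the reference allele (e.g. Arg/Arg)
    n_het:     heterozygous patients (e.g. Arg/Lys)
    n_hom_var: patients homozygous for the variant allele (e.g. Lys/Lys)
    """
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_var)
    n_variant = n_het + 2 * n_hom_var
    p = n_variant / n_alleles
    # Assumption: normal approximation to the binomial for the interval.
    se = math.sqrt(p * (1 - p) / n_alleles)
    return p, p - z * se, p + z * se


# Hypothetical genotype counts, for illustration only.
freq, lower, upper = allele_frequency_ci(n_hom_ref=110, n_het=80, n_hom_var=10)
print(f"{freq:.2f} (95% CI = {lower:.2f}-{upper:.2f})")
```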
The evaluation of the (CA)n lengths indicated a range of 14 to 24 repeats (Figure 1), comprising 11 alleles and 37 genotypes. The most frequent allele was (CA)16 (0.43; 95% CI = 0.40-0.46), which was taken as the cut-off length to group Short alleles. All the other variant alleles, with more than 16 (CA) repeats, were considered Long alleles. Thus, the genotypic distribution used for further analyses was: Short/Short (reference homozygous genotype), Short/Long (heterozygous) and Long/Long (variant homozygous genotype).
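The grouping rule above can be written as a short Python sketch, in which alleles with 16 or fewer repeats count as Short and alleles with more than 16 repeats count as Long. The function and constant names are illustrative only.

```python
SHORT_CUTOFF = 16  # (CA)16, the most frequent allele, is the longest Short allele


def is_long(repeats: int) -> bool:
    """An allele is Long when it has more than 16 CA repeats."""
    return repeats > SHORT_CUTOFF


def ca_genotype(allele_1: int, allele_2: int) -> str:
    """Collapse a pair of repeat counts into the three analysis genotypes."""
    n_long = int(is_long(allele_1)) + int(is_long(allele_2))
    return ("Short/Short", "Short/Long", "Long/Long")[n_long]


# With 11 alleles (14 to 24 repeats), the number of possible unordered
# genotypes is 11 * 12 / 2 = 66, of which 37 were observed in this cohort.
alleles = range(14, 25)
possible_genotypes = len(alleles) * (len(alleles) + 1) // 2
```

Collapsing the observed genotypes into three classes is what makes the downstream association analyses tractable, at the cost of treating, for instance, (CA)17 and (CA)24 as equivalent.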

Characterization of EGFR mRNA expression in breast tumors, and evaluation of the influence of (CA)n genotypes and of histopathological characteristics
The EGFR mRNA expression levels were evaluated in fresh-frozen tumor samples from 129 patients. Table 2 shows the main clinical and histopathological characteristics, as well as the genotypic distribution of EGFR polymorphisms in this subcohort. The data are presented in comparison with those described for the general population (Table 1). The results indicate that the subcohort whose tumors were used for expression analyses is similar to the general population, except for tumor size and nodal status, which tend to be higher in the former. This difference is caused by the institutional biobank policy, which restricts collection of tumors smaller than 1 cm for non-diagnostic purposes.
The relationship between EGFR mRNA expression levels and (CA)n genotypes or prognostic categories of breast tumors was explored (Figure 2). The results indicate no differences related to (CA)n genotypes (Figure 2A), whereas the lymph node status (Figure 2B) and the biological subclassification (Figure 2C) showed significant influences. The EGFR mRNA expression levels were significantly higher for patients with worse lymph node status, as well as for triple-negative tumors when compared to all other subgroup classifications (p = 0.003). As a consequence of these two associations, patients with higher ERR presented higher EGFR mRNA expression levels (Figure 2D).

Association between EGFR genotypes and prognostic variables
Table 3 presents the distribution of R497K and (CA)n genotypes according to prognostic categories. The distribution of R497K genotypes was statistically different as a function of the lymph node status, whereas the distribution of (CA)n genotypes was statistically different as a function of the PR status. The association between R497K genotypes and lymph node status, or between (CA)n genotypes and the PR status, is further explored in Figure 3. Figure 3A shows that patients with the heterozygous genotype Arg/Lys presented a lower proportion of the worse lymph node status (pN2 or pN3) when compared to the reference homozygous genotype Arg/Arg (OR = 0.42; 95% CI = 0.23-0.76), whereas among patients with the homozygous variant genotype Lys/Lys (n = 21), there was only 1 case of pN2 or pN3 (OR = 0.21; 95% CI = 0.028-1.60). These results indicate that the magnitude of the association between the R497K polymorphism and lymph node status depends on the number of variant Lys alleles (Ptrend = 0.001). Similarly, Figure 3B shows the impact of the number of variant (CA)n alleles on the proportion of negative PR status. The results indicate an apparently progressive effect of the number of Long (CA)n alleles (Ptrend = 0.008). Thus, patients with the Short/Long genotype showed a slightly lower proportion of negative PR status when compared to the reference Short/Short genotype (OR = 0.72; 95% CI = 0.44-1.19), and a significant protective effect was observed for the variant Long/Long genotype (OR = 0.46; 95% CI = 0.26-0.83).

Interaction between EGFR polymorphisms
The above trend analyses suggested an inheritance model of codominance for the association between the R497K polymorphism and lymph node status, and of recessiveness for the (CA)n Long allele and PR status. Thus, the genotypes Arg/Lys and Lys/Lys were grouped for evaluation of their impact on lymph node status, whereas the (CA)n Long/Long genotype was evaluated in comparison with the combined Short/Short and Short/Long genotypes for its effect on the PR status. The two EGFR polymorphisms were also evaluated in a combined analysis in order to investigate a possible interaction between them on the distribution of breast cancer prognostic features (Table 4).
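The genotype groupings used for these analyses can be summarized in a short Python sketch. It only restates the grouping described above; the function names and group labels are hypothetical.

```python
def r497k_group(genotype: str) -> str:
    """Group Arg/Lys and Lys/Lys together as carriers of the Lys allele."""
    return "Lys carrier" if genotype in {"Arg/Lys", "Lys/Lys"} else "Arg/Arg"


def ca_group(genotype: str) -> str:
    """Compare Long/Long against the combined Short/Short and Short/Long genotypes."""
    return "Long/Long" if genotype == "Long/Long" else "Short/Short or Short/Long"


def has_both_variants(r497k_genotype: str, ca_genotype: str) -> bool:
    """True when a patient carries the Lys allele and the Long/Long (CA)n genotype."""
    return (
        r497k_group(r497k_genotype) == "Lys carrier"
        and ca_group(ca_genotype) == "Long/Long"
    )
```

Under this grouping, a patient with the genotypes Arg/Lys and Long/Long falls in the combined variant group, whereas a patient with Lys/Lys and Short/Long does not.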
The results indicate a significantly protective effect of the Lys allele on the proportion of the worse lymph node status after adjustment for other independent individual prognostic variables. As a consequence, patients carrying the Lys allele showed lower TNM status and lower ERR. With regard to the (CA)n polymorphism, the association between the (CA)n Long/Long genotype and PR negative status also remained significant after adjustment for other independent individual prognostic variables (ORadjusted = 0.42; 95% CI = 0.19-0.91), but did not affect TNM status or the ERR. When both variant EGFR polymorphisms were present, TNM status (ORadjusted = 0.22; 95% CI = 0.07-0.75) and ERR (OR = 0.25; 95% CI = 0.09-0.71) were lower.
The stratification of breast tumors according to their biological classification indicates that the association between combined variant EGFR polymorphisms and better lymph node status occurs for tumors classified as luminal A, but not for the other biological subtypes.

Discussion
The distribution of the two EGFR functional polymorphisms in the Brazilian population was not known before the current study. Our data indicate a frequency of 0.21 (95% CI = 0.19-0.24) for the 497K (Lys) allele, and of 0.43 (95% CI = 0.40-0.46) for the (CA)16 allele. These results are similar to the frequencies reported for Europeans and North-Americans (including African-Americans), either for the R497K polymorphism [19,20] or for (CA)n [12,21]. Asian populations, however, appear to have higher frequencies of the Lys allele [22,23], and different patterns of (CA)n alleles [12,21,24,25].
One difficulty in evaluating the effects of the (CA)n polymorphism on gene transcriptional activity in vivo is the wide distribution of the number of (CA) repeats, with various possible heterozygous genotypes, and no clear model of how the two alleles interact to produce the final cell phenotype. Amador et al. [26] considered the sum of CA repeats of both alleles and showed an inverse correlation between this combined length and the levels of EGFR mRNA in head and neck cancer cell lines. Buerger et al. [12], studying breast tumors, considered the length of the smaller allele, and showed a non-significant tendency for lower EGFR protein expression with increasing allele length. Accordingly, Buerger et al. [27] showed that breast tumors from Japanese patients, who present high frequencies of (CA)20 and other long alleles, had lower amounts of EGFR protein than tumors from German patients, who have a predominance of (CA)16 and other short alleles. Other authors, however, found no correlation between the length of the (CA)n region and the relative quantification of EGFR mRNA [28] or EGFR protein expression [29].
Our data confirm the wide dispersion of (CA)n lengths and indicate great variability in the expression of EGFR mRNA, with no apparent inverse correlation between expression and the number of (CA) repeats, considering either the smaller allele or the combined length within each genotype (data not shown). In order to investigate a possible effect of somatic mutations on the tumoral (CA)n genotype, we evaluated a set of 40 tumor samples. The number of CA repeats was preserved in relation to genomic DNA in all cases (data not shown). Although we did not extend such analyses to all patients, it appears that mutational events, such as loss of heterozygosity, are not affecting the EGFR locus of breast tumors. Nevertheless, an accurate characterization of the impact of EGFR polymorphisms on the gene transcriptional activity in vivo would ideally include quantification of gene amplification in the tumors [27]. In addition, there are two other EGFR polymorphisms (-216G/T or rs712829 and -191C/A or rs712830), located in the promoter region, which might have a functional impact on EGFR transcriptional activity [30]. Finally, epigenetic variations may also interfere with EGFR expression [31].
The evaluation of the impact of EGFR polymorphisms on histopathological and molecular characteristics of breast cancer indicated significant associations between R497K variant genotypes and better lymph node status, corroborating the findings of Kallel et al. [32], and between Long/Long (CA)n genotypes and positive PR status. These two associations seem protective in relation to breast cancer evolution, since a greater number of affected lymph nodes increases the risk of systemic metastasis [33], and the lack of PR expression increases the risk of disease progression, especially in post-menopausal women [34].
With regard to the molecular mechanisms underlying lymph node metastases, EGFR appears to activate integrins [35] and metalloproteinases [36], favoring cell differentiation towards an invasive phenotype. The association between the variant allele (Lys) and better lymph node status appears to corroborate the notion of reduced signaling with the variant EGFR isoform [13], leading to lower invasiveness, which reinforces the role of EGFR in breast cancer pathogenesis. The interaction between the EGFR activity and the PR status might occur via a cross-talk mechanism between steroid and growth factor receptors [37], resulting in activation of the PI3K-Akt-mTOR pathway, which appears to negatively modulate the transcriptional activity of the PR [38]. This negative modulation of ER-mediated functions in breast cancer via EGFR signaling may underlie the mechanism of resistance to hormone therapy observed in tumors with high EGFR expression [39]. Taken together, the associations of EGFR polymorphisms with lymph node metastases and negative PR status appear to corroborate the role of EGFR in breast cancer pathogenesis.
The combined presence of Long/Long (CA)n genotypes and Lys R497K alleles appears to favor better prognostic estimates in breast cancer. Other studies involving different types of cancer also point to an interaction between the two EGFR polymorphisms, with a combined protective effect in relation to disease progression. Zhang et al. [40], evaluating pelvic recurrence in patients with rectal cancer treated with chemoradiation, showed that the highest risk for local recurrence was seen in patients with the reference genotypes, i.e., both 497 Arg alleles and <20 CA repeats. Bandrés et al. [41], studying head and neck cancer, showed that patients with at least one 497 Arg allele and both (CA)n repeats ≤ 16 presented a higher risk of death. Press et al. [42], studying metastatic colon cancer, found that men with the Arg/Arg genotype and two short alleles (< 20 CA repeats) had shorter overall survival than men with the Lys/Lys or Arg/Lys variant genotypes and any long allele (≥ 20 CA repeats).
The stratification of breast tumors according to their biological subtypes suggests that the apparently protective effects of EGFR polymorphisms are characteristic of luminal A tumors. This apparently selective effect of EGFR polymorphisms might be due to the lower genomic instability of luminal A tumors in relation to other subtypes, which present more aggressive phenotypes due to superposed molecular alterations [43]. Nevertheless, the small number of non-luminal A tumors limits the statistical power of the analyses and the confidence in this assumption. In addition, the apparently favorable associations of EGFR polymorphisms with prognostic features at diagnosis cannot be considered as actually predictive of disease progression or therapy response.

Conclusions
In conclusion, the current results indicate a potential benefit of EGFR polymorphisms as independent prognostic factors, especially in early-stage luminal A tumors, as they might help to identify patients at higher risk of progression. We propose that EGFR genotyping should be further evaluated for its prognostic value in prospective studies of breast cancer survival.

Ethical standards
The study was conducted following the international precepts of ethics in research and of good clinical practice. The authors complied with the Brazilian regulation of clinical research. The protocol was approved by the Ethics Committee of the Brazilian National Cancer Institute (INCA #129/08), and all patients gave written consent to participate.