Study population
We collected breast cancer cases from the Samsung Medical Center in Seoul, Korea. Inclusion criteria for this study were: 1) histologically verified incident breast cancer; 2) female; 3) between 20 and 80 years of age; 4) enrolled between January 1, 1995 and December 31, 2002; 5) breast tissue samples available for study. All subjects were diagnosed with stage I to III primary breast cancer and underwent surgical treatment and adjuvant chemotherapy according to standard treatment protocols. Radiation therapy was given to patients who underwent breast-conserving surgery or who had stage III breast cancer. All ER-positive patients received hormonal therapy with tamoxifen. No patients received anti-HER2 therapy. We reviewed each patient's medical record for clinical information, including follow-up status and outcome information. Breast cancer stage was classified according to the American Joint Committee on Cancer (AJCC) TNM criteria (6th edition). For surviving subjects, the last date of follow-up was June 30, 2007. The study protocol was approved by the institutional review board (IRB) of the Samsung Medical Center.
Immunohistochemistry and fluorescent in situ hybridization
The original H&E-stained slides from the patients in the retrospective cohort were reviewed, and representative tumor regions without secondary changes, such as hemorrhage, necrosis, and fibrosis, were marked by a pathologist (YL Choi). The corresponding 1290 formalin-fixed, paraffin-embedded (FFPE) breast cancer tissue blocks were obtained. Two 2 mm cores were taken from each case, and two sets of tissue microarray (TMA) paraffin blocks were made. The sections were deparaffinized with xylene, hydrated in serial dilutions of alcohol, and then immersed in 3% hydrogen peroxide solution to neutralize endogenous peroxidase activity. Next, sections were microwaved in citrate buffer for antigen retrieval. Slides were incubated with monoclonal antibodies against CK5/6 (1:100, M7237, DAKO, Carpinteria, CA, USA), HER2 (1:250, A0485, DAKO), and EGFR (1:30, M7239, Novocastra) for 1 hour at room temperature. After washing, the tissue sections were incubated with the biotinylated anti-mouse secondary antibody, followed by incubation with streptavidin-horseradish-peroxidase complex. Slides were washed, and the chromogen was developed for 5 minutes with liquid 3,3'-diaminobenzidine (DAKO). The HER2 fluorescent in situ hybridization (FISH) assay was performed with the PathVysion HER2 DNA Probe Kit (Abbott Molecular, Inc.) according to the manufacturer's instructions. The average copy number per cell was determined for each probe, and the amplification ratio was calculated as the average HER2 copy number per cell divided by the average centromere 17 copy number per cell.
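As an illustration of the amplification ratio described above, the following minimal Python sketch computes it from per-cell signal counts. The function name and the example counts are hypothetical and are not taken from the study data.

```python
def her2_amplification_ratio(her2_counts, cep17_counts):
    """Mean HER2 signals per cell divided by mean centromere 17 (CEP17) signals per cell."""
    if not her2_counts or len(her2_counts) != len(cep17_counts):
        raise ValueError("need one HER2 and one CEP17 count per scored cell")
    mean_her2 = sum(her2_counts) / len(her2_counts)
    mean_cep17 = sum(cep17_counts) / len(cep17_counts)
    if mean_cep17 == 0:
        raise ValueError("mean CEP17 count is zero; the ratio is undefined")
    return mean_her2 / mean_cep17


# Hypothetical signal counts for four scored nuclei (illustrative only).
ratio = her2_amplification_ratio([8, 10, 6, 8], [2, 2, 2, 2])
```

Averaging each probe over the scored cells before dividing follows the description in the text, as opposed to averaging per-cell ratios.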
Two pathologists (YL Choi, JS Choi), blinded to the clinical outcomes of the patients, independently scored the staining results. ER and PR staining data were obtained from the pathology reports. ER and PR staining was scored using the Allred score (AS), a semi-quantitative method that sums the proportion of positive cells (scored on a 0 to 5 scale) and the staining intensity (scored on a 0 to 3 scale), for a maximum score of 8; an AS > 2 was considered positive [14]. The CK5/6, EGFR and HER2 immunohistochemical results were obtained from the TMAs and were considered positive when at least one core met the following criteria. CK5/6 stains were considered positive if any cytoplasmic and/or membranous staining was observed. Immunostaining for EGFR was interpreted as positive when at least 10% of the tumor cells showed moderate to strong membranous staining [15]. HER2 positivity was defined as an intensity of 3+ by IHC or, for cases with an IHC intensity of 1+ or 2+, as a gene amplification ratio of ≥ 2.0 by FISH [16].
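The positivity criteria above can be expressed as simple rules. The following Python sketch is illustrative only: the function names are hypothetical, and two behaviors are assumptions not stated in the text, namely that a HER2 IHC intensity of 0 is called negative and that a 1+ or 2+ case without a FISH result is called negative.

```python
def allred_score(proportion_score, intensity_score):
    """Allred score: proportion score (0-5) plus intensity score (0-3), maximum 8."""
    if not 0 <= proportion_score <= 5:
        raise ValueError("proportion score must be between 0 and 5")
    if not 0 <= intensity_score <= 3:
        raise ValueError("intensity score must be between 0 and 3")
    return proportion_score + intensity_score


def hormone_receptor_positive(proportion_score, intensity_score):
    """ER or PR is called positive when the Allred score exceeds 2."""
    return allred_score(proportion_score, intensity_score) > 2


def egfr_positive(percent_moderate_to_strong):
    """EGFR is positive when at least 10% of tumor cells stain moderately to strongly."""
    return percent_moderate_to_strong >= 10


def her2_positive(ihc_intensity, fish_ratio=None):
    """HER2 is positive at IHC 3+, or at IHC 1+/2+ with a FISH ratio of at least 2.0."""
    if ihc_intensity == 3:
        return True
    if ihc_intensity in (1, 2) and fish_ratio is not None:
        return fish_ratio >= 2.0
    # Assumption: IHC 0, or 1+/2+ without a FISH result, is treated as negative.
    return False
```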
Definition of breast cancer subtypes by immunohistochemistry
The immunohistochemical surrogate panel (ER, PR, HER2, EGFR, and CK5/6) used to define the breast cancer subtypes has been previously published [2, 17]. In this study, we used two subtyping schemes. In the first, each case was classified as one of four IHC-based subtypes: luminal A (ER+ and/or PR+, HER2−), luminal B (ER+ and/or PR+, HER2+), HER2 (ER−, PR−, and HER2+), and TNBC (ER−, PR−, and HER2−) [17]. In the second, TNBCs were further divided into BLBCs and QNBC/5NPs according to the basal markers, giving five subtypes. TNBC expressing either EGFR or CK5/6 was defined as BLBC (ER−, PR−, HER2−, and CK5/6+ and/or EGFR+). Breast tumors that were TN and expressed neither CK5/6 nor EGFR were defined as 'quintuple-negative breast cancer' (QNBC/5NP) (ER−, PR−, HER2−, CK5/6−, and EGFR−).
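The subtype definitions above amount to a small decision rule over the five markers. A minimal Python sketch follows; the function names are hypothetical, and each marker is assumed to have already been dichotomized to a boolean (True for positive).

```python
def classify_five_subtypes(er, pr, her2, ck56, egfr):
    """Assign one of the five IHC-based subtypes from boolean marker statuses."""
    if er or pr:
        # Hormone receptor positive: luminal subtype, split by HER2 status.
        return "luminal B" if her2 else "luminal A"
    if her2:
        return "HER2"
    # ER-, PR-, HER2-: triple negative, split by the basal markers.
    if ck56 or egfr:
        return "BLBC"
    return "QNBC/5NP"


def collapse_to_four_subtypes(subtype):
    """Merge BLBC and QNBC/5NP back into TNBC for the four-subtype scheme."""
    return "TNBC" if subtype in ("BLBC", "QNBC/5NP") else subtype


# Example: an ER-, PR-, HER2- tumor that expresses EGFR but not CK5/6.
example = classify_five_subtypes(er=False, pr=False, her2=False, ck56=False, egfr=True)
```

Under this rule the basal markers are consulted only for triple-negative tumors, so CK5/6 or EGFR positivity does not change the label of a luminal or HER2 case.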
Statistical Analysis
Disease-free survival (DFS) was defined as the time from the date of diagnosis to the date of documented relapse, including locoregional recurrence and/or distant metastasis. Overall survival (OS) was expressed as the number of months from diagnosis to the date of death. Differences in the frequencies of basic characteristics, clinical parameters, and subtypes were analyzed using the chi-square test, or Fisher's exact test when fewer than five cases were expected. For multiple comparisons, chi-square test results were adjusted with Bonferroni's correction. Survival curves were constructed using the Kaplan-Meier method, and the log-rank test was used to compare survival across subtypes. For multivariate analysis, Cox regression models were built to estimate the hazard ratios (HRs) of breast cancer subtypes, adjusted for tumor size, lymph node involvement and adjuvant chemotherapy. To test the statistical significance of the difference between model 1 (5-subgrouping) and model 2 (4-subgrouping), a likelihood ratio test was used. The null hypothesis was that model 2 did not predict survival differently than model 1. Statistical significance was defined as P < 0.05. All statistical analyses were performed using SPSS 15.0 and SAS 9.1 statistical software packages.
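The likelihood ratio comparison of the two nested subtyping models can be illustrated as follows. This Python sketch is not the SPSS or SAS procedure used in the study. It assumes that the two models differ by a single parameter (splitting TNBC into BLBC and QNBC/5NP adds one subtype indicator), so the test statistic is referred to a chi-square distribution with 1 degree of freedom. The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns the test statistic and its P value. For 1 degree of freedom the
    chi-square upper tail probability equals erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model should fit at least as well as the reduced model")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized partial log-likelihoods (illustrative only):
# model 1 (5 subgroups, full) and model 2 (4 subgroups, reduced).
statistic, p_value = likelihood_ratio_test_1df(-512.3, -515.1)
significant = p_value < 0.05
```

With more than one additional parameter, the closed-form tail probability above no longer applies and a general chi-square survival function would be needed.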
Microarray Analysis
We classified TNBC into BLBC and QNBC/5NP in two independent public gene expression datasets, Vijver et al. (316 samples) and Wang et al. (286 samples) [18–20]. The Vijver et al. dataset was generated with 2-color oligo chips (Agilent, Hu25K) and the Wang et al. dataset with 1-color oligo chips (Affymetrix, U133X3P). Each dataset consists of a large number of random breast cancer patients. Vijver et al. is available at http://www.rii.com/publications/2002/nejm.html, and Wang et al. can be downloaded from the NCBI GEO data repository (GSE2034). For Wang et al., gene expression values were centered by subtracting the mean value of each probe set across the samples from each measured value. Both datasets included only ER IHC information, so the other four IHC results (PR, HER2, CK5/6 and EGFR) were dichotomized into 'positive (+)' and 'negative (-)' by the mRNA expression levels of the corresponding genes on the microarray chips. Under the assumptions that Cheang et al.'s cohort of 4,046 breast samples was representative of a random sample of the breast cancer population and that its proportion of ER-positive IHC results (70.5%) was similar to that in the selected microarray datasets (Vijver et al.: 76%, Wang et al.: 72%), we used the proportion of positive and negative IHC results for each marker in their dataset to determine the cut-off for the surrogate mRNA expression in the selected microarray datasets [13]. For instance, the cut-off for KRT5, which corresponds to CK5/6, was set at the point where the proportion of '+' to '-' was the same as the proportion of 'CK5/6+' to 'CK5/6-' in Cheang et al.'s IHC results (Additional File 1, Figure S1). The cut-offs for PGR, KRT5 and EGFR were all determined in this way, by matching the proportion of positive samples in each microarray dataset to the proportion in Cheang et al.'s IHC results (Additional File 1, Figure S1).
The cut-off for ERBB2 was determined from its clearly bimodal expression distribution, by assigning '+' to samples on the right side and '-' to samples on the left side (Additional File 1, Figure S1). ER, the only marker with available IHC results, was used directly and was not replaced by the expression of ESR1. Each sample from the microarray datasets was assigned to one of the five subtypes according to the status of the five markers.
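The mean-centering and proportion-matched cut-off steps described above can be sketched in Python as follows. The function names, the expression values and the positive fraction of 0.2 are hypothetical and chosen only for illustration; they are not values from Cheang et al. or from either microarray dataset. The handling of rounding and ties is also an assumption.

```python
def mean_center(values):
    """Subtract the across-sample mean of one probe set from each measured value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def proportion_matched_cutoff(expression, positive_fraction):
    """Cut-off at which about `positive_fraction` of samples are called positive.

    Samples with expression at or above the returned value are called '+'.
    Tied values at the cut-off can make the realized fraction slightly larger.
    """
    if not 0 < positive_fraction < 1:
        raise ValueError("positive_fraction must be strictly between 0 and 1")
    ranked = sorted(expression, reverse=True)
    n_positive = max(1, round(positive_fraction * len(ranked)))
    return ranked[n_positive - 1]


def dichotomize(expression, cutoff):
    """Label each sample '+' or '-' relative to the cut-off."""
    return ["+" if v >= cutoff else "-" for v in expression]


# Hypothetical expression values for one gene across ten samples.
krt5 = [0.1, 2.3, -0.5, 1.8, 0.0, -1.2, 0.4, 3.1, -0.3, 0.9]
cutoff = proportion_matched_cutoff(krt5, positive_fraction=0.2)
calls = dichotomize(krt5, cutoff)
```

The resulting '+' and '-' calls for PGR, ERBB2, KRT5 and EGFR, together with the ER IHC status, would then feed the five-marker subtype assignment.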