Proteomic characterization of paired non-malignant and malignant African-American prostate epithelial cell lines distinguishes them by structural proteins

While many factors may contribute to the higher prostate cancer incidence and mortality experienced by African-American men compared to their counterparts, the contribution of tumor biology is underexplored due to inadequate availability of African-American patient-derived cell lines and specimens. Here, we characterize the proteomes of non-malignant RC-77 N/E and malignant RC-77 T/E prostate epithelial cell lines previously established from prostate specimens from the same African-American patient with early stage primary prostate cancer. In this comparative proteomic analysis of RC-77 N/E and RC-77 T/E cells, differentially expressed proteins were identified and analyzed for overrepresentation of PANTHER protein classes, Gene Ontology annotations, and pathways. The enrichment of gene sets and pathway significance were assessed using Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, respectively. The gene and protein expression data of age- and stage-matched prostate cancer specimens from The Cancer Genome Atlas were analyzed. Structural and cytoskeletal proteins were differentially expressed and statistically overrepresented between RC-77 N/E and RC-77 T/E cells. Beta-catenin, alpha-actinin-1, and filamin-A were upregulated in the tumorigenic RC-77 T/E cells, while integrin beta-1, integrin alpha-6, caveolin-1, laminin subunit gamma-2, and CD44 antigen were downregulated. The increased protein level of beta-catenin and the reduction of caveolin-1 protein level in the tumorigenic RC-77 T/E cells mirrored the upregulation of beta-catenin mRNA and downregulation of caveolin-1 mRNA in African-American prostate cancer specimens compared to non-malignant controls. After subtracting race-specific non-malignant RNA expression, beta-catenin and caveolin-1 mRNA expression levels were higher in African-American prostate cancer specimens than in Caucasian-American specimens. 
The “ECM-Receptor Interaction” and “Cell Adhesion Molecules” pathways contained proteins associated with RC-77 N/E cells, whereas the “Tight Junction” and “Adherens Junction” pathways contained proteins associated with RC-77 T/E cells. Our results suggest RC-77 T/E and RC-77 N/E cell lines can be distinguished by differentially expressed structural and cytoskeletal proteins, which appeared in several pathways across multiple analyses. Our results indicate that the expression of beta-catenin and caveolin-1 may be prostate cancer- and race-specific. Although the RC-77 cell model may not be representative of all African-American prostate cancer due to tumor heterogeneity, it is a unique resource for studying prostate cancer initiation and progression.


Background
Prostate cancer continues to be a substantial burden in the American population. It remains the second leading cause of cancer death among American men, and model-based estimates continue to predict prostate cancer to be most frequently diagnosed among new cancer cases in American men [1]. Prostate cancer is particularly intriguing because of the striking racial health disparity between African-American and Caucasian-American patients. In the most recent data, African-American men have had the highest prostate cancer incidence and mortality of any race and ethnicity in the United States [1]. Race is a significant risk factor for prostate cancer: African-American men are more likely to receive a prostate cancer diagnosis, with a reported incidence rate between 1.5 and 1.86 times higher in African-American men than in Caucasian-American men [1][2][3]. African-American men are also more likely to receive that diagnosis at a younger age, 3 years younger than Caucasian-American men [4,5]. Furthermore, prostate cancer mortality is twice as high in African-American men compared to Caucasian-American men [1,6].
Prostate cancer racial disparities between African-American and Caucasian-American patients often reflect more advanced or aggressive cancer in African-American men. African-American men present with higher grade tumors, report more treatment-related side effects, and have shorter progression-free survival [5]. Men with high-risk prostate cancer were more likely to be African-American, even in patients with low prostate-specific antigen levels [7]. Tumor volumes were reported to be larger in African-American men compared to matched Caucasian-American specimens [8]. Higher Gleason scores and cancer volumes were also reported in African-American men compared to Caucasian-Americans [9]. Gene and microRNA profiling of African-American and Caucasian-American tumor tissue have demonstrated racial variation [10][11][12][13][14][15][16][17]. In light of this, it is increasingly important to study prostate cancer in the context of race, as tumor characteristics have been shown to vary by race. Although socioeconomic factors, treatment choices, comorbidities, and quality of medical care factor into higher incidence and mortality rates, increased prostate cancer-specific mortality is largely attributed to tumor characteristics [18].
One approach to exploring the mechanisms of prostate cancer development and progression is the use of prostate cancer-derived cell lines as in vitro models of the disease. PC-3, DU145, and LNCaP cell lines are popular, well-established, and well-characterized prostate cancer research models [19][20][21]. The gene and protein expression profiles of these cell lines and their derivatives have also been outlined [19][20][21][22][23][24][25]. According to American Type Culture Collection data sheets, PC-3, DU145, and LNCaP cell lines were established from Caucasian prostate cancer patients aged 59 to 69 years old. The PC-3 cell line was established from a prostatic adenocarcinoma metastatic to bone, and PC-3 cells have features common to neoplastic cells and do not respond to androgen [23]. The DU145 cell line was established from a brain metastasis of human prostate carcinoma, and DU145 cells do not express androgen receptors [19,21]. The LNCaP cell line was established from a supraclavicular lymph node metastatic lesion of prostate adenocarcinoma. While LNCaP cells express androgen receptors and grow in response to androgen, they lose this requirement for growth in later passages [23]. Cell lines derived from non-African-American backgrounds may be less beneficial in providing an understanding of the factors leading to high prostate cancer risk in African-American men. They may also be inadequate for explaining the aggressiveness of prostate cancer in African-American men. However, few prostate cancer models have been established from African-American patients. E006AA is an epithelial cell line with low tumorigenicity derived from cancerous tissue of an African-American patient diagnosed with clinically localized T2aN0M0 prostate cancer [26]. Another cell line, E006AA-hT, which was derived from E006AA cells, is highly tumorigenic [27]. 
The non-neoplastic RC-165 N cell line was derived from benign tissue of an African-American patient and immortalized by telomerase [28]. MDA PCa 2a and MDA PCa 2b cell lines were derived from a bone metastasis of an androgen-independent cancer from an African-American patient [29]. These cell lines are tumorigenic but have deviated from the androgen-insensitive phenotype from which they were derived (i.e., the cells behave differently in vivo and in vitro). None of the above-mentioned models is a malignant and non-malignant pair.
The human malignant and non-malignant immortalized prostate epithelial cell lines RC-77 T/E and RC-77 N/E were established previously from prostate tissue from an African-American patient [30]. This primary tumor was a stage T3c poorly differentiated adenocarcinoma of Gleason score 7. RC-77 cell lines have epithelial character, have functioning androgen receptors, are immortalized, and form a malignant and non-malignant pair. There are few studies on RC-77 cell lines. To date, the RC-77 cell lines have been characterized in terms of miRNA expression, ATP-binding cassette sub-family D member 3 (ABCD3) gene expression, roundabout homolog 1 (ROBO1) mRNA and protein expression, and B lymphoma Mo-MLV insertion region 1 homolog (BMI1) protein levels [17,[31][32][33][34]. This work is the only comprehensive proteomic characterization of RC-77 T/E and RC-77 N/E cell lines.

Cell culture and lysis
Both RC-77 N/E and RC-77 T/E cell lines were cultured in Keratinocyte-SFM medium supplemented with bovine pituitary extract and recombinant epidermal growth factor (Life Technologies, Inc., Gaithersburg, MD) in a fully humidified incubator containing 95% air and 5% CO2 at 37°C. After aspirating culture medium, cells were washed twice with phosphate-buffered saline. The washed cells were collected and lysed on ice for 10 min in NP-40 lysis buffer (50 mM Tris-HCl pH 7.2; 150 mM NaCl; 1% Triton X-100; 0.1% sodium dodecyl sulfate; 0.2% sodium deoxycholate in water) containing an EDTA-free protease and phosphatase inhibitor cocktail (Thermo-Pierce, Rockford, IL) at a ratio of 20 μL buffer per 500,000 cells. Cell lysates were spun at 14,000 rpm at 4°C for 10 min. The supernatant was collected and the pellet discarded.

Mass spectrometry
Cell lysates were desalted on Zeba™ Desalt Spin Columns (Thermo-Pierce, Rockford, IL). Using a ProteoExtract™ All-in-One Trypsin Digestion Kit (Calbiochem, Darmstadt, Germany), vacuum-dried cell lysates were re-suspended, and proteins were extracted into a mass spectrometry-compatible buffer and then digested with trypsin. Protein expression was analyzed by high-resolution electrospray tandem mass spectrometry (MS/MS) with an externally calibrated Thermo LTQ Orbitrap Velos mass spectrometer. For each of three biological replicates, nanospray liquid chromatography-MS/MS was run in technical triplicate, and all measurements were performed at room temperature. Technical details of the mass spectrometry analyses can be found in the Additional Files (see Additional file 1). The threshold for peptide identification was set at 95% confidence and the stringency for protein identification was set at 99% confidence with at least 2 peptide matches.

Data processing and analysis
Protein expression data were captured in the form of spectral counts, and any non-integer values were rounded up to the nearest whole integer. Each identified protein was mapped to a single gene symbol and Entrez ID. For protein isoforms, expression counts were summed to generate a single dataset for each gene. Such 1:1 mapping was required in downstream analyses. The R programming environment (version 3.2.1) [35] was used to process the spectral count data as described above, to perform statistical calculations, and to plot data. Differential protein expression between RC-77 T/E and RC-77 N/E cell lines was assessed using the processed spectral count data by an unpaired Wilcoxon rank-sum test with an applied continuity correction and two-sided alternative hypothesis via a built-in R function. Differentially expressed proteins (DEPs) were defined as those proteins whose mean spectral count differed between the two comparison sets with at least 90% confidence after adjusting for the false discovery rate using the Benjamini-Hochberg function. Next, fold changes in protein expression levels between RC-77 T/E and RC-77 N/E cell lines were calculated by taking the base 2 logarithm (log2) of the ratio of the mean spectral count of RC-77 T/E samples to the mean spectral count of RC-77 N/E samples. In this way, proteins downregulated in RC-77 T/E showed negative fold changes, whereas proteins upregulated in RC-77 T/E showed positive fold changes. For samples with zero means, the data were transformed by adding one to both means, which did not substantially affect the results of downstream analysis. An MA plot was constructed to confirm that variance remained stable (see Additional file 2).
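The fold-change and false discovery rate steps described above can be illustrated with a minimal sketch. It is written in Python rather than the R code used in the study, the function names and example values are hypothetical, and the Wilcoxon rank-sum test itself is not reproduced.

```python
import math


def log2_fold_change(tumor_counts, normal_counts):
    """log2 ratio of mean spectral counts (RC-77 T/E over RC-77 N/E)."""
    mean_t = sum(tumor_counts) / len(tumor_counts)
    mean_n = sum(normal_counts) / len(normal_counts)
    # As described in the text: when either mean is zero, add one to both
    # means so that the ratio and its logarithm stay defined.
    if mean_t == 0 or mean_n == 0:
        mean_t += 1
        mean_n += 1
    return math.log2(mean_t / mean_n)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical spectral counts for single proteins across replicate runs.
print(log2_fold_change([8, 8, 8], [2, 2, 2]))  # positive: higher in T/E
print(log2_fold_change([0, 0, 0], [3, 3, 3]))  # negative: lower in T/E

# Hypothetical raw p-values for four proteins; DEPs would have q < 0.1.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In this sketch a protein would be called a DEP when its q-value falls below 0.1, matching the 90% confidence threshold stated above.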

Overrepresentation analysis
To reveal any patterns in the classes or functions of proteins differentially expressed between RC-77 T/E and RC-77 N/E cell lines, DEPs were subjected to overrepresentation analysis using Protein ANalysis THrough Evolutionary Relationships (PANTHER) analysis tools [36]. The list of DEPs was loaded into the PANTHER Classification System data analysis tool (version 11.1), which sorted the DEPs by PANTHER protein class and Gene Ontology (GO) annotations. Using the same list of DEPs, the PANTHER statistical overrepresentation tool (release 20161024) was used to assess the probability that the number of DEPs belonging to each protein class or GO category was greater than the number expected in each category picked at random based on a reference human genome. Additionally, the overrepresentation of entire pathways among DEPs was assessed using the National Cancer Institute-Nature Pathway Interaction Database [37]. The list of DEPs was uploaded and searched against this database, and the overrepresentation of pathways was calculated, adjusting probabilities for multiple-hypotheses testing. To determine if the results obtained for DEPs were due to random chance, the same overrepresentation analyses were conducted for 1000 random sets containing the same number of proteins as DEPs sampled from the remaining non-differentially expressed proteins and from the total number of identified proteins detected by mass spectrometry.
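The idea behind an overrepresentation test can be illustrated with a short sketch. The exact statistic used by the PANTHER tool is not stated here, so the example assumes a one-sided binomial test, which is one common formulation. The function names and the class size of 800 are hypothetical; the reference size of 20,814 and the list size of 63 are the values reported in this study.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more hits."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


def overrepresentation_p(hits_in_list, list_size, class_size, reference_size):
    """One-sided p-value that a class is overrepresented in a protein list."""
    # Fraction of the reference genome annotated with this class.
    expected_fraction = class_size / reference_size
    return binomial_upper_tail(hits_in_list, list_size, expected_fraction)


# Hypothetical example: 10 of 63 listed proteins fall in a class that
# holds 800 of 20,814 reference genes (the class size is made up).
p = overrepresentation_p(10, 63, 800, 20814)
expected_hits = 63 * 800 / 20814
print(f"expected hits: {expected_hits:.2f}, p-value: {p:.3g}")
```

A p-value from such a test would still need adjustment for the number of classes tested, as done in the analysis described above.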

Gene set enrichment analysis
Gene Set Enrichment Analysis (GSEA) (version 2.2.0), which is a type of correlation analysis that uses expression data to associate gene sets with a particular phenotype [38], was used to identify groups of genes associated with either RC-77 T/E or RC-77 N/E cells. So as not to bias against small changes in expression, the processed protein spectral count data were entered into the software without filtering for differential expression, and the log2 fold change was ignored. Proteins that could not be mapped to an Entrez ID were excluded from this analysis. Gene sets containing a minimum of 5 genes and up to a maximum of 500 genes were pulled from BioCarta and Reactome databases (downloaded from the GSEA's Molecular Signatures Database, version 5) and from a customized database of relevant KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways (see Additional file 3). The GSEA software interrogated each gene set against a list of the protein data ranked by correlation to RC-77 T/E or RC-77 N/E samples to determine which proteins from the ranked list appeared in a given pathway and whether they were randomly distributed or clustered among a phenotype. Enrichment (relative to RC-77 N/E) was based on the number of highly correlated genes from the ranked list that appeared in the pathway with a chosen FDR cut-off of q < 0.25.
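The ranked-list interrogation described above rests on a running-sum enrichment score. The following is a minimal sketch of that statistic, not the GSEA software itself. It assumes a weight exponent of 1, and the gene names and correlation values are hypothetical; permutation-based significance and FDR estimation are omitted.

```python
def enrichment_score(ranked_genes, correlations, gene_set, weight=1.0):
    """Running-sum enrichment score over a phenotype-ranked gene list.

    ranked_genes: genes sorted from most correlated with one phenotype
    (here assumed RC-77 T/E) to most correlated with the other.
    correlations: ranking metric for each gene, in the same order.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_total = sum(
        abs(c) ** weight for c, hit in zip(correlations, in_set) if hit
    )
    n_miss = in_set.count(False)
    if hit_total == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running = 0.0
    extreme = 0.0
    for c, hit in zip(correlations, in_set):
        if hit:
            running += abs(c) ** weight / hit_total  # step up at set members
        else:
            running -= 1.0 / n_miss  # step down at non-members
        if abs(running) > abs(extreme):
            extreme = running  # keep the largest deviation from zero
    return extreme


# Hypothetical ranked list of six genes and their correlation scores.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
corr = [0.9, 0.6, 0.2, -0.1, -0.5, -0.8]

top_set = enrichment_score(genes, corr, {"G1", "G2"})
bottom_set = enrichment_score(genes, corr, {"G5", "G6"})
print(top_set, bottom_set)
```

A set clustered at the top of the list gives a score near +1 and a set clustered at the bottom gives a score near -1, which is how a gene set becomes associated with one cell line or the other.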

Signaling pathway impact analysis
Signaling Pathway Impact Analysis (SPIA) was used to provide a system-level assessment of pathway significance by incorporating overrepresentation, a function of differential expression and the magnitude of expression change (as a log2 ratio), and topology, the position of the protein in a pathway [39]. Pathway topology is important because it distinguishes genes or proteins that may be at trigger, regulatory, divergent, or end positions. SPIA was completed using the "SPIA" R package (version 2.18.0). The processed protein spectral count data including the results of the differential expression analysis and log2 fold changes were uploaded. Proteins that could not be mapped to an Entrez ID were excluded from this analysis. The threshold for differential expression was set to q < 0.1. The same relevant KEGG pathways used in GSEA were used for SPIA (see Additional file 3). KEGG pathways were chosen because they contain information about pathway topology. SPIA calculated the overrepresentation and perturbation probabilities and combined them into a global probability that a pathway was activated or inhibited in RC-77 T/E cells. The overrepresentation probability reflects the likelihood the number of DEPs observed in a pathway was larger than that observed by random chance. The perturbation probability reflects whether the positions of DEPs in a particular pathway were at crucial junctions that could perturb the pathway. The false discovery rate-adjusted global probability was the metric used to rank the significance of the pathways.
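How the two probabilities can be merged into one global probability is sketched below. The example assumes the product-based combination described for SPIA [39], in which the product c of the two probabilities is converted to c − c·ln(c). The function name and input values are hypothetical, and the sketch is not the "SPIA" R package code.

```python
import math


def global_probability(p_overrepresentation, p_perturbation):
    """Combine two independent p-values through their product.

    Under the null hypothesis the product c of two uniform p-values has
    the distribution P(product <= c) = c - c * ln(c).
    """
    c = p_overrepresentation * p_perturbation
    if c == 0:
        return 0.0  # limit of c - c*ln(c) as c approaches zero
    return c - c * math.log(c)


# Hypothetical pathway with moderate evidence from each component.
p_global = global_probability(0.05, 0.05)
print(f"{p_global:.4f}")
```

Two probabilities of 0.05 each combine to a global probability below either input, so a pathway with modest overrepresentation and modest perturbation evidence can still rank highly once both are considered together.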

Analysis of DEPs relevance in human prostate cancer patient specimens
Using The Cancer Genome Atlas (TCGA) prostate adenocarcinoma (PRAD) cohort, a dataset of 12 age- and stage-matched African-American and Caucasian-American specimen pairs (24 specimens total) was created. These specimen pairs were used to investigate how the protein and RNA expression of the 63 DEPs differed by race. To generate the dataset, TCGA protein data was downloaded from cBioPortal, and TCGA RNA expression data was downloaded from FireBrowse.org. Both are repositories for TCGA data. The protein data available from the TCGA PRAD cohort was obtained via Reverse Phase Protein Array and was limited to 219 proteins. TCGA RNA expression data was obtained through Illumina HiSeq (RNA sequencing) and comprised over 20,000 gene transcripts. Only DEPs present in both datasets were carried forward for further analysis. Because the RC-77 T/E cell line was generated from an early stage primary tumor, only tumors with a Gleason score of 6 or 7 were included (see Additional file 4). Data frames of extracted protein and RNA expression data were created with Microsoft Excel.
Because protein data for non-malignant PRAD specimens was not available in TCGA data and non-malignant PRAD tissue was not collected from all patients, direct tumor-to-non-malignant comparisons could not be performed. In order to compare expression distributions, the average of the race-specific non-malignant PRAD RNA expression was subtracted from the age- and stage-matched tumor specimens (see Additional file 4). Of the 499 individuals in TCGA PRAD patient cohort, 51 had non-malignant PRAD tissue RNA expression data. After filtering for Gleason score (≤ 7), 34 (4 African-American and 30 Caucasian-American) non-malignant prostate tissue specimens were included in the non-malignant-expression-normalized analysis (see Additional file 4). The statistical significance of differences between African-American and Caucasian-American patient specimens was analyzed using the "t.test" function in R.
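The normalization and comparison described above can be sketched as follows. The example assumes the default behavior of R's "t.test", which is Welch's unequal-variance test; the options actually used are not stated. All function names and expression values are hypothetical, and only the test statistic and degrees of freedom are computed, not the p-value.

```python
import math
from statistics import mean, variance


def subtract_nonmalignant_mean(tumor_values, nonmalignant_values):
    """Subtract the race-specific non-malignant mean from tumor values."""
    baseline = mean(nonmalignant_values)
    return [value - baseline for value in tumor_values]


def welch_t(group_a, group_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error, group a
    vb = variance(group_b) / len(group_b)  # squared standard error, group b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va**2 / (len(group_a) - 1) + vb**2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical RNA expression values for one gene (arbitrary units).
aa_tumor = [5.0, 6.0, 7.0]
aa_nonmalignant = [4.0, 4.5, 5.0]
ca_tumor = [6.0, 7.0, 8.0]
ca_nonmalignant = [7.0, 7.5, 8.0]

aa_normalized = subtract_nonmalignant_mean(aa_tumor, aa_nonmalignant)
ca_normalized = subtract_nonmalignant_mean(ca_tumor, ca_nonmalignant)
t_stat, dof = welch_t(aa_normalized, ca_normalized)
print(aa_normalized, ca_normalized, t_stat, dof)
```

A negative normalized value indicates lower expression in tumor than in the race-specific non-malignant average, and a positive value indicates higher expression.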

Results
Overall, 843 proteins were identified by mass spectrometry, and 833 proteins remained in the dataset after processing to consolidate isoforms (see Additional files 5 and 6, respectively). These 833 proteins formed the dataset used in GSEA and SPIA analysis. Between RC-77 T/E and RC-77 N/E cell lines, 744 proteins were shared, 74 proteins were detected in RC-77 T/E cells but not RC-77 N/E cells, and 15 proteins were detected in RC-77 N/E but not RC-77 T/E cells. In total, expression levels of 200 proteins varied between RC-77 T/E and RC-77 N/E cells (p < 0.05, Wilcoxon rank-sum test); but after correcting for the false-discovery rate, only 63 proteins retained significance (q < 0.1). These 63 proteins formed the list of DEPs: 17 proteins downregulated in RC-77 T/E cells and 46 proteins upregulated in RC-77 T/E cells (Table 1). A full listing of protein expression changes between RC-77 N/E and RC-77 T/E cells is found in the Additional files (see Additional file 6). The distribution of log2 fold changes for all proteins was plotted in a 1-D scatter plot (Fig. 1). DEPs tended to have greater than two-fold changes in expression levels, and most log2 fold changes clustered around −2.0 and +1.5. The reproducibility among biological replicates was good (see Additional files 7 and 8).

Overrepresentation analysis
For each of the 63 DEPs, PANTHER protein class and GO annotations were pulled from the PANTHER database, and the number of annotations in each category was counted (Fig. 2). No annotations were found for 12 DEPs; however, a pattern of nucleic acid binding and structural proteins emerged among the annotations for the 51 remaining DEPs. "Nucleic Acid Binding" was the most populated PANTHER protein class category with 15 DEPs, while 10 DEPs were classified as "Structural" and/or "Cytoskeletal Proteins", and another 6 DEPs were classified as hydrolases (Table 2). The remaining DEPs were spread nearly evenly across 20 other categories (Fig. 2a). When DEPs were sorted by GO Molecular Function annotation (Fig. 2c), the "Binding" and "Catalytic Activity" GO Molecular Function labels each covered over 40% (21 of 51 DEPs) of the annotated DEPs, and the "Structural Molecule Activity" label was also highly populated (13 of 51 DEPs) (Table 3). Overrepresentation analysis supported the pattern of structural/cytoskeletal proteins among proteins differentially expressed between RC-77 T/E and RC-77 N/E cells (Table 4). Only the "Cytoskeletal Protein" PANTHER protein class category (q = 0.033) was statistically overrepresented among the DEPs compared to the reference human genome/proteome (20,814 genes/proteins).
Table 1 footnote: *Carries a "Structural" or "Cytoskeletal" annotation in PANTHER. P-value is the probability the protein differs between RC-77 N/E and RC-77 T/E as calculated by an unpaired Wilcoxon rank-sum test, and q-value is the probability adjusted for multiple hypotheses testing using the Benjamini-Hochberg method. The log2 fold change was calculated using the RC-77 T/E to RC-77 N/E ratio. Significant pathway or gene set involvement reflects the results of Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis.
Because structural and cytoskeletal proteins are highly abundant, we verified the results of the enrichment and overrepresentation of this protein class by comparing the results to those obtained using an equivalent number of randomly sampled proteins. We repeated the overrepresentation analysis on 1000 subsets of 63 proteins (the number of DEPs identified) randomly sampled from the 770 non-differentially expressed proteins and from all 833 proteins identified by mass spectrometry compared to the reference human genome/proteome. Among the repeated sets of proteins pulled from the 770 non-DEPs, structural/cytoskeletal proteins were significantly overrepresented in only 2 sets; there were no sets from the proteins sampled from all 833 proteins with significant overrepresentation of the structural/cytoskeletal protein class (Table 4). Therefore, we conclude with high probability (99.8%) that the overrepresentation of the structural/cytoskeletal protein class among the 63 DEPs is not due to random chance. In contrast, many DEPs were labeled with the "Catalytic Activity" GO Molecular Function; however, enzyme protein classes were not overrepresented according to the enrichment test and were more frequent among the random samples. These results verified that the differences between RC-77 T/E and RC-77 N/E cell lines are specifically linked to structural/cytoskeletal proteins because no more than 2 of the 1000 random subsets in either sampling scheme were enriched in structural proteins relative to the genome/proteome.
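The random-subset control can be sketched as a simple resampling loop. This is an illustration under stated assumptions rather than the procedure actually run: it uses a one-sided binomial test on a single class with an unadjusted cut-off of 0.05, whereas the analysis above relied on the PANTHER tool with adjusted probabilities. The function names, the pool composition, and the class fraction are hypothetical.

```python
import math
import random


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


def count_enriched_random_sets(pool_flags, set_size, class_fraction,
                               n_sets=1000, alpha=0.05, seed=0):
    """Count random subsets in which the flagged class is overrepresented.

    pool_flags: one boolean per protein in the sampling pool, True when
    the protein belongs to the class of interest.
    class_fraction: fraction of the reference genome in that class.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    enriched = 0
    for _ in range(n_sets):
        hits = sum(rng.sample(pool_flags, set_size))
        if binomial_upper_tail(hits, set_size, class_fraction) < alpha:
            enriched += 1
    return enriched


# Hypothetical pool: 770 non-DEPs, 40 of them flagged as structural, and
# a made-up reference fraction of 0.04 for the class.
pool = [True] * 40 + [False] * 730
n_enriched = count_enriched_random_sets(pool, set_size=63, class_fraction=0.04)
print(f"{n_enriched} of 1000 random sets enriched")

# With the reported outcome of 2 enriched sets out of 1000:
empirical_confidence = 1 - 2 / 1000
print(f"{empirical_confidence:.1%}")
```

The last two lines show where the 99.8% figure comes from: it is the fraction of random sets that did not show significant overrepresentation.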
There was a deviation from the pattern of structural/cytoskeletal protein overrepresentation when DEPs were analyzed by GO Biological Process annotations. Metabolic and cellular processes were the most common GO Biological Process annotations, with 37 and 23 proteins, respectively (Fig. 2b and Table 5). The GO Biological Process category "Metabolic Process" encompasses carbohydrate, lipid, protein, amino acid, and nucleobase-containing compound metabolism; and the GO Biological Process term "Cellular Process" is an umbrella heading for cell communication, cell cycle, cytokinesis, and cellular component movement. The GO Biological Process categories "Biological Regulation", "Developmental Process", and "Cellular Component Organization or Biogenesis" were evenly populated (Fig. 2b).
In addition to grouping by PANTHER protein class or GO annotations, pathway overrepresentation among the DEPs was also assessed using the National Cancer Institute-Nature Pathway Interaction Database. Again, structural molecules featured prominently in these pathways, including integrin alpha-6, integrin beta-1, and beta-catenin (Table 6).

Gene set enrichment analysis
Although overrepresentation analysis showed that structural proteins and pathways related to structural proteins differed between RC-77 T/E and RC-77 N/E cells, this analysis did not link these differences directly to either of the cell lines. GSEA identified groups of genes specifically associated with either RC-77 T/E or RC-77 N/E cells. For this analysis, all protein data were used as the input, not just data for the 63 DEPs. Multiple gene sets were enriched in RC-77 T/E and RC-77 N/E cells (Table 7). A complete listing of GSEA results is presented in the Additional files (see Additional file 9). An enriched gene set contained a significant number of proteins whose expression most correlated with either RC-77 T/E or RC-77 N/E cells. The most significantly enriched gene set in RC-77 T/E cells was the KEGG "Tight Junction" gene set. Additionally, the KEGG "Adherens Junction" gene set was highly enriched in RC-77 T/E cells. The most significant gene set enriched in RC-77 N/E cells was the KEGG "Cell Adhesion Molecules" gene set, and the KEGG "ECM-Receptor Interaction" gene set was also highly enriched in RC-77 N/E cells. Interestingly, structural proteins contributed to the enrichment of each of these gene sets in their respective cell lines. While alpha-actinin-1 and beta-catenin were associated with RC-77 T/E cells, integrin alpha-6, integrin beta-1, laminin subunit gamma-2, and CD166 antigen were associated with RC-77 N/E cells. These results corroborate the overrepresentation of structural proteins in these cell lines. Furthermore, this enrichment analysis differentiates which structural proteins were associated with each cell line.
Fig. 1 Magnitude of protein expression changes between RC-77 T/E and RC-77 N/E cell lines. In this one-dimensional scatter plot, the magnitude of protein expression changes is represented by the log2 fold ratio. Red diamonds represent differentially expressed proteins. Black squares represent other identified proteins that were not significantly different.

Signaling pathway impact analysis
SPIA was conducted to address both the overrepresentation and pathway topology of DEPs to determine whether the DEPs found in a pathway have a meaningful impact within that pathway. SPIA differs from GSEA in two key ways. First, it considers the magnitude of expression and establishes a difference in impact between small and large fold changes. Second, by including a measure of perturbation, SPIA more fully captures the interactions of proteins, which can be lost in overrepresentation analyses and correlation analyses like GSEA. Four KEGG pathways were significantly impacted in the RC-77 T/E cell line: "Focal Adhesion" (false discovery rate-adjusted global probability [pGFdr] = 0.00934), "Small Cell Lung Cancer" (pGFdr = 0.0246), "Proteoglycans in Cancer" (pGFdr = 0.0246), and "ECM-Receptor Interaction" (pGFdr = 0.0246) (Table 8). Based on the expression pattern of the DEPs found in the pathway, SPIA predicted these four pathways were inhibited in RC-77 T/E cells. In corroboration, "ECM-Receptor Interaction" and "Small Cell Lung Cancer" were enriched in RC-77 N/E cells according to GSEA results. Pathway images with DEPs highlighted can be found in the full SPIA results presented in the Additional files (see Additional file 10). Note that not all components of the significantly impacted pathways were differentially expressed.
Fig. 2 caption (fragment): … Function Gene Ontology terms. Note: No annotations were found for 12 DEPs (laminin subunit gamma-2, SH3 domain-binding glutamic acid-rich-like protein 3, serine/arginine-rich splicing factor 1, CD44 antigen, tRNA-splicing ligase RtcB homolog, ribosome-binding protein 1, scaffold attachment factor B1, nucleoprotein TPR, integrin alpha-6, protein PML, squalene synthase, and X-ray repair cross-complementing protein 5). DEP = differentially expressed protein; PANTHER = Protein ANalysis THrough Evolutionary Relationships
• Myosin heavy chain-9 • 60S ribosomal protein L6 • 40S ribosomal protein S11 • alpha-actinin-1 • vimentin • lamin-B1 • caveolin-1

Differentially expressed proteins with recurring pathway involvement
Many of the significant pathways featured a small recurring group of DEPs: beta-catenin, alpha-actinin-1, integrin beta-1, integrin alpha-6, caveolin-1, filamin-A, laminin subunit gamma-2, and CD44 antigen (Table 1). Beta-catenin and alpha-actinin-1 contributed to the significance of the "Tight Junction", "Adherens Junction", "Hippo Signaling Pathway", and "Focal Adhesion" pathways. Integrin beta-1 and integrin alpha-6 were included in the "Cell Adhesion Molecules", "Small Cell Lung Cancer", and "ECM-Receptor Interaction" pathways. Caveolin-1 and filamin-A were included in the "Focal Adhesion" and "Proteoglycans in Cancer" pathways. Laminin subunit gamma-2 appeared in the "ECM-Receptor Interaction", "Small Cell Lung Cancer", and "Focal Adhesion" pathways. Finally, CD44 antigen appeared in the "Proteoglycans in Cancer" and "ECM-Receptor Interaction" pathways. Experimental, co-expression, co-occurrence, and homology interactions between DEPs were visualized using STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) [40] (Fig. 3). This plot displays direct interactions between DEPs. Nodes were centered on integrin beta-1, beta-catenin, and caveolin-1, suggesting these proteins have the potential to affect other proteins and may be involved in functional networks.

Differentially expressed proteins and genes in human prostate cancer patient specimens
To determine the relevance of the 63 DEPs identified in the RC-77 cell line series in human prostate cancer specimens, we extracted protein and RNA expression data from TCGA PRAD cohort. We compared the protein and RNA expression of the 63 DEPs between African-American and Caucasian-American prostate cancer specimens; only caveolin-1, beta-catenin, myosin heavy chain-9, serine/arginine-rich splicing factor 1/splicing factor 2, double-stranded RNA-specific adenosine deaminase, and X-ray repair cross-complementing protein 5 had both protein and RNA data. X-ray repair cross-complementing protein 5 protein levels were significantly higher in African-American prostate cancer specimens than in Caucasian-American prostate cancer specimens (p < 0.05) (Fig. 4a). The RNA expression levels of caveolin-1 and myosin heavy chain-9 were significantly lower in African-American prostate cancer specimens than in Caucasian-American prostate cancer specimens (p < 0.01 and p < 0.05, respectively) (Fig. 4b). After subtracting mRNA expression levels of non-malignant specimens from human prostate cancer specimens, caveolin-1 and beta-catenin mRNA expression levels were significantly higher in African-American prostate cancer patient specimens compared to Caucasian-American prostate cancer specimens (Fig. 5).
As indicated by the negative RNA expression value, caveolin-1 was downregulated in African-American prostate cancer specimens compared to African-American non-malignant control specimens; in contrast, beta-catenin was upregulated. Therefore, the reduction of caveolin-1 protein levels and the increased protein levels of beta-catenin seen in the tumorigenic RC-77 T/E cells were mirrored in the downregulation of caveolin-1 mRNA and upregulation of beta-catenin mRNA in African-American prostate cancer specimens.
Table 4 footnote: The PANTHER overrepresentation analysis was run on the subset of 63 DEPs and on 1000 subsets of 63 proteins (the number of DEPs identified) randomly sampled from the 770 non-differentially expressed proteins and from all 833 proteins identified by mass spectrometry. Overrepresentation was based on comparison to the reference human genome/proteome. DEP = differentially expressed protein; PANTHER = Protein ANalysis THrough Evolutionary Relationships

Discussion
The paired non-malignant and malignant African-American prostate epithelial cell lines RC-77 N/E and RC-77 T/E are among the few cell lines derived from African-American prostate cancer patients [30]. E006AA, RC-165 N, and MDA-PCa 2a/2b are other African-American patient-derived cell lines. E006AA also has a highly tumorigenic derivative, E006AA-hT, and an associated stromal cell line, S006AA [27]. While the E006AA-hT model can be used to examine the differences between less and more highly tumorigenic cancers, it does not have a non-malignant paired epithelial cell line. The RC-165 N cell line is unique because it was derived from benign prostate tissue of an African-American male and was immortalized by telomerase [41]. This cell line is useful for understanding the functions of the androgen receptor in prostate epithelial cells. MDA-PCa 2a/2b cells are tumorigenic but differ in vivo and in vitro. These cell lines are a useful androgen-sensitive model, but, unlike RC-77 cells, they do not have a paired non-malignant cell line from the same patient [29]. Because the RC-77 cell lines have epithelial-like characteristics, have functioning androgen receptors, are immortalized, and form a malignant and non-malignant pair from the same patient, they represent a promising model for studying prostate cancer.
Here, we report the global proteomic characterization of RC-77 T/E and RC-77 N/E cell lines. Since RC-77 T/E cells are tumorigenic and RC-77 N/E cells are not, we analyzed DEPs between the two phenotypes. In overrepresentation analysis, GSEA, and SPIA, we consistently found that beta-catenin, alpha-actinin-1, integrin beta-1, integrin alpha-6, caveolin-1, laminin subunit gamma-2, CD44 antigen, and filamin-A expression levels contributed to the significance of the pathways highlighted in this report. Each of these proteins has structural roles or roles in cell adhesion, which explains why structural proteins were more prevalent among DEPs than could be expected by random chance and why many overrepresented pathways were related to cell adhesion (cell-cell or cell-matrix) or integrin signaling. Beta-catenin forms a complex with E-cadherin at adherens junctions to mediate cell-cell adhesion [42]. Alpha-actinin-1 forms focal adhesions, adherens junctions, tight junctions, and hemidesmosomes; forms cell-cell or cell-matrix contacts; and plays a scaffolding role for the cytoskeleton in a variety of signaling pathways [43]. Integrins interact with extracellular matrix (ECM) components to form cell-matrix attachments and propagate extracellular signals [44]. Caveolin-1 is an important component of caveolae, which are involved in molecular transport, cell adhesion, motility, and signal transduction [45,46]. Laminin forms part of the basement membrane in some epithelial tissues and functions in adhesion, migration, invasion, and differentiation [47]. The glycoprotein CD44 antigen mediates cell adhesion and cytoskeleton binding through interactions with other proteins such as ankyrin and the ezrin, radixin, and moesin (ERM) proteins [48] and mediates hyaluronan-stimulated proliferation, apoptosis inhibition, cell motility, and invasion [49].
Filamin-A cross-links actin filaments and serves as a scaffolding protein to organize the actin cytoskeleton [50], which affects cell motility, migration, and signaling [51]. Furthermore, expression levels of beta-catenin, caveolin-1, integrin beta-1, integrin alpha-6, CD44 antigen, and alpha-actinin-1 have been shown to differ by race. Here, we have shown that beta-catenin protein levels are higher in malignant RC-77 T/E cells than in RC-77 N/E cells and that its mRNA is upregulated in African-American prostate cancer specimens compared to Caucasian-American specimens after subtracting the mRNA expression of race-specific non-malignant controls. These results are consistent with previous reports that beta-catenin is highly elevated in African-American prostate tumors compared to Caucasian tumors [16,52]. Integrin alpha-6 and integrin beta-1 were downregulated in RC-77 T/E cells compared to RC-77 N/E cells, and integrins have been shown to be downregulated in African-American prostate cancer tissue compared to Caucasian specimens [12]. Thus, in these aspects, RC-77 T/E cells reflect in vivo characteristics of African-American prostate cancer and may be useful in the study of malignant transformation in African-American prostate tumors. While alpha-actinin-1 was upregulated in malignant RC-77 T/E cells, it was downregulated in African-American prostate cancer tissue compared to Caucasian specimens [12].
Positive enrichment scores correspond to enrichment in RC-77 T/E samples. Negative enrichment scores correspond to enrichment in RC-77 N/E samples. Bolded proteins were differentially expressed (q < 0.1, Wilcoxon rank-sum test). *Carries a "Structural" or "Cytoskeletal" annotation in PANTHER. ChREBP2 carbohydrate responsive element binding protein, ECM extracellular matrix, KEGG Kyoto Encyclopedia of Genes and Genomes, NES normalized enrichment score (normalized to size of the pathway); p-value = probability of significance after permutation, q-value = false discovery rate-adjusted p-value; size = total number of genes in pathway
ECM extracellular matrix, NDE number of differentially expressed elements, pG global probability, pGFdr false discovery rate-adjusted global probability, pNDE overrepresentation probability, pPERT perturbation probability
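The statement earlier in this discussion that structural proteins were more prevalent among DEPs than expected by random chance can be illustrated with a simplified resampling sketch in Python. It mirrors the set sizes given in the text (63 DEPs, 770 non-differentially expressed proteins, 1000 random subsets) but differs from the study in that it only counts annotated proteins per subset, whereas the study ran a full PANTHER overrepresentation analysis on each subset. The annotation counts and the function name are hypothetical placeholders, not results from the study.

```python
import random


def empirical_enrichment(dep_flags, background_flags, n_subsets=1000, seed=0):
    """Compare the annotated count in the DEP set with random background subsets.

    dep_flags / background_flags: lists of booleans, True if a protein carries
    the annotation of interest (for example "Structural").
    Returns the observed count and an empirical p-value with a +1 correction.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    subset_size = len(dep_flags)
    observed = sum(dep_flags)
    at_least_as_extreme = 0
    for _ in range(n_subsets):
        subset = rng.sample(background_flags, subset_size)
        if sum(subset) >= observed:
            at_least_as_extreme += 1
    p_empirical = (at_least_as_extreme + 1) / (n_subsets + 1)
    return observed, p_empirical


# Placeholder annotation counts (hypothetical, for illustration only):
# 20 of 63 DEPs and 77 of 770 non-DEPs flagged as structural.
dep_flags = [True] * 20 + [False] * 43
background_flags = [True] * 77 + [False] * 693

observed, p_empirical = empirical_enrichment(dep_flags, background_flags)
print(observed, p_empirical)
```

A small empirical p-value here means that few random subsets of the same size contain as many annotated proteins as the DEP set, which is the intuition behind comparing the DEPs with randomly sampled protein subsets.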
Our results also showed that the caveolin-1 protein level was lower in malignant RC-77 T/E cells than in non-malignant RC-77 N/E cells and that its mRNA expression was downregulated in African-American prostate cancer specimens compared to non-malignant African-American prostate specimens. After subtracting race-specific non-malignant RNA expression, caveolin-1 mRNA expression was higher in African-American prostate cancer specimens than in specimens from Caucasian-American patients. This result is in agreement with another study reporting elevated caveolin-1 protein expression in African-American prostate cancer specimens compared to Caucasian-American specimens [53]. African-American prostate cancer patients were also found to have higher rates of methylation of the CD44 gene [54], whose protein product was downregulated in malignant RC-77 T/E cells in this study.
Fig. 3 Functional associations between differentially expressed proteins in RC-77 T/E and RC-77 N/E cell lines. STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) was used to visualize a network of functional associations between differentially expressed proteins. Interactions were limited to only those supported by experimental evidence, co-expression or co-occurrence data, and gene homology data. See Table 1 for the full names of proteins abbreviated here. Nodes centered on integrin beta-1, beta-catenin, and caveolin-1, suggesting these proteins have the potential to affect other proteins and may be involved in functional networks.
To understand how differential expression of beta-catenin, caveolin-1, integrin beta-1, integrin alpha-6, CD44 antigen, and alpha-actinin-1 in RC-77 T/E cells may be related to phenotypic differences between RC-77 T/E and RC-77 N/E cell lines, we looked at the interactions between the DEPs using both STRING (to visualize direct interactions) and pathway analyses. First, the STRING network map revealed beta-catenin, integrin beta-1, and caveolin-1 in nodal positions, meaning these proteins may interact with several other DEPs in our dataset and may be key regulators of the pathways highlighted in our results. For example, interaction between filamin-A and integrin beta-1 or caveolin-1 promotes migration, cell spreading, or metastasis, while interaction with other proteins results in inhibition of metastasis [51]. While filamin-A was upregulated in RC-77 T/E cells, integrin beta-1, caveolin-1, and vimentin, three of its binding partners that promote metastasis, were significantly downregulated. This is congruent with our knowledge that RC-77 T/E cells are derived from early stage primary prostate cancer (Gleason score 7) and are not metastatic [30].
Second, pathway analyses revealed that a common thread among the significant pathways was the inclusion of structural proteins, which could each be linked to invasion or migration of cells. "Tight Junction" and "Adherens Junction" pathways were enriched specifically in RC-77 T/E cells. Tight junctions are composed of claudin proteins, junctional adhesion molecules, integral membrane proteins, and cytoplasmic proteins, while adherens junctions are formed of cadherins and catenins [55]. Both hold together adjacent cells and help with structural and mechanical cell-cell integrity. The disruption of cell adhesion can facilitate the metastasis of tumor cells to secondary locations and lead to cell growth unchecked by contact inhibition [56]. The "Focal Adhesions" and "Proteoglycans in Cancer" pathways were significantly inhibited in RC-77 T/E cells. Proteoglycans in the tumor microenvironment associate with ECM proteins and affect proliferation, adhesion, and metastasis [57]. While the significance of the "Small Cell Lung Cancer" KEGG pathway may seem odd, it was highlighted in this dataset because of the role of ECM-receptor interactions and focal adhesions in cancer progression (see Additional file 10). "Cell Adhesion Molecules" and "ECM-Receptor Interaction" pathways, which were enriched in RC-77 N/E cells according to GSEA and shown to be significant by SPIA, were primarily flagged because of integrin expression.
Fig. 4 Expression of differentially expressed proteins by race in age- and stage-matched human prostate cancer specimens. In 12 age- and stage-matched prostate cancer specimen pairs extracted from TCGA without subtracting the non-malignant controls, (A) XRCC5 protein was found to be significantly different (p < 0.05) between African-American and Caucasian-American prostate cancer specimens and (B) RNA expression of CAV1 and MYH9 was found to be significantly different (p < 0.01 and p < 0.05, respectively) between African-American and Caucasian-American prostate cancer specimens. The p-values were generated using the "t.test" function in R. AA = African-American; ADAR = double-stranded RNA-specific adenosine deaminase; CA = Caucasian-American; CAV1 = caveolin-1; CTNNB1 = beta-catenin; MYH9 = myosin heavy chain-9; SRSF1 = serine/arginine-rich splicing factor 1; TCGA = The Cancer Genome Atlas; XRCC5 = X-ray repair cross-complementing protein 5
Additional file 8: Additional Analysis on Reproducibility of Protein Fold Changes between Paired Malignant and Non-Malignant Replicates. The differential expressions are stable across different pairs of tumor and non-malignant cell lines. (PNG 695 kb)
Additional file 9: Complete Gene Set Enrichment Analysis Results. This table lists the enriched gene sets identified from KEGG, BioCarta, and Reactome databases using Gene Set Enrichment Analysis. Positive enrichment scores correspond to enrichment in the malignant samples (RC-77 T/E). Negative enrichment scores correspond to enrichment in the non-malignant samples (RC-77 N/E). SIZE = total number of genes in pathway, ES = enrichment score, NES = normalized enrichment score, NOM p-val = unadjusted probability of enrichment, FDR q-val = false discovery rate-adjusted probability.
Additional file 10: Complete Signaling Pathway Impact Analysis Results. This table presents the complete results of Signaling Pathway Impact Analysis. For each pathway, a link to a pathway diagram highlighting differentially expressed proteins in red is provided. ID = KEGG ID, pSize = pathway size, NDE = number of differentially expressed proteins in pathway, pNDE = probability of overrepresentation, tA = total accumulated perturbation, pPERT = probability of perturbation, pG = combined global probability of overrepresentation and perturbation, pGFdr = false-discovery rate-adjusted global probability, pGFWER = familywise error rate-adjusted global probability.
Fig. 5 Tumor-to-non-malignant comparison of RNA expression of the differentially expressed proteins by race in age- and stage-matched human prostate cancer specimens. Race-specific non-malignant mRNA expression levels of PRAD specimens were subtracted from those of 12 pairs of age- and stage-matched prostate cancer specimens extracted from TCGA. CAV1 and CTNNB1 mRNA expression levels were found to be significantly higher in African-American compared to Caucasian-American specimens (p < 0.05 and p < 0.01, respectively). The p-values were generated using the "t.test" function in R. As indicated by the negative RNA expression value on the y-axis, CAV1 was downregulated in African-American prostate cancer specimens compared to African-American non-malignant control specimens. In contrast, CTNNB1 was upregulated. AA = African-American; ADAR = double-stranded RNA-specific adenosine deaminase; CA = Caucasian-American; CAV1 = caveolin-1; CTNNB1 = beta-catenin; MYH9 = myosin heavy chain-9; PRAD = prostate adenocarcinoma; SRSF1 = serine/arginine-rich splicing factor 1; TCGA = The Cancer Genome Atlas; XRCC5 = X-ray repair cross-complementing protein 5
Abbreviations
DEP: Differentially expressed protein; ECM: Extracellular matrix; FDR: False discovery rate; GO: Gene Ontology; GSEA: Gene Set Enrichment Analysis; KEGG: Kyoto Encyclopedia of Genes and Genomes; PANTHER: Protein ANalysis THrough Evolutionary Relationships; pGFdr: False discovery rate-adjusted global probability; PRAD: Prostate adenocarcinoma; SPIA: Signaling Pathway Impact Analysis; STRING: Search Tool for the Retrieval of Interacting Genes/Proteins; TCGA: The Cancer Genome Atlas