Screening differentially expressed genes of pancreatic cancer between Mongolian and Han people using bioinformatics technology

Background: To screen and analyze differentially expressed genes in pancreatic carcinoma tissues from Mongolian and Han patients using the Affymetrix GeneChip platform.
Methods: Pancreatic ductal cell carcinoma tissues were collected from Mongolian and Han patients who underwent resection at the Second Affiliated Hospital of Nanchang University between March 2015 and May 2018, and total RNA was extracted. RNA samples that passed quality control on the Nanodrop 2000 and Agilent 2100 were profiled on Affymetrix microarrays to select differentially expressed genes, and the corresponding plots were drawn. Gene Ontology (GO) analysis and pathway analysis were used to collect and analyze the biological information of these differentially expressed genes. Finally, selected differentially expressed genes were verified by real-time PCR.
Results: Microarray analysis of gene expression detected 970 differentially expressed genes between pancreatic cancer tissue samples from Mongolian and Han patients. Of these, 257 genes were significantly up-regulated and 713 genes were down-regulated in samples from Mongolian patients. In the Gene Ontology database, 815 differentially expressed genes had a clear GO classification, and the CPB1 gene showed the largest increase in expression level (fold difference: 31.76). Pathway analysis detected 28 signaling pathways containing these differentially expressed genes, involving a total of 178 genes. Among these pathways, enrichment of differentially expressed genes was strongest in the FAK signaling pathway, in which the COL11A1 gene showed the largest fold difference (5.02). The expression of the differentially expressed genes CPB1, COL11A1, ITGA4, BIRC3, PAK4, CPA1, CLPS, PIK3CG and HLA-DPA1 determined by real-time PCR was consistent with the results of the microarray analysis.
Conclusions: Microarray analysis of gene expression profiles showed a large number of differentially expressed genes between pancreatic cancer tissue samples from the Mongolian and Han populations. These genes are closely related to cell proliferation, differentiation, invasion, metastasis and multi-drug resistance in pancreatic cancer. They are also involved in the regulation of multiple important signaling pathways.


Introduction
Pancreatic cancer has the highest mortality among all digestive system malignancies. Its onset is occult and its progression is rapid, so most patients are diagnosed late and miss the best opportunity for treatment. The search for specific and sensitive early diagnostic tools is therefore particularly important for improving the prognosis of patients with pancreatic cancer and prolonging their survival [1]. In recent years, the completion of the Human Genome Project and the maturation of gene microarray technology have provided a large amount of information for high-throughput analysis of the occurrence and development of tumors, making genetic diagnosis an ideal early diagnostic method for most tumors. By analyzing genes involved in generating and maintaining the malignant biological characteristics of pancreatic cancer, this method [2] can help elucidate the molecular mechanism of pancreatic cancer. In this study, an Affymetrix gene expression profiling microarray was used to screen differentially expressed genes in pancreatic cancer samples from Mongolian and Han patients for further gene function analysis, in order to provide reference data and experimental evidence for elucidating the molecular mechanism of pancreatic cancer development.

Reagents and instruments
The reagents used (Agilent RNA 6000 Nano Kit, GeneChip 3' IVT Express Kit, GeneChip Hybridization Wash and Stain Kit, Trizol Kit, and QIAGEN RNeasy Total RNA Isolation Kit) were provided by Gekkai Gene Technology Co., Ltd. The main instruments used in the study were the Thermo Nanodrop 2000, Agilent 2100, GeneChip Hybridization Oven 645, GeneChip Fluidics Station 450, and GeneChip Scanner 3000.

Sample information
A total of 96 fresh tumor tissue specimens were collected from patients with pancreatic malignancies who underwent surgical resection at the Second Affiliated Hospital of Nanchang University or the First Affiliated Hospital of Nanchang University between March 2015 and May 2018. Of these, 66 were initially selected according to the inclusion and exclusion criteria below and were used for the subsequent gene expression microarray assays.

Inclusion criteria
(1) Pathologically confirmed pancreatic ductal cell carcinoma; (2) New cases diagnosed by our hospital for the first time; (3) Patients aged ≥18 years; (4) Patients and their families agree to provide tissue specimens for scientific research and allow the publication of research data.

Exclusion criteria
(1) Pathologically confirmed local vascular invasion or regional lymph node metastasis; (2) distant metastasis found by relevant examinations or during operation in our hospital; (3) other malignant tumors, diabetes, hypertension, or systemic diseases such as cardiovascular or cerebrovascular diseases; (4) radiotherapy, chemotherapy, or other antineoplastic drugs received before surgery; (5) a history of surgery in the past 3 years.

General information of patients
Of the 66 patients, 32 were Mongolian and 34 were Han; 34 were male and 32 were female. The average age was 60.12 years; the youngest patient was 42 years old and the oldest was 73 years old. Tumors were well differentiated in 24 cases, moderately differentiated in 38 cases, and poorly differentiated in 4 cases.

Sample Total RNA extraction and quality assurance
Total RNA was extracted and purified from the collected samples using the Trizol kit and the QIAGEN RNeasy Total RNA Isolation kit. The samples were tested on the Nanodrop 2000 and Agilent 2100 instruments and evaluated by RNA concentration, A260/A280 ratio, RIN value and 28S/18S ratio. Samples were considered qualified when the A260/A280 ratio was between 1.7 and 2.2, RIN value was smaller than A26 and the 28S/18S ratio was greater than 0.7. All procedures were carried out strictly according to the reagent and instrument instructions.

Gene microarray hybridization
The double-stranded cDNA template was synthesized after mixing the purified total RNA with the internal reference poly-A RNA, according to the instructions of the GeneChip 3′ IVT Express Kit. Biotin-labeled amplified RNA (aRNA) was then obtained by in vitro transcription. The aRNA was purified using the purification reagent in the kit and fragmented to prepare the hybridization reaction solution. The hybridization reaction solution was heated in a 98°C water bath for 10 min, brought to 45°C, and then held in a 45°C water bath for more than 3 min. Meanwhile, 130 μL of the pre-hybridization solution in the kit was injected into the chip, which was kept in the hybridization oven at 45°C for 10 min. The pre-hybridization solution was then discarded, 130 μL of the hybridization reaction solution was injected into the chip at 45°C, and hybridization was performed at 60 rpm for 16 h. Afterwards, automated washing and staining were performed on the GeneChip Fluidics Station 450, and the chip was scanned to obtain the data.

Real-time quantitative PCR verification
The top 10 differentially expressed genes between Han and Mongolian pancreatic cancer patients were selected for verification. Primers were synthesized according to the PCR primer information provided by the PrimerBank database (Table 1). GAPDH was used as the internal reference, and a two-step method was used. The expression of GAPDH was detected by qPCR and set to the standard value "1", and the relative expression level of each differentially expressed gene in Mongolian and Han pancreatic cancer tissues and adjacent tissues was calculated against it. A real-time PCR kit was used to detect the expression of these genes in pancreatic cancer tissues and their adjacent normal tissues, and statistical charts were drawn. The reaction program was: hold (pre-denaturation): 95°C, 30 s, 1 cycle; two-step PCR: 95°C, 5 s, then 60°C, 30 s, 40 cycles; dissociation: 95°C, 15 s, 60°C, 30 s, 95°C, 15 s, 1 cycle.
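The normalization to GAPDH described above can be illustrated with a minimal sketch. It assumes the common 2^(-ΔCt) calculation with roughly 100% amplification efficiency, which the text does not state explicitly; the function name and the Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Return expression of a target gene relative to the reference gene.

    Assumes about 100% amplification efficiency, so a difference of one
    cycle corresponds to a two-fold difference in template (2^-dCt).
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical triplicate Ct values, for illustration only.
gapdh_ct = [18.0, 18.5, 17.5]   # reference gene, mean Ct 18.0
target_ct = [21.0, 21.5, 20.5]  # target gene, mean Ct 21.0

# The reference compared with itself gives the standard value 1.
print(relative_expression(gapdh_ct, gapdh_ct))
# The target is three cycles later than GAPDH, i.e. 2^-3 of its level.
print(relative_expression(target_ct, gapdh_ct))
```

With this convention, a gene that reaches threshold later than GAPDH has a relative expression below 1, and the ratio of two such values between tissues gives the fold difference.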

Data processing and statistical analysis
The data obtained by scanning were analyzed using the R Project software according to the following protocol:
(1) Background noise was filtered by removing the probes with the lowest 20% of signal strength among all microarray probes.
(2) The signal intensity distribution of the probes and a box plot of the relative log signal intensity of the probes were plotted to evaluate the reliability and repeatability of the differential expression microarray results.
(3) Differential expression analysis was performed using the GCBI online analysis tool (https://www.gcbi.com.cn/gclib/html/index). GCBI (Gene-Cloud of Biotechnology Information) is an R-based online analysis platform developed in China for processing microarray data. In GCBI, the microarray data were first standardized by logarithmic transformation to facilitate analysis, and differentially expressed genes were then screened by statistical methods. A linear model with empirical Bayes statistics was used to calculate the P value for the difference in gene expression between the two groups; genes with fold change > 3.0 and P value < 0.05 were defined as significantly differentially expressed.
(4) Scatter plots were drawn from the signal intensities of the two sets of sample chips, and volcano plots were drawn from the fold change and the P value of the difference test between the two groups of samples.
(5) Hierarchical cluster analysis was used to classify the microarray results preliminarily along two dimensions: samples and gene differential expression patterns.
GO functional annotation describes the biological functions of genes and proteins using standardized terms shared across databases. The project was established by the Gene Ontology Consortium.
GO annotation currently covers three aspects of biological content: Biological Process, Cellular Component, and Molecular Function. This study used the DAVID online analysis website (The Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/) to perform GO functional annotation analysis of the differentially expressed genes (DEGs) [3]. GO enrichment analysis shows the biological functions, pathways or cellular locations in which differentially expressed genes are enriched [4]. In the GO analysis of the uploaded differentially expressed genes, the most significantly different functions were classified into three aspects: biological process (BP), molecular function (MF) and cellular component (CC). Fisher's exact test was used to evaluate the degree of enrichment (α = 0.05) of the differentially expressed genes in each classification. KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) was established by the Kanehisa Lab at the Bioinformatics Center of Kyoto University in Japan. It is a database that integrates genomic, chemical, and systemic functional information, predicts the role of proteins in various cellular activities, and maps them into networks. Analyzing the differentially expressed genes within signaling pathways identifies the pathways that are significantly altered under disease conditions, which is of great significance for the experimental exploration of mechanisms. Enrichment of differentially expressed genes in signaling pathways was analyzed on the basis of the KEGG database, and the most significantly different functions were ranked and analyzed. Results with both a P value and a false discovery rate (FDR) ≤ 0.05 were considered statistically significant.
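The text does not specify how the false discovery rate was computed. As one common possibility, the sketch below applies the Benjamini-Hochberg adjustment to a list of P values; the choice of this procedure, the function name and the example P values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway P values, for illustration only.
p_values = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)
for p, q in zip(p_values, fdr):
    verdict = "significant" if p <= 0.05 and q <= 0.05 else "not significant"
    print(f"P = {p:.2f}, FDR = {q:.3f}: {verdict}")
```

In this toy example three pathways have P ≤ 0.05 but only one also has an adjusted value ≤ 0.05, which shows why requiring both thresholds is stricter than the P value alone.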
Using the KEGG and BioCarta databases, Fisher's exact test was used to calculate the significance of enrichment of differentially expressed genes in each signaling pathway, in order to identify the significantly affected signaling pathways (α = 0.05).
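The enrichment test can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. This is a minimal illustration that assumes a one-sided test against a background gene list; the function name, the parameter names and the counts used are hypothetical and do not come from this study.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for pathway enrichment.

    N: number of genes in the background
    K: number of background genes annotated to the pathway
    n: number of differentially expressed genes
    k: number of differentially expressed genes in the pathway
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only: a background of 10 genes,
# 4 of them in the pathway, 3 differentially expressed genes, 2 of which
# fall in the pathway.
p = enrichment_p_value(k=2, n=3, K=4, N=10)
print(f"P = {p:.4f}", "enriched" if p < 0.05 else "not enriched")
```

The exact integer arithmetic of `math.comb` avoids rounding problems for small tables; for genome-scale backgrounds a statistics library would normally be used instead.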

Sample total RNA quality inspection
Total RNA quality testing showed that the total RNA of 55 of the 66 samples was qualified for follow-up study. From these, 20 samples from Han patients and 20 samples from Mongolian patients were randomly selected, comprising 12 male and 8 female Han patients and 11 male and 9 female Mongolian patients. Among the Han samples, 6 were well differentiated, 10 were moderately differentiated and 4 were poorly differentiated; among the Mongolian samples, 12 were well differentiated and 8 were moderately differentiated.

Differentially expressed gene test results and quality assessment
(1) After noise reduction, 36,866 of the 49,395 probes on the chip were selected for subsequent analysis.
(2) A total of 970 genes were differentially expressed among the 36,866 probes, a differential expression rate of 2.69%. Compared with Han patients, 257 genes were significantly up-regulated and 713 genes were significantly down-regulated in the pancreatic cancer samples from Mongolian patients.
(3) Data quality evaluation: The signal intensity distribution curves of the microarray probes of each sample fitted well, confirming the reliability of the data obtained from microarray analysis. In the box plot, the relative log signal intensity distributions of the probes were close to each other across samples, confirming the repeatability of the data.
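The probe filter and the screening criteria stated in the methods (removal of the lowest 20% of probes by signal, then fold change > 3.0 and P value < 0.05) can be sketched as follows. This is a simplified illustration, not the GCBI implementation: it assumes that the fold change is the Mongolian/Han ratio of mean signal, so that down-regulation corresponds to a ratio below 1/3, and it takes the P values as already computed. The function names and toy values are hypothetical.

```python
def low_signal_cutoff(signals, fraction=0.20):
    """Return the smallest signal value that survives the low-signal filter."""
    ordered = sorted(signals)
    n_removed = int(len(ordered) * fraction)  # number of probes to discard
    return ordered[n_removed]


def classify_probe(fold_change, p_value, fc_cutoff=3.0, alpha=0.05):
    """Label one probe as 'up', 'down' or 'ns' (not significant).

    fold_change is assumed to be the Mongolian/Han ratio of mean signal,
    so values below 1/fc_cutoff count as down-regulation.
    """
    if p_value >= alpha:
        return "ns"
    if fold_change > fc_cutoff:
        return "up"
    if fold_change < 1.0 / fc_cutoff:
        return "down"
    return "ns"


# Toy signal values for ten probes (hypothetical, for illustration only).
signals = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
cutoff = low_signal_cutoff(signals)
kept = [s for s in signals if s >= cutoff]
print(len(kept), "of", len(signals), "probes kept")

# Hypothetical (fold change, P value) pairs.
for fc, p in [(4.0, 0.01), (0.25, 0.01), (2.0, 0.01), (5.0, 0.20)]:
    print(fc, p, classify_probe(fc, p))
```

Counting the "up" and "down" labels over all retained probes gives the kind of summary reported in item (2) above.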

Analysis of significant differences in sample gene expression
(1) The scatter plot is shown in Fig. 1. The ordinate represents the probe signal intensity in specimens from Mongolian patients with pancreatic cancer, and the abscissa represents the probe signal intensity in specimens from Han patients with pancreatic cancer. Each point represents the signal intensity of one probe in the Han and Mongolian specimens. Points between the green lines represent genes that are not significantly differentially expressed. Points outside the lines represent genes that are significantly differentially expressed: red points indicate up-regulated genes, and green points indicate down-regulated genes.
(2) Figure 2 is a volcano plot based on the fold change in gene expression and the P value of the significance test. The ordinate is the P value, and the abscissa is the log-transformed fold change (base 2). The red points indicate differentially expressed genes that meet the screening criteria for significant differential expression described above.
(3) Figure 3 is a heat map of the hierarchical cluster analysis, showing the expression profiles of the genes whose expression levels differ significantly between the two groups of samples. The columns represent samples and the rows represent differentially expressed genes. Most samples from the same ethnic group have similar differential gene expression profiles. According to the dendrogram on the left of the figure, some genes have similar expression patterns; these genes may have similar functions or participate in the same biological process. Each gene cluster was named after its gene with the highest fold difference in expression. For example, the RASA2 gene cluster includes the BIRC2, RASA2, ADAM17, RECQL, LYAR, and SDHD genes, of which RASA2 has the highest differential expression (2.36-fold).

Differentially expressed genes bioinformatics analysis
(1) GO analysis: Based on the results of the differentially expressed gene screening, 793 differentially expressed genes were retrieved from the Gene Ontology database. Among the three major categories of GO analysis, the BP classification contained a total of 597 differentially expressed genes and the MF classification contained a total of 613. The CC classification contained a The results showed significant differences between the two groups of samples. Among the differentially expressed genes, those involved in biological processes mainly encode proteins that control the immune system, the immune response, cell signal transduction, the cellular response to stimuli, and multiple forms of feedback regulation. The genes related to molecular function mainly encode proteins related to endopeptidase activity, identical protein binding, enzyme regulator activity, cytoskeletal protein binding, protein domain specific binding, and regulation of protein molecular function. The genes associated with cellular components are mainly involved in the plasma membrane, vacuoles, the endoplasmic reticulum, cell gap junctions, the extracellular matrix and other structural proteins.
(2) Pathway analysis: A search of the KEGG and BioCarta databases identified a total of 28 cellular signaling pathways that differed significantly between the two groups of samples, involving 178 genes, as shown in Fig. 5. Tables 2, 3, and 4 list in detail the differentially expressed genes contained in the three most strongly enriched pathways (Focal adhesion, Pathways in cancer, and Regulation of actin cytoskeleton, respectively).

Real-time PCR verification
The results of real-time quantitative PCR detection of the differentially expressed genes are shown in Fig. 6. As can be seen from the figure, with the exception of CLPS, the expression levels of the differentially expressed genes in Mongolian pancreatic cancer tissues were significantly higher than those in Han pancreatic cancer tissues. There was no significant difference in the expression of these genes in adjacent tissues between Mongolian and Han patients.

Discussion
The occurrence and development of pancreatic cancer are affected by many factors. Many studies have shown that the incidence of pancreatic cancer differs significantly between regions, races and even ethnic groups. For example, Wormann SM [5] reported that ethnic differences are among the high-risk factors for the onset of pancreatic cancer. An investigation by Ma J [6] showed that from 1970 to 2009 the trends in pancreatic cancer mortality among white and black people in the United States were diametrically opposed, suggesting that the prognosis of pancreatic cancer may differ between people with different genetic backgrounds [7]. At present, there are few studies, in China or elsewhere, on pancreatic cancer in the Mongolian population. There is no strong evidence of a significant difference in the prevalence or prognosis of pancreatic cancer between Mongolian and Han people or other ethnic minorities [8].
In this study, an Affymetrix gene expression microarray was used to detect and analyze differentially expressed genes and their biological information in surgically resected pancreatic cancer tissue from Mongolian and Han patients. At the molecular level, we explored potential differences in the development of pancreatic cancer between ethnic groups. This provides a reference for further elucidating how the malignant biological characteristics of pancreatic cancer are generated and maintained.
In this study, screening of differentially expressed genes in Mongolian and Han pancreatic cancer tissue samples identified 1034 genes with significantly different expression levels between the two groups of samples, accounting for 2.69% of the total number of detected genes. Compared with the Han patients, 257 genes were significantly up-regulated and 712 were down-regulated in the Mongolian patients. According to Gene Ontology analysis, a total of 793 of these differentially expressed genes were fully documented in the database. According to the annotation of gene biological function in the database, these differentially expressed genes are closely associated with the proliferation, differentiation, invasion, metastasis and multidrug resistance of pancreatic cancer cells. Among the genes associated with biological processes, the PLA2G1B (phospholipase A2, group IB) gene had the highest score for differential expression and significance; it was significantly down-regulated (9.26-fold) in Mongolian pancreatic cancer tissue samples. The protein encoded by the PLA2G1B gene is phospholipase A2, which plays a key role in membrane channel activation, signal transmission, hemodynamics, and pathophysiology during pancreatic inflammation and after tissue injury [3,9]. In addition, the study by Abbenhardt C et al. [10] showed that single nucleotide polymorphisms of the PLA2G1B gene are closely related to susceptibility to rectal cancer. However, there are few reports, in China or elsewhere, on the association between this gene and pancreatic cancer. Among the genes related to molecular function, the carboxypeptidase B1 (CPB1) gene had the highest score for differential expression and significance. Compared with Han patients, this gene was down-regulated 16.88-fold in Mongolian pancreatic cancer tissues.
The CPB1 gene mainly encodes pancreatic carboxypeptidase, an important serum marker of pancreatic dysfunction [4,11]. In recent years, there has been relatively little research on the association between the CPB1 gene and malignant tumors. Jin et al. [12] showed that CPB1 is markedly abnormally expressed in some breast cancer patients. The study by Bouchard P et al. [13] also suggested that CPB1 may be related to lymph node metastasis of breast cancer. The CPB1 gene may be related to structural components of the extracellular matrix of pancreatic cancer cells. The degree of enrichment of differentially expressed genes in the GO classifications can, to a certain extent, reflect the degree of difference between Mongolian and Han pancreatic cancers in these biological classifications, but using these results alone to evaluate differences in biological characteristics between Mongolian and Han pancreatic cancers could be inaccurate; the distribution of these classifications among pancreatic cancer-related genes in all populations also needs to be considered. Pathway analysis is currently the most commonly used method for the bioinformatic analysis of differentially expressed genes from gene expression microarrays. It retrieves detailed information on biological signal transduction pathways from the two authoritative databases KEGG and BioCarta and, taking the pathway as the unit and all included genes as the background, calculates the significance of enrichment of differentially expressed genes in each pathway. This identifies the metabolic and signal transduction pathways that are significantly affected and clarifies the molecular regulatory mechanisms that underlie the biological functions of the genes.
Through pathway analysis, we detected a total of 28 signaling pathways that differed between Mongolian and Han pancreatic cancer tissue samples, involving a total of 178 genes. Among them, the FAK (focal adhesion kinase) pathway showed the highest degree of enrichment of differentially expressed genes, with 37 members showing significant expression differences, of which 34 were up-regulated and 3 were down-regulated. The COL11A1 gene obtained the highest score for differential expression and significance; it was up-regulated 5-fold in Mongolian pancreatic cancer tissues.
The FAK signaling pathway integrates multiple extracellular signals to regulate the expression of downstream molecules, thereby controlling cell proliferation and apoptosis. It is linked in cascade with multiple signal transduction pathways in the body and is a central link between intracellular and extracellular signal transduction [14,15]. Many studies have confirmed that abnormal activation of the FAK signaling pathway is closely related to the occurrence and development of various malignant tumors [16,17,18]. In pancreatic cancer, Gao Z et al. [19] reported that overexpression of SRPX2 (Sushi repeat-containing protein, X-linked 2), which is dependent on the phosphorylation level of FAK, is closely related to local invasion and distant metastasis. The study by Hsieh YJ et al. [20] also showed that the FAK signaling pathway plays a crucial role in the promotion of invasion and metastasis of pancreatic cancer cells by USP22, a newly discovered member of the ubiquitin hydrolase family. Dao P et al. [21] pointed out that intrinsic and acquired drug resistance is the key factor limiting the clinical efficacy of tumor necrosis factor-related apoptosis-inducing ligand (TRAIL), a class of potential anti-cancer drugs currently in the clinical research phase. Moreover, the newly discovered FAK inhibitor PH11 can induce rapid apoptosis of TRAIL-resistant PANC-1 cells, indicating that excessive activation of the FAK signaling pathway may be related to multi-drug resistance in pancreatic cancer.

Conclusion
The occurrence and development of pancreatic cancer is a complex process involving multiple factors. A large body of evidence-based medical research indicates that susceptibility to and incidence of pancreatic cancer differ significantly between populations. In this study, gene expression profile microarray analysis was used to screen significantly differentially expressed genes in pancreatic cancer tissues from the Mongolian and Han populations. The functions and regulatory mechanisms of these genes were analyzed to provide a large number of genetic loci and reference data for further study of the molecular mechanisms that generate and maintain the malignant biological characteristics of pancreatic cancer. However, the results are limited by the small number of pancreatic cancer samples. Multi-center, large-sample family studies are still needed to further clarify the developmental and molecular biological characteristics of pancreatic cancer in the Mongolian population, to design more targeted and individualized prevention and control measures, and to further promote precision medicine for patients of various ethnic groups with pancreatic cancer.