Heterogeneity revealed through meta-analysis might link geographical differences with nasopharyngeal carcinoma incidence in Han Chinese populations

Background Nasopharyngeal carcinoma (NPC) is an epithelial malignancy highly prevalent in southern China, and incidence rates among Han Chinese people vary according to geographic region. Recently, three independent genome-wide association studies (GWASs) confirmed that HLA-A is the main risk gene for NPC. However, the results of studies conducted in regions with dissimilar incidence rates contradicted the claims that HLA-A is the sole risk gene and that the association of rs29232 is independent of the HLA-A effect in the chromosome 6p21.3 region. Methods We performed a meta-analysis, selecting five single-nucleotide polymorphisms (SNPs) in chromosome 6p21.3 mapped in three published GWASs and four case–control studies. The studies involved 8994 patients with NPC and 11,157 healthy controls, all of whom were Han Chinese. Results The rs2517713 SNP located downstream of HLA-A was significantly associated with NPC (P = 1.08 × 10−91, odds ratio [OR] = 0.58, 95 % confidence interval [CI] = 0.55–0.61). The rs29232 SNP exhibited a moderate level of heterogeneity (I2 = 47 %) that disappeared (I2 = 0 %) after stratification by moderate- and high-incidence NPC regions. Conclusions Our results suggested that the HLA-A gene is strongly associated with NPC risk. In addition, the heterogeneity revealed by the meta-analysis of rs29232 might be associated with regional differences in NPC incidence among Han Chinese people. The higher OR of rs29232 and the fact that rs29232 was independent of the HLA-A effect in the moderate-incidence population suggested that rs29232 might have greater relevance to NPC incidence in a moderate-incidence population than in a high-incidence population. Electronic supplementary material The online version of this article (doi:10.1186/s12885-015-1607-0) contains supplementary material, which is available to authorized users.


Background
Nasopharyngeal carcinoma (NPC), a malignancy that forms in the epithelium of the nasopharynx, has a distinct geographic distribution and is highly prevalent in southern China, Southeast Asia, and North Africa. Although all Han Chinese populations exhibit an increased risk of NPC, the incidence rate varies by region. For example, male populations in Guangdong and Guangxi in southern China have consistently exhibited a higher incidence rate (20.6-39.94/100 000 person-years) compared with those in moderate-incidence regions, such as Taiwan (8.6/100 000 person-years), and those in most of the Western world (less than 1/100 000 personyears) [1][2][3][4][5]. The etiology of NPC is multifactorial, involving genetic components, Epstein-Barr virus infection, and other types of environmental exposure [1]. The variations in NPC incidence might be due to differences in environmental exposure among geographic regions; however, the genetic components underlying the differences in incidence in Han Chinese populations remain underexplored.
The genetic association of human leukocyte antigen (HLA) class I genes, particularly HLA-A, with NPC was established in 1974 [6] and has been confirmed in more than 100 association studies adopting traditional HLA genotyping techniques. Studies have consistently identified an association between NPC and HLA-A*1101, HLA-A*0207, and HLA-B*5801 [7][8][9]. The distribution of these three alleles in the human genome appears to be consistent with the geographical distribution of NPC incidence in southeastern China; the allele frequency is particularly high in regions with high NPC incidence rates [10][11][12]. However, no difference in HLA allele frequency has been observed in the results of NPC association studies conducted in regions with various incidence rates [13][14][15], suggesting that the HLA genes might not directly lead to differences in NPC incidence.
Three independent genome-wide association studies (GWASs) [16][17][18] have identified multiple significant association signals in chromosome 6p21.3 near the HLA-A gene, which exhibited extremely strong linkage disequilibrium (LD) (Fig. 1). These observations raised the question as to whether these associated single-nucleotide polymorphisms (SNPs) represent an independent effect or are only proxies of the HLA-A gene. Studies conducted in medium-and high-incidence regions (Taiwan [16,19], and Guangdong and Guangxi [17,18], respectively) have yielded contradictory conclusions.
To evaluate the effect of HLA-A and its neighboring gene on NPC susceptibility, we conducted a metaanalysis on the association between the five most frequently studied chromosome 6p21.3 SNPs (rs9260734, rs2517713, rs3129055, rs29232, and rs29230) and NPC susceptibility. Thus far, no meta-analysis has been conducted to explore the overall NPC risk and the genetic heterogeneity associated with chromosome 6p21 SNPs. In the current study, we postulated that moderate heterogeneity in rs29232 might contribute to the regional variation in NPC incidence rates.

Identification and eligibility of relevant studies
We reviewed the literature on PubMed for all relevant reports (the most recent search update was December 10, 2014), using the search terms "NPC," "association," and "HLA," and limiting the results to English-language papers. In this meta-analysis, studies had to fulfill the following criteria: 1) evaluate the correlation between SNPs mapped by GWASs in the HLA-A region and NPC in Han Chinese populations, 2) use a case-control design, and 3) report the genotype frequency for both cases and controls and/or odds ratios (ORs).

Description of studies
The meta-analysis was based on summary data reported by three previous GWASs on NPC [16][17][18] and four follow-up case-control studies [19][20][21][22] focusing on the HLA class I region. We extracted data according to the aforementioned inclusion criteria. The following data was collected from each study: 1) first author's name, 2) year of publication, 3) sample collection area, 4) genotyping platform, 5) SNPs assessed, 6) number of cases and controls, 7) sex ratio, and 8) age range (Table 1). Twelve SNPs discovered by Tse et al. [16] were used as major targets for analysis; the citation of these SNPs was standardized using rs numbers, as suggested by Tse et al. For rs2517713, two SNPs (rs2860580 and rs9260475) acted as surrogates according to the strong LD  [16]; Green: Bei [17]; Blue: Tang [18]. The solid triangles indicate the SNPs used in the meta-analysis. The hollow triangles indicate the other SNPs listed in the GWASs. Bottom: Detailed LD structure depicted in HaploView by using control samples from the NPC GWAS in Taiwan [16]. The increasing intensities of red represent lower D' values relationship between the target and surrogate SNPs in the HapMap Han Chinese in Beijing (CHB) population (rs2860580-rs2517713: D' = 1, r 2 = 0.90; rs9260475-rs2517713: D' = 1, r 2 = 0.91). When available, the following information was obtained from the studies and included in the meta-analysis: 1) minor allele frequency (MAF) in the case and control samples, 2) P values for the original association, and 3) ORs and 95 % confidence intervals (CIs). When the target SNPs were genotyped in both the discovery and validation groups, the combined genotyping data was used.

Assessment of publication bias
Typical publication bias is a result of small sample sizes.
Although the studies included in the meta-analysis had an adequate number of cases, the sample sizes were small compared with those of GWASs on other types of cancer. Furthermore, a low level of population admixture in a large study can cause publication bias. Therefore, potential publication bias was assessed using funnel and P-M plots [23,24]. Funnel plotting and Egger's linear regression test were performed using the Metafor package [23] in R [25], Version 3.0.2. When publication bias occurred, the funnel plot was noticeably asymmetric. Egger's linear regression test was used to test the funnel-plot symmetry. The M values of P-M plots represented the posterior probability that an effect existed in each study. A low M value (<0.1) suggested that the study had no effect, and therefore, such studies were excluded from further analysis [24].

Meta-analysis
All meta-analysis results presented in this report were calculated using the Metasoft software package, Version 2.0.1 [26]. The P-M plot, forest plot, and funnel plot were plotted using Metafor [27]. To evaluate the association between 6p21.3 SNPs and the risk of NPC, we calculated the pooled ORs and associated 95 % CIs. Standard meta-analysis involving the fixed effects model and conventional random effects model was conducted using the standard error and effect size reported in each study [24,27]. If the target SNPs were genotyped in both discovery and validation stages (Table 1), the combined data was used, otherwise only stage I data was used for meta-analysis. The fixed effects model made a conditional inference on the heterogeneity among the true effects, whereas the conventional random effects model treated the heterogeneity as purely random. Sensitivity analysis was conducted to assess the potential influences of any single study on the pooled ORs. In each metaanalysis, included studies were individually removed to ensure that no study significantly altered the pooled ORs and associated P values. Power analysis was conducted using the Power and Sample Size Calculation software Version 3.1.2 [28,29].

Test of heterogeneity
The heterogeneity effect was quantified using the I 2 test [30]. The I 2 values ranged from 0 to 100 %, and values of 25, 50, and 75 % were considered to represent low, moderate, and high levels of heterogeneity, respectively. Heterogeneity was estimated using Cochran's Q statistic, and P < 0.1 was considered to indicate significant heterogeneity [31].

Study selection
The primary search yielded 59 articles, of which seven were identified as potentially relevant, following a review of the title and abstract. In total, seven studies were included in the meta-analysis after a full-text review (Additional file 1: Figure S1). The overall study population in the current meta-analysis comprised 20,151 subjects, of which 8994 were patients with NPC and 11,157 were controls. We analyzed seven studies that examined Han Chinese subjects from Taiwan [16,19], Guangdong [17,18,[20][21][22], and Guangxi [18,22] (Table 1) and assessed five chromosome 6p21.  [21], represented rs2517713 because the LD (D' = 1, r 2 > 0.9) between the target and surrogate SNPs in the HapMap CHB population was extremely high [32].

Publication bias and synthesis of results
We used P-M and funnel plots to assess the publication bias of the included studies, detecting no evidence of potential publication bias in our target SNPs (Additional file 1: Figure S2). As expected, rs2517713 of the HLA-A gene was the most significantly associated with NPC (OR = 0.58, 95 % CI = 0.55-0.61, P = 1.08 × 10 −91 ) ( Table 2). Analysis of three of the five SNPs did not reveal heterogeneity (I 2 = 0). Although we used surrogate SNPs (rs2860580 and rs9260475) in the meta-analysis of rs2517713, we did not observe publication bias (Additional file 1: Figure S2b) or heterogeneity (I 2 = 0, Table 2), suggesting that SNPs with high LD (D' = 1, r 2 > 0.9) could be treated as the same SNP in the meta-analysis. We observed a moderate level of heterogeneity in rs3129055 (I 2 = 58 %, P = 0.0361) and rs29232 (I 2 = 47 %, P = 0.1091); however, the heterogeneity for rs29232 was not statistically significant (P > 0.1). Because Cochran's Q statistic is severely underpowered in analyses with only four to five studies, heterogeneity might still exist despite a lack of nominal statistical significance [33]. Two SNPs that exhibited heterogeneity (rs3129055 and rs29232) were the same SNPs independent from the HLA-A effect [16]. The random effect of rs29232 exhibited a

Sensitivity analysis
To investigate the potential influence of a single study on the overall meta-analysis estimation, we omitted one study at a time. Similar results were obtained for the three SNPs that did not exhibit heterogeneity (rs9260734, rs2517713, and rs29230) regardless of any study being omitted (Additional file 1: Table S1), indicating that our results were supported by reliable data. The ORs and I 2 of the two SNPs exhibiting heterogeneity (rs3129055 and rs29232) calculated using the sensitivity test differed. Power analysis of each target SNP in the total sample size involved the following assumptions: two-tailed α = 0.05 and the frequency of minor alleles in control samples. According to the ORs obtained through meta-analysis, the present sample size showed a power of 1.00 for detecting a significant association.

Subgroup analysis
We stratified the studied population on the basis of region, namely the moderate-incidence region (Taiwan) [16,19] and high-incidence regions (Guangdong and Guangxi) [17,18,22] (Table 3). The heterogeneity decreased markedly from 47 to 0 % in both the moderateand high-incidence regions, suggesting that the heterogeneity of rs29232 was attributable to the geographical difference in NPC incidence.

Discussion and conclusions
In the present meta-analysis, four of the five chromosome 6p21.3 polymorphisms exhibited strong and consistent positive associations with NPC. The three SNPs (rs2517713, rs9260734, and rs29230) yielded highly consistent results with no heterogeneity, despite differences in the genotyping platform (Table 1) and the use of surrogate SNPs for rs2517713 ( Table 2). The strongest association was discovered in rs2517713 near the HLA-A gene (P = 1.08 × 10 −91 , OR = 0.58, 95 % CI = 0.55-0.61), further confirming the critical role of the HLA-A gene in the susceptibility of NPC.
NPC has a distinct ethnic and geographic distribution. The allele frequencies of NPC-associated alleles (HLA-A*1101, HLA-A*0207, and HLA-B*5801) in the human genome are consistent with the geographical distribution of NPC incidence in southeastern Asia [10][11][12], suggesting that the genetic differences among populations might play a crucial role in NPC incidence. However, in the NPC association studies, the frequency of HLA alleles did not differ noticeably in populations with dissimilar incidence rates, suggesting that HLA genes might not directly cause this difference. In the current meta-analysis, we did not observe heterogeneity in the HLA-A SNPs, which supports our hypothesis. By contrast, our analysis indicated that rs29232 exhibited distinct features in Han Chinese populations in regions with different incidence rates. First, the heterogeneity of rs29232 was markedly reduced when we stratified the meta-analysis according to the incidence region. Second, in the subgroup analysis, the OR of rs29232 was higher in moderate-incidence regions than in the high-incidence regions. Finally, in previous NPC association studies, rs29232 was independent of the HLA-A effect in moderateincidence regions, but not in high-incidence regions. These results suggest that rs29232 might contribute to the difference in NPC incidence in Han Chinese populations.
Studies conducted in regions with different incidence rates have yielded inconsistent results regarding rs29232. A GWAS conducted in Taiwan, a moderate-NPCincidence region, using multiple logistic regression analysis and stepwise logistic regression concluded that rs29232 was significantly associated with NPC, even after the removal of the HLA-A SNP (rs2517713) and sequence-based HLA-A alleles (HLA-A*0207/0215 N and HLA-A*110101/0121 N) [16]. Another independent post-GWAS case-control study conducted in Taiwan yielded similar results, indicating that HLA-A and rs29232 are likely to be independent risk factors for NPC, and that NPC risk is highest among people carrying homozygous HLA-A*0207 and rs29232 risk alleles [19]. By contrast, a GWAS conducted in Guangdong, a high-NPC-incidence region, revealed that the strength of the association with rs29232 greatly diminished after the researchers controlled for the effect of rs2860580 (HLA-A), whereas the strength of the association with rs2860580 (HLA-A) decreased after they adjusted for rs29232. Collectively, these results suggest that the associations with rs29232 and rs2860580 (HLA-A) were correlated rather than probabilistically independent [17]. In an NPC GWAS conducted in Guangdong and Guangxi, a similar multivariate logistic regression analysis indicated that rs29232-and GABBR1related SNPs were nonsignificant after adjustment for HLA-A-related SNPs and alleles. The authors of the study stated that all the other significant associations identified were only proxies for HLA-A*1101 because of the strong LD within the region [18]. To summarize, study findings from moderate-incidence regions support the independent role of rs29232, whereas those from high-incidence regions indicate that only one true association signal (HLA-A) exists within this chromosome region.
In the current study, no heterogeneity was observed among studies except in rs3129055 (I 2 = 58 %, P = 0.0361, HLA-F) and rs29232 (I 2 = 47 %, P = 0.1092, GABBR1). However, the heterogeneity in rs29232 was not statistically significant (P > 0.1). Because tests of heterogeneity are severely underpowered in analyses of only a few studies [33], heterogeneity might still exist despite a lack of statistical significance. Since the metaanalysis result for rs3129055 did not achieve genomewide significance (<10 −7 ), we excluded it from further analysis. The moderate level of heterogeneity (I 2 = 47) of rs29232 markedly decreased (I 2 = 0) when we stratified the study population according to geographic region ( Table 3), suggesting that regional differences in NPC incidence caused the heterogeneity. Furthermore, the ORs were higher in the moderate-incidence regions (OR = 1.70, 95 % CI = 1.47-1.95) than in the highincidence regions (OR = 1.40, 95 % CI = 1.30-1.49). Since rs29232 was a significant association signal independent of the effect of the HLA-A gene in the moderate-incidence regions, we suspected that rs29232 caused the regional difference in NPC incidence among Han Chinese people. Compared with high-incidence populations, moderate-incidence populations might be more strongly affected by rs29232.
Although the current evidence on the role of rs29232 and the GABBR1 gene seems self-contradictory, previous studies have provided consistent findings regarding the role of GABBR1 in NPC. GABBR1 encodes the protein gamma-aminobutyric acid B receptor 1 (GABBR1), a G protein-coupled receptor that forms a heterodimer with GABAB receptor 2, thereby triggering downstream signaling events involved in the proliferation, differentiation, and migration of cancer cells. One study reported that the intensity of the GABBR1 signal in tumor cells was significantly higher than that detected in adjacent normal epithelial cells (P < 0.001) in the immunohistochemical staining of NPC tissues [16].
The rs29232 SNP exhibited a lower LD (r 2 < 0.6) than did all other SNPs in the HapMap CHB data set in the MHC region. This finding suggests that the role of rs29232 might not directly relate to the downstream GABBR1 gene or other neighboring genes, but rather affect genes located in other regions or chromosomes through long-range cis-acting or trans-acting. Nevertheless a recent genome-wide SNP-SNP interaction analysis detected no significant interaction between rs29232 and other SNPs [34]. Further investigation of the role of rs29232 and its relationship with the incidence and etiology of NPC is vital. In explorations of the genetic factors underlying disease susceptibility, epidemiological factors, such as disease incidence rates, are rarely considered. In the current study, an association analysis of rs29232 in regions with high or moderate incidence rates was performed. Future studies conducted in lowincidence regions such as northern China may clarify the relationship between genetic factors and geographic incidence rates. In addition, using imputation or nextgeneration sequencing to explore the detailed genotypes near rs29232 can further reveal the genetic causes underlying the variation in regional NPC incidence rates. The current study suggested that genetic effects might be involved in epidemiological factors such as regional disease incidences. However, epidemiological factors have rarely been considered in large cross-region association studies. Thus, we suggest that future large crossregion meta-analyses include geographic incidence rates as potential confounding factors.
In this meta-analysis, the results indicated a strong effect of HLA-A on NPC susceptibility and a potential role of rs29232 in the regional differences in NPC incidence among Han Chinese people. However, this study had several limitations. First, although the conclusion was based on several previous studies, the heterogeneity for rs29232 observed in this study was nonsignificant. Second, although the number of subjects in the studies included in the analysis was high, the number of studies included was relatively low. Finally, meta-analyses constitute retrospective research and, thus, are subject to methodological limitations. To minimize potential bias, we used standard methods for study selection, data extraction, and data analysis. Nonetheless, the results presented here should be interpreted with caution until additional studies on rs29232 that control for incidence rates according to region are conducted.

Additional file
Additional file 1: Supplementary Table S1. Sensitivity analysis of meta-analysis. Supplementary Figure S1. PRISMA Flow chart of literature review and study selection. Supplementary Figure S2. Forest plots, funnel plots and P-M plots for NPC association. (PDF 1068 kb) Abbreviations NPC: Nasopharyngeal carcinoma; SNP: Single-nucleotide polymorphism; HLA: Human leukocyte antigen; GWAS: Genome-wide association study;