Knowledge-based analyses reveal new candidate genes associated with risk of hepatitis B virus related hepatocellular carcinoma

Abstract

Background

Recent genome-wide association studies (GWASs) have suggested several susceptibility loci of hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) by statistical analysis at individual single-nucleotide polymorphisms (SNPs). However, these loci only explain a small fraction of HBV-related HCC heritability. In the present study, we aimed to identify additional susceptibility loci of HBV-related HCC using advanced knowledge-based analysis.

Methods

We performed knowledge-based analysis (including gene- and gene-set-based association tests) on variant-level association p-values from two existing GWASs of HBV-related HCC. Five different types of gene-sets were collected for the association analysis. SNPs within the genes prioritized by the knowledge-based association tests were selected to replicate the genetic associations in an independent sample of 965 cases and 923 controls.

Results

The gene-based association analysis detected four genes significantly or suggestively associated with HBV-related HCC risk: SLC39A8, GOLGA8M, SMIM31, and WHAMMP2. The gene-set-based association analysis prioritized two promising gene sets for HCC, cell cycle G1/S transition and NOTCH1 intracellular domain regulates transcription. Within the gene sets, three promising candidate genes (CDC45, NCOR1 and KAT2A) were further prioritized for HCC. Among genes of liver-specific expression, multiple genes previously implicated in HCC were also highlighted. However, probably due to small sample size, none of the genes prioritized by the knowledge-based association analyses were successfully replicated by variant-level association test in the independent sample.

Conclusions

This comprehensive knowledge-based association mining study suggested several promising genes and gene-sets associated with HBV-related HCC risk, which should facilitate follow-up functional studies on the pathogenic mechanism of HCC.

Background

Hepatocellular carcinoma (HCC) is one of the most common cancers worldwide. With 750,000 new HCC cases diagnosed each year, it is the third leading cause of cancer mortality [1]. As many as 30% of patients diagnosed with hepatitis, fibrosis or cirrhosis ultimately develop HCC. In high endemic areas such as Africa and Asia, at least 60% of HCC is associated with hepatitis B virus (HBV) [2]. However, only a minority of HBV carriers develops HCC. HBV carriers with a family history of HCC were estimated to have over two-fold risk for HCC compared with those without a family history of HCC [3]. Furthermore, genetic complex segregation analysis suggested that major genes may be involved in the genetic predisposition to develop HCC at an earlier age [4].

Genome-wide association study (GWAS) is a widely used strategy for identifying risk loci of complex diseases. Recently, several GWASs on risk of HBV-related HCC were conducted using single-nucleotide polymorphism (SNP)-based statistical association tests. Multiple susceptibility loci were identified, including rs17401966 in intron 24 of KIF1B at 1p36.22, rs7574865 in intron 3 of STAT4 at 2q32.2–32.3, rs9275319 between HLA-DQB1 and HLA-DQA2 at 6p21.3, rs9272105 between HLA-DQA1 and HLA-DRB1 at 6p21.3, and rs455804 in intron 1 of GRIK1 at 21q21.3 [5,6,7]. However, these susceptibility loci account for only a small fraction of the genetic contribution to HBV-related HCC. Identifying additional genetic alterations associated with HBV-related HCC may be difficult because many individual risk SNPs have relatively weak effects, which may be undetectable with the relatively small sample sizes currently available [8]. SNP-based statistical association tests alone in GWAS do not have enough power to discover most risk loci for human complex diseases. Gene- and biological pathway-based association analysis has been proposed to enhance statistical power compared with conventional statistical tests, as it can reduce the multiple-testing burden and enrich signals [9]. Moreover, gene- and biological pathway-based analysis also lends itself to introducing more disease-specific knowledge into the analysis.

In the present study, we performed a series of knowledge-based analyses (including gene- and gene-set-based association tests) on variant-level association p-values from two in-house GWASs of HBV-related HCC. SNPs within genes prioritized by the knowledge-based analyses were selected for replication in an independent HBV-related HCC case/control sample.

Methods

Two existing GWASs on HBV-related HCC

The association p-values were obtained from two previous GWASs on HBV-related HCC in Chinese populations for meta-analysis and knowledge-based association analysis. One study [7] contained 2689 chronic HBV carriers (1212 HBV-related HCC cases and 1477 controls) recruited from May 2006 to December 2012 by the Qidong Liver Cancer Institute in Jiangsu Province of Mainland China. The other study [10] consisted of 95 HBV-infected HCC patients (cases) and 97 HBV-infected patients without HCC (controls) recruited at Queen Mary Hospital, Hong Kong. The sample inclusion and exclusion criteria were described in the original papers [7, 10].

Subjects in replication studies

The subjects in replication, including 965 chronic HBV carriers with HCC as cases and 923 chronic HBV carriers without HCC as controls, were recruited from the affiliated hospitals of the Second Military Medical University, Shanghai, China. All the samples are of Han Chinese descent and have participated in previously published studies [7, 11]. The inclusion and exclusion criteria for all the subjects have been previously described [7, 11]. Briefly, all the subjects were negative for antibodies to hepatitis C virus and human immunodeficiency virus, and had no other types of liver disease, such as autoimmune hepatitis, toxic hepatitis, and primary biliary cirrhosis. All the controls were chronic HBV carriers and had, by self-report, no history of HCC or other cancers. Chronic HBV carriers were defined as positive for both hepatitis B surface antigen and immunoglobulin G antibody to hepatitis B core antigen for at least 6 months. All the cases were chronic HBV carriers diagnosed with HCC. The diagnosis of HCC was based on a) positive findings on cytological or pathological examination and/or b) positive images on angiogram, ultrasonography, computed tomography and/or magnetic resonance imaging, combined with an alpha-fetoprotein level ≥ 400 ng/ml. All the cases were confirmed by an initial screening not to have other cancers. The mean (standard deviation) ages of the cases and controls were 50.8 (±12.2) years and 52.9 (±11.2) years, respectively. The male to female ratios were 5.3 in cases and 1.6 in controls.

The study was performed in accordance with guidelines approved by the local ethical committees of all participating centers involved in both the GWAS stage and the replication stage. Written informed consent to participate in the study was obtained from each subject in accordance with the principles of the Declaration of Helsinki. All study participants approved the storage of their frozen DNA specimens, for research purposes, in our laboratory.

Genotyping and quality control in replication

Genomic DNA from the peripheral blood of all participants in replication was extracted using the QIAamp DNA Blood Mini Kit (QIAGEN GmbH, Hilden, Germany). Genotyping analyses for replication samples were conducted using the Sequenom MassArray system (Sequenom) according to the manufacturer’s instructions. Genotyping quality was examined by a detailed QC procedure consisting of a 95% successful call rate, duplicate calling of genotypes, and internal positive control samples and two water samples (PCR negative controls) included in each 96-well plate. Genotype analysis was performed by technicians in a blind fashion.

Meta-analysis of variants

The association p-values of untyped SNPs were imputed directly by the tool FAPI (http://grass.cgs.hku.hk/limx/fapi/) [12] with default settings. The p-values of the two GWASs were then combined by Stouffer’s Z-score method for meta-analysis on FAPI as well:

\( {Z}_{meta}=\frac{\sum_{i=1}^N\left({w}_i\ast {z}_i\right)}{\sqrt{\sum_{i=1}^N{w}_i^2}} \) where \( {w}_i=\sqrt{n_i} \)

in which \( N \) is the number of GWASs, \( {z}_i \) is the z-score of the ith GWAS, and \( {n}_i \) is the sample size of the ith GWAS.
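The weighting scheme above can be illustrated with a minimal Python sketch. This is not the FAPI implementation; the function names and the example z-scores are hypothetical, and only the sample sizes (2689 and 192 subjects) are taken from the two GWASs described in the Methods.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def p_to_z(p_two_sided, effect_sign):
    """Convert a two-sided p-value and an effect direction (+1 or -1) to a signed z-score."""
    return effect_sign * _STD_NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)


def stouffer_meta(z_scores, sample_sizes):
    """Sample-size-weighted Stouffer combination with w_i = sqrt(n_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_meta = numerator / denominator
    p_meta = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z_meta)))  # two-sided p-value
    return z_meta, p_meta


# Hypothetical z-scores for one SNP in the two studies (illustration only).
z_meta, p_meta = stouffer_meta([2.0, 1.0], [2689, 192])
```

Because the weights grow with the square root of the sample size, the larger study dominates the combined statistic in this setting.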

Gene-based and gene-set-based analysis

The knowledge-based secondary analysis platform KGG Version 4.0 (http://grass.cgs.hku.hk/limx/kgg/) was used to map the SNPs onto reference genes (UCSC RefGene hg19), and to perform gene-based and gene-set-based association analysis with default settings. Two types of gene-based association tests, GATES [13] and ECS [14], were employed for the analysis which combined SNP-level association signal according to the best significance and accumulated significance respectively. In addition, LDRT [15] was adopted for gene-set-based association analysis. The phased genotypes of Eastern Asian samples in the 1000 Genomes Project [16] were used to account for linkage disequilibrium of SNPs through KGG. The Benjamini-Hochberg approach was used to control false discovery rate (FDR) of genome-wide genes or genes within gene-sets, which is a more powerful multiple testing approach than Bonferroni correction when there are multiple susceptibility genes.
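As an illustration of the Benjamini-Hochberg adjustment mentioned above, the following Python sketch converts a list of p-values into q-values. It is a generic textbook version, not the KGG code, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical gene-based p-values (illustration only).
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In this toy example the p-value 0.03 has a q-value of 0.04 and passes an FDR threshold of 0.05, whereas a Bonferroni adjustment (0.03 × 4 = 0.12) would reject it, which reflects the power argument made in the text.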

Functional annotation of variants

The genomic annotation tools HaploReg v4.1 (http://www.broadinstitute.org/mammals/haploreg/haploreg.php) [17] and RegulomeDB Version 1.1 (http://regulomedb.org/) [18] were used to annotate SNPs with epigenomic markers and potential regulatory elements, including regions of DNase I hypersensitivity, binding sites for transcription factors (TFs), promoter regions that have been biochemically characterized to regulate transcription, chromatin states, DNase footprinting, position weight matrices (PWMs), and DNA methylation. KGGSeq (Version 1.0) [19, 20] was used to annotate selected SNPs with four regulatory or functional prediction scores (CADD.CScore [21], SuRFR [22], FunSeq2 [23] and cepip [24]).

Results

We first combined the association p-values of variants from two independent GWASs by meta-analysis. Association analyses at genes and multiple functional gene-sets were then carried out to prioritize potential HBV-related HCC susceptibility genes. A series of prioritized variants were selected from the knowledge-based association analyses to replicate their genetic associations in an independent case-control sample. The overall workflow is shown in Fig. 1.

Fig. 1 Knowledge-based prioritization framework of SNPs’ statistical p-values for association with HCC

Genome-wide meta-analysis of two HBV-related HCC GWASs in Chinese populations

Association p-values were imputed based on the linkage disequilibrium (LD) pattern in the Eastern Asian Panel from the 1000 Genomes Project. A genome-wide meta-analysis was then performed with SNP p-values from two existing Chinese HCC GWASs using the tool FAPI [12]. After quality control (QC), 5,375,073 meta-analysis p-values of SNPs were obtained. The Manhattan plot and QQ plot of the p-values are shown in Supplementary Figure 1 and Supplementary Figure 2, respectively. At the upper tail of the QQ plot, the observed p-values deviate from the 95% confidence band around the null-hypothesis line, suggesting the existence of association signals at some SNPs. The small proportion of significant signals was consistent with the low heritability estimated in the samples by GCTA, 0.063 (±0.028) on the underlying liability scale [25].

Gene-based association analysis

We then used the meta-analysis p-values for gene-based association analysis by GATES [13] and ECS [14] on KGG (version 4.0) [26]. In addition to SNPs within the untranslated regions, introns and exons, the meta-analysis p-values of SNPs within 5 kb upstream and downstream of a gene were also included in the gene-based association test by GATES and ECS. SNPs in overlapping regions of multiple genes were assigned to all involved genes. The QQ plots of gene-based p-values are shown in Fig. 2.
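The SNP-to-gene assignment described above can be sketched in Python as follows. The coordinates and p-values are invented, and the function names are hypothetical. The combination step uses the plain Simes procedure as a simplified stand-in: GATES extends Simes by replacing the raw SNP counts with LD-based effective numbers of independent tests, which this sketch does not attempt.

```python
from collections import defaultdict

FLANK_BP = 5_000  # 5 kb upstream and downstream, as stated in the text


def map_snps_to_genes(snps, genes, flank=FLANK_BP):
    """Assign SNP p-values to every gene whose extended region contains the SNP.

    snps:  list of (snp_id, chrom, position, p_value)
    genes: list of (gene_name, chrom, start, end)
    """
    gene_to_pvalues = defaultdict(list)
    for _snp_id, chrom, pos, p_value in snps:
        for gene, gene_chrom, start, end in genes:
            if chrom == gene_chrom and start - flank <= pos <= end + flank:
                # A SNP in an overlapping region is assigned to all involved genes.
                gene_to_pvalues[gene].append(p_value)
    return dict(gene_to_pvalues)


def simes_p(p_values):
    """Plain Simes combination: minimum over ranks j of m * p_(j) / j (LD ignored)."""
    m = len(p_values)
    return min(m * p / rank for rank, p in enumerate(sorted(p_values), start=1))


# Hypothetical genes and SNPs (illustration only).
genes = [("GENE_A", "chr1", 10_000, 20_000), ("GENE_B", "chr1", 22_000, 30_000)]
snps = [
    ("snp1", "chr1", 6_000, 0.01),   # within 5 kb upstream of GENE_A
    ("snp2", "chr1", 21_000, 0.20),  # flanks of both GENE_A and GENE_B
    ("snp3", "chr1", 40_000, 0.001), # outside both extended regions
]
mapped = map_snps_to_genes(snps, genes)
gene_p = {gene: simes_p(p_list) for gene, p_list in mapped.items()}
```

Here snp2 contributes to both genes, mirroring the rule that SNPs in overlapping regions are assigned to all involved genes.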

Fig. 2 Quantile-quantile plots of gene-based p-values and SNP-based p-values: a) the p-values produced by GATES; b) the p-values produced by ECS

According to the gene-based p-values by GATES, two genes, SLC39A8 and GOLGA8M, passed the multiple-testing correction at an FDR of 0.05 (Table 1). In addition, two genes, SMIM31 and WHAMMP2, had nearly significant genome-wide q-values (< 0.06 by GATES) (Table 1). Interestingly, SMIM31, encoding small integral membrane protein 31, was previously annotated as a long noncoding RNA gene (LINC01207). We further annotated the pseudogene WHAMMP2 with known regulatory elements and epigenomic markers using the UCSC genome browser (http://genome.ucsc.edu). Although it is annotated as a pseudogene, WHAMMP2 contains multiple regulatory factor binding sites and epigenomic markers (see Supplementary Figure 3). These annotations suggest that this gene may be functionally active even though it does not encode a protein. The other gene-based test, ECS, detected no significant gene; the gene with the smallest ECS p-value (7.5E-06) was RNF157-AS1.

Table 1 The top 5 genes according to gene-based p-values by GATES and ECS, respectively

Prioritization of genes in different gene-sets

To select more promising candidate genes for replication in independent samples, we resorted to a series of gene-set resources to prioritize genes with suggestive association p-values. We first examined the association with HCC in 1057 canonical pathways curated in the Molecular Signatures Database (MSigDB V 4.0), after removing the pathways containing too few (< 5) or too many (> 300) genes. The gene-set-based association p-values were computed by LDRT [15] on KGG. Although no gene-sets passed multiple-testing correction (FDR q < 0.05), several promising functional gene sets were prioritized. The top two gene sets according to the p-value were the cell cycle G1/S transition (p = 5.5E-4) and the NOTCH1 intracellular domain regulates transcription (p = 7.1E-4). In the G1/S transition gene set, 12 out of 99 genes had a nominally significant gene-based association (p < 0.05; see details in Supplementary Excel Table 1). The gene with the smallest p-value in this gene set was CDC45 (p = 1.1E-4). In the gene set of NOTCH1 intracellular domain regulates transcription, 10 out of 40 genes had a nominally significant gene-based association (p < 0.05; see details in Supplementary Excel Table 1). In this set, NCOR1 had the smallest p-value (p = 5.8E-3). The second gene, KAT2A, had a similar p-value (6.6E-3).

Then, we investigated whether the genes highly and specifically expressed in human liver were associated with HCC. From the Tissue-specific Gene Expression and Regulation database (TiGER, http://bioinfo.wilmer.jhu.edu/tiger/), 309 genes preferentially expressed in liver were retrieved. From the Human Protein Atlas (http://www.proteinatlas.org/humanproteome), 433 genes showing elevated protein expression in liver compared with other tissue types were retrieved as well. To reduce potential false positives, we used only the genes present in both sets, which gave a total of 189 genes. Three genes (PAH, UGT2B10 and UROC1) had FDR q-values < 0.1 by ECS, while GATES did not detect any significant gene (see the genes and p-values in Table 2 and Supplementary Table 1).

Table 2 Genetic association p-values of genes preferentially expressed in liver

We also examined three further gene sets: genes recurrently targeted by HBV integration reported in previous studies [27,28,29,30], genes reported to be genetically associated with HBV-related HCC risk in previous studies, and HCC risk genes defined by the COSMIC database (http://cancer.sanger.ac.uk/cosmic). However, none of these genes had a promising association p-value with HCC in our samples (see the genes and p-values in Supplementary Tables 2, 3 and 4).

Replication study in independent samples

We sought to replicate the genetic associations at genes prioritized by the above gene-based and gene-set-based analyses in an independent HBV-related HCC case-control sample. Owing to budget limitations, only 21 SNPs were selected for replication. The SNPs were chosen within prioritized genes according to the consistency of their allele frequencies with ancestry-matched reference panels in the 1000 Genomes Project and HapMap Project, and/or their predicted functional importance by RegulomeDB (http://regulomedb.org/) with regulatory elements (see examples in Supplementary Figures 3 and 4). After the genotype quality assessment, two SNPs were excluded because they failed the Hardy-Weinberg equilibrium test (p < 0.001).
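The Hardy-Weinberg filter can be illustrated with a short Python sketch. The text does not state which variant of the test was used (exact or chi-square) or in which subjects it was applied, so the 1-degree-of-freedom chi-square version below is an assumption, and the function name and genotype counts are hypothetical.

```python
import math

HWE_P_THRESHOLD = 0.001  # exclusion threshold stated in the text


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts (illustration only).
p_ok = hwe_chi_square_p(25, 50, 25)   # exactly in HWE proportions
p_bad = hwe_chi_square_p(50, 0, 50)   # no heterozygotes at all
excluded = p_bad < HWE_P_THRESHOLD
```

A SNP such as the second example, with a strong deficit of heterozygotes, would be removed before association testing.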

Three genetic models (additive, dominant and recessive) were considered under a logistic regression framework in which the HCC status was adjusted for sex and age. None of the 19 SNPs survived Bonferroni correction for multiple testing at a family-wise error rate of 0.05. Only two SNPs, rs17343667 and rs389883, had a nominal p-value below 0.05. rs17343667, which is located in the first intron of EIF2AK1, had an association p-value of 0.02 under the dominant model with an odds ratio of 1.27 for the minor allele, which was also found to have a risk effect in both the original Qidong and Hong Kong GWAS samples (Table 3). However, its p-value was only 0.15 under the additive model. rs389883, which is in an intronic region of STK19, had p-values of 0.026 and 0.032 for HCC association under the additive and recessive models, respectively, with a protective effect of the minor allele G. However, in the original Qidong and Hong Kong GWAS samples, G was estimated to have a risk effect. Therefore, the SNP-level replication was generally negative.
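The three genetic models differ only in how a genotype is coded before it enters the logistic regression. The following Python sketch shows one common coding together with a Bonferroni threshold. The function names are hypothetical, and the threshold assumes that the correction counted the 19 SNPs only; the text does not state whether the three models were also counted as separate tests.

```python
def code_genotype(minor_allele_count, model):
    """Encode a genotype (0, 1 or 2 copies of the minor allele) under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return minor_allele_count            # effect scales with allele count
    if model == "dominant":
        return int(minor_allele_count >= 1)  # one copy is sufficient
    if model == "recessive":
        return int(minor_allele_count == 2)  # two copies are required
    raise ValueError(f"unknown model: {model}")


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


# Assumption: 19 tests, one per SNP that passed quality control.
threshold = bonferroni_threshold(0.05, 19)
survives = 0.02 < threshold  # nominal p-value of rs17343667 under the dominant model
```

Under this assumption the per-SNP threshold is roughly 0.0026, so a nominal p-value of 0.02 does not survive the correction, in line with the result reported above.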

Table 3 Summary of genetic association results in the replication

Discussion

This study used knowledge-based approaches to mine existing GWAS data sets for new susceptibility loci of HBV-related HCC. The gene-based association analysis identified four significantly or suggestively associated genes: SLC39A8, GOLGA8M, SMIM31 and WHAMMP2. The gene-set-based association analysis prioritized three top genes (CDC45, NCOR1 and KAT2A), which have previously been implicated in HCC, mainly through regulated expression. In addition, three genes, PAH, UGT2B10 and UROC1, were highlighted when multiple-testing correction (FDR q < 0.1) was performed among genes highly and specifically expressed in human liver. However, probably due to the small sample size, no associations prioritized by the knowledge-based association analysis were successfully replicated in an independent sample. rs17343667 of EIF2AK1 was the only variant with suggestive significance. Furthermore, our analysis also suggested that the germline susceptibility loci of HBV-related HCC are unlikely to be enriched in recurrent targeted genes of HBV infection, or in HCC risk genes with many somatic mutations.

According to our estimation, HCC has relatively low heritability (6.3%). It is unlikely that there are susceptibility genes or loci of large effect size. The association test enriched the association signals of multiple loci in multiple genes with low effect size so that the susceptibility pathways and gene sets can be prioritized. Moreover, it is easier to prioritize potential susceptibility genes given the prioritized gene sets. In our analysis, a non-trivial fraction of genes within the gene sets achieved moderately significant p-values. It is likely that some of the genes may achieve genome-wide significance when sample sizes are increased. However, almost all of the genes would be ignored by the widely adopted genome-wide p-value threshold (5E-8) in the present samples (1307 cases vs. 1574 controls).

Our study is the first to show that genetic variations of two genes (SLC39A8 and GOLGA8M) are significantly associated with the development of HBV-related HCC. SLC39A8 encodes a member of the SLC39 family of solute-carrier genes (Zrt/Irt-like protein 8, ZIP8), which may play an important role in autophagy during ethanol exposure in human hepatoma cells [31]. Liu et al. suggested that hepatic ZIP8 deficiency was associated with tumor formation [32]. Moreover, SLC39A8 has been reported to regulate IFN-γ level in T cells [33] and influence trace element homeostasis in liver [34, 35], which may be relevant to the development of HCC. GOLGA8M encodes golgin A8 family member M. Although it has not been linked to cancer, a study suggested that palindromic GOLGA8 core duplicons promoted chromosome microdeletion and evolutionary instability [36]. In addition, two other genes (SMIM31 and WHAMMP2) also achieved suggestively significant p-values. SMIM31 has been implicated as a biomarker for survival of colorectal adenocarcinoma [37] and in promoting proliferation of lung adenocarcinoma [38]. RNF157-AS1, which was implicated by ECS, is an antisense RNA gene. Differential expression of this gene between tumor and non-tumor tissue has been found in lung cancer [39] and ovarian cancer [40]. Nevertheless, functional validation studies are needed to explore the mechanisms by which these genes may influence the risk of HBV-related HCC.

The successful prioritization of two gene sets that are highly relevant to cancer development also illustrates the power of the knowledge-based analysis. The top two functional gene-sets are cell cycle G1/S transition and NOTCH1 intracellular domain regulates transcription. Numerous studies have linked these functional gene sets to HCC [41,42,43,44]. For example, Wang et al. recently showed that lnc-UCID promotes G1/S transition and hepatoma growth by preventing DHX9-mediated CDK6 down-regulation [41]. CDC45, the gene with the smallest p-value in the cell cycle G1/S transition gene set, encodes cell division control protein 45, and its expression has been linked to many cancers, including HCC [45] and colorectal cancer [46]. NCOR1, the gene with the smallest p-value in the gene set of NOTCH1 intracellular domain regulates transcription, encodes a protein that mediates ligand-independent transcription repression of thyroid-hormone and retinoic-acid receptors, which may regulate de novo fatty acid synthesis in liver regeneration and hepatocarcinogenesis in mice [47]. KAT2A, which had a p-value similar to that of NCOR1 in the same gene set, encodes lysine acetyltransferase 2A and has also been linked to HCC. For instance, Majaz et al. suggested that KAT2A may promote human HCC progression by enhancing AIB1 expression [48]. High and specific expression in human liver is also an effective stratum for prioritizing HCC susceptibility genes. When multiple-testing correction was carried out within this gene set, three genes (PAH, UGT2B10 and UROC1) achieved a suggestive significance level (FDR q < 0.1). All three genes have been implicated in HCC by multiple studies. The most significant gene, PAH (p = 3.5E-4 and q = 0.064), has the most literature support. For example, Miller et al. showed an effect of p-Chlorphenylalanine on phenylalanine hydroxylase in hepatoma cells in culture [49]. Gopalakrishnan and Anderson showed the epigenetic activation of phenylalanine hydroxylase in mouse erythroleukemia cells by the cytoplast of rat hepatoma cells [50]. UGT2B10 (p = 7.9E-4) encodes UDP-glucuronosyltransferase 2B10. Hanioka et al. showed that expression of UGT2B isoforms (including UGT2B10) was significantly increased by AFB1 in HepG2 cells [51]. UROC1 (p = 1.4E-3) encodes an enzyme involved in histidine catabolism that metabolizes urocanic acid to formiminoglutamic acid. Zhang et al. showed that UROC1 may play important roles in HCC development, especially in the development and progression of alcohol-related HCC [52].

The negative findings in the curated gene sets of recurrent targeted genes of HBV infection and HCC risk genes with many somatic mutations are unexpected to some extent. Both gene sets appeared to be biologically relevant to the development of HCC. In the analyses, there was no trend for genes with smaller HCC association p-values to be enriched in these gene sets. These results suggest that the biological context or connection of underlying susceptibility genes is elusive, and that it is difficult to simply use our current knowledge to identify the unknown susceptibility genes of HCC. Using larger samples for hypothesis-free GWASs is likely the only effective way to identify HCC risk genes at present.

The negative variant-level association in the replication sample is consistent with that in the discovery sample. Due to small effect sizes, no variants in the discovery GWAS sample of 1307 HBV-related HCC cases and 1574 controls had a p-value below the widely adopted genome-wide cutoff (5E-8). It was the gene-based association analysis, combining the p-values of multiple SNPs, that achieved genome-wide significant p-values at some genes. Because of budget limitations, however, most genes had only one selected SNP, so as to maximize the total number of genes for replication. Therefore, we were unable to carry out the gene-based association analysis in the replication study as we did in the GWAS sample. Unfortunately, probably due to low effect sizes, no variants achieved a significant p-value in the replication sample of 965 HBV-related HCC cases and 923 controls. The negative SNP-level replication implies that either a more powerful knowledge-based association study or a larger sample is needed to identify HCC susceptibility genes.

Conclusion

We performed the first systematic gene- and gene-set-based association study of HCC. Our study suggested several promising genes significantly associated with HCC risk, which may shed light on the pathogenic mechanisms of this fatal disorder. However, the failure of the replication study also implies that the susceptibility genes have small effect sizes. More hypothesis-free genetic studies with larger sample sizes are needed to elucidate the susceptibility genes and mechanisms of HCC.

Availability of data and materials

Please contact author for data requests.

Abbreviations

eQTL: Expression quantitative trait locus

FDR: False discovery rate

GWAS: Genome-wide association study

HBV: Hepatitis B virus

HCC: Hepatocellular carcinoma

LD: Linkage disequilibrium

QC: Quality control

SLE: Systemic lupus erythematosus

SNP: Single nucleotide polymorphism

TF: Transcription factor

References

  1. 1.

    Pinyol R, Llovet JM. Hepatocellular carcinoma: genome-scale metabolic models for hepatocellular carcinoma. Nat Rev Gastroenterol Hepatol. 2014;11(6):336–7.

  2. 2.

    Arzumanyan A, Reis HM, Feitelson MA. Pathogenic mechanisms in HBV- and HCV-associated hepatocellular carcinoma. Nat Rev Cancer. 2013;13(2):123–35.

    CAS  PubMed  Article  Google Scholar 

  3. 3.

    Yu MW, Chang HC, Liaw YF, Lin SM, Lee SD, Liu CJ, Chen PJ, Hsiao TJ, Lee PH, Chen CJ. Familial risk of hepatocellular carcinoma among chronic hepatitis B carriers and their relatives. J Natl Cancer Inst. 2000;92(14):1159–64.

    CAS  PubMed  Article  Google Scholar 

  4. 4.

    Cai RL, Meng W, Lu HY, Lin WY, Jiang F, Shen FM. Segregation analysis of hepatocellular carcinoma in a moderately high-incidence area of East China. World J Gastroenterol. 2003;9(11):2428–32.

    PubMed  PubMed Central  Article  Google Scholar 

  5. 5.

    Zhang H, Zhai Y, Hu Z, Wu C, Qian J, Jia W, Ma F, Huang W, Yu L, Yue W, et al. Genome-wide association study identifies 1p36.22 as a new susceptibility locus for hepatocellular carcinoma in chronic hepatitis B virus carriers. Nat Genet. 2010;42(9):755–8.

    CAS  PubMed  Article  Google Scholar 

  6. 6.

    Li S, Qian J, Yang Y, Zhao W, Dai J, Bei JX, Foo JN, PJ ML, Li Z, Yang J, et al. GWAS identifies novel susceptibility loci on 6p21.32 and 21q21.3 for hepatocellular carcinoma in chronic hepatitis B virus carriers. PLoS Genet. 2012;8(7):e1002791.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  7. 7.

    Jiang DK, Sun J, Cao G, Liu Y, Lin D, Gao YZ, Ren WH, Long XD, Zhang H, Ma XP, et al. Genetic variants in STAT4 and HLA-DQ genes confer risk of hepatitis B virus-related hepatocellular carcinoma. Nat Genet. 2013;45(1):72–5.

    CAS  PubMed  Article  Google Scholar 

  8. 8.

    Manolio TA. Bringing genome-wide association findings into clinical use. Nat Rev Genet. 2013;14(8):549–58.

    CAS  PubMed  Article  Google Scholar 

  9. 9.

    Kwak IY, Pan W. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics. Bioinformatics. 2017;33(1):64–71.

    CAS  PubMed  Article  Google Scholar 

  10. 10.

    Chan KY, Wong CM, Kwan JS, Lee JM, Cheung KW, Yuen MF, Lai CL, Poon RT, Sham PC, Ng IO. Genome-wide association study of hepatocellular carcinoma in southern Chinese patients with chronic hepatitis B virus infection. PLoS One. 2011;6(12):e28798.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  11. 11.

    Jiang DK, Ma XP, Yu H, Cao G, Ding DL, Chen H, Huang HX, Gao YZ, Wu XP, Long XD, et al. Genetic variants in five novel loci including CFB and CD40 predispose to chronic hepatitis B. Hepatology. 2015;62(1):118–28.

    CAS  PubMed  Article  Google Scholar 

  12. 12.

    Kwan JS, Li MX, Deng JE, Sham PC. FAPI: fast and accurate P-value imputation for genome-wide association study. Eur J Hum Genet. 2016;24(5):761–6.

    CAS  PubMed  Article  Google Scholar 

  13. 13.

    Li MX, Gui HS, Kwan JS, Sham PC. GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet. 2011;88(3):283–93.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  14. 14.

    Li M, Jiang L, Mak TSH, Kwan JSH, Xue C, Chen P, Leung HC, Cui L, Li T, Sham PC. A powerful conditional gene-based association approach implicated functionally important genes for schizophrenia. Bioinformatics. 2019;35(4):628–35.

    CAS  PubMed  Article  Google Scholar 

  15. 15.

    Gui H, Kwan JS, Sham PC, Cherny SS, Li M. Sharing of genes and pathways across complex phenotypes: a multilevel genome-wide analysis. Genetics. 2017;206(3):1601–9.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  16. 16.

    Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526(7571):75–81.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  17. 17.

    Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012;40(Database issue):D930–4.

    CAS  PubMed  Article  Google Scholar 

  18. 18.

    Xie D, Boyle AP, Wu L, Zhai J, Kawli T, Snyder M. Dynamic trans-acting factor colocalization in human cells. Cell. 2013;155(3):713–24.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  19. 19.

    Li M, Li J, Li MJ, Pan Z, Hsu JS, Liu DJ, Zhan X, Wang J, Song Y, Sham PC. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework. Nucleic Acids Res. 2017;45(9):e75.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  20. 20.

    Li MX, Gui HS, Kwan JS, Bao SY, Sham PC. A comprehensive framework for prioritizing variants in exome sequencing studies of Mendelian diseases. Nucleic Acids Res. 2012;40(7):e53.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  21. 21.

    Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5.

  22.

    Ryan NM, Morris SW, Porteous DJ, Taylor MS, Evans KL. SuRFing the genomics wave: an R package for prioritising SNPs by functionality. Genome Med. 2014;6(10):79.

  23.

    Fu Y, Liu Z, Lou S, Bedford J, Mu XJ, Yip KY, Khurana E, Gerstein M. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 2014;15(10):480.

  24.

    Li MJ, Li M, Liu Z, Yan B, Pan Z, Huang D, Liang Q, Ying D, Xu F, Yao H, et al. cepip: context-dependent epigenomic weighting for prioritization of regulatory variants and disease-associated genes. Genome Biol. 2017;18(1):52.

  25.

    Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.

  26.

    Li MX, Sham PC, Cherny SS, Song YQ. A knowledge-based weighting framework to boost the power of genome-wide association studies. PLoS One. 2010;5(12):e14480.

  27.

    Paterlini-Brechot P, Saigo K, Murakami Y, Chami M, Gozuacik D, Mugnier C, Lagorce D, Brechot C. Hepatitis B virus-related insertional mutagenesis occurs frequently in human liver cancers and recurrently targets human telomerase gene. Oncogene. 2003;22(25):3911–6.

  28.

    Ding D, Lou X, Hua D, Yu W, Li L, Wang J, Gao F, Zhao N, Ren G, Li L, et al. Recurrent targeted genes of hepatitis B virus in the liver cancer genomes identified by a next-generation sequencing-based approach. PLoS Genet. 2012;8(12):e1003065.

  29.

    Sung WK, Zheng H, Li S, Chen R, Liu X, Li Y, Lee NP, Lee WH, Ariyaratne PN, Tennakoon C, et al. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat Genet. 2012;44(7):765–9.

  30.

    Jiang Z, Jhunjhunwala S, Liu J, Haverty PM, Kennemer MI, Guan Y, Lee W, Carnevali P, Stinson J, Johnson S, et al. The effects of hepatitis B virus integration into the genomes of hepatocellular carcinoma patients. Genome Res. 2012;22(4):593–601.

  31.

    Liuzzi JP, Yoo C. Role of zinc in the regulation of autophagy during ethanol exposure in human hepatoma cells. Biol Trace Elem Res. 2013;156(1–3):350–6.

  32.

    Liu L, Geng X, Cai Y, Copple B, Yoshinaga M, Shen J, Nebert DW, Wang H, Liu Z. Hepatic ZIP8 deficiency is associated with disrupted selenium homeostasis, liver pathology, and tumor formation. Am J Physiol Gastrointest Liver Physiol. 2018;315(4):G569–79.

  33.

    Aydemir TB, Liuzzi JP, McClellan S, Cousins RJ. Zinc transporter ZIP8 (SLC39A8) and zinc influence IFN-gamma expression in activated human T cells. J Leukoc Biol. 2009;86(2):337–48.

  34.

    Lin W, Vann DR, Doulias PT, Wang T, Landesberg G, Li X, Ricciotti E, Scalia R, He M, Hand NJ, et al. Hepatic metal ion transporter ZIP8 regulates manganese homeostasis and manganese-dependent enzyme activity. J Clin Invest. 2017;127(6):2407–17.

  35.

    Engelken J, Espadas G, Mancuso FM, Bonet N, Scherr AL, Jimenez-Alvarez V, Codina-Sola M, Medina-Stacey D, Spataro N, Stoneking M, et al. Signatures of evolutionary adaptation in quantitative trait loci influencing trace element homeostasis in liver. Mol Biol Evol. 2016;33(3):738–54.

  36.

    Antonacci F, Dennis MY, Huddleston J, Sudmant PH, Steinberg KM, Rosenfeld JA, Miroballo M, Graves TA, Vives L, Malig M, et al. Palindromic GOLGA8 core duplicons promote chromosome 15q13.3 microdeletion and evolutionary instability. Nat Genet. 2014;46(12):1293–302.

  37.

    Zeng JH, Liang L, He RQ, Tang RX, Cai XY, Chen JQ, Luo DZ, Chen G. Comprehensive investigation of a novel differentially expressed lncRNA expression profile signature to assess the survival of patients with colorectal adenocarcinoma. Oncotarget. 2017;8(10):16811–28.

  38.

    Wang G, Chen H, Liu J. The long noncoding RNA LINC01207 promotes proliferation of lung adenocarcinoma. Am J Cancer Res. 2015;5(10):3162–73.

  39.

    Qiao F, Li N, Li W. Integrative bioinformatics analysis reveals potential long non-coding RNA biomarkers and analysis of function in non-smoking females with lung cancer. Med Sci Monit. 2018;24:5771–8.

  40.

    Zhan L, Li J, Wei B. Long non-coding RNAs in ovarian cancer. J Exp Clin Cancer Res. 2018;37(1):120.

  41.

    Wang YL, Liu JY, Yang JE, Yu XM, Chen ZL, Chen YJ, Kuang M, Zhu Y, Zhuang SM. Lnc-UCID promotes G1/S transition and hepatoma growth by preventing DHX9-mediated CDK6 down-regulation. Hepatology. 2019;70(1):259–75.

  42.

    Liu RY, Diao CF, Zhang Y, Wu N, Wan HY, Nong XY, Liu M, Tang H. miR-371-5p down-regulates pre mRNA processing factor 4 homolog B (PRPF4B) and facilitates the G1/S transition in human hepatocellular carcinoma cells. Cancer Lett. 2013;335(2):351–60.

  43.

    Zhang L, Chen J, Yong J, Qiao L, Xu L, Liu C. An essential role of RNF187 in Notch1 mediated metastasis of hepatocellular carcinoma. J Exp Clin Cancer Res. 2019;38(1):384.

  44.

    Fang S, Liu M, Li L, Zhang FF, Li Y, Yan Q, Cui YZ, Zhu YH, Yuan YF, Guan XY. Lymphoid enhancer-binding factor-1 promotes stemness and poor differentiation of hepatocellular carcinoma by directly activating the NOTCH pathway. Oncogene. 2019;38(21):4061–74.

  45.

    Sang L, Wang XM, Xu DY, Zhao WJ. Bioinformatics analysis of aberrantly methylated-differentially expressed genes and pathways in hepatocellular carcinoma. World J Gastroenterol. 2018;24(24):2605–16.

  46.

    Yang S, Ren X, Liang Y, et al. KNK437 restricts the growth and metastasis of colorectal cancer via targeting DNAJA1/CDC45 axis. Oncogene. 2020;39(2):249–61. https://doi.org/10.1038/s41388-019-0978-0.

  47.

    Ou-Yang Q, Lin XM, Zhu YJ, Zheng B, Li L, Yang YC, Hou GJ, Chen X, Luo GJ, Huo F, et al. Distinct role of nuclear receptor corepressor 1 regulated de novo fatty acids synthesis in liver regeneration and hepatocarcinogenesis in mice. Hepatology. 2018;67(3):1071–87.

  48.

    Majaz S, Tong Z, Peng K, Wang W, Ren W, Li M, Liu K, Mo P, Li W, Yu C. Histone acetyl transferase GCN5 promotes human hepatocellular carcinoma progression by enhancing AIB1 expression. Cell Biosci. 2016;6:47.

  49.

    Miller MR, McClure D, Shiman R. P-Chlorphenylalanine effect on phenylalanine hydroxylase in hepatoma cells in culture. J Biol Chem. 1975;250(3):1132–40.

  50.

    Gopalakrishnan TV, Anderson WF. Epigenetic activation of phenylalanine hydroxylase in mouse erythroleukemia cells by the cytoplast of rat hepatoma cells. Proc Natl Acad Sci U S A. 1979;76(8):3932–6.

  51.

    Hanioka N, Nonaka Y, Saito K, Negishi T, Okamoto K, Kataoka H, Narimatsu S. Effect of aflatoxin B1 on UDP-glucuronosyltransferase mRNA expression in HepG2 cells. Chemosphere. 2012;89(5):526–9.

  52.

    Zhang X, Kang C, Li N, Liu X, Zhang J, Gao F, Dai L. Identification of special key genes for alcohol-related hepatocellular carcinoma through bioinformatic analysis. PeerJ. 2019;7:e6375.


Acknowledgements

Not applicable.

Funding

This study was supported by the Hong Kong Health and Medical Research Fund (No. 01121436 to M.X.L.), the National Natural Science Foundation of China (No. 31771401 to M.X.L.; No. 31100895, 81472618 and 81670535 to D.K.J.; No. 81402297 to Q.L.X.), the National Science and Technology Major Project (No. 2017ZX10202202 and 2018ZX10301202 to D.K.J. and J.H.), the Local Innovative and Research Teams Project of Guangdong Pearl River Talents Program (No. 2017BT01S131 to D.K.J. and J.H.), the Research and Development Project in Key Areas of Guangdong Province (No. 2019B020227004 to D.K.J.), the Innovative Research Team Project of Guangxi Province (No. 2017GXNSFGA198002 to D.K.J.), the Grant for Recruited Talents to Start Scientific Research from Nanfang Hospital (to D.K.J.), and the Outstanding Youths Development Scheme of Nanfang Hospital, Southern Medical University (No. 2017 J001 to D.K.J.). The funders had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Author information

Contributions

D.K.J.: study concept and design, material support, obtained funding, analysis and interpretation of the data, and drafting of the manuscript; J.D.: analysis and interpretation of the data, and drafting of the manuscript; C.D.: analysis and interpretation of the data; X.M.: material support; Q.X.: material support; B.Z.: material support; C.Y.: revision of the manuscript; L.W.: analysis and interpretation of the data; C.C.: critical revision of the manuscript; S.L.Z.: technical, acquisition of data; I.O.N.: study concept and material support; L.Y.: material support; J.X.: material support; P.C.S.: study concept and design; X.Q.: critical revision of the manuscript; J.H.: material support; Y.J.: analysis and interpretation of the data; G.C.: material support; M.X.L.: study supervision, study concept and design, obtained funding, analysis and interpretation of the data, and drafting of the manuscript. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Guangwen Cao or Miaoxin Li.

Ethics declarations

Ethics approval and consent to participate

The study was performed in accordance with guidelines approved by the local ethics committees of all participating centers involved in both the GWAS stage (the ethics committee of Qidong Liver Cancer Institute) and the replication stage (the ethics committee of the Second Military Medical University). Written informed consent to participate in the study was obtained from each subject in accordance with the principles of the Declaration of Helsinki. All study participants approved the storage of their frozen DNA specimens in our laboratory for research purposes.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Supplementary Figure 1.

Manhattan plots of p-values of 5,375,073 SNPs obtained by a meta-analysis of two HBV-related HCC GWASs in Chinese populations. Supplementary Figure 2. The quantile-quantile plots of p-values of 5,375,073 SNPs obtained by a meta-analysis of two HBV-related HCC GWASs in Chinese populations. Supplementary Figure 3. Regulatory annotations at gene WHAMMP2. Supplementary Figure 4. Visualization of regulatory annotation at rs17343667 by RegulomeDB. Supplementary Table 1. Genetic association p-values of genes preferentially expressed in liver. Supplementary Table 2. Association p-values of genes frequently integrated by HBV. Supplementary Table 3. Association p-values of genes with significant association in previous studies. Supplementary Table 4. Association p-values of COSMIC HCC risk genes.

Additional file 2:

Excel table of canonical pathways of MSigDB with p < 0.05 by gene-set based association tests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Jiang, D., Deng, J., Dong, C. et al. Knowledge-based analyses reveal new candidate genes associated with risk of hepatitis B virus related hepatocellular carcinoma. BMC Cancer 20, 403 (2020). https://doi.org/10.1186/s12885-020-06842-0

Keywords

  • Knowledge-based genetic association
  • Susceptibility
  • Hepatitis B virus
  • Hepatocellular carcinoma