A rare CHD5 haplotype and its interactions with environmental factors predicting hepatocellular carcinoma risk

Background: CHD5 is a conventional tumour-suppressing gene in many tumours. The aim of this study was to determine whether CHD5 variants contribute to the risk of hepatocellular carcinoma (HCC).
Methods: Gene variants were identified using next-generation sequencing targeted on referenced mutations, followed by TaqMan genotyping in two case-control studies.
Results: We discovered a rare variant (haplotype AG) in CHD5 (rs12564469-rs9434711) that was markedly associated with the risk of HCC in a Chinese population. A logistic regression model and permutation test confirmed the association. Indeed, the strength of the association increased in a gene dose-dependent manner as the number of samples increased. In the stratified analysis, the risk effect of this haplotype was statistically significant in the subgroup of alcohol drinkers. The false-positive report probability and multifactor dimensionality reduction analyses further supported the finding.
Conclusions: Our results suggest that the rare CHD5 haplotype and alcohol intake contribute to the risk of HCC. Our findings may be valuable to researchers in cancer precision medicine looking to improve the diagnosis and treatment of HCC.
Electronic supplementary material: The online version of this article (10.1186/s12885-018-4551-y) contains supplementary material, which is available to authorized users.


Background
Hepatocellular carcinoma (HCC) is the most common primary liver cancer and has one of the worst prognoses among malignancies. The aetiological background of HCC differs between regions. In China, chronic hepatitis B virus (HBV) infection is the most important risk factor for HCC; two-thirds of the worldwide HBV carriers are Chinese, and approximately 20% of them have a chronic HBV infection [1].
Chromodomain helicase DNA-binding protein 5 (CHD5) is located on human chromosome 1p36.31. It is one of the nine members of the CHD-binding enzymes and belongs to the SNF2 DNA helicase/methylase superfamily [2]. CHD5 consists of 42 exons coding for a 223 kDa protein. Based on its protein sequence, it contains two PHD zinc fingers, two chromodomains and a helicase/ATPase domain.
Evidence that CHD5 functions as a tumour suppressor in human cancers has emerged principally from studies of neuroblastoma, wherein loss of the CHD5 locus on chromosome 1p36.3 is very common. CHD5 has garnered considerable interest owing to its ability to severely impact clonogenicity and tumourigenicity. Although its expression was thought to be restricted to neural-related tissues, it was subsequently found to be a tumour suppressor in neuroblastoma [3], melanoma [4], lung cancer [5], breast cancer [6], ovarian cancer [7], gastric cancer [8], colorectal cancer [9] and HCC [10]. CHD5 loss leads to a wide range of cellular consequences, and it therefore remains a promising candidate for further investigation in HCC. In this study, we tested the hypothesis that single-nucleotide polymorphisms (SNPs) in the 1p36 region of CHD5 are associated with HCC.

Study subjects
First, 280 unrelated HCC patients and 255 healthy controls (admitted to the Zibo Central Hospital in North China between 2006 and 2010) were recruited for the discovery study. Then, 549 HCC patients and 510 controls (admitted to the Peking University Shenzhen Hospital between 2007 and 2010, the First Affiliated Hospital of Sun Yat-Sen University between 2007 and 2015, and the Cancer Hospital of Guangzhou Medical University between 2009 and 2011 in South China) were enrolled in the replication study. The selection criteria for the controls were no individual or family history of cancer or diabetes; no history of HBV, HCV, tuberculosis or HIV infection; and frequency matching to the patients by age (± 5 years) and sex. All patients were newly diagnosed, previously untreated (no radiotherapy or chemotherapy) and proven to have no other tumours. We used published diagnostic criteria for HCC [11,12]. 'Ever or current smokers' were defined as those who had smoked more than 100 cigarettes (equal to five packs) in their lifetime before the date of cancer diagnosis (patients) or the date of interview (controls) [13,14]. 'Ever or current drinkers' were defined as those who had consumed alcoholic beverages at least once per week for 6 months or more; all others were defined as non-drinkers [15]. The purpose of frequency matching was to control for confounding factors while evaluating the main effect of CHD5 polymorphisms. All patients and controls were of Han Chinese origin and lived in China. Relevant characteristics of the subjects are summarised in Table 1.
The ethics committee of Guangdong Medical University approved the experimental and research protocols of this study. Experiments on humans were performed in accordance with relevant guidelines and regulations. After the purpose of the study had been clearly explained to the participants, all controls and patients (or relatives of patients who had already died) provided written informed consent. The study also adhered to the tenets of the Declaration of Helsinki. All potential participants who declined to participate or ended up not participating remained eligible for treatment, and non-participation did not result in any disadvantages for patients.

Targeted next-generation sequencing (NGS) and identification of genetic variants
Aliquots of buffy coat and plasma separated from blood samples were stored at −80 °C until further processing. All samples were included in the combined study. Genomic DNA was extracted from peripheral whole blood cells using the QIAamp system (QIAGEN Co.). Genomic DNA from 255 controls and 280 HCC patients was randomly sheared by sonication to an average fragment size of 250 bp. Target enrichment was performed as described by Kiialainen et al. [16]. The enriched libraries were loaded onto the HiSeq 2000 system, and approximately 90-bp paired-end reads were produced using NGS technology (Illumina Genome Analyzer). The FASTQ short reads were aligned to the NCBI build 37.1 (hg19) reference genome [17]. Single-nucleotide variants (SNVs) meeting any of the following criteria were filtered out: (a) P for Hardy-Weinberg equilibrium (HWE) < 10⁻⁴; (b) duplicated paired-end reads; (c) overall depth ≤ 8×; (d) SNP within 10 bp of a gap; or (e) copy number variant ≥ 2 [18]. Consequently, only qualified SNPs were considered for this evaluation, and a 164-SNP set was used for the primary case-control study. PLINK was used to analyse single-nucleotide variants [19], and Haploview was used for visualisation [20].

Population risk evaluation, linkage disequilibrium (LD) mapping and gene-gene interactions
We used the chi-square and Mann-Whitney U tests to compare the clinical data between the patients and controls in the discovery, replication and combined groups. Risk was assessed using the Pearson chi-square test. Because 164 SNPs were genotyped, the Bonferroni-corrected P value threshold for single-SNP association tests was 0.05/164 ≈ 0.0003.
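The Bonferroni threshold can be checked with a minimal Python sketch. The function name is illustrative and not part of the study's software; the three P values are the nominal discovery-stage values reported in the Results.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 164 SNPs were tested at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 164)

# Nominal discovery-stage P values quoted in the Results section.
discovery_p = {"rs9434741": 0.0051, "rs2273032": 0.0089, "rs12067480": 0.0261}

# SNPs whose P value falls below the corrected threshold.
survivors = [snp for snp, p in discovery_p.items() if p < threshold]

print(f"threshold = {threshold:.6f}")
print(f"SNPs passing correction: {survivors}")
```

None of the three nominally associated SNPs falls below the corrected threshold, which matches the loss of significance described in the Results.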
A gene-gene interaction in this study is defined as an SNP-SNP interaction and was assessed with LD mapping. To estimate the degree of LD between pairs of loci, the standardised disequilibrium coefficient (D′) was calculated and haplotype blocks were defined using the Haploview programme [20]. Haplotype imputation, reconstruction and frequency estimation were conducted with an expectation-maximisation algorithm [21]. The number of effective haplotypes was calculated as $n_e = 1/\sum_i P_i^2$, where $P_i$ is the estimated frequency of the individual haplotype $i$ [22]. $P_i$ was calculated from phased genotypes, with the phase chosen according to the configuration having the higher probability of occurrence (cut-off > 0.95).
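As a worked illustration of the effective-haplotype formula (one divided by the sum of squared haplotype frequencies), the following Python sketch may help. The haplotype frequencies are hypothetical and are not taken from the study's data.

```python
def effective_haplotypes(freqs):
    """Effective number of haplotypes: 1 / sum of squared frequencies."""
    if any(p < 0 for p in freqs):
        raise ValueError("frequencies must be non-negative")
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("frequencies must sum to 1")
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical block with four equally common haplotypes.
n_e_uniform = effective_haplotypes([0.25, 0.25, 0.25, 0.25])

# Hypothetical block dominated by one haplotype.
n_e_skewed = effective_haplotypes([0.90, 0.05, 0.03, 0.02])

print(n_e_uniform, n_e_skewed)
```

With equal frequencies the effective number equals the observed number of haplotypes; the more skewed the frequencies, the closer the value falls towards 1.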

Permutation test and quantile-quantile (Q-Q) analysis
We performed permutation tests with 10⁵ permutations, in which the subjects' phenotypes (case or control status) were randomly reassigned. The permutation (empirical) P value was defined as the number of permutations whose test statistic was at least as extreme as the original statistic, divided by the total number of permutations. For better estimation of empirical P values, SNPs were reconsidered with 10⁵ permutations. By convention, a difference was considered statistically significant if P < 0.05.
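The empirical P value defined above can be sketched in Python as follows. This is a simplified illustration, not the software used in the study: it assumes a 2×2 Pearson chi-square statistic on carrier status versus case status, uses toy data, and runs far fewer permutations than the 10⁵ used in the study. All function and variable names are hypothetical.

```python
import random


def chi_square_2x2(carrier, is_case):
    """Pearson chi-square statistic for a 2x2 carrier-by-status table."""
    a = sum(1 for g, y in zip(carrier, is_case) if g and y)
    b = sum(1 for g, y in zip(carrier, is_case) if g and not y)
    c = sum(1 for g, y in zip(carrier, is_case) if not g and y)
    d = sum(1 for g, y in zip(carrier, is_case) if not g and not y)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def empirical_p(carrier, is_case, n_perm=2000, seed=0):
    """Share of label permutations with a statistic >= the observed one."""
    rng = random.Random(seed)
    observed = chi_square_2x2(carrier, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign case/control status
        if chi_square_2x2(carrier, labels) >= observed:
            hits += 1
    return hits / n_perm


# Toy data (assumed): 50 cases followed by 50 controls.
status = [True] * 50 + [False] * 50
# Carriers strongly enriched among cases (40/50 versus 10/50).
associated = [True] * 40 + [False] * 10 + [True] * 10 + [False] * 40
# Carriers equally common in both groups (25/50 versus 25/50).
balanced = [True] * 25 + [False] * 25 + [True] * 25 + [False] * 25

p_associated = empirical_p(associated, status)
p_balanced = empirical_p(balanced, status)
print(p_associated, p_balanced)
```

In the balanced data set the observed statistic is zero, so every permutation is at least as extreme and the empirical P value is 1; in the enriched data set almost no permutation reaches the observed statistic.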
A Q-Q plot was then graphed to check the P value distribution. The cumulative distribution function of the normal density and the qth quantile of a Gaussian distribution are denoted by $\Phi(z)$ and $\xi_q$, respectively, so that $\Phi(\xi_q) = q$. The probability of a value below $\xi_q$ is therefore q. The theoretical quantile was defined by the inverse of the normal cumulative distribution function. In particular, the theoretical quantile fitting the empirical quantile $z_{(i)}$ should be

False-positive report probability (FPRP) analysis
To reduce the possibility of false positives inherent in performing multiple tests, a Bayesian statistical test, the FPRP, was performed for all significant associations [23]. According to the proposed method, an FPRP value of ≤0.2 was regarded as indicating a noteworthy association, and a prior probability of 0.1 was applied to detect ORs of 1.50/0.67 for risk/protective effects. The statistical power was calculated according to the case/control numbers and the OR/P values in the study.
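For orientation, the FPRP can be written in its commonly used form, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P value, 1 − β the statistical power and π the prior probability. The Python sketch below assumes this form; the input values are hypothetical and do not come from Table 5.

```python
def fprp(p_value: float, power: float, prior: float) -> float:
    """False-positive report probability for one association finding."""
    for name, value in (("p_value", p_value), ("power", power), ("prior", prior)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1]")
    false_positive = p_value * (1.0 - prior)  # significant under the null
    true_positive = power * prior             # significant under the alternative
    return false_positive / (false_positive + true_positive)


# Hypothetical inputs: P = 0.001 and 80% power, across three priors.
for prior in (0.1, 0.01, 0.001):
    value = fprp(p_value=0.001, power=0.8, prior=prior)
    verdict = "noteworthy" if value <= 0.2 else "not noteworthy"
    print(f"prior = {prior}: FPRP = {value:.3f} ({verdict})")
```

The same P value becomes less convincing as the prior probability falls, which is why the Results report FPRP values at several priors.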

Gene-environment interactions
Possible high-order gene-environment interactions in the associations were evaluated using the multifactor dimensionality reduction (MDR) programme [24]. Briefly, we carried out a 100-fold cross-validation and 1000-fold permutations under the assumption of no association. The best interaction model was required to have the maximum cross-validation consistency (CVC) and the minimum average prediction error.
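The CVC counts how often the same model is selected as the best model across the cross-validation folds. The Python sketch below illustrates only this counting step, not the MDR algorithm itself; the per-fold model labels are hypothetical, chosen so that the count matches the 99/100 reported for drinking status in the Results.

```python
from collections import Counter


def cross_validation_consistency(best_models):
    """Return the most frequently selected model, its count and the fold total."""
    if not best_models:
        raise ValueError("at least one fold is required")
    counts = Counter(best_models)
    model, hits = counts.most_common(1)[0]
    return model, hits, len(best_models)


# Hypothetical fold-by-fold winners for a one-factor model search.
folds = ["drinking"] * 99 + ["age"]
model, hits, total = cross_validation_consistency(folds)
print(f"{model}: CVC = {hits}/{total}")
```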

Statistical software
SPSS 22.0 for Windows (SPSS, Chicago, IL) and R (version 3.0.2) were used for statistical analyses.

Results
Population association risk (PAR) in the discovery study
We detected a total of 164 single-base substitutions by analysing the targeted NGS results (Fig. 1a and Additional file 1: Table S1). Of these, eight were in a promoter region, 129 were intronic and 27 were in coding exons. A case-control study was conducted, and the results indicated potential associations between the risk of HCC and the CHD5 polymorphisms rs9434741 (PAR = 0.0051), rs2273032 (PAR = 0.0089) and rs12067480 (PAR = 0.0261) in the Han population (Fig. 1b and Additional file 1: Table S1). However, these associations lost statistical significance after Bonferroni correction. They also lost significance after 10⁵ permutation tests (for example, P = 0.3156 for rs9434741, Fig. 1c). Q-Q plots were used to compare the observed chi-square results with the distribution expected under the null hypothesis; there was a deviation from expectation at a higher value of approximately 2.8 (Fig. 1d). After removing rs9434741, there were no significant curve changes compared with the expected distribution (Fig. 1e).

LD and haplotypic analysis in the discovery study
Direct sequencing results revealed a total of 164 SNPs in CHD5. We identified three blocks with high LD (Fig. 1a). Block 1 includes SNP3-SNP6 (rs12037962, rs11587, rs41307753 and rs3810989). Block 2 includes SNP35-SNP38 (rs2273041, rs2273040, rs2273038 and rs55930553). Block 3 includes SNP115 and SNP116 (rs12564469 and rs9434711). Blocks were reconstructed according to their frequencies. The results of the haplotype-based case-control study between the HCC and control groups are shown in Table 2. We found that haplotype AG in block 3 showed a significant association with HCC (P = 1.94 × 10⁻⁵). It remained significant according to unconditional logistic regression analysis after adjustment for age, sex, smoking and drinking status (P corrected = 5.73 × 10⁻⁵) and after 10⁵ permutation tests (P = 4.00 × 10⁻⁵).

Population association and haplotypic analysis based on selected SNPs in the replication and combined studies
We selected SNPs rs12564469 and rs9434711 in block 3 from the SNP discovery study for the replication study. The replication results showed no associations for rs12564469 (PAR = 0.0800, P adjusted = 0.1029, P Permutation = 0.1062) or for rs9434711 (PAR = 0.8718, P adjusted = 0.8485, P Permutation = 0.9601). Finally, a combined study including the discovery and replication cohort data was conducted. The combined results also showed no association for rs12564469 (PAR = 0.0210, P adjusted = 0.0290, P Permutation = 0.0286) or for rs9434711 (PAR = 0.8829, P adjusted = 0.9137, P Permutation = 0.9704; Table 3).
The results of the haplotype-based replication and combined studies between the HCC and control groups are shown in Table 2. We observed increased frequencies of haplotype AG in HCC patients compared with those seen in healthy controls both in the replication study (PAR = 5.038 × 10⁻⁸, P adjusted = 7.571 × 10⁻⁸, P Permutation = 0.00001) and in the combined study (PAR = 4.393 × 10⁻¹², P adjusted = 5.514 × 10⁻¹¹, P Permutation = 0.00001).

Stratification analysis of haplotypes
The association of haplotype AG (block 3) with the risk of HCC in subgroups defined by age, sex, smoking and drinking status was evaluated further in the replication and combined studies (Table 4). We found that individuals carrying haplotype AG had a significantly increased risk of HCC, and the risk was increased in patients of >55 years (P = 6.04 × 10⁻⁸ and P i (P2/P1) = 5.12 × 10⁻⁴) and in drinkers (P = 9.43 × 10⁻⁸ and P i (P2/P1) = 3.25 × 10⁻⁶).
Table footnote: empirical P value based on 10⁵ permutations of case-control status using the max(T) procedure; P < 0.05 denotes significance.

FPRP
The FPRP values for the significant associations of block 3 haplotype AG (vs. AA + GG) at different levels of prior probability are listed in Table 5. FPRP values of haplotype AG for HCC risk in patients >55 years were <0.20 for the assigned prior probability (0.017 for the prior probability of 0.1 in the replication study; 0.004 and 0.010 for the prior probabilities of 0.1 and 0.01, respectively, in the combined study). For the risk of HCC in alcohol drinkers, when the assumed prior probabilities were 0.1 and 0.01, all findings were noteworthy not only in the discovery study but also in the replication and combined studies (FPRP < 0.20). Moreover, when the assumed prior probability was 0.001, this association remained noteworthy in the combined study (FPRP = 0.069).

Association of high-order interactions with HCC risk by MDR
High-order interactions were assessed with MDR, including the potential risk haplotype AG and four known risk factors (age, sex, smoking and drinking status), to check whether gene-environment interactions are associated with the risk of HCC. In the discovery study, the best one-factor model was drinking status, with the highest CVC (99/100, that is, the same model was selected as the best model 99 out of 100 times) and the lowest prediction error (0.385). The best two-factor model was drinking status plus haplotype AG, with the highest CVC (96/100) and the lowest prediction error (0.403). Interestingly, the five-factor model had the maximum CVC (100/100) and the minimum prediction error (0.378), and thus predicted better than the one-factor model. Similar results were found in the replication study and the combined study (Table 6).

Discussion
Studies have found that the chromosome aberration of 1p36 deletion is not frequent in HCC. It remains to be determined whether common SNPs in CHD5 are associated with the risk of HCC. CHD5 is a tumour-suppressing gene of the chromodomain gene family, first identified as a tumour-suppressing gene mapping to 1p36.31 [25]. The integration of clinical phenotypes and genomic information may enable precision cancer medicine through NGS approaches [26]. The results of our targeted NGS and TaqMan genotyping revealed no significant single-SNP associations with the risk of HCC in the discovery study, the replication study or the combined study. For two data sets, it is important to establish whether the hypothesis of a common distribution holds. The Q-Q plot offers more insight into the discrepancy than statistical analyses such as the Kolmogorov-Smirnov two-sample test or the chi-square test. However, we did not find any significant change after removing rs9434741, which suggests that the most likely associated SNP is not a risk locus.
Nonetheless, we unexpectedly found a positive association between a rare haplotype AG (block 3: rs12564469-rs9434711) in CHD5 and HCC, which has not been reported to date. Importantly, the strength of this association increased in a gene dose-dependent manner as the number of samples increased (PAR in Table 2). Thus, our results support the idea that the 1p36 region plays a role in HCC. We believe it is possible that hereditary mutations of tumour-suppressing genes in the 1p36 region contribute to the aggressive properties of liver cancer. Hereditary changes in the 1p36 region are extraordinarily common in human tumours, occurring in malignancies of epithelial, neural and haematopoietic origin [25]. Genetic mutations of the tumour-suppressing gene CHD5 have contributed to the understanding of human oncogenesis.
Table 4) with the stratified analysis. One possible limitation is that the sample size is smaller in the subgroups. Nevertheless, the results of the FPRP analysis for those findings showed that the association in the drinkers group remained noteworthy at the prior probability level of 0.1. We believe that in drinkers, alcohol-related carcinogens may cause DNA damage [27] and that the accumulated DNA damage caused by regular carcinogenic exposure to alcoholic drinks [28,29] might enhance the effect of genetic instability.
Next, we conducted a high-order gene (haplotype)-environment interaction analysis with MDR testing to support the above results. The best interaction model revealed that the CHD5 haplotype AG interacted with the drinking status with a maximal CVC and minimal prediction error, which was more obvious in the interaction entropy analysis. Our results suggested that the stratification testing reliably identified alcohol drinking as a risk factor.
Our recent study reported that the CHD5 rs12564469-rs9434711 region might functionally contribute to HCC prognosis and CHD5 mRNA expression [30]. It is possible that CHD5 plays an essential role in cancer development. CHD5 modulates the expression of multiple genes that regulate pathways in the tumourigenic process [31]. Excessive activation of these tumour-suppressive pathways, which depend on p53, p19 and p16, leads to apoptosis, cellular senescence and neonatal death. CHD5 expression seems to be restricted to neural-derived tissues, as opposed to CHD4, which is expressed in all tissues. CHD5 mRNA cannot be detected in the liver, placenta, spleen, bone marrow, thyroid, stomach, pancreas, small intestine, colon or prostate [8,30]. Because of this, expression of the candidate tumour-suppressing genes was sequentially disrupted by specific shRNAs. Moreover, CHD5 expression is down-regulated in HCC tissues and HepG2 cells, and the expression level of CHD5 was inversely correlated with the expression of the oncogenic miR-454 in HCC tissues [32]. Therefore, CHD5 was identified as the cause of the observed phenotype.
Alternatively, CHD5 or a CHD5-containing complex could interact with p53 directly. A similar model was suggested for an MTA2-containing NuRD complex, which regulates p53-mediated transactivation by modulating the p53 acetylation status [33]. CHD5 may function in a similar manner since it was shown to be part of a NuRD-like complex [34]. Both the interactions and the functions are equally important for the development of HCC. Genetically engineered mice with a heterozygous deficiency of the region corresponding to the human 1p36 locus were prone to develop non-neural tumours (lymphoma, squamous cell carcinoma and hibernoma). CHD5 was found to positively regulate p53, presumably via p14/p19ARF [35,36]. However, the exact molecular mechanisms could not be defined.