The association of germline variants with chronic lymphocytic leukemia outcome suggests the implication of novel genes and pathways in clinical evolution

Background Chronic Lymphocytic Leukemia (CLL) is the most frequent lymphoproliferative disorder in western countries and is characterized by a remarkable clinical heterogeneity. During the last decade, multiple genomic studies have identified a myriad of somatic events driving CLL proliferation and aggressivity. Nevertheless, and despite the mounting evidence of inherited risk for CLL development, the existence of germline variants associated with clinical outcomes has not been addressed in depth. Methods Exome sequencing data from control leukocytes of CLL patients involved in the International Cancer Genome Consortium (ICGC) was used for genotyping. Cox regression was used to detect variants associated with clinical outcomes. Gene and pathways level associations were also calculated. Results Single nucleotide polymorphisms in PPP4R2 and MAP3K4 were associated with earlier treatment need. A gene-level analysis evidenced a significant association of RIPK3 with both treatment need and survival. Furthermore, germline variability in pathways such as apoptosis, cell-cycle, pentose phosphate, GNα13 and Nitric oxide was associated with overall survival. Conclusion Our results support the existence of inherited conditionants of CLL evolution and points towards genes and pathways that may results useful as biomarkers of disease outcome. More research is needed to validate these findings. Electronic supplementary material The online version of this article (10.1186/s12885-019-5628-y) contains supplementary material, which is available to authorized users.


Background
Chronic Lymphocytic Leukemia (CLL) is the most frequent lymphoproliferative disease in western countries, and it shows remarkable clinical heterogeneity [1]. Recently, some studies demonstrated a wealth of genomic and epigenomic differences that determine part of its clinical aggressivity [2], such as point mutations in NOTCH1, SF3B1, ATM, TP53 and POT1, and the absence of somatic hypermutation in the IGHV locus.
Inherited predisposition to the development of CLL has been addressed by various genome wide association studies (GWAS) during the last years. In this regard, dozens of common variants at genes such as BCL2, EOMES, CASP10 and POT1 have been associated with significant risk of CLL development [3][4][5]. Similarly, GWAS studies in other lymphoproliferative disorders such as follicular lymphoma and diffuse large B cell lymphoma have found evidence for the association of germline variants with overall survival (OS) and progression-free survival [6,7]. Despite this evidence, analysis of CLL clinical evolution have been limited almost exclusively to acquired somatic events.
In this paper, we addressed for the first time to our knowledge the association of common genomic variants with time to treatment (TTT) and OS of CLL patients participating in the Spanish ICGC cohort. Our results suggest the existence of polymorphisms at some genes (e.g., PPP4R2 and MAP3K4) significantly associated with TTT. Moreover, we found significant associations with TTT and OS both at the gene and pathway-level, which could shed new light about CLL biology and its mechanisms of progression.

Data source
We applied for access to the International Cancer Genome Consortium (ICGC) CLL sequencing data [8] deposited in the European Genome-Phenome Database (EGA). The Data Access Committee approved access to this data under DACO-1040945. We downloaded exome-seq data from control non-tumoral samples from patients with CLL under the accession code EGAD00001001464.

Data preprocessing
Exome-seq data were previously aligned to the reference genome (GRCh37.75) using bwa [9] as described in Puente et al [10]. Briefly, 3 μg of genomic DNA were used for paired-end sequencing library construction, followed by enrichment in exomic sequences using the SureSelect Human All Exon 50 Mb v4 kit or the SureSelect Human All Exon 50 Mb + UTR kits (Agilent Technologies). Next, DNA was pulled down using magnetic beads with streptavidin, followed by 18 cycles of amplification. Sequencing was performed on an Illumina GAIIx or on a HiSeq2000 sequencer (2x76bp). Duplicate read removal, sorting and indexing was done using samtools [11]. Base quality score recalibration was made with BamUtil [12] using a logistic regression model.

Sample filtering
We used principal component analysis (PCA) to detect outliers in our study cohort. Similarly, identity-by-descent (IBD) was used to discard all individuals with a degree of relatedness equivalent to third degree or higher. PCAs and IBD data were computed on a linkage disequilibrium (LD) pruned dataset (LD upper threshold of 0.2) using the Bioconductor [15] package SNPRelate [16]. Our final filtered dataset contained 426 cases. Among these, 253 were males and 173 were females. By IGHV status, there were 146 unmutated cases and 273 mutated cases; and by clinical staging, the data contained 47 monoclonal B-cell lymphocytosis (MBL), 332 Binet A, 37 Binet B and 8 Binet C cases. Information about clinical staging and IGHV mutation status was not available for 2 and 7 cases, respectively.

Regression analysis
Cox regression and assumption of proportional hazards was performed with the survival R package [17,18]. Variables with p-value < 0.2 in a univariate model were selected as covariates for the GWAS. In the cases of TTT these were "donor sex", "IGHV mutation status" and "Binet stage"; whilst in the case of overall survival we used "IGHV mutation status", "Binet stage" and "donor age at diagnosis" as covariates. Three association models were computed: an additive model, a dominant model and a recessive model. P-values were adjusted using the Benjamini-Hochberg (BH) method.
Due to the heterogeneity of exome-seq coverage and quality metrics, many variables had incomplete data. We included in the analysis variables with at least 25% call rate, a minimum of 10 events (progression or death). A minimum allele frequency of 1% was selected as the lowest threshold. Furthermore, we only analyzed polymorphisms where Platypus called at least 10 minor alleles (additive model) or genotypes (dominant and recessive models).
Inflation values were estimated with the R package QQperm [19]. Briefly, a random distribution of p-values was created by randomly permuting phenotype variables. Then, the association p-values are compared with the null. This method doesn't consider the null distribution to be distributed uniformly.

Gene-level analysis
VEGAS2 [20] was used to calculate LD-adjusted association p-values for TTT and OS. Briefly, VEGAS2 takes GWAS p-values, and then uses a simulation-based approach using information from population variant reference panels to adjust for LD effects. We used the 1000 Genomes phase 3 data from the iberian population in Spain as our reference population, since all patients of this cohort were of Spanish origin [21]. Only variants falling within the 5′ and 3′ coordinates of RefSeq genes were included. P-values were adjusted for multiple testing using the BH method.
Pathways and gene ontology (GO) analysis GSEA4GWAS version 1.1 [22] was used for testing significant associations in pathways and biological process annotations. Our input pathways were "Canonical Pathways" and "Gene Ontology Biological Process". Maximum distance was set to 20 Kb, and the major histocompatibility complex region was masked from the analysis. P-values were adjusted with the BH method.
Some of this polymorphisms are associated with functional changes in their corresponding genes. According to dbSNP [14], rs2247870 induces a missense change in the GPR98 gene. In a similar fashion, a search in the HaploReg database [23] points toward functional implications of some of these variants. For example, rs7620924 is located in a lymphocyte-specific enhancer region that is strongly associated with PPP4R2 expression in whole blood (p-value 2.18 × 10 − 34 ) and in lymphoblastoid cells (p-value 2.36 × 10 − 8 ). Similarly, the polymorphisms rs361807, rs361946, rs5992169 and rs1043278 in PEX36 are associated with PEX36 expression in lymphoblastoid cells (minimum p-value 6.66 × 10 − 10 ). rs9463 in TTLL12 strongly correlates with the expression of the adjacent gene TSPO (p-value 9.81 × 10 − 198 ), and to a lower extent, with TTLL12 (p-value 9.57 × 10 − 4 ) in blood cells. Other polymorphisms suggestively associated with TTT such as those in TXNRD2 and ZCCHC7 are also significantly associated with the expression of their respective genes in blood cells. On the contrary, no functional information exists about the intronic variant rs537453728 within the MAP3K4 gene.   Figure S2, Additional file 3: Figure S3 Additional file 4: Figure S4, Table 2, Additional file 9: Tables S4-S6). Nevertheless, 20 variants were below 0.06 in the additive models, with the top variants falling in TTC32, WDR35 and CLIP1. Notably, rs2304588 at TTC32 achieved the lowest p-value (7.6 × 10 − 7 , Bonferroni-adjusted p-value 0.067, BH-adjusted p-value 0.056). Overall, the number of variants with BH adjusted p-values below 0.1 was 25 in the additive model, 4 in the dominant model and 18 in the recessive model.

Discussion
In this work we present evidence that suggest the existence of germline variation modulating CLL's clinical aggressivity. The most remarkable finding was the strong and recessive association of rs7620924 near PPP4R2 with short time to treatment. The implication of PPP4R2 in the regulation of cell survival and DNA repair in hematopoietic and leukemia cells has been recently reported [24]. Indeed, different studies have identified      PPP4R2 as a modulator of protein phosphatase 4 (PPP4), which regulates DNA repair through non-homologous end joining [25]. Concordantly, the ablation of PPP4 activity in mice increases genomic instability and aborgates class switch recombination in B cells, leading to an abnormal immune response [26]; and its function also seems to be essential in V(D) J recombination during normal B cell maturation [27]. Other polymorphisms associated with time to first treatment were located in MAP3K4, PEX26 and TTLL12. MAP3K4 participates in the TRAIL/MAP3K4/p38/HSP27/Akt pathway, thereby modulating processes such as autophagy and cell migration. Indeed, MAP3K4 is affected by recurrent loss-of-function mutations in different types of cancers [28][29][30][31][32]. Conversely, less is known about the peroxisome-related gene PEX26 [33]; the G-protein GPR98; and TTLL12, which participates in chromosome stability and mitosis-related processes [34]. On the contrary, although we did not find any variant significantly associated with overall survival, we devised some variants with suggestive associations. The two most significant ones were in the TTC32/WDR35 and CLIP1 loci, the last of which is overexpressed in Reed-Sternberg cells of Hodgkin lymphoma [35,36].
In a similar fashion, we detected variation on different genes associated with CLL evolution. The most relevant was the association of RIP3K with both time to treatment and overall survival. RIP3K encodes a protein that regulates necroptosis, a form of regulated cell death characterized by cell membrane permeabilization [37]. Other relevant genes associated with rapid progression were the pro-proliferative gene NIFK [38], the tumor suppressor SIK1 [39,40] and ZCCHC7, which is encoded near the B-cell specific PAX5 super-enhancer locus [41,42]. On the other hand, various genes were associated with overall survival, such as CLUAP1, which participates in tumor growth and cytoskeleton regulation [43][44][45]; and the enzyme GAMT, which converts S-adenosylmethionine to creatine in order to foster high energy demands [46]. Moreover, the BCL-2 interacting gene BNIPL [47,48] was suggestively associated with survival and deserves further characterization. In the same direction, the most remarkable pathway-level association with survival was that of the pentose phosphate metabolic pathway, which fuels cells with metabolites for nucleotide and lipid biosynthesis, and provides reducing power to promote cell survival under stressful conditions [49]. Other pathways such as GNα13 and Nitric Oxide were also significant. Concordantly, recurrent inactivating mutations in the G-protein superfamily gene GNA13 have been described in B cell lymphomas [50][51][52][53], and the contribution of nitric oxide to apoptosis resistance in CLL cells has been addressed by various studies [54,55].
The main limitation of this study is the lack of an independent cohort for validation of these findings. Furthermore, although inflation values were low, we assume that treatment heterogeneity could have an impact on overall survival associations. Nevertheless, the global results are not only statistical significant but also biologically plausible. Thus, we believe that this report will motivate further studies in order to confirm the effect of these variants and to determine their mechanisms of action in lymphoproliferative disorders.