Interferon-lambda (IFNL) germline variations and their significance for HCC and PDAC progression: an analysis of The Cancer Genome Atlas (TCGA) data

Background Hepatocellular carcinoma (HCC) and pancreatic ductal adenocarcinoma (PDAC) are among the most lethal malignancies. Since interferons (IFNs) are known to mediate antitumor activities, this study investigated the relationship between germline genetic variations in type III IFN genes and cancer disease progression using data from The Cancer Genome Atlas (TCGA). The genetic variations under study tag a gain-or-loss-of-function dinucleotide polymorphism within the IFNL4 gene, rs368234815 [TT/ΔG]. Methods All available TCGA sequencing data were used to determine the genotypes of 187 patients with HCC and 162 patients with PDAC, matched for ethnicity. Stratified by IFNL genotype, both cohorts were subjected to Kaplan-Meier time-to-event analyses, with the length of the specific progression-free interval (PFI) and the overall survival (OS) time as two clinical endpoints for disease progression. Results Logrank analysis revealed a significant relationship between IFNL genotype and disease outcome for PDAC, but not for HCC. A multivariate Cox regression analysis including patients' age, tumor grade, and tumor stage as further covariates identified IFNL genotype as an independent predictor of PDAC disease outcome. Conclusion This repository-based approach uncovered clinical evidence suggestive of an impact of IFNL germline variations on PDAC progression, with an IFNL haplotype predisposing to IFNL4 expression being favorable.

(CIR) has been coined [6]. Germline genetic variants, including those in IFN signaling genes, have been proposed to contribute to CIR [6]. IFNs and their effectors have long been known to mediate not only antiviral but also antitumor host responses [7,8]. They are divided into type I (IFN-αn/β), type II (IFN-γ), and type III (IFN-λ1 to IFN-λ4). Among the genes encoding IFN species, only one, type III IFNL4, harbors a common exonic gain-or-loss-of-function variation: while the phylogenetically older variant ΔG enables expression of a functional IFN-λ4 protein, the knockout variant TT causes a frameshift that disrupts the open reading frame and prevents translation [9,10]. This germline dinucleotide polymorphism, IFNL4 rs368234815 [TT/ΔG] (merged into IFNL4 rs11322783), thus reflects an ongoing process of pseudogenization that divides humans into those who are predisposed to express IFN-λ4 protein and those who are not [11]; it is associated with clearance of hepatitis C virus (HCV) and a variety of other disease conditions (reviewed in [12]).
In the context of viral infections, an IFN-λ4-creating genetic background has generally been found to be unfavorable for the host. This counterintuitive relationship was first recognized for HCV infection, when the IFN-λ4-creating genotypes were shown to be in linkage disequilibrium (LD) with the IFNL variants that genome-wide association studies (GWAS) in the main ethnic populations had previously correlated with poor clearance of HCV infection [10,13]. Similarly, in human immunodeficiency virus (HIV) infection, the IFN-λ4-creating genotype was found to be associated with a higher prevalence of AIDS [14], while the genotype not encoding IFN-λ4 is associated with a lower probability of acquiring HIV [15,16]. Cytomegalovirus (CMV) reactivation has also been reported to be more prevalent in patients encoding a functional IFN-λ4 protein [17,18].
The availability of comprehensive collaborative data repositories enables analyses of patient material on a whole-genome scale and with large sample sizes. The Cancer Genome Atlas (TCGA) database provides the scientific community with datasets on more than 11,000 cancer patients across 33 tumor entities. Besides demographic and clinical data, TCGA comprises whole exome DNA and RNA sequencing data not only of tissue samples derived from primary tumors but also of corresponding non-malignant material, the latter reflecting the patients' germline genetic background.
By employing TCGA datasets, this study aimed to find clinical evidence for or against an impact of IFNL germline variations on HCC or PDAC progression. Using the Kaplan-Meier estimator, disease progression was assessed by (i) the length of the specific progression-free interval (PFI) and (ii) the overall survival (OS) time as two clinical outcome endpoints. A multivariate Cox proportional-hazards model was applied with patients' age, tumor grade, and tumor stage as covariates along with IFNL genotypes.

TCGA data portal
Analyses are based upon data generated by TCGA (phs000178.v10.p8), which is managed by the NCI and NHGRI. Specifically, projects on HCC (TCGA-LIHC; https://portal.gdc.cancer.gov/projects/TCGA-LIHC) and on PDAC (TCGA-PAAD; https://portal.gdc.cancer.gov/projects/TCGA-PAAD) were included. Access to controlled datasets was approved by the NIH (project ID 20041). Open-access demographic data (gender, age at diagnosis, race, and ethnicity) and clinical data (tumor grade and stage, specific PFI, OS time) were obtained from a curated and standardized dataset, the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which focuses on clinical outcome endpoints [19].

Reading-out IFNL genotypes
Controlled-access whole exome sequencing (WXS) reads of DNA from non-malignant tissue (code 11A) or peripheral blood leucocytes (code 10A), or sequencing reads of non-malignant tissue RNA, were restricted to the region spanning the IFNL gene cluster (chr19:39,230,000-39,300,000) using the BAM slice tool before download. Using the NCBI Genome Workbench software, the truncated sequence files were aligned to chromosome 19 of the human genome reference assembly GRCh38.p12. Depending on the depth of coverage, genotypes of up to five nucleotide polymorphisms were read out (Fig. 1). According to established criteria, a coverage of 20-30 sequence reads was considered reasonably confident. For heterozygous calls, both alleles were required to have an allele-call score > 10, with a ratio of their scores < 3. Call rates for both the HCC and the PDAC cohort reached 100%. On a per-sample basis, identical genotypes were obtained irrespective of whether malignant or non-malignant material was analyzed.
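The call-confidence rules stated above can be expressed as a small filter. The Python sketch below is hypothetical: it assumes that per-site read depth and per-allele call scores have already been exported from the alignment viewer (the exact definition of the allele-call score is not given in the text), and it takes 20 reads, the lower bound of the quoted 20-30 range, as the minimum depth. The function name `classify_call` is illustrative only.

```python
from typing import Mapping

# Thresholds taken from the genotyping criteria described in the text.
MIN_DEPTH = 20          # assumption: lower bound of the quoted 20-30 read range
MIN_ALLELE_SCORE = 10   # each allele of a heterozygous call must score > 10
MAX_SCORE_RATIO = 3     # ratio of the two allele scores must be < 3


def classify_call(depth: int, allele_scores: Mapping[str, float]) -> str:
    """Classify one polymorphic site as 'hom', 'het' or 'no-call'.

    depth         -- number of reads covering the site
    allele_scores -- hypothetical per-allele call scores, e.g. {"C": 18.0, "G": 12.0}
    """
    if depth < MIN_DEPTH:
        return "no-call"  # too few reads for a confident call
    observed = sorted((s for s in allele_scores.values() if s > 0), reverse=True)
    if len(observed) == 1:
        return "hom"  # only one allele seen at adequate depth
    if len(observed) == 2:
        major, minor = observed
        if minor > MIN_ALLELE_SCORE and major / minor < MAX_SCORE_RATIO:
            return "het"  # both alleles well supported and reasonably balanced
    # Ambiguous: weak or unbalanced second allele, no alleles, or more than two alleles.
    return "no-call"
```

For instance, a site covered by 25 reads with scores {"C": 20.0, "G": 12.0} passes as heterozygous, whereas {"C": 40.0, "G": 11.0} fails the ratio criterion and is left uncalled rather than forced into a homozygous call.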
All other statistical analyses, including time-to-event analysis according to the Kaplan-Meier method and uni- and multivariate Cox proportional-hazards models, were performed in R version 3.5.2 [20]. P-values ≤ 0.05 were considered statistically significant.
Regarding etiology, HCC patients are divided into those with no or unknown history of primary risk factors (n = 54), with non-viral risk factors (n = 43), with confirmed HBV (n = 42) or HCV (n = 31) infection, or with HBV/HCV co-infections (n = 41) (data not shown). Patients were 63.3 yrs. old on average at diagnosis; the most frequent tumor grade was G2 (55.7%) and the most frequent tumor stage was I (47.3%) (Tab. 1). Starting at the time of diagnosis, the median observation period for disease progression in terms of the specific PFI was 12.0 mo. During this period, 99 patients experienced an event, while 88 were censored. Regarding OS time, the median observation period was 22.1 mo. During this period, 77 patients died, while 110 were censored. The median specific PFI was 19.7 mo, and the median OS time was 45.9 mo (Tab. 1).
PDAC patients, aged 65.4 yrs. on average at the time of diagnosis, also presented predominantly with tumor grade G2 but, compared to the HCC cohort, with a more advanced tumor stage in the majority of cases. Details on median observation periods and events are listed in Tab. 1. The median OS time was 20.2 mo, less than half that of the HCC cohort (45.9 mo).

Genotyping
Patients' genotypes were determined at up to five polymorphic sites within the IFNL gene cluster by aligning whole exome DNA and RNA sequence reads from non-malignant material to a reference genome. Coverage at IFNL4 rs368234815 itself was insufficient for most of the HCC and PDAC samples, so polymorphic sites in LD with it served as proxies. An LD analysis based on data from the 1000 Genomes Project and adjusted for the European population demonstrates that all polymorphic sites under study are in nearly complete LD with each other (Fig. 1). Owing to similar minor allele frequencies (MAF), four of them qualify as mutual tagSNPs (Fig. 1).
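Why similar MAFs matter for tagging can be seen from the standard pairwise LD measures. For two biallelic sites with allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB − p_A·p_B, D' = |D|/D_max, and r² = D²/[p_A(1 − p_A)·p_B(1 − p_B)]. D' can equal 1 even when the allele frequencies differ, but r² can reach 1 only when the coupled alleles have equal frequencies, which is what allows one site to substitute fully for another. The Python sketch below computes both measures from phased two-site haplotypes; the input format and the function name are illustrative assumptions, and this is not the tool used for the 1000 Genomes-based analysis in Fig. 1.

```python
from typing import Sequence, Tuple


def pairwise_ld(haplotypes: Sequence[str], allele1: str, allele2: str) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic sites.

    haplotypes -- hypothetical phased two-character haplotypes; the first
                  character is the allele at site 1, the second at site 2 (e.g. "CG")
    allele1    -- allele of interest at site 1
    allele2    -- allele of interest at site 2
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_b = sum(h[1] == allele2 for h in haplotypes) / n
    p_ab = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return abs(d) / d_max, d * d / denom
```

With haplotypes "CG" ×3, "AG" ×2 and "AT" ×5, the C allele never occurs without G, so D' is 1, yet r² stays below 0.5 because the C and G frequencies (0.3 vs 0.5) differ; such a site would be a poor mutual tag.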

Analysis of disease progression with regard to IFNL genotypes
The length of the specific PFI and the OS time were chosen as clinical endpoints for disease progression. By employing Kaplan-Meier analyses, both parameters were analyzed with regard to patients' IFNL genotypes.
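The Kaplan-Meier (product-limit) estimator multiplies, at each distinct event time t_i, the conditional probability of remaining event-free: S(t) = ∏_{t_i ≤ t} (1 − d_i/n_i), where d_i is the number of events (progression or death) at t_i and n_i the number of patients still at risk. The Python sketch below computes this curve and the median time, taken as the first time at which S(t) ≤ 0.5, which is the quantity marked by the dotted lines in Fig. 2. It is a from-scratch illustration with hypothetical function names, not the R code used in the study.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time.

    times  -- follow-up time per patient (e.g. months of specific PFI or OS)
    events -- 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point; patients censored at t
        # still count as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which S(t) drops to 0.5 or below (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For example, `kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])` yields survival estimates of about 0.8, 0.6 and 0.3 at t = 1, 2 and 4, so `median_survival` returns 4; the censored patient at t = 3 reduces the risk set without causing a drop in the curve.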
For HCC patients, the median specific PFI did not vary with the number of IFNL3 rs4803217 A alleles (gene dosage) (18.4 mo, 21.0 mo, and 14.9 mo for CC, CA, and AA, respectively). The absence of such a relationship was also apparent in the Kaplan-Meier graphs (Fig. 2). The logrank test revealed no significant difference in the length of the specific PFI between IFNL3 rs4803217 C allele homozygotes and A allele carriers (p = 0.65). Similar non-significant results were obtained when the OS time as an endpoint of disease progression was analyzed with regard to IFNL3 genotypes (p = 0.87, logrank test).
For PDAC patients, the median specific PFI was shorter for IFNL3 rs28416813 CC homozygotes than for G allele carriers (Fig. 2). The logrank test revealed a significant difference in the length of the specific PFI, with CC homozygotes (corresponding to patients unable to express IFN-λ4) showing earlier disease progression than G allele carriers (p = 0.01). Similar results were obtained for OS time as a further clinical endpoint (p = 0.05, logrank test).
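The logrank comparisons above contrast knockout homozygotes (e.g., rs28416813 CC) with carriers of the other allele (CG/GG). A minimal two-sample logrank test is sketched below in Python: at each distinct event time it accumulates the observed minus expected number of events in group A together with the hypergeometric variance, and the statistic (O − E)²/V is referred to a χ² distribution with 1 degree of freedom, whose upper tail equals erfc(√(χ²/2)). This is an illustrative sketch of the textbook test with a hypothetical function name, not the study's R code.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-sample logrank test; returns (chi2, p) with 1 degree of freedom.

    Group A could hold knockout homozygotes, group B the carriers of the other
    allele (this grouping is an assumption mirroring the comparison in the text).
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})

    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in pooled if ti >= t)                  # at risk, both groups
        n_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)     # at risk, group A
        d = sum(1 for ti, e, _ in pooled if ti == t and e)            # events, both groups
        d_a = sum(1 for ti, e, g in pooled if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value
```

Identical groups give χ² = 0 and p = 1, whereas a group in which all events occur before any event in the other group gives a large χ² and a small p-value.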

Uni- and multivariate analyses of disease progression
To determine whether IFNL genotype is an independent parameter related to disease progression, a multivariate Cox proportional-hazards model was applied. Parameters that showed a significant association in the univariate analysis were considered as covariates. For HCC, univariate analysis revealed tumor grade and tumor stage to be significantly related to the length of the specific PFI, while patients' age was significantly related to OS time (Tab. 2). Multivariate analysis revealed that a lower tumor grade (G1 vs G2) tended to be independently associated with a twice as long specific PFI. A higher stage was related to an up to 2-fold shortened specific PFI. Regarding OS time, multivariate analysis revealed patients' age at diagnosis to be the only independent predictor (Tab. 2).
Fig. 2 Time-to-event analyses for the length of the specific PFI and for OS time according to Kaplan-Meier for HCC and PDAC patients. HCC and PDAC patients were stratified by IFNL3 rs4803217 and IFNL3 rs28416813 genotypes, respectively. The probability of the absence of an event, i.e., progression (upper panels) or death (lower panels), is given in Kaplan-Meier graphs for each genotype over a period of 4 yrs. for HCC or 3 yrs. for PDAC as indicated. Dotted lines indicate the median specific PFIs and the median OS times. Tables list absolute and relative numbers of patients at risk (living and non-censored). A logrank test yielded a significant relationship between IFNL genotypes and disease outcome for PDAC patients (p(PFI) = 0.01 and p(OS) = 0.05, IFNL3 rs28416813 CC vs CG/GG) but not for HCC patients (p(PFI) = 0.65 and p(OS) = 0.87, IFNL3 rs4803217 CC vs CA/AA). The test compared carriers of SNP variants corresponding to the ability to express IFN-λ4 (light blue and yellow) with knockout variant homozygotes (dark blue).
For PDAC, univariate analyses similarly revealed tumor grade and tumor stage, in addition to IFNL genotypes, to be related to the length of the specific PFI (Tab. 3). Multivariate analysis showed tumor stage and IFNL genotypes to be independently and significantly related to disease progression in terms of the length of the specific PFI. Patients with tumor stage I faced a 63% lower risk of progression than patients with stage II (p = 0.03). Patients with IFNL genotypes corresponding to the ability to express a functional IFN-λ4 protein had a 39% lower risk of progression than patients with an IFNL4 knockout haplotype (p = 0.02) (Tab. 3).
Regarding OS time, patients' age was also significantly related to this endpoint in the univariate model. The multivariate model indicated a higher risk of mortality for patients with tumor stage II than for patients with tumor stage I (p = 0.06), albeit with reservations (Tab. 3). Multivariate analysis, moreover, showed a trend toward an association between IFN-λ4-creating genotypes and a 32% lower risk of death compared to IFNL4 knockout haplotypes (p = 0.09).
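The effect sizes above are stated as percentage changes in risk. In a Cox model, a regression coefficient β corresponds to a hazard ratio HR = exp(β), so a "39% lower risk" corresponds to HR ≈ 0.61, a "63% lower risk" to HR ≈ 0.37, and a "32% lower risk" to HR ≈ 0.68. The Python sketch below shows these conversions together with a 95% confidence interval for the HR, assuming Wald-type intervals exp(β ± 1.96·SE); the function names and the example β and standard errors are hypothetical and are not values from Tab. 2 or Tab. 3.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def hazard_ratio(beta: float) -> float:
    """Convert a Cox regression coefficient into a hazard ratio."""
    return math.exp(beta)


def hazard_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float]:
    """Wald-type confidence interval for the hazard ratio: exp(beta +/- z * se)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


def percent_risk_change(hr: float) -> float:
    """Relative change in hazard in percent, e.g. HR 0.61 -> -39."""
    return (hr - 1.0) * 100.0


def ci_includes_one(ci: Tuple[float, float]) -> bool:
    """True if the interval is compatible with no effect (HR = 1)."""
    return ci[0] <= 1.0 <= ci[1]
```

For a hypothetical β = ln(0.61) with SE = 0.2 the interval stays below 1, whereas the same β with SE = 0.5 yields an interval that includes 1; results of the latter kind are the ones the table footnote advises treating with reservation.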

Discussion
Based on TCGA datasets, this study revealed significant associations between IFNL germline variations and progression of PDAC in terms of the length of the specific PFI and the OS time as two clinical endpoints. These IFNL variations are in nearly complete LD with a dinucleotide polymorphism that controls IFNL4 gene expression (IFNL4 rs368234815). In a multivariate regression analysis including patients' age at diagnosis, tumor stage, and tumor grade as further covariates, IFNL variation was shown to be an independent parameter for the length of the specific PFI (p = 0.02), with a trend toward significance also for OS time (p = 0.09). This relationship was not observed for a cohort of patients matched for ethnicity but diagnosed with HCC.
A genetic background corresponding to the ability to express a functional IFN-λ4 protein, i.e., carrying the IFN-λ4-creating IFNL4 rs368234815 ΔG allele, was found to be related to delayed progression of PDAC, i.e., to be favorable. As outlined above, in the context of viral diseases, an IFN-λ4-creating genetic background is in general unfavorable for the host.
This disadvantage is also seen in the context of some cancers, in particular entities with a virus-related etiology. For instance, the IFNL4 rs368234815 ΔG allele was shown to be associated with prostate cancer among men at increased risk of sexually transmitted infections [21]. In an independent study, this allele was related to significantly decreased overall survival of African-American men with prostate cancer [22]. Moreover, susceptibility to AIDS-related Kaposi's sarcoma was also found to be associated with genotypes predicted to produce an active IFN-λ4 [23].
In contrast, for PDAC, an entity for which no virus-related etiology is assumed, we found the genetic predisposition to encode IFN-λ4 to be favorable for the outcome in terms of the length of the specific PFI and the OS time. Even though the biology of IFN-λ species is not yet completely understood, this is in accordance with the supposed general antitumor activity of type III IFNs [24,25].
TCGA also provides information on cancer treatments. Where available, data comprise the type of therapy, its starting date and duration, and the response to it. All of the HCC patients included in our analysis are documented to have received surgery, i.e., liver lobectomy or segmentectomy. Some of them received ablation (n = 40), adjuvant radiation (n = 7), or one or several regimens of chemotherapy (n = 12). Likewise, all of the PDAC cases under investigation were subjected to partial or total pancreatectomy. Some of them received adjuvant radiation therapy (n = 37) or adjuvant chemotherapy (n = 109). Our analyses were performed disregarding therapeutic schemes and their outcomes, which is a limitation. However, our analyses, focusing on disease outcome in terms of the length of the specific PFI and OS time, are based on the assumption that patients received the best possible care according to their individual health conditions. Nevertheless, the significant but weak association between IFNL genotype and clinical outcome for PDAC patients in the whole cohort might mask stronger associations within subgroups, e.g., among patients who do or do not respond to cytostatic therapy. The data thus might reflect therapy responsiveness that, in turn, might translate into disease outcome. Accordingly, the association between IFNL variants and disease progression might be more prominent for PDAC than for HCC patients because a higher proportion of PDAC patients received chemotherapy, i.e., 109/162 (67.3%) vs 12/187 (6.4%), respectively.
Whether treatment response underlies the association between IFNL genotypes and cancer disease progression needs to be addressed in a separate analysis with a larger sample size. Alternatively, the lack of a relationship between IFNL genotypes and HCC progression might be related to the proportion of cases with viral etiology, which distinguishes the HCC cohort from the PDAC cohort. More than half of the patients in the HCC cohort under study (61%) had HBV infection (n = 42), HCV infection (n = 31), or HBV/HCV co-infection (n = 41).
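The cohort proportions cited in this part of the Discussion follow directly from the stated counts (chemotherapy among the 162 PDAC and 187 HCC patients, and viral etiology among the HCC patients); the arithmetic is spelled out here as a simple check:

```latex
\[
\frac{109}{162} \approx 0.673 = 67.3\,\%, \qquad
\frac{12}{187} \approx 0.064 = 6.4\,\%, \qquad
\frac{42 + 31 + 41}{187} = \frac{114}{187} \approx 0.610 = 61\,\%
\]
```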

Conclusion
As a collaborative oncologic data repository with a given number of cases, TCGA facilitated the exploratory mining of clinical evidence suggestive of an impact of IFNL germline variations on PDAC progression. An IFNL haplotype predisposing to IFNL4 gene expression appeared to be favorable for the host, which is in line with the concept of antitumor activities of type III IFNs. Further analyses will consider therapeutic interventions as additional variables.
ᵃ Cox regression analyses were performed for complete data sets (n = 158). * Significant p-value (p ≤ 0.05). (*) Trend toward a significant p-value (p ≤ 0.1). Results with a 95% CI including 1 and/or a significant Schoenfeld residual test should be considered with reservation.