Do genetic factors protect for early onset lung cancer? A case control study before the age of 50 years

Background Early onset lung cancer shows some familial aggregation, pointing to a genetic predisposition. This study was set up to investigate the role of candidate genes in the susceptibility to lung cancer in patients younger than 51 years at diagnosis. Methods 246 patients with a primary, histologically or cytologically confirmed neoplasm, recruited from 2000 to 2003 in major lung clinics across Germany, were matched to 223 unrelated healthy controls. 11 single nucleotide polymorphisms of genes with reported associations to lung cancer were genotyped. Results Genetic associations or gene-smoking interactions were found for GPX1(Pro200Leu) and EPHX1(His113Tyr). Carriers of the Leu-allele of GPX1(Pro200Leu) showed a significant risk reduction of OR = 0.6 (95% CI: 0.4–0.8, p = 0.002) in general and of OR = 0.3 (95% CI: 0.1–0.8, p = 0.012) within heavy smokers. We also found a risk-decreasing genetic effect for His-carriers of EPHX1(His113Tyr) among moderate smokers (OR = 0.2, 95% CI: 0.1–0.7, p = 0.012). Considering both variants together, a monotone decrease of the OR was found for smokers (OR of 0.20; 95% CI: 0.07–0.60) for each protective allele. Conclusion Smoking is the most important risk factor for young lung cancer patients. However, this study provides some support for the T-allele of GPX1(Pro200Leu) and the C-allele of EPHX1(His113Tyr) to play a protective role in early onset lung cancer susceptibility.



Background
Lung cancer is the most common cause of death from cancer in the world. The estimated total number of cases is 1.2 million annually and is still increasing [1,2]. In Germany, lung cancer mortality in men has been declining for nearly a decade, whereas the incidence in women is increasing. However, for men and women aged 50 or younger the incidence of lung cancer is low [1].
The median age of onset is 66 years; about 5% to 10% of patients are younger than 50 years. These young patients differ from older patients regarding the distribution of sex, histological type of the neoplasm and genetic susceptibility [15][16][17][18][19][20][21][22][23][24]. Smoking remains the major risk factor in these younger patients [25][26][27], but familial aggregation of lung cancer was identified as a consistent additional risk factor in several epidemiological studies [28][29][30][31][32][33][34][35][36][37]. Recent investigations from Germany showed a 2.6-fold increased lung cancer risk in young patients (OR, 95% CI 1.6-6.0) if first-degree relatives had cancer [27] and a 4.7-fold increased risk if a parent or sibling was affected with lung cancer [38]. Even for nonsmokers aged between 40 and 59, an up to 6-fold increase of the lung cancer risk was seen in the presence of lung cancer in a first-degree relative [39].
The results of a segregation analysis suggest the presence of a high risk gene contributing to early-onset of lung cancer particularly in nonsmokers [40]. Another indication for a genetic contribution to lung cancer in the young is given by a larger increase of risk in monozygotic compared to dizygotic young twins, which was more evident in female than in male twins [41]. No such risk differences could be seen in a cohort of twins older than 50 [42].
Hence, the etiology of lung cancer in patients before age 50 seems to differ from that in older patients by a stronger genetic component, likely to interact with the exposure to tobacco smoke.
During smoking, the body absorbs numerous carcinogens that need to be eliminated. In recent years several cytogenetic and molecular biological studies have indicated chromosomal regions or candidate genes as linked to or associated with lung cancer. For example, a major susceptibility gene locus in the region of 6p23-25 was found to be linked in families with three or more individuals affected by lung, throat, or laryngeal cancer [43]. It can be hypothesized that gene products regulating phase I and II enzymes [15,44,45], tumor suppressor genes [46] and DNA repair genes [47][48][49] are associated with the development of lung cancer, but results are contradictory.
This study was set up to further clarify associations of DNA variants in candidate genes for metabolizing enzymes, a tumor suppressor gene and genes relevant for DNA repair and their interaction with smoking in lung cancer patients with age of onset 50 years or younger.

Study Design and Study Subjects
We carried out a frequency matched case-control study.
Caucasian patients with newly diagnosed and histologically or cytologically confirmed primary lung cancer aged 50 years or younger were recruited in 21 major lung clinics across Germany. They completed an interviewer-administered questionnaire in which detailed information on personal history, history of lung diseases, family history of cancer and smoking habits was collected. Blood samples were taken from all patients. A DNA bank was established. From July 2000 to April 2003 blood samples and case report forms were obtained from 247 young lung cancer patients. One of the patients was excluded because the parents were Vietnamese.
Cancer-free control individuals were a random sample (frequency matched by 5-year age categories and sex to cases) drawn from the participants of a population-based survey (KORA, Cooperative Health Research in the Region of Augsburg [50], survey S4). KORA, a continuation of the WHO MONICA study, provides a platform for research in epidemiology, health economics and genetics, where data and blood samples can be made available. Since 1984/85 four representative surveys have been performed, including approximately 4000-5000 adults each.
After excluding one control individual because the preparation of the blood sample for genotyping failed, a total of 246 cases were subsequently compared with 223 control individuals. With respect to the markers genotyped so far, we detected neither major population stratification between the whole KORA sample (southwest Germany) and two other cohorts from Northern Germany [51], nor deviations of the minor allele frequencies (MAFs) from those of the HapMap CEU population.
The study was approved by the ethics committee of the Bayerische Landesärztekammer München and all necessary local ethic committees of the involved recruitment clinics. All participants signed an informed consent form.

Selection of candidate genes and single nucleotide polymorphism (SNP)
The selection of DNA variants in candidate genes for this study was based on two criteria: a published significant association together with plausible biological relevance of a polymorphism to lung cancer. We searched MEDLINE for reviews about the genetics of lung cancer published between 1995 and 2002. (Search term: "Lung Neoplasms/genetics"[Mesh] AND ("molecular"[TI] OR "gene"[TI] OR "genetic"[TI]); Limits: Publication Date from 1995/1/1 to 2003/1/1, Humans, Review, English, German). From 138 hits, we selected 35 by screening for promising titles or abstracts. All genes and DNA variants mentioned with significant association in these reviews and in a wide-ranging selection of original study reports were listed. Two experts in the molecular biology of cancer rated these DNA variants for plausible biological relevance to lung cancer. The final selection also had to fit within financial constraints.

Blood Sampling and Genotyping
Blood samples (4 × 9 ml) were taken by clinicians and sent to the study center (GSF-National Research Centre for Environment and Health) within 24 hours. Immortalized cell lines were prepared and stored in liquid nitrogen. DNA was isolated from fresh or frozen blood using the DNA isolation kit of Gentra, Minneapolis, and stored at -80°C. Genotyping of SNPs was performed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS, Sequenom) according to Weidinger et al. [50]. Standard genotyping quality control, including 10% duplicate samples, checking for Hardy-Weinberg equilibrium and negative samples, revealed no major errors. No deviating genotype was found in any of the duplicate samples.

Statistical Methods
When investigating potential modifications of lung cancer risk by marker genotypes, we considered sex, age and smoking habits as covariates. Patients' age was defined as age at first diagnosis, while for KORA S4 controls age at recruitment was recorded. Cumulative smoking exposure of former and current smokers was measured in pack-years (PY). Cases and controls were grouped according to their smoking exposure level (SEL) into never and light smokers (≤1 PY), moderate smokers (1-<20 PY) and heavy smokers (20 and more PY).
For cases we collected the smoking history in detail, as recommended [52]. For controls PY had to be approximated from the most recent amount of cigarette consumption per day and the duration of smoking. As a preliminary check, we classified all cases into the above-mentioned groups using both approaches. We found these classifications to agree for 79% of cases. A similar agreement had been found by Bernaards et al. [53] when comparing retrospectively calculated PY with prospectively calculated PY. They concluded that the misclassification error in categorizing PY is smaller than the quantitative error in continuous, retrospectively calculated PY. Hence, we assume the use of PY groups to be at least as reliable as the retrospectively collected PY.
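The approximation of pack-years for controls and the grouping into SEL categories can be sketched as follows. This is a minimal illustration, not the study's actual code: the function names are hypothetical, a pack is assumed to contain 20 cigarettes, and an exposure of exactly 1 PY is assigned to the never/light group, following the "≤1 PY" definition.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Approximate pack-years from a constant daily consumption.

    Assumption: one pack contains 20 cigarettes.
    """
    return cigarettes_per_day / 20.0 * years_smoked


def smoking_exposure_level(py):
    """Map pack-years to the three SEL groups used in the text."""
    if py <= 1:
        return "never/light"   # never and light smokers (<= 1 PY)
    if py < 20:
        return "moderate"      # more than 1 and fewer than 20 PY
    return "heavy"             # 20 PY and more


# Example: 20 cigarettes per day over 10 years gives 10 pack-years.
example_py = pack_years(20, 10)
example_group = smoking_exposure_level(example_py)
```

Because only the last reported daily consumption enters the calculation, changes in smoking intensity over time are ignored, which is one reason to work with coarse groups rather than the continuous value.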
Using exact tests the distribution of histological subtypes was compared to a published German collection of 251 lung cancer patients with age of onset before the age of 46 years [20].
Hardy Weinberg equilibrium (HWE) was tested in controls using a likelihood ratio test [54].
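A likelihood ratio test for HWE at a biallelic marker can be sketched as below. This is a generic textbook form (a G-statistic compared with a chi-square distribution with one degree of freedom) and may differ in detail from the procedure in [54]; the function name and the genotype-count interface are assumptions made for illustration.

```python
import math


def hwe_likelihood_ratio_test(n_aa, n_ab, n_bb):
    """Return (G statistic, p-value) for a biallelic marker.

    n_aa, n_ab, n_bb are observed genotype counts. The p-value uses the
    chi-square distribution with 1 degree of freedom, for which the
    survival function equals erfc(sqrt(G / 2)).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Cells with zero observations contribute nothing to the statistic.
    g = 2.0 * sum(o * math.log(o / e)
                  for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)                   # guard against rounding below zero
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value
```

Genotype counts that match the Hardy-Weinberg proportions exactly give G = 0, while a complete absence of heterozygotes at intermediate allele frequencies gives a very large G.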
As exact tests of genetic association we performed BWS tests (Baumgartner-Weiss-Schindler test [55]). These tests were also carried out in subsamples according to sex, age (grouped into age of onset ≤ 45 years and age of onset = 46-50 years, which splits the sample into two almost equally sized groups), smoking status (never, former and current smoker) as well as SEL (never and light, moderate, heavy smokers) and histological tumor subtype (small-cell, SCC and adenocarcinoma). For markers showing any significant association we fitted two logistic regression models including age and sex as covariables. In model I smoking exposure level (SEL) was incorporated as the main effect while the genotype was nested within the SEL groups, so that the genetic association was investigated within each SEL group. The relative chance for lung cancer is estimated in comparison to genetically protected individuals with the same smoking exposure. We also tested for modification of the genetic effect by smoking by testing the contrast between the estimated parameters of model I for never or light smokers versus those for moderate or heavy smokers.
In model II the SEL-genotype interaction was directly included. Here all effects are given in comparison to wild-type never and light smokers. The relative chance for lung cancer is estimated in comparison to genetically protected never or light smokers. Similar models were fitted with smoking status instead of SEL.
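One possible way to write the two models down is sketched below. The notation is an assumption for illustration and is not taken from the original analysis: G is a 0/1 indicator for carrying the allele of interest, "nl", "mod" and "heavy" denote the three SEL groups, and never/light smokers serve as the reference SEL category.

```latex
% Model I: genotype effect nested within smoking exposure level (SEL)
\operatorname{logit} P(\text{case}) =
    \beta_0 + \beta_a\,\text{age} + \beta_s\,\text{sex}
  + \sum_{k \in \{\text{mod},\,\text{heavy}\}} \alpha_k\,\mathbb{1}[\text{SEL}=k]
  + \sum_{k \in \{\text{nl},\,\text{mod},\,\text{heavy}\}} \gamma_k\,\mathbb{1}[\text{SEL}=k]\,G

% Model II: SEL main effects, genotype main effect, and their interaction
\operatorname{logit} P(\text{case}) =
    \beta_0 + \beta_a\,\text{age} + \beta_s\,\text{sex}
  + \sum_{k \in \{\text{mod},\,\text{heavy}\}} \alpha_k\,\mathbb{1}[\text{SEL}=k]
  + \gamma\,G
  + \sum_{k \in \{\text{mod},\,\text{heavy}\}} \delta_k\,\mathbb{1}[\text{SEL}=k]\,G
```

Under this parameterization, exp(γ_k) in model I is the genotype odds ratio within SEL group k, and a test of effect modification compares γ_nl with the γ_k of the smoking groups. In model II, exp(α_k + γ + δ_k) is the odds ratio of a carrier in SEL group k relative to the reference group (G = 0, never/light smokers).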
When appropriate, only subgroups of patients of a particular histological subtype were included. Motivated by single marker results on two genes we defined a genetic protection score (gPS) as the count of protective alleles, which are the T-allele of GPX1(Pro200Leu) and the C-allele of EPHX1(His113Tyr). Logistic regression including gPS was performed as described above.
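The gPS definition amounts to a simple allele count, which can be sketched as follows. The encoding of genotypes as two-letter strings such as "CT" and the function names are assumptions for illustration only.

```python
def count_allele(genotype, allele):
    """Count copies (0, 1 or 2) of `allele` in a two-letter genotype string."""
    genotype = genotype.upper()
    if len(genotype) != 2:
        raise ValueError("genotype must consist of exactly two alleles")
    return genotype.count(allele.upper())


def genetic_protection_score(gpx1_genotype, ephx1_genotype):
    """gPS = number of T alleles at GPX1(Pro200Leu)
           + number of C alleles at EPHX1(His113Tyr); ranges from 0 to 4."""
    return count_allele(gpx1_genotype, "T") + count_allele(ephx1_genotype, "C")


# Example: heterozygous at both loci gives a score of 2.
example_score = genetic_protection_score("CT", "TC")
```

The score treats every protective allele as equivalent, regardless of the locus it comes from, which is what allows a single per-allele odds ratio to be estimated.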
We also carried out a sensitivity analysis for missing data. The level of significance was set to 5% for all tests. To take multiple comparisons into account, the p-values of the BWS tests were interpreted at the familywise significance level of 5%/11 ≈ 0.455%.
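The per-test threshold follows from dividing the familywise level by the number of markers (a Bonferroni correction). A minimal sketch, with a hypothetical helper name:

```python
def bonferroni_threshold(familywise_alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return familywise_alpha / n_tests


# 11 markers at a familywise level of 5%.
threshold = bonferroni_threshold(0.05, 11)
# A p-value is declared significant only if it falls below this threshold.
is_significant = 0.002 < threshold
```

With 11 markers the threshold is roughly 0.0045, so a BWS p-value of 0.002 remains significant after correction.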

Results
Most patients were men (75%). The median age at diagnosis was 46 years for both sexes, with a range from 24 to 50 years. For about 80% of the cases both parents originated from Germany; a further 8% had at least one German parent. Almost all non-German parents came from other European countries or North America. Table 1 shows the characteristics of patients and controls.

Histological subtypes of lung cancer
As expected, the leading histological subtypes were squamous-cell carcinoma (SCC) in men (30%) and adenocarcinoma in women (37%). The gender-specific distributions of histological subtypes within cases were similar to those of another German study [20]. For more details see Table 1.

Smoking habits
Nearly all patients (97% of men, 87% of women) were ever smokers (current or former) with high tobacco consumption (current smokers: mean 32.4 PY, former smokers: mean 28.8 PY). According to the cumulative smoking dose, 2 of 3 patients (men: 73%, women: 63%) were classified as highly exposed to tobacco (≥20 pack-years), while among controls these proportions were 14% and 5%, respectively. Furthermore, the frequency of 55% female smokers among patients clearly exceeded the nationwide percentage given by the micro-census 1998 (31% at 15 to 50 years of age) [56].

Genotypes
The call rates of genotyping were on average 91% across markers.
Estimated allele frequencies are given in Table 2. Significant departures from HWE were not found in controls and only for p53 (Arg72Pro) in patients (p = 0.0384). Results of BWS-tests for a genetic association, an estimator of a main genetic effect within the total study population, age groups, male and female and current smokers are given in Table 3.

Genetic association analysis
The estimated odds ratios for lung cancer were OR = 6.6 (95% CI: 3.4-12.8) for moderate and OR = 22.7 (95% CI: 11.9-43.3) for heavy smokers without taking any genetic marker information into account.

GPX1(Pro200Leu)
Among the 11 markers investigated, only GPX1(Pro200Leu) showed a significant difference in the distribution of genotypes between all cases and controls (p BWS-Test = 0.002). The variant T-allele was associated with a lower risk for lung cancer and showed a frequency of 22% (95% CI: 20%-24%) in cases, compared to 31% (95% CI: 29%-33%) in controls. Significant association could also be observed within men (p BWS-[...]). Because of the small number of never or light smokers among cases, no significance (p = 0.9012) was achieved when testing for modification of the genetic effect by SEL groups. Even if the T-allele seems to have some protective [...] Table 4.
For EPHX1(His113Tyr), we found a risk-decreasing genetic effect for C-carriers only among moderate smokers (OR = 0.2, 95% CI: 0.1-0.7, p = 0.012). Please note the small number of TT-carriers within cases and controls, which lowers the evidence, though not the significance, of this finding. No such significant effect was found for heavy smokers (OR = 0.8, 95% CI: 0.3-1.9, p = 0.593), where the 95% confidence interval for the OR does not cover the point estimate of the OR for moderate smokers. For more details see Table 4. Because of the small number of never or light smokers among cases, no significance (p = 0.1898) was achieved when testing for modification of the genetic effect by SEL groups.

GPX1(Pro200Leu) and EPHX1(His113Tyr)
As a combined effect of these two polymorphisms, one might look at the count of protective alleles (T for GPX1(Pro200Leu), C for EPHX1(His113Tyr)) as a genetic protection score (gPS). In the presence of one protective allele only (gPS = 1), no differences in the decrease of lung cancer risk compared to gPS = 0 could be found between moderate (OR = 0.5, 95% CI: 0.1-3.0), heavy (OR = 0.4, 95% CI: 0.1-1.3) and never and light smokers (OR = 0.4, 95% CI: 0.1-2.0) (see Figure 1 and Table 5). In the presence of two or more protective alleles (gPS > 1) the risk for lung cancer further decreases for moderate and heavy smokers (OR = 0.2, 95% CI: 0.1-0.6). For never and light smokers no further risk reduction was observed. However, the subsample of never and light smokers is too small to provide statistical evidence for such a conclusion (only 2 never and light smoking cases have a gPS > 2).
Please note, even if both DNA-variants independently showed some protective effect for smoking exposed individuals, the risk for lung cancer in the double protected ever smokers (gPS ≥ 3) was significantly higher (OR = 4.8, 95% CI: 1.8-20, p = 0.028) compared to genetically unprotected never and light smokers (gPS = 0).

GSTP1(A-193C)
For GSTP1(A-193C) we did not find an overall genetic association (p BWS = 0.213, crude OR per G-allele = 1. No significant genetic association was observed for any of the other markers.

Discussion
Several studies have indicated chromosomal regions or candidate genes as associated with lung cancer at any age of onset, acting in addition to and/or in interaction with the main risk factor, tobacco smoking. For lung cancer with onset before age 50, consistent familial aggregation has been observed. The main interest of this study was to investigate the role of some candidate markers in early-onset lung cancer patients.

GPX1(Pro200Leu)
Antioxidant enzymes like glutathione peroxidase (GPX) are thought to be the primary cellular defence mechanism against reactive oxygen species. The lung epithelium is particularly endangered by exogenous NOx, which gives rise to epoxides, aldehydes and peroxides. These react to form the superoxide radical anion and hydrogen peroxide, and in the presence of transition metal ions these continue to react to form the aggressive OH radical. These reactive oxygen species (ROS) can cause massive injury to the cell. They are involved in inflammation processes, in the peroxidation of membranes, which influences their permeability, and in binding to SH-groups of several enzymes, which interferes with their activity. Extracellular and intracellular antioxidative defence systems protect the cells from this damage. Extracellular defence is mainly provided by small molecules like vitamins and low-molecular-weight proteins. Intracellular antioxidative defence mostly consists of the antioxidative enzymes of the glutathione redox cycle (glutathione reductase and glutathione peroxidase). GPX is a tetrameric enzyme with four selenium atoms bound as selenocysteine in the active centre. It is important in the cellular defence against cytotoxic lipid peroxidation products [62]. The catalytic activity of GPX depends on the availability of reduced glutathione as coenzyme and on several endogenous and exogenous influences like genotype and nutrition. Smokers need more protection against [...]. By this procedure, the sample size of effects in lower hierarchical steps becomes fairly small. They included in their final conclusion a non-significant effect of GPX1(Pro200Leu) within never-smokers, but did not show any effect of GPX1(Pro200Leu) within smokers. In contrast, we saw a significant association of GPX1(Pro200Leu) for heavy SEL, as well as for current and former smokers, with ORs from 0.3 to 0.5. We missed significance for never and light and for moderate smokers, possibly owing to low sample size.
Nevertheless, the estimated effect was of the same size. Hence, T-carriers seem to have some genetic protection against lung cancer within smokers of age 50 years or less. Sensitivity analysis could demonstrate that even under worst conditions for missing genotypes the findings were qualitatively identical and genotyping errors appeared as missing at random. Finally, findings from our sample of early-onset cases are consistent with the previous reports, when restricting samples to age of 50 years or less [60]. However they are in conflict with findings from non-early-onset samples [63,66].
Several factors have been identified that modify the activity of GPX. The consumption of fruits and vegetables as well as supplementation with the trace element selenium in populations with a low daily intake affect the activity of GPX in human erythrocytes [67]. Serum concentrations of selenium and erythrocyte GPX activity were lower in smokers [68]. Additionally, it was reported that alcohol induces lipid peroxidation, which might lead to a decrease in GPX activity. Ravn-Haren and colleagues found that the correlation between alcohol consumption and GPX activity is modified by the GPX1(Pro200Leu) genotype [63,69]. A stronger association between smoking, alcohol intake and lung cancer was seen in carriers of the genotype TT of GPX1(Pro200Leu) than in carriers of genotype CC [11].
Thus the observed association between GPX(Pro200Leu) and the risk for lung cancer might be caused by the complex interplay between smoking, nutrition and GPX activity.

EPHX1
Microsomal epoxide hydrolase (EPHX1) has a putative dual function for enzyme activity which possibly modifies lung cancer risk. On the one hand EPHX1 catalyzes the hydrolysis of epoxides to less reactive substances easier to be solubilised. On the other hand it activates some acrylamine metabolites or polycyclic aromatic hydrocarbons of cigarette smoke into a more carcinogenic form [57]. It is also reported that endotoxin in organic dust induces lung function decline. The strength of such a longitudinal decline is modified by the investigated EPHX1 polymorphism [70].
The activation or inactivation effects of EPHX1 may depend on the specific compounds being metabolized. Changing the structure of the enzyme via polymorphisms in EPHX1 might have either a protective or a promoting effect on the development of lung cancer in smokers. Two variants of the EPHX1 gene are mainly discussed, one in exon 3 and the other in exon 4. In exon 3 a C has been substituted for a T, resulting in an amino acid exchange at codon 113 (Tyr113His). This amino acid exchange results in decreased enzymatic activity (40%-50%) in vitro. In exon 4 a C to A transition causes a histidine to arginine change at codon 139 (His139Arg) with increased enzyme activity in vitro (25%) [71].
Two recently published meta-analyses investigated the genetic impact of EPHX1(His113Tyr) (T→C polymorphism in exon 3) without age constraints and did not find an association with lung cancer (OR = 0.96, 12 studies included [46] and OR = 0.98, 7 studies included [72]). However, Lee et al. [72] reported a significant decrease in lung cancer risk after adjustment for age, sex, smoking and study centre when pooling data of four published and four unpublished case-control studies (OR = 0.7, 95% CI: 0.51-0.96), which is confirmed for a white population in a recently published meta-analysis (5 studies combined, OR = 0.65, 95% CI: 0.44-0.96) [73]. The authors of the first meta-analysis suggested a possible protection for heavy smokers carrying the CC genotype, which is in line with our results in young, highly tar-exposed lung cancer patients (OR = 0.8, 95% CI: 0.3-1.9). However, the risk reduction at a moderate level of smoking exposure in our study was estimated to be even stronger, with a point estimate of OR = 0.2, lower than the confidence interval given by Lee et al. [60].

GPX1(Pro200Leu) and EPHX1(His113Tyr)
In combining both observed protective alleles of both genes we defined gPS, a genetic protection score. We could observe a positive association between the count of protective alleles and the reduction of tobacco smoke induced risk for lung cancer (OR = 0.20, 95% CI: 0.07-0.60).
Within our control group 53% are T-carriers for the GPX1(Pro200Leu) variant and 50% are C-carriers for the EPHX1 variant. Therefore, we might expect 3 out of 4 individuals of the population to have some genetic protection against lung cancer at younger age. However, the risk raising effect of smoking cigarettes is much stronger. Even under double protection by GPX1(Pro200Leu) and EPHX1(His113Tyr) the risk of current smokers is at least 4.5-times larger than in unprotected never and light smokers.
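The "3 out of 4" figure can be reproduced with a short calculation, assuming that carrier status at the two loci is independent in the control population (an assumption made here for illustration; it is not stated in the text):

```latex
P(\text{at least one protective allele})
  = 1 - (1 - 0.53)(1 - 0.50)
  = 1 - 0.47 \times 0.50
  = 0.765 \approx \tfrac{3}{4}
```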
In conclusion, our study investigates the association of several candidate genes with lung cancer in the young. Only some of the results from this sample of early-onset lung cancer patients are consistent with previously reported age independent findings or suspicions. However, their role in the developing process is different.

Some remarks to the study design
We used a candidate gene approach based on the literature on lung cancer as a whole. Thus, within our young age sample we can identify only susceptibility genes that were previously reported for lung cancer irrespective of age of onset. We also restricted considerations to the most promising marker per gene and did not consider haplotypes within candidate genes. The results presented here therefore need to be understood as a further investigation of controversial findings. We limited the chance of false positive results by applying a two-step strategy. First we performed two-group comparisons with BWS tests, followed by multiple logistic regression modeling for selected markers only.
For four markers the call rate of genotyping is slightly below 90%, which results from some suboptimal logistics in the early phase of the study and is not due to genotyping errors.

Conclusion
Smoking is the most important risk factor for young lung cancer patients. However, this study provides some support for the T-allele of GPX1(Pro200Leu) and the C-allele of EPHX1(His113Tyr) to play a protective role in early onset lung cancer susceptibility.