Noninvasive detection of lung cancer by analysis of exhaled breath

Background Lung cancer is one of the leading causes of death in Europe and the western world. At present, diagnosis of lung cancer very often happens late in the course of the disease since inexpensive, non-invasive and sufficiently sensitive and specific screening methods are not available. Even though the CT diagnostic methods are good, it must be assured that "screening benefit outweighs risk, across all individuals screened, not only those with lung cancer". An early non-invasive diagnosis of lung cancer would improve prognosis and enlarge treatment options. Analysis of exhaled breath would be an ideal diagnostic method, since it is non-invasive and totally painless. Methods Exhaled breath and inhaled room air samples were analyzed using proton transfer reaction mass spectrometry (PTR-MS) and solid phase microextraction with subsequent gas chromatography mass spectrometry (SPME-GCMS). For the PTR-MS measurements, 220 lung cancer patients and 441 healthy volunteers were recruited. For the GCMS measurements, we collected samples from 65 lung cancer patients and 31 healthy volunteers. Lung cancer patients were in different disease stages and under treatment with different regimes. Mixed expiratory and indoor air samples were collected in Tedlar bags, and either analyzed directly by PTR-MS or transferred to glass vials and analyzed by gas chromatography mass spectrometry (GCMS). Only those measurements of compounds were considered, which showed at least a 15% higher concentration in exhaled breath than in indoor air. Compounds related to smoking behavior such as acetonitrile and benzene were not used to differentiate between lung cancer patients and healthy volunteers. Results Isoprene, acetone and methanol are compounds appearing in everybody's exhaled breath. These three main compounds of exhaled breath show slightly lower concentrations in lung cancer patients as compared to healthy volunteers (p < 0.01 for isoprene and acetone, p = 0.011 for methanol; PTR-MS measurements). A comparison of the GCMS-results of 65 lung cancer patients with those of 31 healthy volunteers revealed differences in concentration for more than 50 compounds. Sensitivity for detection of lung cancer patients based on presence of (one of) 4 different compounds not arising in exhaled breath of healthy volunteers was 52% with a specificity of 100%. Using 15 (or 21) different compounds for distinction, sensitivity was 71% (80%) with a specificity of 100%. Potential marker compounds are alcohols, aldehydes, ketones and hydrocarbons. Conclusion GCMS-SPME is a relatively insensitive method. Hence compounds not appearing in exhaled breath of healthy volunteers may be below the limit of detection (LOD). PTR-MS, on the other hand, does not need preconcentration and gives much more reliable quantitative results then GCMS-SPME. The shortcoming of PTR-MS is that it cannot identify compounds with certainty. Hence SPME-GCMS and PTR-MS complement each other, each method having its particular advantages and disadvantages. Exhaled breath analysis is promising to become a future non-invasive lung cancer screening method. In order to proceed towards this goal, precise identification of compounds observed in exhaled breath of lung cancer patients is necessary. Comparison with compounds released from lung cancer cell cultures, and additional information on exhaled breath composition in other cancer forms will be important.


Background
Lung cancer is one of the leading causes of death in Europe and the western world. At present, diagnosis of cancer very often happens late in the course of the disease since available diagnostic methods are not sufficiently sensitive and specific.
Current interest is not only focused on breath of lung cancer patients but also on emission of volatile organic compounds (VOCs) from lung cancer tissue and lung cancer cell cultures [23][24][25][26]. In our own investigations we observed the release (or consumption) of various compounds for the cancer cell lines A549, CALU-1 [25], NCI-H2087 [26] and NCI-H1066 (Sponring A, Filipiak W, Mikoviny T, Ager C, Schubert J, Miekisch W, Amann A, Troppmair J: Release of volatile organic compounds from the lung cancer cell line NCI-H1666 in vitro, submitted). It would be most interesting to investigate jointly the composition of exhaled breath of a particular patient and in vitro release of compounds from cancer cells obtained from the same patient (during a tumor resection). Different analytical techniques have been used for analysis of exhaled breath and headspace of cancer cell cultures. One of the most useful technique is gas chromatography and mass spectrometry (GCMS) [5,7,9,11,18,23,[27][28][29][30][31][32][33][34][35]. This technique gives the most detailed analytical information. Other techniques, such as sensor arrays [36] and proton transfer reaction mass spectrometry (PTR-MS) [8] or selected ion flow tube mass spectrometry (SIFT-MS) [37][38][39] do not need any preconcentration step and can work in on-line mode, even in breath-to-breath resolution.
In the present manuscript we present results obtained with GCMS with preconcentration by solid phase microextraction (SPME), and results obtained with PTR-MS. Both techniques have their advantages and shortcomings. PTR-MS measurements do not provide as much information as GCMS, but the technique is more easy to handle. Therefore the number of patients or volunteers investigated by PTR-MS in our laboratory is much higher than the number of persons investigated by GCMS.
It has been stated by Phillips et al., that 3481 different compounds have been observed in exhaled breath. Some of these compounds arise in higher concentrations in inhaled air than in exhaled breath, having thus "negative alveolar gradient" [40]. A total of 1753 different compounds was described as having "positive alveolar gradient", i.e., arising in higher concentration in exhaled breath than in inhaled air. Michael Phillips et al. used adsorption of exhaled breath on adsorbents (solid phase extraction, SPE) with subsequent thermodesorption (TD) and GC-MS-analysis [5,7,27,41,42]. In their work, identification of the peaks has been done by spectral library match only without confirmation of retention time.
Among sample preparation methods, solid phase microextraction (SPME) is one of the most frequently used. SPME offers a simple and inexpensive alternative to preconcentration on sorbent tubes followed by thermal desorption (SPE). SPME was developed in the late 1980s by Arthur and Pawliszyn [43,44]. This technique was successfully used in determination of volatile compounds in different matrices in environmental, toxicological, and pharmacological analysis. SPME has been also applied to analysis of VOCs in human breath [45].
For the validated identification of potential marker compounds, we prepared calibration mixtures of the respective pure compounds in order to determine the retention time and mass spectra. A particular focus of the present investigation was the comparison of smokers vs. nonsmokers within the group of lung cancer patients, and the exclusion of smoking-related compounds from comparisons between lung cancer patients and healthy controls. We recently performed a PTR-MS investigation comparing healthy smokers with healthy non-smokers and identified 7 different mass-to-charge ratios (in the range of 20 ≤ m/ z ≤ 231) showing statistically significant differences between smokers and non-smokers [34]. SPME-GCMS identifies compounds, but is a relatively insensitive method. Hence compounds not appearing in healthy volunteers' exhaled breath may appear at lower concentrations below the limit of detection (LOD). In addition, SPME-GCMS is only semiquantitative due to competitive absorption on the SPME-fibre. PTR-MS, on the other hand, does not need preconcentration and gives much more reliable quantitative results. The shortcoming of PTR-MS is that it cannot identify compounds with certainty, and several compounds may overlap on a particular mass-to-charge ratio [46]. Hence SPME-GCMS and PTR-MS complement each other, each method having its particular advantages and disadvantages.

GCMS analysis
The GCMS analysis was performed on Agilent 5975 Inert XL MSD coupled with 7890 A gas chromatograph (Agilent, Waldbronn, Germany) with split-splitless injector. The temperature of injector was 290°C. The splitless time was 1 min, while split ratio was 1:50. Helium was used as a carrier gas at velocity 0.8 ml min -1 . The MS analyses were carried out in a full scan mode, with scan range 35 -200 amu. A scan rate of 3.46 scan/s was applied. Electron impact ionisation at energy 70 eV was used for every measurement. The ion source, quadrupole and transfer line temperatures were maintained at 230°C, 200°C and 200°C, respectively. The acquisition of chromatographic data was performed by means of Chemstation (Agilent) and mass spectrum library NIST 2005 (Gatesburg, USA) was applied to identification. The 25 m × 0.32 mm × 5 μm capillary column CP-Porabond-Q (Varian Inc., Middelburg, The Netherlands) was used. Oven temperature programme was as follows: initial 90°C held for 7 min, then ramped 7°C min-1 to 140°C, held for 7 min then ramped 15°C min-1 to 260°C and held for 10 min.
An automatic SPME holder with Carboxen/polydimethylsiloxane (CAR/PDMS) fiber of 75 μm thickness was purchased from Supelco (Bellefonte, PA, USA). The sorption and desorption of analytes have been performed automatically by means of autosampler MPS 2XL (Gerstel, Mülheim an der Ruhr, Germany). SPME conditions were as follow: extraction time of 10 min at a temperature of 37°C. Desorption of volatiles from the fiber was placed in hot GC injector at 290°C, for 1 min. Prior every SPME sorption, the fiber was preconditioned in small needle heater (Gerstel, Mülheim an der Ruhr, Germany) at temperature 290°C, for 5 min. 20 ml headspace vials, teflon coated rubber septa and crimp caps were purchased from Gerstel (Mülheim an der Ruhr, Germany). Helium and nitrogen of purity 6.0 (i.e., 99.9999%) were purchased from Linde (Vienna, Austria) and 3 Lit Tedlar bags from SKC (Eighty Four, PA, USA). Gas tight syringes were purchased from Hamilton (Bonaduz, Switzerland) and 1 l gas bulb from Supelco (Bellefonte CA, USA).
We defined signal to noise ratio of 3 as the limit of detection (LOD) and signal to noise ratios of 9 as the limit of quantification (LOQ).

PTR-MS analysis
Volatile organic compounds with a volume ratio up to 1:10 10 (0.1 mm 3 of the compound to 1 m 3 air; 0.1 parts-perbillion; 0.1 ppb) can be measured. The molecules are ionized by proton transfer from H 3 O + -ions produced in the ion source of the instrument with subsequent measurement by a Quadrupol-Mass-Spectrometer [47][48][49]. The proton transfer reaction only takes place if the proton affinity of the analyte is higher than that of water, hence it is possible to detect most aldehydes, ketones, alcohols, acids, esters and many unsaturated aromatic as well as N or S substituted hydrocarbons. The count rate of the primary ions (H 3 O + ) was calculated taking the count rate at m/z 21 and m/z 37 into consideration. For compounds with unknown kinetic reaction constant, the nominal constant k = 2 × 10 -9 cm 3 sec -1 allows to determine a first estimate of the respective concentration (based on the measured count rate).
To calculate the concentrations for the mentioned compounds (acetone, benzene, isoprene, methanol) correctly, we performed calibration series (under consideration of the specific reaction rate constant k). The results of the used calibration as well as the used reaction constant are given in Table 1.
A high-sensitivity proton transfer reaction mass spectrometer (hs-PTR-MS, 3 turbopumps; Ionicon Analytic GMBH, Innsbruck, Austria) with Teflon rings (instead of Viton rings) was used. The PTR-MS showed a count rate of the primary ions (H 3 O + ions) of ~1.5 × 10 7 counts per second. The settings of the PTR-MS result in a count rate of H 2 O . H 3 O + at around 7.5 × 10 4 and the percentage of parasitic precursor ions O 2 + <1%, NO + <0.5% and NH 4 + < 8% (the given values refer to dried, filtered room air, so-called zero-air). For additional details see references [46,50].

Preparation of gaseous standards
Calibration gases were prepared mostly by evaporation of liquid compounds (except C4-C6 alkanes, 2-butene and dimethyl ether) in a glass gas bulb. Before using, all bulbs had been cleaned with methanol, dried in oven at 120°C for at least 20 h, and then purged with ultra-clean nitrogen for at least 10 min. Afterwards, bulbs were evacuated by means of a vacuum pump for 10 min. Calibration vapour mixture was obtained by injection of 1-3 μl of each compound through membrane, using a GC syringe into a glass bulb. After evaporation, appropriate amount of vapour mixture was removed using gas tight syringe and introduced into Tedlar ® bags (SKC 232 Series, Eighty Four, PA, USA) with 0.5, 1 or 1.5 L of nitrogen.

Human subjects: GCMS analysis
A cohort of 65 patients (28 smokers, 31 exsmokers and 6 nonsmokers) suffering from lung cancer at different stages and in different treatment regimes (median age 63.0 years and an age range of 37 -84 years) was recruited. All individuals gave informed consent to participation in the study. The patients completed a questionnaire describing their current smoking status (active smokers, non-smokers) and the time elapsed since their last smoke. The classification as smoker/non-smoker/ex-smoker is based on the self-declaration of the patients. The amount of smoking (in pack years) was determined. * ) The "calibration factor" is the factor used to correct concentrations which are initially determined using the kinetic rate constants as taken from reference [55]. The large "calibration factor" for isoprene is due to fragmentation of protonated isoprene (m/z 69) towards m/z 41.
The lung cancer patients were compared with 31 healthy volunteers (7 smokers, 2 exsmokers and 22 nonsmokers, median age 38.0 years and an age range of 21 -87 years), who also gave informed consent to participation in the study and declared their smoking habits.
All patients and volunteers consumed food not later than one hour before breath sampling. No special dietary regimes were applied. The samples were collected at different daytime independent of the time of meals and were processed within 6 hours at most. The study was approved by the local ethics committee of Innsbruck Medical University.

Sampling of exhaled breath
Samples of mixed breath gas were collected in Tedlar bags (SKC Inc, Eighty Four, PA) with parallel collection of ambient air (also in Tedlar bags). Breath gas samples were obtained after a ~5 minutes sitting of a volunteer. Each subject provided 1 or 2 breath samples by use of a straw. All samples were processed within 3-6 hours. We collected mixed alveolar breath (instead of alveolar breath) in order to find also compounds directly released from the lungs.
Before collection of breath, all bags were thoroughly cleaned to remove any residual contaminants by flushing with nitrogen gas (purity of 99.9999%), and then finally filled with nitrogen and heated at 85°C for more than 8 hours with a complete evacuation at the end.
All compounds detected in breath were compared to the ambient air and only compounds with concentrations at least 15% higher than in ambient air concentrations were reported. 18 ml of gas sample has been transferred to 20 ml volume evacuated glass vials, and equilibrated with nitrogen gas.
Here we determined VOCs in exhaled breath of lung cancer patients. We restrict ourselves to compounds which show at least 15% higher concentrations in exhaled breath as compared to inhaled air. In particular, we exclude compounds which show lower concentration in exhaled breath than in inhaled air. Our experience showed that different rooms show quite different indoor air composition, which is particularly pronounced in clin-ical environments. The threshold of 15% was arbitrarily chosen.

Smokers evaluation (GCMS measurements)
For each compound found, we determined the proportion of lung cancer patients in whose exhaled breath the compound appears, separately for smokers, exsmokers and non-smokers, see Figure 1. Putting smokers into one class and combining exsmokers and non-smokers into the other class, the respective proportions, namely proportion smoker and proportion exsmoker and non-smoker were computed for appearance of each compound. The p-value for the null hypothesis "proportion smoker = proportion exsmoker and non-smoker " was computed according to the method proposed by Agresti and Caffo [51]. Statistical results are considered to be significant if p <0.05.

Group comparisons (PTR-MS measurements)
The non-parametric Kruskal-Wallis test (equivalent to the Wilcoxon rank sum test) was used for all comparisons of concentrations in two groups. For all comparisons, the null-hypothesis was that the concentrations of compared Compounds observed in exhaled breath of cancer patients which are related to smoking behavior (GCMS) Figure 1 Compounds observed in exhaled breath of cancer patients which are related to smoking behavior (GCMS). Only those measurements have been considered, which show at least 15% higher concentration in exhaled air than in indoor air. The relative proportions for observations with respect to smoker patients (red), ex-smoker patients (blue) and non-smoker patients (green) is shown. Nine compounds (blue) were identified by retention time and by spectral library identification. One compound, namely 1,3cyclopentadiene was not commercially available and therefore only identified by spectral library match. Acetonitrile and toluene arise in almost every smokers' exhaled breath. p-Values computed according to the method of Agresti and Caffo [51] are smaller than 0.0001 with the exception of 2methyl-1-butene and 1,4-pentadiene, where p < 0.05.
groups are equally distributed. Statistical results are considered to be significant if p <0.01. We chose this limit (instead of p < 0.05) because of the large number of subjects in the cohort investigated by PTR-MS (220 lung cancer patients, 441 healthy controls).

Compounds excluded for distinction of lung cancer patients from healthy volunteers
Certain compounds may arise in relatively high concentrations in indoor air. Typical examples are isopropanol or limonene (from cleaning agents) or p-xylene (in hospital indoor air). Other compounds like cineole, menthol, pcymene or ethanol may be contained in toothpaste, candies or foodstuff. Phenol and N,N-dimethyl acetamide are released from Tedlar bags and carbon disulfide is frequently released by GCMS septa. Halogenated compounds like trichloroethylene, acetylbromide or 1,1difluoroethane have to be treated with care. N,N-dimethyl formamide may also have a hospital-related background.
Other compounds like acetonitrile are typical for smoking (see Results Section). These mentioned compounds were excluded when trying to distinguish lung cancer patients from healthy volunteers with respect to VOC concentration patterns in exhaled breath.

Patient characterization
The demographic data of patients, volunteers and the histologic subtypes of lung cancer are presented in Table 2, 3, 4, and 5.

Concentrations of volatile organic compounds as determined by PTR-MS
The distribution of concentrations for benzene, isoprene, acetone and methanol was determined by PTR-MS. The substances isoprene, acetone and methanol are the main components of exhaled breath.
The results for benzene are shown in Figure 2. The median concentrations of benzene are increased in smokers (CAsmoker 2.9 ppb; control-smoker 2.4 ppb) as compared to non-smokers and exsmokers (CA-nonsmoker/exsmoker 0.9 ppb, control-nonsmoker/exsmoker 1.1 ppb). We observe a small difference between lung cancer patients and controls: statistically significant for the non/ex-smoker subgroups (p = 0.002), but not for the smoker-subgroup (p = 0.469).
The results for isoprene are shown in Figures 3 and 4. The median concentrations of isoprene are lower in lung cancer patients (81.5 ppb) than in healthy controls (105.2 ppb), with a p-value p < 0.01. Since isoprene shows age and gender effects [33], we differentiated also with respect to gender, see Table 6: For male persons we observed median concentrations of 133.7 ppb in controls which are younger than 50 years, 100.0 ppb in controls which are older than 50 years, and 87.1 ppb in lung cancer patients.
In female persons we observed median concentrations of 103.9 ppb in controls which are younger than 50 years, 98.1 ppb in controls which are older than 50 years, and 72.9 ppb in lung cancer patients. Additional information on lung cancer patients with and without radiotherapy are given in Figure 4: Lung cancer patients undergoing a radiotherapy show a slightly higher concentration of isoprene than lung cancer patients without radiotherapy.
The results for acetone are shown in Figure 5: The median concentrations of acetone are lower in lung cancer patients (458.7 ppb) than in healthy controls (627.5 ppb), with a p-value p < 0.01.
The results for methanol are shown in Figure 6: The median concentrations of methanol are lower in lung cancer patients (118.5 ppb) than in healthy controls (142.0 ppb), with a p-value p = 0.011.

Volatile organic compounds in lung cancer patients observed by GCMS
We observed altogether 103 volatile organic compounds (VOCs) in exhaled breath of lung cancer patients. We do not claim that all of these compounds are endogenously produced. These compounds were identified by spectral library match. For 84 of these 103 compounds we confirmed identification by comparison of retention times with native standards.
The appearance of the following compounds is influenced by smoking habits: toluene, benzene, acetonitrile, 2methyl furan, 2,5-dimethyl furan, furan, 1,3-cyclohexadiene, 1,3-cyclopentadiene, 2-methyl-1-butene, 1,4-pentadiene. p-Values computed according to the method of Agresti and Caffo [51] are smaller than 0.0001 with the exception of 2-methyl-1-butene and 1,4-pentadiene, where p < 0.05. The proportion of lung cancer patients showing these compounds in their exhaled breath is shown in Figure 1 for smokers, ex-smokers and nonsmokers. Apart from 1,3-cyclopentadiene (which is not    commercially available), all these smoking-related compounds were not only identified by spectral library match, but also by comparison of GCMS retention time with that of standards based on the respective pure compound.
All the compounds related to smoking behavior were excluded from the list of compounds which are thought to be specific for lung cancer.

VOCs specific for lung cancer patients (PTR-MS)
For isoprene, acetone and methanol we observe lower concentrations in exhaled breath of lung cancer patients as compared to healthy volunteers (see Discussion Section).

VOCs specific for lung cancer patients (GCMS)
A comparison of the results of our cohort of 65 lung cancer patients with those of 31 healthy volunteers revealed differences in the concentration patterns. Sensitivity for detection of lung cancer patients based on 4 different compounds (or 15 different compounds or 21 different compounds) not arising in exhaled breath of healthy volunteers was 52% (71% and 80%). Compounds specific for smoking were not included in these sets. The three used sets of compounds were increasing, i.e., The smallest set-A of 4 compounds consisted of 2butanone, benzaldehyde, 2,3-butanedione, 1-propanol. The other sets of compounds are listed in Table 7. All the 21 compounds used here for detection of lung cancer patients were not observed in healthy controls at concentrations at least 15% higher in exhaled breath than in indoor air (as detected by SPME with its relatively high LOD). Hence specificity is always 100% for all three sets of compounds. Figure 7 illustrates the concentration patterns of selected compounds (including the above set-A of 4 compounds) for lung cancer smokers, lung cancer ex-smokers and lung cancer non-smokers as compared with healthy smokers, healthy ex-smokers and healthy non-smokers. Some compounds like isoprene and acetone arise in everybody's exhaled breath. Some compounds like acetonitrile are typical for smoking behavior. Some compounds were observed only in lung cancer patients (e.g., the sets A-C of compounds given in Table 7).

Discussion
The following points characterize our investigation and are discussed here: (A) No differences of concentrations (concentration expiratory -concentration inspiratory ) between exhaled breath and indoor air were considered. Only expiratory concentrations (concentration expiratory ) were considered. All our results refer to measurements which showed at least 15% higher concentrations in exhaled breath than in indoor air. If indoor air concentrations of some compound was higher than the concentration in exhaled air, the respective compound was not considered for the particular patient/volunteer in question.
(B) We excluded certain compounds from use for the distinction between lung cancer patients and healthy volunteers, because these compounds may be related to artefacts. These compounds may be observed due to indoor air contamination, cleaning agents, toothpaste, candies or foodstuff, compounds being released from tubings, Tedlar bags or GCMS septa etc, see Table  8.
(C) We were interested in compounds related to smoking behavior, but did not use them to distinguish between lung cancer patients and healthy volunteers.

Concentration distributions of benzene (determined by PTR-MS) in exhaled breath of lung carcinoma patients (smokers:
red; non/ex-smoker: magenta) and healthy volunteers (smok-ers: blue; non/ex-smokers: green)

Figure 2 Concentration distributions of benzene (determined by PTR-MS) in exhaled breath of lung carcinoma patients (smokers: red; non/ex-smoker: magenta) and healthy volunteers (smokers: blue; non/ex-smokers: green).
The concentration is shown in logarithmic scaling. Benzene concentration is increased in the subgroups of smokers and therefore a marker for smoking behavior. The median values for benzene in human breath have been determined as follows: CA-patients-non/ex-smoker 0.9 ppb, controlnon/ex smokers 1.1 ppb (significant difference according to Kruskal-Wallis test: p-value 0.002); CA-patients-smoker 2.9 ppb, control-smokers 2.4 ppb (no statistical significant difference according to Kruskal-Wallis test: p-value 0.469). The difference in smoking behaviour (between smoker and non/ ex-smoker) is statistically significant for the control group (pvalue < 1*10 -5 ) as well as for the CA-patients (p-value < 1*10 -5 ).
(D) We determined retention times of as many compounds as possible based on standards prepared from the respective pure compound. In total, we determined retention times of ~220 compounds with respect to our SPME-GCMS measurement protocol. For the present study with 103 observed compounds altogether 84 compounds were confirmed by retention time. We clearly indicate for each compound if it was identified by spectral library match only, or additionally by determination of retention time.
(E) We strictly observed the policy that one GCMSpeak corresponds to one compound.
(G) All our GCMS-experiments described here were done using SPME. This is not as sensitive as solid phase extraction with subsequent thermodesorption. If measurements are done with higher sensitivity, certain compounds may be present in exhaled breath of healthy volunteers, even if we do not detect them by SPME.
The approach (A) for us is physiologically more informative than taking differences in concentrations whenever a VOC behaves like carbon dioxide: the concentration of CO 2 in exhaled air (about 4%) is, within the normal inspiratory range, independent of the CO 2 concentration in inhaled air (0.03% to 2% in indoor air) [52]. Only the exhaled concentration of CO 2 (and not the difference of concentrations in exhaled breath and indoor air) refers to the physiological state of the human body. Nevertheless, compounds showing higher concentration in exhaled breath than in indoor air are not exclusively the result of metabolic processes, but can also be stored in different body compartments as a consequence of particular food consumption, smoking, medication, toothpaste, cosmetics and other sources. Some compounds, which could be important as potential biomarkers, are widespread in ambient air especially in a clinical environment, e.g. isopropanol. Several compounds were excluded from consideration (B) due to possible indoor air contamination (p-xylene, isopropanol, N,N-dimethyl formamide), due to their release by Tedlar bags (N,N-dimethyl acetamide, phenol) or GCMS septa (carbon disulfide), due to their appearance in cleaning agents or foodstuff (limonene, p-cymene, menthol) and due to potentially exogenous origin (halogenated compounds), see Table 8. The compound N,Ndimethyl-acetamide is released from Tedlar bags and the compound carbon disulfide (CS 2 ) is released by GCMSsepta. Some compounds need a more detailed investigation, such as the ester methyl acetate. This compound might appear in exhaled breath of healthy volunteers at low concentrations (ca. 1 ppb), and has been demonstrated to increase in concentration with increasing cardiac output (in one volunteer, only, results not shown).
Other compounds like 2-methyl pentane and 3-methyl pentane are potentially interesting for cancer screening, but might be released from certain types of GCMS-septa, even though not released by the septa used in the present investigation. Ethyl-benzene is a potentially interesting compound but excluded here because it is one of the volatile BTEX-compounds (= benzene, toluene, ethyl-benzene, xylene) appearing in gasoline, which are ubiquitous due to the contamination of soil and groundwater with these compounds.
For distinction between lung cancer patients and healthy volunteers we also excluded (C) all the compounds related to smoking behavior. A typical example is ace-

Concentration distributions of isoprene (determined by PTR-MS) in exhaled breath of lung carcinoma patients treated with radiotherapy (CA-patients-radiotherapy-female (n = 22): magenta; CA-patients-radiotherapy-male (n = 36): green), carcinoma patients not treated with radiotherapy (CA-patients-no radiotherapy-female (n = 50): cyan; CA-patients-no radiotherapymale (n = 88): black) and healthy volunteers (control < 50 years, female (n = 121): dark red; control < 50 years, male (n = 70): dark blue; control > 50 years, female (n = 142): red; control > 50 years, male (n = 94): blue). 19
CA-patients (7 female, 12 male) were excluded from consideration, because they did not clearly fit into one of the two groups (with/without radiotherapy). The concentration is shown in logarithmic scaling. For females, the median concentration for isoprene in breath is: for CA-patients-radiotherapy 85.1 ppb; for CA-patients-no radiotherapy 68.1 ppb; for control < 50 years 103.9 ppb and for controls > 50 years 98.1 ppb. For males, the median concentration for isoprene in breath is: for CA-patients-radiotherapy 89.9 ppb; for CApatients-no radiotherapy 85.4 ppb; for control < 50 years 133.7 ppb and for controls > 50 years 100.0 ppb. A discussion of age-and gender effects in healthy volunteers is given in reference [33].
tonitrile. The concentration of acetonitrile in exhaled breath of smokers is around ~30-60 ppb, whereas the concentration in exhaled breath of non-smokers is low (~2-5 ppb).
Identification based on spectral library match only may be misleading (D). We therefore add retention time for proper identification and have built up a database of ~300 physically available compounds (stored in a refrigerator). Since many of the exhaled compounds observed here have never been described in the biochemical literature, it is of utmost importance for further biochemical investigations to confirm the peak attribution with certainty. Therefore identification of peaks based on calibration mixtures (starting from pure compounds) is necessary. Only such a validated identification allows to investigate the biochemical background of the observed compound by, e.g., 13 C-labelling of metabolic precursors.
We strictly identify only one compound per GCMS-peak (E). Even though this may sound trivial, we know that other researchers attribute more than one compound to a peak (to circumvent the difficulty of chosing one among the several suggestions given by the exclusive use of spectral library identification without checking the retention time).
Collection of exhaled breath has to be done with great care. In particular, all alternative exogenous sources of volatiles should be avoided. We therefore use only Teflon tubings (F) and avoid all other materials, which might release plasticizers.
Altogether 53 compounds were observed in (some) cancer patients but not in healthy volunteers at our sensitivity level of GCMS-SPME. By considering different sets of candidates for cancer marker compounds (set-A: 4 compounds; set-B: 15 compounds; set-C: 21 compounds), we arrived at a specificity of 100% and a sensitivity of 52% (71% and 80%, respectively for the different sets of compounds). Considering all 53 compounds would not increase the sensitivity. Our primary aim was not to achieve high sensitivity for detection of lung cancer, but to identify the compounds observed in exhaled breath of lung cancer patients (of all disease stages) in a careful manner, taking into account not only spectral library match, but also retention time using standards prepared from the respective pure compounds. The mentioned 21 compounds do not include smoking-related compounds and do not include the compounds given in Table 8, which were excluded from consideration. The compounds which we presented here as candidates for cancer marker compounds were not tested in an independent cohort. SPME-GCMS identifies compounds, but is a relatively insensitive method. Hence compounds not appearing in healthy volunteers' exhaled breath may appear at lower concentrations below the limit of detection (LOD). In addition, SPME-GCMS is only semiquantitative due to competitive absorption on the SPME-fibre. PTR-MS, on the other hand, does not need preconcentration and gives much more reliable quantitative results. The shortcoming of PTR-MS is that it cannot identify compounds with certainty, and several compounds may overlap on a particu-lar mass-to-charge ratio [46]. Hence SPME-GCMS and PTR-MS complement each other, each method having its particular advantages and disadvantages.
In PTR-MS measurements we looked at relatively large groups of lung cancer patients (n = 220) and healthy volunteers (n = 441). We use tentative identifications for benzene (m/z 79), isoprene (m/z 69), acetone (m/z 59) and methanol (m/z 33). Benzene clearly is a compound related to smoking behavior. The respective concentra-  tions of isoprene, acetone and methanol are lower in lung cancer patients as compared to healthy volunteers. This might partly be an effect of age (as in isoprene [33]), and partly might be due to lower exhalation force in lung cancer patients.
Our healthy controls are younger than patients, and this is particularly so for the GCMS-measurements. Also, no disease controls have been considered, e.g., controls suffering from lung diseases such as COPD. Certainly age, gender and lung diseases (other than lung cancer) are confounding variables, which have to be evaluated very carefully, cf. refs [33,50].
We did not find pronounced differences in composition of exhaled breath between different stages and different treatment regimes. Small differences occur for isoprene, acetone and methanol (PTR-MS measurements). We discussed isoprene in more detail as an illustrative example (see Figures 3,4 and Table 6): A statistically significant dif-ference in concentration of isoprene was observed between female lung cancer patients and female healthy controls (p < 0.00001). For male lung cancer patients, the respective p-value (p = 0.022) was not below our chosen treshold (namely 0.01) for significance.
The biochemical background of the compounds observed in exhaled breath is largely unknown and needs further elucidation in the future before using certain compounds as biomarkers. In order to be on the right track, it is absolutely compelling to have a proper identification of the compounds observed by GC-MS analysis. It will also be desirable to determine the kinetic rate constants for the exchange of compounds (e.g., isoprene) between different compartments of the body. This is, in particular, important for compounds which are lipophilic, such as isoprene [53,54] or such as methylated hydrocarbons.
The final goal of our investigations is the development of a clinically applicable screening test for detection of lung cancer. Even though our results are promising, there is still a long way to go. We shall need more detailed information about the composition of exhaled breath in cohorts suffering from other carcinomas and other lung diseases, and get information on compounds released from primary cancer cell cultures. Preferably combined information on primary cancer cell cultures and exhaled breath of one and the same patient should be collected.

Conclusion
Exhaled breath analysis is a promising diagnostic method for detection of lung cancer. Typical compounds arising in everybody's exhaled breath are isoprene, acetone and methanol, which exhibit decreased concentrations in breath of lung cancer patients as compared to healthy controls. Many other compounds arise in lung cancer patients only (when detected by SPME-GCMS with a lower detection limit of ~2-3 ppb). An important issue is the validated identification of volatile compounds observed in breath by comparison with commercially available pure compounds, since many observed compounds from exhaled breath have not been considered before in medical or biochemical context.

Competing interests
Anton Amann declares to be a representative of Ionimed Analytik GesmbH (Innsbruck). Nevertheless no financial reimbursements, fees, funding, or salary have been payed by Ionimed Analytik GesmbH in the last five years. Ionimed Analytik GesmbH was a consortium partner in the EU-project BAMOD.
Anton Amann does not hold any shares or patents of an organization which in any way might gain or lose financially from the publication of this manuscript.
This figure shows the distribution of appearance in exhaled breath of cancer patients (smokers, exsmokers and non-smokers) for 60 compounds out of altogether 103 com-pounds (GCMS) Figure 7 This figure shows the distribution of appearance in exhaled breath of cancer patients (smokers, exsmokers and non-smokers) for 60 compounds out of altogether 103 compounds (GCMS). Isopropanol is not appearing in this figure, because it usually shows higher concentrations in indoor air than in exhaled air. Carbondisulfide is not appearing in this figure, because it may be released from septa used for SPME investigations. Names of those compounds which have been identified not only by spectral library match but in addition by comparison with retention time of the pure respective compound, are shown in magenta color. Names of compounds only identified by spectral library match are shown in black color. The columns of four suggested marker compounds (set-A) are shown between white vertical lines. These compounds appear in cancer patients (smokers, ex-smokers and non-smokers), but do not appear in exhaled breath of healthy volunteers with concentrations above the limit of detection (LOD). p-cymene is contained in essential oils (e.g., in cumin and thyme) m-cymene misidentification possible (mix-up with natural isomer p-cymene) menthol mix of isomers might be contained in candies, toothpaste or foodstuff methyl acetate is observed in healthy volunteers in low concentration (ca. 1 ppb), and increases with increased cardiac output n-hexane there is an ubiquitous pollution with n-hexane in the environment n-pentane marker for oxidative stress p-xylene indoor air component in hospital rooms pentane, 2-methyl-and pentane, 3-methyl potentially interesting compound, but might be released by GCMS septa styrene styrene is sometimes added to the BTEX-compounds (see ethyl-benzene above), making it BTEXS trichloroethylene TCE; groundwater contamination by TCE is an important environmental concern, hence an exogenous origin is possible The other authors declare that they have no competing interests.

Authors' contributions
The original plan of the workpackage "lung cancer study" in the EU-project BAMOD was devised and written by JS, WM and AA. GCMS-methodology was developed by JS and WM. The project BAMOD was coordinated by AA.
AB performed patients' interviews and documentation, collection of samples, PTR-MS measurements, wrote parts of the manuscript. CA performed GCMS data analysis and validation. MP performed patients' interviews and documentation, collection of samples, PTR-MS measurements. MK performed patients' interviews and documentation, collection of samples, PTR-MS measurements. KS performed the PTR-MS data analysis, wrote parts of the manuscript. ML performed GCMS measurements of patient samples, indoor air samples and calibration samples, determined the retention times of compounds, did integration of results, wrote part of the manuscript. TL performed GCMS measurements of patient samples, indoor air samples and calibration samples, determined the retention times of compounds, did integration of results, wrote part of the manuscript. WF performed PTR-MS calibration measurements. HD recruited patients, discussed the GCMS and PTR-MS results and compared volatile analyses to patients' medical status and histology of cancer, revised the manuscript. MF recruited patients, discussed the GCMS and PTR-MS results, and compared volatile analyses to patients' medical status and histology of cancer, revised the manuscript. WH recruited patients, discussed the GCMS and PTR-MS results, and compared volatile analyses to patients' medical status and histology of cancer, revised the manuscript. WW recruited patients and discussed the GCMS and PTR-MS results of patients under radiotherapy. PL recruited patients, discussed the GCMS and PTR-MS results of patients under radiotherapy and revised the manuscript. HJ recruited patients and discussed the GCMS and PTR-MS results. MH recruited patients, discussed the GCMS and PTR-MS results, and compared volatile analyses to patients' medical status and histology of cancer. AH assisted in preparing the recruitment protocol and the ethics commission application. BB worked at GCMS protocol planning and measurements, finalized the manuscript. WM planned the original study and the EU-project BAMOD, devised the temperature program and GCMS protocol, provided validation of results and finalized manuscript. JS planned the original study and the EU-project BAMOD, devised the temperature program and GCMS protocol, provided validation of results and finalized manuscript. AA planned the original study and the EU-project BAMOD, supervised all experiments and measurements, validated results, provided software for data analysis and validation, wrote the manuscript. All authors read and approved the manuscript.