  • Research article
  • Open access

Accuracy of volatile urine biomarkers for the detection and characterization of lung cancer



The mixture of volatile organic compounds in the headspace gas of urine may be able to distinguish lung cancer patients from relevant control populations.


Subjects with biopsy confirmed untreated lung cancer, and others at risk for developing lung cancer, provided a urine sample. A colorimetric sensor array was exposed to the headspace gas of neat and pre-treated urine samples. Random forest models were trained from the sensor output of 70 % of the study subjects and were tested against the remaining 30 %. Models were developed to separate cancer and cancer subgroups from control, and to characterize the cancer. An additional model was developed on the largest clinical subgroup.


90 subjects with lung cancer and 55 control subjects participated. The accuracies, reported as C-statistics, for models of cancer or cancer subgroups vs. control ranged from 0.795 – 0.917. A model of lung cancer vs. control built using only subjects from the largest available clinical subgroup (30 subjects) had a C-statistic of 0.970. Models developed and tested to characterize cancer histology, and to compare early to late stage cancer, had C-statistics of 0.849 and 0.922 respectively.


The colorimetric sensor array signature of volatile organic compounds in the urine headspace may be capable of distinguishing lung cancer patients from clinically relevant controls. The incorporation of clinical phenotypes into the development of this biomarker may optimize its accuracy.



There has been a substantial amount of research in the field of molecular biomarker development aimed at improving our ability to predict who will develop lung cancer, detect lung cancer at an early stage, and characterize the cancer that is found. This work has most commonly used tissue or blood specimens to identify characteristic alterations in the genome, proteome, transcriptome, or metabolome of lung cancer patients.

Urine is a non-invasively collected biospecimen that has been relatively under-represented as a source of potential molecular biomarkers of lung cancer. Discovery level studies have identified differences in metal elements, [1] specific proteins, [2] proteomic signatures, [3] ratios of fluorescent peaks, [4] non-volatile metabolites, [5] exosomal proteins, [6] and tobacco metabolites, [7] in the urine of people with lung cancer.

Volatile organic compounds (VOCs) are present in very low concentrations in the headspace gas of urine samples. Over 700 VOCs have been identified in the urine of healthy volunteers. Diverse classes of VOCs are found in the urine, including alcohols, aldehydes, amides, amines, carboxylic acids, esters, ethers, halides, heterocyclic compounds, hydrocarbons, ketones, nitriles, sulfides, terpenoids, and thiols [8]. There is a greater diversity of VOC classes in the urine than in other biospecimen sources where VOCs can be measured, such as breath, skin, blood, and buccal mucosa [9]. These VOCs are thought to reflect metabolic alterations at the tissue level that enter the bloodstream and can leave the body in part by transfer into the urine. The composition of VOCs is affected by the altered metabolic properties of cancer cells, such as the manner in which they handle oxidative and energy stresses.

The premise that urine VOC profiles can be used to identify disease has been supported by studies of patients with celiac disease, [10] inflammatory bowel disease, [11] diabetes, [11] urinary tract infections, [12] and tuberculosis [13]. In addition, research aimed at developing forensics tools to identify individuals, and to locate people during disasters, has suggested unique patterns of VOCs are present in our urine [9, 14]. Discovery level studies have produced promising results for the identification of leukemia, colorectal cancer, and lymphoma through the use of gas chromatography–mass spectrometry analysis of the urine, [15] while bladder and prostate cancer studies have assessed VOC profiles detected by canine scent and ion mobility spectroscopy [16–18]. Two small discovery level studies using gas chromatography–mass spectrometry to detect a lung cancer signature from urine VOCs have been published, the first in mice injected with cancer cell lines, [19] and the second in humans [20]. These promising results have encouraged us to explore this area further.

A colorimetric sensor array (CSA) is a cross-responsive chemical sensor whose output is a change in the color of its chemoresponsive elements upon exposure to VOCs [21]. The CSA signal is refined enough to separate VOCs by class and individually within a class when exposed to one VOC at a time, or to separate complex mixtures of VOCs from one another, such as those in the headspace gas of bacterial cultures [22]. In the current discovery level study we report on the accuracy of CSA derived signatures of the headspace gas of urine to detect and characterize lung cancer.


This study was approved by the IRB of the Cleveland Clinic (CC) (IRB 1021). All study subjects signed informed consent.

Study subjects were included as cases if they had biopsy confirmed, untreated lung cancer or an imaging abnormality highly suspicious for lung cancer being scheduled for biopsy. Only those who later were biopsy confirmed remained cases in the study. Study subjects were included as controls if they were at risk for developing lung cancer based on age > 40 years and tobacco use of at least 10 pack-years, and/or a family history of lung cancer, and/or the presence of chronic obstructive pulmonary disease (COPD); or if they presented with an indeterminate lung nodule 8–30 mm in diameter that was ultimately confirmed to be benign based on biopsy or serial imaging. The duration of imaging was based on the size of the nodules as recommended in current guidelines [23]. Study subjects were excluded from participation if they had a prior history of lung cancer, a history of another cancer within 5 years, were receiving immunosuppression, or were using continuous supplemental oxygen. Consecutive subjects presenting to the outpatient Pulmonary department of the Cleveland Clinic, who met the above criteria, were approached. Approximately 50 % of those approached agreed to participate. Samples from all who agreed to participate were included in the analysis. Data collection included demographic variables and comorbidities for all subjects, nodule size for control subjects with lung nodules, and cancer histology and stage for the cancer subjects.

The CSA was designed to have 73 chemoresponsive elements (Fig. 1). It was housed in a glass container above a sample of blotting paper to which the urine specimens were added. Study subjects provided a clean-catch urine sample at the end of a clinic visit, at least 1 h after their last meal or drink. The sample was aliquoted and frozen at −80 °C within 2 h of collection. At the time of testing, the frozen urine was slowly thawed in a water bath and then separated into four test conditions in order to maximize the sensor information: (1) unaltered urine was analyzed with the CSA and separately used to measure the urine osmolality and perform a urine dipstick measurement; (2) a non-volatile acid (1 M tosic acid in a 1:1 volume ratio) was added to protonate organic acids and facilitate their evaporation; (3) a non-volatile base (1 M sodium hydroxide in a 1:1 volume ratio) was added to deprotonate amines, allowing them to evaporate more easily; and (4) urine was added to a pre-oxidation tube (sulfochromic acid on silica) to derivatize VOCs into more reactive species. Once prepared, 200 μL of each sample was added to a urine sensor cartridge that had been pre-warmed to 37 °C in an incubator for 20 min. An Epson V600 scanner imaged the sensor at 3 min intervals for 4 h with the cartridge held at 37 °C. Color difference maps were constructed by extracting the red, green, and blue values of the 73 indicators in the sensor array under each of the 4 conditions. The color vector of the initial image was subtracted from the color vectors of all subsequent images in order to construct a time series of color difference vectors. The person performing the urine tests (SL) was blinded to the study subject category (cancer or control).
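The baseline subtraction described above can be illustrated with a minimal Python sketch. The function name `color_difference_series` and the toy pixel values are assumptions made for illustration only; they are not the software or data used in the study.

```python
from typing import List, Sequence

Vector = List[float]


def color_difference_series(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Subtract the first frame's color vector from every later frame.

    Each frame is a flat list of red, green, and blue values for all
    indicators in one scan. (Hypothetical helper, not the study's code.)
    """
    if len(frames) < 2:
        raise ValueError("need an initial frame plus at least one later frame")
    baseline = frames[0]
    diffs = []
    for frame in frames[1:]:
        if len(frame) != len(baseline):
            raise ValueError("all frames must have the same length")
        diffs.append([value - base for value, base in zip(frame, baseline)])
    return diffs


# Toy input: 2 indicators x 3 channels (R, G, B) = 6 values per frame.
# The real array would give 73 x 3 = 219 values per frame and test condition.
frames = [
    [120, 80, 60, 200, 190, 10],  # initial scan
    [118, 83, 60, 195, 190, 14],  # next scan
    [115, 85, 61, 190, 188, 20],  # scan after that
]
diffs = color_difference_series(frames)
for step, vector in enumerate(diffs, start=1):
    print(step, vector)
```

Each position in the resulting vectors traces one color channel of one indicator over time, which is the kind of time series the feature extraction step operates on.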

Fig. 1 Sensor elements. The chemoresponsive elements include metalloporphyrins, base indicators, acid indicators, redox dyes, solvatochromic dyes, and nucleophilic indicators

Our statistical prediction model building procedure included four steps. The first step was feature extraction. To derive features that describe characteristics of the observed time series, a nonparametric local polynomial regression and a simple linear regression were fit to the data for each color time series. Four model-based features were derived for each time series: the area under the curve of the nonparametric regression; the residual standard error of the fitted curve; the total variation of the fitted curve (a statistical measure of variation of a nonlinear function); and the linear growth trend of the data (i.e., the slope of the linear regression line). The second step was feature filtering. The purpose of this step was to reduce variable dimension and select a set of relevant features for use in constructing an efficient prediction model. A univariate logistic regression was fit for each feature, and those whose C-statistic was greater than 0.6 were identified as potential predictor variables. The third step was model training. Our data set was randomly split into a training set and a testing set with a 7:3 ratio. A variable selection procedure and correlation analysis were conducted to avoid multicollinearity and overfitting in the model. Clinical features known to be associated with lung cancer risk, including age, smoking history, and COPD, were included as variables. Random forest models were built using the subset of variables selected from the training set, with and without the inclusion of the clinical variables. The fourth step was model validation. The fitted random forest models were evaluated on the testing set. To reduce the influence of any single random data split, we repeated the third and fourth steps 100 times and summarized the prediction accuracy results. The prediction models (built from the training datasets) were applied to the subjects in the testing datasets. An observation in the testing data was classified as positive if the predicted probability of the outcome was greater than 0.5, and as negative if it was less than or equal to 0.5. For comparison, models were built in a similar fashion using only the clinical features and separately using all study subjects (rather than the 7:3 training:testing split).
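The feature extraction, feature filtering, data split, and classification threshold can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the R code used in the study: the features are computed on the raw points rather than on a local polynomial fit, the residual standard error feature is omitted, the univariate logistic regression C-statistic is approximated by the rank-based C-statistic of the raw feature (taking the better of the two directions), and no random forest is fit. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def time_series_features(t: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Three of the four features, computed on raw points (a simplification)."""
    n = len(t)
    # Area under the curve by the trapezoidal rule.
    auc = sum((t[i + 1] - t[i]) * (y[i + 1] + y[i]) / 2 for i in range(n - 1))
    # Total variation: summed absolute change between consecutive points.
    total_variation = sum(abs(y[i + 1] - y[i]) for i in range(n - 1))
    # Slope of the ordinary least squares line.
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return {"auc": auc, "total_variation": total_variation, "slope": sxy / sxx}


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random control."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def filter_features(table: Dict[str, List[float]], labels: Sequence[int],
                    threshold: float = 0.6) -> List[str]:
    """Keep features whose C-statistic (in either direction) exceeds the threshold."""
    kept = []
    for name, values in table.items():
        c = c_statistic(values, labels)
        if max(c, 1.0 - c) > threshold:
            kept.append(name)
    return kept


def split_70_30(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split subject indices into training (70 %) and testing (30 %)."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = (7 * n) // 10
    return indices[:cut], indices[cut:]


def classify(probability: float) -> bool:
    """Positive only if the predicted probability is greater than 0.5."""
    return probability > 0.5


features = time_series_features([0, 3, 6, 9], [0, 1, 3, 2])
kept = filter_features({"signal": [1, 2, 3, 4], "noise": [1, 2, 2, 1]}, [1, 0, 1, 0])
train, test = split_70_30(145, random.Random(0))
print(features, kept, len(train), len(test))
```

In a fuller version, the split, model fitting, and evaluation would sit inside a loop repeated 100 times with different random seeds, and the C-statistics from the testing sets would be summarized across repetitions.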

Demographic variables were described using sample mean with standard deviation or proportion as appropriate. Categorical variables were compared using Pearson’s chi-square test, and continuous variables were compared using the two-sample independent t-test. All analyses were performed using the R statistical package.


A total of 145 subjects were enrolled between July 2012 and March 2014, 90 with lung cancer and 55 controls. Control subjects were reported to have COPD more often than cancer subjects (41.8 % vs. 23.3 %, p = 0.0188). There were no differences in other demographic variables or relevant comorbidities (Table 1). The control group included 31 at risk subjects and 24 who presented with indeterminate lung nodules. All demographic variables and relevant comorbidities were similar between these groups. The mean nodule diameter was 12.4 mm (range of 3–32). Of the 90 lung cancers, 6 were small cell, 53 adenocarcinoma, and 28 squamous cell carcinoma. There was a nearly equal distribution of localized and advanced stages of lung cancer (Table 2).
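As a worked check of the COPD comparison, the Pearson chi-square test for a 2 × 2 table can be computed directly. The counts below are not reported in the text; they are inferred from the stated percentages (23 of 55 controls is 41.8 %, and 21 of 90 cancer subjects is 23.3 %) and should be read as an assumption. The sketch applies no continuity correction.

```python
import math
from typing import Tuple


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]].

    No continuity correction is applied. With 1 degree of freedom the
    upper-tail probability equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts inferred from the reported percentages (an assumption, see above).
controls_with_copd, controls_without_copd = 23, 32  # 55 controls
cancer_with_copd, cancer_without_copd = 21, 69      # 90 cancer subjects

chi2, p_value = pearson_chi_square_2x2(
    controls_with_copd, controls_without_copd,
    cancer_with_copd, cancer_without_copd,
)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

Under these inferred counts the statistic is about 5.52 and the p-value rounds to the reported 0.0188.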

Table 1 Study Population
Table 2 Lung cancers

Models were developed and tested comparing cancer and histology subgroups to controls. The accuracies, reported as C-statistics, ranged from 0.795 – 0.917. Models built from the entire dataset had similar accuracies (C-statistic 0.792 – 0.923). The accuracies were higher when the histology subgroups were compared to controls. There was little difference in the model accuracies when urine features alone were used to develop the models compared to models that included clinical variables. Models of stage I cancers vs. controls were equally or more accurate, though the number of subjects with stage I cancer was relatively small. Models developed and tested to characterize cancer histology, and to compare early to late stage cancer, were very accurate (Table 3). Normalization of the data for urine osmolality and specific gravity did not substantially influence model accuracies. Models developed using clinical variables only were less accurate (C-statistics 0.543 – 0.687).

Table 3 Accuracy of models: Validated C-statistics with confidence intervals through model training on 70 % of subjects and testing on 30 %

To assess the influence of the subjects’ phenotypes on model accuracy we performed additional analyses. The study population was divided by sex, age (<55, 55–70, >70), and COPD into 12 subgroups. The largest subgroup (male, age 55–70, without COPD) contained 30 subjects (18 cancer, 12 control). Models developed and tested within this phenotype were very accurate, with a C-statistic of 0.970 for all cancer vs. control, and 0.987 for non-small cell carcinoma vs. control (Table 3).
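The division into 12 phenotype subgroups (2 sexes × 3 age bands × 2 COPD states) can be sketched as below. The helper names and the example subjects are hypothetical. Assigning ages of exactly 55 and 70 to the middle band follows the stated boundaries (<55, 55–70, >70).

```python
from collections import Counter
from itertools import product
from typing import Tuple

Phenotype = Tuple[str, str, bool]


def age_band(age: float) -> str:
    """Map an age in years to one of the three bands used for subgrouping."""
    if age < 55:
        return "<55"
    if age <= 70:
        return "55-70"
    return ">70"


def phenotype(sex: str, age: float, has_copd: bool) -> Phenotype:
    """Key identifying a subject's clinical phenotype subgroup."""
    return (sex, age_band(age), has_copd)


# Every possible subgroup: 2 x 3 x 2 = 12.
all_groups = list(product(("female", "male"), ("<55", "55-70", ">70"), (False, True)))

# Hypothetical subjects as (sex, age, has_copd); not study data.
subjects = [
    ("male", 62, False),
    ("male", 58, False),
    ("female", 73, True),
    ("male", 66, False),
]
counts = Counter(phenotype(*subject) for subject in subjects)
largest_group, largest_size = counts.most_common(1)[0]
print(len(all_groups), largest_group, largest_size)
```

A model restricted to one phenotype would then be trained and tested only on the subjects whose key matches the chosen subgroup.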


We report the results of the development of a CSA based profile of urine headspace gas VOCs as a biomarker that could assist with the diagnosis and characterization of lung cancer. To our knowledge, this is the first study using a cross-responsive chemical sensor for this purpose. We found the CSA profile had good accuracy at separating subjects with lung cancer from clinically appropriate controls; that the accuracy improved when subtypes of lung cancer were compared to controls; and that the accuracy was very high when the signatures were developed within a specific subset of subjects defined by their clinical phenotype. Finally, the results showed promise at being able to characterize the lung cancer’s histology and stage.

The current report describes a discovery level study of a novel urine based lung cancer biomarker. To advance this work, technical validation of the test and clinical validation of the results will be required. Technical validation will include the development of standard operating procedures for urine collection and processing, and confirmation of uniform performance of the CSA from one batch of sensors to the next. Relatively little is known about the proper conditions in which urine should be collected and processed for VOC evaluation. Studies have suggested each individual’s urine VOC signature is unique, with a small amount of variability based on diet which is exceeded by the variability between individuals [24, 25]. Storage of urine samples for up to 1 month at −80 °C seems to have little influence on the urine VOC profile, [13] whereas storage at room temperature for 3 days may influence the concentration of VOCs identified [26]. The number and classes of VOCs detected are highest in acidified and basified urine, [13, 26] with only a small number of VOCs being ubiquitous independent of pH [8]. Other components of urine dipstick measurements did not affect classification accuracy in a canine bladder cancer study [16].

Additives were used to maximize the liberation of VOCs based on pH and oxidation. Urine samples were processed and frozen within 2 h but were tested at variable intervals after processing (some over 1 year later). Normalization of results for urine concentration measures did not appear to have any impact. As a next step, we will learn more about the influence of diet, the type of collection and storage containers, the time to processing and testing, the ideal additives, the optimal urine volume, the temperature during testing, and the need for normalization to other urine values.

Lung cancer is heterogeneous in its clinical presentation and molecular makeup, as is the group of people in whom it develops. It is likely that one metabolic biomarker cannot accurately identify all patients with lung cancer. A patient’s clinical phenotype could influence the metabolic baseline. Alterations from this baseline may be useful in distinguishing a non-cancer from a cancer biosignature. Our exploratory results support very high accuracy of the metabolic biosignature when developed within a relatively uniform clinical phenotype. Clinical validation of a technically validated sensor platform will require a larger number of subjects in each clinical phenotype to be confident in the accuracies reported.

The output of the CSA, a cross-responsive sensor, is influenced by the mixture of VOCs to which it is exposed. The output is not able to identify the components of this mixture. Gas chromatography–mass spectrometry has been used in a small study to try to define the individual VOCs that make up the mixture. Further work in this area will help us understand the nature of the VOC signatures. Sensor technologies are more apt to be useful in the clinical setting because they are inexpensive and less technically demanding to apply and interpret.

Other limitations of our study include the small sample size for some of the comparisons where the accuracy was highest. These comparisons should be viewed as exploratory, helping to guide the next phase of urine biomarker development. It is not clear that the urine processing methodologies used in this study are optimal, and minor inconsistencies in the sensor manufacturing could impart unseen biases in the results. These issues will need to be addressed as part of the validation of this biomarker for clinical use. The distinguishing signatures from a technically validated instrument will then require validation on an independent cohort of a relevant population. The target for this test could be an upfront screening test or an adjunct to nodule evaluation. The validation cohort will need to reflect these targets. The results presented compare favorably with other biomarkers of early detection and/or nodule management.


In conclusion, the CSA signature of urine headspace gas VOCs is capable of distinguishing cancer patients from clinically relevant controls. The incorporation of clinical phenotypes into the development of this biomarker may optimize its accuracy.



CC: Cleveland Clinic

COPD: Chronic obstructive pulmonary disease

CSA: Colorimetric sensor array

VOC: Volatile organic compound


  1. Tan C, Chen H, Wu T. Classification models for detection of lung cancer based on nine element distribution of urine samples. Biol Trace Elem Res. 2011;142:18–28.

  2. Spivey KA, Banyard J, Solis LM, Wistuba II, Barletta JA, Gandhi L, et al. Collagen XXIII: A potential biomarker for the detection of primary and recurrent non-small cell lung cancer. Cancer Epidemiol Biomarkers Prev. 2010;19:1362–72.

  3. Zhang Y, Li Y, Qiu F, Qiu Z. Comparative analysis of the human urinary proteome by 1D SDS-PAGE and chip-HPLC-MS/MS identification of the AACT putative urinary biomarker. J Chromatog B. 2010;878:3395–401.

  4. Al-Salhi M, Masilamani V, Vijmasi T, Al-Nachawati H, VijayaRaghavan AP. Lung cancer detection by native fluorescence spectra of body fluids – A preliminary study. J Fluoresc. 2011;21:637–45.

  5. Carrola J, Rocha CM, Barros AS, Gil AM, Goodfellow BJ, Carreira IM, et al. Metabolic signatures of lung cancer in biofluids: NMR-based metabonomics of urine. J Proteome Res. 2011;10:221–30.

  6. Li Y, Zhang Y, Qiu F, Qiu Z. Proteomic identification of exosomal LRG1: A potential urinary biomarker for detecting NSCLC. Electrophoresis. 2011;32:1976–83.

  7. Yuan JM, Butler LM, Stepanov I, Hecht SS. Urinary tobacco smoke-constituent biomarkers for assessing risk of lung cancer. Cancer Res. 2014;74:401–11.

  8. Rocha SM, Caldeira M, Carrola J, Santos M, Cruz N, Duarte IF. Exploring the human urine metabolomics potentialities by comprehensive two-dimensional gas chromatography coupled to time of flight mass spectrometry. J Chromatog A. 2012;1252:155–63.

  9. Kusano M, Mendez E, Furton KG. Comparison of the volatile organic compounds from different biological specimens for profiling potential. J Forensic Sci. 2013;58:29–39.

  10. Di Cagno R, De Angelis M, De Pasquale I, Ndagijimana M, Vernocchi P, Ricciuti P, et al. Duodenal and faecal microbiota of celiac children: molecular, phenotype and metabolome characterization. BMC Microbiol. 2011;11:219.

  11. Arasaradnam RP, Quraishi N, Kyrou I, Nwokolo CU, Joseph M, Kumar S, et al. Insights into ‘fermentonomics’: evaluation of volatile organic compounds (VOCs) in human disease using an electronic ‘e-nose’. J Medical Engineering Technol. 2011;35:87–91.

  12. Storer MK, Hibbard-Melles K, Davis B, Scotter J. Detection of volatile compounds produced by microbial growth in urine by selected ion flow tube mass spectrometry (SIFT-MS). J Microbio Methods. 2011;87:111–3.

  13. Banday KM, Pasikanti KK, Chan EC, Singla R, Rao KV, Chauhan VS, et al. Use of urine volatile organic compounds to discriminate tuberculosis patients from healthy subjects. Anal Chem. 2011;83:5526–34.

  14. Rudnicka J, Mochalski P, Agapiou A, Statheropoulos M, Amann A, Buszewski B. Application of ion mobility spectrometry for the detection of human urine. Anal Bioanal Chem. 2010;398:2031–8.

  15. Silva CL, Passos M, Camara JS. Investigation of urinary volatile organic metabolites as potential cancer biomarkers by solid-phase microextraction in combination with gas chromatography–mass spectrometry. Br J Cancer. 2011;105:1894–904.

  16. Willis CM, Britton LE, Harris R, Wallace J, Guest CM. Volatile organic compounds as biomarkers of bladder cancer: Sensitivity and specificity using trained sniffer dogs. Cancer Biomarkers. 2010/2011;8:145–53.

  17. Cornu JN, Cancel-Tassin G, Ondet V, Girardet C, Cussenot O. Olfactory detection of prostate cancer by dogs sniffing urine: A step forward in early diagnosis. Europ Urol. 2011;59:197–201.

  18. Roine A, Veskimae E, Tuokko A, Kumpulainen P, Koskimaki J, Keinanen TA, et al. Detection of prostate cancer by an electronic nose: A proof of principle study. J Urol. 2014;192:230–5.

  19. Matsumura K, Opiekun M, Oka H, Vachani A, Albelda SM, Yamazaki K, et al. Urinary volatile compounds as biomarkers for lung cancer: A proof of principle study using odor signatures in mouse models of lung cancer. PLoS ONE. 2010;5:e8819. doi:10.1371/journal.pone.0008819.

  20. Hanai Y, Shimono K, Matsumura K, Vachani A, Albelda S, Yamazaki K, et al. Urinary volatile compounds as biomarkers for lung cancer. Biosci Biotechnol Biochem. 2012;76:679–84.

  21. Lim SH, Feng L, Kemling JW, Musto CJ, Suslick KS. An Optoelectronic Nose for Detection of Toxic Gases. Nature Chem. 2008;1:562–7.

  22. Carey JR, Suslick KS, Hulkower KI, Imlay JA, Imlay KRC, Ingison CK, et al. Rapid identification of bacteria with a disposable colorimetric sensing array. J Am Chem Soc. 2011;133:7571–6.

  23. Gould MK, Donington J, Lynch WR, Mazzone PJ, Midthun DE, Naidich DP, et al. Evaluation of individuals with pulmonary nodules: When is it lung cancer? Diagnosis and management of lung cancer 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest. 2013;143(Suppl):e93S–e120S.

  24. Zlatkis A, Liebich HM. Profile of volatile metabolites in human urine. Clin Chem. 1971;17:592–4.

  25. Zlatkis A, Brazell RS, Poole CF. The role of organic volatile profiles in clinical diagnosis. Clin Chem. 1981;27:789–97.

  26. Mochalski P, Krapf K, Ager C, Wiesenhofer H, Agapiou A, Statheropoulos M, et al. Temporal profiling of human urine VOCs and its potential role under the ruins of collapsed buildings. Toxicol Mechanisms Meth. 2012;22:502–11.


This research was funded by Metabolomx through support from SBIR contract 1R43CA177023-01. The study was designed and the data analyzed by the CC authors. Metabolomx employees converted images of the sensors into numerical vectors without knowledge of the subject label (cancer or control).

Author information


Corresponding author

Correspondence to Peter J. Mazzone.

Additional information

Competing Interests

PJM, JJ – funding support for the presented research provided to the Cleveland Clinic and National Jewish Health respectively by Metabolomx

SL – CSO, Metabolomx

RM – COO, Metabolomx

PR – CEO, Metabolomx

XW, HC, AV, QZ, MB, MS – no relevant conflicts

Authors’ contributions

PJM takes responsibility for the content of the manuscript, including the data and analysis. PJM made substantial contributions to the conception and design of the research, acquisition of data, analysis and interpretation of data, drafting and revision of the article, and provided final approval of the version to be published. XW and QZ made substantial contributions to the analysis and interpretation of data and revision of the submitted article, have provided final approval of the version to be published, and have agreed to be accountable for all aspects of the work. SL, RM, and PR made substantial contributions through performance of the urine assays and revision of the manuscript, have provided approval of the version submitted, and agree to be accountable for all aspects of the work. HC, JJ, AV, MB, and MS made substantial contributions to the acquisition of data and revision of the submitted article, provided final approval of the version submitted, and agree to be accountable for all aspects of the work. All authors have read and approved the final version of the manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Mazzone, P.J., Wang, XF., Lim, S. et al. Accuracy of volatile urine biomarkers for the detection and characterization of lung cancer. BMC Cancer 15, 1001 (2015).
