A study on volatile organic compounds emitted by in-vitro lung cancer cultured cells using gas sensor array and SPME-GCMS
BMC Cancer volume 18, Article number: 362 (2018)
Volatile organic compounds (VOCs) in the exhaled breath of humans have been shown to be a useful source of information for early lung cancer diagnosis. To date, the production and origin of the significant VOCs of cancer cells remain debated. This study therefore conducts in-vitro experiments on related cell lines to verify that VOCs can provide information about the cells.
The performance of e-nose technology was evaluated with different statistical methods to determine the best classifier. The gas sensor study was complemented by solid phase micro-extraction gas chromatography mass spectrometry (SPME-GCMS). For this purpose, the lung cancer cell lines (A549 and Calu-3) and the control cell lines, a breast cancer cell line (MCF7) and a non-cancerous lung cell line (WI38VA13), were cultured in growth medium.
This study provides a list of candidate volatile organic compounds that may be specific biomarkers for lung cancer, even at the 24th hour of cell growth. In addition, the Linear Discriminant Analysis-based One versus All-Support Vector Machine classifier achieves high performance in distinguishing lung cancer cells from breast cancer cells and normal lung cells.
The findings of this work indicate that the specific VOCs released from cancer cells can act as an odour signature and could potentially be used for non-invasive screening of lung cancer with gas sensor array devices.
Cancer is one of the leading causes of mortality worldwide, mainly because it is commonly detected at a very late stage. The American Cancer Society estimated that about 1,685,210 new cases of cancer would be diagnosed and 595,690 cancer-related deaths reported in the United States in 2016. It is also reported that lung cancer (LC) is the second most common cancer in men (14%) and women (13%), behind only prostate cancer (21%) and breast cancer (29%), respectively. In Malaysia, LC has been reported to be the second most common cancer in men and the third most common in women, with 2,100 Malaysians diagnosed each year. Diagnosing lung cancer at an early stage, particularly when the tumour is discovered at its local site, has been shown to improve patient survival [3, 4]. Hence it is critical that high-risk patients are screened. However, the established and widely used screening techniques, such as chest radiography and cytological examination, often give poor results in detecting small and resectable cancers.
Currently, low dose computed tomography (LDCT) used as an early-stage lung cancer screening technique shows a reduction in the number of lung cancer deaths. Yet this method exposes patients to considerable risk, as the amount of radiation used can lead to several complications [4, 7]. Generally, conventional methods are invasive and might delay therapy if cancer is found [8, 9]. In addition, only selected hospitals with the right expertise and facilities can perform such screening tests. Thus, a new screening approach based on cell biology theory, using the analysis of volatile organic compounds (VOCs) linked to lung cancer, has been receiving considerable attention from researchers. This new screening technique is non-invasive, reliable and inexpensive [10, 11].
Changes in metabolic pathways (gene or protein changes) in cancerous cells during tumour growth may lead to peroxidation of the cell membrane and the production of certain VOCs [12, 13]. These VOCs can be detected directly in the headspace of the cancer cells [8, 14] or in the exhaled breath of cancer patients [10, 15, 16]. In the case of exhaled breath, VOCs generated by the cancer cells are carried by the blood and exchanged through the alveoli in the lung. The potential of VOCs in the breath of lung cancer patients as a diagnostic or screening tool has been studied extensively for several years. However, to give clinicians information on the cellular and biochemical origin of VOCs when deciding on a specific cancer treatment, the analysis should also be compared with cancer cells (either in-vivo or in-vitro) [19, 20].
Many studies have reported using in-vitro cultured cells as a model system to demonstrate the discrimination between tumour and normal cells with spectrometric techniques [21,22,23,24,25,26,27,28,29,30,31,32]. However, the results are somewhat equivocal, and more studies are needed to identify VOC biomarkers of lung cancer. Only a few studies have used an array of sensors to distinguish types of lung cancer cells based on in-vitro cultured cell line samples [8, 33, 34], as shown in Table 1. These reports show substantial results in terms of sensor performance. However, choosing the right classification algorithms for the e-nose, supported by SPME-GCMS analysis, is crucial to strengthen the findings and advance the aim of non-invasive cancer diagnosis [35, 36].
In this study, the VOC signatures of two lung cancer cell lines, A549 and Calu-3, are investigated. The normal lung cell line and the breast cancer cell line are used as control samples to differentiate the lung cancer-related VOCs. To date, no reported work is known that investigates the VOC patterns released by both lung and breast cancer cultured cell lines under the same conditions and environment and at different growth stages.
This paper presents new results distinguishing the VOCs generated by two types of cancer cell lines, namely lung cancer (A549 and Calu-3) and breast cancer (MCF7), as well as a normal lung cell line (WI38VA13), at different proliferation stages using the Cyranose 320 e-nose device. Also presented are the results of five different classifiers applied to the e-nose data for VOC classification. To the best of the authors' knowledge, this paper also presents novel work in investigating the use of Naïve Bayes (NB) and One versus All-Support Vector Machine (OVA-SVM) to classify the VOCs emitted by in-vitro cell lines using an e-nose. Table 2 shows the parameters used in this study.
The Cyranose 320 is an e-nose built on an array of 32 conducting-polymer-coated carbon black sensors; the pattern of resistance changes across the sensor array is used to identify smells. This feature helps detect even slight differences in the headspace or in the complex volatile organic compounds (VOCs) emitted in exhaled breath or by in-vitro cultured cells [34, 39,40,41]. The Cyranose 320 was used to detect and discriminate the volatiles collected from the different cell lines with the aid of pattern recognition methods.
The collected VOCs were classified using different multiclass classifiers to find the one that best exploits the Cyranose 320 in distinguishing the lung cancer cells from the control samples. GCMS-SPME analysis was also performed for each sample. This pre-concentrating volatile compound extraction method was able to determine the specific compounds emitted by each type of cell. The compounds were identified using the NIST library and compared with the e-nose data. The significance of these preliminary results and their support for application in clinical lung cancer screening are then discussed.
Cell culture preparation
Cancerous lung cell lines A549 (ATCC® CCL-185™) and Calu-3 (ATCC® HTB-55™), normal lung cell line WI38VA13 (ATCC® CCL75.1™) and breast cancer cell line MCF7 (ATCC® HTB-22™) were obtained from the American Type Culture Collection and maintained at the Cell and Tissue Culture Engineering Lab (CTEL), Department of Biotechnology Engineering, IIUM. Table 3 shows the characteristics of the cell lines used in this project. According to Table 3, A549 and Calu-3 share the same histology, adenocarcinoma, but are reported to be of different origin. Thus, the VOC signatures of both A549 and Calu-3 are also covered in this work.
The A549, WI38VA13 and MCF7 cells were revived and cultivated in DMEM (Dulbecco’s Modified Eagle’s Medium) supplemented with 10% (v/v) FBS (Fetal Bovine Serum), while the Calu-3 cell line was grown in Eagle’s Minimum Essential Medium (EMEM) with 10% (v/v) FBS. The cells were grown in 25 cm² T-flasks and incubated in a carbon dioxide (CO2) incubator at 37 °C and 5% CO2 [22, 23, 36].
Upon reaching 70-90% confluence, the cells were harvested and seeded into new flasks at an initial density of 1 × 10⁵ cells/ml in 5 ml of medium for each cell line. The culture conditions were as reported in our previous work. The blank media, DMEM (without cells) and EMEM (without cells), were also prepared in triplicate as control samples and incubated together with A549, Calu-3, MCF7 and WI38VA13. The same cell culture preparation and environmental conditions were maintained for both the e-nose and SPME-GCMS measurements. The odour samples were taken after 24 h of incubation using an SPME fiber (Divinylbenzene/Carboxen/Polydimethylsiloxane), while for the Cyranose 320 the measurements were taken at the 24th, 48th and 72nd hours of incubation.
E-nose headspace sampling
The prepared samples, in fully sealed T-flasks, were placed in the biosafety cabinet, and the flasks were then connected to the inlet of the Cyranose 320 for data collection. The sampling setup using the e-nose is shown in Fig. 1, and Table 4 shows the configuration of the data collection process. The baseline purge was set to 10 s before data collection. The odour samples were drawn for 180 s so that the odour reached all 32 sensors and every sensor inside the Cyranose 320 could detect the VOCs. The sniffing process was repeated 5 times.
The collected data were then analysed using SPSS 17.0 and MATLAB R2012a to evaluate the e-nose performance. Each individual sample was described by a unique set of measurements known as features. The Cyranose 320 used in this work contains 32 conducting polymer sensors and hence creates 32 features for each odour sample. Each feature forms a dimension in a space known as the feature space. For each sample, including the blank media, the experiments were replicated 3 times (three flasks), and each sniffing was repeated 5 times at the 24th, 48th and 72nd hours. For the e-nose analysis, the datasets of two flasks were used for training and the remaining one for testing, giving training and testing sets in a 2:1 ratio. This study uses 18 classes for classification (six (6) sample classes multiplied by three incubation times).
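The counts in this paragraph can be checked with a short sketch. It assumes that every (sample, incubation time, flask, sniff) combination yields exactly one 32-feature measurement, and that the first two flasks are the training flasks; the text does not say which flask was held out, so that choice is hypothetical, as are all variable names.

```python
from itertools import product

SAMPLES = ["A549", "Calu-3", "MCF7", "WI38VA13", "DMEM", "EMEM"]
HOURS = [24, 48, 72]
FLASKS = [1, 2, 3]        # three replicate flasks per sample
SNIFFS = range(1, 6)      # five repeated sniffs per flask
N_SENSORS = 32            # one feature per Cyranose 320 sensor

# One class per (sample, incubation time) pair: 6 x 3 = 18 classes.
classes = [f"{s}@{h}h" for s, h in product(SAMPLES, HOURS)]

# Index every measurement; the 32-value feature vector itself is omitted.
records = [
    {"label": f"{s}@{h}h", "flask": f, "sniff": k}
    for s, h, f, k in product(SAMPLES, HOURS, FLASKS, SNIFFS)
]

# Assumption: flasks 1 and 2 are used for training, flask 3 for testing.
train = [r for r in records if r["flask"] in (1, 2)]
test = [r for r in records if r["flask"] == 3]

print(len(classes), len(records), len(train), len(test))  # 18 270 180 90
```

Splitting by flask rather than by individual sniff keeps repeated sniffs of the same flask from appearing in both the training and the testing set.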
The Savitzky-Golay filter was selected to remove noise from the gas sensor signal while preserving the height, width, amplitude and overall profile of the response [37, 39]. The datasets were normalized using the fractional difference method as in Eq. (1):
$$\frac{\Delta R}{R_0} = \frac{R - R_0}{R_0} \qquad (1)$$
where R_0 is the baseline and R is the steady-state sensor response to the gas sample. This fractional method helps to reduce the signal drift problem. All data were further normalized using the global sensor auto-scaling method, scaled to zero mean and a standard deviation of one [42, 44].
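The three preprocessing steps described here (smoothing, fractional difference, auto-scaling) can be sketched in plain Python. This is only an illustration under stated assumptions: the window length (5 points) and polynomial order (quadratic) of the Savitzky-Golay filter are not given in the text, the fractional difference is taken in its usual form (R - R0) / R0, and all function names are hypothetical.

```python
from statistics import mean, stdev

# Standard 5-point quadratic Savitzky-Golay smoothing weights (divide by 35).
# Assumption: the paper does not state the window length or polynomial order.
SG5 = (-3, 12, 17, 12, -3)

def savgol5(signal):
    """Smooth a sensor trace; the two points at each edge are left unchanged."""
    out = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window)) / 35
    return out

def fractional_difference(r, r0):
    """Steady-state response r relative to baseline r0: (r - r0) / r0."""
    return (r - r0) / r0

def autoscale(columns):
    """Scale each sensor column to zero mean and unit standard deviation."""
    scaled = []
    for col in columns:
        m, s = mean(col), stdev(col)
        scaled.append([(v - m) / s for v in col])
    return scaled

# A quadratic trace passes through a quadratic Savitzky-Golay filter unchanged.
trace = [float(i * i) for i in range(7)]
smoothed = savgol5(trace)
feature = fractional_difference(110.0, 100.0)
scaled = autoscale([[1.0, 2.0, 3.0]])
```

Dividing by the baseline makes each feature dimensionless, which is why this normalization is less sensitive to slow baseline drift than the raw resistance change.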
Feature extraction is essential for bringing out the discriminating information that improves classification performance.
Principal component analysis (PCA) and linear discriminant analysis (LDA) are two commonly used feature extraction techniques [45, 46]. In the present study, both techniques were applied to find the best method for reducing dimensionality while preserving as much information about the dataset as possible. The components from PCA and the discriminants from LDA were then used to visualise class separability. PCA provides an unbiased projection, which gives better information on the clustering behaviour of each class, while LDA maximizes the between-group variance and minimizes the within-group variance. Further, the LDA data were used as the input to the different classifiers, because they provide the highest possible discrimination between classes and help to classify the data accurately [47,48,49,50].
Proposed classification algorithms
To date, various classification algorithms have been proposed for cancer detection, particularly in relation to e-noses. In this study, the effectiveness and robustness of the e-nose in distinguishing lung cancer cell lines were tested using several classification algorithms, namely LDA with the Fisher criterion, K-Nearest Neighbour (KNN), Probabilistic Neural Network (PNN), Naïve Bayes (NB) and multi-class Support Vector Machine (SVM). The statistical significance of all 32 independent sensors was evaluated by comparing the mean scores of the 18 groups using the Wilks’ Lambda method. A multi-class odour classification model (LDA-based classifier) was then proposed to evaluate the robustness of the e-nose system in classifying cancerous cell samples.
The LDA classification was conducted using the leave-one-out approach for error estimation. The Fisher criterion has been reported to cope with non-normally distributed data and was therefore employed in this work.
PNN is an implementation of kernel discriminant analysis in which the operations are organized into a multi-layered feed-forward network with four layers. Although the PNN algorithm requires a large amount of memory for training, it requires less training time [52, 53]. The spread value (σ) was determined using 10-fold cross-validation, and a value of 0.1 was found appropriate for the dataset, giving acceptable classification accuracy.
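A minimal sketch of how a PNN assigns a class is given below, assuming Gaussian kernels with the reported spread of 0.1. The toy two-dimensional data and the function name are hypothetical; real inputs would be the LDA-based features.

```python
import math
from collections import defaultdict

def pnn_predict(x, train_X, train_y, sigma=0.1):
    """Classify x with a Gaussian-kernel probabilistic neural network."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] += math.exp(-d2 / (2 * sigma ** 2))   # pattern layer
        counts[yi] += 1
    scores = {c: sums[c] / counts[c] for c in sums}    # summation layer
    return max(scores, key=scores.get)                 # output (decision) layer

# Hypothetical toy data: two well-separated classes.
train_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
train_y = ["a", "a", "b", "b"]
pred_a = pnn_predict([0.05, 0.0], train_X, train_y)
pred_b = pnn_predict([1.0, 0.9], train_X, train_y)
```

Every training sample is stored as one pattern unit, which is where the memory cost mentioned above comes from; there is no iterative training step.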
KNN, on the other hand, is known as the simplest classifier; it uses the characteristics of neighbouring samples to determine the class of a data sample. This classifier can rapidly evaluate unknown inputs by calculating the distance between a new sample and the mean of the training samples in each class, weighted by their covariance matrices. Following the theoretical method, a k-value of one (1) and the Euclidean distance metric were selected, as these parameters gave the maximum accuracy.
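With the stated parameters (k = 1, Euclidean distance), the classifier reduces to assigning each sample the label of its single nearest training sample. The sketch below illustrates that rule on hypothetical toy data; it is not the MATLAB implementation used in the study.

```python
import math

def knn_predict(x, train_X, train_y, k=1):
    """Majority label among the k nearest training samples (Euclidean distance)."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical toy data in two dimensions.
knn_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]]
knn_y = ["control", "control", "cancer", "cancer"]
knn_pred = knn_predict([0.8, 0.9], knn_X, knn_y, k=1)
```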
Meanwhile, naïve Bayes (NB) is a simple probabilistic classifier that applies Bayes’ theorem with a naïve independence assumption. It is known as an efficient and effective technique for building predictive models, as the algorithm has few free parameters, does not require large amounts of training data and is computationally fast in decision making [56, 57]. In this study, NB classification with a normal (Gaussian) distribution was chosen, and the prior probabilities of the classes were set to empirical.
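A Gaussian naïve Bayes classifier with empirical priors can be sketched as follows. The data are hypothetical, the function names are illustrative, and the small variance floor is an added safeguard that the text does not mention.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def fit_gaussian_nb(X, y):
    """Estimate per-class feature means, variances and empirical priors."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),                  # empirical prior
            "mean": [mean(c) for c in cols],
            "var": [pvariance(c) + 1e-9 for c in cols],   # floor avoids division by zero
        }
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(p):
        ll = math.log(p["prior"])
        for v, m, s2 in zip(x, p["mean"], p["var"]):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return ll
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical toy data.
nb_X = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.2]]
nb_y = ["a", "a", "b", "b"]
nb_model = fit_gaussian_nb(nb_X, nb_y)
nb_pred = predict_gaussian_nb(nb_model, [1.1, 1.9])
```

Summing per-feature log densities is the independence assumption in practice: each sensor-derived feature contributes to the class score separately.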
Finally, the SVM is a linear classifier that finds the best separating hyperplane between two classes, possibly in a higher-dimensional space. However, the SVM can be used directly for binary problems only. For more than two classes, a multi-class SVM can be implemented by dividing the single multiclass problem into multiple binary classification problems. There are three types of multi-class SVM, namely one versus all (OVA), one versus one (OVO) and Directed Acyclic Graph (DAG)-SVM. The OVA-based SVM was used in this work to classify the 18 classes. The classifier was trained with an RBF kernel function whose parameters were obtained by an optimization method. Various pairs of box constraint (C) and sigma (σ) were tested for each dataset, and the final values obtained for this dataset were C = 2¹⁰ and σ = 2⁻³.
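The one-versus-all decomposition and the RBF kernel can be illustrated without a full SVM solver. The sketch below assumes the kernel form exp(-||x - z||² / (2σ²)), which is one common convention and is not confirmed by the text; the binary SVM training itself is not implemented, and all names are hypothetical.

```python
import math

C = 2 ** 10       # box constraint; it would be passed to each binary SVM trainer (not shown)
SIGMA = 2 ** -3   # RBF width

def rbf_kernel(x, z, sigma=SIGMA):
    """Gaussian RBF kernel; the 2*sigma^2 scaling is an assumed convention."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

def ova_targets(labels, positive):
    """Relabel a multiclass problem as +1 (the chosen class) versus -1 (all others)."""
    return [1 if lab == positive else -1 for lab in labels]

def ova_decide(scores):
    """Pick the class whose binary SVM returns the largest decision value."""
    return max(scores, key=scores.get)

labels = ["A549@24h", "Calu-3@24h", "MCF7@24h", "A549@24h"]
targets = ova_targets(labels, "A549@24h")
# Hypothetical decision values from three binary SVMs for one test sample.
winner = ova_decide({"A549@24h": 0.7, "Calu-3@24h": -0.2, "MCF7@24h": -1.1})
```

For 18 classes this scheme trains 18 binary classifiers, one per class, and each test sample is scored by all of them.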
The performance of each classifier is presented in terms of the accuracy (ACC) achieved, defined as the percentage (%) of correct classifications over the total cases presented. However, since accuracy alone might not reflect classification performance adequately, the sensitivity (SEN), specificity (SPE), precision (PREC) and Matthews Correlation Coefficient (MCC) of each class were also calculated to provide more relevant and interpretable information about the results [61, 62]. These measures are built from four commonly used terms, namely true positive (TP), true negative (TN), false positive (FP) and false negative (FN).
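For reference, the standard binary definitions of these measures are given below, with FN denoting the number of false negatives. The paper does not print these formulas, so they are stated here in their usual textbook form.

```latex
\begin{aligned}
\mathrm{ACC}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{SEN}  &= \frac{TP}{TP + FN}, \qquad
\mathrm{SPE}   = \frac{TN}{TN + FP}, \qquad
\mathrm{PREC}  = \frac{TP}{TP + FP}, \\
\mathrm{MCC}  &= \frac{TP \cdot TN - FP \cdot FN}
                      {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.
\end{aligned}
```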
The use of MCC in the multiclass case to measure classification correlation has been reported previously. The value of MCC varies between -1 and 1 (where 1 indicates perfect prediction quality, -1 indicates extreme misclassification in the confusion matrix and 0 indicates random correlation) [62, 65]. This paper reports the accuracy, sensitivity, specificity, precision and MCC for all 18 classes for the best results.
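Per-class values of these measures can be obtained from a multiclass confusion matrix by treating each class in turn as "positive" and all others as "negative". The sketch below takes this one-versus-rest view, which is an assumption about how the per-class figures were derived; the confusion matrix is a made-up three-class example.

```python
import math

def per_class_metrics(cm, k):
    """One-versus-rest metrics for class k; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(cm[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / total,
        "SEN": tp / (tp + fn) if tp + fn else 0.0,
        "SPE": tn / (tn + fp) if tn + fp else 0.0,
        "PREC": tp / (tp + fp) if tp + fp else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical confusion matrix: one class-1 sample is misclassified as class 2.
cm = [[5, 0, 0],
      [0, 4, 1],
      [0, 0, 5]]
m0 = per_class_metrics(cm, 0)
m1 = per_class_metrics(cm, 1)
```

In this example class 0 is classified perfectly (MCC of 1), while the single error lowers the sensitivity of class 1 to 0.8 and leaves its precision at 1.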
Gas chromatography mass spectrometry-solid phase micro extraction (GCMS-SPME)
GCMS-SPME headspace sampling
SPME-GCMS was used to identify the headspace VOCs released by each type of cultured cell line (A549, Calu-3, WI38VA13 and MCF7) and by the blank media. A preheated solid phase micro extraction (SPME) fiber was used to collect the VOCs released from the cells. The fiber, the inner needle of the SPME device, was coated with Divinylbenzene/Carboxen/Polydimethylsiloxane (DVB/CAR/PDMS). The DVB/CAR/PDMS-coated fiber was chosen as it has been optimized to extract volatile and semi-volatile molecules over a wide range of molecular weights. The needle was exposed to the headspace of the cells cultured in the 25 cm² T-flask for 15 min, as shown in Fig. 4. At the end of the VOC extraction time, the fiber was immediately inserted into the sample inlet of the Agilent 7890 GCMS.
A DB-WAX capillary column (30 m × 250 μm × 0.25 μm) was used, with the injector temperature at 250 °C to allow thermal desorption of the VOCs. The oven temperature was initially set to 50 °C and held for 0.5 min, then ramped at 10 °C/min to 180 °C and held for 1 min, and then ramped at 15 °C/min to 250 °C and held for 5 min. The helium carrier gas flow rate was 1 ml/min. The total analysis took 24.17 min. The MS analyses were performed in full scan mode (TIC mode) with a scan range of 40 to 200 a.m.u., and electron impact ionization was performed at 70 eV to separate the compounds.
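The stated total run time follows directly from the oven programme, as the short calculation below shows. The function name and the tuple layout are illustrative choices, not part of any instrument software.

```python
def gc_run_time(initial_temp, initial_hold, ramps):
    """Total oven time in minutes.

    ramps: list of (rate in deg C per min, target temperature in deg C, hold in min).
    """
    total, temp = initial_hold, initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold
        temp = target
    return total

# 50 C held 0.5 min; 10 C/min to 180 C, held 1 min; 15 C/min to 250 C, held 5 min.
run_time = gc_run_time(50, 0.5, [(10, 180, 1), (15, 250, 5)])
print(round(run_time, 2))  # 24.17
```

The two ramps contribute 13 min and about 4.67 min respectively, and the three holds add 6.5 min.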
Identification of VOCs
In this study, the potential VOCs were identified by spectral match only [29, 64]. The identity of each compound was determined with the Agilent ChemStation software by searching the NIST Mass Spectral Library 11, using the retention time and m/z of the VOCs of interest. Each chromatogram was integrated, and the peaks were matched and aligned to obtain a matrix containing all peaks found in the whole set of measurements. Peaks or compounds missing from other replicate samples were eliminated. In this analysis, peaks with a match of less than 80% to the NIST library (qualitative) and peaks with an area of less than 3000 were excluded. Peaks identified as arising from the column, the empty flask and the fiber (siloxanes) were also excluded [19, 29]. Differences in the relative abundances of the identified VOCs were tested using the t-test and considered significant at P < 0.05.
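The exclusion rules in this paragraph amount to a simple filter followed by an intersection across replicates. The sketch below assumes each peak is a record with a compound name, a library match percentage and a peak area; the field names, compound names and numbers are all hypothetical.

```python
MIN_MATCH = 80     # minimum NIST library match, in percent
MIN_AREA = 3000    # minimum integrated peak area

def keep_peak(peak):
    """Apply the match-quality and peak-area thresholds to one peak."""
    return peak["match"] >= MIN_MATCH and peak["area"] >= MIN_AREA

def consistent_compounds(replicates):
    """Compounds that pass the thresholds in every replicate chromatogram."""
    kept = [{p["name"] for p in rep if keep_peak(p)} for rep in replicates]
    return set.intersection(*kept) if kept else set()

# Made-up peaks for three replicate chromatograms of one sample.
rep1 = [{"name": "compound_A", "match": 92, "area": 15000},
        {"name": "compound_B", "match": 75, "area": 9000},    # match too low
        {"name": "compound_C", "match": 88, "area": 2500}]    # area too small
rep2 = [{"name": "compound_A", "match": 90, "area": 14000},
        {"name": "compound_D", "match": 95, "area": 8000}]
rep3 = [{"name": "compound_A", "match": 85, "area": 16000},
        {"name": "compound_D", "match": 91, "area": 7000}]

retained = consistent_compounds([rep1, rep2, rep3])
print(retained)  # {'compound_A'}
```

Here compound_D passes both thresholds but is dropped because it is missing from the first replicate.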
Table 5 shows a representative result of the Wilks’ Lambda test on the day 1 dataset, showing the contribution of each discriminant function (df) to the variation. The functions with a p-value below 0.05 (p < 0.05) were chosen, as this reflects the ability of the function to discriminate the groups.
Based on Fig. 5, the samples of A549, Calu-3, MCF7, WI38VA13 and the blank media were well separated with 100% discriminant function. The test data samples closely matched the distribution of the different groups of cell lines in the training data. Significant clustering of the lung cancer cells, the breast cancer cells and the control samples was observed. This indicates that the different cell lines emit different VOC profiles and that the e-nose is able to detect these variations. The two non-small cell lung cancer lines, A549 and Calu-3, were observed to lie very close together but with a distinct separation. The scores of the other samples were well distributed within each group, with visible separation for the combination of all days.
PCA was performed on the data, and the eigenvectors and eigenvalues were calculated from the correlation matrix. Eigenvectors with an eigenvalue higher than 1.0 can be selected as principal components (PCs), and those with a lower value can be excluded. In this study, the first three PCs, each with an eigenvalue higher than 1.0, were selected for the datasets at the 24th, 48th and 72nd hours. Based on Fig. 6a, the samples were observed to be well separated. The total variance explained by the principal components (PC1, PC2 and PC3) in the PCA shown in Fig. 6a is 93.56%, which indicates that the cell lines are separable. To emphasise the ability of the sensors to distinguish the different lung cancer types, the PCA plot for Calu-3 and A549 is enlarged in Fig. 6b. The sensors' ability to distinguish the two types of lung cancer from each other might be due to the specific VOCs emitted by the cell lines, since A549 and Calu-3 cells originate from epithelium and pleural effusion, respectively.
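The eigenvalue-greater-than-one selection rule can be illustrated with a few lines of Python. The eigenvalues below are invented for a five-variable example and are not taken from the study; for a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that sum is the fraction of variance explained.

```python
def kaiser_select(eigenvalues, threshold=1.0):
    """Keep components with eigenvalue above the threshold; report % variance kept."""
    ordered = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ordered if ev > threshold]
    explained = sum(kept) / sum(ordered) * 100
    return kept, explained

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
toy_eigenvalues = [0.4, 3.1, 0.2, 1.2, 0.1]
kept, explained = kaiser_select(toy_eigenvalues)
```

In this toy case two components are retained and together explain about 86% of the variance, in the same way that the three retained PCs in the text account for 93.56%.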
However, based on the PCA grouping behaviour, the features within each group were more spread out spatially than with LDA. The clusters of A549 and Calu-3 (lung cancer cells) were observed to be clearly separated from the MCF7 (breast cancer cell) and WI38VA13 (normal cell) clusters. Overall, the features extracted by LDA indicate good separability of the different samples. Thus the LDA-based features were used to test the five different classifiers.
The LDA-based features were used to test the five classifiers (LDA, PNN, KNN, NB and OVA-SVM) using 10-fold cross-validation. The performance of these classifiers was measured by the accuracy, sensitivity, specificity, precision and MCC on the training and testing data. The ability of the e-nose and the classifiers to differentiate the VOCs emitted by lung cancer cells from the control samples was evaluated by comparing the performance of each classifier.
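A 10-fold cross-validation split can be sketched as below. The sample count of 180 is an assumption (18 classes, two training flasks, five sniffs each), and the shuffling seed and function name are hypothetical; the study's own fold assignment is not described.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(180, k=10))
```

Each sample appears in exactly one test fold, so every measurement is used once for validation and nine times for fitting.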
Tables 6, 7, 8, 9 and 10 show that four of the five LDA-based classifiers (SVM, PNN, KNN and NB) achieved accuracy, sensitivity, specificity and precision of 90%, with an MCC of 1 (high prediction quality). However, the OVA-SVM classifier gives the best results among the classifiers for the lung cancer cell line volatile data, showing high accuracy, sensitivity, specificity, precision and MCC in the testing phase. In contrast, the LDA classifier performed worst, and many samples were wrongly classified.
Although the LDA-based OVA-SVM showed the best performance, the accuracy, sensitivity, specificity, precision and MCC values of the PNN algorithm were consistently high for every class. The prediction quality (MCC) for DMEM using the LDA-based PNN algorithm was only 0.3 lower than that of the SVM. In support of this, a study by F. Moderasi (2014) suggested that the PNN algorithm can be an appropriate alternative to SVM, as its training process is easier than that of the SVM algorithm.
The performance of NB was lower than that of the SVM, KNN and PNN classifiers because it is a generative classifier, and generative classifiers are generally not as accurate as discriminative ones. However, NB is still preferred in medical diagnosis applications because it is simple to build, easy to train and able to deal with missing information [56, 57]. According to K. Huang (2005), NB performance can be improved by training the NB classifier in a discriminative way. This method can therefore be considered in future work to improve the results of the NB classifier.
When the performance of the LDA-based OVA-SVM was examined for samples at different incubation times, the classification accuracy was found to improve significantly, reaching approximately 99% for the features at the 24th hour of incubation. The performance was also observed to improve for samples at the 48th and 72nd hours of cell growth. This may indicate that the VOCs of each sample increased with prolonged incubation.
The lower performance of OVA-SVM at the 24th hour compared with the 2nd day data may be due to insufficient time for the metabolites or compounds to be released by the cells into the headspace. It may also result from relatively low cell numbers, which lead to lower production of VOCs than at the 48th and 72nd hours of incubation. This agrees with a previous study on in-vitro lung cancer cells by Smith D. (2003), in which the number of compounds in the headspace was directly proportional to the number of cells. This problem can be overcome by seeding cells at a higher concentration, which might also help to differentiate the other cell lines at an early stage of growth.
Identification of the VOCs of lung cancer cell lines and normal cell lines by SPME-GCMS analysis
The VOCs related to lung cancer cell metabolism were investigated using SPME-GCMS analysis. The headspace of the cultured lung cancer cells was compared with the headspace of medium with breast cancer cells, medium with normal lung cells and medium without cells, respectively. The complete list of identified VOCs, based on the average peaks of the total chromatograms of three replicates of each sample, is given in Table 11. These 32 selected compounds are presumed to be emitted from both the background culture media and the metabolic activity of the cells.
The statistical significance of the relative abundances of the VOCs released from the lung cancer cell lines and the blank media was evaluated using the t-test, with a p-value below 0.05 considered statistically significant. This analysis was conducted to eliminate confounding VOCs that are due to the different substrates rather than to cell metabolism. The results are shown in Table 12. The same analysis was also conducted on the VOCs released by the other cancer cell line (MCF7) and the normal lung cell line (WI38VA13). The compounds and their significant differences are tabulated in Table 13.
Among the 32 VOCs detected, 20 are related to the lung cancer cell lines. Of these, 18 were significantly more abundant in the headspace of the lung cancer samples than in the blank medium (Table 12), and nine of those 18 were absent from the blank samples. This indicates that these nine VOCs have a specific association with lung cancer cell metabolism.
To eliminate the influence of culture media VOCs on the lung cancer VOCs, the VOCs found exclusively in the blank medium (statistically not significant) were removed from the further analysis aimed at studying the properties of the cancer cell lines. Furthermore, aromatic compounds such as styrene, dimethyl silanediol, benzene and ethylbenzene are more likely linked to contaminants [19, 50, 70, 71], so these compounds were also eliminated from further analysis.
Overall, 11 VOCs were identified as statistically significant in the previous analysis and were carried forward for discrimination against the normal lung cell and breast cancer cell lines. The abundance of each VOC related to lung cancer cells was compared with the normal lung cell and breast cancer samples and is tabulated in Table 13.
As seen in Table 13, four VOCs, namely dodecane, decanal, 2-ethyldodecanol and heneicosane, are specific to lung cancer cells; they are absent from the control samples. The VOCs whose abundance decreases significantly in the lung cancer cells are propylbenzene, nonanal, 3,4-dimethylheptane, 2,4-dimethylundecane and 2-ethylhexanol. Decane was observed to increase significantly in the cancer-related cell samples compared with the normal lung cell line, indicating that this compound is more related to cancerous volatiles. These results indicate that the headspace of lung cancer cell lines is characterized by a specific VOC signature.
VOC analysis offers a promising alternative approach to cancer diagnosis in the medical field. However, to date the clinical use of VOC analysis is still limited by the lack of validation of cancer-related metabolites and of the sensing performance of VOC sensors. In this work, the VOCs emitted by two different lung cancer cell lines and by the control cell lines, a breast cancer cell line and a normal lung cell line, were analyzed using commercial CP gas sensors (Cyranose 320) and GCMS-SPME. This work highlights the potential of these analysis techniques to provide meaningful information for the clinical diagnosis of lung cancer. The Cyranose 320 e-nose was used to analyze the headspace of conditioned cultured cell lines (in-vitro) under proliferative conditions for 3 days in order to discriminate the VOC patterns released into the headspace of the cell lines during the normal and proliferation stages. The e-nose analysis showed that the cancer cell lines can be classified with high accuracy from their VOC patterns even at the early stage of cell proliferation (24th hour of incubation).
The ability of the Cyranose 320 to discriminate the VOCs of the cell samples with high accuracy even at the 24th hour of incubation motivated the GCMS-SPME analysis, which allows identification of the specific VOCs associated with cancer cell growth. This was achieved by comparing the VOCs from lung and breast cancer cells with those of the blank media. Comparison of the chromatograms indicated significant differences between the cell culture samples for several compounds. In total, four specific VOCs were identified as lung cancer-related volatiles, namely heneicosane, dodecane, 2-ethyldodecanol and decanal.
The GCMS results also show that heneicosane, a higher alkane, was found in both lung cancer cell lines, A549 and Calu-3, at levels statistically significantly different from the control samples. This indicates that heneicosane has high potential as a lung cancer-related biomarker. Some studies have proposed heneicosane as a candidate biomarker in the breath of lung cancer patients [28, 72, 73]. However, the origin of heneicosane in lung cancer cells remains unclear.
Another higher alkane, dodecane, was observed to increase significantly in Calu-3 during the incubation period. A few studies on lung cancer biomarkers have suggested that n-dodecane is associated with lung cancer in adenocarcinoma tissues and in patients’ breath, especially the breath of patients with EGFR-mutated adenocarcinoma. Dodecane has also been found to be related to breast cancer.
Among the detected VOCs, one specific compound, decane, also a higher alkane, was observed to be emitted by all three cancer cell lines. Similar results were obtained by Yishan W. and B. G. Hyun, who found decane in the lung cancer tissue of patients [29, 72]. Another study by Chen X., using different lung cancer cells, also found decane to be one of 11 compounds with higher concentrations than in normal cells. Decane is also considered a lung cancer biomarker in patients’ breath [77, 78], and a significant difference was found in the concentration of decane in patients’ breath before and after surgery. Still, the origin of decane in breast cancer cells has not been reported in any previous study.
According to a study by Meggie H. (2010), representative hydrocarbons were reported as potential biomarkers of lung cancer, and it was suggested that these compounds are probably the outcome of oxidative stress. Alkanes are mostly produced by lipid peroxidation driven by reactive oxygen species (ROS), which is supported by a few studies stating that alkanes and methylated alkanes are found in lung cancer [50, 70, 71, 80] and breast cancer [31, 34, 81].
Decanal was a specific VOC released by the A549 cell line that distinguished it from the other cell lines and the blank medium. A study in 2011 reported that decanal was used as a biomarker to detect non-small cell lung cancer using an electronic nose with 95% sensitivity and 70% specificity. Decanal was also one of the primary contributors used to separate non-small cell lung cancer from small cell lung cancer, with 100% sensitivity and 75% specificity, in a 2012 study by Barash O. In contrast, only one specific VOC, 2-ethyldodecanol, was emitted by Calu-3.
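The sensitivity and specificity figures quoted above follow from the standard confusion-matrix definitions. The sketch below only illustrates those definitions; the counts are hypothetical values chosen so that the arithmetic reproduces 95% and 70%, and they are not data from the cited studies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cancer samples that are correctly flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-cancer samples that are correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts (assumption, for illustration only).
tp, fn = 19, 1   # 20 cancer samples, 19 detected
tn, fp = 14, 6   # 20 control samples, 14 correctly rejected

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A high sensitivity with a lower specificity, as in this illustration, means few cancers are missed but more controls are flagged falsely.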
The most prominent VOCs emitted by MCF7 cells in this study were 3,4-dimethylheptane, hexadecane and 2-phenyl-2-butanone. This finding is in line with one study which found hexadecane in the breath of a breast cancer patient. However, no previously published study on volatiles from breast cancer has reported 3,4-dimethylheptane or 2-phenyl-2-butanone. The normal cell line WI38VA13 emitted four different VOCs: amphetamine, xylene, 2,4-dimethylundecane and heptadecane. To date, 2-ethyldodecanal, 3,4-dimethylheptane, 2-phenyl-2-butanone, amphetamine, xylene, 2,4-dimethylundecane and heptadecane have not been reported as biomarkers in any in-vitro study. Thus, the significance of these compounds remains unclear. In addition, the measurement time for VOC collection differed from that of previous studies: here the VOCs were collected after 24 h of cell growth, to ensure that the compounds were collected at the proliferation stage.
Nonanal and 2-ethylhexanol levels from WI38VA13 cells were found to be significantly higher than those from A549 and Calu-3. In contrast to the results observed in this study, nonanal has been reported as a significant marker [83, 84] and has been used to separate adenocarcinoma from squamous cell carcinoma. As for 2-ethylhexanol, the results here correspond to previous studies on lung cancer detection, in which it was never found to be a biomarker. This indicates that these compounds might have a specific association with cell metabolism. The WI38VA13 cells also share aromatic compounds with DMEM, which might be the reason for the overlap of the DMEM group with the WI38VA13 group in the PCA and LDA analyses shown in Figs. 5 and 6a.
In summary, VOCs that are present in lung cancer cell lines but not in the control samples, and those present at higher concentrations in the former, may be considered possible biomarkers, as shown in Table 14. Decanal, dodecane, 2-ethyldodecanal and heneicosane may potentially be used to discriminate lung cancer cells from other types of cancer or from normal cell lines. Decane, on the other hand, can potentially be used as a biomarker for cancer in general, since it was emitted by all three cancer cell lines. These findings suggest that the identified VOCs can offer more information on in-vitro cultured cell line metabolism and aid the detection of lung cancer using electronic nose technology. To reduce the possibility of false positive results, it is crucial to create libraries of biomarkers for each type of cancer cell and normal cell. This can be achieved by performing various chemometric or multivariate analyses to validate the biomarkers of the cancerous and normal cells of interest.
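The selection logic summarised above (compounds seen in lung cancer cell lines but not in controls) can be sketched with simple set operations. This is a simplified presence/absence illustration transcribed from the compounds named in this discussion. It ignores concentration differences and statistical testing, so it is not the analysis actually performed in the study, and the variable names are hypothetical.

```python
# VOCs named in the discussion for each cell line (presence/absence only).
lung_cancer = {
    "A549": {"decane", "heneicosane", "decanal"},
    "Calu-3": {"decane", "heneicosane", "dodecane", "2-ethyldodecanal"},
}
controls = {
    "MCF7": {"decane", "3,4-dimethylheptane", "hexadecane", "2-phenyl-2-butanone"},
    "WI38VA13": {"amphetamine", "xylene", "2,4-dimethylundecane", "heptadecane"},
}

# Every compound observed in any control sample.
control_vocs = set().union(*controls.values())

# Candidate lung cancer markers: present in a lung line, absent from all controls.
lung_specific = {line: vocs - control_vocs for line, vocs in lung_cancer.items()}

# Candidate general cancer markers: shared by all cancer lines, absent from the normal line.
cancer_lines = list(lung_cancer.values()) + [controls["MCF7"]]
cancer_general = set.intersection(*cancer_lines) - controls["WI38VA13"]

for line, vocs in lung_specific.items():
    print(line, sorted(vocs))
print("general cancer candidates:", sorted(cancer_general))
```

Under these assumptions the lung-specific candidates are decanal, dodecane, 2-ethyldodecanal and heneicosane, and decane is the only candidate shared by all three cancer lines, which matches the summary in the text.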
This study presents the possibility of using VOCs as biomarkers for cancer cells. Certain VOCs were found to be specific to cancer cells compared with the normal samples. The headspace of in-vitro cultured cell lines was analyzed using a Cyranose320 e-nose, which consists of an array of sensors, and GCMS coupled with SPME. Several classifiers, namely LDA, NB, KNN, PNN and OVA-SVM, were used to validate the ability of the e-nose to discriminate the cancer cells from the normal samples and blank media. The investigation was carried out to identify cell line VOCs at three different proliferation stages under normal laboratory conditions.
The results of this study show that the Cyranose320 was able to discriminate the VOCs released by the various cancer and healthy cells as well as the blank media. The classifiers tested achieved high levels of accuracy. The LDA-based OVA-SVM recorded the best performance with 100% successful classification, even at the early stage of cell growth (24th hour of incubation), and maintained this performance at the 48th and 72nd hours.
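For readers unfamiliar with the one-versus-all (OVA) scheme, the decision rule can be sketched as follows: one binary scorer is kept per class, and a sample is assigned to the class whose scorer returns the largest decision value. The linear weights and the two-dimensional feature vectors below are hypothetical stand-ins for trained SVMs operating on LDA-projected sensor features. This is an illustration of the decision rule only, not the classifier trained in this study.

```python
from typing import Dict, List, Tuple

# Each class has a linear scorer (weights, bias): "this class" versus "all the rest".
LinearModel = Tuple[List[float], float]


def decision_value(model: LinearModel, x: List[float]) -> float:
    """Signed score w.x + b; larger means more confident the sample is in this class."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def ova_predict(models: Dict[str, LinearModel], x: List[float]) -> str:
    """One-versus-all rule: pick the class with the highest decision value."""
    scores = {label: decision_value(m, x) for label, m in models.items()}
    return max(scores, key=scores.get)


# Hypothetical scorers over two LDA-like features (assumption, not fitted values).
models = {
    "lung cancer": ([1.0, 0.0], -0.5),
    "breast cancer": ([0.0, 1.0], -0.5),
    "normal lung": ([-1.0, -1.0], 0.5),
}

print(ova_predict(models, [0.9, 0.1]))
print(ova_predict(models, [0.1, 0.1]))
```

With k classes this scheme needs only k binary scorers, whereas a one-versus-one scheme needs k(k-1)/2.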
The VOC patterns collected from the e-nose were validated by GCMS-SPME. The results show that particular cell lines produced specific VOCs. This study provides a list of possible VOCs which are believed to be specific biomarkers for lung cancer, even at the 24th hour of cell growth. The list of potential VOCs obtained from this study was compared with previous studies, as shown in Table 14. The study also concludes that the e-nose in conjunction with GCMS-SPME can serve as a non-invasive screening tool at an early stage. This combination is particularly useful for helping the clinician interpret any overlapping groups in the e-nose results.
In addition, this study shows that existing tools such as GCMS-SPME and the e-nose-based gas sensor array system have the potential to improve cancer VOC detection systems through optimized sensor selection. Sensors with higher selectivity and sensitivity are essential for capturing the specific biomarkers. Therefore, further studies on optimizing the sensor system and on in-vivo samples (e.g. breath samples) are underway, with the ultimate goal of developing a complementary tool for clinical testing.
Lung cancer cell line
Artificial neural networks
Lung cancer cell line
Dulbecco’s modification of Eagle medium
Eagle’s minimum essential medium
Gas chromatography mass spectrometry
Human airway smooth muscle
Immortal bronchial epithelium
Linear discriminant analysis
Low dose computed tomography
Matthews correlation coefficient
Breast cancer cell line
Normal human diploid fibroblast
Non-small cell lung cancer
One versus all
One versus one
Principal component analysis
Probabilistic neural network
Maximum value of response
Root mean square roughness
Steady state of response
Small cell lung cancer
Savitzky-Golay smoothing filter
Solid phase micro extraction
Support vector machine
Total ion chromatogram
Volatile organic compounds
Normal lung cell line
American Cancer Society. Cancer Facts & Figures 2017. Atlanta: American Cancer Society; 2017.
Types of Cancer. Natl. Cancer Soc. Malaysia. 2016. Retrieved from “www.cancer.org.my/national-cancer-society-malaysia-and-ibm-team-up-to-use-data-to-combat-cancer/” at 23rd March 2016
Hirsch FR, Franklin WA, Af G, PAJ B. Early detection of lung cancer: clinical perspectives of recent advances in biology and radiology. Clin Cancer Res. 2001;7:5–22.
Peng G, Hakim M, Broza YY, Billan S, Abdah-Bortnyak R, Kuten A, et al. Detection of lung, breast, colorectal, and prostate cancers from exhaled breath using a single array of nanosensors. Br. J. Cancer. 2010;103:542–51.
Fossella FV, Komaki MR, Putnam MJB M Jr. Lung cancer. Texas: Springer-Verlag New York; 2002.
The National Lung Screening Trial Research Team. Reduced lung cancer mortality with low-dose computed tomographic screening. N. Engl.J.Med. 2011;365:395–409.
Cutler DM. Are we finally winning the war on cancer? J Econ Perspect. 2008;22:3–26.
Barash O, Peled N, Hirsch FR, Haick H. Sniffing the unique “odor print” of non-small-cell lung cancer with gold nanoparticles. Small. 2009;5:2618–24.
Mazzone PJ. Exhaled breath volatile organic compound biomarkers in lung cancer. J. Breath Res. 2012;6:027106.
Amann A, Spanel P, Smith D. Breath analysis: the approach towards clinical applications. Mini Rev Med Chem. 2007;7:115–29.
Mazzone PJ. Analysis of volatile organic compounds in the exhaled breath for the diagnosis of lung cancer. J Thorac Oncol. 2008;3:774–80.
Singer SJ, Nicolson GL. The fluid mosaic model of the structure of cell membranes. Science. 1972;175:720–31.
Alberts B, Johnson A, Lewis J. Molecular biology of the cell. New York: Garland Science; 2002.
Bajaj A, Miranda OR, Kim I-B, Phillips RL, Jerry DJ, Bunz UHF, et al. Detection and differentiation of normal, cancerous, and metastatic cells using nanoparticle-polymer sensor arrays. Proc. Natl. Acad. Sci. U. S. A. 2009;106:10912–6.
Montuschi P, Barnes PJ. Analysis of exhaled breath condensate for monitoring airway inflammation. Trends Pharmacol. Sci. 2002;23:232–7.
Bajtarevic A, Ager C, Pienz M, Klieber M, Schwarz K, Ligor M, et al. Noninvasive detection of lung cancer by analysis of exhaled breath. BMC Cancer. 2009;9:348.
Horváth I, Lázár Z, Gyulai N, Kollai M, Losonczy G. Exhaled biomarkers in lung cancer. Eur. Respir. J. 2009;34:261–75.
Gordon SM, Szidon JP, Krotoszynski BK, Gibbons RD, O'Neill HJ. Volatile organic compounds in exhaled air from patients with lung cancer. Clin. Chem. 1985;31:1278–82.
Lavra L, Catini A, Ulivieri A, Capuano R, Salehi LB, Sciacchitano S, et al. Investigation of VOCs associated with different characteristics of breast cancer cells. Sci. Rep. 2015;5:1–12.
Boots AW, Bos LD, van der Schee MP, van Schooten FJ, Sterk PJ. Exhaled molecular fingerprinting in diagnosis and monitoring: validating volatile promises. Trends Mol. Med. 2015;21:633–44.
Smith D, Wang T, Sulé-Suso J, Španěl P, El Haj A. Quantification of acetaldehyde released by lung cancer cells in vitro using selected ion flow tube mass spectrometry. Rapid Commun. Mass Spectrom. 2003;17:845–50.
Filipiak W, Sponring A, Mikoviny T, Ager C, Schubert J, Miekisch W, et al. Release of volatile organic compounds (VOCs) from the lung cancer cell line CALU-1 in vitro. Cancer Cell Int. 2008;8:17.
Sponring A, Filipiak W, Mikoviny T, Ager C, Schubert J, Miekisch W, et al. Release of volatile organic compounds from the lung cancer cell line NCI-H2087 in vitro. Anticancer Res. 2009;29:419–26.
Filipiak W, Sponring A, Filipiak A, Ager C, Schubert J, Miekisch W, et al. TD-GC-MS analysis of volatile metabolites of human lung cancer and normal cells in vitro. Cancer Epidemiol Biomarkers Prev. 2010;19:182–95.
Baranska A, Smolinska A, Boots AW, Dallinga JW, van Schooten FJ. Dynamic collection and analysis of volatile organic compounds from the headspace of cell cultures. J. Breath Res. 2015;9:047102.
Sponring A, Filipiak W, Ager C, Schubert J. Analysis of volatile organic compounds (VOCs) in the headspace of NCI-H1666 lung cancer cells. Cancer Biomarkers. 2010;7:3233.
Hanai Y, Shimono K, Oka H, Baba Y, Yamazaki K, Beauchamp GK. Analysis of volatile organic compounds released from human lung cancer cells and from the urine of tumor-bearing mice. Cancer Cell Int. 2012;12:7.
Yu J, Wang D, Wang L, Wang P, Hu Y, Ying K. Detection of lung cancer with volatile organic biomarkers in exhaled breath and lung cancer cells. AIP Conf. Proc. 2009:198–201.
Yishan W, Hub Y, Wanga D, Kai Y, Ling W, Yingchang Z, et al. The analysis of volatile organic compounds biomarkers for lung cancer in exhaled breath, tissues and cell lines. Cancer Biomarkers. 2012;11:129–0270.
Nozoe T, Goda S, Selyanchyn R, Wang T, Nakazawa K, Hirano T, et al. In vitro detection of small molecule metabolites excreted from cancer cells using a Tenax TA thin-film microextraction device. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 2015;991:99–107.
Wang C, Sun B, Guo L, Wang X, Ke C, Liu S, et al. Volatile organic metabolites identify patients with breast cancer, cyclomastopathy, and mammary gland fibroma. Sci. Rep. 2014;4:5383.
Calenic B, Filipiak W, Greabu M, Amann A. Volatile organic compounds expression in different cell types: an in vitro approach. Int. J. Clin. Toxicol. 2013;1:43–51.
Barash O, Peled N, Tisch U, Bunn PA, Hirsch FR, Haick H. Classification of lung cancer histology by gold nanoparticle sensors. Nanomedicine. Elsevier. 2012;8:580–9.
Gendron KB, Hockstein NG, Thaler ER, Vachani A, Hanson CW. In vitro discrimination of tumor cell lines with an electronic nose. Otolaryngol. Head. Neck Surg. 2007;137:269–73.
Broza YY, Haick H. Nanomaterial-based sensors for detection of disease by volatile organic compounds. Nanomedicine. 2013;8:785–806.
Marzluf BA, Krajc T, Mueller MR. Principles of lung cancer screening – exhaled breath analysis. Hamdan med. J. 2016;2016(9):17–38.
Bassey E, Whalley J, Sallis P. An evaluation of smoothing filters for gas sensor signal cleaning. Fourth. Int. Conf. Adv. Commun. Comput. 2014:19–23.
Pearce T, Schiffman S, Nagle H, Gardner J. Handbook of machine olfaction: electronic nose technology. Weinheim: Wiley-VCH; 2003.
Thriumani R, Jeffree AI, Zakaria A, Hasyim YZH-Y, Helmy KM, Omar MI, et al. A preliminary study on detection of lung cancer cells based on volatile organic compounds sensing using electronic nose. J. Teknol. 2015;77:67–71.
Thriumani R, Zakaria A, Jeffree AI, Hishamuddin NA, Omar MI, Adom AH, et al. A preliminary study on in-vitro lung cancer detection using E-nose technology. 2014. IEEE Int. Conf. Control Syst. Comput. Eng. 2014:601–5.
Thriumani R, Zakaria A, Jeffree AI, Hishamuddin NA, Omar MI, Adom AH, et al. Cancer detection using an electronic nose: a preliminary study on detection and discrimination of cancerous cells. Miri, Sarawak: IEEE Conf. Biomed. Eng. Sci; 2014. p. 752–6.
Dutta R, Hines EL, Gardner JW, Boilot P. Bacteria classification using Cyranose 320 electronic nose. Biomed. Eng. 2002;1:4.
Distante C, Siciliano PC, Persaud K. Dynamic cluster recognition with multiple self-Organising maps. Pattern Anal. Appl. 2002;5:306–15.
Scott SM, James D, Ali Z. Data analysis for electronic nose systems. Microchim. Acta. 2006;156:183–207.
Wei Z, Jin L, Jin Y. Independent component analysis. Statistics (Ber). New York: John Wiley & Sons; 2005. p. 504.
Stone JV. Independent component analysis: an introduction. Trends Cogn Sci. 2002;6:59.
Lu H, Plataniotis KN, Anastasios V. Multilinear Subspace Learning: Dimensionality Reduction of Multidimensional Data. illustrate. Herbrich R, Graepel T, editors. Boca Raton: CRC Press; 2013.
Xu Y, Lu G. Analysis on fisher discriminant criterion and linear separability of feature space. Int. Conf. Comput. Intell. Secur. ICCIAS. 2007;2006:1671–6.
Jin Z, Yang JY, Hu ZS, Lou Z. Face recognition based on the uncorrelated discriminant transformation. Pattern Recognit. 2001;34:1405–16.
Phillips M, Altorki N, Austin JHM, Cameron RB, Cataneo RN, Greenberg J, et al. Prediction of lung cancer using volatile biomarkers in breath. Cancer Biomark. 2007;3:95–109.
Li T, Zhu S, Ogihara M. Using discriminant analysis for multi-class classification: an experimental investigation. Knowl. Inf. Syst. 2006;10:453–72.
Mishra M, Jena AR, Das R. A probabilistic neural network approach for classification of vehicle. Int. J. Appl. or Innov. Eng. Manag. 2013;2:367–71.
Bhattacharyya N, Jana A. Incremental PNN Classifier for a Versatile Electronic Nose. 3rd Int. Conf. Sens. Technol; 2008. p. 242–7.
Antony R, Nandagopal MSG, Rangabhashiyam S, Selvaraju N. Probabilistic neural network prediction of liquid-liquid two phase flows in a circular microchannel. J. Sci. Ind. Res. 2014;73:525–9.
Al-Aidaroos K, Bakar A, Othman Z. Medical data classification with naive Bayes approach. Inf. Technol. J. 2012;11:1166–74.
Ashari A, Paryudi I, Tjoa AM. Performance comparison between Naïve Bayes, decision tree and k-nearest neighbor in searching alternative Design in an Energy Simulation Tool. Int. J. Adv. Comput. Sci. Appl. 2013;4:33–9.
Patil MRR. Heart disease prediction system using naive Bayes and Jelinek-mercer smoothing. Int. J. Adv. Res. Comput Commun. Eng. 2014;3:6787–9.
Seo N. A comparison of multi-class support vector machine methods for face recognition. 2007.
Naveen T. Word recognition in Indic scripts. International Institute of Information Technology; 2014.
Mishra A, Sankaran N, Ranjan V, Jawahar CV. Automatic localization and correction of line segmentation errors. Proceeding work. Doc. Anal. Recognit. - DAR ’12; 2012. p. 1–8.
Shao X, Li H, Wang N, Zhang Q. Comparison of different classification methods for analyzing electronic nose data to characterize sesame oils and blends. Sensors. 2015;15:26726–42.
Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. Proposing a highly accurate protein structural class predictor using segmentation-based features. BMC Genomics. 2014;15:1–13.
Zhu W, Zeng N, Wang N. Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS® implementations. Northeast SAS users gr. 2010 heal. Care. Life Sci. 2010:1–9.
Zhang Y, Gao G, Liu H, Fu H, Fan J, Wang K, et al. Identification of volatile biomarkers of gastric cancer cells and ultrasensitive electrochemical detection based on sensing interface of au-ag alloy coated MWCNTs. Theranostics. 2014;4:154–62.
Jurman G, Riccadonna S, Furlanello C. A comparison of MCC and CEN error measures in multi-class prediction. PLoS One. 2012;7:1–8.
Schmidt K, Podmore I. Solid phase microextraction (SPME) method development in analysis of volatile organic compounds (VOCS) as potential biomarkers of cancer. Mol. Biomark. Diagnosis. 2015;6:1–11.
Modaresi F, Araghinejad S. A comparative assessment of support vector machines, probabilistic neural networks, and K-nearest neighbor algorithms for water quality classification. Water Resour. Manag. 2014;28:4095–111.
Huang K, Zhou Z, King I, R Lyu M. Improving naive Bayesian classifier by discriminative training. Taipei, Taiwan: Proc. Int. Conf. Neural Inf. Process. (ICONIP 05); 2005.
Smith D, Wang T, Sulé-Suso J, Španěl P, El Haj A. Quantification of acetaldehyde released by lung cancer cells in vitro using selected ion flow tube mass spectrometry. Rapid Commun. Mass Spectrom. 2003;17:845–50.
Silva CL, Passos M, Câmara JS. Investigation of urinary volatile organic metabolites as potential cancer biomarkers by solid-phase microextraction in combination with gas chromatography-mass spectrometry. Br. J. Cancer. 2011;105:1894–904.
D’Amico A, Pennazza G, Santonico M, Martinelli E, Roscioni C, Galluccio G, et al. An investigation on electronic nose diagnosis of lung cancer. Lung Cancer Elsevier. 2010;68:170–6.
Byun H, Yu J, Huh J, Lim J. Exhaled breath analysis system based on electronic nose techniques applicable to lung diseases. Hanyang Med. Rev. 2014;34:125–9.
Yu J-B, Lim J-O, Byun H-G, Huh J-S. Exhaled breath analysis of lung cancer patients using metal oxide sensor. Journal of Sensor Science and Technology. 2011;20:281–4.
Handa H, Usuba A, Maddula S, Baumbach JI, Mineshita M, Miyazawa T. Exhaled breath analysis for lung cancer detection using ion mobility spectrometry. PLoS One. 2014;9:1–13.
Phillips M, Cataneo R, Saunders C, Hope P, Schmitt P, Wai J. Volatile biomarkers in the breath of women with breast cancer. J. Breath Res. 2010;4:1–8.
Chen X, Xu F, Wang Y, Pan Y, Lu D, Wang P, et al. A study of the volatile organic compounds exhaled by lung cancer cells in vitro for breath diagnosis. Cancer. 2007;110:835–44.
Phillips M, Herrera J, Krishnan S, Zain M, Greenberg J, Cataneo RN. Variation in volatile organic compounds in the breath of normal humans. J. Chromatogr. B. 1999;729:75–88.
Yu H, Xu L, Wang P. Solid phase microextraction for analysis of alkanes and aromatic hydrocarbons in human breath. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 2005;826:69–74.
Poli D, Carbognani P, Corradi M, Goldoni M, Acampa O, Balbi B, et al. Exhaled volatile organic compounds in patients with non-small cell lung cancer: cross sectional and nested short-term follow-up study. Respir. Res. 2005;6:71.
Meggie H, Yoav YB, Orna B, Nir P, Michael P, Anton A, et al. Volatile organic compounds of lung cancer and possible biochemical pathways. Chem. Rev. 2012;112:5949–66.
Phillips M, Cataneo R, Ditkoff B, Fisher P, Greenberg J, Gunawardena R, Kwon C, et al. Volatile markers of breast cancer in the breath. Breast J. 2003;9:184–91.
Zheng Z, Lin X. Study on application of medical diagnosis by electronic nose. World Sci. Technol. World Science and Technology Press. 2012;14:2115–9.
Mochalski P, Theurl M, Sponring A. Analysis of volatile organic compounds liberated and metabolised by human umbilical vein endothelial cells (HUVEC) in vitro. Cell Biochem Biophys. 2015;71:323–9.
Fuchs P, Loeseken C, Schubert JK, Miekisch W. Breath gas aldehydes as biomarkers of lung cancer. Int. J. Cancer. 2010;126:2663–70.
Nurlisa Y, Ammar Z, Mohammad IO, Shakaff AYM, Masnan MJ, Kamarudin LM, et al. In-vitro diagnosis of single and poly microbial species targeted for diabetic foot infection using e-nose technology. J. Med. Imaging Heal. Informatics. 2015;5:1251–4.
Filipiak W, Filipiak A, Sponring A, Schmid T, Zelger B, Ager C, et al. Comparative analyses of volatile organic compounds (VOCs) from patients, tumors and transformed cell lines for the validation of lung cancer-derived breath markers. J. Breath Res. 2014;8:1–13.
The authors would like to thank the School of Chemical Engineering and Analytical Science, University of Manchester, United Kingdom, for its support and funding for this project. The authors would also like to thank Hospital Tuanku Fauziah (HTF), Kangar, Perlis, for its collaboration in this project. Special thanks go to the Cell and Tissue Engineering Lab, Kulliyyah of Engineering, International Islamic University Malaysia, for its guidance in cell culture technique and funding for the project.
This work is supported by the Centre of Excellence for Advanced Sensor Technology Research Board. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The datasets collected and analyzed during this study are not publicly available because further analysis is still in progress. However, the datasets are available from the corresponding author on reasonable request.
The authors declare that they have no competing interests.
Ethics approval and consent to participate
Not applicable, as there was no research involving human or animal subjects in the study. The cell lines used in this study were purchased from the American Type Culture Collection (ATCC).
Consent for publication
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.