Blood test shows high accuracy in detecting stage I non-small cell lung cancer

Background: In a previous study (Goebel et al., Cancer Genomics Proteomics 16:229-244, 2019), we identified 33 biomarkers for an early stage (I-II) non-small cell lung cancer (NSCLC) test with 90% accuracy, 80.3% sensitivity, and 95.4% specificity. In the current study, we used a narrowed set of 21 biomarkers while retaining similar accuracy in detecting early stage lung cancer.
Methods: A multiplex platform, 486 human plasma samples, and 21 biomarkers were used to develop and validate an algorithm that detects early stage NSCLC. The training set consisted of 258 human plasma samples, including 79 Stage I-II NSCLC samples. The 21 biomarkers and the statistical model (Lung Cancer Detector Test 1, LCDT1) were then validated on 228 novel samples, including 55 Stage I NSCLC samples.
Results: The LCDT1 exhibited 95.6% accuracy, 89.1% sensitivity, and 97.7% specificity in detecting Stage I NSCLC in the blind set. When the other cancer types were excluded from the analysis, specificity increased to 99.1%.
Conclusions: Compared with currently approved clinical methods for diagnosing NSCLC, the LCDT1 greatly improves accuracy while being non-invasive; a simple, cost-effective, early diagnostic blood test should expand access and increase survival rates.


CT scans recommended for diagnosing LC
The US Preventive Services Task Force (USPSTF) recommends low-dose computed tomography (LDCT) as a screening method for LC in high-risk adults aged 55-80 years with a 30 pack-year smoking history who currently smoke or have quit within the past 15 years. The recommendation was based in part on the National Lung Screening Trial (NLST), which demonstrated that screening with LDCT reduces LC mortality by 20% compared with chest x-rays [3,4]. However, this approach is not ideal.
In the NLST, a PN was detected in 1 of every 4 subjects who underwent LDCT. Of the 7191 subjects found to have suspicious nodules on LDCT, 88.6% had a follow-up test (including imaging, 89.8%; biopsy, 1.9%; and surgery, 4.7%), and only 292 (4.1%) were confirmed to have LC. Of these 292 cases, 54.1% and 41.1% were Stage I and Stage II, respectively. The LDCT scans had a false positive rate (FPR) of 96.1% [4]. Clearly, a test with a lower FPR is needed. Deep learning algorithms show promise for reducing false positives when interpreting these images [5].

PET scans increasing in use for LC follow-ups
PET scans have better diagnostic performance than LDCT [6]. A multicenter observational study of PN management by Tanner et al. [7] showed increasing use of PET scans in the follow-up of patients with indeterminate lung nodules. PET scans have an accuracy of 74%, with an overall FPR of 39% (36-55%) and an overall false negative rate (FNR) of 9% (8-10%), depending on nodule size. The study found that 25% of PNs referred to a pulmonologist were malignant; 46% of patients underwent additional surveillance, 33.2% a biopsy, and 20.4% lung surgery. About 35% of patients who had surgery had benign masses.

Pulmonary nodule guidelines
Most solitary PNs are detected incidentally by chest radiography and CT scans that were ordered to investigate other diseases. Approximately 150,000 solitary PNs are detected annually in the United States of America [8].
Recommendations for managing indeterminate PNs found on PET/CT, such as the Lung-RADS [7,9] or Fleischner criteria [10], are not always followed. Many physicians consider other factors, such as age, smoking status, gender, and patient preference, and draw on their own experience when deciding on follow-up procedures for a patient's specific clinical situation. In a multicenter observational study of 377 patients, Tanner et al. found that invasive procedures were performed for 44% of low-risk nodules (< 5% probability of malignancy) [7]. Current guidelines for the management of lung nodules try to incorporate other factors that may be unique to a patient [9,10]. Prospective research on physician adherence to the new guidelines and on the outcomes of PN follow-up procedures is still needed.

Evaluating biomarkers to detect LC
There is a growing trend to use genetic and protein biomarkers for disease diagnosis, prognosis, and the evaluation of treatment efficacy (e.g., Grail, Guardant, Myriad Genetics) [11,12]. Biomarkers are defined as 'any substance, structure, or process that can be measured in the body or its products and influence or predict outcome or disease' [13]. Thus, a biomarker can be of physical, chemical, or biological nature, such as measurements of blood pressure, temperature, inflammatory cytokines (proteins), genetic (DNA) markers, or metabolites [14]. In this paper, we will limit our discussion to DNA and protein biomarkers.

DNA biomarkers
DNA biomarkers have been used to assess the risk of developing specific diseases or the response to therapy. DNA provides genetic information about the individual; nonetheless, the path from DNA to an observable physical trait (e.g., disease) is complex. For instance, somatic mutations in the TP53, EGFR, and KRAS genes are commonly found in LC patients [15], yet somatic mutations often arise from exposure to carcinogens (e.g., smoking, radon), environmental factors (e.g., pollution, secondhand smoke), age, and health history (e.g., COPD). Inherited mutations following an autosomal dominant pattern place an individual at high risk but do not always lead to LC. The pattern of inheritance, penetrance, and expressivity of genetic mutations, together with lifestyle, environmental factors, and even ethnicity, are important components in assessing cancer risk [16].

Protein biomarkers
In contrast, protein levels reflect phenotype: the observable end-trait (e.g., tissue) resulting from the interaction of the genome and the environment [17]. Protein biomarkers provide quantitative data that can be compared between healthy and diseased individuals. Proteomics has its own challenges, however. Proteins, like genes, are pleiotropic, meaning that the same protein markers may contribute to different immune-related pathways in different diseases. For example, IL-8 is a pleiotropic cytokine that has been linked to breast, prostate, lung, colorectal, and skin cancer [18]. Hence, a single biomarker, whether protein or DNA, would not be sufficient for clinical diagnostic use.
Protein levels can fluctuate with physiological stressors (e.g., disease, strenuous exercise), and samples (e.g., serum, plasma) are sensitive to environmental factors (e.g., pH, temperature) and degrade faster than DNA.
Moreover, analytic protein platforms require antibodies, which exhibit lot-to-lot variation due to their idiosyncratic nature.
Despite these intricacies, genomic and protein biomarkers have proven to be essential tools in the discovery of predictive, prognostic, and diagnostic markers in LC [19][20][21].

Machine learning in medicine
Advances in computing, combined with an increase in the amount of data collected, have enabled the application of various machine learning techniques, such as Neural Networks and Random Forests, to tease out complex and non-linear relationships in data. These methods can also help radiologists interpret x-rays, CT scans, PET scans, and other diagnostic images, help diagnose disease, and may lead to a general improvement in patient care [22].
While machine learning methods are powerful, they have drawbacks. No machine learning method can compensate for poor (i.e., dirty) data. Nor can machine learning provide causal information on its own; these methods are simply a set of advanced statistical techniques that can improve our ability to find complex, non-linear relationships in data [23,24].
Further, statistical models can be affected by bias, human error, the sample population, poor technical design, misapplication, and disparate systems. It is important that appropriate machine learning techniques and algorithms are applied to each study, that the data are collected, cleaned, and processed in a consistent manner, and that potential sources of bias are scrutinized from all angles [25].
Our preliminary studies identified protein biomarkers that may significantly improve our ability to identify NSCLC, so this study was undertaken to prospectively test that hypothesis.

Methods
This study is a continuation of our previous research that used 33 biomarkers [11]. Here we reduced the number of biomarkers to 21, verified that the reagents transferred successfully, and retrained our algorithm.

Study population
This study was performed on biobank plasma samples from 486 subjects distributed into 5 cohorts (Table 1).
In previous studies, we demonstrated that our method detected early to late stage NSCLC. In this study, our focus was to detect Stage I-II NSCLC. Therefore, samples from patients with Stage I-II NSCLC (Table 2) were used to train the LCDT1 algorithm, and only Stage I NSCLC samples (Table 2) were subsequently used in a blind set to validate clinical efficacy.

Sample collection and handling
Human plasma samples were obtained from five blood banks: Asterand, BioReclamation, BioSource, Geneticist, and Proteogenex. All cancer samples were confirmed by histology. All samples were collected through an IRB approved protocol (e.g., Protocol #AST-FPB-003, Western IRB) or a signed Waiver of Consent form. Individuals under the age of 18 or those who cannot consent for themselves were not included in the study. Samples were collected in the United States between 2013 and 2015.
Clinical information such as age, gender, pathology and stage, race, origin, smoking status, and sample collection dates was obtained. Whole blood samples were collected in EDTA tubes and stored at −80 °C according to each biobank's protocol. Plasma samples were transported on dry ice overnight to our sample storage site in Michigan City, Indiana, USA. Vials were inspected visually for damage upon receipt and stored at −80 °C until analysis.
The non-smoker and NSCLC cohorts served as negative and positive controls for lung cancer, respectively. Asthma and COPD cohorts were included to test whether the diagnostic test can differentiate lung cancer from other respiratory diseases with similar symptoms. The smoker cohort consisted of a population at high risk for LC who had not been diagnosed with any cancer. Other cancers (breast, colon-rectal, pancreatic, and prostate cancer, all stages, from smokers or non-smokers) were included to ensure that the diagnostic test was specific to NSCLC.

Multiplexed immunoassay procedure
This study used a custom multiplexed immunoassay to measure the concentrations of 21 biomarkers in human plasma samples. Sample collection and handling and the immunoassay procedure were consistent with our previous study (1, Supplementary Figure 1). Sample processing was performed by Eve Technologies Corporation (Calgary, Alberta, Canada). The reagents and format of this assay were validated against the 33-biomarker reagents used in the previous study [11] to ensure that all biomarkers performed similarly and remained consistent with the algorithm.

Algorithm and statistical analysis
The algorithm considers duplicate measurements of the biomarkers from each subject and classifies each measurement as NSCLC or not NSCLC. If any measurement is classified as coming from a subject with NSCLC, the subject is classified as having NSCLC.
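The any-replicate rule described above can be sketched in Python. This is a minimal illustration only: the LCDT1 per-measurement classifier is proprietary, so `classify_subject`, `MeasurementClassifier`, and the threshold rule `toy_rule` are hypothetical names and logic chosen for the example.

```python
from typing import Callable, Sequence

# A per-measurement classifier maps one vector of biomarker concentrations
# to True ("NSCLC") or False ("not NSCLC"). The real LCDT1 model is
# proprietary; any callable with this shape can be plugged in.
MeasurementClassifier = Callable[[Sequence[float]], bool]


def classify_subject(
    replicates: Sequence[Sequence[float]],
    classify_measurement: MeasurementClassifier,
) -> bool:
    """Call a subject NSCLC-positive if ANY replicate measurement is positive."""
    if not replicates:
        raise ValueError("at least one replicate measurement is required")
    return any(classify_measurement(m) for m in replicates)


def toy_rule(measurement: Sequence[float]) -> bool:
    """Hypothetical stand-in classifier: positive if the first marker exceeds 100."""
    return measurement[0] > 100.0


# Duplicate measurements for two hypothetical subjects.
print(classify_subject([[80.0, 5.0], [120.0, 4.0]], toy_rule))  # True: one replicate positive
print(classify_subject([[80.0, 5.0], [90.0, 4.0]], toy_rule))   # False: both negative
```

Because a subject is positive whenever either replicate is, this rule trades specificity for sensitivity: if each replicate independently had a false positive probability p, the subject-level false positive probability would be 1 − (1 − p)², roughly 2p for small p. Replicates from one sample are likely positively correlated, in which case the increase is smaller.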
Because the cost of a false negative (allowing the disease to progress without treatment) is greater than the cost of a false positive, the LCDT1 algorithm errs on the side of predicting that a subject has NSCLC. A five-parameter logistic (5-PL) curve was used to fit the calibration curve. Data were cleaned using preset criteria: a 20% coefficient of variation (CV) cutoff and removal of extrapolated and out-of-range values. The median, rather than the mean, was used to represent the central tendency of plasma concentrations because of skewed distributions and outliers. Normalization of diseased cohorts to healthy cohorts was examined for pattern recognition. P-values were calculated using t-tests and adjusted for multiple comparisons using the Benjamini-Hochberg method [26]. The AUC was calculated for each biomarker and for the combined set of biomarkers, and the ROC curve was used to illustrate the performance of the model. Excel and R version 3.4.4 were used for data analysis.
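The calibration, cleaning, and adjustment steps above can be sketched as follows. This is a minimal illustration, not the study's pipeline (which used Excel and R): the 5-PL parameterization, computing the replicate CV with the sample standard deviation, the standard-range check for extrapolation, the example curve parameters, and all function names are assumptions. The Benjamini-Hochberg function follows the same rule as R's `p.adjust(method = "BH")`.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def inverse_5pl(y: float, a: float, b: float, c: float, d: float, g: float) -> Optional[float]:
    """Back-calculate a concentration from signal y on a 5-PL standard curve.

    Assumed parameterization: y = d + (a - d) / (1 + (x / c) ** b) ** g,
    where a and d are the asymptotic responses at zero and infinite
    concentration, c sets the midpoint, b the slope, and g the asymmetry.
    Returns None if y is not strictly between the asymptotes (out of range).
    """
    if not min(a, d) < y < max(a, d):
        return None
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def clean_replicates(
    signals: Sequence[float],
    curve: dict,
    standard_range: Tuple[float, float],
    max_cv: float = 20.0,
) -> Optional[List[float]]:
    """Convert replicate signals to concentrations, or return None if QC fails.

    Assumed QC rules: drop the set if any signal is out of range of the
    curve, if any concentration is extrapolated beyond the lowest/highest
    standard, or if the replicate CV exceeds max_cv percent.
    """
    concentrations = [inverse_5pl(s, **curve) for s in signals]
    low, high = standard_range
    if any(x is None or not low <= x <= high for x in concentrations):
        return None
    if cv_percent(concentrations) > max_cv:
        return None
    return concentrations


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (same rule as R's p.adjust, method "BH")."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical curve and standard range, for illustration only.
EXAMPLE_CURVE = {"a": 0.0, "b": 1.0, "c": 50.0, "d": 1000.0, "g": 1.0}
EXAMPLE_RANGE = (5.0, 500.0)

print(clean_replicates([500.0, 520.0], EXAMPLE_CURVE, EXAMPLE_RANGE))  # passes QC
print(clean_replicates([500.0, 900.0], EXAMPLE_CURVE, EXAMPLE_RANGE))  # None: CV too high
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The cleaning function returns the individual replicate concentrations rather than their mean, because the algorithm classifies each measurement separately before applying the any-replicate rule.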

Results
Training set for optimizing the LCDT1 algorithm
In this study, we included the 33-biomarker model to examine whether a larger set of biomarkers performs comparably to a smaller subset. Table 3 shows that algorithm performance using 33 versus 21 biomarkers is comparable. The LCDT1 algorithm was developed, with slight modifications, using a smaller subset of the 21 biomarkers; this information is proprietary, and a patent application has been filed. Patterns of up- and down-regulation of biomarkers were similar to those in our previous study [11]. Compared with healthy non-smokers, asthma sufferers, and smokers, the median concentration in LC patients was more than 200% higher for SAA (771%), MMP-9 (743%), IL-8 (535%), CXCL9/MIG (482%), TNFRI (406%), Gro (331%), MPO (300%), Rantes (274%), Resistin (271%), TNFRII (266%), and MIF (219%). IL-2 and IL-7 showed a reduction in signal of more than 50% (Table 5).

Validation set performance
A novel blinded set of 228 subjects (N = 456 measurements) was processed in duplicate using the LCDT1. Of the 228 subjects, 55 were Stage I NSCLC samples (Table 2). Our proprietary algorithm correctly detected 49 of the 55 Stage I LC samples (Fig. 1). Six Stage I NSCLC samples were not detected (false negatives), and 4 samples without LC were classified as positive (false positives). The 4 false positives consisted of 3 breast cancers and 1 asthma sufferer (Supplementary Table 1). We were unable to follow up with these patients to confirm whether the breast cancer had metastasized to the lungs [27] or whether the asthma diagnosis had been erroneously reached for an individual actually suffering from LC [28]. Algorithm 33 and the LCDT1 exhibited the same accuracy of 95.6%, sensitivity of 89.1%, and specificity of 97.7% in the validation set (Table 4). When the other cancer types were excluded from the analysis, the specificity of both algorithms improved to 99.1%. This validation shows that the results using the 33 markers from the previous study are comparable to those using the 21 markers or the LCDT1 markers (Table 4). Additional biomarkers were unnecessary to achieve the same clinical performance.
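The reported validation metrics follow directly from the counts above. The sketch below, using a hypothetical `ConfusionCounts` helper, derives the true negatives as 228 − 55 − 4 = 169 and recomputes the three figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Subject-level outcome counts for a binary test (positive = NSCLC)."""

    tp: int  # NSCLC subjects called positive
    fn: int  # NSCLC subjects missed
    fp: int  # subjects without NSCLC called positive
    tn: int  # subjects without NSCLC called negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


# Validation counts from the text: 228 subjects, 55 Stage I NSCLC,
# 49 detected, 6 missed, 4 false positives among the 173 other subjects.
validation = ConfusionCounts(tp=49, fn=6, fp=4, tn=228 - 55 - 4)
print(f"sensitivity {validation.sensitivity:.1%}")  # 89.1%
print(f"specificity {validation.specificity:.1%}")  # 97.7%
print(f"accuracy    {validation.accuracy:.1%}")     # 95.6%
```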

ROC curves and P-values
The area under the ROC curve (AUC) is the probability that a randomly chosen positive subject receives a higher model score than a randomly chosen negative subject. Here, a positive subject is one with NSCLC, and a positive prediction means that the model predicts that the subject has NSCLC. Although we examined the discriminatory power (AUC) of each individual biomarker, it was not the determining factor in our selection process. The AUCs for Algorithm 33, Algorithm 21, and the LCDT1 are 0.965, 0.960, and 0.966, respectively (Fig. 2a). When the other cancer types were excluded from the analysis, the AUC for each algorithm improved by 0.01 (Fig. 2b). In addition, the P-values (p < 0.05) indicate that several biomarkers can discriminate NSCLC from other pathologies to a degree (Table 5). These results (e.g., patterns, ROC/AUC, performance) provide a strong foundation for developing a clinical diagnostic test for NSCLC.
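The probabilistic definition of AUC can be computed directly by comparing every positive score with every negative score (the Mann-Whitney formulation), counting ties as one half. The function name and the scores below are hypothetical; the study does not state which R routine produced its AUCs.

```python
from itertools import product
from typing import Sequence


def auc_pairwise(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (higher = more likely NSCLC).
nsclc_scores = [0.9, 0.8, 0.4]
other_scores = [0.7, 0.3, 0.2, 0.4]
print(auc_pairwise(nsclc_scores, other_scores))  # 0.875
```

For large samples, a rank-based formula gives the same value in O(n log n) time; the quadratic loop is kept here for transparency.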

Discussion
Protein biomarkers have been extensively examined for the diagnostic, prognostic, and therapeutic assessment of diseases and their treatments. Yet many lab-developed assays never mature enough to enter the clinical setting [29]. Apart from regulatory hurdles, many factors, such as sample collection, reagent manufacturing, and data acquisition, can cause variability in end results, which affects robustness and consistency, a requisite of any biological test intended for clinical use [30,31]. Reducing the number of biomarkers was an important component of the present study, as it decreases complexity and the number of interactions between antibodies, simplifies reagent production, and is more cost-effective [32].
In narrowing our list, biological justification for the selection of biomarkers was critical to avoid numerical quirks that may mask the true driver of a physiological process [11]. To elaborate, the statistical model in the previous study was a Random Forest (RF) model. When an RF model is fit, a measure of each variable's importance is calculated; in this case, the variables are the biomarkers. A variable's importance is defined as how much, on average, the biomarker improves the separation of the groups in the model (in our case, NSCLC and not NSCLC). Here, node impurity (how well the trees partition the data at each step of the algorithm) is measured using the Gini index [33]. Because of naturally occurring relationships among the biomarkers examined, relying on variable importance as the sole criterion for keeping a biomarker in the smaller set used to develop the new model is not viable. If two biomarkers are highly correlated, the importance of one biomarker is masked by the other, because both provide the same information to the model, making the excluded biomarker appear redundant. The 'redundant' biomarker, though seemingly insignificant, could have served as a substitute for the included biomarker [34].
However, if two biomarkers are statistically correlated but only one is biologically related to the disease, we may not be able to determine from the model alone which biomarker is truly more important to the underlying biological mechanism. Thus, biological relevance and expression patterns weighed heavily in our selection.
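The masking effect described above can be seen at the level of a single tree split. The sketch below computes Gini impurity and the impurity decrease of a threshold split, the quantity a Random Forest accumulates (weighted and averaged over nodes and trees) into its Gini-based importance. The toy data and function names are hypothetical, and a real forest's importance values also depend on its random feature sampling, which this single-split sketch omits.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of a node's class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Parent Gini minus the size-weighted Gini of the two children of `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - children


# Hypothetical toy data: 1 = NSCLC, 0 = not NSCLC.
labels = [0, 0, 0, 1, 1, 1]
marker_a = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
marker_b = [10 * x for x in marker_a]  # perfectly correlated with marker_a

# Both markers separate the classes equally well at the root.
print(impurity_decrease(marker_a, labels, 5.0))   # 0.5
print(impurity_decrease(marker_b, labels, 50.0))  # 0.5
```

Because `marker_b` is a monotone transform of `marker_a`, every threshold on one corresponds to a threshold on the other that yields exactly the same partition, so at every node the two markers offer identical impurity decreases. The tree credits whichever one it selects, and the importance of the other is split off or hidden.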
Many of the markers in our set have been studied for decades and have shown potential for diagnosing LC [35][36][37][38][39]. In our studies, certain biomarkers were more strongly elevated or depressed depending on whether we were looking at early stage (I-II) or late stage (III-IV) NSCLC patients; e.g., the upregulation of CEA and CYFRA 21-1 (widely studied cancer markers) [36] was not as prominent in early stage NSCLC. Lower CYFRA 21-1 expression in early stage NSCLC has also been reported by Guergova-Kuras et al. [37], who used monoclonal antibodies to detect early stage NSCLC. This variation in marker levels across stages of NSCLC is not surprising, as protein abundance reflects the current physiological state of the disease.
Examples of markers that were elevated in Stage I-II NSCLC were IL-8, MMP-9, and SAA. The synergistic regulation and pathways of these markers are consistent with previous scientific findings. For example, IL-8 is a multifunctional chemokine that induces chemotaxis and phagocytosis, promotes angiogenesis, and helps maintain mesenchymal features in carcinoma cells [40,41]. Robust upregulation of CXCL8 (also known as IL-8) has been observed in erlotinib-resistant cell lines [41], which also makes it a target for cancer therapy. A study by Liu et al. of 141 NSCLC patients indicated that IL-8 may up-regulate MMP-9 in lymph node metastasis of NSCLC [38].
MMP-9 is a widely studied protease that cleaves extracellular matrix (ECM) proteins to regulate ECM remodeling [42]. MMP-9 is involved in basement membrane degradation that furthers tumor invasion and metastasis [42]. Past studies showed that MMP-9 levels are highly elevated in LC patients, especially in Stage III-IV [43,44]. We also observed a correlation between IL-8 and MMP-9 levels in LC patients.
SAA is an apolipoprotein that is secreted during acute phase inflammation and is a known LC biomarker. Sung et al. measured 180 healthy and 170 lung adenocarcinoma plasma or serum samples and found a 14-fold increase in SAA levels in LC patients [45]. Another study, by Biaoxue et al., indicated that SAA alone could detect LC with a sensitivity of 0.59 and a specificity of 0.92 [39]. We measured a six-fold increase in SAA levels at all stages of NSCLC compared with healthy controls.
Proteins such as IL-8, MMP-9, and SAA are involved in physiological inflammatory processes. Some of these proteins are highly expressed in specific cancers, while others are inhibited. Individually, each protein can discriminate healthy from diseased individuals. When LC biomarkers are multiplexed and combined with an algorithm and additional demographic data, their diagnostic capability increases, and they could serve as a powerful clinical tool.
Table 4 (fragment): False negatives (FN): 6, 7, 6. All entries show the statistic (95% CI). *Other cancer types were included in the analysis. Each subject consisted of two replicates (N = 2), i.e., two data points processed by the algorithm; if either data point was positive, the subject was considered positive for LC.
Using biomarkers for diagnosing diseases requires constant revalidation to ensure that they remain applicable to the intended population. Like any method, biomarkers have limitations, as they are affected by sample origin, ethnicity, gender, environmental and carcinogenic exposure, and reagent and platform variations. Strict quality assurance and processes from the bench (e.g., developing reagents) to the clinic (e.g., collecting samples) to the acquisition of the end result (e.g., data cleaning and processing) are imperative. Furthermore, statistical and machine learning algorithms need to be tested for bias and refined as new data are collected. Despite these limitations, biomarkers in conjunction with machine learning methods are an important component in fighting cancer. Their advantages include a simple, noninvasive method of detecting cancer, a source of prognostic information, and a means of assessing the efficacy of therapies.

Conclusions
We aimed to develop an accurate test that was specific to early stage NSCLC. A multi-cancer test, though remarkable, could increase patient anxiety and fiscal expense due to additional (possibly unnecessary) follow-up procedures. These concerns are mirrored in medical practitioners' reluctance to order full body imaging in asymptomatic patients [46].
This study shows that we were able to reduce the number of biomarkers from 33 to 21 while maintaining high performance in detecting early stage NSCLC. The LCDT1 is 97.7% specific for Stage I NSCLC even when other cancer types are present. With a sensitivity of 89.1%, the LCDT1 would detect an estimated 9 out of 10 early stage LC patients. The LCDT1 is 95.6% accurate.
For diagnostic tests, physicians generally prefer high sensitivity, even at the expense of specificity. The argument is that not detecting a cancer is more detrimental than a false positive. A highly sensitive diagnostic test is important when the test is used to identify a serious but treatable disease, whereas a highly specific test spares the patient unnecessary follow-up medical procedures. In the case of LC, current diagnostic methods (i.e., CT, PET) have high sensitivity but low specificity. If patients with a suspected lung nodule on CT are given a second test with high specificity, whether its sensitivity is low or high, then nearly all of the false positives could be identified as disease free.
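The second-test argument can be illustrated with the NLST counts quoted earlier (7191 subjects with suspicious nodules, 292 with confirmed LC). The sketch below is a back-of-the-envelope calculation, not a result of this study: it assumes that the LCDT1 validation sensitivity (89.1%) and specificity (97.7%), measured on Stage I NSCLC, would carry over to this CT-positive population, and that the blood test's errors are independent of the CT result. The function name is hypothetical.

```python
from typing import Tuple


def second_test_outcome(
    cancers: int, non_cancers: int, sensitivity: float, specificity: float
) -> Tuple[float, float]:
    """Expected (cancers still flagged, false positives remaining) after a second test.

    Applied only to subjects who were already positive on the first test (CT).
    Assumes the second test's sensitivity and specificity hold in this group
    and that its errors are independent of the first test's result.
    """
    still_flagged_cancers = cancers * sensitivity
    remaining_false_positives = non_cancers * (1.0 - specificity)
    return still_flagged_cancers, remaining_false_positives


# NLST: 7191 subjects with suspicious nodules, of whom 292 had LC.
ct_positive, ct_true_positive = 7191, 292
flagged, false_pos = second_test_outcome(
    ct_true_positive, ct_positive - ct_true_positive, sensitivity=0.891, specificity=0.977
)
print(f"cancers still flagged: {flagged:.0f} of {ct_true_positive}")
print(f"false positives remaining: {false_pos:.0f} of {ct_positive - ct_true_positive}")
```

Under these assumptions, the 6899 CT false positives would shrink to about 159, at the cost of missing about 32 of the 292 cancers, which is the sensitivity-specificity trade-off described above.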
Our clinical goal is to decrease risks and unnecessary procedures for patients without delaying curative treatment [47], and to increase access for communities facing social and economic barriers. The LCDT1 is a simple blood test with great potential for clinical applications in detecting Stage I NSCLC. Used with gold standards such as CT/PET scans, in conjunction with algorithms and improved PN guidelines, it could significantly reduce the number of false positives and increase early stage detection.
Additional file 2. Supplementary Table 1. Actual and predicted results using the LCDT1 Algorithm.