The development and validation of a radiomic nomogram for the preoperative prediction of lung adenocarcinoma

Background Accurate diagnosis of early lung cancer from small pulmonary nodules (SPN) is challenging in the clinical setting. We aimed to develop a radiomic nomogram to differentiate lung adenocarcinoma from benign SPN. Methods This retrospective study included a total of 210 pathologically confirmed SPN (≤ 10 mm) from 197 patients, which were randomly divided into a training dataset (n = 147; malignant nodules, n = 94) and a validation dataset (n = 63; malignant nodules, n = 39). Radiomic features were extracted from the cancerous volumes of interest on contrast-enhanced CT images. Least absolute shrinkage and selection operator (LASSO) regression was used for data dimension reduction, feature selection, and radiomic signature building. Using multivariable logistic regression analysis, a radiomic nomogram was developed incorporating the radiomic signature and the conventional CT signs observed by radiologists. Discrimination and calibration of the radiomic nomogram were evaluated. Results The radiomic signature consisting of five radiomic features achieved an AUC of 0.853 (95% confidence interval [CI]: 0.735–0.970), accuracy of 81.0%, sensitivity of 82.9%, and specificity of 77.3%. The two conventional CT signs achieved an AUC of 0.833 (95% CI: 0.707–0.958), accuracy of 65.1%, sensitivity of 53.7%, and specificity of 86.4%. The radiomic nomogram incorporating the radiomic signature and conventional CT signs showed an improved AUC of 0.857 (95% CI: 0.723–0.991), accuracy of 84.1%, sensitivity of 85.4%, and specificity of 81.8%. The radiomic nomogram had good calibration power. Conclusion The radiomic nomogram might have the potential to be used as a non-invasive tool for individual prediction of SPN preoperatively. It might facilitate decision-making and improve the management of SPN in the clinical setting.


Background
According to the 2017 cancer statistics, cancer of the lung and bronchus is the most common cause of cancer death worldwide [1][2][3]. Patients with lung cancer usually have a poor prognosis because most of them are diagnosed at an advanced stage (III or IV), having had no distinguishing symptoms at an early stage [4]. In clinical practice, accurate diagnosis of early lung cancer from small pulmonary nodules (SPN) is challenging. Detection of SPN is increasing worldwide year by year, mainly because of the wide use of low-dose chest computed tomography (CT) screening. In the Early Lung Cancer Action Project performed by Henschke et al. [5], the detection rate of SPN was as high as 23%, which increased to 39.5% in patients who underwent lung surgery [6]. According to the international guidelines for the management of SPN, indeterminate solid and ground-glass nodules should be followed with CT for at least 2 and 3 years, respectively [7,8].
Therefore, accurate diagnosis of SPN using an advanced tool would reduce health costs and avoid extensive CT examinations that bring no additional benefit. Clinicians also need a noninvasive imaging tool to determine whether a patient needs surgery or long-term follow-up.
Recently, through high-throughput extraction of quantitative imaging features from standard-of-care medical images, radiomics has provided a promising and non-invasive tool for cancer research [9,10]. The radiomic features mined by sophisticated bioinformatics tools might inform diagnosis, prognosis and prediction [11]. Radiomic signatures constructed from significant features have been applied to the precision diagnosis and treatment of cancer, which will promote the development of precision medicine. Currently, radiomics has been used to decode tumor phenotypes, histological subtypes and pathological response of lung cancer [12][13][14].
Therefore, the aim of this study was to develop and validate a radiomic nomogram for the individual preoperative prediction of lung adenocarcinoma from benign SPN, which would improve decision-making for SPN in clinical practice.

Patients and nodules
Our institutional review board approved this retrospective study and waived the need for informed consent from patients. A total of 197 patients with 210 SPN treated with surgical resection were included from January 2011 to March 2017. Inclusion criteria were as follows: (1) patients had histopathologically confirmed SPN ≤10 mm; (2) patients had available clinical data; (3) patients underwent a baseline lung CT scan with the same imaging parameters and reconstruction slice thickness; and (4) the lung CT was performed within 1 month before surgery. Patients were excluded if: (1) they received surgery before the CT scans; or (2) their lung CT images had breathing artifacts. The patients were randomly divided into training and validation sets by a computer algorithm at a ratio of 7:3. Figure 1 illustrates the study inclusion pathway.
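The random 7:3 division can be sketched as follows. This is only an illustration: the source does not name the algorithm or the random seed, so the function name, the seed and the integer identifiers are hypothetical. The sketch splits 210 nodule identifiers, which is an assumption chosen to reproduce the dataset sizes reported in the abstract (147 and 63).

```python
import random


def split_train_validation(ids, train_fraction=0.7, seed=2017):
    """Randomly partition identifiers into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


nodule_ids = list(range(210))  # hypothetical identifiers, one per nodule
train_ids, validation_ids = split_train_validation(nodule_ids)
print(len(train_ids), len(validation_ids))  # 147 63
```

Fixing the seed makes the partition repeatable, which matters when the same split has to be reused for all three models.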
A total of 11 CT findings of each nodule were collected from the last CT scan before surgery, including the maximum diameter, location, involvement of pleura (pleural indentation with or without pleural thickness, absence), nodule consistency (ground-glass nodule [GGN], solid, part-solid GGN), shape (regular [e.g., round, oval] or irregular), margins (lobulation, spiculation, both, absence), cavity (presence or absence), calcification (presence or absence), intranodular changes (necrosis, consolidation, vacuoles, air bronchogram, absence), bronchial disruption (presence, absence, unclear), and vessel convergence sign (presence or absence). Two radiologists with 13 years and 18 years of clinical experience in lung cancer reviewed all of the CT images and reached a consensus.

Imaging acquisition
Contrast-enhanced CT images were obtained by a 64-slice CT scanner (Siemens Definition AS+ 128, Forchheim, Germany). The imaging parameters were as follows: 120 kV; 120 mA; rotation time, 0.5 s; detector collimation, 64 × 0.625 mm; field of view, 500 mm; and matrix size, 512 × 512. All patients received intravenous administration of iodinated contrast agent (1–1.1 ml/kg, Ultravist 370, Bayer Pharma AG, Berlin, Germany). The CT images were obtained after a 30 s delay and reconstructed with a slice thickness of 2 mm.
CT-based radiomic feature extraction and selection
Figure 2 shows the radiomic workflow of this study. The regions of interest (ROIs) of pulmonary nodules were delineated by a junior radiologist using open-source ITK-SNAP software (www.itk-snap.org) and validated by a senior radiologist. Radiomic features were extracted from contrast-enhanced CT images by using an in-house feature extraction algorithm applied in Artificial  We applied least absolute shrinkage and selection operator (LASSO) regression to select the most significant features suggestive of malignancy [15]. We performed 100 iterations of 10-fold cross-validation with minimal binomial deviance to select the optimal parameters in LASSO regression [16].
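To illustrate why LASSO performs feature selection, the sketch below fits an L1-penalised logistic regression by proximal gradient descent in plain Python. It is not the authors' implementation (the source does not name the software), it omits the repeated 10-fold cross-validation used to choose the penalty, and the toy data, function names, step size and penalty values are all assumptions. The second toy feature is constructed to carry no information about the label.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a coefficient towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Minimise mean negative log-likelihood + lam * ||w||_1 (intercept unpenalised)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        residuals = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                     for row, yi in zip(X, y)]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                  for j in range(p)]
        b -= step * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data (hypothetical): column 0 is informative, column 1 is symmetric noise.
X = [[-1.0, 0.5], [-1.0, -0.5], [0.3, 0.5], [0.3, -0.5],
     [-0.2, 0.5], [-0.2, -0.5], [1.0, 0.5], [1.0, -0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_weak, _ = fit_l1_logistic(X, y, lam=0.02)   # mild penalty keeps column 0
w_strong, _ = fit_l1_logistic(X, y, lam=1.0)  # strong penalty removes everything
print(w_weak, w_strong)
```

With the mild penalty only the informative column keeps a non-zero coefficient, and with the strong penalty every coefficient is exactly zero. In practice the penalty would be chosen by cross-validated deviance, as described above.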
Fig. 2 Radiomic workflow. Contrast-enhanced chest CT images are retrieved for radiomic feature extraction. ROIs of pulmonary nodules are segmented and the corresponding ROIs are stacked up to construct VOI of the nodules. Six categories of radiomic features are extracted from within the defined VOI, including histogram features, form factor features, and texture features
Training and validation of the conventional CT signature, radiomic signature and radiomic nomogram
To determine the additional value of the radiomic signature to conventional CT features, we developed and compared three models (i.e., conventional CT signature, radiomic signature and radiomic nomogram). The conventional CT signature was built based on the results of multivariate logistic regression analysis of 11 conventional CT features. The radiomic signature, or radiomic score (Rad-score), was calculated as a linear combination of the selected radiomic features weighted by their respective coefficients. Finally, the radiomic nomogram was constructed by multiple logistic regression using the selected conventional CT features and the Rad-score.
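As a minimal sketch of how a Rad-score and a nomogram probability are computed, the code below forms a weighted linear combination of selected features and passes it, together with coded CT signs, through a logistic model. All feature names, coefficients, intercepts and codings are hypothetical placeholders; the fitted values are not reported in this text.

```python
import math


def rad_score(features, coefficients, intercept):
    """Linear combination of the selected features, weighted by their coefficients."""
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)


def nomogram_probability(score, ct_signs, weights, intercept):
    """Logistic model combining the Rad-score with coded conventional CT signs."""
    linear = intercept + weights["rad_score"] * score
    linear += sum(weights[name] * value for name, value in ct_signs.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical values for illustration only; none are taken from the study.
coefficients = {"feature_a": 0.5, "feature_b": 1.0}
features = {"feature_a": 2.0, "feature_b": -1.0}
score = rad_score(features, coefficients, intercept=0.1)

weights = {"rad_score": 1.2, "consistency": 0.8, "margins": 0.6}
ct_signs = {"consistency": 1, "margins": 0}  # assumed binary coding of the two signs
probability = nomogram_probability(score, ct_signs, weights, intercept=-0.5)
print(round(score, 3), round(probability, 3))
```

A printed nomogram is a graphical form of the same calculation: each predictor contributes points proportional to its weighted value, and the total maps to a predicted probability.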
The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were used to evaluate the performance of the three models in the validation dataset. Calibration curve and the Hosmer-Lemeshow test were used to assess the calibration and goodness-of-fit of the radiomic nomogram [17].
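The discrimination metrics named above can be computed as in the following sketch. The AUC is calculated as the proportion of malignant-benign pairs in which the malignant nodule receives the higher score (ties count as half), which is equivalent to the area under the ROC curve. The scores, labels and the 0.5 cut-off are hypothetical, and the source does not state which cut-off the authors used.

```python
def auc(scores, labels):
    """AUC as the probability that a malignant nodule outscores a benign one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def classification_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed probability cut-off."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical predicted probabilities and labels (1 = malignant, 0 = benign).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
metrics = classification_metrics(scores, labels)
print(round(auc(scores, labels), 3), metrics)
```

Unlike accuracy, sensitivity and specificity, the AUC does not depend on the chosen cut-off, which is why two models can have similar AUCs but quite different sensitivities.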
Training and validation of the conventional CT signature, radiomic signature and radiomic nomogram
The radiomic signature achieved an AUC of 0.878 (95% CI: 0.813 to 0.943), accuracy of 85.0%, sensitivity of 90.1%, and specificity of 76.8% in the training dataset (  Fig. 5b).
After multivariate analysis, only two CT findings (nodule consistency and margins) remained (P < 0.001 and P = 0.026, respectively). The two CT features attained an AUC of 0.842 (95% CI: 0.779 to 0.906), accuracy of 73.5%, sensitivity of 62.6%, and specificity of 91.1% in the training dataset and an AUC of 0.833 (95% CI: 0.707 to 0.958), accuracy of 65.1%, sensitivity of 53.7%, and specificity of 86.4% in the validation dataset (Table 2). The AUCs of the conventional CT signature and the radiomic signature were not significantly different (P = 0.292 and 0.586 in the training and validation datasets, respectively).
A radiomic nomogram incorporating the radiomic signature, internal composition and margins of the nodule was constructed (Fig. 6a). The radiomic nomogram yielded an AUC of 0.911 (95% CI: 0.858 to 0.965), accuracy of 87.1%, sensitivity of 87.9%, and specificity of 85.7% in the training dataset and an AUC of 0.857 (95% CI: 0.723 to 0.991), accuracy of 84.1%, sensitivity of 85.4%, and specificity of 81.8% in the validation dataset (Table 2), which indicated that the radiomic signature provides added value to the conventional CT features in terms of discriminatory efficacy. The AUC of the radiomic nomogram was not significantly different from that of the conventional CT features and the radiomic signature in the validation dataset (P = 0.304 and 0.864, respectively). The calibration curve of the radiomic nomogram is shown in Fig. 6b. The Hosmer-Lemeshow test yielded P values of 0.738 and 0.111 in the training and validation datasets, respectively, which indicated good calibration power.
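For readers unfamiliar with the Hosmer-Lemeshow test, the sketch below shows the usual construction: nodules are sorted by predicted risk and split into groups, the observed and expected numbers of malignant nodules are compared in each group, and the statistic is referred to a chi-square distribution with (groups − 2) degrees of freedom. A large P value is read as no evidence of miscalibration. The grouping rule, the closed-form tail probability (valid only for an even number of degrees of freedom) and the toy data are assumptions, not details taken from the study.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # term is now half**k / k!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probabilities, outcomes, groups=10):
    """Return the Hosmer-Lemeshow statistic and its P value (df = groups - 2)."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    n = len(order)
    statistic = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        expected = sum(probabilities[i] for i in idx)  # expected malignant count
        observed = sum(outcomes[i] for i in idx)       # observed malignant count
        statistic += (observed - expected) ** 2 / (expected * (1 - expected / len(idx)))
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Hypothetical, perfectly calibrated toy data: 4 risk groups of 10 nodules each.
probabilities = [0.2] * 10 + [0.4] * 10 + [0.6] * 10 + [0.8] * 10
outcomes = ([1] * 2 + [0] * 8 + [1] * 4 + [0] * 6
            + [1] * 6 + [0] * 4 + [1] * 8 + [0] * 2)
statistic, p_value = hosmer_lemeshow(probabilities, outcomes, groups=4)
print(statistic, p_value)
```

Because the toy predictions match the observed event rates in every group, the statistic is essentially zero and the P value is close to 1. The test has limited power in small samples, so a non-significant result is best read alongside the calibration curve.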

Discussion
We trained and tested a radiomic nomogram based on the radiomic signature and the anatomical CT features for individualized preoperative prediction of lung adenocarcinoma, which showed good discriminative power and calibration. This study indicates that CT-derived radiomic features supplement the CT findings reported by radiologists in the prediction process. Notably, this study provides a non-invasive and effective prediction tool to identify patients with a high probability of lung adenocarcinoma. Early diagnosis of cancer is associated with prolonged survival [18]. For instance, the 5-year overall survival of breast cancer was 74.8% between 1975 and 1977 and increased significantly to 90.3% between 2003 and 2009 [19]. This increase is mainly due to earlier detection because of the extensive application of mammography for cancer screening [19]. Currently, small pulmonary nodules are still a common and challenging clinical problem. The classification performance of CT is limited, especially in small nodules (≤10 mm in diameter). A more accurate and reliable non-invasive diagnostic tool is urgently needed for precise treatment. Early diagnosis of malignant pulmonary nodules is crucial for improving patients' long-term overall survival.
To date, radiologists diagnose lung cancer largely on the basis of qualitative features of CT images, such as nodule diameter, evidence of spiculation, upper lobe location, and pleural indentation [20]. Low-dose CT screening for pulmonary nodules may reduce mortality; however, it also carries a risk of overdiagnosis through the detection of indolent tumors [5]. Some radiologists have advocated serial examinations for all incidentally detected SPN on CT so that timely curative lung surgery can be offered [7], which may be too aggressive. Excessive detection of SPN might have potential adverse implications for the current medical system and clinical practice, such as inefficient use of limited resources, raised health care costs, and increased radiation exposure and risk of morbidity and mortality for patients [7]. CT-guided percutaneous biopsy has commonly been used to obtain tumor histological results because most pulmonary nodules are peripherally located. However, in actual clinical practice, the sensitivity of percutaneous biopsy falls as nodules become smaller [21,22], and other factors, including nodule morphology and length of the needle path, also influence the accuracy of biopsy [20]. In addition, percutaneous biopsy has several limitations, such as its invasive nature and high risk of complications [23]. Therefore, non-invasive imaging-based biomarkers are needed to provide additional diagnostic information.
Recently, advances in medical image analysis and tools have driven additional studies investigating the radiomics of lung cancer. Radiomic signatures may help to mine the information behind lung cancer on medical images, for instance, tumor staging [24], gene expression patterns [25], treatment response [26,27], and patient survival [28,29]. Whether radiomic features can improve the prediction of malignancy in pulmonary nodules compared with conventional visual assessment on CT is currently a hot topic [30,31], but most studies have examined nodules smaller than 30 mm in diameter. In this study, 210 SPN (≤10 mm) with surgically proven malignant or benign status were included for radiomic analysis. All radiomic features were extracted from images acquired on the same CT scanner, with the same imaging parameters and reconstruction slice thickness. As Wu et al. indicated, the performance of radiomic features could be degraded if variability in factors such as imaging scanners and scanning parameters is not controlled [32]. An increased number of radiomic features has the potential to quantify intratumoral heterogeneity. However, most high-dimensional features are redundant, which can cause poor classification performance. We aimed to select the radiomic features that were most associated with lung adenocarcinoma. Only five useful features were selected from 385 features by the LASSO algorithm. Unlike previous studies, this study describes some important CT findings that contribute to the differential diagnosis of lung adenocarcinoma. After multivariate analysis, internal composition and margins were two independent clinical features of lung adenocarcinoma. Nodules with GGN, lobulation and/or signs of spiculation had a higher risk of malignancy, which was consistent with the radiologists' experience. The conventional CT signature attained an accuracy of 0.735 and 0.651 in the training and validation datasets, respectively.
We hypothesized that radiomic features could further improve the diagnostic accuracy of a CT signature. Our study demonstrated that the predictive performance of conventional CT features was improved by adding radiomic features, attaining an accuracy of 0.871 and 0.841 in the training and validation datasets, respectively.
A number of risk models of varying complexity have been developed for identifying the risk of incident lung cancer among patients with visible lung nodules [33][34][35][36][37][38]. The models were based on significant patient and nodule characteristics. The accuracy and clinical utility of predictive models depend on the case mix of the population in which they were derived and the prevalence of malignancy in that population. Risk prediction models should be externally validated before they are used in a different clinical setting and population. The four validated models were the Mayo Clinic [33], Veterans Administration [34], Herder [39] and Brock [38] models. Studies have shown an AUC of 0.89 for the Mayo Clinic model, 0.74 for the Veterans Administration model, 0.92 for the Herder model and 0.90 for the Brock model. Our radiomic model achieved similar performance, with an AUC of 0.857. Compared with previous models, our model did not consider patient data, but included radiomic features extracted from CT images that could reflect intratumoral heterogeneity. However, our model lacks external validation. We hope to explore the added value of radiomics to the existing risk prediction models.

Conclusions
In summary, this study showed the potential of radiomic features extracted from contrast-enhanced CT images for predicting lung cancer before surgery. Radiomic features showed added value to the conventional CT features in differentiating lung adenocarcinoma from benign SPN. This study provides doctors with a radiomic nomogram as a non-invasive tool for individualized prediction of lung cancer preoperatively. However, before it is applied in a real-world setting, more studies are needed to validate the performance of the radiomic nomogram.
Fig. 6 The radiomic nomogram for lung adenocarcinoma prediction. a Radiomic nomogram developed for the prediction of lung adenocarcinoma, which incorporates radiomic signature, internal composition and margins of nodule. Plots (b) and (c) present the calibration curves of the nomogram in the training and validation datasets, respectively. The calibration curve illustrates the calibration of the nomogram in terms of the agreement between the predicted risk of malignancy and the observed outcomes of malignancy. The 45° diagonal line represents a perfect prediction, and the red line represents the predictive performance of the nomogram. A closer fit of the red line to the diagonal line indicates better predictive accuracy of the nomogram
Abbreviations
CT: Computed tomography; SPN: Small pulmonary nodules; VOI: Volume of interest; LASSO: Least absolute shrinkage and selection operator; AUC: Area under the receiver operating characteristic curve; CI: Confidence interval; GGN: Ground glass nodule; GLCM: Gray-level co-occurrence matrix; RLM: Grey level run-length matrix; GLSZM: Gray level size zone matrix

Availability of data and materials
All data generated or analysed are included in this article.
Ethics approval and consent to participate
This study was approved by the ethics review board of the First Affiliated Hospital of Guangzhou Medical University, and the need for informed patient consent for inclusion was waived.

Consent for publication
Not applicable.