This retrospective study was conducted in accordance with the Declaration of Helsinki and approved by the Institute Research Ethical Committee of National Taiwan University Hospital (NTUH 201710050RINA, 201904116RINC), which waived the requirement for informed consent from individual patients.
Local dataset and manual image segmentation
Patients with histologically or cytologically confirmed pancreatic adenocarcinoma were identified from the Cancer Registry of National Taiwan University Hospital (NTUH), a tertiary referral center with a large volume of PC cases. CT images of these PC patients, together with those of subjects whose pancreas was reported as normal or unremarkable in the formal radiology reports, were extracted from the imaging archive of NTUH for further review to construct the local datasets. If an individual patient underwent multiple CT examinations, only the one that immediately preceded the diagnosis of PC was used. In total, contrast-enhanced portal venous CT images of 546 confirmed PC patients between January 1, 2005, and December 31, 2019, and 1,466 subjects who underwent CT during the same period with a negative or unremarkable pancreas in the radiology report were included in the local dataset. Of these, 733 controls were randomly selected to construct the nationwide test set (see below), whereas all cases and the remaining controls were randomly divided into a local training set (437 PCs, 586 controls) and a validation set (109 PCs, 147 controls) (Fig. 1).
CT examinations included in this study were performed with one of 47 scanners from six manufacturers (GE Healthcare, Siemens Healthcare, Philips Healthcare, Toshiba, Picker International, and Hitachi Medical Corporation) at 100, 120, or 130 kV, with automatic mA control and without noise reduction. All CT images were obtained in the portal venous phase after intravenous administration of contrast medium (1.5 mL per kg body weight, with an upper limit of 150 mL). Each slice had a matrix of 512 × 512 pixels and a thickness between 0.7 mm and 1.5 mm, which was reconstructed into 5-mm slices for subsequent analysis. The pancreas and tumor on the CT images of PC patients were manually segmented and labeled as regions of interest (ROIs) for model training by one of two experienced abdominal radiologists (P-TC and K-LL, with over 8 and 23 years of experience, respectively), with reference to the results of other examinations and surgical findings when needed.
Population-based test set
Taiwan’s National Health Insurance (NHI) is a single-payer compulsory health insurance program that covers inpatient and outpatient care for 99.8% of the population and is contracted with 92.7% (21,463/23,164) of the institutions in Taiwan [10]. As of July 2021, around 2.9 million imaging studies performed in daily clinical practice had been uploaded by institutions throughout Taiwan to the Applications of Artificial Intelligence in Medical Images Database of the NHI. Patients diagnosed with PC throughout Taiwan were identified by searching for recipients of the Severe Illness Certificate issued for the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) [11] code C25 (malignant neoplasm of the pancreas) and its sub-items in the NHI Major Illness/Injury Certificate database, and the Applications of Artificial Intelligence in Medical Images Database was searched to retrieve CT studies of those patients with PC. If multiple CT studies were identified for an individual patient, only the study closest to the date of certificate issuance was retrieved, yielding CT studies of 671 cases with newly confirmed PC diagnoses between January 1, 2018, and July 31, 2019, throughout Taiwan. For controls with normal pancreas, CT studies performed for pre-donation evaluation of all kidney and liver donors during the same period were retrieved from the Applications of Artificial Intelligence in Medical Images Database and reviewed by a radiologist (P-TC) to confirm the absence of radiological abnormalities in the pancreas, yielding 73 control subjects. To balance the numbers of cases and controls, the 733 controls randomly selected from the local dataset were combined with the controls from the NHI dataset to form the nationwide test set (671 PCs, 806 controls) (Fig. 1).
Two- and three-dimensional radiomic analyses
The radiomics-based classification module consisted of three components: a 3D radiomics model (3D analysis), a 2D patch-based radiomics model (2D analysis), and a logistic regression model using the outputs of the former two radiomics models as independent variables (combined analysis) to predict the probability of PC. The 3D analysis examined the whole pancreas by 3D radiomic features, whereas in the 2D analysis the pancreas on each 2D slice was cropped into patches for subsequent extraction of 2D radiomic features.
For 3D analysis, the pancreas, including the tumor (if present), was segmented using a previously trained automatic deep learning segmentation model [12]. This segmentation model was trained with CT images of 437 PC patients from the NTUH training and validation datasets and 393 subjects from three external datasets, based on a model derived from coarse-to-fine neural architecture search (C2FNAS) [13] (Additional file 1). For model training in the 2D analysis, segmentation of the pancreas and tumor was performed manually by radiologists on the images of PC patients and by the automatic segmentation model on the images of controls. During model testing, the automatic segmentation model was used to segment the pancreas and tumor in both the 2D and 3D analyses.
All radiomic features in this study were extracted using an open-source platform (PyRadiomics) [14]. To eliminate bias resulting from differences in voxel spacing when extracting the radiomic features, all images and segmentation labels were resampled to a spacing of 1 × 1 × 5 mm using linear interpolation and nearest-neighbor interpolation, respectively. The bin width for computing texture features was fixed at 16.
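A minimal sketch of these extraction settings with PyRadiomics is shown below. The file names and the choice to enable all image types and feature classes are illustrative assumptions; the exact parameter file used in the study is not reproduced here.

```python
# Sketch of PyRadiomics extraction with the settings stated in the text:
# resampling to 1 x 1 x 5 mm, linear image interpolation, bin width 16.
from radiomics import featureextractor

settings = {
    "resampledPixelSpacing": [1.0, 1.0, 5.0],  # x, y, z spacing in mm
    "interpolator": "sitkLinear",              # linear interpolation for the image
    "binWidth": 16,                            # fixed bin width for texture features
}

extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
extractor.enableAllImageTypes()   # original and filtered (e.g., wavelet, LoG) images
extractor.enableAllFeatures()     # first-order and higher-order texture features

# 'ct.nii.gz' and 'mask.nii.gz' are placeholder paths to a CT volume and its
# pancreas/tumor label map; PyRadiomics resamples the mask with nearest neighbor.
features = extractor.execute("ct.nii.gz", "mask.nii.gz")
numeric_features = {k: v for k, v in features.items()
                    if not k.startswith("diagnostics")}
```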
Training of classification models based on three-dimensional radiomic features
For 3D analysis, a 3D radiomics model was trained on the NTUH training set. The union of the pancreas and tumor segmented by the automatic segmentation model in the 3D CT volume served as the volume of interest (VOI) for subsequent extraction of 3D radiomic features. A total of 1183 features, including first-order and higher-order texture features on original and filtered images, were extracted from the VOI (Additional file 1). To differentiate whether a pancreas harbored a tumor based on 3D radiomic features, the 1183 features of all data in the training set were input into XGBoost, a widely used machine learning algorithm based on gradient-boosted decision trees [15], to train a classification model. To mitigate overfitting, training was terminated when the area under the receiver operating characteristic curve (AUC) on the validation set did not increase for 30 iterations, and the model with the highest validation AUC during training was selected as the final model for 3D analysis. The loss function of this XGBoost model was set to logistic loss, and the resultant probability served as the output of the 3D analysis.
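The following sketch illustrates this training scheme with the XGBoost Python API. Only the logistic loss, the AUC evaluation metric, and the 30-round early-stopping criterion come from the text; the boosting-round cap and all other hyperparameters, as well as the variable names, are assumptions.

```python
# Sketch of the 3D XGBoost classifier with early stopping on validation AUC.
import xgboost as xgb

# X_train, y_train, X_val, y_val: 1,183 3D radiomic features per study and
# binary labels (1 = PC, 0 = control); data loading is not shown here.
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

params = {
    "objective": "binary:logistic",  # logistic loss; output is a probability
    "eval_metric": "auc",            # monitor AUC on the validation set
}

model_3d = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,            # upper bound on iterations (assumed value)
    evals=[(dval, "validation")],
    early_stopping_rounds=30,        # stop if validation AUC stalls for 30 rounds
)

# Probability of PC for each validation study, taken from the best iteration;
# this probability serves as the output of the 3D analysis.
p_3d = model_3d.predict(dval, iteration_range=(0, model_3d.best_iteration + 1))
```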
Training of classification models based on two-dimensional patch-based radiomic features
In the 2D analysis, every patch generated from the ROI (i.e., pancreas and tumor) in a CT study was analyzed by a 2D patch-based radiomics model to predict the probability of cancer in that patch, and the patient was predicted as having or not having PC by jointly considering the predicted probabilities of all of the patient's patches. The union of pancreas and tumor on the images was set as the ROI. After resampling, the images were cropped into 20 × 20 pixel square subregions (i.e., patches) on the axial (x–y) plane using a moving window with a stride of five pixels. Patches in which more than 5% of the area overlapped with the ROI were treated as valid patches and included in the analysis. Patches containing any portion of the tumor were labeled as cancerous, whereas patches containing no tumor were labeled as non-cancerous. A total of 545 features were extracted from the ROI in each patch (Additional file 1). All the features of the patches in the training set were then input into XGBoost with the logistic loss function to distinguish cancerous from non-cancerous patches.
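The patch-generation scheme can be sketched as follows. The window size (20 × 20 pixels), stride (5 pixels), and 5% overlap criterion follow the text; the array names and helper function are assumptions, not the authors' code.

```python
# Sketch of 2D patch extraction and labeling on one axial slice.
import numpy as np

PATCH, STRIDE, MIN_OVERLAP = 20, 5, 0.05

def extract_patches(ct_slice: np.ndarray, roi_mask: np.ndarray, tumor_mask: np.ndarray):
    """Yield (patch, label) pairs from one resampled axial slice.

    ct_slice:   2D array of CT attenuation values.
    roi_mask:   binary mask of pancreas plus tumor on the same slice.
    tumor_mask: binary mask of the tumor only; patches touching it are cancerous.
    """
    h, w = ct_slice.shape
    for y in range(0, h - PATCH + 1, STRIDE):
        for x in range(0, w - PATCH + 1, STRIDE):
            roi_patch = roi_mask[y:y + PATCH, x:x + PATCH]
            if roi_patch.mean() <= MIN_OVERLAP:   # <= 5% overlap: not a valid patch
                continue
            label = int(tumor_mask[y:y + PATCH, x:x + PATCH].any())
            yield ct_slice[y:y + PATCH, x:x + PATCH], label
```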
To determine whether a patient had a tumor, a heatmap was generated for each patient by aggregating the predictions of the 2D patch-based radiomics model. More specifically, all valid patches extracted from the ROI of a given patient were input into the trained XGBoost model to obtain the probability of tumor for each patch, and the predictions of all patches were then assembled into a heatmap. The value of each pixel in the heatmap was defined as the average of the predicted probabilities of the patches containing that pixel. The area of the high-risk region of a CT study was then defined as the area of the largest region formed by contiguously neighboring high-risk pixels among all the axial planes. The threshold for classifying a pixel as high-risk was set at the value that yielded the highest AUC in differentiating between PCs and controls in the validation set, searched from 0.05 to 0.95 with a step of 0.01. The area of the high-risk region was used as the output of the 2D analysis.
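A sketch of the heatmap aggregation and high-risk-region measurement is given below. The 20-pixel patch size and per-slice largest-region definition follow the text; the use of scipy.ndimage for connected-component labeling and the variable names are assumptions.

```python
# Sketch of per-pixel probability averaging and high-risk-region area measurement.
import numpy as np
from scipy import ndimage

def heatmap_from_patches(shape, patch_preds):
    """Average patch probabilities into a per-pixel heatmap for one slice.

    patch_preds: iterable of ((y, x), probability) for each valid 20 x 20 patch,
    where (y, x) is the patch's top-left corner on the slice.
    """
    prob_sum = np.zeros(shape)
    count = np.zeros(shape)
    for (y, x), p in patch_preds:
        prob_sum[y:y + 20, x:x + 20] += p
        count[y:y + 20, x:x + 20] += 1
    return np.divide(prob_sum, count, out=np.zeros(shape), where=count > 0)

def high_risk_area(heatmaps, threshold):
    """Pixel count of the largest contiguous high-risk region over all axial slices."""
    largest = 0
    for hm in heatmaps:                              # one heatmap per axial slice
        labeled, n_regions = ndimage.label(hm >= threshold)
        if n_regions:
            sizes = np.bincount(labeled.ravel())[1:]  # region sizes, background excluded
            largest = max(largest, int(sizes.max()))
    return largest
```

In this sketch the threshold passed to high_risk_area would be the value, searched from 0.05 to 0.95 in steps of 0.01, that maximizes the validation-set AUC as described above.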
Development of a CAD tool combining automatic segmentation with two- and three-dimensional radiomic analyses
An automatic CAD tool for PC comprising a DL-based segmentation module and a radiomics-based classification module was developed (Fig. 2). The pancreas and tumor (if present) on CT images were first automatically segmented by the previously trained DL segmentation model [12] and then subjected to extraction of 3D and 2D radiomic features, which were subsequently analyzed by the trained 3D and 2D radiomic analysis models, respectively. The resultant probability of PC from the 3D analysis and the area of the high-risk region from the 2D analysis were then input into a logistic regression model trained on the validation set to yield the final prediction of whether the CT study harbored PC (combined analysis).
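A minimal sketch of the combined analysis is shown below: a logistic regression on the two module outputs, fit on the validation set. The use of scikit-learn and the variable names are assumptions.

```python
# Sketch of the combined analysis: logistic regression on the 3D probability
# and the 2D high-risk-region area, trained on the validation set.
import numpy as np
from sklearn.linear_model import LogisticRegression

# val_p3d: 3D-analysis probabilities; val_area: 2D high-risk-region areas;
# val_y: binary labels (1 = PC) for the validation set. Loading is not shown.
X_val = np.column_stack([val_p3d, val_area])
combined_model = LogisticRegression().fit(X_val, val_y)

# Final probability of PC for new studies, given their two module outputs.
p_combined = combined_model.predict_proba(
    np.column_stack([test_p3d, test_area]))[:, 1]
```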
Model testing and statistical analysis
The performance of the automatic CAD workflow with 3D analysis, 2D analysis, and combined analysis was tested on the nationwide test set. The automatic segmentation served as both the VOI in the 3D analysis and the ROI in the 2D analysis. For each analysis, the cutoff for classifying a patient as having PC was selected as the threshold corresponding to the point with the highest Youden index (i.e., sensitivity + specificity − 1) on the ROC curve plotted from the outputs of that analysis for all patients in the validation set. Model performance was assessed with ROC curves and the associated AUCs. Sensitivity, specificity, and accuracy were reported with exact confidence intervals (CIs) calculated from the binomial distribution. For comparisons between groups, the Fisher exact test and the Mann–Whitney U test were used for categorical and continuous variables, respectively. AUCs of the various analyses were compared using the pROC package in R [16, 17] with DeLong's method for paired ROC curves.
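The cutoff selection by the Youden index can be sketched as follows. The study reports using R (pROC) for the AUC comparisons; the use of scikit-learn here, and the variable names, are assumptions for illustration only.

```python
# Sketch of Youden-index cutoff selection on the validation set.
import numpy as np
from sklearn.metrics import roc_curve

# val_y: validation-set labels (1 = PC); val_scores: outputs of one analysis
# (e.g., the combined-analysis probability). Loading is not shown.
fpr, tpr, thresholds = roc_curve(val_y, val_scores)
youden = tpr - fpr                       # sensitivity + specificity - 1
cutoff = thresholds[np.argmax(youden)]   # operating point applied to the test set

# Binary prediction for the test set at the selected cutoff.
test_pred = (test_scores >= cutoff).astype(int)
```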