Study design
A multi-vendor, retrospective clinical validation study comparing physician performance before and after use of the CAD was conducted to evaluate the capability of the CAD to assist physicians in detecting lung cancers on chest radiographs. Readers of varying experience levels and specializations were included to determine whether use of this model on routinely collected radiographs could benefit general physicians. This CAD is commercially available in Japan. The Osaka City University Ethics Board reviewed and approved the protocol of the present study. Because the chest radiographs used in the study had been acquired during daily clinical practice, the need for informed consent was waived by the ethics board. This article was prepared in compliance with the STARD checklist [25].
Datasets
To evaluate the AI-based CAD, posteroanterior chest radiographs were retrospectively collected. Chest radiographs with lung cancers were consecutively collected from patients who had subsequently been surgically diagnosed with lung cancer between July 2017 and June 2018 at Osaka City University Hospital, which provides secondary care. The corresponding chest CT images, taken within 14 days of the radiograph, were also collected. Chest radiographs with no findings were consecutively collected from patients with no nodule/mass finding on chest CT taken within 14 days of the radiograph at the same hospital. Detailed criteria are shown in Additional_File_1. Because the study included patients visiting our institution for the first time, there was no patient overlap among the datasets. Radiographs were taken using a DR CALNEO C 1417 Wireless SQ (Fujifilm Medical), DR AeroDR1717 (Konica Minolta), or DigitalDiagnost VR (Philips Medical Systems).
Eligibility criteria and ground truth labelling
The eligibility criteria for the radiographs were as follows: (1) Mass lesions larger than 30 mm in size were excluded. (2) Metastatic lung cancer that was not primary to the lung was excluded. (3) Lung cancers showing anything other than nodular lesions on the radiograph were excluded. (4) Nodules in the chest radiographs were annotated with bounding boxes, with reference to the chest CT images, by two board-certified radiologists with six years (D.U.) and five years (A.S.) of experience interpreting chest radiographs. Ground-glass nodules with a diameter of less than 5 mm were excluded even if they were visible on CT, as they are not considered visible on chest radiographs. When the annotating radiologists disagreed, consensus was reached by discussion. Chest radiographs with lung cancer presenting as nodules, together with their bounding boxes, were combined with the normal chest radiographs to form the test dataset.
The artificial intelligence-based computer-assisted detection model
The AI-based CAD used in this study is EIRL Chest X-ray Lung nodule (LPIXEL Inc.), commercially available in Japan as of August 2020 as a screening device for detecting primary lung cancer. The CAD was developed based on an encoder-decoder network, a segmentation technique in deep learning (DL). The CAD was configured to display bounding boxes on all areas of suspected cancer in a radiograph. Internally, the CAD segments the areas suspected of being cancer on the chest radiograph, and the maximum horizontal and vertical diameters of each segmented area are displayed as a bounding box.
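This mask-to-box conversion can be illustrated with a minimal sketch in R, assuming the segmentation output is available as a binary matrix; the function and variable names are illustrative and are not taken from the CAD software.

```r
# Minimal sketch (assumption): derive a bounding box from a binary
# segmentation mask, i.e. a matrix in which 1 marks pixels of a
# suspected area. Names are illustrative, not from the CAD software.
mask_to_bbox <- function(mask) {
  idx <- which(mask == 1, arr.ind = TRUE)  # row/column indices of segmented pixels
  if (nrow(idx) == 0) return(NULL)         # no suspected area in this image
  list(
    x_min = min(idx[, 2]), x_max = max(idx[, 2]),  # maximum horizontal extent
    y_min = min(idx[, 1]), y_max = max(idx[, 1])   # maximum vertical extent
  )
}
```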
Reader performance test
To evaluate the capability of the CAD to assist physicians, a reader performance test comparing physician performance before and after use of the CAD was conducted. This CAD is certified as medical software for use by physicians as a second opinion; that is, physicians first read a chest radiograph without the CAD and then check the CAD output to make a final diagnosis. A total of eighteen readers (nine general physicians and nine radiologists from nine medical institutions) each interpreted the test dataset. The readers had not previously interpreted the same radiographs, did not know the ratio of malignant to normal cases, and were not given clinical information regarding the radiographs. The process was double-blinded for both the examiners and the reading physicians.
The study protocol was as follows: (1) Each reader was individually trained with 30 radiographs outside the test dataset to familiarize them with the evaluation criteria and the use of the CAD. (2) The readers interpreted the radiographs without using the AI-based CAD. If a reader concluded that there was a nodule in the image, the lesion was annotated with a bounding box on the radiograph. Because the model was designed to produce bounding boxes on all areas considered positive, we instructed the readers to provide as many bounding boxes as they deemed necessary. (3) The CAD was then applied to the radiograph. (4) The reader interpreted the radiograph again, referring to the output of the CAD. If the reader changed their opinion, they added new annotations or deleted previous ones. (5) The boxes annotated by the reader before and after use of the AI-based CAD were judged correct if their overlap with a ground-truth lesion, measured by the intersection over union (IoU), was 0.3 or higher. This value was chosen to meet a stricter standard, based on the results of previous studies (Supplementary methods in Additional_File_1).
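The IoU criterion is the area of intersection of the reader's box and the ground-truth box divided by the area of their union. The following is a minimal sketch in R of how this criterion could be computed for two axis-aligned boxes; the representation of a box as a list with x_min/x_max/y_min/y_max fields is an assumption for illustration.

```r
# Minimal sketch (assumption): IoU between two axis-aligned boxes, each a
# list with x_min, x_max, y_min, y_max in pixel coordinates.
box_area <- function(b) max(0, b$x_max - b$x_min) * max(0, b$y_max - b$y_min)

iou <- function(a, b) {
  inter <- list(x_min = max(a$x_min, b$x_min), x_max = min(a$x_max, b$x_max),
                y_min = max(a$y_min, b$y_min), y_max = min(a$y_max, b$y_max))
  inter_area <- box_area(inter)
  union_area <- box_area(a) + box_area(b) - inter_area
  if (union_area == 0) return(0)
  inter_area / union_area
}

# A reader annotation was judged correct when iou(annotation, ground_truth) >= 0.3.
```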
Statistical analysis
To evaluate the case-based performance of the readers and the CAD, the accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were evaluated. A lung cancer patient with an annotation whose IoU with a ground-truth lesion on the chest radiograph was 0.3 or higher was defined as a true positive (TP) case, and a lung cancer patient with annotations whose IoU with the ground-truth lesion was less than 0.3 was defined as a false negative (FN) case. A non-lung cancer case with no annotations on the chest radiograph was defined as a true negative (TN) case, and a non-lung cancer case with one or more annotations on the chest radiograph was defined as a false positive (FP) case.
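Given the counts of TP, FP, TN, and FN cases defined above, the case-based metrics follow the standard definitions; the following is a minimal sketch in R, with placeholder counts rather than study results.

```r
# Minimal sketch: case-based metrics from the TP/FP/TN/FN case counts
# defined above. The arguments are placeholder counts, not study results.
case_metrics <- function(tp, fp, tn, fn) {
  c(accuracy    = (tp + tn) / (tp + fp + tn + fn),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}
```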
To evaluate the lesion-based performance of the readers and the CAD, we also determined the mean false positive indications per image (mFPI), defined as the total number of false positive (FP) lesions divided by the total number of images. Annotated lesions were defined as FP if they had an IoU of less than 0.3 with a ground-truth lesion. All annotations on a chest radiograph without lung cancer were defined as FP lesions.
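A minimal sketch of this computation, reusing the iou() helper sketched above, is shown below; the data layout (per-image lists of reader boxes and ground-truth boxes) is an assumption for illustration.

```r
# Minimal sketch (assumption): count FP annotations per image and compute the
# mFPI, reusing the iou() helper sketched above. reader_boxes and truth_boxes
# are lists with one element per image, each element a (possibly empty) list
# of bounding boxes; this layout is illustrative.
count_fp <- function(annotations, truths) {
  # an annotation is FP if it does not reach IoU >= 0.3 with any ground-truth
  # lesion; on a radiograph without lung cancer, every annotation counts as FP
  sum(vapply(annotations,
             function(a) !any(vapply(truths, function(t) iou(a, t) >= 0.3,
                                     logical(1))),
             logical(1)))
}

mfpi <- function(reader_boxes, truth_boxes) {
  sum(mapply(count_fp, reader_boxes, truth_boxes)) / length(reader_boxes)
}
```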
These definitions are visually represented in Additional_File_2. To assess the improvement in readers’ performance metrics for the detection of lung nodules attributable to the CAD, we determined the metrics for cases with and without the CAD using generalized estimating equations [26,27,28]. For each performance metric, the value with the CAD was divided by the value without the CAD to obtain the improvement ratio. Statistical inferences were performed at a two-sided 5% significance level. Reader decisions before and after referencing the CAD output were counted to evaluate the effect of the CAD. Two of the authors (D.U. and D.K.) performed all analyses using R, version 3.6.0.
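As an illustration of the GEE analysis, the following is a minimal sketch in R using the geepack package, assuming a long-format data frame of reader-level case results; the variable names (correct, with_cad, reader) and the exchangeable working correlation structure are assumptions, not specifications from the study protocol.

```r
# Minimal sketch (assumption): GEE comparison of case-level correctness with
# and without the CAD, clustering repeated readings within readers.
# Variable names and the working correlation structure are illustrative.
library(geepack)

fit <- geeglm(correct ~ with_cad,   # correct: 1 if the case was classified correctly
              id     = reader,      # cluster the repeated readings of each reader
              data   = readings,    # long format: one row per reader-case-session
              family = binomial,
              corstr = "exchangeable")
summary(fit)

# The improvement ratio for each metric is then the value with the CAD divided
# by the value without the CAD.
```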