Support vector machine model for diagnosis of lymph node metastasis in gastric cancer with multidetector computed tomography: a preliminary study
BMC Cancer volume 11, Article number: 10 (2011)
Lymph node metastasis (LNM) of gastric cancer is an important prognostic factor for long-term survival. However, the imaging techniques commonly used for the stomach cannot satisfactorily assess lymph node status in gastric cancer, because they do not achieve high sensitivity and high specificity at the same time. As a machine-learning method, the support vector machine (SVM) has the potential to address this complex problem.
The institutional review board approved this retrospective study. We included 175 consecutive patients with gastric cancer who underwent multidetector computed tomography (MDCT) before surgery. On the CT images we evaluated six tumor and lymph node indicators that reflect the biological behavior of gastric cancer: serosal invasion, tumor classification, tumor maximum diameter, number of lymph nodes, maximum lymph node size and lymph node station. Univariate analysis was used to assess the relationship between each of the six image indicators and LNM. An SVM model was built with these indicators as inputs. The output was whether the patient's lymph node metastasis status was positive or negative, as confirmed by surgery and histopathology. A standard machine-learning technique, k-fold cross-validation (5-fold in our study), was used to train and test the SVM models. We evaluated the diagnostic capability of the SVM models for lymph node metastasis with receiver operating characteristic (ROC) curves. For comparison, the radiologist classified the lymph node metastasis status of patients using maximum lymph node size on CT images as the criterion. We compared the areas under the ROC curves (AUC) of the radiologist and the SVM models.
Of the 175 patients, 134 had lymph node metastasis and 41 did not. All six image indicators differed significantly between the LNM-negative and LNM-positive groups. The mean sensitivity, specificity and AUC of the SVM models with 5-fold cross-validation were 88.5%, 78.5% and 0.876, respectively. In contrast, the radiologist, classifying lymph node metastasis by maximum lymph node size, achieved only 63.4%, 75.6% and 0.757. Each SVM model of the 5-fold cross-validation performed significantly better than the radiologist.
Based on the biological behavior information of gastric cancer on MDCT images, the SVM model can help diagnose lymph node metastasis preoperatively.
Gastric cancer is one of the leading causes of cancer-related deaths worldwide. Lymph node status is an important prognostic factor for long-term survival. The TNM staging system of the American Joint Committee on Cancer (AJCC) is now widely accepted. The 5-year survival rate after surgery was 86.1% for patients in stage N0, whereas it dropped to 58.1%, 23.3% and 5.9% for patients in stages N1, N2 and N3, respectively.
At present, many imaging techniques are used to assess gastric cancer, including abdominal ultrasound, endoscopic ultrasound (EUS), multi-slice spiral CT, conventional MRI, and FDG-PET. However, these imaging methods cannot reliably confirm or exclude the presence of lymph node metastasis. A meta-analysis showed that the average sensitivity and specificity for determining lymph node (LN) metastasis were 39.9% and 81.8% for abdominal ultrasound, 70.8% and 84.6% for EUS, 80.0% and 77.8% for MDCT, 68.8% and 75.0% for conventional MRI, 34.3% and 93.2% for FDG-PET, and 54.7% and 92.2% for FDG-PET/CT. None of these imaging tools, used alone, can satisfactorily assess lymph node status in gastric cancer. The reason is that LNM is diagnosed mainly by lymph node size, with diagnostic thresholds ranging from 5 mm to 10 mm. However, large lymph nodes may be caused by inflammation, and small lymph nodes may be metastatic. Many studies have shown that LN metastasis in gastric cancer is associated with tumor size, depth of invasion, histological type and pathological lymphatic involvement [5–8]. No suitable method yet exists for combining lymph node size with these multiple factors in a comprehensive analysis. How to integrate the complex factors affecting lymph nodes and improve the accuracy of diagnosing LNM is the topic of our study.
In the past decade, machine-learning methods, complementary to traditional statistical methods, have been used to predict complex biological phenomena. The support vector machine (SVM) belongs to a new generation of learning algorithms developed on the basis of statistical learning theory. The SVM algorithm has a strong theoretical foundation, based on the ideas of the VC (Vapnik-Chervonenkis) dimension and structural risk minimization, and it achieves satisfactory accuracy. SVM has been used in some medical applications, mainly in molecular biology and neuroimaging [10–12]. It can be used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
The purpose of this study was to use the SVM method to analyze MDCT imaging information related to the biological behavior of gastric cancer and to establish mathematical models for assessing lymph node metastasis preoperatively.
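For reference, the two-category training problem described above is usually written in the standard soft-margin form below. This is the textbook formulation, not something specific to this study: the symbols $w$, $b$, $\xi_i$, $C$ and the feature map $\phi$ are introduced here for illustration, with training examples $x_i$ and labels $y_i \in \{-1, +1\}$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,n, \\
\text{prediction:}\quad & \hat{y}(x) = \operatorname{sign}\bigl(w^{\top}\phi(x) + b\bigr).
\end{aligned}
```

The first term enlarges the margin between the two categories, and the penalty weight $C$ controls how strongly training examples that fall on the wrong side of the margin are penalized.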
This retrospective study was approved by our institutional review board. Between April 2006 and September 2008, 368 consecutive patients with newly diagnosed gastric cancer underwent preoperative contrast-enhanced abdominal CT and then gastrectomy at our hospital. Patients who met the inclusion and exclusion criteria below were included in this study.
Inclusion criteria: the patient underwent radical gastrectomy with D2 lymph node dissection, was examined preoperatively with multi-detector row CT, and had gastric cancer confirmed by postoperative histopathology.
Exclusion criteria: the patient received preoperative neoadjuvant therapy, or distant metastasis was found at preoperative examination or during the operation.
Finally, 175 patients (125 males, 50 females; mean age, 59.8 years; range, 30-85 years) comprised our study population. We obtained informed consent from all selected patients prior to the routine clinical course of CT examinations.
MDCT was performed using a 64-detector row CT scanner (LightSpeed 64; GE Healthcare, Milwaukee, Wis). Each patient fasted for more than 8 hours before the CT examination. To enable gastric distention and reduce gastric motility, the patients received 8 g of gas-producing crystals orally and an intramuscular injection of 10 mg anisodamine 10-15 minutes before the examination. Upper abdominal unenhanced CT scans from the diaphragmatic domes to 2 cm below the lower margin of the air-distended gastric body were acquired with a collimation of 0.625 mm, 120-140 kVp, and 300-350 mAs. Subsequently, a total of 100 mL of iopromide (Ultravist; Schering, Berlin, Germany) was administered intravenously at 3 mL/sec with an automatic injector through an 18-gauge angiographic catheter inserted into an antecubital vein. Contrast-enhanced CT scans were performed in the arterial phase (30 seconds) and in the portal venous phase (70 seconds). Multi-planar reconstruction (MPR) was performed on the portal venous phase images.
Two radiologists, one with 3 years and the other with 8 years of experience in abdominal CT, analyzed the images jointly and reached consensus. In cases of disagreement, they consulted a third radiologist with 20 years of experience in abdominal CT until agreement was achieved. The six indicators were measured and counted manually on the MDCT images as follows:
Tumor maximum diameter
The diameter of the gastric cancer was measured on the axial, coronal, and sagittal MPR images, and the largest value was taken as the tumor maximum diameter.
Tumor classification
The tumor was classified on the MPR images as early gastric cancer or, for advanced cancer, by Borrmann type.
Serosal invasion
Axial and MPR images were evaluated simultaneously to determine serosal invasion. Abnormal enhancement of the entire thickened stomach wall and linear or reticular structures in the fatty layer surrounding the stomach indicated serosal invasion.
Number of lymph nodes
All visible regional gastric lymph nodes on the MDCT images were counted by group.
Maximum lymph node size
The short axis of the largest lymph node detected in CT images was measured.
Lymph nodes station
The lymph node station was determined on the MDCT images according to the Japanese classification of gastric carcinoma.
Support vector machine
The support vector machine is a supervised machine-learning technique that is widely used in pattern recognition and classification problems. The SVM algorithm performs classification by constructing a multidimensional hyperplane that optimally discriminates between two classes by maximizing the margin between the two data clusters. It achieves high discriminative power by using special nonlinear functions called kernels to transform the input space into a multidimensional space. In this study, the freely available SVM software LibSVM 2.89 was used to generate the SVM model. The inputs were the six indicators collected on the MDCT images as described above. Measurement data were entered into the SVM model directly, whereas categorical data were coded as numbers; for example, positive serosal invasion was coded as 1 and negative as -1. The output was the lymph node metastasis status of the patient, as confirmed by surgery and histopathology. A patient with one or more metastatic lymph nodes was considered LNM positive. We coded positive LNM as 1 and negative LNM as -1. We selected the RBF (radial basis function) kernel to build the model. To train and test the SVM model, we used a standard machine-learning technique called k-fold cross-validation. Because the sample size of our study was not very large, we used 5-fold cross-validation. The whole data set was divided into 5 equal and distinct subsets. Four of these subsets were combined and used for training, and the remaining subset was used for testing. This cross-validation process was repeated 5 times, allowing each subset to serve once as the test data set.
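The input coding and the fold assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the feature order, the numeric codes for tumor classification and lymph node station, the shuffling seed, and the helper names `to_libsvm_line`, `five_fold_indices` and `train_test_splits` are all hypothetical. Only the 1 / -1 coding for serosal invasion and LNM and the 5-fold scheme come from the text. The output line follows the sparse `label index:value` text format that LIBSVM reads.

```python
import random

# Hypothetical feature order; the text lists six indicators but not their order.
FEATURES = [
    "tumor_max_diameter_mm",   # measurement, entered directly
    "tumor_classification",    # categorical, numeric code assumed
    "serosal_invasion",        # 1 positive, -1 negative (as in the text)
    "lymph_node_count",        # count
    "max_lymph_node_size_mm",  # measurement, short axis
    "lymph_node_station",      # categorical, numeric code assumed
]


def to_libsvm_line(record, lnm_positive):
    """Format one patient as a LIBSVM text line: '<label> 1:<v1> 2:<v2> ...'."""
    label = "+1" if lnm_positive else "-1"
    pairs = [f"{i}:{record[name]}" for i, name in enumerate(FEATURES, start=1)]
    return " ".join([label] + pairs)


def five_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = five_fold_indices(n_samples, k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Made-up example patient (values are illustrative, not study data).
patient = {
    "tumor_max_diameter_mm": 52.0,
    "tumor_classification": 3,
    "serosal_invasion": 1,
    "lymph_node_count": 9,
    "max_lymph_node_size_mm": 8.5,
    "lymph_node_station": 2,
}
line = to_libsvm_line(patient, lnm_positive=True)
splits = list(train_test_splits(175))
```

With 175 patients, each of the five rounds holds out 35 cases for testing and trains on the remaining 140. The text does not state whether the inputs were rescaled before training, so no scaling step is shown.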
A univariate statistical analysis was performed with the SPSS/PC+ statistical software package, version 11.5 (SPSS Inc, Chicago, IL, USA), to evaluate the differences in the six imaging indicators between patients with and without LNM. The statistical methods were the independent-samples t test and the Mann-Whitney U test. P < 0.05 was considered a significant difference. The receiver operating characteristic (ROC) curve was used to evaluate the diagnostic performance of the SVM model. MedCalc software, version 11.2 (MedCalc Software, Ghent, Belgium), was used to construct and compare the ROC curves. To summarize the results, we averaged the areas under the ROC curves (AUC) of the 5-fold cross-validation, and we also calculated the mean sensitivity and specificity. For comparison with the SVM model, we constructed the ROC curve for radiologist assessment using maximum lymph node size as the criterion for classifying LNM, and calculated the sensitivity and specificity at the best cut-off point.
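The size-based ROC assessment described above can be illustrated with a small pure-Python sketch. It is an assumption-laden example rather than the procedure actually run in MedCalc: the text does not say how the "best cut-off point" was defined, so the Youden index (sensitivity + specificity - 1) is assumed here, and the function names and toy data are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Call a case positive when score >= cutoff; labels are 1 or -1."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == -1 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == -1 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff_youden(scores, labels):
    """Return the observed score that maximizes sensitivity + specificity - 1."""
    def youden(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Toy data: short-axis sizes in mm with made-up LNM labels (not study data).
sizes = [4.0, 5.5, 6.0, 7.0, 8.0, 9.5, 11.0, 14.0]
lnm = [-1, -1, 1, -1, 1, 1, 1, 1]

auc = roc_auc(sizes, lnm)
cutoff = best_cutoff_youden(sizes, lnm)
sens, spec = sensitivity_specificity(sizes, lnm, cutoff)
```

On this toy data 14 of the 15 positive/negative pairs are ranked correctly, and the selected cut-off is 8.0 mm with sensitivity 0.8 and specificity 1.0. The same functions apply unchanged to SVM decision values in place of lymph node sizes.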
Of these 175 patients, 134 had lymph node metastasis and 41 did not. The patients' clinicopathological features are detailed in Table 1. We collected the six indicators on the MDCT images. The univariate statistical analysis indicated that all six indicators (serosal invasion, tumor classification, tumor maximum diameter, number of lymph nodes, maximum lymph node size and lymph node station) differed significantly between the LNM-positive and LNM-negative groups (P < 0.001). In the LNM-positive group, the means of tumor maximum diameter, maximum lymph node size, and number of lymph nodes were 56.6 ± 19.5 mm, 10.0 ± 5.5 mm, and 12 ± 8, respectively. They were all higher than those of the LNM-negative group (Table 2).
Using maximum lymph node size to classify lymph node metastasis, the radiologist achieved an AUC of 0.757. The best cut-off point for maximum lymph node size was 7.7 mm, at which the sensitivity and specificity were only 63.4% and 75.6%. The mean sensitivity, specificity and AUC of the SVM with 5-fold cross-validation were 88.5%, 78.5% and 0.876, respectively (Table 3). The AUC of each SVM model in the 5-fold cross-validation was significantly higher than that of the radiologist (P < 0.05) (Figure 1, Table 3).
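As a quick consistency check on the radiologist's figures, the reported percentages can be mapped back to whole-patient counts using the group sizes given above (134 LNM-positive, 41 LNM-negative). The counts themselves are not reported in the text; the sketch below only searches for integers that reproduce the published percentages, and the helper name `counts_matching_percentage` is hypothetical.

```python
def counts_matching_percentage(total, reported_pct, tolerance=0.05):
    """Return every count k in 0..total whose percentage is within tolerance of reported_pct."""
    return [k for k in range(total + 1)
            if abs(100.0 * k / total - reported_pct) < tolerance]


N_POSITIVE = 134  # patients with LNM (from the text)
N_NEGATIVE = 41   # patients without LNM (from the text)

# Sensitivity 63.4% of 134 positives; specificity 75.6% of 41 negatives.
true_positives = counts_matching_percentage(N_POSITIVE, 63.4)
true_negatives = counts_matching_percentage(N_NEGATIVE, 75.6)
print(true_positives, true_negatives)  # [85] [31]
```

Each percentage is matched by exactly one integer count, so at the 7.7 mm cut-off the size criterion is consistent with 85 of 134 metastatic cases detected and 31 of 41 non-metastatic cases correctly excluded. The SVM figures are means over five folds and cannot be back-calculated in the same way.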
Lymph node metastasis affects the surgical treatment of patients with gastric cancer and is also an important prognostic factor. At present, preoperative diagnosis depends mainly on various imaging methods. The criteria for judging lymph node metastasis rely on morphological indicators, of which lymph node size is the dominant one. However, Dorfman RE et al reported that the upper limits of normal lymph node size on abdominal computed tomography varied from 6 to 11 mm, which partly overlaps with the size of malignant lymphadenopathy. Fukuya T et al showed that CT attenuation and lymph node configuration could aid in the diagnosis of malignant adenopathy. In contrast, Deutch SJ et al reported that size, location, contour and density were not helpful in distinguishing benign from malignant lymphadenopathy. The lack of reliable criteria is the main constraint on predicting lymph node metastasis preoperatively.
The biological behavior of gastric cancer reflects the histopathological expression of the tumor's malignancy and invasiveness. It affects lymph node metastasis directly or indirectly. Concrete manifestations of this biological behavior include tumor size, depth of invasion, tumor invasion of other organs, lymph node metastasis and distant metastasis. MDCT can clearly display these pathological findings. Some studies have reported that the accuracy of gastric cancer T staging with MDCT combined with 3D reconstruction was 84-89% [20, 21]. Zhang XP et al reported that, in gastric cardiac carcinoma, the number of lymph nodes detected by MDCT differed significantly between the group with lymph node metastasis and the group without. MDCT can also show the condition of other abdominal organs and the peritoneum. Therefore, MDCT imaging can accurately reflect the biological behavior of gastric cancer seen at histopathology. Univariate analysis in our study showed that all 6 indicators of gastric cancer and lymph node information on CT images were related to LNM. These biological behavior factors should therefore be considered together when predicting LNM.
Other machine-learning methods have also been used in medical studies, mainly the artificial neural network (ANN). ANN is considered an appropriate method for medical data analysis. Bollschweiler et al applied a single-layer perceptron, a kind of ANN, to predict lymph node metastasis in gastric cancer, with an accuracy of 79%. However, ANN has some disadvantages. ANN models are prone to overfitting, require lengthy development and optimization time, and are more difficult to use in the field because of their computational requirements. For these reasons, we selected the SVM model instead. SVM can produce lower prediction error than classifiers based on other methods such as artificial neural networks. Compared with ANN, SVM may have the same or even better predictive ability [27, 28]. At present, there are few reports on the application of SVM to lymph node metastasis in gastric cancer. As a preliminary study, our results indicate that the SVM model has better diagnostic capability for LNM than the traditional LN size criterion, with an AUC indicating good diagnostic power. With further improvement, SVM may become an effective method for predicting the lymph node staging of gastric cancer.
Based on the biological behavior information of gastric cancer on MDCT images, the SVM model can help diagnose lymph node metastasis preoperatively.
Tunaci Mehtap: Carcinoma of stomach and duodenum: radiologic diagnosis and staging. Eur J Radiol. 2002, 42 (3): 181-92. 10.1016/S0720-048X(02)00035-9.
Kwee RM, Kwee TC: Imaging in assessing lymph node status in gastric cancer. Gastric Cancer. 2009, 12: 6-22. 10.1007/s10120-008-0492-5.
Greene FL, Balch CM, Page DL, Haller DG, Fleming ID, Morrow M, Fritz AG: AJCC manual of staging of cancer. 2002, New York, NY: Springer-Verlag, 6
Zhang XF, Huang CM, Lu HS, Wu XY, Wang C, Guang GX, Zhang JZ, Zheng CH: Surgical treatment and prognosis of gastric cancer in 2613 patients. World J Gastroenterol. 2004, 10: 3405-8.
Fang Y, Zhao DB, Zhou JG, Cai JQ: Multivariate analysis of risk factors of lymph node metastasis in early gastric cancer. Zhonghua Wei Chang Wai Ke Za Zhi. 2009, 12 (2): 130-2.
Shen L, Huang Y, Sun M, Xu H, Wei W, Wu W: Clinicopathological features associated with lymph node metastasis in early gastric cancer: analysis of a single-institution experience in China. Can J Gastroenterol. 2009, 23 (5): 353-6.
Wu CY, Chen JT, Chen GH, Yeh HZ: Lymph node metastasis in early gastric cancer: a clinicopathological analysis. Hepatogastroenterology. 2002, 49 (47): 1465-8.
Nasu J, Nishina T, Hirasaki S, Moriwaki T, Hyodo I, Kurita A, Nishimura R: Predictive factors of lymph node metastasis in patients with undifferentiated early gastric cancers. J Clin Gastroenterol. 2006, 40 (5): 412-5. 10.1097/00004836-200605000-00009.
Pirooznia M, Deng Y: SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data. BMC Bioinformatics. 2006, 7 (Suppl 4): S25. 10.1186/1471-2105-7-S4-S25.
Klöppel S, Stonnington CM, Barnes J, Chen F, Chu C, Good CD, Mader I, Mitchell LA, Patel AC, Roberts CC, Fox NC, Jack CR, Ashburner J, Frackowiak RS: Accuracy of dementia diagnosis: a direct comparison between radiologists and a computerized method. Brain. 2008, 131 (Pt 11): 2969-74.
Das K, Giesbrecht B, Eckstein MP: Predicting variations of perceptual performance across individuals from neural activity using pattern classifiers. Neuroimage. 2010,
Mourão-Miranda J, Bokde AL, Born C, Hampel H, Stetter M: Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage. 2005, 28 (4): 980-95.
Kumano S, Murakami T, Kim T, Hori M, Iannaccone R, Nakata S, Onishi H, Osuga K, Tomoda K, Catalano C, Nakamura H: T staging of gastric cancer: role of multi-detector row CT. Radiology. 2005, 237 (3): 961-6. 10.1148/radiol.2373041380.
Japanese Gastric Cancer Association: Japanese Classification of Gastric Carcinoma -2nd English Edition -. Gastric Cancer. 1998, 1 (1): 10-24.
Yu W, Liu T, Valdez R, Gwinn M, Khoury MJ: Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. BMC Med Inform Decis Mak. 2010, 10 (16):
Chang CC, Lin CJ: LIBSVM --A Library for Support Vector Machines. 2009, [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
Dorfman RE, Alpern MB, Gross BH, Sandler MA: Upper abdominal lymph nodes: criteria for normal size determined with CT. Radiology. 1991, 180 (2): 319-22.
Fukuya T, Honda H, Hayashi T, Kaneko K, Tateshi Y, Ro T, Maehara Y, Tanaka M, Tsuneyoshi M, Masuda K: Lymph-node metastases: efficacy for detection with helical CT in patients with gastric cancer. Radiology. 1995, 197 (3): 705-11.
Deutch SJ, Sandler MA, Alpern MB: Abdominal lymphadenopathy in benign diseases: CT detection. Radiology. 1987, 163 (2): 335-8.
Kim AY, Kim HJ, Ha HK: Gastric cancer by multidetector row CT: preoperative staging. Abdom Imaging. 2005, 30 (4): 465-72. 10.1007/s00261-004-0273-5.
Chen CY, Hsu JS, Wu DC, Kang WY, Hsieh JS, Jaw TS, Wu MT, Liu GC: Gastric cancer: preoperative local staging with 3D multi-detector row CT--correlation with surgical and histopathologic results. Radiology. 2007, 242 (2): 472-82. 10.1148/radiol.2422051557.
Zhang XP, Cui YH, Tang L: Research for the correlative indicators of diagnosing lymph nodes metastasis by helical CT in gastric cardiac carcinoma. Chinese Journal of Practical Surgery. 2007, 27 (1):
Patel JL, Goyal RK: Applications of artificial neural networks in medical science. Curr Clin Pharmacol. 2007, 2 (3): 217-26. 10.2174/157488407781668811.
Bollschweiler EH, Mönig SP, Hensler K, Baldus SE, Maruyama K, Hölscher AH: Artificial Neural Network for Prediction of Lymph Node Metastasis in Gastric Cancer: A Phase II Diagnostic Study. Ann Surg Oncol. 2004, 11 (5): 506-11. 10.1245/ASO.2004.04.018.
Ahmed Farid: Artificial neural networks for diagnosis and survival prediction in colon cancer. Mol Cancer. 2005, 4: 29. 10.1186/1476-4598-4-29.
Byvatov E, Schneider G: Support vector machine applications in bioinformatics. Appl Bioinformatics. 2003, 2 (2): 67-77.
McQuisten KA, Peek AS: Comparing artificial neural networks, general linear models and support vector machines in building predictive models for small interfering RNAs. PLoS One. 2009, 4 (10): e7522. 10.1371/journal.pone.0007522.
Lee HJ, Hwang SI, Han SM, Park SH, Kim SH, Cho JY, Seong CG, Choe G: Image-based clinical decision support for transrectal ultrasound in the diagnosis of prostate cancer: comparison of multiple logistic regression, artificial neural network, and support vector machine. Eur Radiol. 2010, 20 (6): 1476-8. 10.1007/s00330-009-1686-x. Epub 2009 Dec 17
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2407/11/10/prepub
We thank Jie Li, Yong Cui, Li-Ping Qi, Xiao-Ting Li for editorial support and Jun Shan, Ning Wang, Ying Li, Shun-Yu Gao for reviewing the manuscript.
Project supported by the National Natural Science Foundation of China (Grant No. 30970825) and Beijing Municipal Natural Science Foundation (No. 7092020).
The authors declare that they have no competing interests.
XPZ carried out the study design and data acquisition. ZLW carried out the data interpretation and manuscript editing. LT, KC carried out the manuscript drafting for intellectual content. YSS, YG participated in the design of the study and the statistical analysis. All authors read and approved the final manuscript.