Table 1 Performance of Random Forest classifier for HMB boundaries relative to other genomic regions

From: Distinct genomic and epigenomic features demarcate hypomethylated blocks in colon cancer

 

| Comparison | Sensitivity | Specificity | F-measure | AUC | Size of data set |
|---|---|---|---|---|---|
| Boundary vs. Inside | 0.90 | 0.89 | 0.90 | 0.96 | 41,425 |
| Boundary vs. Outside | 0.84 | 0.81 | 0.83 | 0.91 | 41,430 |
| Boundary vs. Promoter | 0.98 | 0.97 | 0.98 | 0.99 | 31,051 |
| Boundary vs. Promoter (SVM) | 0.97 | 0.97 | 0.97 | 0.99 | 31,051 |

  1. ‘Inside’ and ‘outside’ refer to regions inside or outside HMBs, respectively. These regions were selected to match the length and CG content of HMB boundaries (see Methods). The last row contains the results of a Support Vector Machine (SVM) classifier that was used to replicate the Random Forest result on the HMB boundary vs. promoter region classification. In all cases, 70 % of the data was used for training and 30 % for testing. Sensitivity, specificity and F-measure are reported at the optimal F-measure.
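
As an illustration of how the columns relate, the following minimal sketch computes sensitivity, specificity and F-measure at the score threshold that maximizes the F-measure. It assumes that "optimal F-measure" refers to such a threshold sweep over classifier scores, which the table note does not spell out. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def metrics_at_threshold(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, F-measure) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, specificity, f_measure


def best_f_measure_point(
    labels: List[int], scores: List[float]
) -> Tuple[float, float, float, float]:
    """Sweep every observed score as a threshold; keep the one with the highest F-measure."""
    best = None
    for t in sorted(set(scores)):
        sens, spec, f = metrics_at_threshold(labels, scores, t)
        if best is None or f > best[3]:
            best = (t, sens, spec, f)
    return best


# Toy data (hypothetical): 1 = boundary region, 0 = comparison region.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.95, 0.90, 0.70, 0.65, 0.60, 0.40, 0.30, 0.10]

threshold, sens, spec, f = best_f_measure_point(toy_labels, toy_scores)
print(f"threshold={threshold:.2f} sensitivity={sens:.2f} "
      f"specificity={spec:.2f} F-measure={f:.2f}")
```

In this sketch the AUC is not computed, because it summarizes performance over all thresholds rather than at a single operating point.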