
Table 3 Number of transcripts after steps of filtration and time to run ML algorithms on them

From: Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning

 

| Datasets | Known protein biomarkers | | | All data | | |
|---|---|---|---|---|---|---|
| Steps | All transcripts | Protein coding | Non-coding | All transcripts | Protein coding | Non-coding |
| Number of transcripts after expression filter; biomarkers no filter, all data > 10,000 | 410 | 262 | 149 | 16,173 | 13,688 | 2724 |
| Number of highly correlated features (transcripts); correlation cutoff > 0.75 | 177 | 98 | 37 | 12,047 | 9866 | 1970 |
| Number of transcripts after removing highly correlated features | 234 | 165 | 113 | 4127 | 3823 | 755 |
| Time to run (in seconds) | | | | | | |
| RF | 10.77 | 8.09 | 6.44 | 196.25 | 169.31 | 32.60 |
| NB | 12.34 | 9.38 | 6.63 | 297.81 | 280.27 | 46.05 |
| KNN | 1.03 | 1.10 | 1.11 | 5.63 | 5.62 | 1.78 |
| SVM | 2.25 | 1.07 | 1.05 | 7.51 | 7.48 | 2.72 |
| NNET | 72.37 | 35.84 | 20.12 | 71,044.53 | 56,114.75 | 3125.74 |
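
The table does not say how the highly correlated transcripts were identified and removed. As a minimal sketch only, assuming Pearson correlation on expression values and a greedy rule that drops any feature whose absolute correlation with an already-kept feature exceeds the 0.75 cutoff, the filtering step could look like the following. The function names `pearson` and `drop_highly_correlated`, the transcript names, and the toy values are all hypothetical and are not taken from the article.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_highly_correlated(features, cutoff=0.75):
    """Return (kept, dropped) feature names.

    Greedy rule (an assumption, not the article's stated method): walk the
    features in insertion order and drop any feature whose absolute
    correlation with an already-kept feature exceeds the cutoff.
    """
    kept, dropped = [], []
    for name, values in features.items():
        if any(abs(pearson(values, features[k])) > cutoff for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Toy expression values: hypothetical transcripts, five samples each.
toy = {
    "tx_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tx_b": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly proportional to tx_a
    "tx_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to tx_a
}
kept, dropped = drop_highly_correlated(toy)
print("kept:", kept)
print("dropped:", dropped)
```

With a greedy rule like this one, the result depends on the order in which features are visited, so a different removal strategy could keep a different subset at the same cutoff.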