Fig. 1 | BMC Cancer

Fig. 1

From: Reference-free transcriptome signatures for prostate cancer prognosis


Uniform procedure for signature inference based on k-mer or gene expression. a The discovery matrix is built from normalized k-mer counts or gene expression counts. Samples are labelled by their outcome (risk or relapse) status. Normalization is performed as counts per billion for k-mers or counts per million for genes. b Features are ranked according to their F1-score, computed by cross-validation using a Bayes classifier (BC). The top 500 features are retained. c Among the top 500, features are selected using lasso logistic regression combined with stability selection. A logistic regression is tuned on the selected features. d Features from the signature are measured in the count matrix from an independent dataset. e Performance of the signature (selected features + tuned logistic regression) is evaluated using the area under the ROC curve (AUC) on the validation dataset. To deal with the specificity of k-mer matrices, extra steps a’ and d’ are introduced: a’ the k-mer matrix is converted into a much smaller contig matrix by merging overlapping k-mers with compatible counts. d’ k-mers are extracted from the signature contigs and their counts in the validation matrix are aggregated.
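
The normalization in step a can be illustrated with a minimal sketch. It assumes that each sample's raw counts are divided by that sample's total count and rescaled to one million (genes) or one billion (k-mers); the caption does not state the exact denominator, so this is an assumption. The function name `normalize_counts` and the toy data are hypothetical and are not taken from the article.

```python
# Hypothetical helper: rescale raw counts per sample to a fixed total.
# Assumption: the denominator is the sum of all counts in that sample.

CPM = 1_000_000        # counts per million, used for genes
CPB = 1_000_000_000    # counts per billion, used for k-mers


def normalize_counts(counts, scale):
    """Return counts rescaled so that each sample sums to `scale`."""
    normalized = {}
    for sample, feature_counts in counts.items():
        total = sum(feature_counts.values())
        if total == 0:
            # An empty sample stays at zero rather than dividing by zero.
            normalized[sample] = {f: 0.0 for f in feature_counts}
            continue
        normalized[sample] = {
            f: c * scale / total for f, c in feature_counts.items()
        }
    return normalized


# Toy gene-count matrix: sample -> feature -> raw count (made-up values).
gene_counts = {
    "sample_1": {"geneA": 30, "geneB": 70},
    "sample_2": {"geneA": 5, "geneB": 15},
}
gene_cpm = normalize_counts(gene_counts, CPM)
```

The same function applies to a k-mer count matrix by passing `CPB` as the scale; the larger scale keeps values readable when a sample's counts are spread over a very large number of distinct k-mers.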
