Translation and face validity
The authors followed the European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Group Translation Procedure and the recommendations of Guillemin et al. [18, 19]. Briefly, the original questionnaire was translated into European Portuguese and culturally adapted by two healthcare professionals who were fluent in English and aware of the purpose of the translation. This draft version was translated back into English by two English translators and compared with the original questionnaire by the investigators and the original VES-13 authors, to assess comprehension of the applied concepts and wording. No problems were identified at this stage.
Face validity of the translated questionnaire was assessed by six medical oncologists at our GI Cancer Clinic. They were asked to review the original and translated questionnaires and to rate each question for comprehension and accuracy of the translation, using a 10-point numerical rating scale (1 - poorly clear, to 10 - completely clear).
Cancer patients admitted to our Comprehensive Cancer Centre, aged ≥65 years, with histologically confirmed GI cancer, Portuguese fluency, and no history of previous systemic therapy for cancer were eligible for both the pilot and the prospective validation cohorts. Patients with cognitive impairment or confusional syndrome, those who were illiterate, and foreign individuals were excluded from the pilot study.
This work was approved by the ethics committee of the “Instituto Português de Oncologia do Porto”, Portugal, the institution where it was conducted, and all subjects gave their informed consent.
Pilot study: cultural adaptation and test-retest reliability
The questionnaire was administered by one of the investigators to consecutive eligible patients (first pilot n = 20, second pilot n = 22), who were asked to rate each question for comprehension using the previously described 10-point numerical rating scale. Each patient completed the VES-13 questionnaire twice, within an interval of 1 to 30 days. A question was to be reviewed if it received any single rating ≤5 (corresponding to reasonably clear) or if the interviewer noted any comprehension problem. Concerns regarding question 3f necessitated a second pilot after adaptation of the questionnaire.
Prospective cohort study: construct and criterion validity
After completion of the pilot study, the European Portuguese version of VES-13 was prospectively applied to a cohort of 200 patients to assess internal consistency and construct and criterion validity [20–22]. To assess construct validity, we selected EQ-5D-5L as the comparator. EQ-5D-5L is a generic health-related quality of life questionnaire that includes five dimensions and a visual analogue scale (VAS) assessing general health. Each dimension is recorded in five severity levels (no problems, slight, moderate, severe and extreme problems, graded from 1 to 5, respectively). The VAS records an individual’s rating of their current health-related quality of life (ranging from 0 – worst imaginable health, to 100 – best imaginable health). Predefined hypotheses about relationships between the dimensions of EQ-5D-5L and VES-13 were tested to assess construct validity. To assess criterion validity, we used the clinical impression of a trained medical oncologist, blinded to the responses on VES-13, regarding the patient’s vulnerability and performance status (PS). Each medical oncologist was instructed to consider the Eastern Cooperative Oncology Group (ECOG) classification and the Karnofsky performance status (KPS) scale, and to categorize each patient into one of the following groups: fit, vulnerable or frail. Performance status was estimated according to the exact definitions of the ECOG PS and KPS scales (ECOG PS ranging from 0 – able to carry on all pre-disease performance without restriction, to 5 – dead; KPS ranging from 100 – normal, no complaints, to 0 – dead). Correlations between these criteria and VES-13 were evaluated to assess criterion validity.
Patients’ demographic and clinical characteristics were summarized using descriptive statistics as appropriate. Numerical variables were described with means and standard deviations or with medians and interquartile ranges, depending on the asymmetry of their distributions. Categorical variables were described as absolute and relative (percentage) frequencies. Performance status was categorized as follows: ECOG ≤1 and ≥2, and KPS 100–80 and ≤70. The Charlson comorbidity index (CCI) was used to estimate the burden of comorbid conditions. When testing hypotheses about continuous variables, Student’s t-tests were used to compare two groups when normality assumptions were confirmed, and Mann–Whitney U tests were used when normality could not be assumed. When testing hypotheses about categorical variables, the Chi-square test and Fisher’s exact test were used as appropriate.
The test-retest reliability of the Portuguese version of VES-13 was assessed in the pilot study by calculating the Kappa statistic for each item, to assess agreement between test and retest scores. This index takes values between −1 and 1; values near 1 indicate high test-retest reliability. The categorization by Landis and Koch was used for interpretation of κ values (<0.00 – no agreement; 0.01-0.20 – slight; 0.21-0.40 – fair; 0.41-0.60 – moderate; 0.61-0.80 – substantial and 0.81-1.00 – almost perfect agreement). Additionally, we calculated the test-retest reliability coefficient (correlation coefficient) for the VES-13 total scale score.
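As an illustration of the agreement statistic described above, the following minimal Python sketch computes Cohen's kappa for one item from paired test and retest ratings and maps it to the bands quoted in the text. It is not the SPSS procedure used in the study. The function names, the example ratings, and the handling of values that fall between the quoted bands (for example exactly 0.00) are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(test, retest):
    """Cohen's kappa for two paired lists of categorical ratings."""
    if len(test) != len(retest) or not test:
        raise ValueError("ratings must be paired and non-empty")
    n = len(test)
    # Proportion of patients giving the same answer on both occasions.
    observed = sum(a == b for a, b in zip(test, retest)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    test_counts, retest_counts = Counter(test), Counter(retest)
    expected = sum(test_counts[c] * retest_counts[c] for c in test_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Descriptor for kappa using the bands quoted in the text.

    Assumption: values in the gaps between bands (e.g. exactly 0.00)
    are assigned to the lower band.
    """
    if kappa <= 0:
        return "no agreement"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical answers of eight patients to one binary item.
test_answers = [1, 1, 1, 0, 0, 0, 0, 0]
retest_answers = [1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(test_answers, retest_answers)
print(round(kappa, 3), landis_koch(kappa))
```

In this made-up example seven of eight answers agree, chance agreement is 0.5625, and the resulting kappa falls in the "substantial" band.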
Internal consistency of the translated VES-13 items was explored by analyzing the inter-item and item-total correlation matrices and by calculating Cronbach’s alpha coefficients. This coefficient ranges from 0 to 1, and larger values indicate higher internal consistency. As recommended by Nunnally and Bernstein, alphas ≥0.70 were considered adequate. Cronbach’s alpha was also estimated with each item in turn deleted from the scale, to identify which items affected the questionnaire’s internal consistency the most.
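The internal consistency calculation can be sketched as follows. This is a minimal Python illustration of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), and of the "alpha if item deleted" estimate. The function names and the example scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, all in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total scale score of each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed in turn (needs >= 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores of five respondents on three items.
item_a = [1, 2, 3, 4, 5]
item_b = [1, 2, 3, 4, 5]
item_c = [3, 1, 4, 1, 5]  # poorly related to the other two items
scale = [item_a, item_b, item_c]

print(round(cronbach_alpha(scale), 3))
print([round(a, 3) for a in alpha_if_item_deleted(scale)])
```

An item whose deletion raises alpha above the full-scale value, as the third item does in this made-up example, is a candidate for closer review.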
Construct and criterion validity were assessed by calculating Spearman’s correlation coefficient between VES-13 and each EQ-5D-5L dimension, clinical judgment and performance status. Interpretation of correlation coefficients was based on the quantitative criteria and qualitative descriptors defined by Cohen (low correlations for coefficients with absolute value between 0.10 and 0.29; moderate correlations for coefficients between 0.30 and 0.49; and high correlations for coefficients between 0.50 and 1.00).
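For readers who wish to reproduce this step outside SPSS, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks (so that ties are handled) and attaches the descriptor quoted above. The function names, the "negligible" label for coefficients below 0.10, and the example values are assumptions introduced for illustration.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


def cohen_descriptor(rho):
    """Qualitative label for |rho| using the thresholds quoted in the text."""
    size = abs(rho)
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "negligible"  # assumption: label not given in the text


# Hypothetical VES-13 totals and one EQ-5D-5L dimension level per patient.
ves13_total = [1, 2, 3, 4, 5]
eq5d_level = [5, 6, 7, 8, 7]
rho = spearman_rho(ves13_total, eq5d_level)
print(round(rho, 3), cohen_descriptor(rho))
```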
Exploratory factor analysis of the European Portuguese version of VES-13 was performed using principal components analysis for factor extraction. The hypothesis of unidimensionality of VES-13 was assessed. Selection of the number of factors to retain took into account Kaiser’s criterion (eigenvalues larger than one), graphical analysis of the scree plot, and the total variance explained. If adequate, orthogonal varimax rotations were to be applied to improve interpretation of the factors. The Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity were also assessed.
Finally, we performed a ROC curve analysis to identify the best cutoff point of the VES-13 total score for discriminating frail/vulnerable elders, taking the attending physician’s clinical judgment as the gold standard. The best cutoff was selected by minimizing the distance to the upper left corner of the ROC plot, calculated as $\sqrt{(1 - Sn)^2 + (1 - Sp)^2}$, where Sn is sensitivity and Sp is specificity.
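The cutoff selection rule can be illustrated with a short Python sketch. For every observed score used as a candidate threshold, it computes sensitivity and specificity against the reference classification and keeps the threshold closest to the upper left corner of the ROC plot. The function name, the convention that a score at or above the cutoff counts as a positive test, and the example data are assumptions for illustration only.

```python
from math import hypot


def best_cutoff(scores, is_vulnerable):
    """Return (cutoff, sensitivity, specificity, distance).

    Assumption: a patient tests positive when score >= cutoff.
    is_vulnerable holds the reference classification (True/1 = frail or
    vulnerable according to the clinician).
    """
    labels = [bool(y) for y in is_vulnerable]
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(s >= cutoff and y for s, y in zip(scores, labels))
        true_neg = sum(s < cutoff and not y for s, y in zip(scores, labels))
        sn = true_pos / positives
        sp = true_neg / negatives
        # Euclidean distance to the point (Sn = 1, Sp = 1).
        distance = hypot(1 - sn, 1 - sp)
        if best is None or distance < best[3]:
            best = (cutoff, sn, sp, distance)
    return best


# Hypothetical total scores and clinician classifications for 12 patients.
scores = [0, 1, 2, 2, 3, 3, 4, 5, 6, 8, 9, 10]
vulnerable = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
cutoff, sn, sp, distance = best_cutoff(scores, vulnerable)
print(cutoff, round(sn, 3), round(sp, 3), round(distance, 3))
```

When two thresholds are equally distant from the corner, this sketch keeps the lower one; the text does not state how ties were resolved.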
A prospective cohort of 200 consecutively enrolled senior GI cancer patients (≥65 years) would allow estimation of the prevalence of vulnerability/frailty with a margin of error of 0.07 at a 95 % confidence level. This sample size would also allow estimation of validity coefficients (correlation coefficients) larger than 0.20, with a 95 % confidence level and 90 % power.
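The stated margin of error for the prevalence estimate can be checked with a short calculation. The sketch below uses the normal-approximation (Wald) formula z × sqrt(p(1 − p)/n) and assumes the most conservative prevalence, p = 0.5. Both choices are assumptions, since the text does not state which formula or prevalence was used.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the Wald confidence interval for a proportion.

    Assumptions: normal approximation, and p = 0.5 by default (the
    prevalence that yields the widest interval).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


print(round(margin_of_error(200), 3))
```

Under these assumptions, n = 200 gives a half-width of about 0.069, in line with the 0.07 reported above.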
Statistical analysis was performed using the Statistical Package for the Social Sciences Version 20.0 for Windows (SPSS®). Whenever statistical hypothesis testing was used, a significance level of α = 5 % was considered.