Development of a well-defined tool to predict the overall survival in lung cancer patients: an African based cohort

Background A nomogram is a graphical representation of the factors in a mathematical formula used to describe a particular phenomenon. We aimed to build and internally validate a nomogram to predict overall survival (OS) in patients diagnosed with lung cancer (LC). Methods We included 1200 LC patients from a single-institution registry diagnosed from 2013 to 2021. The independent prognostic factors of LC patients were identified via Cox proportional hazards regression analysis. Based on the results of the multivariate Cox analysis, we constructed a nomogram to predict the OS of LC patients. Results A total of 1104 LC patients were finally included. Age, medical urgency at diagnosis, performance status, radiotherapy, and surgery were identified as prognostic factors and integrated to build the nomogram. Model performance in predicting prognosis was measured by the receiver operating characteristic curve. Calibration plots of 6-, 12-, and 24-month OS showed optimal agreement between observations and model predictions. Conclusion We have developed and validated a unique predictive tool that can offer patients with LC an individual OS prognosis. This prognostic model could aid doctors in making decisions and planning therapeutic trials.

tumor size, nodal status, and treatment-related factors, all of which could play a significant role in individualized survival prediction [3,4]. Indeed, because evidence suggests that tumor size and N stage are strongly related to the biological characteristics of the tumor, and because they are based on the depth of tumor invasion, they remain the most important tumor characteristics and are therefore considered robust risk factors for LC survival [5][6][7][8].
Various models have been developed and widely accepted as reliable tools to quantify risk and predict survival by integrating key elements of oncological prognosis [9][10][11]. The first mathematical model in oncology dates back to the work of Thomlinson & Gray (1955), who proposed a model of avascular tumor growth in LC by demonstrating that the size of the observed histological pattern is consistent with what would be predicted if oxygen supply were the limiting factor determining the onset of necrosis [12]. With multifactorial regression analysis, the predicted outcome depends on the approach and the tool used. In fact, combining multiple predictive factors to build and validate an individual prognostic tool such as a nomogram makes the results more reliable [13,14] and more accessible in terms of patient prognosis [15].
In this study, we aim to build and validate a comprehensive prognostic evaluation system for LC patients based on multiple clinical and pathological prognostic inputs, in the hope of providing more reliable predictions.

Patients' selection and data elements
A single-institution registry was established, consisting of 1200 patients diagnosed with lung cancer between January 2013 and December 2021 at the Medical Oncology Department of the Mohammed VI University Hospital of Marrakesh, Morocco. To retrieve all essential data, a standardized form was established covering the patients' confirmed characteristics: age at diagnosis, gender, tobacco status, cannabis status, alcohol, body weight, and performance status; presence or absence of urgency at diagnosis, including superior vena cava syndrome (SVCS) or pleurisy syndrome; comorbidities; clinicopathologic data, including pathological T, N, and M categories, presence or absence of liver, adrenal, bone, and brain metastasis, and stage at diagnosis; EGFR, ALK, and PDL-1; treatment-related data, including surgery, chemotherapy, and radiotherapy; and hematological toxicities reported during treatment, including anemia, neutropenia, and thrombocytopenia. All patients' follow-up information was extracted from their most recent medical review, which included a clinical examination and/or a review of computed tomography images. The eighth edition of the American Joint Committee on Cancer TNM classification system was used to determine pathological staging. Age and weight, as continuous variables, were transformed into categorical variables based on quartiles. Weight is defined as body mass at diagnosis and is expressed in kilograms (kg).
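As an illustration of the quartile-based categorization described above, the following minimal Python sketch assigns each value to one of four groups using the sample quartiles. The ages, the function name `quartile_category`, and the choice of the inclusive quantile method are assumptions made for illustration only; the analyses in this study were run in R, and the exact cut-point rule is not stated in the text.

```python
import statistics


def quartile_category(values):
    """Return the three quartile cut points and a Q1..Q4 label for each value."""
    # method="inclusive" treats the data as the full population (an assumption).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    labels = []
    for v in values:
        if v <= q1:
            labels.append("Q1")
        elif v <= q2:
            labels.append("Q2")
        elif v <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return (q1, q2, q3), labels


# Hypothetical ages at diagnosis, not taken from the registry.
ages = [45, 52, 58, 61, 63, 66, 70, 74, 79]
cuts, groups = quartile_category(ages)
print(cuts)
print(groups)
```

The same function could be applied to body weight in kilograms; values equal to a cut point fall into the lower group under this convention.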
We define tobacco consumption as smoking cigarettes and cannabis consumption as smoking marijuana. Due to the retrospective nature of the study, the exact quantity consumed was not recorded in all patient medical records; thus, we could not determine whether a patient was a heavy or light smoker.
Variables with more than 20% missing values were excluded. In addition, patients were excluded from the subsequent analysis if they were missing important information such as date of biopsy or survival date, or information on treatments such as radiotherapy, chemotherapy, or surgery. Finally, 1104 eligible LC cases were selected for the study.
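The two exclusion rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the records, the field names (`biopsy_date`, `age`, `egfr`), and the helper names `keep_variables` and `keep_patients` are hypothetical, and missing values are assumed to be encoded as `None`.

```python
def keep_variables(records, max_missing=0.20):
    """Keep variables whose share of missing values does not exceed max_missing."""
    n = len(records)
    kept = []
    for name in records[0]:
        n_missing = sum(1 for r in records if r[name] is None)
        if n_missing / n <= max_missing:
            kept.append(name)
    return kept


def keep_patients(records, required):
    """Keep patients with no missing value in any of the required fields."""
    return [r for r in records if all(r[f] is not None for f in required)]


# Hypothetical toy records, not taken from the registry.
records = [
    {"id": 1, "biopsy_date": "2015-03-02", "age": 61, "egfr": None},
    {"id": 2, "biopsy_date": "2016-07-19", "age": 70, "egfr": None},
    {"id": 3, "biopsy_date": None, "age": 55, "egfr": "wild-type"},
    {"id": 4, "biopsy_date": "2018-01-08", "age": None, "egfr": None},
    {"id": 5, "biopsy_date": "2019-11-23", "age": 66, "egfr": "mutated"},
]
variables = keep_variables(records)
eligible = keep_patients(records, required=("biopsy_date",))
print(variables)
print([r["id"] for r in eligible])
```

In this toy example a variable missing in exactly 20% of records is kept, because the rule excludes only variables with more than 20% missing values.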
The primary endpoint of this study was OS, defined as the interval from the day of biopsy to death from any cause.

Statistical analysis
All LC patients were randomly assigned to the training (n=730) and validation (n=374) cohorts at a 2:1 ratio. Categorical variables were expressed as percentages. In the training cohort, a univariate Cox analysis was performed to determine the variables related to prognosis. Then, the independent prognostic variables related to the OS of LC patients were determined using multivariate Cox analysis; only factors with a p-value less than 0.05 were considered statistically significant and were therefore incorporated to develop the nomogram.
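A random 2:1 assignment of this kind can be sketched in Python as below. The function name `split_two_to_one`, the seed, and the use of integer patient identifiers are assumptions for illustration. Note that an exact 2:1 cut of 1104 patients gives 736 and 368, so the reported cohort sizes (730 and 374) suggest a slightly different sampling procedure that the text does not describe.

```python
import random


def split_two_to_one(patient_ids, seed=2021):
    """Shuffle the identifiers and cut them into training (2/3) and validation (1/3)."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers 0..1103 standing in for the 1104 eligible patients.
train, valid = split_two_to_one(range(1104))
print(len(train), len(valid))
```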
To test the reliability of the model, four elements were used to assess the performance of the predicted probabilities at 6, 12, and 24 months. First, bootstrap resampling with 300 replicates was adopted to internally validate the nomogram. Second, a calibration curve was plotted to compare the predicted probabilities with the observed proportions; for a well-calibrated model, the curve lies close to the 45-degree line. Third, the area under the time-dependent receiver operating characteristic (ROC) curve was adopted to assess discrimination. Fourth, the C-index was used to judge the model's prediction accuracy; the closer the C-index is to 1, the greater the precision [16].
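To make the C-index concrete, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is an illustration under assumptions, not the routine used in this study (the analyses were run in R): the function name, the toy follow-up times, event indicators, and risk scores are all hypothetical, and tied risk scores are counted as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk patient dies first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed death
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical data: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering of patients by risk.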
Survival curves for sex, age at diagnosis, medical urgency at diagnosis, PS, radiotherapy, and surgery were generated using Kaplan-Meier estimates. The log-rank test was adopted to compare the subgroups of these variables; the smaller the p-value, the stronger the evidence of a difference between subgroups.
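The Kaplan-Meier estimate behind such survival curves can be sketched in a few lines of Python. This is a minimal illustration, assuming right-censored data with hypothetical follow-up times in days; the function name `kaplan_meier` is ours, and the study itself relied on R packages for this step.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical data: follow-up in days, 1 = death observed, 0 = censored.
times = [100, 200, 200, 300, 400]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The median OS is then the first time at which the estimated survival falls to 0.5 or below.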
All statistical analyses to identify the independent prognostic factors and to build the model were performed using R software version 4.1.3 (available from: http://www.r-project.org) with the "survival", "survminer", and "rms" [17] packages.

Patients' characteristics
Based on the selection criteria, the characteristics of the 1104 enrolled LC patients, divided into training (n=730) and validation (n=374) cohorts, are summarized in Table 1. No significant differences were observed between the cohorts. In the training set, the vast majority of patients were male (n=654) and diagnosed above 66 years old, and most of them died during treatment. In terms of tumor characteristics, LC patients were often diagnosed at advanced T4, N2, and M1b categories; bone metastasis (27.3%) was the most frequent, followed by brain, adrenal, and liver metastasis at diagnosis. Most of the patients were diagnosed at late stage IVA (n = 448, 61.4%) and IVB (n = 212, 29%). Moreover, adenocarcinoma was the most frequent histological type (49.1%), and SVCS (5.3%) was the most frequent urgency at diagnosis. As for treatment, most patients had not received radiation therapy (86.6%) or surgery (95.2%), but most had received chemotherapy (57.1%). Regarding hematological toxicities reported during treatment, most patients did not report anemia, neutropenia, or thrombocytopenia (21.3%, 35.5%, and 39.9%, respectively).

Survival analysis
Figure 1 presents the differences in survival between subgroups defined by radiotherapy, age at diagnosis, urgency at diagnosis, and surgery. The median OS for the entire cohort was 934 (95% CI: 634, 1176) days. In total, 291 deaths were registered.

Independent prognostic factors
The following variables were subjected to univariate Cox analysis (UVCA): sex, age, tobacco smoking, cannabis smoking, alcohol, comorbidities, histology type, T stage, N stage, M stage, liver metastasis, adrenal metastasis, bone metastasis, brain metastasis, medical urgency at diagnosis, PS, weight, chemotherapy, radiotherapy, surgery, anemia, neutropenia, and thrombocytopenia. The results of the UVCA showed that age, comorbidities, M stage, brain metastasis, medical urgency at diagnosis, PS, weight, radiotherapy, chemotherapy, surgery, and anemia were prognostic factors for LC patients (Table 2). These UVCA results were subsequently entered into a multivariate Cox analysis (MVCA). Finally, 5 factors were identified as independent prognostic factors: age, medical urgency at diagnosis, performance status, radiotherapy, and surgery.

Prognostic nomogram for OS
The independent prognostic factors derived from the MVCA were used to build the nomogram to predict OS for LC patients (Fig. 2). As shown in Fig. 2, performance status and medical urgency at diagnosis make the greatest contribution to prognosis, followed by radiotherapy and surgery, which have the same moderate impact, while age at diagnosis has the smallest effect. Each level of each variable is assigned a score on the point scale. By adding up the scores, locating the total on the total point scale, and drawing a straight line down, the expected probability of survival at each time point can be read off.
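How a nomogram turns Cox coefficients into points can be sketched as follows. This Python example is only an illustration of the general construction: the coefficients, the variable levels, and the baseline 12-month survival of 0.70 are hypothetical values, not the estimates of this study, and the helper names are ours. The variable with the widest effect range is scaled to span 0 to 100 points, and the survival probability follows the Cox relation S(t | x) = S0(t) ** exp(linear predictor).

```python
import math

# Hypothetical log hazard ratios per level (reference level = 0.0).
# These are illustrative values, NOT the coefficients estimated in this study.
COEFS = {
    "performance_status": {"0-1": 0.0, "2": 0.6, "3-4": 1.2},
    "radiotherapy": {"yes": 0.0, "no": 0.5},
    "surgery": {"yes": 0.0, "no": 0.5},
}


def points_table(coefs):
    """Rescale coefficients so the widest-ranging variable spans 0 to 100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        lowest = min(levels.values())
        table[var] = {lvl: 100 * (b - lowest) / widest for lvl, b in levels.items()}
    return table


def survival_probability(patient, coefs, baseline_survival):
    """Cox relation: S(t | x) = S0(t) ** exp(linear predictor)."""
    lp = sum(coefs[var][lvl] for var, lvl in patient.items())
    return baseline_survival ** math.exp(lp)


POINTS = points_table(COEFS)
patient = {"performance_status": "2", "radiotherapy": "no", "surgery": "yes"}
total_points = sum(POINTS[var][lvl] for var, lvl in patient.items())
# Hypothetical baseline 12-month survival of 0.70 for the reference patient.
prob_12m = survival_probability(patient, COEFS, baseline_survival=0.70)
print(round(total_points, 1), round(prob_12m, 3))
```

Reading the printed nomogram by hand performs the same two steps: summing the per-variable points, then mapping the total onto the survival probability axis.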

Evaluation of nomogram
The ROC plots showed that the AUC of the clinical predictive model for 6-, 12-, and 24-month OS was 0.97, 0.93, and 0.92 in the training set and 0.91, 0.91, and 0.81 in the validation set, respectively, demonstrating good discriminative ability (Fig. 3). Furthermore, the calibration plots for 6-, 12-, and 24-month OS showed excellent agreement between observed and nomogram-predicted probabilities in both the training and validation cohorts (Fig. 4). Stratification into different subgroups showed a clear distinction between the Kaplan-Meier curves for LC patients' prognosis.

Discussion
Due to the heterogeneity among individual LC patients [18], predicting survival using demographic, clinical, biological, and pathological characteristics is imprecise. Several prognostic models have been developed and discussed based on a specific cohort and outcome, but no nomogram has been constructed based on a purely African, well-defined cohort. Thus, we sought to establish a convenient predictive model based on 1104 enrolled cases with 5 independent prognostic factors identified by Cox regression analysis to predict 6-, 12-, and 24-month OS of LC patients. The data were extracted and collected manually from the registry of a single public institution. This institution is the only leading public medical center serving central and southern Morocco, and it provides all the standard technical care accepted in the kingdom. Through multivariate Cox analysis, we found that age, medical urgency at diagnosis, performance status, surgery, and radiotherapy were the prognostic factors related to progression, which is consistent with previously reported results [19][20][21][22][23][24][25][26][27]. The integration of various clinical, pathological, and biological characteristics of each patient into a mathematical model could be holistic in terms of probability prediction based on the primary outcome [28,29].
Differences in median OS depend on the population studied, the stage at diagnosis, and the treatments received. In our case, the median OS obtained was 934 (95% CI: 694, 1176) days. Based on German data, Hardtstock et al. [30] found that the median OS of NSCLC patients was 351 days. Meanwhile, David et al. [31] found that the median OS for LC patients who had undergone surgery was 9.1 months, versus 4.2 months for those who had not. With stratification by age group, Wu et al. [32] and Torre et al. [33] showed that patients diagnosed over 60 years of age were more likely to have worse survival, which is somewhat contradictory to our results, as our division of age into categories was based on quartiles and not on risk group stratification.
We should note that not all LC patients can benefit from surgery [34], but the majority of those who do have undergone radiotherapy and chemotherapy [35]. Interestingly, however, chemotherapy was not found to be an independent prognostic factor (p = 0.8), indicating its limited effect on prognosis. For the past 30 years, chemotherapy based on natural compounds has been considered an essential therapy for appropriate LC patients [36], with no proven benefit when used alone or in patients with stage IV disease, but it may provide benefits when used concomitantly with radiotherapy, surgery [37], and targeted therapy. Performance status (PS), as a subjective composite measure of the patient's wellness, is a key factor reflecting the patient's ability to carry on normal activities. Several previous studies have reported the role of PS as a prognostic signature impacting the survival rate in different age categories [38][39][40][41]. Regarding medical urgency associated with late diagnosis of advanced disease, we found that SVCS and pleurisy syndrome were both associated with poor survival in patients across the different stage categories of the disease. In a retrospective study, Fahem et al. [42] concluded that SVCS was a predictive factor for mortality in bronchopulmonary cancer, in addition to pleurisy syndrome. Furthermore, pleurisy syndrome had also an impact on
Even though the literature recognizes the importance of histological type in disease prognostication and survival [43,44], our findings did not converge with the literature when the disease was divided into epidermoid carcinoma, neuroendocrine carcinoma, adenocarcinoma, adenosquamous carcinoma, and small cell carcinoma, with adenocarcinoma as the reference. According to the IASLC paper, among all the histological subtypes of LC, adenocarcinoma remains a more favorable prognostic predictor than the other subtypes [45]. Furthermore, several studies have found, based on different types of analyses depending on the study endpoint, that histological type is an independent prognostic predictor and have therefore integrated it into the construction of their nomograms [35,46]. We decided to exclude both the clinical M category and comorbidity variables from the subsequent MVCA because they would have biased the overall model, even though they were significant in the results and had been identified as independent prognostic factors.
We did not add stage at diagnosis into the Cox analysis for the straightforward reason that stage is mirrored by the combination of T, N, and M categories, and when it is included in the analysis, it results in a substantial bias in the model without any relevance due to information redundancy.
To the best of our knowledge, this is the first nomogram for predicting survival for patients diagnosed with LC based on a North African cohort and long-term follow-up, reflecting the characteristics of the African population in terms of disease response and survival.
However, the creation of clinical prediction models is more significant for enhancing patient prognosis when compared to the analysis of independent risk factors.More importantly, all of the indicators used in this study can be acquired and determined clinically.As a result, the model has improved prediction capabilities and increased dependability, making it a useful tool for clinical decision-making, risk assessment, and patient consultation.This scoring system should make it easier for doctors to deal with these problems.Additionally, this tool might offer data for patient categorization in clinical research design, thereby improving comparability between study arms.Compared to the TNM staging system and certain previous prognostic models, we believe the developed nomogram provides more accurate results.
We should note that this study has certain limitations. First, this tool needs to be externally validated in another African cohort to make sure the prognostic factors are the same across the continent. Second, due to lack of access to emerging technologies, some molecular aberrations such as EGFR mutation, ALK-EML4 fusion, PDL-1, ROS1, and mTOR are not included in the study, as they were not routinely requested until the end of the last year (2021). Third, our model is still limited by the retrospective nature of the data and the inability to extract relevant parameters such as vascular invasion, perineural invasion, and lymphatic permeation. Fourth, the patients' medical records do not contain information on systemic treatments, including type of surgery and radiation dose. To enhance this model, further work should be done on prospective data gathering, patient follow-up, expanding the recruitment area, and inclusion of additional variables.

Conclusion
In conclusion, we built a clinical prediction model to determine each LC patient's individual prognosis. With this tool, clinicians can more precisely predict individual patient survival and plan treatment strategies. We seek to further develop personalized treatment by conducting quantitative analysis of prognosis-related parameters.

Fig. 2 Nomogram predicting 6-, 12-, and 24-month OS. The total points are calculated by adding the points of each prognostic factor and correspond to the probabilities of 6-, 12-, and 24-month OS of LC patients. Sd = Syndrome, SVCS = Superior Vena Cava Syndrome, OS = overall survival

Fig. 3 AUROC curves of the training (A-B-C) and validation (D-E-F) sets of the nomogram for predicting 6-, 12-, and 24-month OS

Fig. 4 Calibration plot for training (A-B-C) and validation data (D-E-F)

Table 1 Demographic, clinical, and pathological characteristics of LC patients in the training and validation cohorts

Table 2 Univariate and multivariate Cox regression analysis of prognosis for LC patients