Factors to improve distress and fatigue in Cancer survivorship; further understanding through text analysis of interviews by machine learning



Using patient-reported surveys and individual interviews conducted by health care providers, we sought to identify factors significantly related to the improvement of distress and fatigue in cancer survivors. The interview text was analyzed with machine learning techniques in a secondary analysis of single-institution data from the Korean Cancer Survivorship Center Pilot Project.


Surveys and in-depth interviews from 322 cancer survivors were analyzed to identify their needs and concerns. Among the keywords in the surveys (EQ-VAS, distress, fatigue, pain, insomnia, anxiety, and depression), we focused on distress and fatigue. The interview transcripts were analyzed with Korean-language text analysis using machine learning techniques, based on the keywords used in the survey. Words were represented as vectors, and similarity scores were calculated from the distance between each word and the keywords, together with word frequency. Each keyword and its ten highest-ranked words by similarity were then used to draw a network map.


Most participants were otherwise healthy women younger than 50 years with breast cancer who had completed treatment less than 6 months earlier. At the 1-month follow-up survey, 56.5% and 58.4% of patients showed improvement in distress and fatigue scores, respectively. For the improvement of distress, dyspepsia (p = 0.006) and initial scores of distress, fatigue, anxiety, and depression (p < 0.001, < 0.001, 0.043, and 0.013, respectively) were significantly related. For the improvement of fatigue, economic state (p = 0.021), need for rehabilitation (p = 0.035), initial score of fatigue (p < 0.001), any intervention (p = 0.017), and participation in the family care program (p = 0.022) were significant. In the text analysis, Stress and Fatigue were placed at the center of the keyword network map, and the words were intricately connected. In the regression analysis combining survey scores with the quantitative variables from the text analysis, participation in family care programs and mention of family-related words were associated with fatigue improvement (p = 0.033).


Common symptoms and practical issues were related to distress and fatigue in the survey. Text analysis, however, showed that specific issues, such as family problems, and the relationships among them were more complicated. Although further research is needed to explore the hidden problems of cancer patients, this study is meaningful in that it used a personalized approach based on interviews.


In Korea, the number of cancer patients has increased gradually over the years [1]. However, due to the development of therapeutic options, cancer mortality rates have decreased since the early 2000s. Consequently, the number of patients who survive after cancer treatment (cancer survivors) has continuously increased. In Korea, patients diagnosed with cancer between 2013 and 2017 showed a 5-year relative survival rate of 70.4%, which led to a prevalence of approximately 1.87 million cases at the end of 2017. Although the population of cancer survivors is substantial, the importance of care for cancer survivors is underestimated.

After cancer treatment, cancer survivors are primarily screened for treatment outcomes, including physical complications. However, previous reports described that even after finishing cancer treatment, cancer survivors still experience various issues in physical, mental, and even practical aspects of life [2,3,4,5]. Although distress and fatigue are included in the guidelines for cancer survivorship, many survivors still report distress and fatigue regardless of disease progression [6]. In most cancer survivorship studies, analyses are based on simple surveys or reports from patients, which may be insufficient for the detailed analysis needed to suggest solutions. Moreover, although many hospitals and public organizations show an interest in cancer survivorship [7], limited budgets and resources make it difficult in practice to provide appropriate care for cancer survivors individually.

This study aimed to find specific and individualized causes behind cancer survivors’ distress and fatigue in existing survey and interview data from the survivorship program, using a recently introduced machine learning technique for text analysis of the Korean language.


The manuscript of this study was prepared according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [8].

Korean Cancer survivorship pilot project

The Korean Cancer Survivorship Pilot Project was launched in 2017 across Regional Cancer Centers in Korea. Adult cancer survivors who had completed active cancer treatment such as surgery, radiotherapy, or chemotherapy, excluding palliative care, were recruited for a prospective observational cohort study approved by the IRB (AJIRB-MED-MDB-17-008). Our institution, a regional cancer center covering Kyung-gi province near Seoul, joined the pilot project as one of the participating centers.

After signing informed consent, enrolled cancer survivors filled out the surveys on quality of life (Supplementary File 1). Once each survivor’s survey answers were reviewed, an experienced nurse conducted an individual interview of about an hour in which each patient was asked further detailed questions about the subjects listed in the Survivorship Assessment (NCCN Guidelines Version 2016) [9]. Furthermore, the nurse interviewed each patient more specifically about at least the three issues with the worst scores in the surveys. During the interviews, the nurse recorded the consultation details in text form, which was used to identify each patient’s issues and ultimately to match an appropriate program to each issue. The categories and process of the cancer survivor support programs provided at the center and the patients’ participation records are shown in Table 1 and Fig. 1. The center’s programs cover a wide range of topics such as nutrition, exercise, meditation, family care, and art therapy. The second round of the survey, for evaluating the change in scores, was performed when the patients visited the hospital 1 month after enrollment. If a patient could not visit the center at that time, the nurse attempted to conduct the survey by telephone.

Table 1 The provided cancer survivorship programs and the number of participants (N = 322)
Fig. 1 Flow chart of the process in the cancer survivorship program

Sample size

A total of 561 patients initially signed up at our center between May 2018 and July 2019 to participate in the Korean Cancer Survivorship Pilot Project. The analysis excluded surveys filled out by ten patients as they did not thoroughly answer their questionnaires. Among those 551 patients, 322 patients completed the second round of surveys a month after the initial survey. To track changes in survey scores, we used survey sets from those 322 patients instead of those of 551. On the other hand, we used all of the individual interviews conducted on 551 patients for text analysis to ensure that machine learning had enough data to draw conclusions. Separate IRB approval for further analysis, including the text analysis, was obtained.

Measurements and outcomes

The survey questions were designed to assess patients’ distress, fatigue, pain, insomnia, anxiety, and depression based on the following tools: the National Comprehensive Cancer Network (NCCN) distress thermometer, a Korean version of the problem list, and a five-dimensional tool for visual analogue scales of anxiety, depression, fatigue, pain, and the outcome domain of need for help [10, 11]. The scores ranged from 0 (not at all, or the best condition) to 10 (the worst condition). The EuroQol Visual Analogue Scale (EQ-VAS) was also measured, ranging from 0 to 100 [10,11,12]. These survey measurements were developed and validated in previous Korean studies [10,11,12]. The survey also asked for general information about survivorship such as disease status, cancer treatment, socioeconomic condition, lifestyle behaviors, and physical, psychological, and practical needs and problems. “High distress” was defined as a score of 4 or higher, which is commonly considered severe distress [13].

This study’s primary endpoint was the improvement in distress and fatigue scores at the one-month follow-up. We grouped and compared the patients based on the change in scores. Because both scales run from 0 (best) to 10 (worst), the “improved” group comprised patients whose follow-up score was lower than their initial score, and the “not improved” group comprised patients whose follow-up score was the same as or higher than the initial score.
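The grouping rule can be made concrete with a short sketch. It assumes, following the 0 (best) to 10 (worst) scale described above, that a lower follow-up score counts as improvement. The function names and the toy score pairs are hypothetical and are not study data.

```python
def improvement_group(initial, follow_up):
    """Classify one patient on a 0 (best) to 10 (worst) symptom scale.

    Assumption: a follow-up score lower than the initial score means
    improvement; an equal or higher score means no improvement.
    """
    return "improved" if follow_up < initial else "not improved"


def improvement_rate(score_pairs):
    """Return the percentage of (initial, follow_up) pairs classified as improved."""
    groups = [improvement_group(initial, follow_up) for initial, follow_up in score_pairs]
    return 100.0 * groups.count("improved") / len(groups)


# Hypothetical (initial, follow-up) distress scores for four patients.
toy_pairs = [(6, 3), (4, 4), (2, 5), (8, 7)]
toy_groups = [improvement_group(initial, follow_up) for initial, follow_up in toy_pairs]
toy_rate = improvement_rate(toy_pairs)
```

Under this rule an unchanged score falls in the “not improved” group, so a patient who starts at 0 can never be counted as improved.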

Text analysis

Machine learning-based text analysis was conducted to identify the survivors’ hidden needs that the survey might not have captured. The text analysis used Word2Vec, which builds on the neural network language model (NNLM) and provides word embeddings by turning words into vectors [14]. Word2Vec learns these vectors by using the surrounding words (context) to predict a word, or a word to predict its context. Because Korean words are not separated by spaces in the same way as in English, part-of-speech (POS) tagging was used to distinguish individual words. The study used Python (version 3.4) with the Word2Vec model of the gensim package and KoNLPy for the POS tagging of Korean [15]. Once the words were tokenized, we extracted nouns exclusively. In a post-processing stage, the extracted list was reviewed to add words to or delete words from the library. Afterwards, the nouns most similar to each keyword were extracted in ranked order using the most_similar function of the gensim model. The dimensionality of the word vectors was 300, the minimum frequency of appearance was 50, and the window size, meaning the number of surrounding words, was 8. After the extraction, the top 10 ranked words were selected according to their relative similarity scores for each keyword: Health, Stress, Fatigue, Pain, Insomnia, Anxiety, and Depression. To quantify the result, mentions of the selected words were counted in the interview text of each patient. The counts of the 10 words related to each keyword, such as Stress and Fatigue, were included in the statistical analysis together with the survey results.
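The ranking and counting steps can be illustrated with a minimal, dependency-free sketch. It does not call gensim or KoNLPy: cosine similarity over hand-written toy vectors stands in for the trained model's similarity ranking, and the English tokens stand in for Korean nouns after POS tagging. All vectors, words, and function names here are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, keyword, topn=10):
    """Rank every other word by cosine similarity to the keyword's vector."""
    target = vectors[keyword]
    scored = [(word, cosine(target, vec)) for word, vec in vectors.items() if word != keyword]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def count_mentions(tokens, related_words):
    """Count how often any of the related words occurs in one interview."""
    counts = Counter(tokens)
    return sum(counts[word] for word in related_words)


# Toy 3-dimensional vectors (the study used 300 dimensions).
toy_vectors = {
    "stress": [0.9, 0.1, 0.0],
    "family": [0.8, 0.2, 0.1],
    "husband": [0.7, 0.3, 0.0],
    "exercise": [0.0, 0.2, 0.9],
}
related = [word for word, _ in most_similar(toy_vectors, "stress", topn=2)]

# Toy noun tokens from one hypothetical interview.
interview_tokens = ["family", "stress", "husband", "family", "exercise"]
mention_count = count_mentions(interview_tokens, related)
```

The per-patient `mention_count` is the kind of quantity that can then be entered into a regression alongside the survey scores.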

A network map was made with nodes and edges to visualize the relationships among the words. Each node represented a word, its diameter reflected the frequency of mentions in the interviews, and the weight of each edge showed the relative similarity between two words. As the most basic centrality measure, degree centrality was defined as the number of unique edges connected to a node [16, 17]. This measure was used to determine the network’s central nodes by counting, for each node, the other nodes directly connected to it. In this study, the word with the highest degree centrality is the one most often linked with other words during the interviews. Thus, significant keywords can be identified through degree centrality analysis.
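Degree centrality as defined here, the number of unique edges attached to a node, can be computed directly from an edge list. The sketch below is a plain-Python illustration on a hypothetical toy network; it is not the software used to draw the network map, and the edge list is invented for the example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count the unique neighbors of each node in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


# Hypothetical keyword-word edges; the last edge repeats the first in reverse
# order and must not be counted twice.
toy_edges = [
    ("stress", "family"),
    ("stress", "husband"),
    ("fatigue", "family"),
    ("fatigue", "exercise"),
    ("stress", "fatigue"),
    ("family", "stress"),
]
degrees = degree_centrality(toy_edges)
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
```

Sorting by descending degree puts the most connected words first, which is how central keywords would be read off such a table.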

Statistical analysis

Appropriate tests were selected for the comparison of characteristics between the high and low distress groups and for the univariable analysis of score improvement: Fisher’s exact test for discrete variables and the Mann-Whitney U test for continuous variables. For the comparison between the groups based on score improvement, multivariable analysis was performed using logistic regression including the variables with p < 0.1 in the univariable analysis. Values were considered statistically significant when p < 0.05. Statistical analyses were performed using R 3.6.2 (R Development Core Team, Vienna, Austria).
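The screening step for discrete variables can be sketched as follows. The study ran its analyses in R; this Python illustration only shows the idea of a two-sided Fisher's exact test on a 2 × 2 table followed by selection of variables with p < 0.1 for the multivariable model. The variable names and counts are hypothetical and are not study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def screen_variables(tables, threshold=0.1):
    """Keep the variables whose univariable p-value is below the threshold."""
    return [name for name, table in tables.items() if fisher_exact_two_sided(*table) < threshold]


# Hypothetical counts: (improved & exposed, improved & unexposed,
#                       not improved & exposed, not improved & unexposed)
toy_tables = {
    "variable_a": (12, 3, 4, 11),
    "variable_b": (3, 1, 1, 3),
}
p_b = fisher_exact_two_sided(*toy_tables["variable_b"])
selected = screen_variables(toy_tables)
```

Only the variables returned by `screen_variables` would be carried forward into the logistic regression; continuous variables would be screened with a rank-based test instead.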


Patient characteristics

Table 2 displays the characteristics for the 322 patients who performed the 1-month follow-up survey. The majority of the patients were young (< 50 years, 81.1%), female (79.5%), less than 6 months from the end of treatment (70.5%), diagnosed with breast cancer (86.0%), early stages (75.2%), and previously healthy patients without comorbidities (74.8%). Among these patients, almost 83% of the patients participated in the cancer survivor support programs or clinics (n = 266, 82.6%) as shown in Supplementary Table 1 (Supplementary File 2). Initial and follow-up mean scores for EQ-VAS, distress, fatigue, pain, insomnia, anxiety, and depression are shown at the bottom of Table 2. All scores showed improvement after 1 month. Improvement at the 1 month follow-up was noted in about 50% of the patients in all categories. In the initial survey (Supplementary Table 1 in Supplementary File 2), participants’ most frequently experienced physical problems were in exercise (33.2%), followed by difficulties in memory or concentration (25.5%), and in nutrition (24.5%). Among emotional problems, more than 30% of the patients expressed difficulties with sleep and worry. Practical problems were relatively less common than physical or emotional problems. As for needs, 50.9 and 44.7% of the patients sought help in nutrition and exercise, respectively, which was higher than the percentage of patients actually expressing physical problems with nutrition and exercise (24.5 and 33.2%, respectively). However, for sleep or emotional problems, only a part of the patients experiencing those problems expressed the need for help.

Table 2 Characteristics of the cancer survivors

We compared the characteristics of patients with high and low distress (Table 2). Compared to low distress patients (score less than 4), patients with high distress (4 or higher) were younger (p = 0.043), more often had a previous counseling history (p = 0.012), and less often had a family caregiver (p < 0.001). Although the one-month follow-up scores of the initially high distress group were still worse, this group showed a greater improvement rate (the proportion of improved patients among all patients in the group) in distress (84.9% versus 33.0%, p < 0.001), anxiety (68.5% versus 49.9%, p < 0.001), and depression (71.2% versus 34.7%, p < 0.001) at the one-month follow-up than the initially low distress patients.

Text analysis for individual interviews

The top 10 ranked vocabularies were selected by correlation with each keyword, Health, Stress, Fatigue, Pain, Insomnia, Anxiety, and Depression, from the text analysis (Fig. 2). Even after finishing cancer treatment, many of the words correlating with the keywords that the patients used during the interviews were therapeutic-related. Interestingly, family-related words such as family, child, and husband showed the highest correlation with the keyword Stress. Family-related words also correlated with Fatigue, Anxiety, and Depression. Emotional words were mostly related to Insomnia and Depression. In comparison, physical words were frequently associated with Fatigue.

Fig. 2 The keywords (blue nodes) and the top 10 ranked correlated words (gray nodes) from the text analysis of the individual interviews. Gray nodes with a yellow boundary are family-related words; the numbers are the ranks of the words

During the interviews, the mean counts for mentioning the ten words related to Stress and Fatigue were 2.4 and 2.8, respectively. The counts were compared according to the improvement of distress and fatigue scores, and both were significantly different between the improved group and not-improved group (p < 0.001 and 0.036, respectively, Tables 3 and 4).

Table 3 Univariable analyses and multivariable logistic regression for improvement of distress
Table 4 Univariable analyses and multivariable logistic regression for improvement of fatigue

The keyword network map for the seven keywords and the extracted words showed multiple rings linked with each other, or a spherical form, in the center of the network, with some branches (Fig. 3). This implied that the words had a very complex relationship. The keywords Fatigue and Stress, which were the main outcomes of this study, were positioned in the center. The degree centrality of the seven keywords was higher than that of the other words (Supplementary File 2). Among all the words on the keyword network map, Fatigue and Stress recorded the highest centrality scores, implying that these two keywords were frequently mentioned and tightly connected with other words when the patients described their conditions during the interviews.

Fig. 3 Networks among the keywords (blue nodes) and their related words (gray nodes, except yellow family-related words) from the text analysis of the individual interviews; the size of each circle (node) reflects the frequency with which the word was mentioned in the interviews, and the length or weight of each line (edge) between two circles reflects their relative correlation

Analysis of the scores

We performed univariable and multivariable analyses with the characteristics, survey data, and the results of the text analysis. The significant factors for improvement in distress are shown in Table 3. Multivariable analysis using logistic regression showed that dyspepsia (odds ratio (OR) 3.192, p = 0.006), the initial score of distress (OR 2.777, p < 0.001), and the initial score of anxiety (OR 1.253, p = 0.043) had a significantly positive relationship with the improvement in distress, whereas the initial scores of fatigue (OR 0.681, p < 0.001) and depression (OR 0.753, p = 0.013) were negatively related to the improvement in distress.

Table 4 shows the results for the improvement in fatigue by the uni- and multivariable analysis. After adjusting other factors, initially higher scores of fatigue (OR 1.517, p < 0.001), participation in family care programs (OR 3.895, p = 0.022), and mention of family-related words during the interview (OR 1.983, p = 0.033) were significantly related to the improvement of fatigue. On the other hand, practical problems with economy or finance (OR 0.409, p = 0.021), need for rehabilitation (OR 0.440, p = 0.035), and participation in any clinic or other program in the center (OR 0.491, p = 0.017) were unfavorable factors for the improvement in fatigue.


Cancer survivors face not only cancer-related problems but also numerous physical, mental, and practical problems and challenges during and after cancer treatment. As a result, the needs of cancer survivors frequently go unmet during life after treatment. Although previous studies have addressed and tried to improve these issues, no effective solutions have been reported [18,19,20]. This could be due to the complexity of the problems that survivors experience and the varied backgrounds and histories of individuals. Thus, the solutions from previous studies using simple survey data may be insufficient to apply to real-world situations. In this study, we used a recent technique, keyword network mapping using deep learning, together with conventional statistical analysis to better understand each individual cancer survivor through the survey and the text of the individual interviews. The main endpoints of this study were distress and fatigue, which are generally the most common chief complaints that patients express in the clinic. Interestingly, distress and fatigue were also the most central words in the keyword network map, with the highest centrality scores. Statistical analysis showed that participation in the family care program and the use of family-related words during the interview were significantly associated with fatigue improvement.

Distress, one of the main endpoints in this study, is recommended as one of the main indicators for managing cancer survivors in the NCCN guidelines [6, 13]. The reported prevalence of high emotional distress ranges widely, from 9 to 50% [21,22,23], and our result was comparable. Since many patients were enrolled directly after finishing treatment, the psychological stress may have been mainly affected by the cancer treatment [24,25,26]. Additionally, the patients tended to focus more on physical or practical issues than on mental problems or stress, and in our study high distress patients were younger and more often lived without a family caregiver. Though distress can be an overall indicator, its score may not show the detailed individual factors and their relationships. Furthermore, a recent report suggested that it was irrational to identify cancer survivors’ condition by distress level alone [27]. An interesting point in our study was that dyspepsia was one of the factors related to distress; it may have been due to residual side effects after radiotherapy to the left breast or after systemic therapy. Alternatively, the possibility of somatization disorder can also be considered. Although it is not currently used as a psychiatric diagnosis, “Hwabyeong”, one of the previous diagnoses named in DSM-IV, was used as a characteristic psychiatric code in Koreans [28]. The diagnosis of Hwabyeong included ambiguous physical symptoms such as dyspepsia accompanied by anxiety or depression [28]. The cohort of this study included many middle-aged Korean housewives, who were susceptible to Hwabyeong. In Korea, many people who are less inclined to express psychological difficulties reflect their issues through vague physical symptoms.

Fatigue associated with cancer or cancer treatment, another main outcome of this study, takes some time to resolve after treatment [3, 29]. However, long-term fatigue in cancer survivors is also largely affected by factors other than cancer-related factors [30]. Many studies reported little association between persistent fatigue in cancer survivors and cancer treatment [31]. Thus, fatigue that persists after treatment may have different individual causes related to lifestyle, practical situations, or even personality [32]. Similarly, it is understandable that fatigue might not improve if there are practical difficulties such as financial problems or lack of transportation [33]. For this reason, it is imperative that the cancer survivorship care team consider the patient’s other social and medical conditions, such as physical recovery, travel distance, or family situation. Correspondingly, some reports suggest that individual life care in the community where the patients live, or even private tele-care through the telephone or the Internet, may be needed for effective personal support [34, 35].

As with distress, the improvement of fatigue was positively related to only a few factors. Interestingly, participation in any program or clinic was not a favorable factor for fatigue improvement. We believe that participation itself did not worsen fatigue; rather, most of the patients were enrolled just after the end of treatment, and some patients needed more time to rest. Although practical problems of economy or finance and the need for rehabilitation were negatively associated with improvement in fatigue, participating in the family care program was a positive factor for fatigue improvement. The family care program was not an education program for family members as caregivers of cancer survivors. Its purpose was to help the patients adapt to their role in the family after cancer treatment. In this study, most of the patients were diagnosed with breast cancer, and a characteristic of the breast cancer population in Korea is that the median age is younger than in Western populations, with many breast cancer patients younger than 50 years old. Accordingly, our results may reflect the situation of many young female cancer patients who need to maintain their lives as caregivers for other family members such as their babies, husbands, or older parents, rather than expressing their needs to their family and society [36]. Even after finishing cancer treatment, their expected role and performance in the family may be the same as before their diagnosis. This could stem from the cultural notion in Korea that close and united family relationships, rather than the separate individual, are among the most critical values. Therefore, it is important to note that some cancer survivors need thoughtful consideration and programs for their role in their family and community.

Our study’s uniqueness is that we used a new approach, text analysis. At the time of registration, each participant was interviewed individually by a nurse trained for this project. Because of time or budget limitations, few other centers could interview all cancer survivors in depth. Even though the interviews were performed in our institution, it was a challenge to find an appropriate method to analyze the text data. Recently in Korea, techniques for natural language or free-text analysis have been developed [15] and have been widely used in non-medical fields such as marketing. However, studies in medicine using text analysis are scarce, and no significant results have yet been reported [37]. Although many researchers have tried to work with electronic medical records (EMR) or similar hospital systems, limitations existed due to the rigid structure of EMR and information security issues. Our dataset included simple and large sets of interview results in text form as well as the survey forms from the project, and we were able to perform text analysis without the limitations arising from the structure of EMR or from security issues.

In the survey, patients were asked to answer questionnaires for each category of distress, fatigue, pain, insomnia, anxiety, depression, and EQ-VAS. In the text analysis, however, the links among the keywords in the interviews were very complex and difficult to interpret. Using the keyword network map, we showed the relationships among the words, which were previously hard to show using the survey results. In the keyword network map, each keyword showed a different frequency and centrality, and the keywords at the center with the highest centrality were fatigue and distress. Additional text analysis showing the top 10 words correlated with the keywords (health, stress, fatigue, pain, insomnia, anxiety, and depression) (Fig. 2) indicated that there were still many treatment-related statements even after finishing treatment. Considering that the patients were enrolled directly after finishing treatment, it seemed that the patients were still affected by cancer at the time of the survey and interview [36]. In severe cases, cancer may affect some patients like a trauma that lasts for a long time and requires psychiatric intervention [38]. Therefore, further emphasis on post-treatment care and support for cancer survivors is required. Another finding in this additional text analysis was that family-related words such as children and husband were important in relation to stress, anxiety, and depression. This may be because the patients feel responsible for taking care of other family members. In addition, multivariable analysis showed that family-related vocabulary and participation in the family care program were related to fatigue improvement. Although only 15% of the patients expressed difficulties with child rearing in the initial survey, which may explain the lack of statistical significance for that item, text analysis was able to capture the importance of family-related issues for cancer survivors.
Thus, it should be noted that individual interviews are important for grasping complex issues that cannot be understood through a simple survey. To provide practical support to cancer survivors, a simple follow-up with indicators such as distress level or short clinic sessions may not be sufficient. Therefore, our institution has been carrying out surveys and rather lengthy interviews whenever possible and necessary to help with the practical problems of each patient. As this study confirmed that the family care program helped to improve fatigue, we will continue to take further individual actions.

There were some limitations in this study. First, the study population was subject to selection bias. We used existing data from patients registered in the project, making it impossible to compare them with other groups such as non-registered patients or patients with other specific disease sites. Additionally, due to limitations in human resources and budget, individual interviews for all patients might be difficult in real-world situations. In the future, personalized care in our cancer survivorship center will be established with criteria for effective operation to fully support those who are in need. Finally, the deep learning techniques and analytic methods for the text analysis were relatively limited due to the small number of patients.


Our cancer survivorship project showed favorable results in most aspects, including distress and fatigue. In particular, considering the large proportion of young breast cancer survivors and Korean culture, participation in the family care programs was effective for improving fatigue. Moreover, text analysis of the individual interviews with deep learning showed that the mention of family-related words during interviews was associated with fatigue improvement. This implied that the interviews had a positive effect by leading patients to participate in the family care program, which consequently led to improvement in fatigue. This study will help health professionals provide more effective and personalized approaches to cancer survivorship.

Availability of data and materials

The datasets generated and/or analyzed in the current study are not publicly available because they contain patient-identifiable private information.



NCCN: National Comprehensive Cancer Network

EQ-VAS: EuroQol Visual Analogue Scale

NNLM: Neural network language model

POS: Part of speech

OR: Odds ratio

EMR: Electronic medical records


  1. Hong S, Won YJ, Park YR, Jung KW, Kong HJ, Lee ES. Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2017. Cancer Res Treat. 2020;52(2):335–50.

  2. Fong DY, Ho JW, Hui BP, Lee AM, Macfarlane DJ, Leung SS, et al. Physical activity for cancer survivors: meta-analysis of randomised controlled trials. BMJ. 2012;344(jan30 5):e70.

  3. Minton O, Berger A, Barsevick A, Cramp F, Goedendorp M, Mitchell SA, et al. Cancer-related fatigue and its impact on functioning. Cancer. 2013;119(Suppl 11):2124–30.

  4. Osborn RL, Demoncada AC, Feuerstein M. Psychosocial interventions for depression, anxiety, and quality of life in cancer survivors: meta-analyses. Int J Psychiatry Med. 2006;36(1):13–34.

  5. Stepanski EJ, Walker MS, Schwartzberg LS, Blakely LJ, Ong JC, Houts AC. The relation of trouble sleeping, depressed mood, pain, and fatigue in patients with cancer. J Clin Sleep Med. 2009;5(2):132–6.

  6. Denlinger CS, Sanft T, Baker KS, Broderick G, Demark-Wahnefried W, Friedman DL, et al. Survivorship, version 2.2018, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Netw. 2018;16(10):1216–47.

  7. Lee JE, Shin DW, Cho BL. The current status of cancer survivorship care and a consideration of appropriate care model in Korea. Korean J Clin Oncol. 2014;10(2):58–62.

  8. Von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4(10):e296.

  9. Denlinger CS, Ligibel JA, Are M, Baker KS, Broderick G, Demark-Wahnefried W, et al. NCCN guidelines insights: survivorship, version 1.2016. J Natl Compr Cancer Netw. 2016;14(6):715–24.

    Article  Google Scholar 

  10. Shim EJ, Hahm BJ, Yu ES, Kim HK, Cho SJ, Chang SM, et al. Development and validation of the National Cancer Center psychological symptom inventory. Psychooncology. 2017;26(7):1036–43.

    Article  PubMed  Google Scholar 

  11. Shim EJ, Shin YW, Jeon HJ, Hahm BJ. Distress and its correlates in Korean cancer patients: pilot use of the distress thermometer and the problem list. Psychooncology. 2008;17(6):548–55.

    Article  PubMed  Google Scholar 

  12. Kim S, Won CW, Kim BS, Kim S, Yoo J, Byun S, et al. EuroQol visual analogue scale (EQ-VAS) as a predicting tool for frailty in older Korean adults: the Korean frailty an aging cohort study (KFACS). J Nutr Health Aging. 2018;22(10):1275–80.

    Article  CAS  PubMed  Google Scholar 

  13. Riba MB, Donovan KA, Andersen B, Braun I, Breitbart WS, Brewer BW, et al. Distress management, version 3.2019, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Netw. 2019;17(10):1229–49.

    Article  Google Scholar 

  14. Mikolov T, Chen K, Corrado G, Dean J: Efficient estimation of word representations in vector space. arXiv preprint arXiv:13013781 2013.

    Google Scholar 

  15. Park EL, Cho S. KoNLPy: Korean natural language processing in Python. In: Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology: 2014; 2014. p. 133–6.

    Google Scholar 

  16. Elbirt B. The nature of networks: a structural census of degree centrality across multiple network sizes and edge densities. M.A diss. Buffalo: State University of New York; 2007.

  17. Opsahl T, Agneessens F, Skvoretz J. Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks. 2010;32(3):245–51.

    Article  Google Scholar 

  18. Mewes JC, Steuten LM, Ijzerman MJ, van Harten WH. Effectiveness of multidimensional cancer survivor rehabilitation and cost-effectiveness of cancer rehabilitation in general: a systematic review. Oncologist. 2012;17(12):1581–93.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Grunfeld E, Julian JA, Pond G, Maunsell E, Coyle D, Folkes A, et al. Evaluating survivorship care plans: results of a randomized, clinical trial of patients with breast cancer. J Clin Oncol. 2011;29(36):4755–62.

    Article  PubMed  Google Scholar 

  20. Hershman DL, Greenlee H, Awad D, Kalinsky K, Maurer M, Kranwinkel G, et al. Randomized controlled trial of a clinic-based survivorship intervention following adjuvant therapy in breast cancer survivors. Breast Cancer Res Treat. 2013;138(3):795–806.

    Article  CAS  PubMed  Google Scholar 

  21. Dabrowski M, Boucher K, Ward JH, Lovell MM, Sandre A, Bloch J, et al. Clinical experience with the NCCN distress thermometer in breast cancer patients. J Natl Compr Cancer Netw. 2007;5(1):104–11.

    Article  Google Scholar 

  22. Trask PC, Paterson A, Riba M, Brines B, Griffith K, Parker P, et al. Assessment of psychological distress in prospective bone marrow transplant patients. Bone Marrow Transplant. 2002;29(11):917–25.

    Article  CAS  PubMed  Google Scholar 

  23. Strong V, Waters R, Hibberd C, Rush R, Cargill A, Storey D, et al. Emotional distress in cancer patients: the Edinburgh Cancer Centre symptom study. Br J Cancer. 2007;96(6):868–74.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Banks E, Byles JE, Gibson RE, Rodgers B, Latz IK, Robinson IA, et al. Is psychological distress in people living with cancer related to the fact of diagnosis, current treatment or level of disability? Findings from a large Australian study. Med J Aust. 2010;193(S5):S62–7.

    Article  PubMed  Google Scholar 

  25. Crist JV, Grunfeld EA. Factors reported to influence fear of recurrence in cancer patients: a systematic review. Psychooncology. 2013;22(5):978–86.

    Article  PubMed  Google Scholar 

  26. Ebede CC, Jang Y, Escalante CP. Cancer-related fatigue in Cancer survivorship. Med Clin North Am. 2017;101(6):1085–97.

    Article  PubMed  Google Scholar 

  27. Jewett PI, Teoh D, Petzel S, Lee H, Messelt A, Kendall J, et al. Cancer-related distress: revisiting the utility of the national comprehensive cancer network distress thermometer problem list in women with gynecologic cancers. JCO Oncol Pract. 2020;16(8):e649–59.

  28. Pang KY. Hwabyung: the construction of a Korean popular illness among Korean elderly immigrant women in the United States. Cult Med Psychiatry. 1990;14(4):495–512.

    Article  CAS  PubMed  Google Scholar 

  29. Thong MSY, van Noorden CJF, Steindorf K, Arndt V. Cancer-related fatigue: causes and current treatment options. Curr Treat Options in Oncol. 2020;21(2):17.

    Article  Google Scholar 

  30. Ryan JL, Carroll JK, Ryan EP, Mustian KM, Fiscella K, Morrow GR. Mechanisms of cancer-related fatigue. Oncologist. 2007;12(Supple 1):22–34.

    Article  CAS  Google Scholar 

  31. Prue G, Rankin J, Allen J, Gracey J, Cramp F. Cancer-related fatigue: A critical appraisal. Eur J Cancer. 2006;42(7):846–63.

    Article  CAS  PubMed  Google Scholar 

  32. Wang SH, He GP, Jiang PL, Tang LL, Feng XM, Zeng C, et al. Relationship between cancer-related fatigue and personality in patients with breast cancer after chemotherapy. Psychooncology. 2013;22(10):2386–90.

    Article  PubMed  Google Scholar 

  33. Servaes P, Gielissen MF, Verhagen S, Bleijenberg G. The course of severe fatigue in disease-free breast cancer patients: a longitudinal study. Psychooncology. 2007;16(9):787–95.

    Article  CAS  PubMed  Google Scholar 

  34. Garrett K, Okuyama S, Jones W, Barnes D, Tran Z, Spencer L, et al. Bridging the transition from cancer patient to survivor: pilot study results of the Cancer survivor telephone education and personal support (C-STEPS) program. Patient Educ Couns. 2013;92(2):266–72.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Portier K, Greer GE, Rokach L, Ofek N, Wang Y, Biyani P, et al. Understanding topics and sentiment in an online cancer survivor community. J Natl Cancer Inst Monogr. 2013;2013(47):195–8.

    Article  PubMed  Google Scholar 

  36. Im EO, Lee EO, Park YS. Korean women's breast cancer experience. West J Nurs Res. 2002;24(7):751–65; discussion 766-771.

    Article  PubMed  Google Scholar 

  37. Kim DS, Park AH, Kang NJ. An analysis of cancer survival narratives using computerized text analysis program. J Korean Acad Nurs. 2014;44(3):328–38.

    Article  PubMed  Google Scholar 

  38. Carreira H, Williams R, Müller M, Harewood R, Stanway S, Bhaskaran K. Associations between breast Cancer survivorship and adverse mental health outcomes: a systematic review. JNCI. 2018;110(12):1311–27.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


Not applicable.



Author information




KY, JK, MC, and MA designed the concept of this study. EC, JP, and MJ collected the datasets. KY and JK analyzed the data and wrote the manuscript. MC revised the manuscript. All authors have read and approved the manuscript. KY and JK contributed equally to this work as co-first authors. KY also worked with the other authors at Ajou University School of Medicine as a clinical fellow.

Corresponding author

Correspondence to Mison Chun.

Ethics declarations

Ethics approval and consent to participate

This study used the datasets of patients from our institute who were previously registered in the Korean Cancer Survivorship Center Pilot Project with IRB approval (AJIRB-MED-MDB-17-008), for which written informed consent was obtained from all participants. Additionally, separate IRB approval for text analysis using repository data was obtained for this study (AJIRB-MED-MDB-20-008).

Consent for publication

Not applicable.

Competing interests


Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. The questions for the survey.

Additional file 2: Supplementary Table 1. Survey results (N = 332). Supplementary Table 2. Degree centrality in the network with the extracted words.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Yang, K., Kim, J., Chun, M. et al. Factors to improve distress and fatigue in Cancer survivorship; further understanding through text analysis of interviews by machine learning. BMC Cancer 21, 741 (2021).
