Understanding the patient journey to diagnosis of lung cancer

This research describes the clinical pathway and characteristics of two cohorts of patients. The first cohort consists of patients with a confirmed diagnosis of lung cancer while the second consists of patients with a solitary pulmonary nodule (SPN) and no evidence of lung cancer. Linked data from an electronic medical record and the Louisiana Tumor Registry were used in this investigation. REACHnet is one of 9 clinical research networks (CRNs) in PCORnet®, the National Patient-Centered Clinical Research Network and includes electronic health records for over 8 million patients from multiple partner health systems. Data from Ochsner Health System and Tulane Medical Center were linked to Louisiana Tumor Registry (LTR), a statewide population-based cancer registry, for analysis of patient’s clinical pathways between July 2013 and 2017. Patient characteristics and health services utilization rates by cancer stage were reported as frequency distributions. The Kaplan-Meier product limit method was used to estimate the time from index date to diagnosis by stage in lung cancer cohort. A total of 30,559 potentially eligible patients were identified and 2929 (9.58%) had primary lung cancer. Of these, 1496 (51.1%) were documented in LTR and their clinical pathway to diagnosis was further studied. Time to diagnosis varied significantly by cancer stage. A total of 24,140 patients with an SPN were identified in REACHnet and 15,978 (66.6%) had documented follow up care for 1 year. 1612 (10%) had no evidence of any work up for their SPN. The remaining 14,366 had some evidence of follow up, primarily office visits and additional chest imaging. In both cohorts multiple biopsies were evident in the clinical pathway. Despite clinical workup, 70% of patients in the lung cancer cohort had stage III or IV disease. In the SPN cohort, only 66% were identified as receiving a diagnostic work-up.

Routine screening for lung cancer has not yet been broadly adopted despite evidence that screening high risk patients (i.e., 30 pack years of cigarette smoking) with the use of low-dose CT reduces mortality from lung cancer [3,4]. Engaging primary care clinicians and garnering support from payors to ensure routine lung cancer screening adoption will have a significant impact on overall patient survival [5]. Currently, most lung cancer (84%) is diagnosed with regional or distant extension of the disease, limiting treatment options and reducing relative survival [2].
Patients with lung cancer may follow a variety of routes within the health-care system. Traditionally, consultation and management have been achieved with referrals to specialists occurring in a sequential fashion. Symptoms prompt a physician visit, followed by a referral to a specialist culminating in a diagnosis and treatment. This process may be slow, and delays are common. Asymptomatic patients with a solitary pulmonary nodule may suffer even longer delays as the guidelines recommend watchful waiting rather than more aggressive diagnostic procedures especially for those nodules < 8 mm [6]. Previous studies of diagnostic delay in lung cancer (i.e., abnormal chest imaging to diagnosis) have shown a median delay between 25 and 53 days [7][8][9] but delays have been shown to be skewed with some patients having much longer delays [9,10]. Currently, lung cancer diagnosis and staging have increased in complexity due to an expanding number of options such as PET imaging, endobronchial ultrasound, as well as navigational and robotic bronchoscopy. These tools allow for more extensive tissue sampling and accurate diagnosis and staging. Patients with early-stage disease are often eligible for potentially curative therapy with minimally invasive surgery and advanced radiotherapy techniques. For patients with more advanced disease, molecular profiling for tumor mutations may guide the use of targeted therapies such as ALK, BRAF, ROS1, RET, NTRK, MET (with EGFR inhibitors) and anti PD-1/PD-L1 immune checkpoint inhibitors [11]. Such molecular analysis may require additional tissue sampling or sample analysis at a remote specialty lab. These factors, among others in the patient journey, indicate there is no singular pathway to a diagnosis of lung cancer and add to the variability of timely diagnosis [12][13][14][15].
We conducted a retrospective cohort study using data from the electronic health records of Tulane Medical Center and Ochsner Health System linked to the Louisiana Tumor Registry to document the patient pathway to lung cancer diagnosis and report the pattern of procedures and physician visits. For patients with lung nodules ultimately diagnosed as malignant we report the median time to diagnosis. For patients with lung nodules diagnosed as benign, we report on diagnostic lung procedures performed during the subsequent 1-year from nodule identification.

Methods
This is a retrospective cohort study, using linked data information systems, designed to document the patient pathway from (a) identification of a pulmonary nodule to lung cancer diagnosis, and (b) identification of a pulmonary nodule to 1-year follow-up among those with no lung cancer diagnosis.

Datasets
The Louisiana Tumor Registry (LTR) is a populationbased cancer registry that is authorized by law to collect data on cancer diagnosis, treatment, and survival from all healthcare facilities and providers. The LTR is a participant of the NCI's Surveillance, Epidemiology, and End Result Program and the CDC's National Program of Cancer Registries. The LTR's data have met the data completeness, timeliness, and quality standards for both SEER and NPCR programs and the North American Association of Central Cancer Registries [16].
The Research Action for Health Network (REACHnet) is a partnership of health systems, academic centers, and public health organizations that constitute an innovative data network for conducting multi-site research with enhanced efficiency in real-world healthcare delivery systems. With national and local collaborators, REACHnet implements research that addresses healthcare questions of critical importance to patients and clinicians and contributes to the evidence base that will inform more effective healthcare decision-making and improve population health. The network includes clinical records for more than 8 million patients across multiple partner health systems located in Louisiana, Texas, and California. For the current study, data were obtained through REACHnet (https://reachnet.org/) for qualifying patients receiving care at Ochsner Health System and Tulane Medical Center in Louisiana.

Data linkage
Data linkage between these two information systems was performed using a Global Patient Identification (GPID), provided by REACHnet. A software application, Distributed Common Identity for the Integration of Regional Health Data (DCIFIRHD), was used to perform secure, cross-site aggregation and linkage of patient records between LTR and REACHnet. Patient identification information (e.g., first name, last name, date of birth, SSN, gender), was hashed in various combinations with a sitespecific password and a shared seed to create up to 17 hashes for each patient. These Seeded Hash Identifiers (HashIDs) were sent to an honest broker site, REAC Hnet, where the hashed output was merged using a deterministic algorithm that applied sets of rules to the Seeded HashIDs of common patient identifiers to identify matches between records. Two records from REAC Hnet and LTR with the same Seeded HashID were considered a "match" (i.e., be the same patient) if they had the same value for certain Seeded HashIDs. Unique study IDs were then used to replace the matched HashIDs [16]. A primary lung cancer diagnosis in REAC Hnet was identified using ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) and ICD-10-CM (International Classification of Diseases, Tenth Revision, Clinical Modification) diagnosis codes. Patients with primary lung cancer diagnosis in LTR were identified using ICD-O-3 (International Classification of Diseases for Oncology, Third Revision) codes C340-C349.

Study population
Our goal was to identify all patients in Tulane and Ochsner healthcare systems who had either a malignant or benign lung nodule between 2013 and 2019. Figure 1 shows the patient selection process and the resulting study populations. Patients with (a) a qualifying biopsy defined by Current Procedural Terminology (CPT) codes or (b) a diagnosis of Suspicious Pulmonary Nodule (SPN) or (c) Low-Dose Computed Tomography (LDCT) defined by ICD-9 or ICD-10 codes in REACHnet between 2013 and 2019 were identified and the first recorded date of SPN or LDCT or chest imaging was defined as the index date (see Appendix 1 for code lists).
Patients were excluded from further assessment if (a) they had a Biopsy only N = 972 (2.74%), (b) they had a SPN with pleural biopsy as first biopsy N = 903 (2.55%), (c) their index date occurred prior to 1/7/2013 N = 2181 (6.15%), (d) their index date was after 12/31/2017 N = 70 (0.20%) (linkage to LTR was not available after 12/31/ 17), (e) their diagnosis with primary lung cancer diagnosis in REACHnet occurred in 2018-2019 N = 196 (0.55%), (f) lung cancer was a secondary diagnosis N = 527 (1.49%), (g) they were not Louisiana residents N = 3 (0.01%), (h) they were diagnosed through Death Certificate only (DCO) N = 25 (0.07%) and (i) they were diagnosed with lung cancer only at autopsy N = 1 (0.003%). A total of 4878 patients met these exclusion criteria and were eliminated from further study. (See Fig. 1) A total of 30,559 patients from REACHnet were linked to the LTR including 2929 patients with a primary lung cancer diagnosis in REACHnet and 27,630 patients without a lung cancer diagnosis in REACHnet between 2013 and 2017. Of the 2929 patients with a lung cancer diagnosis in REACHnet, 1496 had a primary lung cancer diagnosis in both the Tulane or Ochsner health care systems and in the LTR. These 1496 subjects constitute the lung cancer patient population. Records for these 1496 patients were assembled to document patient care from index date to diagnosis date. The diagnosis date from the LTR was used as the study endpoint for the lung cancer cohort and was defined using the earliest date of cytology, or biopsy, or staging biopsy date in the LTR. If none of these dates were available, then the diagnosis date as recorded in the LTR was used. Of the 27,630 patients without a lung cancer diagnosis in REACHnet between 2013 and 2017, 3490 had no evidence of SPN at baseline (i.e., qualified for study inclusion based on a biopsy or LDCT) or during the 1 year of follow-up. In addition, a total of 150 patients had evidence of lung cancer following data linkage with LTR, presumably having left Tulane or Ochsner healthcare systems to receive care elsewhere. Of the remaining 23, 990 patients with SPN at baseline or during the 1-year follow-up, a total of 15,978 had continuous enrollment in REACHnet during the 1-year follow-up period and no evidence of a lung cancer diagnosis. These subjects constitute the SPN patient population. The medical records for these 15,978 patients were assembled to document care from the first recorded SPN (index date) to 1 year from the initial SPN diagnosis date.

Patient pathway
Utilization of invasive and non-invasive procedures until diagnosis in the lung cancer cohort and during the first year following SPN diagnosis in REACHnet were extracted using ICD-9-CM procedure codes, ICD-10-CM procedure codes, and Current Procedural Terminology (CPT) codes. Procedures under investigation in this study included Evaluation and Monitoring (E&M), Biopsy (i.e., surgical biopsy, CT guided biopsy, and bronchoscopy biopsy), Positron Emission Tomography-Computed Tomography (PET CT), Brain Magnetic Resonance Imaging (MRI), and Bone Scan (Appendix 1).

Statistical analysis
Descriptive analysis for demographic characteristics was performed. Demographic characteristics for patients with lung cancer were compiled from data from the LTR data, while data for patients with SPN diagnoses without lung cancer were compiled from REACHnet's electronic health record (EHR) data. Continuous variables were reported in mean and standard deviation (SD). Categorical variables were reported in frequency and percentage distribution. Health services utilization rates were reported in frequency and percentage distribution. Time from index date to diagnosis date by stage was analyzed using the Kaplan-Meier product limit method. All analyses were performed in SAS 9.4 (SAS Institute, Cary, North Carolina, USA). Table 1 shows selected demographics by patient cohort. While direct comparisons between cohorts are problematic given the differences in selection criteria and followup, patients in the lung cancer cohort were older with a preponderance of men relative to patients in the SPN cohort. As would be expected, patients in the lung cancer cohort had, on average, more invasive procedures relative to the SPN cohort however patients in the SPN cohort had more non-invasive procedures relative to the lung cancer cohort attributable, in part, to the longer observation period for patients in the SPN cohort. Table 2 shows the staging and clinical measurements for lung cancer patients (n = 1496) from index date until date of diagnosis stratified by subsequent clinic visit. (Test utilization stratified by stage can be seen in the online appendix). Approximately 15% of lung cancer patients received a diagnosis of lung cancer on their index date while only 8% had biopsy information on their index date. Reasons for the lack of concordance between diagnosis and biopsy information is unknown but likely attributable to patients receiving their index scan and/or biopsy outside of the REACHnet system. The most common clinical measures recorded at this time point were a chest x-ray and an E&M. The presence of other clinical measurements in the medical record was very low. Sixty-nine percent of the patients diagnosed on the index date were Stage 3 or 4. Subsequent visits showed a similar pattern. For example, at visit 1 following the index date, 18% received a diagnosis of lung cancer and 76.6% of these patients were stage 3 or 4. The clinical measurements at visit 1 following the index date showed a decrease in the frequency of chest imaging and increases in the frequency of E&M and biopsy. The frequency of EBUS, PET CT, Brain MRI, and bone scan also increased although the frequency of these measurements was still low (0.6-3.0%). Approximately 38% of all lung cancer patients (N = 564) received their diagnosis on or after the fourth visit. Sixty-three percent of these individuals were stage 3 or 4. Chest imaging and biopsy were the most frequently recorded clinical measures followed by E&M, PET CT and EBUS. The use of LDCT, brain MRI, and bone scan was infrequent throughout the patient visitation process. Figure 2 shows the cumulative probability of diagnosis for lung cancer patients stratified by AJCC stage. Approximately 80% of all stage 3 and 4 lung cancer was diagnosed within a 45-day time period from index date. Eighty percent of stage 2 lung cancers were diagnosed within a period of 60 days and 80% of stage 1 cancers were diagnosed within a period of 90 days. The median time to diagnosis for all lung cancer patients was 13 days with an interquartile range of 2 to 46 days and varied significantly by cancer stage. Approximately 50% of lung cancer patients were diagnosed as stage 4 with a median time to diagnosis of 7 days while stage 3 (22% of the population), stage 2 (7% of the population), and stage 1 (21% of the population) patients' median times to diagnosis were 16, 16.5 and 33 days respectively. (data not shown). The time from index date to diagnosis was significantly shorter for those with stage 3 or 4 lung cancer compared to those diagnosed at an earlier stage (Log rank test, p < 0.0001). Despite the differences in time to diagnosis by lung cancer stage, the tests and procedures on the index date and by visit subsequent to index date are remarkably comparable (see supplemental Table). For example, chest imaging on index date was 67, 68, 73 and 78% for stages 1-4 respectively while biopsy on the index date was 7.6, 13.0, 7.7 and 6.7% respectively. The pattern of clinical measurements on index date and subsequent visits from index for SPN patients is shown in Table 3. A total of 15,978 patients received a SPN diagnosis and were followed for 1 year to determine subsequent work up characteristics. Ten percent of this population (n = 1612) had no evidence of a SPN workup following the index SPN diagnosis. The remaining 14,366 (89.9%) had some evidence of an SPN workup in their medical record the majority of which was E&M and additional chest imaging. Among this group, the recording of LDCT, EBUS, or bone scan was infrequent (< 1%) while the use of biopsy, PET CT and brain MRI ranged from 3 to 6%. Approximately 65% of this patient cohort was seen 4 or more times during the follow-up period and this group tended to have more procedures and documentation of SPN workup compared to those with 3 or fewer visits. Table 4 shows the distribution of biopsies by type, frequency and cohort (lung cancer, SPN only). A total of 1167 (79.4%) lung cancer patients had biopsy information in their EHR, 48.3% of which were bronchoscopic, 42.8% CT guided and 8.9% surgical. A total of 25.7% had multiple biopsies and the average number of biopsies was 1.25 per patient. A total of 932 (5.83%) of SPN patients had biopsy information, either from their index visit or during follow-up in their EHR, 69.1% of which were bronchoscopic, 19.3% CT guided and 11.6% surgical. A total of 35.5% of these patients had multiple biopsies and the average number of biopsies was approximately 1.36 per patient. The overall complication rates following biopsy procedures was low in both cohorts; 3.2% in the lung cancer cohort and 2.68% in the SPN cohort. (Appendix 2) Pneumothorax following CT guided biopsy was the most common complication reported in 4.3 and 5.2% of lung cancer and SPN patients respectively.

Discussion
Numerous studies have documented the timeliness of diagnosis and treatment of lung cancer patients, the majority of which come from European Union member countries. In a systematic review of the literature, Olsson et al. identified 49 studies in which at least one timeinterval in lung cancer care was reported [17]. Median times to diagnosis ranged from 8 to 60 days dependent upon the starting point in the patient journey; some studies began at the identification of symptoms while others started at the identification of a SPN on x-ray or CT. The US studies included in this systematic review were three, small, single center VA hospital chart reviews that reported a median time from CT to diagnosis of 45 days [18] and median times from "consultation" to surgery of 82 days and 104 days [19,20]. More recently, Miaga et al. conducted a chart review of 265 veterans who underwent cancer resection from 2005 to 2015 to assess time intervals between nodule identification, diagnosis, and surgical resection; changes in nodule radiographic size over time; final pathologic staging; and reasons for delays in care [21]. They reported a median time from nodule identification to resection of 98 days with an interquartile range of 66-139 days. Yorio and colleagues identified 482 lung cancer patients treated at three hospitals associated with the University of Texas (UT) Southwestern Medical Center and reported a  [23]. The difficulty in using first recorded symptoms as the starting point in a patient journey is their lack of specificity. Cough, shortness of breath and chest pain were the most frequently reported symptoms prior to diagnosis in a Canadian study of lung cancer patients; symptoms common to a variety of disease states from asthma to cardiac disease [24]. Our study focused on the time period from first identified SPN, CT image or qualifying biopsy to diagnosis and found a median of 13 days with an interquartile range of 2 to 46 days. However, we found that there are significant differences in the timeliness of diagnosis by cancer stage; stage 3 and 4 cancers being diagnosed more quickly than stage 1 or 2 cancers. In addition, there is a great deal of dispersion irrespective of clinical stage with some 5-20% of patients requiring more than 6 months to get a diagnosis.
While guidelines exist regarding the evaluation of individuals with pulmonary nodules and lung cancer there are few guidance documents recommending timelines for lung nodule identification to diagnosis [6,25,26]. The NHS has recently published a handbook  recommending that the time from nodule identification to diagnosis should be no more than 28 days [27]. Similar recommendations come from Canada and Norway [28,29]In this study, approximately 75-80% of patients with stage 3 or 4 cancer were diagnosed within a 30-day period from nodule identification. The timeliness of diagnosis for those with stage 1 or 2 lung cancer was less with approximately 55-65% receiving a diagnosis within 30 days of nodule identification. These differences in timeliness of diagnosis by stage were statistically significant at the p < 0.001 level. This could be attributable to symptoms associated with advanced disease as well as the size and location of the more advanced tumors making biopsy access easier. In addition, differences in timeliness to lung cancer diagnosis between early stage and late stage tumors may have also been affected by a willingness on the part of the clinician or patient to "monitor" the lung nodule and obtain a repeat chest CT after a specific time interval following the index visit. Nonetheless, depending on tumor stage, some 25-45% of lung cancer patients exceeded these timing recommendations.
The results from our study suggest that future guidelines on timeliness and quality of diagnosis should take into account clinical stage at presentation. Among those patients with an SPN diagnosis but no evidence of lung cancer, follow-up procedures, the majority of which were referral visits or additional chest xrays, were performed on the majority of this patient population. This contrasts with the findings of Pyenson et al. who studied administrative claims data (i.e., Mar-ketScan) and reported that only 36% received follow-up care [30]. However, Pyenson and colleagues did not include chest imaging or E&M as part of their algorithm and these constituted 60-90% of follow-up procedures in our study. Ideally imaging should be conducted before the E&M but the data from this study highlight that this is not always the case. Coding and clinical care are not that similar and these administrative databses lack the granularity to address this issue. The use of PET, nonsurgical biopsy and surgical resection as follow-up measures were quite similar between our study and theirs (3.4% vs 3.6, 2.9% vs 4.9, 0.4% vs 1% respectively).
The distribution of biopsies among lung cancer patients in our study mirrored those of others [31][32][33][34]. The pattern was slightly different for the SPN cohort in that approximately 70% of the biopsies were bronchoscopies compared to 48% in the lung cancer cohort. In addition, the biopsy rate was much lower in the SPN cohort relative to the lung cancer cohort (6% vs 79% respectively). Multiple biopsies were common in both groups occurring in 25% of the lung cancer group and in 35.5% of the SPN cohort. The frequency of multiple biopsies in the lung cancer cohort was lower in our study relative to others possibly attributable to the fact that we did not include biopsies performed after the diagnosis date (i.e., for staging purposes) [33][34][35]. Complications from biopsy procedures were rare in both patient cohorts with pneumothorax occurring in 4-5% of those undergoing CT guided biopsy attributable, in part, to the use of a conservative algorithm to insure an identified complication was the result of a specific procedure. (see Appendix 1 for codes).
Currently most lung cancer patients present with advanced disease. Our study is no different in that 69.2% of patients were diagnosed as stage 3 or 4. As screening is introduced more widely, the number of small nodules detected will increase, and most of these will be benign [36]. Guidelines that use a one-timeline for all can create counter-productive incentives, leading to unnecessary biopsies and ignore the value of careful observation and follow-up of low probability nodules. Guidelines on timeliness of care should take into account clinical stage at diagnosis and a more nuanced approach should be applied. Finally, technological improvements in the ability to access and biopsy small nodules in peripheral locations will facilitate more aggressive management of the SPN without compromising patient safety.
This study is not without limitations. This is a retrospective study that is associated with confounding factors for biopsies and treatment procedures. The patient population comes from two Louisiana healthcare systems and the patient pathways from nodule identification to diagnosis may not be generalizable to other healthcare systems throughout the country. However, findings regarding the stage at diagnosis are consistent with those previously reported throughout the literature. In addition, the linking of data from REACHnet with the Louisiana Tumor Registry resulted in many patients being excluded from the final cohort per protocol. For example, almost half of the patients with a diagnosis of lung cancer in REACHnet were excluded from the final sample because they had no record in the LTR or no record in the Tulane or Ochsner healthcare system. The impact of these patients on the patient pathway results is unknown.

Conclusions
There are significant differences in the timeliness of diagnosis by cancer stage; stage 3 and 4 cancers being diagnosed more quickly than stage 1 or 2 cancers. In addition, there is a great deal of dispersion irrespective of clinical stage with some 5-20% of patients requiring more than 6 months to get a diagnosis. These results suggest that future guidelines on timeliness and quality of diagnosis should take into account clinical stage at presentation. More aggressive screening and referral coupled with timely diagnosis of earlier stage lung cancer and more aggressive follow-up of SPN will have the largest impact on patient outcomes.