The EPICURE study: a pilot prospective cohort study of heterogeneous and massive data integration in metastatic breast cancer patients

Background: Breast cancer is the most common cancer in women and the leading cause of cancer death. Metastatic breast cancer remains a disease with a poor prognosis, and about 30% of women diagnosed at an early stage will later progress to metastatic disease. Metastatic breast cancer is an incurable disease despite significant therapeutic advances in both supportive care and targeted therapies. In the management of a metastatic patient, each clinician follows a highly complex and strictly personal decision-making process, based on a number of objective and subjective parameters that guide the therapeutic choice in the most individualized or adapted manner. Methods/design: The main objective is to integrate massive and heterogeneous data concerning the patient's environment, personal and family history, clinical and biological data, imaging, histological results (with multi-omics data), and microbiota analysis. These characteristics are multiple and interact dynamically over time. With the help of mathematics teams with biological expertise and scientific collaborations, our project aims to improve the understanding of treatment response through the investigation of heterogeneous clinical and molecular big data. Discussion: Our project aims to demonstrate the feasibility of prospectively creating a clinico-biological database by collecting epidemiological, socio-economic, clinical, biological, pathological and multi-omic data before treatment and up to 15 years after treatment start, and to identify characteristics associated with overall survival, in a cohort of 300 patients with metastatic breast cancer treated at the institution. Trial registration: ClinicalTrials.gov identifier (NCT number): NCT03958136. Registered 21 May 2019; retrospectively registered.


Disease background
Breast cancer is the most common cancer in women, with 58,459 new cases in France in 2018. It is also the leading cause of cancer death in women, with 12,146 deaths in 2018, although mortality in France has been decreasing over the last 15 years. This decrease is attributed to early detection, screening and adjuvant therapies [1].
Metastatic breast cancer remains a disease with a poor prognosis, with a 5-year survival of less than 20% and a median survival of 24 to 30 months after the diagnosis of metastasis. Each year, 5 to 10% of new breast cancers are already metastatic at diagnosis, and about 30% of women diagnosed at an early stage will later progress to metastatic disease. Metastatic breast cancer is an incurable disease despite significant therapeutic advances in supportive care, targeted therapies (anti-HER2, anti-estrogenic) and cytotoxic drugs [2][3][4][5]. This therapeutic arsenal clearly improves patients' quality of life and sometimes provides an overall survival gain.

General management of therapies in metastatic breast cancer
In the management of a metastatic patient, each clinician builds their own decision algorithm. It is based on a number of objective and subjective parameters that make the therapeutic decision as individualized and adapted as possible:
- Extrinsic objective parameters are currently based on evidence-based medicine (EBM): the age of the patient, the aggressiveness of the disease, previous therapies (neoadjuvant, metastatic), time from initial diagnosis to relapse, hormone receptor (HR) expression (estrogen, ER+, and/or progesterone, PR+), overexpression of the HER2 oncogene (HER2+), mutations of PIK3CA, ESR1 or BRCA1/2, PD-L1 expression, and the results of previous clinical trials (overall survival, time to progression).
- Intrinsic subjective parameters are also taken into account in decision-making. On the oncologist's side, these are linked to personal assumptions, for example their sensitivity to the theoretical efficacy of treatments and how they define sensitivity. On the patient's side, the choice is influenced by how demanding her social life is, her experience of previous treatments, her age, her psychological state, her symptoms and the hoped-for survival gain.

Current therapeutic strategies
Currently, clinicians rationalize these therapeutic indications by predicting treatment response from the "phenotypic classification" [6][7][8]. This immunohistochemistry (IHC)-based classification includes three subtypes: luminal cancers (HR-positive), HER2-positive cancers and triple-negative cancers (HR- and HER2-negative). Targeting oncogenic addiction pathways with anti-estrogenic therapies (SERMs, selective estrogen receptor modulators; SERDs, selective estrogen receptor degraders; and aromatase inhibitors) or with HER2-inhibitory approaches (trastuzumab, pertuzumab, T-DM1, lapatinib, neratinib) induces mixed signals of cell death, survival and proliferation [9][10][11]. Initially, cell death and growth arrest signals predominate, but this balance reverses under therapeutic pressure: the tumor escapes by adapting to the new environment induced by the treatment. The identification of resistance or adaptive pathways has led to the development of add-on strategies. Supported by strong EBM literature, this strategy has proven effective in both ER+ patients (CDK, mTOR and PI3K inhibitors) and HER2+ patients (pertuzumab, lapatinib, T-DM1, neratinib) [12]. A further strategy derives directly from the first: the identification, by DNA sequencing, of alterations for which a specific therapy exists [13][14][15]. However, the decision algorithms described above, built on a "one target, one treatment" paradigm, are not optimal, and a new therapeutic strategy based on a systemic approach to this complex disease now needs to be defined.

Research hypothesis
As explained above, current therapeutic strategies are based on a reductionist approach and have not achieved the expected success. Cancer is a complex disease relying on multiple parameters in dynamic, organized and evolving interactions, and the analysis of a complex system requires a systemic approach (Fig. 1).
Thus, we need to move from a reductionist, disjunctive, analytical characterization of cell components (genes, transcripts, proteins, etc.) to a global, systemic, conjunctive and organizational vision: distinct datasets are linked, and these underlying links need to be unraveled.

Massive data
In current clinical practice, new digital tools allow physicians to collect massive amounts of data on each patient. Multi-omics approaches are now described in the literature [16][17][18].
In a global approach, it seems important to collect information about the patient that is as exhaustive as possible, and not only biological characteristics. However, these data are usually heterogeneous (quantitative or qualitative) and possibly censored or missing.
To our knowledge, little literature exists on the exploitation of such massive and heterogeneous data in the metastatic breast cancer field.
We therefore intend to build a massive, curated database of data collected dynamically over time, which will allow us to model metastatic cancer through its various stages of progression, to understand it, and to better individualize treatments.
Nevertheless, the heterogeneity and censored nature of the data, and above all the very large number of variables relative to the number of patients, require statistical methods that remain efficient despite these constraints (see the mathematical section below for details). It is therefore essential to combine the expertise of several teams in order to provide a satisfactory method for deciding which treatment process is best adapted to each patient.
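To make the censoring constraint concrete: when a patient is still alive at the last follow-up, her survival time is only known to exceed that follow-up (right-censoring), so simply counting deaths would be biased. The standard way to handle this for overall survival is the Kaplan-Meier product-limit estimator, sketched below in plain Python. This is an illustrative sketch only, not the study's analysis pipeline; the function name `kaplan_meier` and the toy data are hypothetical, and the sketch addresses censoring alone, not the high-dimensionality problem (many variables, few patients).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up duration for each patient (e.g. months since treatment start)
    events -- 1 if death was observed at that time, 0 if the patient was right-censored
    Returns a list of (t, S(t)) pairs, one per distinct time at which a death occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients leaving the risk set at each time (deaths + censored)
    at_risk = len(times)      # everyone is at risk before the first time point
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]  # Counter returns 0 for times with censoring only
        if d:
            # Patients censored at time t are still counted at risk at t (usual convention).
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data, NOT study data: 6 patients, months of follow-up.
    months = [6, 12, 12, 18, 24, 30]
    died = [1, 1, 0, 1, 0, 1]
    for t, s in kaplan_meier(months, died):
        print(f"month {t:>2}: S(t) = {s:.3f}")
```

Censored patients contribute to the risk set up to their last follow-up and then leave it without counting as deaths, which is exactly the information a naive death proportion discards.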

Rationale for conducting this study
Resistance to treatment in metastatic breast cancer remains poorly understood. Any hypothesis about the multifactorial mechanisms of resistance must include tumor, patient and environmental data, and needs to be studied prospectively. This is the rationale for building this prospective database of metastatic breast cancer patients, which contains epidemiological, socioeconomic, clinical, biological, imaging, pathological and multi-omics data in order to address this multifactorial hypothesis.
With this project, we want to demonstrate the ability to exploit complex data in healthcare, and in particular in cancer management. We chose metastatic breast cancer as a specific model because no literature is available on mathematical developments in this application field. By sharing expertise in massive data and mathematics across different units, we aim to improve therapeutic management, taking metastatic breast cancer as our example.
Justification for this study is based on the following three points:
- Prediction and new modelling of breast cancer outcome from complex data sources
- Creation of algorithms and expertise to use massive data in cancer management
- Interdisciplinary databases and co-working for data collection and analysis

Study objectives
Short-term and main purpose
The short-term objective is to demonstrate the feasibility of prospectively creating a clinico-biological database by collecting epidemiological, socio-economic, clinical, biological, imaging, pathological and multi-omics data before treatment and up to 15 years after treatment start. At the end of the feasibility period, this database will contain a complete view of 300 patients. Once feasibility has been demonstrated, further prospective inclusions will allow us, in the mid-term, to identify with sufficient statistical power the independent prognostic factors for 15-year overall survival among the environmental, clinical, biological, imaging, bio-pathological and omic characteristics collected from patients with metastatic breast cancer.

[Figure: study flowchart (surviving labels: "EDPU-12", "Treatment"). Legend: EDPU, EPICURE data producing unit; CT, computed tomography; PET, positron emission tomography.]

Exploratory objective
We will use in silico methods to integrate complex data from this cohort (epidemiological, clinical, biological, imaging, bio-pathological and microbiota characteristics of each patient [19]) in order to define an individual decision algorithm for predicting treatment response, which will require the development of new statistical and modeling tools.

Study design
This is a prospective uncontrolled cohort study of patients with metastatic breast cancer (Fig. 2).
Patients are followed at the institution (ICO cancer center, Nantes and Angers) for 15 years, receiving usual therapeutic care with additional sampling.

Study population
Three phenotypic groups (luminal, HER2-positive and triple-negative) are identified by IHC performed at inclusion, on metastatic sites or on the breast tumor in case of local recurrence. Usual treatment protocols are often guided by the following groups:
- patients with a history of adjuvant therapy;
- patients with de novo metastatic disease.
For the statistical analysis, patients with a BRCA mutation will form a specific subgroup, studied to highlight specific features in line with the main objective.

Inclusion criteria
1. Written informed consent obtained from the patient prior to performing any protocol-related procedures, including screening biopsy, blood sampling, faeces collection and questionnaires
2. Men or women > 18 years old at the time of written consent
3. Histologically confirmed breast cancer
4. Metastatic breast cancer, or locally advanced breast cancer not eligible for local treatment with curative intent, with or without a personal history of adjuvant therapy for this cancer (chemotherapy, radiotherapy, surgery…)

Data collection (Table 2)

Data management
For clinical data management, the platform used to collect and manage the database will be centralized and hosted under the full control of the institution. All access to the data (entry, modification or simple consultation) requires a password and is logged in the database.
In accordance with the recommendations of regulatory authorities, procedures have been defined and implemented to ensure the physical and computer security of the data:
- Access is protected
- The equipment hosting the database is dedicated and housed in a private bay of the secure data center
- The computer system is backed up
- Measures ensure the safeguarding of the computer system
- Measures ensure the confidentiality of the data during the development of the computer application
- Measures ensure the confidentiality of the data during the maintenance of software or equipment
- Persons authorized to access the application are authenticated and identified
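The requirement that every entry, modification or consultation be logged can be illustrated with an append-only audit table. The sketch below is hypothetical: the protocol does not name the platform, so SQLite (Python's built-in `sqlite3`) is used only to keep the example self-contained, and the table, column and function names (`access_log`, `open_db`, `log_access`) are invented for illustration. Password authentication is assumed to happen upstream and is not shown.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS access_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    username    TEXT NOT NULL,
    access_type TEXT NOT NULL CHECK (access_type IN ('entry', 'modification', 'consultation')),
    record_id   TEXT NOT NULL,
    at_utc      TEXT NOT NULL
);
-- Make the log append-only: any UPDATE or DELETE is aborted.
CREATE TRIGGER IF NOT EXISTS access_log_no_update BEFORE UPDATE ON access_log
BEGIN SELECT RAISE(ABORT, 'access_log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS access_log_no_delete BEFORE DELETE ON access_log
BEGIN SELECT RAISE(ABORT, 'access_log is append-only'); END;
"""


def open_db(path=":memory:"):
    """Open (or create) the hypothetical database and install the audit schema."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def log_access(con, username, access_type, record_id):
    """Record one access (entry, modification or consultation) with a UTC timestamp."""
    with con:  # commits on success, rolls back on error
        con.execute(
            "INSERT INTO access_log (username, access_type, record_id, at_utc) "
            "VALUES (?, ?, ?, ?)",
            (username, access_type, record_id, datetime.now(timezone.utc).isoformat()),
        )


if __name__ == "__main__":
    con = open_db()
    log_access(con, "investigator_a", "consultation", "PATIENT-0001")  # hypothetical IDs
    for row in con.execute("SELECT username, access_type, record_id, at_utc FROM access_log"):
        print(row)
```

The triggers make the trail tamper-evident at the database level: rows can be added but not edited or deleted through SQL, which matches the intent of tracing every access rather than merely recording the latest state.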

Sample size and statistical analysis
Determination of sample size: the primary endpoint is the detection of predictive factors (profiles), based on clinical and molecular analyses, associated with 5-year overall survival.
Based on experience with observational studies in our institution, such as ESME [21], the accrual rate of patients meeting the inclusion criteria is 165 patients per year. Based on the events (deaths) observed within 60 months of follow-up, the proportion of patients alive at 60 months is 15%.
To provide 80% power to detect a clinico-biological profile associated with reduced OS with a hazard ratio of 1.5, at a type I error rate (alpha) of 5%, we plan to include 300 patients. The expected number of events is then around 254, which allows 20 profiles to be analysed.
BRCA mutation status will be collected to define a subgroup for specific analysis.
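The arithmetic behind these figures can be checked with a short script. The protocol does not state which formula was used, so the sketch below assumes Schoenfeld's approximation for the number of deaths required by a two-sided log-rank/Cox test; the `prevalence` parameter (fraction of patients carrying the profile) is an assumption not given in the text, with 0.5 meaning balanced groups. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80, prevalence=0.5):
    """Deaths required to detect hazard ratio `hr` with a two-sided test
    (Schoenfeld's approximation for the log-rank / Cox score test).

    prevalence -- assumed fraction of patients carrying the profile (0.5 = balanced).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (prevalence * (1 - prevalence) * math.log(hr) ** 2)


def expected_deaths(n_patients, survival_at_horizon):
    """Expected deaths by the horizon, assuming every patient is followed that long."""
    return n_patients * (1 - survival_at_horizon)


if __name__ == "__main__":
    # Figures from the protocol text: HR 1.5, alpha 5%, power 80%,
    # 300 patients, 15% alive at 60 months, accrual 165 patients/year.
    print(f"events needed (balanced profile): {schoenfeld_events(1.5):.0f}")
    print(f"events needed (profile in 30% of patients): "
          f"{schoenfeld_events(1.5, prevalence=0.3):.0f}")
    print(f"expected deaths at 60 months: {expected_deaths(300, 0.15):.0f}")
    print(f"accrual time: {300 / 165:.1f} years")
```

Under these assumptions, a single balanced comparison needs about 191 deaths, fewer than the roughly 254 expected from 300 patients with 15% alive at 60 months, while a profile carried by a minority of patients needs more events. Accruing 300 patients at 165 per year takes a little under two years.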
Statistical analysis process: see Table 3.

Discussion
The EPICURE study aims to demonstrate the feasibility of prospectively creating a dynamic and longitudinal clinico-biological database by collecting epidemiological, socioeconomic, clinical, biological, pathological and multi-omic data. It offers a systemic approach that is as exhaustive as possible, collecting all available data without any a priori assumption about their interest, with a large variety of data collected longitudinally in a real-life setting. Enrollment started in December 2018. This cohort and its databases serve several research programs:
- the SIRIC-ILIAD project (Imaging and Longitudinal Investigations to Ameliorate Decision-making in Multiple Myeloma and Breast Cancer) on imaging and biological research approaches, a program supported by INCA, DGOS and INSERM;
- the FEDER program on molecular imaging technologies and specific data integration questions;
- several specific scientific projects based on data from one or more clinical, biological or omic compartments.