
Breast-cancer detection using blood-based infrared molecular fingerprints

This article has been updated



Breast cancer screening is currently based predominantly on mammography, which is limited by both false-positive and false-negative results. This calls for innovative strategies, as effective detection of early-stage breast cancer has the potential to reduce mortality. Here we report the results of a prospective pilot study on breast cancer detection using blood plasma analyzed by Fourier-transform infrared (FTIR) spectroscopy – a rapid, cost-effective technique with minimal sample volume requirements and potential to aid biomedical diagnostics. FTIR can probe health phenotypes by interrogating the full repertoire of molecular species within a sample in a single, high-throughput measurement. In this study, we take advantage of cross-molecular fingerprinting to probe for breast cancer.


We compare two groups: 26 patients diagnosed with breast cancer and an equally sized group of age-matched healthy, asymptomatic female participants. Using support-vector machines (SVM), we derive classification models that we test in 10-fold cross-validation repeated 10 times. In addition, we investigate the spectral information responsible for breast cancer (BC) identification using statistical significance testing.


Our models to detect breast cancer achieve an average overall performance of 0.79 in terms of area under the curve (AUC) of the receiver operating characteristic (ROC). In addition, we uncover a relationship between the effect size of the measured infrared fingerprints and the tumor progression.


This pilot study provides the foundation for further extending and evaluating the blood-based infrared probing approach as a possible cross-molecular fingerprinting modality to tackle breast cancer detection and thus possibly contribute to the future of cancer screening.



Breast cancer (BC) represents the most frequent cancer in women with a global incidence above 2 million, and an annual mortality above 600,000 patients in 2018 [1, 2]. The cure rate remains correlated with the stage at diagnosis; therefore, early detection and screening programs are crucial [3,4,5,6]. Often, BC screening is based upon radiologic approaches, mostly mammography [4]. These screening modalities, predominantly applied in developed countries, are associated with a significant reduction in mortality (19% overall reduction of the relative risk [1]). However, major limitations and debatable cost-effectiveness of these approaches persist [4, 6]. Due to the limited sensitivity and specificity of current medical diagnostics, cancer can either be overlooked (false negatives) or falsely detected (false positives), leading to either delayed interventions or unnecessary, potentially harmful investigations or psychological stress [7]. Also, BC screening in certain regions of the world remains rudimentary despite grim global projections suggesting a doubling of BC cases within the coming 20 years, mostly in these countries [1].

This concerning situation calls for additional strategies for BC screening, as detection of early-stage BC bears potential to significantly reduce mortality. Hence, there is a high need for complementing current medical diagnostics with efficient, non-invasive or minimally-invasive methods that could possibly lead to new easily implementable high-throughput screening and detection approaches, prior to tissue-biopsy-based diagnostics and molecular profiling [8].

Liquid biopsies have attracted interest over the past decade as a non-invasive approach for disease detection, screening and cancer monitoring [9]. Molecular analyses of human blood derivatives, such as plasma or serum, provide systemic molecular information, and enable novel routes of diagnostics [8, 10]. So far, most liquid biopsies predominantly rely on the analysis of a few pre-selected analytes and biomarkers. Although the emergence of highly sensitive and molecule-specific methods in the fields of proteomics [11,12,13], metabolomics [14, 15], and genomics [16,17,18] has led to the discovery of thousands of different biomarker candidates, only a few of them have been validated and transferred to the clinic so far [19]. Moreover, given the complexity of the disease as well as its etiology, increasing the number of analytical methods for cancer detection, such as in multi-omics, could potentially lead to higher detection rates at early stage. However, practically, this will lead to unfeasibly high costs for broad clinical use. It is thus evident that methods that have the capacity to capture information across the entire molecular landscape would be advantageous.

Infrared molecular spectroscopy may be very beneficial here: it detects signals from all types of molecules in a sample in a single time- and cost-effective measurement in a label-free manner [20, 21]. When applied to blood plasma (or serum) samples, infrared spectroscopy delivers infrared molecular fingerprints (IMFs) reflecting the chemical composition of a sample, i.e. the person’s molecular blood phenotype [22, 23]. Even though the IMF of molecularly highly complex blood plasma can only partially be traced back to its molecular origin [24], it may be sensitive and specific to the health state of an individual. In a recent longitudinal study, we have shown that defined workflows to collect, store, process and measure human liquid biopsies lead to reproducible IMFs in healthy, non-symptomatic individuals that are stable over clinically relevant time scales [22, 23]. Numerous studies have shown the potential of blood-based IMFs for the detection of breast cancer [25,26,27,28]. Despite these promising initial results, the majority of these studies had a high risk of bias due to patient selection [29]. In fact, it was shown that IMFs are susceptible to external confounding factors, such as those related to sample handling and data collection, as well as to inherent biological variations (e.g. age, body-mass index), all of which can affect cancer detection [30]. Since many cancer-related therapies may leave footprints in the chemical composition of peripheral blood, it is essential to evaluate the extent of infrared fingerprint differences at the time when cancer patients have only been diagnosed with malignancy, prior to any cancer-related therapy. This has not been assessed previously, and the suitability of a blood-based infrared fingerprinting approach as a new BC screening modality has not been evaluated.
In this work, we measured intact blood plasma samples directly in liquid form with FTIR transmission spectroscopy. Samples were taken from BC patients prior to any cancer-related therapy and from non-symptomatic reference individuals who were carefully matched to the BC cases. By applying support vector machine (SVM) algorithms to train models for binary classification, we obtained a detection efficiency of about 0.79 (area under the receiver operating characteristic (ROC) curve, AUC). The present study provides a first estimation of the feasibility of directly probing liquid blood plasma for minimally-invasive BC detection, an approach that is easily implementable and could be extended to high-throughput BC screening applications.


Study population and sample collection

The presented results are based on a prospective, single-center, observational clinical study. The aim of the study was to assess whether the combination of infrared spectroscopy of liquid biopsies (blood plasma) with machine learning infrared spectral analyses has any capacity to detect breast cancer (BC). For this purpose, a cohort of female patients diagnosed with BC at the Oncology Centre, King Saud University Medical City (KSUMC), Riyadh, Saudi Arabia, was compared with a cohort of women without BC, the reference individuals. Inclusion criteria for participation in the study were as follows: asymptomatic reference individuals were adult females participating in organized or voluntary BC screening, assessed with mammography and (if necessary) breast ultrasound and/or magnetic resonance imaging (MRI). Patients with BC were included after confirmation of pathological diagnosis of invasive breast cancer and prior to any therapeutic intervention for breast cancer. Subjects included in the trial were identified by a trial-specific code, guaranteeing their anonymity.

For the purpose of the study, up to 19.6 ml of venous blood was collected per enrolled subject. The tubes were centrifuged for 10 min at 7000 g at a temperature of 4 °C, and the blood plasma supernatants were then aliquoted into 1.5 ml tubes (1 ml plasma each) and stored at −80 °C. These procedures were carried out at the KSUMC. The 8 aliquots of each sample were numbered anonymously. The correspondence list between the subject number and the aliquot number was maintained by the clinical research associate (CRA) coordinator at KSUMC. Samples were processed using the same standard operating procedures and shipped on dry ice from the KSUMC to the measurement laboratories at the LMU. They were all processed simultaneously and underwent the same number of freeze-thaw cycles. Once the samples from all participants had been collected and stored, they were thawed and measured as liquids within a single measurement campaign following the same procedure. This standardization of procedures and workflows minimized noise due to sample preparation and supported sufficient reproducibility.

The BC patient group (n = 26) consisted of patients diagnosed at KSUMC with the following characteristics: mean age: 49 years (30-62), previous pregnancies: 17 patients (65.4%), pre/peri-menopausal: 11 patients (42.3%), operable non-metastatic BC (stage IA-IIIA): 16 patients (61.5%), invasive ductal carcinoma: 24 patients (92.3%), estrogen receptors positive: 14 patients (53.8%) and HER2 positive: 17 patients (65.4%). It is important to note that patients are regularly referred to KSUMC from secondary hospitals where cancer medications are not readily available (e.g. anti-HER2 monoclonal antibodies). Therefore, the breast cancer accrual at KSUMC does not reflect the usual split between breast cancer molecular subtypes and thus leads to, in particular, an excess of HER2-positive molecular subtypes.

Achieving covariate balance between cases and controls is a standard procedure in observational studies for minimizing the effect of confounding factors and limiting the bias throughout all derived results. In this work, we seek balance in terms of age and BMI. This is achieved by pairwise matching. Out of the 67 samples of the initial control group (collected within BC screening programme), given our criteria only 26 individuals of these were selected for inclusion into a control group that is in covariate balance with the collected BC cases. Table 1 shows the characteristics of the balanced cohort, used for further analysis. In addition, a detailed anonymized file (metafile.xlsx) that lists all available information of the recruited individuals (28 potential cases and 67 potential controls, before matching) is provided along with the manuscript.
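The pairwise matching step can be illustrated with a minimal sketch. The text does not state which matching algorithm or tolerance was used, so the greedy nearest-neighbour rule, the standardization of age and BMI, the optional `caliper` argument and the function name `match_controls` are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def match_controls(cases, controls, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement (assumed rule).

    cases, controls: arrays of shape (n, 2) holding (age, BMI) per person.
    Returns a list of (case_index, control_index) pairs.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Standardize with the pooled mean and spread so that age (years)
    # and BMI (kg/m^2) contribute on a comparable scale.
    pooled = np.vstack([cases, controls])
    mu, sigma = pooled.mean(axis=0), pooled.std(axis=0)
    z_cases, z_controls = (cases - mu) / sigma, (controls - mu) / sigma

    available = set(range(len(controls)))
    pairs = []
    for i, z in enumerate(z_cases):
        if not available:
            break
        idx = sorted(available)
        dist = np.linalg.norm(z_controls[idx] - z, axis=1)
        j = int(np.argmin(dist))
        # A caliper (hypothetical parameter) rejects matches that are too far apart.
        if caliper is None or dist[j] <= caliper:
            pairs.append((i, idx[j]))
            available.remove(idx[j])
    return pairs

# Synthetic illustration only: 26 cases and 67 candidate controls.
rng = np.random.default_rng(0)
cases = np.column_stack([rng.uniform(30, 62, 26), rng.normal(29, 4, 26)])
controls = np.column_stack([rng.uniform(25, 70, 67), rng.normal(28, 5, 67)])
pairs = match_controls(cases, controls)
print(len(pairs), "matched pairs")
```

A greedy rule is simple but order-dependent; an optimal assignment would be an equally reasonable choice for such a sketch.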

Table 1 Characteristics of the balanced cohort

Spectroscopic analysis

The spectroscopic measurements were performed in liquid phase with an automated FTIR device (MIRA-Analyzer, micro-biolytics GmbH) with a flow-through transmission cuvette (CaF₂ with 8 μm path length). The spectra were acquired with a resolution of 4 cm⁻¹ in a spectral range between 950 cm⁻¹ and 3050 cm⁻¹. A water reference spectrum was recorded after each sample measurement to reconstruct the IR absorption spectra. To track potential experimental errors throughout the entire experiment [31], a measurement of pooled human plasma (BioWest, Nuaillé, France) was performed after every 5 samples. Negative absorbance values, which occur because the liquid sample contains less water than the reference (pure water), were corrected using a previously described approach [22]. It is known from measurements of dried plasma that there is no significant absorption in the wavenumber region 2000-2300 cm⁻¹, resulting in a flat absorption baseline. This has also been confirmed to hold approximately for liquid plasma. We used this fact as a criterion: a previously measured water absorption spectrum was added to each spectrum to account for the missing water in the sample measurement, such that the average slope in this region is minimized and a flat baseline is obtained. All spectra were truncated to 1000-3000 cm⁻¹, and the entire silent region (1800-2800 cm⁻¹) was removed. Finally, to correct for experimental (instrumental/measurement) variations that can affect the overall absorbance of a fingerprint, all spectra were normalized as vectors, using the Euclidean (L2) norm. Panel (a) of Fig. 1 shows the distributions of measured spectra (after water correction) of the BC cases and their associated controls. The infrared spectral pre-processing was performed similarly to a previous work [22].
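The pre-processing chain described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [22]: the "average slope" is taken to be the slope of a least-squares line fitted in the 2000-2300 cm⁻¹ region, the water spectrum is added with a single scaling coefficient, the range boundaries are treated as inclusive, and the function names and synthetic spectra are hypothetical.

```python
import numpy as np

def water_correct(wavenumbers, spectrum, water, region=(2000.0, 2300.0)):
    """Add a scaled water spectrum so the fitted slope in `region` becomes zero."""
    sel = (wavenumbers >= region[0]) & (wavenumbers <= region[1])
    slope_s = np.polyfit(wavenumbers[sel], spectrum[sel], 1)[0]
    slope_w = np.polyfit(wavenumbers[sel], water[sel], 1)[0]
    # The fitted slope is linear in the spectrum, so one coefficient flattens it.
    c = -slope_s / slope_w
    return spectrum + c * water, c

def truncate_and_normalize(wavenumbers, spectra,
                           keep=(1000.0, 3000.0), silent=(1800.0, 2800.0)):
    """Keep 1000-3000 cm^-1, drop the silent region, then L2-normalize each row."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    in_range = (wavenumbers >= keep[0]) & (wavenumbers <= keep[1])
    in_silent = (wavenumbers > silent[0]) & (wavenumbers < silent[1])
    mask = in_range & ~in_silent
    kept = spectra[:, mask]
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    return wavenumbers[mask], kept / norms

# Synthetic illustration only.
wn = np.arange(950.0, 3051.0, 2.0)
water = 0.5 + 2e-4 * (wn - 950.0)             # sloped stand-in for a water spectrum
true = np.exp(-((wn - 1650.0) / 50.0) ** 2)   # one band, flat in 2000-2300 cm^-1
measured = true - 0.3 * water                 # sample "missing" 0.3 units of water
corrected, c = water_correct(wn, measured, water)
wn_kept, normed = truncate_and_normalize(wn, corrected)
print(round(c, 6), normed.shape)
```

Because the flattening criterion only constrains the slope, it fixes the amount of added water but not any constant offset; the subsequent vector normalization handles overall scale.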

Fig. 1

Infrared spectra and classification. a Distributions of measured spectra (after water correction) for cases and controls. Solid lines indicate the means of all measurements in each group and shaded areas depict the corresponding standard deviations. b Average ROC curves extracted from a repeated 10-fold cross-validation over 10 times for binary classification using linear SVM

Data analysis

To derive classification models, we used Scikit-Learn (v. 0.23.2), an open-source machine-learning framework in Python (v.3.7.6). We trained binary-classification models using linear SVM. Performance was evaluated using repeated stratified 10-fold cross-validation and visualized using ROC curves. The results of the cross-validation are reported in terms of descriptive statistics: the mean value of the resulting AUC distribution and its standard deviation. For statistically comparing two groups of spectra (i.e. cases, references), we followed three approaches. First, we calculated the “differential fingerprint” (differential infrared spectrum), defined as the difference between the mean absorbance per wavenumber of the cases and that of the references, and contrasted it against the standard deviation of the reference group to obtain a visual understanding of which wavenumbers are potentially useful for distinguishing/classifying the two populations. Such a graph serves as a visual representation of what is known as the “effect size” [32], which can be obtained by standardizing the differential fingerprint and has an evident relation to the AUC per wavenumber. Secondly, we performed t-tests (testing the hypothesis that two populations have equal means) to extract two-tailed p-values per wavenumber. As a third and last step, we used the Mann–Whitney U test (also known as Wilcoxon rank-sum test) to extract the U statistic and calculate the AUC per wavenumber by the relation AUC = U/(n1 × n2), where n1 and n2 are the sizes of the two groups.
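The evaluation scheme described above (linear SVM, repeated stratified 10-fold cross-validation, AUC reported as mean and standard deviation) can be sketched with Scikit-Learn as follows. The data are synthetic, and the choice of `SVC(kernel="linear")` with default regularization and the random seeds are assumptions; the text does not state the estimator class or hyperparameters that were used.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for pre-processed spectra: 26 references and 26 cases.
rng = np.random.default_rng(0)
n_per_group, n_features = 26, 200
X_ref = rng.normal(0.0, 1.0, (n_per_group, n_features))
X_case = rng.normal(0.0, 1.0, (n_per_group, n_features))
X_case[:, :10] += 1.0                      # a few informative "wavenumbers"
X = np.vstack([X_ref, X_case])
y = np.array([0] * n_per_group + [1] * n_per_group)

# Linear SVM; C is left at its default here (assumption).
model = SVC(kernel="linear")

# 10 folds repeated 10 times gives 100 held-out AUC values.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
auc = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)

print(f"AUC = {auc.mean():.2f} +/- {auc.std():.2f} over {auc.size} test folds")
```

Stratification matters at this sample size: with 52 samples each test fold holds only about five spectra, and an AUC is only defined when both classes are present in the fold.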


Infrared molecular fingerprinting for classification of breast cancer

To evaluate whether IMF probing of liquid plasma has any capacity to detect BC, we performed binary classification for distinction between the BC patients and the matched asymptomatic reference individuals (Table 1 and Fig. 1a). The detection efficiencies achieved on the test sets correspond to an AUC value of 0.79 for normalized FTIR spectra. A higher AUC value of 0.81 could be achieved using non-normalized spectra (Fig. 1b). Despite the higher AUC obtained for non-normalized spectra, we consider the analysis of normalized data to be more reliable. Vector normalization reduces measurement uncertainty, which can be a major factor of bias, especially in cases of small sample sizes. Overall, these results deliver the first evidence that the molecular differences between reference individuals and matched therapy-naive BC patient females can be detected with infrared fingerprinting of fluid blood plasma.

Infrared spectral probing of breast cancer

In order to understand infrared spectral information responsible for BC identification, we have evaluated the infrared spectral signatures that are relevant for distinguishing breast cancer cases from the reference, control individuals. For this purpose, we evaluated the differential fingerprints that we defined as the difference between the mean IMF of the case cohort and that of the reference cohort (Fig. 2a). This quantity, when compared to the standard deviation of the reference group (shaded area in Fig. 2a), reveals the locations along the spectrum for which the difference between the means of the two groups is larger than the sample standard deviation. These differences become even more apparent in Fig. 2b, which depicts the effect size, defined as the differential fingerprint divided by the standard deviation of the reference group. We reveal that at specific spectral locations, the effect size exceeds the barrier of one standard deviation, indicating potentially significant differences between the sample means of the two distributions.
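The two quantities defined above can be written down in a few lines. This is a minimal sketch with made-up numbers; the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) of the reference group is an assumption based on the wording of the text.

```python
import numpy as np

def differential_fingerprint(cases, refs):
    """Mean absorbance of the cases minus that of the references, per wavenumber."""
    return cases.mean(axis=0) - refs.mean(axis=0)

def effect_size(cases, refs):
    """Differential fingerprint divided by the reference-group standard deviation."""
    return differential_fingerprint(cases, refs) / refs.std(axis=0, ddof=1)

# Toy data: rows are individuals, columns are two wavenumbers.
refs = np.array([[0.0, 0.0],
                 [2.0, 4.0]])
cases = np.array([[3.0, 2.0],
                  [3.0, 6.0]])

diff = differential_fingerprint(cases, refs)
d = effect_size(cases, refs)
print(diff, d)
```

In the toy data both wavenumbers show the same mean difference, but only the first exceeds one reference standard deviation, which is the threshold marked by the dashed line in Fig. 2b.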

Fig. 2

Spectral features. a Mean absorbance difference per wavenumber between cases and references (differential fingerprint) b Effect size per wavenumber. This quantity is known as the Cohen’s d in signal detection theory and corresponds to the standardized difference between the mean absorbance of the cases and references. The dashed line indicates effect size of one standard deviation. c P-values per wavenumber, by performing local two-sided t-tests. d ROC AUC extracted by the Mann-Whitney U-test. The dashed line corresponds to the AUC value of the trained SVM model. The shaded rectangular areas, in all panels, indicate spectral regions where highly-significant differences have been identified

To evaluate the statistical significance of the differences detected in the preceding analysis, we additionally determined the p-value per wavenumber by performing two-sided t-tests. Importantly, we find that the p-values of highest significance, as low as 10⁻⁴, are observed in the spectral regions that directly correspond to large effect size (Fig. 2c). Moreover, to further examine the comparison, we calculated the AUC per wavenumber using the U statistic of a Mann-Whitney U test (as described in the Methods section). We observe that the AUC per wavenumber follows a similar pattern as the effect size (Fig. 2d). Interestingly, for the wavenumbers with the lowest p-values and the most significant differences, the single-feature AUC reaches (and in some cases exceeds) the one obtained from the application of the SVM model trained on the entire spectrum (dashed line in Fig. 2d).
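The per-wavenumber tests can be sketched with SciPy as shown below. The helper name and toy data are hypothetical, the t-test uses SciPy's default equal-variance form (the text does not say which variant was applied), and the code relies on `mannwhitneyu` reporting the U statistic of its first argument, as documented for SciPy 1.7 and later.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

def per_wavenumber_stats(cases, refs):
    """Two-sided t-test p-values and single-feature AUC = U / (n1 * n2)."""
    # Independent two-sample t-test applied column by column.
    p_values = ttest_ind(cases, refs, axis=0).pvalue
    n1, n2 = cases.shape[0], refs.shape[0]
    auc = np.empty(cases.shape[1])
    for k in range(cases.shape[1]):
        # U counts the (case, reference) pairs in which the case value is
        # larger, with ties counted as one half.
        u = mannwhitneyu(cases[:, k], refs[:, k], alternative="two-sided").statistic
        auc[k] = u / (n1 * n2)
    return p_values, auc

# Toy data: column 0 separates the groups perfectly, column 1 not at all.
cases = np.array([[5.0, 0.0], [6.0, 1.0], [7.0, 2.0], [8.0, 3.0]])
refs = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]])
p_values, auc = per_wavenumber_stats(cases, refs)
print(p_values, auc)
```

Dividing U by n1 × n2 gives the fraction of case-reference pairs that are ranked correctly by that single wavenumber, which is why it can be compared directly with the AUC of the SVM model.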

The results we provide are the first indication that the presented approach is feasible for the purpose of BC detection and that the predictive power of machine learning can be further leveraged in future analyses requiring larger sample sets. Our presented feasibility evaluation is instrumental for the establishment of a lower bound of the AUC and motivates the collection of larger data and sample sets which shall increase the prediction performance and capacity of the approach.

Efficiency of breast cancer detection at different stages of malignancy

Cancer detection is challenged by the enormous biological and clinical complexity of cancer, and detection is further complicated by the significant intra-tumor heterogeneity as well as by the impact of the tumour micro-environment [33]. To evaluate whether the blood-based IMFs are sufficiently sensitive to detect tumors at different stages of progression, we first investigated whether the IMF characteristics depend on the stage of the tumor, characterized in terms of clinical TNM (tumor node metastasis) staging [34]. For this purpose, we split the BC cases into two groups and compared them separately with the non-symptomatic, reference individuals. The first group corresponds to the non-metastatic (M0) patients (stages I, II, III) and the second group to metastatic (M1) patients at tumor stage IV. The characteristics of the two groups are shown in Table 2.

Table 2 Breakdown of cases in terms of cancer staging

Panels (a) and (b) in Fig. 3 depict the differential fingerprints, together with the area they enclose, and the effect size per wavenumber, for each case group compared separately to the controls. P-values lower than 10⁻² are observed in the spectral regions that correspond to large effect size (Fig. 3c). Altogether, we observe that the differences between cases and references are much more pronounced across the entire shown spectral range for the metastatic cases with stage IV tumours.
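The subgroup comparison can be illustrated by computing, for each stage group, the area enclosed by its differential fingerprint. The text does not specify how this area was calculated; the sketch below assumes the trapezoidal integral of the absolute differential fingerprint over wavenumber, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def enclosed_area(wavenumbers, diff):
    """Trapezoidal integral of |differential fingerprint| over wavenumber."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.abs(np.asarray(diff, dtype=float))
    # Note: if a spectral region was removed, the trapezoid spanning the gap
    # should be excluded in a real analysis; this toy grid has no gap.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def subgroup_areas(wavenumbers, cases, refs, is_metastatic):
    """Compare M0 and M1 cases separately against the same reference group."""
    is_metastatic = np.asarray(is_metastatic, dtype=bool)
    ref_mean = refs.mean(axis=0)
    areas = {}
    for label, mask in (("M0", ~is_metastatic), ("M1", is_metastatic)):
        diff = cases[mask].mean(axis=0) - ref_mean
        areas[label] = enclosed_area(wavenumbers, diff)
    return areas

# Toy data: three wavenumbers, one non-metastatic and one metastatic case.
wn = np.array([0.0, 1.0, 2.0])
refs = np.zeros((2, 3))
cases = np.array([[1.0, -1.0, 1.0],
                  [3.0, -3.0, 3.0]])
areas = subgroup_areas(wn, cases, refs, is_metastatic=[False, True])
print(areas)
```

Taking the absolute value before integrating keeps positive and negative spectral differences from cancelling, so a larger area reflects a larger overall deviation from the references.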

Fig. 3

Tumor staging. a Mean absorbance difference per wavenumber (differential fingerprint) between cases and references, for metastatic and non-metastatic patients. The inset shows the relative sizes of the area enclosed by the two differential fingerprints. b Effect size per wavenumber, for metastatic and non-metastatic patients. The dashed line indicates effect size of one standard deviation. c P-values per wavenumber, by performing local two-sided t-tests


This study provides the first indication that the molecular differences of blood plasma between reference individuals and matched therapy-naive female breast cancer patients have the potential to be detected with infrared fingerprinting of crude, native liquid plasma. Although previous studies on BC detection have yielded fairly high classification efficiencies [28], they used dried serum samples, an approach with known limitations.

As a novelty of the approach, we showed here that similar efficiencies can be achieved using measurements of liquid plasma directly. This is advantageous, as native plasma sample measurements are more reproducible, require only minimal sample processing and are thus more time-efficient, and do not suffer from known artifacts such as the so-called “coffee-ring effect” [35].

This work provides an assessment of the feasibility of infrared molecular probing for breast cancer detection by implementing robust matching that eliminates age and BMI as possible confounding factors. Although the matching excluded a large share of the collected data, it was designed to provide an unambiguous assessment of the suitability of the approach. Although very promising, the results of this study need to be extended and evaluated in larger populations, as many of the collected samples could not be included in our final investigation; furthermore, samples from multiple clinical sites need to be investigated. The findings of this study indicate that the predictive power of machine learning can be further leveraged in future analyses requiring larger sample sets. Our presented feasibility evaluation is instrumental for the establishment of a lower bound of the AUC and motivates the collection of larger data and sample sets which shall increase the prediction performance and capacity of the approach. Importantly, given the ease and stability of FTIR operational workflows to probe bulk fluid plasma, the approach presented here is robust and reproducible [22] and should be extendable to larger cohorts and to any given population in a straightforward way.

Given that this clinical study has been performed on a population enrolling women living in Saudi Arabia, it will be important to evaluate whether blood-based infrared fingerprinting - as a new phenotyping modality - is in position to detect breast-cancer-specific signals independent of different genetic backgrounds and lifestyles. In particular, it will be essential to investigate whether the presented approach could possibly contribute to lowering the rate of false positive outcomes from current screening programs, to possibly provide an additional new approach to be combined with mammography.

Overall, we find a consistent pattern of infrared spectral changes encoded in the IMFs which is more pronounced in the case of more progressed BC stages (either larger tumour volume, or metastatic spread). Although performed within a limited study setting, these findings suggest that the information retrieved from the measured differences between the IMFs of BC cases and references is connected to cancer-related molecular changes. These changes may be due to larger tumour load leaving a more extensive footprint on the composition of peripheral blood, or to the fact that tumour progression could have caused a higher systemic response, or to a combination of both.


This is a pilot study applying infrared spectroscopy of liquid blood plasma in combination with machine learning for the detection of cancer, showcased on the example of BC. This approach to BC detection, using liquid biopsies, enabled us to differentiate between patients with BC and non-symptomatic reference individuals with an AUC of 0.79, importantly, prior to any cancer-related therapy. In addition, statistical testing shows that the informative signals captured by the IMFs are related to the progression of the disease. This pilot study has been performed on a limited cohort with specific characteristics, and thus further studies validating the results on independently collected samples are necessary. A large-scale validation study is in progress, and additional studies on the detection of several other tumour types are under way. If its feasibility is proven, this approach, given its ease of technical implementation and the possibility of extending it to high-throughput, population-level use, could address currently unmet needs in oncology and has the potential to contribute to the future of precision medicine. Given the time- and cost-efficiency of the approach, we envisage it being applied in the initial phase of primary disease diagnostics. The main objective may not be to isolate new biomarker candidate molecules, but to efficiently probe with minimally-invasive liquid biopsies in the first instance, before individuals proceed to further diagnostic approaches (based on gold-standard diagnosis by tissue biopsy/radiology).

Availability of data and materials

Anonymized raw datasets are available along with the manuscript. Any additional information and data are available upon reasonable request. The custom code used for the production of the results presented in this manuscript is stored in a persistent repository at the Leibniz Supercomputing Center of the Bavarian Academy of Sciences and Humanities (LRZ), located in Garching, Germany. The code can be shared upon reasonable request.

Change history

  • 13 May 2022

    Open Access funding statement has been added to this article.



Abbreviations

BC: Breast cancer
SVM: Support vector machines
FTIR: Fourier-transform infrared
AUC: Area under the curve
ROC: Receiver operating characteristic
IMFs: Infrared molecular fingerprints
MRI: Magnetic resonance imaging
TNM: Tumor node metastasis
CRA: Clinical research associate
BMI: Body-mass index






  1. Global Cancer Observatory. Accessed: 2021-03-01.

  2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.


  3. Smith RA, Andrews KS, Brooks D, Fedewa SA, Manassaram-Baptiste D, Saslow D, et al. Cancer screening in the United States, 2018: a review of current american cancer society guidelines and current issues in cancer screening. CA Cancer J Clin. 2018;68(4):297–316.


  4. Schiller-Frühwirth IC, Jahn B, Arvandi M, Siebert U. Cost-effectiveness models in breast cancer screening in the general population: a systematic review. Appl Health Econ Health Policy. 2017;15(3):333–51.


  5. Bannister N, Broggio J. Cancer survival by stage at diagnosis for England (experimental statistics): adults diagnosed 2012, 2013 and 2014 and followed up to 2015. Produced in collaboration with Public Health England. 2016.


  6. Schiffman JD, Fisher PG, Gibbs P. Early detection of cancer: past, present, and future. Am Soc Clin Oncol Educ Book. 2015;35(1):57–65.


  7. Srivastava S, Koay EJ, Borowsky AD, De Marzo AM, Ghosh S, Wagner PD, et al. Cancer overdiagnosis: a biological challenge and clinical dilemma. Nat Rev Cancer. 2019;19(6):349–58.


  8. Wan JC, Massie C, Garcia-Corbacho J, Mouliere F, Brenton JD, Caldas C, et al. Liquid biopsies come of age: towards implementation of circulating tumour dna. Nat Rev Cancer. 2017;17(4):223.


  9. Alix-Panabières C, Pantel K. Liquid biopsy: from discovery to clinical application. Cancer Discov. 2021;11(4):858–73.


  10. Ivano A, Riccardo B, Pierluigi B, Buonomo OC, Eleonora C, Marcello C, et al. Liquid biopsies and cancer omics. Cell Death Dis. 2020;6(1):1–8.


  11. Geyer PE, Voytik E, Treit PV, Doll S, Kleinhempel A, Niu L, et al. Plasma proteome profiling to detect and avoid sample-related biases in biomarker studies. EMBO Mol Med. 2019;11(11):10427.


  12. Geyer PE, Holdt LM, Teupser D, Mann M. Revisiting biomarker discovery by plasma proteomics. Mol Syst Biol. 2017;13(9):942.


  13. Uzozie AC, Aebersold R. Advancing translational research and precision medicine with targeted proteomics. J Proteome. 2018;189:1–10.


  14. Xia J, Broadhurst DI, Wilson M, Wishart DS. Translational biomarker discovery in clinical metabolomics: an introductory tutorial. Metabolomics. 2013;9(2):280–99.


  15. Roig B, Rodríguez-Balada M, Samino S, Lam EW-F, Guaita-Esteruelas S, Gomes AR, et al. Metabolomics reveals novel blood plasma biomarkers associated to the brca1-mutated phenotype of human breast cancer. Sci Rep. 2017;7(1):1–9.


  16. Han X, Wang J, Sun Y. Circulating tumor dna as biomarkers for cancer detection. Genomics Proteomics Bioinformatics. 2017;15(2):59–72.


  17. Otandault A, Anker P, Dache ZAA, Guillaumon V, Meddeb R, Pastor B, et al. Recent advances in circulating nucleic acids in oncology. Ann Oncol. 2019;30(3):374–84.


  18. Abbosh C, Birkbak NJ, Wilson GA, Jamal-Hanjani M, Constantin T, Salari R, et al. Phylogenetic ctdna analysis depicts early-stage lung cancer evolution. Nature. 2017;545(7655):446–51.


  19. Poste G. Bring on the biomarkers. Nature. 2011;469(7329):156–7.


  20. Heise HM. Biomedical vibrational spectroscopy-technical advances. Biomed Vibrational Spectrosc. 2008:9–37.

  21. Griffiths PR, De Haseth JA. Fourier transform infrared spectrometry, vol. 171. Hoboken: Wiley; 2007.

  22. Huber M, Kepesidis KV, Voronina L, Božić M, Trubetskov M, Harbeck N, et al. Stability of person-specific blood-based infrared molecular fingerprints opens up prospects for health monitoring. Nat Commun. 2021;12(1):1–10.


  23. Kepesidis KV, Huber M, Voronina L, Božić M, Trubetskov M, Krausz F, et al. Do infrared molecular fingerprints of individuals exist? Lessons from spectroscopic analysis of human blood. In: The European conference on lasers and electro-optics. Piscataway: Optical Society of America, IEEE; 2019. p. 8.

  24. Voronina L, Leonardo C, Mueller-Reif JB, Geyer PE, Huber M, Trubetskov M, Kepesidis KV, Behr J, Mann M, Krausz F, et al. Molecular origin of blood-based infrared spectroscopic fingerprints. Angewandte Chemie International edition. 2021.


  25. Backhaus J, Mueller R, Formanski N, Szlama N, Meerpohl H-G, Eidt M, et al. Diagnosis of breast cancer with infrared spectroscopy from serum samples. Vib Spectrosc. 2010;52(2):173–7.


  26. Ghimire H, Garlapati C, Janssen EA, Krishnamurti U, Qin G, Aneja R, et al. Protein conformational changes in breast cancer sera using infrared spectroscopic analysis. Cancers. 2020;12(7):1708.


  27. Elmi F, Movaghar AF, Elmi MM, Alinezhad H, Nikbakhsh N. Application of FT-IR spectroscopy on breast cancer serum analysis. Spectrochim Acta A Mol Biomol Spectrosc. 2017;187:87–91.


  28. Zelig U, Barlev E, Bar O, Gross I, Flomen F, Mordechai S, et al. Early detection of breast cancer using total biochemical analysis of peripheral blood components: a preliminary study. BMC Cancer. 2015;15(1):1–10.


  29. Anderson D, Anderson R, Moug S, Baker M. Liquid biopsy for cancer diagnosis using vibrational spectroscopy: systematic review. BJS Open. 2020;4(4):554.


  30. Diem M. Comments on recent reports on infrared spectral detection of disease markers in blood components. J Biophotonics. 2018;11(7):201800064.


  31. Sangster T, Major H, Plumb R, Wilson AJ, Wilson ID. A pragmatic and readily implemented quality control strategy for HPLC-MS and GC-MS-based metabonomic analysis. Analyst. 2006;131(10):1075–8.


  32. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 2013.

  33. Boothby M, Rickert RC. Metabolic regulation of the immune humoral response. Immunity. 2017;46(5):743–55.


  34. Hortobagyi G, Connolly J, D’Orsi C, Edge S, Mittendorf E, Rugo H, et al. Breast. AJCC cancer staging manual. Chicago: American College of Surgeons (ACS); 2017.


  35. Hughes C, Brown M, Clemens G, Henderson A, Monjardez G, Clarke NW, et al. Assessing the challenges of Fourier transform infrared spectroscopic analysis of blood serum. J Biophotonics. 2014;7(3-4):180–8.




We would like to thank Daniel Meyer, Jacqueline Hermann, Stefan Jungblut, Liudmila Voronina and Michael Trubetskov for their help with this study. In particular, we wish to acknowledge the efforts of the many individuals who participated as volunteers in the clinical study reported here.


This work was funded by King Saud University (KSU, in the framework of the ECDL collaboration), the Center for Advanced Laser Applications (CALA) and the Department of Laser Physics of the Ludwig Maximilian University of Munich (LMU), and the Max Planck Institute of Quantum Optics (MPQ), Laboratory for Attosecond Physics, Germany. Open Access funding enabled and organized by Projekt DEAL.

Author information




Project initiation: AMA. Initiation, coordination and supervision of clinical study: JMN, KA, SK, MRKB. Conceptualization: MZ, JMN, FK. Clinical methodology: JMN, FD, MRKB. Spectroscopic methodology: MH, MB. Statistical methodology: KVK. Measurements: MB. Data analysis: KVK, MB. Supervision of experimental measurements: MZ, FK. Clinical sample and data collection, processing: NAA, AA, AAD, MAG, MAH, AS, FD, MA. Writing – original draft: KVK, MZ. Review & editing of manuscript: all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kosmas V. Kepesidis.

Ethics declarations

Ethics approval and consent to participate

The study was reviewed and approved by the Institutional Review Board (IRB) of KSU (project number E-16-1894) prior to specific protocol procedures and in accordance with regulatory requirements. All participants agreed to and signed the written consent form prior to enrollment in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Kepesidis, K.V., Bozic-Iven, M., Huber, M. et al. Breast-cancer detection using blood-based infrared molecular fingerprints. BMC Cancer 21, 1287 (2021).
