Prescreening of tumor samples for tumor-centric transcriptome analyses of lung adenocarcinoma
BMC Cancer volume 22, Article number: 1186 (2022)
Single-cell RNA sequencing (scRNA-seq) enables the systematic assessment of intratumoral heterogeneity within tumor cell populations and in diverse stromal cells of the tumor microenvironment. The gain of treatment resistance during tumor progression or drug treatment is an important subject of tumor-centric scRNA-seq analyses, which are hampered by scarce tumor cell fractions. To guarantee the inclusion of tumor cells in the data analysis, we developed a prescreening strategy for lung adenocarcinoma.
We obtained candidate genes that were differentially expressed between normal and tumor cells, excluding stromal cells, from the scRNA-seq data. Tumor cell-specific expression of the candidate genes was assessed via real-time reverse transcription-polymerase chain reaction (RT-PCR) using lung cancer cell lines, normal vs. lung cancer tissues, and lymph node biopsy samples with or without metastasis.
We found that CEA cell adhesion molecule 5 (CEACAM5) and high mobility group box 3 (HMGB3) were reliable markers for RT-PCR-based prescreening of tumor cells in lung adenocarcinoma.
The prescreening strategy using CEACAM5 and HMGB3 expression facilitates tumor-centric scRNA-seq analyses of lung adenocarcinoma.
Tumor heterogeneity is responsible for treatment resistance in cancer, involving outgrowth of pre-existing subclones or acquisition of resistance traits. Single-cell genomic analysis provides a systematic tool for studying tumor heterogeneity at both the DNA and RNA levels. While DNA-level intratumoral heterogeneity can be addressed by variant allele frequencies in bulk sequencing data, heterogeneity at the RNA or gene expression level requires single-cell methods because of its quantitative nature. Early large-scale single-cell RNA sequencing (scRNA-seq) analyses of cancer focused on the primary tumor landscape, depicting both tumor and microenvironmental cell populations [3, 4]. Current applications have shifted to comparative studies of different regions, conditions, and patients to gain clinical insights into treatment resistance and patient stratification [5, 6], which underscores the need for appropriate sample selection.
Lung adenocarcinoma is the major cancer type that benefits from molecular targeted therapies, including tyrosine kinase inhibitors targeting epidermal growth factor receptor (EGFR) mutations or ALK, EMAP-like 4, and neurotrophic receptor tyrosine kinase fusions. Patients harboring these somatic alterations and responding to targeted therapy eventually develop treatment resistance, and it is critical to understand the underlying mechanisms to achieve long-term survival. For example, secondary EGFR mutations (T790M or C797S) confer resistance to EGFR-targeted tyrosine kinase inhibitors [9, 10]. Activation of salvage signaling pathways involving MET, hepatocyte growth factor, AXL, Hh, and insulin-like growth factor 1 receptor also leads to resistance to EGFR-targeted therapies. Study designs that compare samples before and after molecular targeted therapy, or responders and non-responders, provide valuable opportunities to understand the mechanisms of treatment resistance. One hurdle in such study designs is the absence of tumor cells in the specimens, which results in the exclusion of precious data. Ensuring the presence of tumor cells before single-cell experiments can save time and resources.
Several strategies for determining the presence or proportion of tumor cells serve different purposes. First, histological evaluation of tissue sections is the standard diagnostic process for determining tumor type and stage. Second, computational methods estimate tumor purity from genomic data at both the DNA and RNA levels. For example, the ABSOLUTE algorithm infers tumor purity and ploidy from somatic DNA alterations in whole-genome sequencing data. Purity and ploidy information is critical for determining sub-clonal structures and tumor evolution. In comparison, the ESTIMATE method uses gene expression data to infer tumor cellularity and stromal/immune cell fractions. Third, flow cytometry or real-time polymerase chain reaction (PCR) can be used to monitor micrometastases or minimal/measurable residual disease during or after leukemia treatment. PCR-based methods can typically detect target cells at frequencies below 0.01%, a sensitivity much higher than that of histological evaluation or genomic inference. The high sensitivity and the simple experimental procedure, which can be incorporated into the scRNA-seq pipeline, make real-time PCR the preferred prescreening method.
In this study, we aimed to develop a sample selection strategy for lung adenocarcinoma for tumor-centric analysis of scRNA-seq data. First, target gene selection was achieved using public scRNA-seq data, by cell type specification and differentially expressed gene analysis focusing on tumor cells. We then tested the candidate gene expression using real-time PCR in lung cancer cell lines, normal vs. tumor tissues, and lymph nodes with or without metastasis. Among the four candidate genes, CEA cell adhesion molecule 5 (CEACAM5) and high mobility group box 3 (HMGB3) distinguished the tumor from normal tissues and recapitulated tumor cellularity in single-cell transcriptome data. Based on these results, we recommend sample prescreening using multigene real-time PCR for beta-actin (ACTB), CEACAM5, and HMGB3 to ensure the presence of tumor cells.
The present study was reviewed and approved by the Institutional Review Board (IRB) of the Samsung Medical Center (SMC, Seoul, Korea) (IRB no. 2010–04–039-052). The individuals in this manuscript have given written informed consent. Tumor, distant normal lung, and normal lymph node tissues were obtained during conserving surgery at the SMC from seven patients diagnosed with lung cancer. Metastatic lymph nodes were collected from patients with lung cancer using endobronchial ultrasound and bronchoscopy. A total of 14 samples were collected and immediately snap-frozen in liquid nitrogen or dissociated.
Human cancer cell lines
The human non-small cell lung cancer (NSCLC) cell lines A549 (CCL-185), NCI-H2228 (CRL-5935), HCC827 (KCLB70827), HCC1588 (KCLB71588), NCI-H854 (KCLB90854), HCC1833 (KCLB71833), and HCC1195 (KCLB71195) were purchased from the American Type Culture Collection (Manassas, VA, USA) and the Korean Cell Line Bank (Seoul, Korea). Each cell line was cultured in Roswell Park Memorial Institute-1640 medium (22400–089; Gibco, Waltham, MA, USA) supplemented with 10% fetal bovine serum (16000–044; Gibco, Waltham, MA, USA) at 37 °C in 5% CO2.
RNA extraction and cDNA synthesis
Total RNA was extracted from the samples using the Qiagen RNeasy mini kit (74104; Qiagen, Hilden, Germany), according to the manufacturer’s instructions. The quantity and quality of extracted RNA were assessed using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). cDNA was synthesized with an appropriate amount of RNA using the ReverTra Ace™ qPCR RT Kit (TOFSQ-101; TOYOBO Co., Ltd., Osaka, Japan), according to the manufacturer’s recommendations. After RNA denaturation at 65 °C for 5 min, 1 μg of total RNA was diluted in 10 μL of reaction mixture containing 2 μL 5X RT buffer, 0.5 μL enzyme mix, 0.5 μL primer mix, and water. The reaction mixture was incubated at 37 °C for 15 min. The cDNA product was further diluted four-fold with RNase-free water and used directly for real-time PCR.
The amplified cDNA samples were obtained in the library preparation step using the Chromium Single Cell 5′ Library & Gel Bead Kit v1.1 (scRNA-Seq) and the Chromium Single Cell 3′ Library & Gel Bead Kit v3 (snRNA-Seq), according to the manufacturer’s recommendations.
Real-time quantitative PCR
Real-time PCR was performed in a 96-well reaction plate (HSP9601; Bio-Rad Laboratories, Hercules, CA, USA) sealed with an adhesive film (MSB1001; Bio-Rad Laboratories, Hercules, CA, USA). Expression analysis of each gene of interest (GOI) was performed using the Bio-Rad CFX96 Touch system and PrimeTime Gene Expression Master Mix (1055770; IDT, Coralville, IA, USA) with a predesigned primer and probe mix (Supplementary Table 1). Real-time PCR was performed according to the manufacturer’s instructions. All PCR reactions were run in duplicate, and a non-template control was included in each run. Raw real-time PCR data were analyzed using CFX Manager 3.1 (1845000; Bio-Rad Laboratories, Hercules, CA, USA; https://www.bio-rad.com/ko-kr/sku/1845000-cfx-manager-software?ID=1845000), and the PCR amplification efficiency and quantification cycle (Cq) values were obtained for each reaction. Raw data were transformed into a standard input format for plotting. Microsoft Excel was used to calculate the mean Cq, ΔCq, ΔΔCq, fold change, and log(fold change + 1), where ΔCq = Cq(GOI) − Cq(ACTB) and ΔΔCq = ΔCq(GOI, sample) − ΔCq(GOI, normal group within the same batch). Relative fold change was determined as 2^(−ΔΔCq).
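The relative quantification described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' Excel workbook: the function name, the argument names, and the Cq values are hypothetical, and the logarithm is assumed to be base 10 because the text does not state the base.

```python
import math
from statistics import mean


def relative_expression(cq_goi, cq_actb, normal_delta_cq):
    """Compute dCq, ddCq, fold change, and log10(fold change + 1) for one sample.

    cq_goi, cq_actb: replicate Cq values for the gene of interest and for ACTB.
    normal_delta_cq: dCq of the normal group measured in the same batch.
    """
    delta_cq = mean(cq_goi) - mean(cq_actb)        # dCq = Cq(GOI) - Cq(ACTB)
    delta_delta_cq = delta_cq - normal_delta_cq    # ddCq relative to the normal group
    fold_change = 2.0 ** (-delta_delta_cq)         # relative fold change = 2^(-ddCq)
    return {
        "delta_cq": delta_cq,
        "delta_delta_cq": delta_delta_cq,
        "fold_change": fold_change,
        # Base 10 is an assumption; the source only says log(fold change + 1).
        "log_fc_plus_1": math.log10(fold_change + 1.0),
    }


# Hypothetical duplicate wells for one tumor sample.
tumor = relative_expression(cq_goi=[24.0, 25.0], cq_actb=[18.0, 19.0],
                            normal_delta_cq=9.0)
print(tumor)
```

A lower ΔCq in the sample than in the normal group gives a negative ΔΔCq and therefore a fold change above 1.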
Acquisition and analyses of single-cell and bulk RNA-seq data
The raw unique molecular identifier (UMI) gene-cell-barcode matrix derived from single-cell RNA sequencing data of patients with lung adenocarcinoma, together with the corresponding cell identities, was downloaded from the National Center for Biotechnology Information Gene Expression Omnibus database (GSE131907). The UMI count for genes in each cell was log-normalized using the NormalizeData function of the Seurat R package.
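For readers unfamiliar with this step, the sketch below mirrors what Seurat's NormalizeData does with its default "LogNormalize" method: each gene count is divided by the total count of the cell, multiplied by a scale factor (10,000 by default), and transformed with log1p. It is written in Python for illustration only. The source does not state whether default settings were used, so the scale factor is an assumption, and the function name and the toy counts are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the UMI counts of a single cell.

    counts: dict mapping gene name -> raw UMI count in this cell.
    Returns a dict with log1p(count / total_count * scale_factor) per gene.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell carries no information; return zeros instead of dividing by zero.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical cell with 100 UMIs in total.
cell = {"CEACAM5": 5, "ACTB": 95, "PLAU": 0}
print(log_normalize(cell))
```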
RNA sequencing data for 1019 human cancer cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE) depmap portal (https://depmap.org/portal/download/). Expression levels were normalized as log2(RPKM + 1), where RPKM represents reads per kilobase of transcript per million mapped reads for the genes in each sample.
RNA sequencing data from lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) samples were obtained from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov/). This dataset included 533 primary tumor and 59 normal samples from TCGA LUAD and 502 primary tumor and 49 normal samples from TCGA LUSC. Expression levels were quantified as log2(FPKM-UQ + 1), where FPKM-UQ refers to the upper quartile-normalized fragments per kilobase per million mapped reads for genes in each sample. Violin plots of gene expression for tumor and normal samples were generated using the geom_violin function of the ggplot2 R package.
Selection of tumor-specific genes
Genes differentially expressed in early-stage lung tumor (tLung), late-stage lung tumor (tL/B), and metastatic lymph node (mLN) samples relative to normal lung (nLung) were identified using the FindMarkers function (default parameters) of the Seurat package. Genes that were differentially expressed in each sample group were listed using the FindAllMarkers function (default parameters) of the Seurat package. The Wilcoxon rank-sum test with Bonferroni correction was used to assess statistical significance. We selected genes with log fold change > 0.25, p-value < 0.01, and Bonferroni-adjusted p-value < 0.01 that were expressed in > 25% of cells in either cell group (the expressing fraction is denoted as pct).
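The selection thresholds can be expressed as a simple filter. The Python sketch below is illustrative only: it assumes a table of per-gene test results whose field names (`avg_logfc`, `p_val`, `pct_1`, `pct_2`) loosely imitate Seurat's marker output but are hypothetical here, and the gene names and values are invented placeholders. The Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def select_markers(rows, n_tests, min_logfc=0.25, alpha=0.01, min_pct=0.25):
    """Return genes that pass the fold change, p-value, and expression-fraction cutoffs.

    rows: list of dicts with keys gene, avg_logfc, p_val, pct_1, pct_2
          (pct_1 / pct_2 = fraction of expressing cells in each group).
    n_tests: number of genes tested, used for the Bonferroni correction.
    """
    selected = []
    for row in rows:
        p_adj = bonferroni(row["p_val"], n_tests)
        # "Either cell group": the larger of the two fractions must exceed the cutoff.
        expressed = max(row["pct_1"], row["pct_2"]) > min_pct
        if (row["avg_logfc"] > min_logfc and row["p_val"] < alpha
                and p_adj < alpha and expressed):
            selected.append(row["gene"])
    return selected


# Hypothetical test results (placeholder gene names, not data from the study).
results = [
    {"gene": "GENE_A", "avg_logfc": 1.2, "p_val": 1e-8, "pct_1": 0.60, "pct_2": 0.05},
    {"gene": "GENE_B", "avg_logfc": 0.1, "p_val": 1e-8, "pct_1": 0.70, "pct_2": 0.60},
    {"gene": "GENE_C", "avg_logfc": 0.8, "p_val": 5e-4, "pct_1": 0.40, "pct_2": 0.10},
    {"gene": "GENE_D", "avg_logfc": 0.9, "p_val": 1e-9, "pct_1": 0.10, "pct_2": 0.02},
]
print(select_markers(results, n_tests=1000))
```

In this toy input, GENE_B fails the fold change cutoff, GENE_C fails after Bonferroni adjustment, and GENE_D is expressed in too few cells.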
All methods were performed in accordance with the relevant guidelines and regulations.
Schematic to identify genes for tumor prescreening
Single-cell RNA sequencing data generated from normal or tumor tissues of patients with lung adenocarcinoma were used to identify target genes indicative of tumor cell presence or proportions. For tumor-centric analysis, we extracted gene expression data only for malignant cells present in the tumor and compared them with normal epithelial cells (Fig. 1). Malignant cells were derived from various sources, including primary lung tumors (tLung and tL/B), metastatic lymph nodes (mLN), and brain metastases (mBrain). Normal epithelial cells were obtained from distant normal tissues of patients with tumors (nLung). We applied two analytical strategies to increase the specificity of the prescreening target genes for determining the extent of tumor cells. First, pairwise comparisons between tumor and normal sample groups (tLung vs. nLung, tL/B vs. nLung, and mLN vs. nLung) focused on genes upregulated in tumor cells compared with normal epithelial cells. Second, multi-set comparisons among all sample groups scanned for genes specifically expressed in each tumor group. Among the genes with statistical significance in both comparisons, candidates were refined to test for the presence of tumor cells by real-time PCR. The expression profiles of candidate genes were also checked using RNA-seq data for cancer cell lines (CCLE) and lung cancer patients (The Cancer Genome Atlas, TCGA). This approach yields genes exhibiting tumor cell-specific expression, allowing for the prescreening of samples harboring lung cancer cells.
Tumor cell-specific gene selection in lung cancer
Following the schematic, we first listed the genes differentially expressed between malignant cells of the tumor (tLung, tL/B, and mLN) and normal epithelial cells (nLung) (Fig. 2A). Sets of 701, 1215, and 1173 genes were identified as significantly dysregulated in tumors (tLung, tL/B, and mLN, respectively) (Supplementary Table 2). Among them, 599 genes were significantly upregulated in tumor cells in at least two tumor groups compared to those in normal cells (Fig. 2B). Next, in the comparisons of multiple sample groups, we identified 3120 dysregulated genes specific to each sample group (Fig. 2C; Supplementary Table 2). We selected CEACAM5, HMGB3, plasminogen activator urokinase (PLAU), and argininosuccinate synthase 1 (ASS1), which were consistently among the top-ranked upregulated genes in both comparisons. The association of lung cancer with the selected tumor cell-specific genes, except ASS1, has been supported by previous studies. CEACAM5 levels have been suggested to serve as prognostic determinants [23, 24] and have been correlated with metastatic lymph node tumor burden. HMGB3 expression was detected in circulating tumor cells in the peripheral blood of patients with lung cancer. PLAU has been established as a prognostic marker for patients with lung cancer. Tumor cell-specific expression of the selected genes was confirmed at the raw expression level (UMI) (Fig. 2D). These genes were overexpressed in tumor cells, with slight variations and low expression levels in all normal samples (Fig. 2E).
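The "upregulated in at least two tumor groups" criterion amounts to counting, for each gene, the number of pairwise comparisons in which it was upregulated. A minimal Python sketch is shown below; the function name and the placeholder gene lists are hypothetical and do not reproduce the study's gene sets.

```python
from collections import Counter


def recurrent_genes(up_by_group, min_groups=2):
    """Return genes upregulated in at least `min_groups` tumor groups.

    up_by_group: dict mapping a tumor group name to the genes upregulated
                 in that group relative to normal epithelial cells.
    """
    # set() guards against a gene being listed twice within one group.
    counts = Counter(gene for genes in up_by_group.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_groups)


# Placeholder gene lists for the three pairwise comparisons.
upregulated = {
    "tLung": ["GENE_A", "GENE_B", "GENE_C"],
    "tL/B": ["GENE_A", "GENE_C", "GENE_D"],
    "mLN": ["GENE_A", "GENE_E"],
}
print(recurrent_genes(upregulated))
```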
Target genes for the prescreening of tumor cells must have specific expression at cellular resolution. Prescreening using whole tumor tissue can be ambiguous if the gene is also expressed in the tumor stroma or in infiltrating immune cells. Therefore, the expression levels of candidate genes were compared between the cell types in each sample group (Fig. 3; Supplementary Fig. 1). The CEACAM5, HMGB3, and ASS1 genes were specifically expressed in tumor cells from the tumor sample groups (tLung, tL/B, mLN, and mBrain). PLAU expression was detected not only in tumor cells, but also in fibroblasts and myeloid cells. These results indicate that CEACAM5, HMGB3, and ASS1 are more reliable candidates than PLAU for the prescreening of tumor cells.
Real-time PCR screening of lung cancer for tumor cell-specific gene expression
To confirm the expression of candidate genes in lung cancer specimens, we initially applied real-time RT-PCR (Supplementary Table 1) to the lung cancer cell lines A549, H2228, HCC827, HCC1195, HCC1588, and HCC1833 which were selected based on the CCLE (Supplementary Fig. 2A). Recapitulating the CCLE data, relatively high PLAU expression and low CEACAM5 expression were detected in H2228 cells (Supplementary Fig. 2B). HCC827 and HCC1833 cells expressed high levels of CEACAM5 (Supplementary Fig. 2C). To assess expression changes according to the tumor cell ratio, we spiked the cDNAs of H2228 cell line into those of normal lung tissues (Supplementary Fig. 2D). In the assessment of HMGB3, PLAU, and ASS1, the PCR products increased gradually with increasing amounts of H2228 cDNAs up to 60–80% and plateaued. Similarly, addition of HCC1833 cDNAs increased the CEACAM5 signal (Supplementary Fig. 2E).
After the cell line test, we used non-small cell lung cancer (NSCLC) patient samples and compared target gene expression between the tumor and distant normal tissues (Fig. 4A). CEACAM5 and HMGB3 showed significant differences in expression between the two groups, and PLAU and ASS1 showed slightly higher expression in tumor tissues, but the difference was not statistically significant. Differential expression between the tumor and normal samples was confirmed in various sample preparation stages and methods (Fig. 4B-D). Similarly, a difference in the expression levels of CEACAM5 and HMGB3 was observed in lymph node samples with or without metastasis (Fig. 4E). Pairwise comparisons of matched normal and tumor samples provided clearer decision criteria for tumor cell positivity. Without a matched normal sample, tumor positivity was determined for samples with > 10% tumor cell content (Supplementary Table 3). To apply the prescreening process as a single-tube reaction, we performed multiplex RT-PCR analyses using CEACAM5, HMGB3, and ACTB probes with different fluorescence dye formats, which resulted in consistent tumor-specific detection (Fig. 4F).
Altogether, these results suggest that real-time PCR screening of CEACAM5 and HMGB3 can be used to confirm the presence of tumor cells in lung adenocarcinoma specimens of both tissue and lymph node origin, as well as in cDNAs and single-cell or nuclear RNA sequencing libraries.
Validation of tumor-specific gene expression using public datasets
To further investigate whether the expression levels predicted the proportion of tumor cells, we calculated the correlation between gene expression levels measured by real-time PCR and the percentage of tumor cells obtained from single-cell sequencing data (Fig. 5A). Overall, the four candidate genes showed a positive association, yet the correlation coefficient was small, likely because of the large variation in cellular expression levels. Among them, HMGB3 expression showed the highest correlation with the tumor cell proportion.
Next, we examined the lung cancer cohort from TCGA to determine differential expression of the four genes between normal and tumor samples at the bulk tissue level. As shown in Fig. 5B, CEACAM5 and ASS1 were specifically expressed in the lung tumor samples. HMGB3 transcripts were not detected in any of the samples, and PLAU expression was not significantly different between the normal and tumor tissues. These data demonstrate the variation in sensitivity and specificity among the different gene detection and sample preparation methods. Taken together, the detection of CEACAM5 and HMGB3 by real-time PCR was suitable for sample prescreening before single-cell or nuclear sequencing experiments requiring the presence of tumor cells.
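As an illustration of this comparison, the sketch below computes a Pearson correlation coefficient between per-sample expression values and tumor cell percentages. The text does not state which correlation coefficient was used, so Pearson is an assumption, and the input values are hypothetical numbers, not measurements from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values: log(fold change + 1) from real-time PCR
# and tumor cell percentage from scRNA-seq.
expression = [0.2, 0.9, 1.4, 0.5, 2.1]
tumor_pct = [3.0, 20.0, 35.0, 18.0, 40.0]
print(round(pearson_r(expression, tumor_pct), 3))
```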
The power of single-cell RNA sequencing has made this technique a mainstream tool in cell biology to study normal development and differentiation processes, and to define cellular alterations in diseases. There is a need for versatile data generation for hypothesis testing and appropriate sample selection; however, proper guidelines are not available. During the experimental design process, we encountered a situation in which the tumor cell content was too low to perform a tumor-centric data analysis.
To study a tumor subpopulation using a single-cell genomics approach, one can either sort and enrich the target population or perform an all-inclusive analysis after ensuring tumor cell presence. Both approaches have their own merits; the latter requires no prior knowledge for sorting and allows inference of cellular interactions between the tumor cells and the support cells in the tumor microenvironment. Cellular composition in the tumor microenvironment and communication with tumor cells change over time during tumor progression, metastasis, and treatment resistance. Therefore, an unsorted study design that ensures tumor cell presence in the microenvironmental context helps to elucidate disease-associated alterations of the tumor and support cell interactions, which could be a good target for therapeutic intervention.
As a prescreening strategy to ensure tumor cell inclusion in lung adenocarcinoma, we selected four genes showing tumor cell-specific gene expression from publicly available scRNA-seq data and applied real-time PCR to cDNAs or RNA sequencing libraries of the study samples. The simplicity and reliability of real-time PCR make it the preferred prognostic gene expression testing platform for early-stage breast cancer. During candidate gene expression testing for lung cancer, we found unexpected discrepancies between scRNA-seq and real-time PCR results. These discrepancies may be explained by the different dynamic ranges of each gene detection method, individual cell or population level measurements, and cell- vs. tissue-level gene expression analysis. Since the aim of this study was to develop a sample selection strategy for single-cell or nuclear RNA sequencing analysis, CEACAM5 and HMGB3, which showed the best results in cell-level data, were selected as the final target genes. The use of this sample selection strategy will facilitate the efficient design of tumor-centric single-cell/nucleus genomic analyses.
To guarantee tumor-centric analysis of lung cancer, we selected tumor cell-specific genes from the scRNA-seq data and performed real-time PCR to distinguish samples with or without tumor cell presence. We suggest CEACAM5 and HMGB3 as prescreening markers for single-cell or nuclear sequencing experiments.
Availability of data and materials
Abbreviations
CEACAM5: CEA cell adhesion molecule 5
HMGB3: High mobility group box 3
PLAU: Plasminogen activator urokinase
ASS1: Argininosuccinate synthase 1
EGFR: Epidermal growth factor receptor
FBS: Fetal bovine serum
GOI: Gene of interest
RPKM: Reads per kilobase per million mapped reads
RT-PCR: Reverse transcription-polymerase chain reaction
scRNA-seq: Single-cell RNA sequencing
UMI: Unique molecular identifier
tLung: Early-stage lung tumor
tL/B: Late-stage lung tumor
mLN: Metastatic lymph node
CCLE: Cancer Cell Line Encyclopedia
TCGA: The Cancer Genome Atlas
LUSC: Lung squamous cell carcinoma
Dagogo-Jack I, Shaw AT. Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol. 2018;15(2):81–94.
Navin NE. The first five years of single-cell cancer genomics and beyond. Genome Res. 2015;25(10):1499–507.
Tirosh I, Izar B, Prakadan SM, Wadsworth MH 2nd, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352(6282):189–96.
Lambrechts D, Wauters E, Boeckx B, Aibar S, Nittner D, Burton O, et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat Med. 2018;24(8):1277–89.
Suva ML, Tirosh I. Single-cell RNA sequencing in cancer: lessons learned and emerging challenges. Mol Cell. 2019;75(1):7–12.
Gonzalez Castro LN, Tirosh I, Suva ML. Decoding Cancer biology one cell at a time. Cancer Discov. 2021;11(4):960–70.
Schrank Z, Chhabra G, Lin L, Iderzorig T, Osude C, Khan N, et al. Current molecular-targeted therapies in NSCLC and their mechanism of resistance. Cancers (Basel). 2018;10(7):224.
Lin JJ, Shaw AT. Resisting resistance: targeted therapies in lung cancer. Trends Cancer. 2016;2(7):350–64.
Kobayashi S, Boggon TJ, Dayaram T, Janne PA, Kocher O, Meyerson M, et al. EGFR mutation and resistance of non-small-cell lung cancer to gefitinib. N Engl J Med. 2005;352(8):786–92.
Thress KS, Paweletz CP, Felip E, Cho BC, Stetson D, Dougherty B, et al. Acquired EGFR C797S mutation mediates resistance to AZD9291 in non-small cell lung cancer harboring EGFR T790M. Nat Med. 2015;21(6):560–2.
Morgillo F, Della Corte CM, Fasano M, Ciardiello F. Mechanisms of resistance to EGFR-targeted drugs: lung cancer. ESMO Open. 2016;1(3):e000060.
Kim L, Tsao MS. Tumour tissue sampling for lung cancer management in the era of personalised therapy: what is good enough for molecular testing? Eur Respir J. 2014;44(4):1011–22.
Girard N, Deshpande C, Lau C, Finley D, Rusch V, Pao W, et al. Comprehensive histologic assessment helps to differentiate multiple lung primary nonsmall cell carcinomas from metastases. Am J Surg Pathol. 2009;33(12):1752–64.
Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30(5):413–21.
Yoshihara K, Shahmoradgoli M, Martinez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612.
D'Cunha J, Corfits AL, Herndon JE 2nd, Kern JA, Kohman LJ, Patterson GA, et al. Molecular staging of lung cancer: real-time polymerase chain reaction estimation of lymph node micrometastatic tumor cell burden in stage I non-small cell lung cancer--preliminary results of Cancer and leukemia group B trial 9761. J Thorac Cardiovasc Surg. 2002;123(3):484–91 discussion 491.
Neale GA, Coustan-Smith E, Stow P, Pan Q, Chen X, Pui CH, et al. Comparative analysis of flow cytometry and polymerase chain reaction for the detection of minimal residual disease in childhood acute lymphoblastic leukemia. Leukemia. 2004;18(5):934–8.
Kerst G, Kreyenberg H, Roth C, Well C, Dietz K, Coustan-Smith E, et al. Concurrent detection of minimal residual disease (MRD) in childhood acute lymphoblastic leukaemia by flow cytometry and real-time PCR. Br J Haematol. 2005;128(6):774–82.
Kim N, Kim HK, Lee K, Hong Y, Cho JH, Choi JW, et al. Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat Commun. 2020;11(1):2285.
Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–1902 e1821.
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.
Grossman RL, Heath AP, Ferretti V, Varmus HE, Lowy DR, Kibbe WA, et al. Toward a shared vision for Cancer genomic data. N Engl J Med. 2016;375(12):1109–12.
Kozu Y, Maniwa T, Takahashi S, Isaka M, Ohde Y, Nakajima T. Prognostic significance of postoperative serum carcinoembryonic antigen levels in patients with completely resected pathological-stage I non-small cell lung cancer. J Cardiothorac Surg. 2013;8:106.
Okada M, Nishio W, Sakamoto T, Uchino K, Yuki T, Nakagawa A, et al. Prognostic significance of perioperative serum carcinoembryonic antigen in non-small cell lung cancer: analysis of 1,000 consecutive resections for clinical stage I disease. Ann Thorac Surg. 2004;78(1):216–21.
Hayes DC, Secrist H, Bangur CS, Wang T, Zhang X, Harlan D, et al. Multigene real-time PCR detection of circulating tumor cells in peripheral blood of lung cancer patients. Anticancer Res. 2006;26(2B):1567–75.
Di Bernardo MC, Matakidou A, Eisen T, Houlston RS, Consortium G. Plasminogen activator inhibitor variants PAI-1 A15T and PAI-2 S413C influence lung cancer prognosis. Lung Cancer. 2009;65(2):237–41.
Nguyen QH, Pervolarakis N, Nee K, Kessenbrock K. Experimental considerations for single-cell RNA sequencing approaches. Front Cell Dev Biol. 2018;6:108.
Armingol E, Officer A, Harismendy O, Lewis NE. Deciphering cell-cell interactions and communication from gene expression. Nat Rev Genet. 2021;22(2):71–88.
Bartlett JM, Bayani J, Marshall A, Dunn JA, Campbell A, Cunningham C, et al. Comparing breast cancer multiparameter tests in the OPTIMA prelim trial: no test is more equal than the others. J Natl Cancer Inst. 2016;108(9):djw050.
Costa C, Gimenez-Capitan A, Karachaliou N, Rosell R. Comprehensive molecular screening: from the RT-PCR to the RNA-seq. Transl Lung Cancer Res. 2013;2(2):87–91.
We would like to thank Minsu Na for providing technical assistance.
This research was supported by National Research Foundation (NRF) funded by the Ministry of Science & ICT (MSIT), grant number NRF-2017M3C9A6044636, 2019M3A9B6064691, and 2016R1A5A1011974. N. K. and H. E. are supported by Basic Science Research Program through NRF funded by the Ministry of Education (NRF-2020R1I1A1A01065697 and NRF-2021R1I1A1A01043906). The authors wish to acknowledge the financial support of the Catholic Medical Center Research Foundation made in the program year of 2020.
Ethics approval and consent to participate
The present study was reviewed and approved by the Institutional Review Board (IRB) of Samsung Medical Center (SMC, Seoul, Korea) (IRB no. 2010–04–039-052). The individual in this manuscript has given written informed consent.
Consent for publication
The authors declare that they have no competing interests.
Supplementary Fig. 1. t-distributed stochastic neighbor embedding (tSNE) plot colored based on the expression levels of candidate genes in each sample group.
Supplementary Fig. 2. Detection sensitivity of prescreening candidates in the cancer cell population. (A) Waterfall plots of expression of candidate genes and beta-actin (ACTB) in Cancer Cell Line Encyclopedia (CCLE) cancer cell lines. (B) Bar plot of candidate genes in H2228 cells. (C) Relative expression of CEACAM5 in six lung cancer cell lines compared to normal lung tissue. (D) Expression patterns of HMGB3, PLAU, and ASS1 according to the mixing ratio of H2228 and normal lung tissue cDNAs. (E) Expression pattern of CEACAM5 according to the mixing ratio of HCC1833 and normal lung tissue cDNAs.
Supplementary Table 1. Predesigned primers for targeted polymerase chain reaction (PCR).
Supplementary Table 2. List of differentially expressed genes specific to malignant lung cancer cells.
Supplementary Table 3. Tumor cell percentage estimated via single-cell RNA sequencing (scRNA-seq) of patient samples.
Cite this article
Kim, N., Jeong, D., Jo, A. et al. Prescreening of tumor samples for tumor-centric transcriptome analyses of lung adenocarcinoma. BMC Cancer 22, 1186 (2022). https://doi.org/10.1186/s12885-022-10317-9