Impact of viral presence in tumor on gene expression in non-small cell lung cancer
BMC Cancer volume 18, Article number: 843 (2018)
In our recent study, most non-small-lung cancer (NSCLC) tumor specimens harbored viral DNA but it was absent in non-neoplastic lung. However, their targets and roles in the tumor cells remain poorly understood. We analyzed gene expression microarrays to identify genes and pathways differentially altered between virus-infected and uninfected NSCLC tumors.
Gene expression microarrays of 30 primary and 9 metastatic NSCLC patients were preprocessed through a series of quality control analyses. Linear Models for Microarray Analysis and Gene Set Enrichment Analysis were used to assess differential expression.
Various genes and gene sets had significantly altered expressions between virus-infected and uninfected NSCLC tumors. Notably, 22 genes on the viral carcinogenesis pathway were significantly overexpressed in virus-infected primary tumors, along with three oncogenic gene sets. A total of 12 genes, as well as seven oncogenic and 133 immunologic gene sets, were differentially altered in squamous cell carcinomas, depending on the virus. In adenocarcinoma, 14 differentially expressed genes (DEGs) were identified, but no oncogenic and immunogenic gene sets were significantly altered. In bronchioloalveolar carcinoma, several genes were highly overexpressed in virus-infected specimens, but not statistically significant. Only five of 69 DEGs (7.2%) from metastatic tumor analysis overlapped with 1527 DEGs from the primary tumor analysis, indicating differences in host cellular targets and the viral impact between primary and metastatic NSCLC.
The differentially expressed genes and gene sets were distinctive among infected viral types, histological subtypes, and metastatic disease status of NSCLC. These results support the hypothesis that tumor viruses play a role in NSCLC by regulating host genes in tumor cells during NSCLC differentiation and progression.
Viruses and other infectious agents cause nearly 20% of all human cancers worldwide such as human papillomavirus (HPV) in cervical carcinoma and hepatitis B virus (HBV) in hepatocellular carcinoma . There is growing evidence that viruses play a critical role in cancer development as well as modulating the response to cancer treatment .
Using advanced panmicrobial microarray techniques with polymerase chain reaction (PCR) confirmation in our recent study, we searched for viral DNA sequences in archived frozen non-small cell lung cancer (NSCLC) tumor of various cell types . We found that the majority of NSCLC tumor samples contained viral DNA sequences from ten viral types including exogenous retroviruses and HPV while no viral DNA was detected in any adjacent non-neoplastic lung tissue samples. We also discovered that the susceptibility of lung cancer to viral infection generally varied across its cell types and by the types of viruses, suggesting that lung cancer subtypes could be associated with viral types residing in host cells. Interestingly, NSCLC patients with viral DNA present in their tumors had significantly longer overall survival than those not containing viral DNA.
However, the impact of viruses on host NSCLC tumor cells remains poorly understood and only few studies to date have investigated the roles of virus in the NSCLC . Viruses can cause cellular transformation by expression of viral oncogenes, by genomic integration to alter the activity of cellular proto-oncogenes or tumor suppressor genes and by inducing inflammation that promotes oncogene activity . We therefore hypothesized that viruses in human lung cancer cells have the potential ability to affect host cells by regulating expression levels of important genes, especially oncogenes and immune-related genes.
We expanded our previous analysis  to better understand the association of viral infections with host gene expressions in the NSCLC tumors by performing an extensive differential expression analysis of high-throughput gene expression profiling microarray data from the same fresh frozen archived NSCLC tumor specimens according to viral types, histological NSCLC subtypes, and metastatic disease status.
Clarification of target genes of viruses in human NSCLC tumor cells and the functions of the target genes will provide an opportunity to develop new prognostic and early diagnostic biomarkers of NSCLC as well as potential cancer prevention strategies.
Patients and microarray data
Florida residents who underwent surgical resection for NSCLC at Moffitt Cancer Center and consented to the Total Cancer Care protocol between 2000 and 2013 were eligible in our previous study for a viral DNA detection in NSCLC tumor tissue samples . Approval for the use of archived tissue and patient information was obtained from the University of South Florida IRB, Protocol No. MCC16765.
We randomly selected 70 archived frozen NSCLC tumor samples based on: 1) having enough volume of frozen tissue to perform the studies, 2) preoperative radiographs showed no pneumonia or distal atelectasis, and 3) no patient had chemotherapy or radiotherapy before resection. Resulting NSCLC samples encompass 10 primary adenocarcinomas, 10 bronchioloalveolar carcinomas (BAC, although currently termed adenocarcinomas with lepidic spread) and 30 squamous cell carcinomas (SCCs). In addition, we selected 10 resected stage IV tumors (three SCCs and seven adenocarcinomas) and their 10 matched surgically-resected distant oligometastases. Anatomic sites of those 10 metastatic tumor specimens were brain (n = 3), soft tissue (n = 2), adrenal (n = 4) and kidney (n = 1). 10 non-neoplastic lung specimens were also obtained as controls for this study.
All primary and metastatic tumor specimens underwent viral DNA screening tests by the Lawrence Livermore PanMicrobial Detection Array (LLMDA) designed to detect all sequenced viral families. The LLMDA was developed at the Lawrence Livermore National Laboratory (LLNL; Livermore, CA, USA) and designed to detect all sequenced viral and bacterial families, with appropriate controls [6, 7]. The 135 K format of the LLMDA (v.5) targets all vertebrate pathogens including 1856 viruses, 1398 bacteria, 125 archaea, 48 fungi, and 94 protozoa . In the development of this microarray, PCR was used extensively to validate the results and verify that the statistical algorithm was accurate [9,10,11]. The high-density oligo LLMDA and statistical analysis method has been extensively tested in numerous problems in viral and bacterial detection from pure or complex environmental or clinical samples [9, 12,13,14,15]. A subset of NSCLC tumors [10 squamous cell carcinomas (SCCs)] was evaluated using an oncovirus panel of the International Agency for Research on Cancer (46 HPV types, 10 polyomaviruses, and 5 herpesviruses) . In addition, all 70 NSCLC underwent HPV PCR genotyping using the INNO-LiPA Genotyping Extra Assay which detects 28 HPV genotypes classified as high or low risk, depending on their association with carcinogenesis. Details concerning the detection techniques, patient’s clinical characteristics and virus DNA detection results can be found in our previous study . For the current study, the Moffitt institutional honest broker retrieved 39 available gene expression microarrays from these same tumor samples, of which the platform was Rosetta/Merck Human RSTA Custom Affymetrix Genechip with 60,607 probe sets interrogating 25,285 genes.
All microarray data were normalized using the iterative rank-order normalization algorithm , and experimental batch effects and outlying observations were further examined by a two-way hierarchical cluster analysis based on sample-wise correlation matrix as a distance matrix and a principal component cluster analysis. Unsupervised and supervised approaches for differential gene expression analysis and interpretation of resultant genes were utilized to associate virus infection with gene expressions in lung tumor cells according to NSCLC histological subtypes and metastasis status to minimize their potential confounding effects.
The unsupervised approach first used Linear Models for Microarray Analysis (LIMMA) to identify individual genes with significant differential expression between virus-infected and uninfected tumor samples . Thereafter, a functional annotation and gene ontology (GO) analysis of the genes was performed by using Database for Annotation, Visualization and Integrated Discovery (DAVID)  and a database of virus-host protein-protein interactions (VirusMentha) .
For the supervised approach, Gene Set Enrichment Analysis (GSEA) was utilized to determine whether genes on a known biological or functional pathway have jointly concordant differential expressions between virus-infected and uninfected lung tumor specimens. Fifty hallmark, 189 oncogenic, and 4872 immunologic gene sets annotated in Molecular Signatures Database were subjected to GSEA . For a multiple testing correction, false discovery rate (FDR)-adjusted p-values were estimated in all LIMMA, GO, and GSEA analyses, and a statistical significance was defined when the FDR-adjusted p-value was less than 0.2.
Quality control analysis of gene expression microarrays
Sample-wise Spearman’s rank correlation coefficients of 39 microarrays over all probe sets ranged from 0.866 to 0.944. These high correlation coefficients indicated that a majority of genes had similar expression patterns across NSCLC tumor tissue samples regardless of their diverse histological subtypes and disease progression status. However, a two-way hierarchical cluster analysis and a principal component analysis of all microarrays revealed that all three brain metastatic NSCLC tumor tissue specimens (one SCC sample harboring the Y73 sarcoma virus (Y73SV) DNA, one uninfected SCC, and one uninfected adenocarcinoma samples) were clustered far apart from other primary lung tumor samples and the non-brain metastatic tumor samples, showing a brain-specific biological variation independent of viral infection status in gene expression profiles. We therefore excluded these three brain metastatic tumor samples from a differential gene expression analysis.
Differentially expressed genes and pathways between all NSCLC with and without any viral DNA
To identify differentially expressed genes (DEGs), LIMMA analysis was applied to a total of 36 NSCLC tumor samples: 21 samples harboring viral DNA of at least one viral type (Virus(+)) and 15 samples without any viral DNA (Virus(−)). This analysis identified 338 overexpressed and 301 underexpressed genes in Virus(+), as compared to Virus(−) (FDR p < 0.2; Additional file 1: Figure S1). For instance, PCYT1A was the most significantly overexpressed gene in Virus(+) (fold change (FC) = 2.24) whereas JMJD1C and CTNNB1 were the top two underexpressed genes with FC of 0.44 and 0.45, respectively. (Additional file 2: Table S1). Li et al. (2015) reported that PCYT1A catalyzes the rate-limiting step in synthesis of phosphatidylcholine that is required for replication of HBV. They also confirmed that PCYT1 was up-regulated at the transcriptional level in HBV-infected human hepatoblastoma cells . In addition, Vaezi et al. (2014) found that PCYT1A is the dominant determinant of 8F1 immunoreactivity in lung SCC samples and that high expression of PCYT1A was found to be prognostic of longer disease-free survival . JMJD1C is a component of DNA-damage response pathway with implications for tumorigenesis by demethylating MDC1 to regulate the RNF8 and BRCA1-mediated chromatin response to DNA breaks. Chen et al. (2016) identified that the gain of the miRNA regulation of MIR141 to JMJD1C resulted in the gain of protein-protein interaction between JMJD1C and GADD45A in hepatocarcinogenesis that is a multistep process mainly associated with persistent infection with HBV or hepatitis C virus . CTNNB1 is considered the cancer drivers for hepatocellular carcinoma development with variable frequencies depending on the etiology. A recent genome-wise RNAi screen revealed that a role of a WNT/CTNNB1 signaling pathway as negative regulator of virus-induced innate immune responses . Nakayama et al. (2014) also reported that pharmacological inhibition or conditional deletion of CTNNB1 inhibited lung tumor formation in transgenic mice . A functional annotation of those DEGs was performed using DAVID tool to gain insight into their biological functions. For the 338 overexpressed genes in Virus(+), the protein catabolic process was the most significantly over-represented function (29 hit genes, p < 0.001) followed by cytoskeleton and proteasome core complex. For the 301 underexpressed genes in Virus(+), six biological processes, such as Ras GTPase binding and RNA polymerase II promoter, were significantly enriched (Fig. 1a and b). GSEA was next performed to identify predefined sets of hallmark, oncogenic, and immunologic genes having concordantly differential expressions between the Virus(+) and Virus(−) NSCLC samples. As a result, three oncogenic gene sets, CSR_late.v1.up, mTOR_up.v1_up, Rb_P107_dn.v1_up, were found to be significantly altered with positive enrichment scores, meaning that a majority of genes in those gene sets were simultaneously overexpressed in the Virus(+) (Fig. 1c, d, and e). Among them, the CSR_late_up.v1_up gene set comprises 172 genes up-regulated in late serum response of human foreskin fibroblasts and associated with increased risk of metastasis and death in human lung, breast, and gastric cancer .
All primary NSCLCs: virus carcinogenesis and oncogenic gene sets enriched in virus-infected specimens
LIMMA was performed to identify DEGs between 20 virus-infected primary NSCLC tumor specimens (Virus(+)) and 10 uninfected specimens (Virus(−)). Seven hundred seventy-seven genes were significantly overexpressed in Virus(+) and 751 genes underexpressed (Additional file 3: Figure S2). To understand biological meaning behind these DEGs and discover enriched functional-related gene groups, a functional annotation enrichment analysis was performed using the DAVID tool. Table 1 showed their functional annotation results. For the overexpressed genes, the cell cycle was the most significantly overrepresented function (23 hit genes, p < 0.001). Strikingly, two virus-related biological processes, 1) viral carcinogenesis (22 hit genes) and 2) human T-cell lymphotropic virus Type I (HTLV-1) infection (23 hit genes), were also significantly overrepresented, along with several NSCLC tumorigenesis-related pathways, including proteasome pathway  and p53 signaling pathway . In particular, HPN (hepsin), ACTN4 (actinin alpha 4), and GP130 (interleukin 6 signal transducer) on the viral carcinogenesis pathway were known to be host cellular targets of three viral oncoproteins (HBx, Tax, and vIL-6) that lead to cell proliferation/survival, regulation of actin cytoskeleton, and proliferation angiogenesis, respectively (Fig. 2; Additional file 4: Figure S3) [30,31,32]. On the other hand, cAMP signaling, vascular smooth muscle contraction, and metabolic pathways were the most representative pathways for the underexpressed genes in Virus(+) (FDR p < 0.05). A recent study reported that the cAMP signaling augments radiation-induced apoptosis in NSCLC cells .
Using GSEA of hallmark gene sets, e2F transcription factor target, G2/M-checkpoint, mTORC1 signaling, and mitotic spindle assembly were found to be concordantly over-expressed gene sets in Virus(+) (all FDR p < 0.2; Additional file 5: Figure S4A). The GSEA analysis of 189 oncogenic gene sets also revealed that Rb/P107_DN.v1.up, Rb_down , e2F1 target , GCNP/SHH , and mTOR_up.v1_up  were significantly enriched again with all positive enrichment scores (all FDR p < 0.2; Additional file 5: Figure S4B).
Primary squamous cell carcinoma: differentially expressed genes varied depending on infection viral types
Noticeably, all SCC tumor specimens bear viral DNA of at least one viral type and thus differential gene expression analyses were performed for each viral type except HBV, with which only one SCC specimen was uninfected. Table 2 shows the list of 12 DEGs from the LIMMA analysis of the SCC specimens with and without viral DNA of each of Bovine leukemia virus (BLV), Panthera leo persica Papillomavirus Type 1 (PlpPV1), HPV, Simian T-cell leukemia virus Type 1 (STLV1), Type 2 (STLV2) and Type 6 (STLV6).
For BLV, PSG4 and CPB2 were significantly underexpressed in BLV(+) (n = 8), as compared to BLV(−) SCC specimens (n = 2). Of these, CPB2 is an extracellular matrix-regulated gene and has been considered an indicator for an impaired lung function  (Additional file 6: Figure S5A).
The GSEA of hallmark gene sets resulted in two significantly altered cell cycle-pathways. One was the G2/M checkpoint (FDR = 0.019) and the other was the e2F targets that encode cell cycle-related targets of e2F transcription factors (FDR p = 0.054) (Fig. 3a). Successively, in the GSEA of oncogenic gene sets, seven significant signatures came up and the most significant signature was a set of genes up-regulated in primary keratinocytes with knockout of both Rb1 and Rbl1 (FDR p = 0.038) (Fig. 3b) . Above all, the GSEA of immunologic gene sets revealed that 133 immunologic signatures were significantly enriched. Among these, the top significant signature was WT_vs_NFATC1_KO, which comprises 200 genes up-regulated in B lymphocytes stimulated by anti-IgM under knockout of NFATc1 (ES = 0.674, FDR p = 0.026; Additional file 6: Figure S5B). This suggested that BLV in SCC tumor cells might interact closely with NFATc1, which is an oncogene involved in various functions in cancer [39, 40].
For PlpPV1, GCM1 and SMR3A genes were significantly overexpressed in PlpPV1(+) (n = 2) in comparison with PlpPV1(−) specimens (n = 8). As for STLV1, comparisons of gene expression between STLV1(+) (n = 2) and STLV1(−) specimens (n = 8) yielded five significantly overexpressed, such as FMN2 and MYEOV, and one down-regulated gene (SPRR3) in STLV1(+). A comparison of gene expressions between STLV2(+) (n = 3) and STLV2(−) specimens (n = 7) detected only one gene, CRISP2 (cysteine rich secretory protein 2), that was significantly down-regulated in STLV2(+) (Table 2).
Interestingly, although the LIMMA analysis for HPV57 and STLV6 viral types resulted in no significant DEG, GSEA yielded several significantly enriched gene sets. For HPV57, five hallmark gene sets, such as oxidative phosphorylation and Myc targets, were significantly enriched (Fig. 3C). Additionally, it was a unique oncogenic gene set that TBK1.DN.48 enriched significantly (ES = 0.58, p = 0.049; Additional file 6: Figure S5C). This gene set comprises 50 genes down-regulated in epithelial lung cancer cells upon over-expression of the proto-oncogene KRAS and knockdown of TBK1, and induced apoptosis .
Lastly, the GSEA of hallmark gene sets for comparing STLV6(+) (n = 3) with STLV6(−) specimens (n = 7) recaptured G2/M checkpoint (ES = − 0.581, FDR p = 0.053) and e2F transcription factor target (ES = − 0.622, FDR p = 0.084), both of which were also significantly altered in the aforementioned comparison between BLV(+) and BLV(−), but showed reversed expression changes with negative ESs, which seemed obvious because 6 of 7 STLV(−) SCC specimens were BLV(+) (Fig. 3d).
Primary adenocarcinoma: No enrichment of oncogenic and immunologic gene sets
Eight of 10 (80%) primary adenocarcinoma specimens bear viral DNA of four different viral types (four with Y73SV only, two with HBV only, one with both HPV57 and Y73SV, and one with both Y73SV and STLV2). In the LIMMA analysis of any virus-infected (Virus(+)) (n = 8) versus uninfected primary adenocarcinoma specimens (Virus(−)) (n = 2), ACTC1 and PCSK2 were found to be significantly down-regulated in Virus(+) (Table 2; Additional file 7: Figure S6). ACTC1 is a member of actin cytoskeletons and was recently reported to be a potential candidate contributing to the enhanced lung tumor development .
A viral type-specific subgroup analysis was subsequently performed using the LIMMA for each of HBV and Y73SV with at least two infected samples. When HBV(+)(n = 2) and HBV(−) specimens (n = 8) were compared, five genes (CEACAM8, PRSS1, CALB2, NUDT4, and NAP1L6) were significantly over-expressed in HBV(+) whereas CEP170B, MTND6P4 and NUP62 were down-regulated (Table 2). Of these, PRSS1, CALB2, and NUDT4 shared a common molecular function of metal ion binding according to a DAVID gene ontology analysis. Noticeably, VirusMetha interactome analysis of the genes revealed that NUP62 protein has interactions with three viral proteins: P0C206, Q85601, and TAX. Additionally, NUP62 is an essential component of the nuclear pore complex and plays a novel role in centrosome integrity. A recent study noted that knockdown of NUP62 induced G2/M phase arrest, mitotic cell death, aberrant centrosome, and centriole formation .
Next the LIMMA analysis of six Y73SV(+) and four Y73SV(−) adenocarcinoma specimens identified one up-regulated gene, MARCH3, and three down-regulated genes, NMNAT2, ANKRD29, and QSER1, in Y73SV(+) specimens. NMNAT2 is involved in nicotinate and nicotinamide metabolism, and is a novel regulator of cell proliferation and apoptosis in NSCLC by binding with SIRT3 .
Unlike the GSEA results of SCC specimen data, however, no oncogenic and immunologic gene set was enriched in the GSEA analysis of primary adenocarcinoma data, indicating that viruses in SCC tumors might interact with human genes more strongly than those in adenocarcinoma.
Primary bronchioloalveolar carcinoma: low virus infection rate
There were only two BAC specimens bearing viral DNAs of Porcine circovirus type 2 (PCV-2) and Y73SV, exclusively. Several immune-related genes had high expression levels in the both virus-infected specimens, but not statistically significant (e.g. IGKV1–5 with FC = 96 and FDR p = 0.47; IGKV1D-13 with FC = 228 and p = 0.99) (Additional file 8: Figure S7A). Likewise, no gene set was enriched in the GSEA even though notch-signaling hallmark gene set and CRX_NRL oncogenic signatures were highly overexpressed in the two virus-infected specimens (Additional file 8: Figure S7B and S7C).
Metastatic lung cancer: distinct genes from the findings of primary tumor analysis
Y73SV was the unique viral type detected in two of nine metastatic tumor specimens. Therefore, all differential gene expression analyses were performed over all histological subtypes and metastatic disease sites. Using LIMMA, 69 DEGs including CRCT1 and MAGE9 were found (Additional file 9: Figure S8A). Based on DAVID annotation results of the DEGs, the top represented biological process was the positive regulation or activation of cell proliferation (FDR p = 0.13). It involved four overexpressed genes, BCL6, NTN1, NAMPT, and PBX1, and two underexpressed genes, MVD and VEGFC, in Y73SV(+) metastatic NSCLC specimens. The melanoma associated tumor antigen (MAGE) pathway was also significantly over-represented biological process and encompasses three genes (MAGEA9, MAGEA9B, and MAGEB2) overexpressed in Y73SV(+). Furthermore, 39 down-regulated genes were in part associated with the zinc finger binding process as well as the integral component of membrane (Additional file 9: Figure S8B, C, and D).
The VirusMentha interactome analysis next revealed that nine over-expressed genes (BNIP3, BNIP3L, ENY2, BCL6, TMEFF2, ZNF655, TMA7, SSR3, and GART) and five under-expressed genes (CLTA, RPS21, SUB1, ITSN2, and TSNARE1) in Y73SV(+) had significant virus-host protein-protein interaction with human adenovirus 2 (FDR p = 0.019) and human immunodeficiency virus type 1 (FDR p = 0.027), respectively (Fig. 4). Among them, SSR3 has the most complex interaction network with viral proteins, in terms of the number of interacted viral proteins, followed by ENY2 and GART. In particular, BNIP3 is an apoptosis-inducing protein that can overcome Bcl-2 suppression and plays an important role in the calprotectin (S100A8/A9)-induced cell death pathway . Both BNIP3 and BNIP3L interact with small T-antigen E1B viral protein that is a putative adenovirus Bcl-2 homolog that inhibits apoptosis induced by TNF or FAS pathways, as well as p53-mediated apoptosis . It was also reported that without E1B function, virus production is compromised because of premature death of the host cell .
Lastly, in order to explore metastasis-specific genes, we compared the 69 DEGs with the 1527 DEGs, which was identified in the above primary tumor analysis comparing Virus(+) to Virus(−). Consequently, eight genes (C17orf80, SUB1, PYROXD2, IFT57, PAM, ARMC8, SSR3, and BNC2) were in common. The first five genes showed expression changes in the same direction with negative fold changes for C17orf80 and SUB1, and positive for PYROXD2, IFT57, and PAM in both Virus(+)-primary and -metastatic specimens (Additional file 9: Figure S8D). Since the primary tumor analysis involves various viruses, the 69 DEGs were further compared to the four genes that had differential expressions between Y73SV(+) and Y73SV(−) primary NSCLC tumors. No overlapping gene was found. Collectively, these observations suggested that Y73SV might have different host cellular gene targets and differently influence their expressions in primary and metastatic tumor cells.
Viruses are now accepted as bona fide etiologic factors of human cancer such as with HPV in cervical and oropharyngeal cancer, and HBV in hepatocellular carcinoma . In our recent published study, we screened archived frozen NSCLC specimens and 10 non-neoplastic controls for potential microorganisms and surprisingly found most SCC and adenocarcinomas contained various strains of viral DNA, but they were absent in non-neoplastic lung. These data raised the question of whether viral DNA we found were just from commensal viruses that somehow were attracted to the tumors or whether they were functional and were in some way involved in carcinogenesis. In the current study, we thus analyzed gene expression microarray data to investigate the transcriptomic targets and differential gene expression patterns by virus infection status in these same NSCLC tumor specimens.
We showed that various genes, oncogenic gene sets, and immunologic gene sets had significantly altered gene expression profiles between Virus(+) and Virus(−) NSCLC tumors with the same histological subtype or metastatic disease status according to viral types. For example, the cell cycle, proteasome, and p53 signaling pathways were significantly over-represented biological processes among DEGs in primary NSCLC tumor microarray analysis. This finding of the cell cycle pathway is in parallel with a known cancer-related mechanism of transforming retroviruses which carry oncogenes derived from cellular genes that are involved in mitogenic signaling and growth control . Furthermore, it was reported that a viral oncoprotein E6 in HPV induces proteasomal degradation of the tumor-suppressor p53, thereby compromising cell-cycle arrest and apoptosis .
One of most remarkable findings in our study was the presence of 22 viral carcinogenesis pathway-associated DEGs in primary NSCLC tumors. On the other hand, this pathway was not found when metastatic NSCLC specimens were analyzed solely or together with primary tumors, suggesting that these DEGs from our study are highly plausible transcriptomic targets of viruses for developing primary NSCLC and more extensive investigation of them will shed a light on delineating the potential etiological roles of viruses in NSCLC.
Notably, there was no overlap between the lists of DEGs from the analyses of each viral type that infected the primary NSCLC tumor specimens. Also, several significantly enriched hallmarks, oncogenic signatures, and immune-related signatures were identified in the analysis of SCC tumors, but none in the analysis of adenocarcinoma and BAC. These findings support that differential gene expression patterns of NSCLC tumors were quite distinct among infected viral types even in the same histological subtype.
We also found that eight DEGs were common in both primary tumor analysis and metastatic tumor analysis, and some of them even showed opposite directions of expression changes, in terms of different signs of fold changes. This implies that viruses may have the same host gene targets during NSCLC progression but might regulate their target gene expressions differently.
It is widely agreed that BLV does not cause disease in humans, but we found a high incidence (85%) of several Delta retroviruses in our lung SCC specimens, including BLV. We also identified several genes and gene pathways significantly altered between BLV-infected and uninfected lung SCC specimens. Recently, BLV was detected in a high proportion (59%) of 218 breast cancer specimens  and a statistical association between BLV and the risk of breast cancer was reported . Although this virus has not yet been causally linked to cancer development or progression, it may play an important role that yet to be described.
In this study, we focused primarily on annotating biological functions and virus-host protein interactions of DEGs between virus-infected and uninfected NSCLC, but not on the association of those genes with clinical outcomes of patients. Our previous study  already documented that NSCLC patients whose tumor contained viral DNA had a better prognosis, longer overall survival times, than those without viral DNA in their tumors. The DEGs in these viral-positive tumors that are strongly associated with a more positive clinical outcome in NSCLC patients may serve as potential biomarkers or even novel therapeutic targets for treatment to improve outcomes.
In our NSCLC tumor specimens, particular viruses were detected in certain histologic subtypes and at a cancer progression stage (e.g. SCC tumors infected with BLV, PlpPV1, and STLV, and metastatic lung cancer with Y73SV, and primary NSCLC with HTLV-1). The cell types of lung cancer have far different presentations and usual anatomic location in the lungs: adenocarcinoma primarily starts in the periphery of the lungs and squamous cell usually arises in larger airways. There tend to be differences in the demographics of the patients who develop the different tumor cell types. For example, lung squamous cell carcinoma almost always occurs in current or former heavy smokers (and predominantly in men), whereas adenocarcinoma commonly occurs in former minimal smokers or never smokers. So it is not surprising that different types of viral DNA are found in different tumor cell types.
Since our current study has a small number of NSCLC tumors, we were not able to take account for interaction effects of viral types, histological subtypes, and metastatic disease sites on gene expression profiles in diverse NSCLC tumors. We also identified differentially altered genes and inferred their functional relationships with the detected viruses solely by a statistical comparison and bioinformatics approaches. A larger study comparing the impact of tumor viruses on gene expression in primary tumors to those in metastatic NSCLC will be needed in conjunction with a biological confirmation of expression changes of the genes and the presence of viral proteins by PCR and immunostaining experiments to better understand complex virus-host gene interactions in the NSCLC tumors.
Differential gene expression patterns were distinct among infected viral types, histological subtypes, and metastatic disease status of NSCLC. These findings support the hypothesis that tumor viruses play a role in NSCLC by regulating host gene expressions in tumor cells during tumor differentiation and progression. Therefore this follow-up study strongly indicates that the viral DNA detected in lung tumors was from functional viruses interacting with tumor cells and they were not just “passengers,” a unique finding in lung carcinogenesis that warrants further investigation.
Bovine leukemia virus
Database for annotation, visualization and integrated discovery
False discovery rate
Gene set enrichment analysis
Hepatitis B virus
Human papilloma virus
Human T-cell lymphotropic virus type I
Linear models for microarray analysis
Lawrence Livermore microbial detection array
Non-small cell lung cancer
Polymerase chain reaction
Porcine circovirus type 2
Panthera leo persica papillomavirus type 1
Squamous cell carcinoma
Simian T-cell leukemia virus type 1
Simian T-cell leukemia virus type 2
Simian T-cell leukemia virus type 6
Y73 sarcoma virus
zur Hausen H. Historical review. In: Weinheim z HH, editor. Infections Causing Human Cancer. Germany: Wiley-Blackwell; 2011. p. 7–40.
Pogorzelski M, Ting S, Gauler TC, Breitenbuecher F, Vossebein I, Hoffarth S, Markowetz J, Lang S, Bergmann C, Brandau S, et al. Impact of human papilloma virus infection on the response of head and neck cancers to anti-epidermal growth factor receptor antibody therapy. Cell Death Dis. 2014;5:e1091.
Robinson LA, Jaing CJ, Pierce Campbell C, Magliocco A, Xiong Y, Magliocco G, Thissen JB, Antonia S. Molecular evidence of viral DNA in non-small cell lung cancer and non-neoplastic lung. Br J Cancer. 2016;115(4):497–504.
Hasegawa Y, Ando M, Kubo A, Isa S, Yamamoto S, Tsujino K, Kurata T, Ou SH, Takada M, Kawaguchi T. Human papilloma virus in non-small cell lung cancer in never smokers: a systematic review of the literature. Lung Cancer. 2014;83(1):8–13.
Tang KW, Alaei-Mahabadi B, Samuelsson T, Lindh M, Larsson E. The landscape of viral expression and host gene fusion and adaptation in human cancer. Nat Commun. 2013;4:2513.
Victoria JG, Wang C, Jones MS, Jaing C, McLoughlin K, Gardner S, Delwart EL. Viral nucleic acids in live-attenuated vaccines: detection of minority variants and an adventitious virus. J Virol. 2010;84(12):6033–40.
Erlandsson L, Rosenstierne MW, McLoughlin K, Jaing C, Fomsgaard A. The microbial detection array combined with random Phi29-amplification used as a diagnostic tool for virus detection in clinical samples. PLoS One. 2011;6(8):e22631.
Gardner SN, Jaing CJ, McLoughlin KS, Slezak TR. A microbial detection array (MDA) for viral and bacterial detection. BMC Genomics. 2010;11:668.
Hewitson L, Thissen JB, Gardner SN, McLoughlin KS, Glausser MK, Jaing CJ. Screening of viral pathogens from pediatric ileal tissue samples after vaccination. Adv Virol. 2014;2014:720585.
Thissen JB, McLoughlin K, Gardner S, Gu P, Mabery S, Slezak T, Jaing C. Analysis of sensitivity and rapid hybridization of a multiplexed microbial detection microarray. J Virol Methods. 2014;201:73–8.
Jaing CJ, Thissen JB, Gardner SN, McLoughlin KS, Hullinger PJ, Monday NA, Niederwerder MC, Rowland RR. Application of a pathogen microarray for the analysis of viruses and bacteria in clinical diagnostic samples from pigs. J Vet Diagn Investig. 2015;27(3):313–25.
Tellez J, Jaing C, Wang J, Green R, Chen M. Detection of Epstein-Barr virus (EBV) in human lymphoma tissue by a novel microbial detection array. Biomark Res. 2014;2(1):24.
Paradzik M, Bucevic-Popovic V, Situm M, Jaing CJ, Degoricija M, McLoughlin KS, Ismail SI, Punda-Polic V, Terzic J. Association of Kaposi’s sarcoma-associated herpesvirus (KSHV) with bladder cancer in Croatian patients. Tumour Biol. 2014;35(1):567–72.
Be NA, Allen JE, Brown TS, Gardner SN, McLoughlin KS, Forsberg JA, Kirkup BC, Chromy BA, Luciw PA, Elster EA, et al. Microbial profiling of combat wound infection through detection microarray and next-generation sequencing. J Clin Microbiol. 2014;52(7):2583–94.
Rosenstierne MW, McLoughlin KS, Olesen ML, Papa A, Gardner SN, Engler O, Plumet S, Mirazimi A, Weidmann M, Niedrig M, et al. The microbial detection array for detection of emerging viruses in clinical samples--a useful panmicrobial diagnostic tool. PLoS One. 2014;9(6):e100813.
Ragin C, Obikoya-Malomo M, Kim S, Chen Z, Flores-Obando R, Gibbs D, Koriyama C, Aguayo F, Koshiol J, Caporaso N, et al. HPV-associated lung cancers: an international pooled analysis. Carcinogenesis. 2014;35:1267–75.
Welsh EA, Eschrich SA, Berglund AE, Fenstermacher DA. Iterative rank-order normalization of gene expression microarray data. BMC bioinformatics. 2013;14:153.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
Huang d W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.
Calderone A, Licata L, Cesareni G. VirusMentha: a new resource for virus-host protein interactions. Nucleic Acids Res. 2015;43(Database issue):D588–92.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
Li H, Zhu W, Zhang L, Lei H, Wu X, Guo L, Chen X, Wang Y, Tang H. The metabolic responses to hepatitis B virus infection shed new light on pathogenesis and targets for treatment. Sci Rep. 2015;5:8421.
Vaezi AE, Bepler G, Bhagwat NR, Malysa A, Rubatt JM, Chen W, Hood BL, Conrads TP, Wang L, Kemp CE, et al. Choline phosphate cytidylyltransferase-alpha is a novel antigen detected by the anti-ERCC1 antibody 8F1 with biomarker value in patients with lung and head and neck squamous cell carcinomas. Cancer. 2014;120(12):1898–907.
Chen BS, Li CW. Constructing an integrated genetic and epigenetic cellular network for whole cellular mechanism using high-throughput next-generation sequencing data. BMC Syst Biol. 2016;10:18.
Baril M, Es-Saad S, Chatel-Chaix L, Fink K, Pham T, Raymond VA, Audette K, Guenier AS, Duchaine J, Servant M, et al. Genome-wide RNAi screen reveals a new role of a WNT/CTNNB1 signaling pathway as negative regulator of virus-induced innate immune responses. PLoS Pathog. 2013;9(6):e1003416.
Nakayama S, Sng N, Carretero J, Welner R, Hayashi Y, Yamamoto M, Tan AJ, Yamaguchi N, Yasuda H, Li D, et al. Beta-catenin contributes to lung tumor development induced by EGFR mutations. Cancer Res. 2014;74(20):5891–902.
Chang HY, Sneddon JB, Alizadeh AA, Sood R, West RB, Montgomery K, Chi JT, van de Rijn M, Botstein D, Brown PO. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2004;2(2):E7.
Escobar M, Velez M, Belalcazar A, Santos ES, Raez LE. The role of proteasome inhibition in nonsmall cell lung cancer. J Biomed Biotechnol. 2011;2011:806506.
Wang DT, Ma ZL, Li YL, Wang YQ, Zhao BT, Wei JL, Qi X, Zhao XT, Jin YX. miR-150, p53 protein and relevant miRNAs consist of a regulatory network in NSCLC tumorigenesis. Oncol Rep. 2013;30(1):492–8.
Zhang JL, Zhao WG, Wu KL, Wang K, Zhang X, Gu CF, Li Y, Zhu Y, Wu JG. Human hepatitis B virus X protein promotes cell proliferation and inhibits cell apoptosis through interacting with a serine protease Hepsin. Arch Virol. 2005;150(4):721–41.
Boxus M, Twizere JC, Legros S, Dewulf JF, Kettmann R, Willems L. The HTLV-1 tax interactome. Retrovirology. 2008;5:76.
Wan X, Wang H, Nicholas J. Human herpesvirus 8 interleukin-6 (vIL-6) signals through gp130 but has structural and receptor-binding properties distinct from those of human IL-6. J Virol. 1999;73(10):8268–78.
Kim EJ, Juhnn YS. Cyclic AMP signaling reduces sirtuin 6 expression in non-small cell lung cancer cells by promoting ubiquitin-proteasomal degradation via inhibition of the Raf-MEK-ERK (Raf/mitogen-activated extracellular signal-regulated kinase/extracellular signal-regulated kinase) pathway. J Biol Chem. 2015;290(15):9604–13.
Lara MF, Garcia-Escudero R, Ruiz S, Santos M, Moral M, Martinez-Cruz AB, Segrelles C, Lorz C, Paramio JM. Gene profiling approaches help to define the specific functions of retinoblastoma family in epidermis. Mol Carcinog. 2008;47(3):209–21.
Ma Y, Croxton R, Moorer RL Jr, Cress WD. Identification of novel E2F1-regulated genes by microarray. Arch Biochem Biophys. 2002;399(2):212–24.
Zhao Q, Kho A, Kenney AM, Yuk Di DI, Kohane I, Rowitch DH. Identification of genes expressed with temporal-spatial restriction to developing cerebellar neuron precursors by a functional genomic approach. Proc Natl Acad Sci U S A. 2002;99(8):5704–9.
Majumder PK, Febbo PG, Bikoff R, Berger R, Xue Q, McMahon LM, Manola J, Brugarolas J, McDonnell TJ, Golub TR, et al. mTOR inhibition reverses Akt-dependent prostate intraepithelial neoplasia through regulation of apoptotic and HIF-1-dependent pathways. Nat Med. 2004;10(6):594–601.
Lim SB, Tan SJ, Lim WT, Lim CT. An extracellular matrix-related prognostic and predictive indicator for early-stage non-small cell lung cancer. Nat Commun. 2017;8(1):1734.
Bhattacharyya S, Deb J, Patra AK, Thuy Pham DA, Chen W, Vaeth M, Berberich-Siebelt F, Klein-Hessling S, Lamperti ED, Reifenberg K, et al. NFATc1 affects mouse splenic B cell function by controlling the calcineurin--NFAT signaling network. J Exp Med. 2011;208(4):823–39.
Im JY, Lee KW, Won KJ, Kim BK, Ban HS, Yoon SH, Lee YJ, Kim YJ, Song KB, Won M. DNA damage-induced apoptosis suppressor (DDIAS), a novel target of NFATc1, is associated with cisplatin resistance in lung cancer. Biochim Biophys Acta. 2016;1863(1):40–9.
Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, Schinzel AC, Sandy P, Meylan E, Scholl C, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462(7269):108–12.
Franks SE, Briah R, Jones RA, Moorehead RA. Unique roles of Akt1 and Akt2 in IGF-IR mediated lung tumorigenesis. Oncotarget. 2016;7(3):3297–316.
Liu SM, Chen W, Wang J. Distinguishing between cancer cell differentiation and resistance induced by all-trans retinoic acid using transcriptional profiles and functional pathway analysis. Sci Rep. 2014;4:5577.
Li H, Feng Z, Wu W, Li J, Zhang J, Xia T. SIRT3 regulates cell proliferation and apoptosis related to energy metabolism in non-small cell lung cancer cells through deacetylation of NMNAT2. Int J Oncol. 2013;43(5):1420–30.
Ghavami S, Eshragi M, Ande SR, Chazin WJ, Klonisch T, Halayko AJ, McNeill KD, Hashemi M, Kerkhoff C, Los M. S100A8/A9 induces autophagy and apoptosis via ROS-mediated cross-talk between mitochondria and lysosomes that involves BNIP3. Cell Res. 2010;20(3):314–31.
Thomson BJ. Viruses and apoptosis. Int J Exp Pathol. 2001;82(2):65–76.
Han J, Sabbatini P, Perez D, Rao L, Modha D, White E. The E1B 19K protein blocks apoptosis by interacting with and inhibiting the p53-inducible and death-promoting Bax protein. Genes Dev. 1996;10(4):461–77.
Butel JS. Viral carcinogenesis: revelation of molecular mechanisms and etiology of human disease. Carcinogenesis. 2000;21(3):405–26.
Buehring GC, Shen HM, Jensen HM, Jin DL, Hudes M, Block G. Exposure to bovine leukemia virus is associated with breast Cancer: a case-control study. PLoS One. 2015;10(9):e0134304.
Prochazka M, Granath F, Ekbom A, Shields PG, Hall P. Lung cancer risks in women with previous breast cancer. Eur J Cancer. 2002;38(11):1520–5.
This study was supported by the Paul Hoenle Foundation, Sarasota, Florida, USA and supported in part by the Shared Resources at the H. Lee Moffitt Cancer Center & Research Institute, an NCI designated Comprehensive Cancer Center (P30-CA076292). All the authors of this paper declare they have no financial interest related to the work described in this paper.
Availability of data and materials
All relevant data and materials are presented in the manuscript.
Ethics approval and consent to participate
Ethnical approval for the use of archived tissue and patient information was obtained from the University of South Florida IRB, Protocol No. MCC16765. Due to retrospective nature of data collection consent to participate was not sought in this study as previously the patients were consented to be used their clinical samples for general research investigation in H. Lee Moffitt Cancer Center and Research Institute.
Consent for publication
The authors consent to the publication of the manuscript and all materials attached.
All authors declares that they have no competing interests and involvements that have a bearing on this paper including:
1) Support for the work under consideration, or for other projects, either financial or in kind from any third party, company or organization whose finances or reputation may be affected by the publication of the work. 2) Any recent, existing or planned employment relationship or consultancy (whether paid or unpaid) any of the authors has with an organization whose finances or reputation may be affected by the publication of the work. 3) Any direct financial interest any of the authors or their spouses, parents or children has (personal shareholdings, consultancies, patents or patent applications) whose value could be affected by the publication.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figure S1. Expression Patterns of 639 Genes Differentially Altered between all Virus-infected and Uninfected NSCLC Tumor Specimens. (PDF 163 kb)
Table S1. Top 10 Significant Genes with Differential Expression between Virus-infected and uninfected NSCLC Tumor Specimens . (PDF 197 kb)
Figure S2. Differentially-Expressed Genes between Virus-infected and Uninfected Primary NSCLC Tumors. (PDF 181 kb)
Figure S3. Viral Carcinogenesis (KEGG ID: hsa05203). (PDF 134 kb)
Figure S4. Gene Sets Enriched Significantly in All Primary NSCLC. (PDF 176 kb)
Figure S5. Expression Patterns of 12 Genes Differentially Expressed between Virus-infected and Uninfected Squamous Cell Carcinoma. (PDF 137 kb)
Figure S6. Differentially-Expressed Genes between Virus-infected and Uninfected Adenocarcinoma. (PDF 147 kb)
Figure S7. Gene Expression of Notch-signaling and CRX_NRS Gene sets in Bronchioloalveolar Carcinoma. (PDF 148 kb)
Figure S8. Differentially-Expressed Genes between Virus-infected and Uninfected Metastatic NSCLC. (PDF 129 kb)
About this article
Cite this article
Kim, Y., Pierce, C.M. & Robinson, L.A. Impact of viral presence in tumor on gene expression in non-small cell lung cancer. BMC Cancer 18, 843 (2018). https://doi.org/10.1186/s12885-018-4748-0
- Non-small cell lung cancer
- Gene expression
- Human papilloma virus