Pharmacogenomics of in vitro response of the NCI-60 cancer cell line panel to Indian natural products

Background Indian natural products have been anecdotally used for cancer treatment but with limited efficacy. To better understand their mechanism, we examined the publicly available data for the activity of Indian natural products in the NCI-60 cell line panel. Methods We examined associations of molecular genomic features in the well-characterized NCI-60 cancer cell line panel with in vitro response to treatment with 75 compounds derived from Indian plant-based natural products. We analyzed expression measures for annotated transcripts, lncRNAs, and miRNAs, and protein-changing single nucleotide variants in cancer-related genes. We also examined the similarities between cancer cell line response to Indian natural products and response to reference anti-tumor compounds recorded in a U.S. National Cancer Institute (NCI) Developmental Therapeutics Program database. Results Hierarchical clustering based on cell line response measures identified clustering of Phyllanthus and cucurbitacin products with known anti-tumor agents with anti-mitotic mechanisms of action. Curcumin and curcuminoids mostly clustered together. We found associations of response to Indian natural products with expression of multiple genes, notably including SLC7A11 involved in solute transport and ATAD3A and ATAD3B encoding mitochondrial ATPase proteins, as well as significant associations with functional single nucleotide variants, including BRAF V600E. Conclusion These findings suggest potential mechanisms of action and novel associations of in vitro response with gene expression and some cancer-related mutations that increase our understanding of these Indian natural products. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-022-09580-7.


History of Ayurveda
Ayurveda is a traditional system of medicine that originated around 3000-4000 BCE, which utilizes Indian natural products (INP) derived mainly from plants to treat "imbalances" in the body aiming to cure a variety of diseases, including cancer [1]. In the Ayurvedic system of herbal medicine, there are 3 main physiologic states called doshas which are based on several phenotypic (body frame, weight, facial features) and mental (memory, emotional lability) factors. A fundamental belief in Ayurvedic medicine is that an imbalance in these doshas leads to disease and illness, which are purported to be corrected by a combination of these herbal remedies [2].
Historical references in Ayurvedic text contain some of the first descriptions of cancer (blood and soft tissue) and their successful treatment with a combination of INPs administered via oral and topical routes [2]. However, results reported in these historical references are difficult to replicate due to the use of multiple herbal products in combination, a difference in basic disease terminology, and heterogeneity in preparation of the herbal compounds [3,4]. Despite the uncertain efficacy of these INPs, Ayurvedic medications have been reported to be used by as many as 20-40% of patients with cancer in India as they are believed to prevent chemotherapyrelated toxicity, boost immunity, and slow tumor growth [5,6]. Knowledge of the putative anticancer mechanisms of action of individual molecular compounds comprising the INPs is incomplete, however some in vitro and in vivo data for several commonly used INPs exist and are discussed below.

Examples of Indian natural products
Curcumin is a bioactive polyphenol that is the most common curcuminoid, a group of compounds that impart a yellow color to Curcuma longa (turmeric). Curcumin has generated a lot of interest as an INP with possible chemo-preventative, anticancer, and anti-inflammatory properties, highlighting the difficulty of defining a specific indication due to its description as a panacea [7]. Some reports have demonstrated the modest activity of curcumin to induce apoptosis in cancer cell lines, its role in enhancing response to cisplatin, and its antiinflammatory properties [7,8]. These findings have led to many trials including active clinical trials in the US (NCT02064673, NCT02944578, NCT02782949) exploring the role of curcumin as a chemo-preventative agent in preventing gastric cancer, cervical intraepithelial neoplasia, and the recurrence of prostate cancer.
Neem (Azadirachta indica) is another commonly used herbal product that has several component INPs with reported anticancer properties, which highlights the difficulty in isolating active INP compounds. Nimbolide is a terpenoid lactone derived from Neem that induces apoptosis in pancreatic cancer cells through reactive oxygen species (ROS) generation and upregulation of pro-apoptotic proteins [9]. Gedunine, a pentacyclic triterpenoid derived from Neem, has also demonstrated activity in pancreatic cancer through inhibition of the sonic hedgehog pathway [10]. These mechanisms of action of multiple INPs from the same herbal product make it difficult to attribute the activity of INPs, which is further complicated as many patients taking INPs receive combinations of several herbal products.
Amla (Phyllanthus emblica), a.k.a. Indian gooseberry, is part of the genus Phyllanthus, which has been used in traditional herbal medicine to treat multiple ailments. The Phyllanthus genus includes several species (e.g., P. niruri, P. urinaria, P. fraternus, etc.) which have been used to treat a wide range of ailments from diabetes to renal calculi [11]. Although anecdotal reports of use of Amla to treat cancer are lacking, some active molecules in Amla have been studied more extensively, including quercetin. Quercetin, a polyphenolic flavonoid derived from P. emblica, has been shown to attenuate tumor growth in breast and pancreatic cancer models through multiple mechanisms including growth signal inhibition of the PI3K pathway and tyrosine kinase inhibition [12].
Cucurbitacins are a group of compounds characterized by a tripterpene hydrocarbon. which are found in over 40 species, including Indian plants such as Brahmi (Bacopa monnieri) and bitter gourd (Momordica charantia) [13]. These plants, which are known for their bitter taste due to the cucurbitacins, are purported to prevent cancer and are administered orally as a liquid formulation. While cucurbitacin B is one of the more extensively studied cucurbitacins, its putative anticancer mechanism of action is not well defined; however this product is thought to be involved in JAK/STAT pathway inhibition and F-actin cytoskeleton disruption [14].
While putative anti-cancer mechanisms of action have been suggested for commonly used INPs as detailed above, these data are often limited to in vitro response in one or a few cell lines. Data regarding rarer INPs including plumbagin (Plumbago zeylanica), alizarin (Rubia cordifolia), and Achilleol A (Achillea odorata) are limited or have not yet been reported [15]. Analysis of data from a large database of cell line assay results such as the NCI-60 cancer cell line panel data, for the purpose of determining a mechanism of action, may improve our understanding of these INPs.

NCI-60 cell line panel
Our overall strategy to explore the possible mechanisms of action of INPs was to compare patterns of cell line response to each INP with publicly available data to those for standard reference anticancer compounds and to identify clusters (subtrees) of INPs with similar patterns of response across the NCI-60 cell lines. Next, we examined the association of gene expression levels and of clinically or biologically important single nucleotide variants (SNVs) with response to individual INPs. We also examined how the molecular features associated with tumor cell line responses to individual INPs were distributed among the INP subtrees that had similar patterns of response. Lastly, we investigated the biological pathways representing the gene expression patterns that were associated with different INP subtrees. These analyses provided new insights into potential mechanisms of actions of the INPs.
To examine the activity of INPs in tumor cells, we analyzed publicly available data from the NCI-60 cancer cell line panel. The NCI-60 initiative was started by the U.S. National Cancer Institute (NCI) in 1989 with the purpose of screening candidate anti-cancer compounds on 60 cancer cell lines representing 10 different tumor types. Over 100,000 compounds have been screened to date, including INPs and well-characterized reference compounds approved for clinical use (e.g., paclitaxel, methotrexate, and other agents) [16][17][18]. The Developmental Therapeutics Program (DTP) of the NCI screens these compounds using a single high-dose test to meet pre-specified minimum inhibition criteria and subsequently screens each compound in a 5-dose screen using a 48 h endpoint measured by a Sulforhodamine B stain [18]. Data recorded by the screen include GI50, IC50, LC50, and total growth inhibition (TGI) cell response data which are used to generate unique patterns across cell lines [17][18][19]. To interrogate this rich dataset, the COM-PARE algorithm was developed to allow comparisons of response patterns (across cell lines) of synthetic and natural products of interest with standard reference compounds to help determine their putative mechanisms of actions [16].
Additionally, molecular features of the NCI-60 cell lines have been extensively characterized. Their gene expression, whole exome sequencing, and other molecular data have been made publicly available [20,21]. These data were integrated into online databases and made available through CellMiner and CellMinerCDB data portals, which allow access to gene expression, genetic variation, and drug sensitivity data [22,23]. Measures of response of the cell lines to a large number of drugs and investigational compounds, including some natural products, are also publicly available from the NCI DTP NCI-60 Growth Inhibition data repository. Combined, these data provide an opportunity to assess gene-drug relationships. Thus, the NCI-60 resource offers a robust dataset that may be interrogated to increase our understanding of INPs and their mechanisms of action. Figure 1 summarizes the workflow of the steps of the analyses in this study.

Collection of Indian natural products and reference compounds with cell line response data
A biomedical literature search in PubMed at the National Center for Biotechnology Information (NCBI) using keywords "Ayurveda" AND "cancer" AND "review" was conducted to identify Ayurvedic herbs of interest, with a total of 170 publications found. Each publication was manually reviewed. Among them, 25 publications contained a comprehensive description of one or more Ayurvedic herbs and their specific INPs that are commonly used by Ayurvedic practitioners in cancer treatment. These INPs were included in subsequent searches. All INPs identified in our manual curation were then searched in PubMed for evidence of any activity in cancer cell lines and were compiled, resulting in the total of 258 INPs.
The NCI DTP screening program uses a special identifier, called an NSC number, for each compound screened in the NCI-60 cell line panel. Those INPs obtained from our literature search that did not have NSC numbers (n = 66) were excluded from further analysis. The unique NSC numbers for the remaining INPs (n = 192) identified from biomedical literature were interrogated using the NCI PUBLIC COMPARE portal for available GI50 data (https:// dtp. cancer. gov/ public_ compa re) [24,25]. Each GI50 value represents sensitivity of an NCI-60 cell line to a particular compound, calculated as the concentration producing 50% growth inhibition that is derived from the 5-concentration screen of each compound at 48 h after incubation [18]. Those INPs with only single dose response data (n = 117) were excluded. The remaining INPs (n = 75) were used as input for separate queries in the NCI PUBLIC COMPARE portal. The public version of the NCI PUBLIC COMPARE database does not store the taxonomy and global locations of the original source products for the database compounds. The queries use Pearson correlation analysis to compare the vector of GI50 values across the NCI-60 cell line panel for each input INP to the vector of GI50 values for available COMPARE reference antitumor agents (including approved agents, e.g., methotrexate and vincristine, and experimental agents). We used a cutoff of the absolute value for a pairwise Pearson correlation coefficient |r|> 0.5 to select the reference compounds with similar GI50 response profiles to each input INP.
The NSC numbers of the 75 INPs and the 57 reference compounds that were correlated with at least one of those 75 INPs with |r|> 0.5 (Table 1) were used to download publicly available -log 10 GI50 data (negative log 10 GI50, referred as NLOGGI50 in the downloadable dataset) from the static public release at the DTP website NCI-60 Growth Inhibition data repository (https:// wiki. nci. nih. gov/ displ ay/ NCIDT Pdata/ NCI-60+ Growth+ Inhib ition+ Data). This dataset is currently available under previous releases (filename: NCI60_GI50_2016b. zip, June 2016 release downloaded on March 4, 2020). Details of the sample handling, preparation and cell line testing methods followed to generate the data in this repository are described elsewhere [19]. The NLOGGI50 values were multiplied by -1 in order to convert them to log 10 GI50, a measure of cell line response to treatment. Here and below, we refer to these measures as logGI50. All logGI50 values that were not available were set to missing. The term "compound" is used to describe the INPs and reference compounds with available logGI50 data. As multiple experiments had been run for each compound, the median logGI50 was calculated, using replicate experiments, for each cell line-compound pair. These median logGI50 values for each NCI-60 cell line were computed for all 132 compounds using 15,199 experiment records. The majority of the data were screened in molar units, except for the product of Ricinus communis (NSC 15384), which had the units in μg/ml and was not included in the clustering analysis for that     [16].

Hierarchical clustering of the logGI50, logLC50 and TGI values of INPs and reference compounds
In order to identify groups of INPs with similar patterns of activity in the NCI-60 cell line panel, we employed hierarchical clustering of the INPs. The initial clustering to identify groups of compounds with similar response patterns was based on the logGI50 values (Fig. 2). Reference compounds were also included in the clustering to provide information about possible mechanisms of action of each hierarchical cluster, or subtree, containing INPs with similar response. Clustering was based on pairwise Euclidean distances between each compound pair, which were calculated using the logGI50 values of the INPs and reference compounds in all 60 NCI-60 cell lines. A hierarchical tree based on these Euclidean distances was generated using the hclust package using the 'average' , or UPGMA, option and exported for further visualization using the ape package [26]. Additionally, a 2-dimensional heatmap of the compounds and cell lines was generated from logGI50 values using heatmap.2 in the gplots package. We used RStudio v1.2.5033 for clustering analysis. Further visualization and graphical representation of the hierarchical clustering of all compounds and of their individual subtrees was done using Dendroscope version 3.7.2 [27].
To augment the analysis of clusters of INPs and reference compounds using logGI50 values, we also performed separate clustering of compounds using logLC50 and TGI values representing the 50% lethal concentration needed for the 50% cell kill and the concentration (also on the log 10 scale) for the total inhibition of growth, respectively [18,19,25]. Both logLC50 and TGI values were downloaded from the December 2021 release of the NCI-60 Growth Inhibition Data (https:// wiki. nci. nih. gov/ NCIDT Pdata/ NCI-60+ Growth+ Inhib ition+ Data). Values for all INPs and reference compounds were extracted, and median values were computed as detailed above. Pairwise Euclidean distances were calculated, and unrooted radial hierarchical trees were generated using the methodology described above. These trees were visualized and compared to the tree inferred using logGI50 values ( Fig. 2; Table 1). Subsequent analyses of The column labeled Plant Name/Reference Product Mechanism shows the main mechanism of action of reference products and taxonomy for Ayurvedic compounds.  Table 5 association of INP response with gene expression, gene enrichment, and single nucleotide variation data were performed using logGI50 values as the primary endpoint measure.

Analysis of association of gene expression with INP activity
To examine how NCI-60 cell line response to INPs may be influenced by molecular genetic features, we analyzed the association of median logGI50 values with NCI-60 molecular data. Pre-treatment gene expression data for the NCI-60 cell lines was downloaded from the CellM-inerCDB resource [23,28]. A more detailed description of the collection of molecular measures can be found in our previous publication [29]. For expression analysis, we used log 2 transformed expression measures of 23,059 annotated transcripts, lncRNAs, and miRNAs which had been previously combined from five Affymetrix expression microarray platforms and normalized by the CellMiner development team [22]. Cell lines for which there were no drug response data (MDA-MB-468) or no gene expression data (MDA-N) were excluded (n = 2). For each gene-INP pair, Spearman correlation was computed to evaluate the association between pre-treatment gene expression and logGI50 in 58 cell lines. Benjamini-Hochberg procedure was applied to control the false discovery rate (FDR) across the 23,059 gene × 75 INP pairs. Gene-INP pairs with FDR-adjusted p < 0.05 were considered significant. A positive value of the Spearman correlation coefficient ρ indicated an association of higher gene expression with higher logGI50 values of an INP, i.e., with increased resistance to that INP. Similarly, negative values of ρ showed an association of higher gene expression with lower logGI50 values, i.e., with increased sensitivity to that INP. Here and below, the terms sensitivity and resistance were used to define the direction of the associations, as the analyses of logGI50 values were performed on the continuous scale. All genes with significant Spearman correlations were investigated to determine whether the gene involved in the gene-INP pair was associated with a known molecular mechanism of action  Table 1 and Supplementary Table 4 of reference compounds that clustered in the same subtree with that INP.

Gene set enrichment analysis
Gene set enrichment analysis was performed using g:Profiler (https:// biit. cs. ut. ee/ gprofi ler/ gost), which is a regularly updated web-based utility that includes annotated pathway gene sets from KEGG, Reactome, and WikiPathways [30]. Genes that were significantly associated with response to INPs (FDR adjusted p < 0.1) in each cluster were stratified to negatively and positively correlated groups (Supplementary Tables 1-3). GSEA analysis was performed on each gene group separately for each cluster, using the gene symbols as input for g:Profiler.
A significance level for enriched pathways was set at p < 0.05 (FDR adjusted).

Analysis of association of INP activity with single nucleotide variants
To examine the association between NCI-60 cell line response to INPs and specific DNA alterations of cancer genes that may affect cytotoxicity response, whole exome sequencing (WES) data were downloaded from the CellMiner data download portal [22,31]. One cell line (MDA-N) which did not have drug response data was excluded, leaving a total of 59 cell lines available for analysis. The data were filtered using a list of candidate genes and functionally relevant SNVs from OncoKB v. 1.17, a curated precision oncology knowledge base [32]. As outlined in our earlier report [29], the list consisted of variants classified by OncoKB at levels 1-4 of potential therapeutic action, R1 and R2 levels of resistance, and variants classified as "oncogenic" and "likely oncogenic". After applying this filter to the CellMiner WES data, 1,586 protein changing SNVs in 280 genes across 59 NCI-60 cell lines were identified. These SNVs, which included nonsynonymous changes, frameshift variants, and variants involving the stop codon or the loss of a translational initiation codon start site, were additionally filtered to include only variants present in at least 3 NCI-60 cell lines, resulting in 107 genes with 220 SNVs across 59 cell lines. A Student's t-test was used to compare logGI50 values between groups of NCI-60 cell lines defined by variant status, for each SNV-INP pair. A positive value of the t-statistic indicated an association of higher gene expression with higher logGI50 values of an INP, i.e. increased resistance to that INP, whereas a negative value of the t-statistic showed an association of higher gene expression with lower logGI50 values, i.e. increased sensitivity to that INP. All analyses of associations between response to the natural products and sequence variants were performed using the RStudio v1.0.153. Biological interpretation of significant SNV-response associations was based on SNV annotation in OncoKB, using its updated annotation of levels of functional and oncogenic SNV effects as of 03/25/2021, and on published reports in biomedical literature.

Visualization of associations of response to INPs with molecular features and with cellular pathways in the NCI-60 cell lines
Visualization of significant associations (FDR adjusted p < 0.05) of logGI50 with gene expression and with single nucleotide variants, and of association of significantly upregulated and downregulated cellular pathways with INP subtrees was performed using Cytoscape v. 3.9.1 [33] and Microsoft Excel. Figure 2 shows the hierarchical clustering of the Indian natural products and reference compounds based on their median logGI50 values, presenting the results as an unrooted radial phylogram. Clustering revealed 4 distinct subtrees. As Subtree 4 consisted of only reference products (NSC 326231 -L-buthionine sulfoximine, and NSC 237020 -largomycin), it was excluded from subsequent analysis. Supplementary Fig. 1 provides a heatmap showing the two-dimensional clustering of the NCI-60 cell lines and the INPs and reference compounds, clustered according to the similar patterns of cell line response to these compounds using logGI50 values. The similarities of logGI50 response patterns within each subtree may suggest similar potency of the INPs with their grouped reference products and possibly similar mechanisms of actions.

Subtree 1 (13 INPs and 18 reference products)
The reference compounds in this subtree have mainly anti-mitotic activity (vincristine sulfate, vinleurosine sulfate, vinblastine sulfate, paclitaxel); however, they also included some agents that act as DNA intercalators (doxorubicin) and anti-metabolites (methotrexate). Some INPs of the cucurbitacin family and its derivatives (Cucurbitacin A, B, D, E, L, datiscoside) affect mitotic spindles and delay mitoses leading to a G2/M phase cell cycle arrest of cancer cells [13,14]. Phyllanthoside has been demonstrated to function both in vivo and in vitro as an inhibitor of eukaryotic protein synthesis by interfering with translation elongation, similar to the reference compound actinomycin D [34]. While a mechanism of action has not been clearly defined for tylophorin and its analog cryptoleurine, some experimental evidence points toward G1 arrest through cyclin A2 downregulation and VEGF2-mediated angiogenesis, which is not a known mechanism of any of the reference compounds correlated with its cytotoxicity [35,36].

Hierarchical clustering of Indian natural products and reference compounds based on the logLC50 and TGI measures
Supplementary Figs. 2 and 3 show the hierarchical clustering of INPs and reference compounds based on their median logLC50 or TGI values, respectively. The trees inferred using logLC50 and TGI were similar to each other, except for 12 compounds. Both logLC50 and TGI trees were comprised of 5 distinct subtrees, as compared to 4 distinct subtrees in the logGI50 tree (Fig. 2, Supplementary Figs. 2-3). Table 1 provides information, for each INP and reference compound, whether a compound had a similar clustering with other compounds and was assigned to a subtree with the same number based on logLC50 and TGI as compared to the subtrees based on clustering of logGI50. Detailed comparison of the cluster assignment of the compounds based on different response measures is provided in Supplementary  Table 5. Clustering which was based on TGI was more similar to logGI50-based clustering, whereas with the logLC50-based clustering more compounds showed differences from their logGI50-based cluster assignment (Supplementary Table 5). These patterns of similarity and difference between the three trees derived from different response measures may be explained by the fact that logGI50 and TGI both represent different degrees of growth inhibition, both being derived from the growth curve, whereas logLC50 is a different parameter representing a concentration needed to achieve 50% of cell kill [19]. Overall, the clustering was consistent for many INPs among the three difference response measures (Table 1 and Supplementary Table 5). It was less consistent for a number of reference compounds, possibly due to the higher potency of established anticancer drugs, which may result in their lower concentration needed to achieve total growth inhibition (TGI) or 50% lethal concentration (LC) as compared to the INPs. Seven reference compounds from subtree 2 of the logGI50 tree formed a separate cluster (subtree 5) in both TGI-and logLC50based trees. Anti-mitotic reference compounds (e.g. vinblastine, vincristine) clustered closely together in logGI50 subtree 1, however they were not tightly clustered in both logLC50 and TGI trees. The cluster assignment of many INPs (e.g. cinnamon and turmeric) in both logLC50 and TGI trees was similar to that in the logGI50 tree.

Association of cell line response to INPs with gene expression
Using pre-treatment gene expression data of 23,059 transcripts and the median logGI50 values of the 75 INPs, we conducted a Spearman correlation analysis that identified 204 natural product-gene pairs (including 190 unique genes and 28 unique INPs) that were statistically significant after adjusting for multiple testing (FDR adjusted p value < 0.05). All significant results are listed in Table 2 and summarized in a graphical format in Supplementary  Fig. 4. Below we discuss some of the highly significant correlations of biologically important protein-coding genes.

SLC7A11 and plumbagin (NSC 688284)
SLC7A11 (solute carrier family 7 member 11) has recently been suggested as potential drug target in pancreatic adenocarcinoma [37]. It plays a role in maintaining cellular glutathione levels via cystine uptake, protecting cells from oxidative stress induced death and is commonly overexpressed in cancer, which has been linked to chemoresistance in many anti-tumor agents [38][39][40][41]. Deletion of the SLC7A11 gene in genetically engineered mice with pancreatic ductal adenocarcinoma induced tumor-selective ferroptosis and inhibited tumor growth [40]. Targeting of the SLC7A11/ glutathione axis with sulfasalazine has been shown to cause synthetic lethality via decreased cystine uptake and intracellular glutathione biosynthesis [42]. Alternative strategies leveraging this metabolic addiction have also been demonstrated via inhibiting glucose uptake preventing the conversion of potentially toxic cystine to cysteine [38,43]. This highly positive correlation (Spearman correlation coefficient ρ = 0.79, unadjusted p value = 1.07 × 10 -13 , FDR adjusted p value = 8.47 × 10 -8 ) demonstrates increased resistance of tumor cell lines to plumbagin associated with increased gene expression of SLC7A11, which is consistent with the previous findings by our group and other authors about the potential role of this transporter in resistance to multiple antitumor agents and natural products [29,38,42,44].

ATAD family and curcumin
ATAD3A and ATAD3B are mitochondrial ATPase proteins expressed in embryogenesis. ATAD3B has been shown to be over-expressed in head and neck cancer and hepatocellular carcinoma [45,46]. Curcumin acts as a protonorphic uncoupler of oxidative phosphorylation decreasing ATP biosynthesis which alters the AMP:ATP ratio and ultimately decreases cell proliferation [47]. The negative correlation for both ATAD3A (Spearman correlation coefficient ρ = -0.57, unadjusted p value = 3.68 × 10 -6 , FDR adjusted p value = 0.04) and ATAD3B (Spearman ρ = -0.67, unadjusted p value = 1.29 × 10 -8 , FDR adjusted p value = 3.4 × 10 -3 ) genes demonstrates that increased sensitivity of cell lines to curcumin (i.e., lower logGI50 values) was associated with increased expression of the ATAD3A and ATAD3B genes.

MYB and phyllanthoside
MYB, a transcriptional activator, is a proto-oncogene that has been shown to be over-expressed in hematologic, colorectal, and breast cancer [48]. The negative correlation (Spearman correlation coefficient ρ = -0.66, unadjusted p value = 1.69 × 10 -8 , FDR adjusted p value = 3.84 × 10 -3 ) demonstrates an association between increased sensitivity of cell lines to phyllanthoside and increased expression of the MYB gene. This suggests a potential role of MYB-mediated transcriptional regulation in response to this INP.

Biological pathway analysis
The results of pathway analysis using g:Profiler are presented in Supplementary Tables 1-3 and summarized in a graphical format in Supplementary Fig. 5. Below we discuss the pathways and molecular functions that were identified for Subtrees 1 and 3. Subtree 2 was not evaluable due to a paucity of significant genes.
Biological pathway analysis using g:Profiler identified several biological pathways and functions which may be associated with increased sensitivity or resistance to INPs. Among the INPs in Subtree 1, resistance to NSC number 328426 (phyllanthoside), 342443 (S3'-desacetyl-phyllanthoside), 94743 (cucurbitacin A), 143925 (pekilocerin A), 112167 (elatericin B) was associated with pathways related to mineral homeostasis (Supplementary Table 1). Due to an insufficient number of genes associated with sensitivity to INPs in Subtree 1, common biological processes for those genes and INPs could not be evaluated.

Nuclear factor erythroid 2-related factor 2 (NRF2) pathway
NRF2 is a key transcription factor and a key modulator of cellular antioxidant response which has a role in preventing carcinogenesis. However, persistent activation of NRF2 has been demonstrated in some tumor types, which raises a possibility of its role in cancer proliferation [49]. As expression of the genes in this pathway was positively correlated with the INPs in Subtree 3, this suggests that resistance mechanisms to these INPs may be related to the NRF2 pathway [50].

PI3K-Akt-mTOR pathway
Overactivation of the PI3K-Akt-mTOR signaling pathway has been demonstrated in many different cancer types as a mechanism for tumor growth and therapeutic resistance [51]. As the pathway analysis of expression of the genes in this pathway found a positive correlation with logGI50 of the INPs in Subtree 3, this suggests that resistance mechanisms to the INPs such as NSC number 236613 (plumbagin), 643023 (alpha-phenyl-2,5-dimethoxy-alpha-cinnamonitrile), 365798 (piceatannol) and 112166 (cucurbitacin K) may be related to the PI3K-Akt-mTOR signaling. Subtree 3 contained several curcumin INPs and gallocatechin, which have been previously demonstrated to be associated with this pathway [52].

Eukaryotic translation pathway
A crucial component of cancer progression is translational control of protein synthesis through a increased rates of protein synthesis and specific mRNAs that promote increased tumor cell growth and survival [53].    Listed are the genes whose expression was associated with logGI50 of INPs with FDR adjusted p < 0.05. For each product, the subtree from hierarchical clustering shown in Fig. 1  As the pathway analysis of expression of genes in this pathway found a negative correlation with logGI50 of the INPs in Subtree 3, this suggests that sensitivity mechanisms to these INPs may be related to pathways associated with protein synthesis inhibition. Subtree 3 contained several curcumin-related INPs which have been previously demonstrated to have an association with these pathways [54].

Slit/Robo pathway
While the Slit/Robo pathway mainly involves functions to promote axon branching and neuronal migration, it is also involved in other physiological processes including angiogenesis and apoptosis [55]. Promoter hypermethylation of Slit/Robo has been observed in many different cancers, leading to undetectable or low levels of Slit/Robo, and natural products that reactivate this pathway via demethylation or other mechanisms are actively being explored [55]. Increased expression of genes in this pathway was negatively correlated with logGI50 of several INPs in Subtree 3, including NSC number 32982 (curcumin), 309909 (nimbolide), 87868 (phenethyl mustard oil), 742021 (curcumin tri adamantylaminoethylcarbonate), 742020 (ethoxycurcumin trithiadiazolaminomethylcarbonte), 705537 (daturaolone), 643769 (O-bromo-alpha-benzoyl cinnamonitrile), and 383468 (product of Andrographis paniculata), suggesting that overexpression of those genes may confer increased sensitivity to these products. This association indicates that such INPs could be explored to target this pathway. Curcumin and its related analogues have been demonstrated to also have a demethylating effect [56].

Association of cell line response to INPs with protein-changing single nucleotide variants
For each of the 75 INPs, and using whole exome sequencing data for the cell lines from CellMiner after filtering, we used a Student's t-test to analyze the differences between logGI50 values comparing cell lines with and without individual protein-changing single nucleotide variants in each of the 107 genes listed in OncoKB. After FDR adjustment, 13 SNV-INP pairs satisfied the FDR adjusted p value < 0.05, including 4 unique genes and 10 unique natural products. Below we discuss examples of associations of functionally important variants and likely oncogenic variants from OncoKB (Table 3 and Supplementary Fig. 6).

BRAF V600E and Cucurbitacin D (NSC 308606)
OncoKB lists BRAF V600E as a level 1 actionable variant, which was present in 9 cell lines (7 melanoma and 2 colorectal cell lines) in the NCI-60 dataset. Tumors with this variant are responsive to treatment with BRAF inhibitors (e.g., dabrafenib, vemurafenib) and in combination with MEK inhibitors this has been shown to be an effective treatment strategy for melanoma [57]. Consistent with our earlier analysis of a separate large natural product dataset [29], mean logGI50 response to cucurbitacin D was statistically significantly different when comparing cell lines without the BRAF V600E variant (mean = -6.69) to those with this variant (mean = -7.16, unadjusted p value = 5.71 × 10 -7 ; FDR adjusted p value = 7.42 × 10 -5 ). This association suggests that cucurbitacin D may have a role in targeting cancers with BRAF mutations or having an effect on BRAF [58]. Alternatively, the presence of BRAF V600E in most of the melanoma lines (8 out of 9 melanoma cell lines) may suggest that this INP may have a more general effect on growth inhibition in melanoma.

Discussion
In this study, we used in vitro data to examine the associations of variation in gene expression and deleterious mutations with tumor cell response to INPs. We also compared response patterns to those of reference compounds as a preliminary investigation of the possible mechanisms of action of these products at the cellular level. We reported the findings that were highly significant after the correction for multiple comparisons. We compared publicly available cancer cell line response data in the NCI-60 panel for 75 INPs to data for standard  [13,34]. Overall, the logGI50 response data were closely grouped among similar products, including cucurbitacins in Subtree 1, and curcumin and curcuminoids in Subtree 3.
Our analysis found multiple novel associations between gene expression and logGI50 values of INPs, including a highly significant association between increased levels of SLC7A11 expression and resistance to plumbagin. This resistance may involve increased SLC7A11 expression inhibiting ferroptosis, a distinct form of cell death due to excessive lipid peroxidation [43]. To our knowledge, our observed association between increased levels of ATAD3A/ ATAD3B expression and sensitivity to curcumin has not been previously reported. The products of these genes, ATPase family AAA domain containing 3A and 3B proteins, are involved in multi-protein complexes associated with mtDNA that are important for regulation of mitochondrial biogenesis and lipogenesis. Curcumin has been reported to regulate expression of enzymes involved in mitochondrial biogenesis and mitochondrial oxidative stress, to increase apoptosis and autophagic cell death, and to reduce cellular proliferation [59][60][61][62]. The association with ATAD3A and ATAD3B expression may be of interest since ATAD3 over-expression has been linked to the progression of head and neck cancer, lung adenocarcinoma, non-Hodgkin's lymphoma, uterine cancer, cervical cancer, prostate cancer, glioma, and hepatocellular carcinoma [46,63,64]. Interestingly, prior reports suggested the roles of increased ATAD3 expression in chemoresistance [46].
Our analysis of SNV variants demonstrated a statistically significant association of BRAF V600E with logGI50 measure of response to cucurbitacin D. The triterpene compounds from the Cucurbitaceae family, which include cucurbitacin D, are found in many gourd species. While they have demonstrated cytotoxicity in many cell lines, our finding of increased sensitivity in BRAF V600E mutated cell lines which includes almost all the melanoma cell lines in our dataset may warrant further investigation.
Paucity of INPs available in the public domain and consequently their underrepresentation in the NCI-60 cell line database limited our ability to evaluate some of the more commonly used Ayurvedic concoctions and herbs of interest including Triphala, Momordica charantia, and Withania somnifera. Additional open-source natural products databases [65][66][67] contain more INPs; however, the available NCI-60 screening data for these additional products in the DTP dataset were limited to single dose data and were not analyzed in our study.
We used logGI50 values as the primary response endpoint because many previous studies have shown these measures to be a relevant outcome to study associations with molecular targets. When using logGI50 values, clusters of compounds derived from logGI50 values have been shown to correlate well both with potential mechanism of cell line response and with similarities among compound structures [18,28,29,[68][69][70].
We used median logGI50 derived from the five-range dose screen as our measure of cell line response for the analysis of associations with molecular features of tumor cell lines. While this single logGI50 measure is informative in characterizing the cytotoxic effect of individual products, it may not reflect the cytotoxicity of the compound if it fell outside the pre-defined range of activity, in which case this measure would not reflect low levels of activity of some of the compounds we analyzed. As we analyzed pre-treatment gene expression levels for each cancer cell line, our findings cannot characterize the association between cell line response and post-treatment gene expression changes in response to each INP or reference compound. Such analyses may be of potential benefit in the future if post-treatment response data for Indian natural products become available. As the NCI-60 panel does not include normal cell lines for comparison, we did not focus on toxicity of these compounds and further studies will need to examine the side effects of these INPs.
As a note of caution, our findings do not indicate clinical efficacy but rather our study is an attempt to characterize available INPs and identify possible mechanisms of action for further study. In this analysis, utilization of the in vitro molecular screening data from the NCI-60 allowed us to identify molecular features of tumor cells associated with response to INPs. As Ayurvedic products are often used in specific combinations, our analysis would not be able to evaluate their clinical and immunomodulatory features associated with response to the combinations of such agents. Additionally, due to the limited representation of tumors and mutational features in the NCI-60 panel, we could not examine the response within individual cancer categories. Additional models including mouse patient-derived xenografts or other clinically relevant approaches may be needed to further investigate the physiological effects of Ayurvedic products in specific tumor types.

Conclusions
Our analysis examining NCI60 response patterns for 75 INPs and standard reference compounds and their similarities allowed us to elucidate potential common mechanisms of action and molecular features associated with response to these INPs. We identified a number of genes and several biological pathways that were associated with sensitivity and resistance to specific INPs and/ or entire INP clusters. Our findings provide a proof of principle that INPs may represent compounds of interest for cancer drug discovery and further studies should increase our understanding of their possible mechanisms of action. and drafted the manuscript. LMM provided statistical expertise and oversaw the statistical and computational analysis of the data. All authors participated in the interpretation of study results and read, edited, and approved the final manuscript.

Funding
Open Access funding provided by the National Institutes of Health (NIH)

Availability of data and materials
All response data for the INPs and the reference compounds used in this analysis are publicly available at the DTP PUBLIC COMPARE portal (https:// dtp. cancer. gov/ public_ compa re) [24,25]. NCI-60 expression and single nucleotide variant data are publicly available from the CellMiner and CellMinerCDB online resources [22,28].

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.