Validation of putative reference genes for gene expression studies in human hepatocellular carcinoma using real-time quantitative RT-PCR

Background Reference genes, which are often referred to as housekeeping genes are frequently used to normalize mRNA levels between different samples in quantitative reverse transcription polymerase chain reaction (qRT-PCR). The selection of reference genes is critical for gene expression studies because the expression of these genes may vary among tissues or cells and may change under certain circumstances. Here, a systematic evaluation of six putative reference genes for gene expression studies in human hepatocellular carcinoma (HCC) is presented. Methods Six genes, beta-2-microglobulin (B2M), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), hydroxymethyl-bilane synthase (HMBS), hypoxanthine phosphoribosyl-transferase 1 (HPRT1), succinate dehydrogenase complex, subunit A (SDHA) and ubiquitin C (UBC), with distinct functional characteristics and expression patterns were evaluated by qRT-PCR. Inhibitory substances in RNA samples were quantitatively assessed and controlled using an external RNA control. The stability of selected reference genes was analyzed using both geNorm and NormFinder software. Results HMBS and GAPDH were identified as the optimal reference genes for normalizing gene expression data between paired tumoral and adjacent non-tumoral tissues derived from patients with HCC. HMBS, GAPDH and UBC were identified to be suitable for the normalization of gene expression data among tumor tissues; whereas the combination of HMBS, B2M, SDHA and GAPDH was suitable for normalizing gene expression data among five liver cancer cell lines, namely Hep3B, HepG2, HuH7, SK-HEP-1 and SNU-182. The determined gene stability was increased after exclusion of RNA samples containing relatively higher inhibitory substances. Conclusion Of six genes studied, HMBS was found to be the single best reference gene for gene expression studies in HCC. The appropriate choice of combination of more than one reference gene to improve qRT-PCR accuracy depends on the kind of liver tissues or cells under investigation. Quantitative assessment and control of qRT-PCR inhibitors using an external RNA control can reduce the variation of qRT-PCR assay and facilitate the evaluation of gene stability. Our results may facilitate the choice of reference genes for expression studies in HCC.


Background
Real-time quantitative reverse transcription (qRT)polymerase chain reaction (PCR) is a rapid, sensitive and reliable method for gene expression studies. It is inherently an indirect method of measurement, and variabilities exist in the various steps of the qRT-PCR which may lead to severe misinterpretation of the results. The latter may be due to different amounts and quality of starting material, variable enzymatic efficiencies (i.e. efficiency of retrotranscription from RNA to complementary DNA (cDNA), and PCR efficiency) between samples and runs, operator errors, and differences between tissues or cells in overall transcriptional activity [1,2]. Thus, proper normalization strategy is necessary for reliable quantitative information to be extracted from this variable system.
Various strategies have been explored in an attempt to normalize these variations, and it is generally accepted that gene-expression levels should be normalized by carefully selected and stably expressed reference genes [2][3][4]. The success of this procedure is highly dependent on the choice of the appropriate reference genes. An ideal reference gene should be unaffected by the experimental conditions and should have low variation in gene expression. Otherwise, the detection of small changes becomes unfeasible, producing results that may be entirely incorrect [5]. Several reports have provided the evidence that the expression of most commonly used reference genes varies among tissues or cells and may also change under certain environmental circumstances [6][7][8][9][10][11]. For instance, a recent report indicated the de-regulation of common reference genes in hepatocellular carcinoma (HCC) arising from hepatitis C virus (HCV) infected liver [12]. Therefore, it is critical to perform preliminary evaluation studies, aimed at identifying the most stably expressed reference genes in individual tissues and distinct circumstances for each single experiment.
HCC is the sixth most frequently diagnosed cancer and the third most common cause of cancer mortality in the world [13], but the molecular mechanisms of hepatocarcinogenesis, including gene expression variation, are not well understood. Therefore, the number of studies assessing global gene expression profiles of HCC has increased exponentially in recent years [14][15][16][17][18][19][20], and the identification of optimal reference genes is necessary for correct gene expression profiling of HCC. In the current study, we validated the stability of six putative reference genes in liver cancer cell lines, tumoral tissues and adjacent nontumoral tissues from 20 HCC patients. Two algorithms based on different strategies, geNorm and NormFinder, were used for data analysis.

Primer design
Primers for RT-PCR assays of HMBS and UBC were designed using Primer Express v2.0 (Applied Biosystems, Foster City, California, USA). Primer sequences for B2M, GAPDH, HPRT1 and SDHA were obtained from the realtime PCR primer and probe database (RTPrimerDB) [21]. Special attention was given to primer length, annealing temperature, base composition and 3'-end stability. To ensure optimal DNA polymerization efficiency, amplicon length ranged between 69 and 113 bp. Exon and intron boundaries were determined aligning primers with corresponding human genomic sequences downloaded from GenBank http://www.ncbi.nlm.nih.gov/Genbank. RT-PCR primers for each gene were located on different exons or directly spanning exon-exon boundaries of the genomic sequence to minimize amplification from any contaminating genomic DNA that may remain following DNase treatment. To ensure that gene transcripts used as standards display a similar secondary structure with native RNA, primers for cloning were designed to have binding sites located at least 20 nucleotides external to the binding sites of primers for assays. The sequences of primers for cloning and the length of amplicons are shown in Supplementary table 1 in additional file 1.

Samples
Human liver cancer cell lines Hep3B, HepG2, SK-HEP-1 and SNU-182 were obtained from the American Type Culture Collection (ATCC, Manassas, USA). Human HCC cell line HuH7 was a kind gift from Dr. Brigitte Pützer. Hep3B, HepG2, HUH7 and SK-HEP-1 were grown in DMEM medium (high glucose, with glutamine and sodium pyruvate) supplemented with 10% (v/v) heat-inactivated fetal bovine serum (FBS), 100 U/ml penicillin, 100 μg/ml streptomycin (all from PAA Laboratories Pasching, Austria), and 0.1 mM GIBCO™ MEM Non Essential Amino Acids (Invitrogen Corporation, Carlsbad, CA). SNU-182 was grown in RPMI 1640 medium containing 1 mM sodium pyruvate, 2 mM L-glutamine, 4500 mg/L glucose, 10 mM HEPES, and 1500 mg/L sodium bicarbonate, supplemented with 10% (v/v) heat-inactivated FBS, 100 U/ ml penicillin and 100 μg/ml streptomycin (all from PAA Laboratories). Stock cultures were maintained at 37°C in a humidified, 5% CO 2 incubator and were routinely passaged at 3 to 4 day intervals. Tumoral and non-tumoral liver tissues were obtained from 20 patients with HCC undergoing liver resection. Most of those were paired tumoral and adjacent non-tumoral tissues. All tissue samples were snap frozen in liquid nitrogen and stored till use at -80°C. The study was approved by the local ethics committee and all patients provided written informed consent. Detailed information regarding cell lines, patients and tissue samples is shown in Supplementary tables 2 and 3 in additional file 1.

Total RNA isolation and characterization
Total RNA was extracted using the miRNeasy Mini Kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions. Genomic DNA was eliminated by a DNase-on-column treatment with RNase-free DNase set (Qiagen). The purity of extracted RNA was assessed by measuring the optical density (OD) at wavelengths of 230 nm, 260 nm and 280 nm using a quartz cuvette with Eppendorf Biophotometer (Eppendorf, Hamburg, Germany). Absorbance ratio at 260/280 nm and 260/230 nm was used to assess the purity of the RNA samples. The integrity and quantity of extracted RNA were assessed by evaluating the capillary electrophoresis trace with Agilent 2100 bioanalyzer (Agilent Technologies, Waldbronn, Germany). The integrity of the samples was determined by RNA integrity number (RIN) algorithm, which assigns a score from 10 to 1, where level 10 represents a completely intact RNA, and level 1 represents a highly degraded RNA [22]. Total RNA from cell lines was extracted at the exponential growing phase. Liver tissue samples were disrupted and homogenized in QIAzol lysis reagent with TissueRuptor (Qiagen) for total RNA isolation.

Construction of absolute standard curves
In vitro transcripts of each gene were generated as standards. RT-PCR products of each gene were purified with Wizard ® SV Gel and PCR Clean-Up System (Promega Corporation, Madison, WI, USA), and then cloned with PGEM ® -T easy vector system (Promega), according to the manufacturer's instructions. Plasmid DNA was isolated and purified with Wizard ® plus SV Minipreps DNA purification system (Promega). The orientation and sequence of the inserts were confirmed by restriction endonuclease digestion with EcoRI (Roche Molecular Biochemicals, Mannheim, Germany) followed by agarose gel electrophoresis with ethidium bromide staining, and gene sequencing with BigDye ® Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) using the T7 Promoter Primer (Promega). Bacterial clones with correct inserts were cultured again for isolation and purification of plasmid DNA with PureYield™ Plasmid Midiprep System (Promega). Plasmid DNA was linearized with restriction endonuclease SaI I (Roche) or NcoI (Promega) and transcribed into RNA with MEGAscript ® T7 or Sp6 kit (Ambion, Austin, TX, USA). Transcripts were purified with MEGAclear™ kit (Ambion) and plasmid DNA was eliminated by DNase treatment before and after the purification of the transcripts with TURBO DNase I (Ambion) and TURBO DNA-free™ kit (Ambion) respectively. DNase treated samples were desalted with DNase inactivation reagent included in TURBO DNA-free™ kit.
The quantity and quality of the transcripts were evaluated with Eppendorf Biophotometer (Eppendorf), denatured agarose gel electrophoresis and Agilent 2100 bioanalyzer (Agilent). Five points of 10-fold serial dilution of each transcript were used to build the standard curve for each gene. The copy number range of each standard curve was set according to the approximate expression level of the individual gene determined by preliminary experiments and encompassed the copy number range of the genes in the samples. The copy numbers of the standards are shown in Supplementary table 4 in additional file 1.

qRT-PCR inhibitor detection
qRT-PCR inhibitor was detected in each sample and standard using Alien ® QRT-PCR Inhibitor Alert (Stratagene, La Lolla, CA, USA), according to the manufacturer's instructions. Briefly, each sample was spiked with 10 5 copies of alien RNA transcript, an in-vitro transcribed RNA being non-homologous to the sequences currently available in GenBank. Increases in the threshold cycle (Ct) value for amplification of alien RNA in samples were compared with alien RNA alone and calculated to determine the presence of inhibitory substances in the samples [23].

Real-time quantitative RT-PCR
Real-time quantitative RT-PCR (qRT-PCR) was performed with the SYBR Green QuantiTect RT-PCR Kit (Qiagen), according to the manufacturer's instructions. For each sample, 20 ng total RNA was used in the assay and all the genes were tested with the same panel of RNA samples. RT negative reactions were performed for standards and sample RNA. All standards and samples were run in duplicate on 96-well reaction plates with the iQ™5 Multicolor Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA). Reactions were prepared in a total volume of 25 μl containing 20 ng RNA in 5 μl volume, 1.25 μl of each 10 μM primer (500 nM), 12.5 μl of SYBR Green master mix, 0.25 μl of RT mix and 4.75 μl RNase/DNase-free sterile water. No template controls were run for each master mix and primer pair. After performing preliminary gradient real-time RT-PCR assays, the optimal annealing temperature for all the primer pairs resulted to be 56.4°C. At this temperature the lowest Ct value was generated as well as a sharp melting peak, with no amplification of non-specific products or primer-dimer artifacts (see Additional file 2, 3, 4, 5, 6, 7). The conditions of reverse transcription and primer concentrations were set according to the recommendations in the manufacturer's instructions and confirmed by preliminary experiments which generated correlation coefficient (R 2 ) of each standard curve above 0.99. The cycle conditions were as follows: reverse transcription at 50°C for 30 min, DNA polymerase activation and RT enzyme inactivation at 95°C for 15 min, followed by 40 cycles of denaturation at 94°C for 15 s, primer annealing at 56.4°C for 30 s, elongation at 72°C for 30 s. The data collection step was performed at a temperature that was 3°C lower than the melting temperature of each amplicon and higher than that of primer-dimers of each primer pair. This cycle was followed by a melting curve analysis, ranging from 55°C to 95°C, with temperature increasing steps of 0.5°C every 10 s. Baseline and threshold values were automatically determined for all plates using the Bio-Rad iQ5 Software 2.0. The obtained data were analyzed using geNorm (version 3.5, written by Vandesompele J. et al) and NormFinder software (written by Andersen C.L. et al).

Results and discussion
Six putative reference genes from different abundance and functional classes were selected for evaluation: B2M, GAPDH, HMBS, HPRT1, SDHA, and UBC (Table 1). Ribosome RNAs, like 28s rRNA and 18s rRNA were not included in this study, because imbalance between rRNA and mRNA fractions has been observed and rRNA transcription is affected by biological factors and drugs [24][25][26][27]. Gene sequence information deposited in GenBank under the Accession Numbers was used to design RT-PCR primers ( Table 2). The qRT-PCR efficiency for each single primer pair was determined in real-time RT-PCR using serial 10-fold dilutions of in vitro transcripts of each gene. Expression patterns of the six reference genes have been subsequently evaluated in different sample groups.

RNA quantity and quality determination
All RNA samples were examined for their purity, integrity and concentration. The absorbance ratio at 260/280 nm of all the samples were all over 2.0, except for one sample with a ratio of 1.85, indicating all the samples were free from proteins potentially accumulating during the RNA extraction procedure. The absorbance ratios at 260/230 nm were 0.43 ~2.23, indicating possible contamination of some samples with carbohydrates, peptides, phenols or aromatic compounds [28]. Although samples with a low OD 260/230 ratio may interfere with downstream process like RT-PCR [28], only a few studies using RT-PCR methods have taken the OD 260/230 ratio into consideration when evaluating RNA sample purity. In addition, there is no generally accepted cutoff value of OD 260/230 ratio that allows researchers to determine the suitability of a sample to be used in qRT-PCR. Thus, RNA samples were additionally subjected to qRT-PCR inhibitor detection for the mining of the relationship between OD 260/230 ratio and qRT-PCR inhibition. All RNA samples were determined to be intact or to have mild loss of integrity (RIN > 6.5) using the RIN algorithm [22]. A RIN higher than 5 was recommended as good total RNA quality and a RIN higher than 8 as perfect total RNA quality for downstream qRT-PCR application [29]. Therefore, the integrity of all RNA samples was suitable for the use in our experiments. The absorbance ratio at 260/280 nm, 260/230 nm and RIN are shown in Supplementary table 5 in additional file 1.

Presence of qRT-PCR inhibitors in purified RNA samples
Increases of the Ct value (ΔCt), for amplification of alien RNA in 100 ng samples compared with alien RNA alone ranged between 0.16~2.37. When the samples were diluted 5-fold, increases of the Ct value, ranged between -0.14~0.92, indicating that the inhibition of qRT-PCR by inhibitors was significantly reduced by simply diluting the experimental RNA samples (Figure 1), but was not reduced to an acceptable level in some samples.
Pearson's correlation analysis between the OD 260/230 ratios and ΔCt values resulted in a Pearson's r of -0.2420, r 2 of 0.0586, and a p value of 0.175, indicating no statistically significant linear relationship between the two variables (Table 3). Data shown here suggest that an alien RNA control, but not the OD 260/230 ratio, is an direct and specific way for the detection of qRT-PCR inhibitors    N1  N2  N3  N4  N5  N6  N7  N8  N9  N10  N13  N14  N15  N17  N20  T1  T2  T3  T4  T5  T6  T7  T8  T9  T10  T12  T13  T14  T15  T18  T19  T20 Samples Delta Ct 20ng RNA 100ng RNA in RNA samples. Thus, the selection of samples in this study was based on the quantitative assessment of qRT-PCR inhibitors using an alien control RNA. Samples with a ΔCt value higher than the 75th percentile of all the samples, namely 0.675, were excluded from the analyses for the determination of suitable reference genes. For paired tissues, both tissues were excluded from the analysis if the ΔCt value of one of them exceeded the cutoff value.

QRT-PCR inhibitor detection in all samples
There is a variety of inhibitors potentially affecting the efficiency of qRT-PCR reactions which may be co-purified with RNA samples, depending on the source of starting material and the methods of extraction. Tissues are among the common sample types known to contain inhibitors like collagen [30]. Inhibitors from experimental reagents include phenol, ethanol, guanidine, EDTA and others [23]. Thus, it is important to assess the existence of qRT-PCR inhibitors in RNA samples from tissues prior to the qRT-PCR assay.
We found no significant correlation between the OD 260/ 230 ratio and the level of qRT-PCR inhibitors in RNA samples. Moreover, it has been shown that some inhibitors may directly bind to nucleotides [31]. Therefore, whether a modification of the RNA extraction procedure or a repurification of RNA samples will reduce the level of qRT-PCR inhibitors needs further validation. Efficiency of qRT-PCR (E) of primer pairs ranged from 87.8% to 94.9%, and correlation coefficients (R 2 ) ranged from 0.994 to 0.998 ( Table 2). The amplification of alien RNA transcripts among standards and 20 ng native RNA samples was at a similar level, thus indicating analogical qRT-PCR efficiency (Figure 2).

Expression levels of candidate reference genes
According to the average copy number of each gene in 20 ng purified total RNA of each sample, the six tested genes belong to various abundance classes, with an approximately 10,000-fold expression difference between the most abundant (B2M) and the rarest (UBC) transcript. The expression levels for individual genes were also different among different sample panels. Four out of six genes showed an expression pattern in cell lines > tumoral tissues > non-tumoral tissues, SDHA showed an expression pattern in cell lines > non-tumoral tissues ≈ tumoral tissues, and B2M showed an expression pattern in tumoral tissues > non-tumoral tissues > cell lines (Figure 3).

Expression stability of putative reference genes
The data obtained for each sample and each reference gene were analyzed using both geNorm and NormFinder [1,33]. GeNorm provides a ranking of the tested genes based on the reference gene stability measure M which is defined as the average pair-wise variation of a particular gene compared with all other control genes. Genes with higher M values have greater variations of expression. Additionally, the assessment of the pair-wise variations (V n/n+1 ) between each combination of sequential normalization factors allows the identification of the optimal number of reference genes. NormFinder, whose strategy is rooted in a mathematical model of gene expression, provides a ranking of the tested genes based on a direct measure of both overall expression variation and the variation between sample subgroups of candidate reference genes. The stability of genes was shown as stability value, and genes with lower stability values have higher expression stability. NormFinder has been reported to show relatively low sensitivity towards co-regulation of the candidate normalization genes [33].
The analysis by geNorm suggested that all studied genes reached a high expression stability with low M values (Table 4 and 5), below the default limit of M = 1.5 [1]. The optimal reference genes for normalizing gene expression data between paired tumoral and non-tumoral liver tissues were HMBS and GAPDH (Table 4; Figure 4). HMBS, GAPDH and UBC were identified to be suitable for the normalization of gene expression data among tumor tissues. The combination of HMBS, B2M, SDHA and GAPDH was suitable for normalizing gene expression data among five liver cancer cell lines, yielding a V 4/5 value of 0.175, which is close to the cutoff value 0.15. The reason why there was no low enough V value in cell lines may be due to the limited number of cell lines or candidate reference genes included in the evaluation.
The results of the analysis by NormFinder appeared to be identical to the one determined using geNorm in each sample panel, except for the position of two genes with close stability, GAPDH and SDHA, whose ranking was inverted in paired tissues (Table 4). Interestingly, the stable value of the best combination of two genes determined by NormFinder for paired tissues was slightly higher than that of HMBS alone (Table 4), indicating HMBS alone may be sufficient as reference gene for expression studies in paired tissues.
When comparing the data of the paired tissue group in which 10 tissue RNA samples with high inhibitory substances were excluded, a higher stability of each gene, and a higher similarity between the two algorithms became evident (Table 4; Figure 4), indicating that the existence of qRT-PCR inhibitors in RNA samples can increase the variation of qRT-PCR assay.  It is worth noting that clonal variation that has been described to occur in HCC [34], might induce bias in gene expression studies. While genomic heterogeneity may interfere with gene expression studies aimed at identifying a malignant genotype, this phenomenon should be of less importance for the selection of reference genes which are constitutively expressed and are generally involved in basic functions needed for the sustenance of the cell.

QRT-PCR inhibitor detection in standards and several RNA samples
As shown in Table 6, major differences were evident in the ranking order and gene stability of the candidate genes in paired tissues compared to a similar study previously published by Kim et al. [35]. UBC was ranked as the most sta- Determination of the optimal number of reference genes for paired tumoral and non-tumoral tissues, tumor tissues and cell lines separately  bly expressed gene in the previous study, but was ranked as the fourth using both geNorm and NormFinder in our study. In addition, the M values for the candidate reference genes were much lower in our study, thus indicating a high reliability of the data. The notably higher stability of putative reference genes in our study may be due to the following: (i) matching tumoral tissue with the nontumoral tissue from the same patient may have reduced the noise deriving from diversity of gene expression among individuals; (ii) the integrity of RNA was assessed and controlled using Agilent Bioanalyzer; (iii) qRT-PCR inhibitor was maintained at an acceptable level in all standards and samples; and (iv) qRT-PCR efficiency was determined for each primer pair.

Conclusion
Of six genes studied, HMBS was found to be the single best reference gene for expression studies in HCC. The appropriate choice of combination of more than one reference gene to improve qRT-PCR accuracy depends on the kind of liver tissues or cells under investigation. Quantitative assessment and control of qRT-PCR inhibitors using an external RNA control can reduce the variation of qRT-PCR assay and facilitate the evaluation of gene stability. Our results may facilitate the choice of reference genes for expression studies in HCC.