Increased frequency of single base substitutions in a population of transcripts expressed in cancer cells
© Bianchetti et al.; licensee BioMed Central Ltd. 2012
Received: 22 June 2012
Accepted: 9 October 2012
Published: 8 November 2012
Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells.
We used sequence based whole genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared to 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared to 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality.
In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the greatest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we could show that SBS frequencies increased during cancer progression.
If the mechanisms generating the base substitutions could be identified, increased SBS frequency in transcripts could become a useful new biomarker of cancer. As sequencing costs fall, sequence based whole genome expression profiling could be used to characterize increased SBS frequency in a patient’s tumor and aid diagnosis.
Keywords: Cancer, Bioinformatics, Transcripts, Substitutions, ESC, Biomarker, Long-SAGE, Tag-seq, Patient, Genetic integrity
In mammalian cells, genetic integrity is maintained by stability genes, which are in charge of correct chromosomal segregation and recombination, repair of damaged DNA, accurate genomic DNA replication and transcriptional fidelity. In healthy cells, base substitutions occur at an extremely low incidence during both DNA replication and RNA synthesis. Single base variations can be the result of Single Nucleotide Polymorphisms (SNP) or RNA-editing carried out by ADAR or APOBEC enzymes. By contrast, genetic instability is a hallmark of cancer [7, 8]. Major mutational events such as chromosome region translocations, deletions and gene copy number variations have been reported in almost all cancer cells. Somatic mutations, i.e. acquired or inherited SBS which differentially alter cancer cell genomes and consequently transcript sequences, were reported on a genome wide scale using deep sequencing [10, 11]. A census of cancer related somatic mutations that alter 422 human genes has now been made available. A growing body of cancer studies reports SBS in transcripts that are not supported by the genome of tumor cells. Using EST alignments on reference mRNA sequences, Brulliard M. et al. showed that 15 abundantly expressed transcripts, namely GAPDH, VIM, FTH1, ENO1, HSPA8, TPT1, RPS4X, ATP5A1, FTL, RPL7A, TPI1, RPS6, ALDOA, LDHA and CALM2, had statistically greater SBS frequencies in cancer than in healthy cells, whereas ALB and TMSB4X showed the opposite. Since most EST are 3’ fragments of mRNA sequences, increased SBS in cancer was detected at the 3’ boundary of mRNA. These SBS could not be explained by known SNP and were also unlikely to be the result of somatic mutations or RNA-editing enzymes. It also seems implausible that instruments would generate more sequencing errors when EST originate from cancer cells.
As a working hypothesis, the concept of transcriptional infidelity (TI) was proposed: i) TI introduces non-random base variations in RNA sequences that are not supported by the genome; ii) TI exists in both healthy and cancer cells, but is greater in cancer. Increased TI in cancer has been speculated to originate from a defective proofreading activity of RNA polymerases. Recently, in a study carried out at both genomic and transcriptional levels, SRP9 and COG3 mRNA expressed in tumor cells were clearly shown to carry SBS that conflicted with the genome sequence. SRP9 sequencing chromatogram traces showed an adenine (in tumor DNA) replaced by a guanine (in tumor RNA), a substitution which might be attributed to ADAR: ADAR carries out adenosine to inosine edits, and inosine is read as guanosine by sequencing instruments. Intriguingly, in the case of COG3, a thymine (in tumor DNA) was replaced by a cytidine (in tumor RNA), a change which can be carried out neither by ADAR nor by APOBEC enzymes, because APOBEC converts cytosine to uracil in RNA sequences. Personalized omics profiling also concurred that RNA-editing is extensively carried out in peripheral blood mononuclear cells, with more than 2,300 target sites, approximately 50% of which were not typical ADAR or APOBEC edits. Deregulation of RNA-editing, e.g. adenosine to inosine hypoediting, was also reported in tumors. To identify mRNA with greater SBS frequencies in cancer, we performed a bioinformatics analysis of 7.6 million tags produced by 87 human L-SAGE experiments (a molecular biology method using Sanger sequencing), and 67.8 million tags generated by 15 human Tag-seq experiments (a combination of L-SAGE and deep sequencing) [18, 19]. Both L-SAGE and Tag-seq generate short sequences that are likely localized on the 3’ boundary of transcripts. Therefore, L-SAGE and Tag-seq may prove useful to detect SBS introduced in the 3’ boundary of transcripts.
Briefly, tags are short sequences of 17 bases which are signatures of 3' polyadenylated transcripts expressed in cells. The most 3’ NlaIII “CATG” motif in the transcript sequence is directly followed by the 17 base tag. Moreover, tag counts and mRNA expression levels are correlated. Comparing tags to RT sequences, i.e. tags matching the human genome exactly once, we showed that a plethora of transcripts had greater SBS frequencies in cancer cells. Although the genomic sequences of the tumor and the healthy cells were not simultaneously available in our study, these SBS could not be attributed to known SNP, catalogued cancer related somatic mutations, and known APOBEC1 or ADAR editing. ST proportions, i.e. proportions of tags sequenced only once were calculated for each experiment and were used as an indicator of SBS frequency. Interestingly, among healthy cells, ESC had the lowest ST proportions which might indicate that transcriptional fidelity could be increased in ESC. Conversely, the greatest ST proportions were observed in cancer cells. Finally, focusing on a series of 4 L-SAGE experiments carried out on the biopsies of a single patient at 1 healthy and 3 consecutive tumor stages, we were able to demonstrate that SBS frequencies significantly increased during cancer progression.
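The tag definition given at the start of the paragraph above (the 17 bases that directly follow the most 3' NlaIII "CATG" motif) can be illustrated with a minimal Python sketch. The function name `extract_tag` and the toy sequence are hypothetical and are not taken from the study; the sketch assumes the transcript is supplied 5' to 3' as a plain string.

```python
def extract_tag(transcript, site="CATG", tag_len=17):
    """Return the tag that directly follows the most 3' occurrence of `site`.

    Returns None when the site is absent or when fewer than `tag_len`
    bases follow it. Illustrative helper, not the authors' program.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)  # most 3' occurrence of the anchoring site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


# Toy transcript with two CATG sites; only the last one defines the tag.
toy = "AAACATGCCCCATG" + "ACGTACGTACGTACGTA" + "GGG"
print(extract_tag(toy))
```

Transcripts that lack a CATG site, or that end fewer than 17 bases after the last one, yield no tag in this sketch.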
L-SAGE and Tag-seq experiments
The GPL1485 platform of the NCBI Gene Expression Omnibus (GEO) server is a repository of L-SAGE and Tag-seq experiments carried out on human cells. Within GPL1485, the GSE1902 (L-SAGE) and GSE15314 (Tag-seq) series of experiments were selected. All experiments were carried out using the NlaIII anchoring enzyme, which cuts 3’ polyadenylated transcripts at CATG sites. Experiments were separated into 2 groups, namely healthy and cancer, using a dictionary of cancer related terms: adenocarcinoma, cancer, carcinoma, dysplasia, fibroadenoma, glioblastoma, leukemia, lymphoma, medulloblastoma, melanoma, tumor, retinoblastoma and rhabdomyosarcoma. The Sybase system was used to store the tags of L-SAGE and Tag-seq experiments. Programs were run on 6 × 4 Sun AMD Opteron processors (2.6 GHz) under the Linux operating system.
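As an illustration of the dictionary-based grouping described above, the following Python sketch assigns an experiment to the cancer group when its free-text description contains one of the listed terms. The term list is the one given in the text; the function name, the case-insensitive substring matching rule and the example descriptions are assumptions, since the matching procedure is not detailed here.

```python
CANCER_TERMS = (
    "adenocarcinoma", "cancer", "carcinoma", "dysplasia", "fibroadenoma",
    "glioblastoma", "leukemia", "lymphoma", "medulloblastoma", "melanoma",
    "tumor", "retinoblastoma", "rhabdomyosarcoma",
)


def classify_experiment(description):
    """Return 'cancer' if any dictionary term occurs in the description,
    otherwise 'healthy'. Matching is a case-insensitive substring search
    (an assumption made for this sketch)."""
    text = description.lower()
    return "cancer" if any(term in text for term in CANCER_TERMS) else "healthy"


print(classify_experiment("Breast carcinoma, primary"))  # cancer
print(classify_experiment("Embryonic stem cells"))       # healthy
```

A plain substring search would misclassify descriptions such as "non-tumor adjacent tissue", so in practice the resulting groups would need manual review.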
Reference tags (RT)
RT were selected among the tags produced by the L-SAGE and Tag-seq experiments. A tag had to fulfill 2 criteria to be selected: i) presence in at least 75% of L-SAGE or 90% of Tag-seq experiments; ii) exactly one match on the human genome sequence. Tags that fulfilled the first criterion were selected using a Java program and were subsequently aligned on the human genome using a blastn tool. Two distinct lists of RT were thus created, 1 for the L-SAGE and 1 for the Tag-seq experiments.
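The two selection criteria can be sketched in Python as follows. This is not the Java program mentioned above: the function and argument names are hypothetical, the genome alignment step is replaced by a precomputed dictionary `genome_hits` (tag to number of exact genome matches), and the presence threshold is applied separately to the healthy and cancer groups, which is how the counts are reported later in the article.

```python
from collections import Counter


def select_reference_tags(healthy, cancer, genome_hits, min_fraction=0.75):
    """Select tags present in at least `min_fraction` of the healthy AND of
    the cancer experiments, and matching the genome exactly once.

    healthy, cancer: lists of dicts mapping tag -> count (one dict per experiment)
    genome_hits:     dict mapping tag -> number of exact genome matches
                     (assumed to come from a separate alignment step)
    """
    def frequent(experiments):
        seen = Counter()
        for exp in experiments:
            seen.update(tag for tag, count in exp.items() if count > 0)
        needed = min_fraction * len(experiments)
        return {tag for tag, n in seen.items() if n >= needed}

    candidates = frequent(healthy) & frequent(cancer)
    return sorted(tag for tag in candidates if genome_hits.get(tag, 0) == 1)


# Toy data with shortened tags, for illustration only.
healthy = [{"AAA": 5, "CCC": 1}, {"AAA": 2}, {"AAA": 1, "GGG": 4}, {"AAA": 3, "GGG": 1}]
cancer = [{"AAA": 9, "GGG": 2}, {"AAA": 1, "GGG": 2, "CCC": 1}]
print(select_reference_tags(healthy, cancer, {"AAA": 1, "GGG": 1, "CCC": 3}))
```

For the Tag-seq list, the same sketch would be called with `min_fraction=0.90`.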
Single base substituted RT (sbsRT)
For each RT, and for each of the 17 base positions, a nucleotide was replaced by a "_" metacharacter. Thus, 17 distinct patterns were generated (Additional file 1). A Java program was written to automatically i) generate the 17 distinct patterns, ii) retrieve from the database of L-SAGE and Tag-seq experiments all the tags that matched the patterns and iii) sum the tag counts. The risk that a sbsRT could match a RT by chance was calculated (Additional file 2) and equaled 6.5 × 10⁻⁵. Thus, any tag that was identical to a RT except at 1 base position was very likely the result of a SBS that had occurred in this RT.
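The pattern generation and counting steps can be illustrated with the Python sketch below. It is not the Java program described above: the function names are hypothetical and an in-memory dictionary stands in for the database. Because a "_" pattern also matches the RT itself, the sketch assumes that the RT must be excluded explicitly when summing sbsRT counts.

```python
def wildcard_patterns(rt):
    """Return one pattern per base position, with that base replaced by '_'."""
    return [rt[:i] + "_" + rt[i + 1:] for i in range(len(rt))]


def matches(pattern, tag):
    """True when `tag` equals `pattern` at every position that is not '_'."""
    return len(pattern) == len(tag) and all(
        p == "_" or p == t for p, t in zip(pattern, tag)
    )


def sbs_count(rt, tag_counts):
    """Sum the counts of all tags differing from `rt` at exactly one position.

    tag_counts: dict mapping tag -> count (stand-in for the tag database).
    """
    patterns = wildcard_patterns(rt)
    return sum(
        count
        for tag, count in tag_counts.items()
        if tag != rt and any(matches(p, tag) for p in patterns)
    )


rt = "ACGTACGTACGTACGTA"
counts = {
    rt: 100,                  # the reference tag itself, excluded
    "TCGTACGTACGTACGTA": 3,   # substitution at position 1
    "ACGTACGTACGTACGTC": 2,   # substitution at position 17
    "TTGTACGTACGTACGTA": 5,   # two substitutions, not an sbsRT
}
print(len(wildcard_patterns(rt)), sbs_count(rt, counts))
```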
Testing for SBS frequency equality in transcripts expressed in healthy and cancer cells
For each RT, the following quantities were calculated, where H and C denote the numbers of healthy and cancer experiments and i indexes the 51 possible sbsRT of the RT.
Sum of counts of the RT across all healthy experiments:
$$\mathrm{Sc\_H\_RT} = \sum_{k=1}^{H} \left(\text{RT count in exp. } k\right)$$
Sum of counts of the RT across all cancer experiments:
$$\mathrm{Sc\_C\_RT} = \sum_{k=1}^{C} \left(\text{RT count in exp. } k\right)$$
Sum of counts of sbsRT (associated with the RT) across all healthy experiments:
$$\mathrm{Sc\_H\_sbsRT} = \sum_{k=1}^{H} \sum_{i=1}^{51} \left(\text{sbsRT}_i \text{ count in exp. } k\right)$$
Sum of counts of sbsRT (associated with the RT) across all cancer experiments:
$$\mathrm{Sc\_C\_sbsRT} = \sum_{k=1}^{C} \sum_{i=1}^{51} \left(\text{sbsRT}_i \text{ count in exp. } k\right)$$
sbsRT proportion across all healthy experiments: sbsRT_prop_H
sbsRT proportion across all cancer experiments: sbsRT_prop_C
Pearson’s chi-squared proportion test (Cancer > Healthy):
H0: "sbsRT_prop_C equals sbsRT_prop_H"
H1: "sbsRT_prop_C is greater than sbsRT_prop_H"
Pearson’s chi-squared proportion test (Healthy > Cancer):
H0: "sbsRT_prop_C equals sbsRT_prop_H"
H1: "sbsRT_prop_H is greater than sbsRT_prop_C"
A script was written in the R environment to carry out the Pearson’s chi-squared proportion tests. For a RT, and thus a transcript, the H0 hypothesis was rejected when a p-value less than 0.025 was obtained. Three lists of RT were thus produced according to the decision of the Pearson’s chi-squared proportion test: i) RT for which proportions of sbsRT were greater in cancer than in healthy, ii) RT for which proportions of sbsRT were greater in healthy than in cancer and iii) RT for which proportions of sbsRT in cancer and healthy were not significantly different.
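The decision rule above can be sketched in Python with a pooled two-proportion z-test, whose squared statistic equals the Pearson chi-squared statistic of the corresponding 2 × 2 table when no continuity correction is applied. This is not the authors' R script, and two points are assumptions of the sketch: the sbsRT proportion is taken as sbsRT counts divided by the sum of sbsRT and RT counts, and no continuity correction is used. All function names are hypothetical.

```python
import math


def one_sided_prop_test(x1, n1, x2, n2):
    """p-value for H1: x1/n1 > x2/n2 (pooled two-proportion z-test,
    no continuity correction)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variability: H0 cannot be rejected
    z = (x1 / n1 - x2 / n2) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)


def classify_rt(sc_c_sbsrt, sc_c_rt, sc_h_sbsrt, sc_h_rt, alpha=0.025):
    """Return 'C > H', 'C < H' or 'C = H' for one RT.

    Assumes proportion = sbsRT counts / (sbsRT counts + RT counts).
    """
    n_c = sc_c_sbsrt + sc_c_rt
    n_h = sc_h_sbsrt + sc_h_rt
    if one_sided_prop_test(sc_c_sbsrt, n_c, sc_h_sbsrt, n_h) < alpha:
        return "C > H"
    if one_sided_prop_test(sc_h_sbsrt, n_h, sc_c_sbsrt, n_c) < alpha:
        return "C < H"
    return "C = H"


# Illustrative counts, not data from the study.
print(classify_rt(200, 800, 100, 900))
print(classify_rt(100, 900, 100, 900))
```

Running the two one-sided tests at 0.025 each corresponds to an overall two-sided level of 0.05 per RT; the sketch does not correct for testing many RT.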
Global proportions of sbsRT
Global sbsRT proportions were tested for equality across different healthy tissues using Analysis of Variance (ANOVA).
Single tags (ST)
ST are tags that were sequenced only once in a L-SAGE experiment, i.e. ST were associated with a count of 1. For each L-SAGE experiment, a list of ST could thus be defined and the proportion of ST among total tags could be calculated. ST were not reported in Tag-seq experiments: all counts were greater than 1, which showed that ST had been discarded from Tag-seq experiments.
$$\text{ST proportion} = \frac{n}{\mathrm{total\_tags}}$$
where n is the number of ST and total_tags is the sum of counts.
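As a minimal illustration of the definition above, the following Python sketch computes the ST proportion of one experiment from a dictionary of tag counts. The function name and the toy counts are hypothetical.

```python
def st_proportion(tag_counts):
    """Proportion of single tags: number of tags with a count of 1 (n)
    divided by the sum of all tag counts (total_tags)."""
    total_tags = sum(tag_counts.values())
    if total_tags == 0:
        raise ValueError("experiment contains no tags")
    n = sum(1 for count in tag_counts.values() if count == 1)
    return n / total_tags


# Two tags were sequenced once; the sum of counts is 10.
print(st_proportion({"TAG_A": 1, "TAG_B": 1, "TAG_C": 8}))
```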
Known SNP that altered 17 base NlaIII tags of transcripts
A file of 17 base NlaIII tags associated with known SNP was provided by Dr. Anamaria Camargo. In this file, each line recorded a Genbank mRNA accession number, the NlaIII 17 base tag associated with the mRNA and the sequence of the tag with the known SNP. The file contained 4,697 entries. It was thus possible to identify sbsRT that were the result of known SNP.
Census of genes with cancer related somatic mutations
A census of somatically mutated genes in cancer was downloaded from the COSMIC database (v56). Known somatic mutations were recorded for 422 distinct genes which were identified by NCBI Gene ID. In our study, transcripts were identified with Genbank or RefSeq ID and thus were converted to NCBI gene ID using the Synergizer tool . Area proportional Venn diagrams were drawn to determine whether known somatically mutated genes were present among the genes with greater SBS frequencies. Bases that were somatically mutated in cancer and recorded by COSMIC were localized on transcript sequences and their proximity or inclusion to the 17 base NlaIII tag was determined.
Validated and predicted APOBEC1 and ADAR RNA-editing targets
APOBEC1 RNA-editing targets. A series of 32 editing sites in 30 distinct transcripts are known substrates for the Apolipoprotein B-editing enzyme, catalytic polypeptide-1 (APOBEC1) in mouse. Using an APOBEC1 specific editing sequence pattern, namely WCWN2-4WRAUYANUAU (mooring sequence), which is located directly 3’ to the edited cytosine, Rosenberg B. R. et al. predicted 376 editing sites in 363 distinct mouse transcripts. Out of these 363 transcripts, ten were previously experimentally validated, in particular the prototypic ApoB editing site. Thus, 383 distinct mouse transcripts that are either predicted or validated APOBEC1 RNA-editing targets are available. However, our study was carried out on human sequences. Therefore, conservation of RNA-editing targets between human and mouse was hypothesized. Human orthologues of mouse RNA-editing targets were retrieved from RefSeq by sequence similarity searches using blastn. Top scoring human transcripts were assumed to be orthologues of mouse transcripts targeted by APOBEC1. RefSeq ID were then converted to NCBI gene ID with the Synergizer tool. A list of 361 unique NCBI gene ID was thus produced for the human transcripts. Venn diagrams were drawn to identify human transcripts which could be APOBEC1 RNA-editing targets and which showed greater SBS frequencies in cancer or healthy cells. These transcripts were compared with the mouse orthologues to determine the local level of similarity between mouse and human mooring sequences. Pairwise sequence comparison was carried out using the Smith and Waterman local algorithm implemented in the water program of the EMBOSS package (gap opening penalty 10, gap extension penalty 0.5, EDNAFULL matrix). When the mooring sequences were conserved between mouse and human, the 17 base NlaIII tag was localized on the human transcript.
Finally, proximity between the 17 base NlaIII tag and the mooring sequence was determined and the possibility that the 17 base NlaIII tag could be edited by the APOBEC1 enzyme was assessed.
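To make the editing sequence pattern quoted above concrete, the sketch below translates WCWN2-4WRAUYANUAU into a regular expression and reports candidate edited cytosines in a sequence. The translation assumes standard IUPAC nucleotide codes (W = A or U, R = A or G, Y = C or U, N = any base) and reads N2-4 as a spacer of 2 to 4 bases; the function name and the toy sequence are hypothetical, and this is not the prediction pipeline of Rosenberg et al.

```python
import re

# W C W N{2,4} W R A U Y A N U A U, written with explicit character classes.
# The lookahead allows overlapping candidate sites to be reported.
MOORING_REGEX = re.compile(
    r"(?=([AU]C[AU][ACGU]{2,4}[AU][AG]AU[CU]A[ACGU]UAU))"
)


def candidate_edit_sites(sequence):
    """Return 0-based positions of cytosines that sit in the pattern context.

    DNA input is accepted: T is converted to U before matching.
    """
    rna = sequence.upper().replace("T", "U")
    # The candidate edited C is the second base of each match.
    return [m.start() + 1 for m in MOORING_REGEX.finditer(rna)]


# Toy sequence built to contain exactly one pattern match.
print(candidate_edit_sites("GGACAGGAAAUCAGUAUCC"))
```

A pattern match only marks a candidate site; as described above, conservation between mouse and human and proximity to the 17 base NlaIII tag still have to be assessed separately.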
ADAR RNA-editing targets. Most A-to-I substitutions occur within interspersed repetitive elements, mainly in Alu sequences. Since RT match the human genome exactly once, they are very unlikely to be located in Alu repeats. Therefore, sbsRT are unlikely to be the result of ADAR RNA-editing.
Groups of healthy and cancer experiments
Reference Tags (RT)
2,930 tags were present in at least 75% of the 40 healthy and at least 75% of the 47 cancer L-SAGE experiments. Among these 2,930 tags, 1,966 matched the human genome sequence exactly once. Seven tags had a sequence composition bias and were discarded. Thus, 1,959 distinct tags were selected as RT (= L-SAGE list of RT). 11,967 tags were present in at least 90% of the 7 healthy and at least 90% of the 8 cancer Tag-seq experiments. Among these 11,967 tags, 8,806 matched the human genome sequence exactly once, 234 were discarded because of sequence composition bias and 8,572 distinct tags were selected as RT (=Tag-seq list of RT). 1,878 tags were common to both L-SAGE and Tag-seq lists of RT. In theory, a RT can generate 51 (= 3 × 17) possible distinct sequences by SBS, therefore each RT may be associated with 51 sbsRT. For each RT, the frequencies of sbsRT in both cancer and healthy cells were calculated. COG3 (alias SEC34) and SRP9 3’ polyadenylated transcripts were recorded in genbank with AF332595 and EF488978 accession numbers respectively. The 17 base NlaIII tags of SRP9 and COG3 transcripts were determined using genbank sequence records. However, SRP9 and COG3 17 base NlaIII tags were not present among the L-SAGE and Tag-seq lists of RT. Conversely, GAPDH, VIM, ENO1, HSPA8, TPT1, ATP5A1, FTL, TPI1, ALDOA and LDHA 17 base NlaIII tags were present among the L-SAGE or Tag-seq lists of RT.
Increased SBS frequencies in transcripts expressed in cancer cells
Testing SBS frequency equality in healthy (H) and cancer (C) cells for the 17 mRNA selected by Brulliard, M. et al. (2007)
Brulliard, M. et al. TI study using EST
C > H (3.67×10-115)
C > H (~0)
C > H
C = H
C > H (2.32×10-78)
C > H
C > H (3.48×10-3)
C < H (0.76×10-2)
C > H
C > H (9×10-9)
C > H
C > H (4.05×10-4)
C > H (~0)
C > H
C > H (1.51×10-15)
C > H (1.35x10-83)
C > H
C > H (1.5×10 -7)
C > H
C > H (1.14×10-52)
C > H (~0)
C > H
C = H
C < H (5.55×10-23)
C > H
C > H (6.98×10-14)
C > H (1.84×10-3)
C > H
C > H
C > H
3’ polyadenylated RNA record not available
C > H
C > H
C > H
C < H
C < H
Known cancer somatic mutations do not support increased SBS frequencies in mRNA
APOBEC1 or ADAR RNA-editing do not support increased SBS frequencies
Wide range of molecular functions potentially affected by increased SBS frequencies
For L-SAGE, 1,879 (96%) RT out of 1,959 could be associated with a transcript (=L-SAGE background list). 355 RT out of the 372 that showed greater SBS frequencies in cancer (=L-SAGE query list) could be associated with a transcript. GO analysis using DAVID determined that the “Translation” biological process was over-represented among the 355 transcripts (p-value = 6×10⁻⁷, Benjamini-Hochberg = 10⁻³). The “Ribosome” cellular localization was also enriched (p-value = 1.8×10⁻⁵, Benjamini-Hochberg = 5.7×10⁻³). For Tag-seq experiments, 7,830 (91%) RT out of 8,572 were mapped to a transcript (=Tag-seq background list). Among the 4,289 RT that showed greater SBS in cancer, 3,953 could be associated with a transcript (=Tag-seq query list n°1). 1,053 (94%) out of the 1,123 RT that showed greater SBS in healthy cells could be associated with a transcript (=Tag-seq query list n°2). However, no GO term enrichment was found in either Tag-seq query list. As a result, many different biological processes or molecular functions could potentially be represented among transcripts with greater SBS in cancer.
Increased diversity of SBS in transcripts expressed in cancer cells
Heterogeneity of ST proportions across healthy and cancer cells
Lowest ST proportions in transcripts expressed in ESC
Lowest SBS frequency in transcripts expressed in ESC
1,748; 583; and 860 RT out of the 1,959 that were selected in the L-SAGE experiments were present in 100% of the 10 ESC, 100% of the 12 breast and 100% of the 10 WBC experiments, respectively. For each experiment, a global sbsRT proportion was calculated and the means were determined, i.e. 0.14 (breast), 0.15 (WBC) and 0.058 (ESC) (Figure 6b). ESC thus had the lowest mean. We tested the significance of the differences between global sbsRT proportion means across the 3 cell types. The hypothesis of normal distributions for the transformed global proportions calculated on breast, WBC and ESC was accepted with a Shapiro-Wilk test (p-value = 0.30, 0.79 and 0.22 respectively). However, the equality of variance was rejected by a Bartlett test (p-value = 0.005). A non-parametric Kruskal-Wallis test was therefore used; it rejected the equality between the transformed global sbsRT proportion means in breast, WBC and ESC with a p-value of 5.4 × 10⁻⁵. This showed that ESC had a SBS frequency in transcripts that was significantly different from the other two cell types.
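For readers unfamiliar with the Kruskal-Wallis test used above, the following self-contained Python sketch computes the tie-corrected H statistic for three groups and the corresponding p-value. With three groups, H is compared to a chi-squared distribution with 2 degrees of freedom, whose upper tail is exp(-H/2). The function names are hypothetical and the numbers in the example are invented for illustration; they are not the global sbsRT proportions of the study.

```python
import math


def average_ranks(values):
    """Return average ranks (1-based) and the sizes of the groups of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three samples; p uses the chi-squared(2) tail.

    Raises ZeroDivisionError if all pooled values are identical.
    """
    groups = [a, b, c]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, ties = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    h /= correction
    return h, math.exp(-h / 2)


h, p = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), round(p, 4))
```

The chi-squared approximation is only approximate for very small samples such as this toy example.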
ST proportions and SBS frequencies correlate and increase during cancer progression
In the present study, we provide evidence for an increased frequency of SBS that occur in a population of transcripts expressed in cancer cells. Known SNP, catalogued cancer related somatic mutations and predicted or validated targets of RNA-editing enzymes did not support the increased SBS frequency in cancer. However, the transcripts but not the genome of healthy and tumor cells were available and thus transcript and genome sequences both originating from the same patient could not be directly compared. To fully confirm that increased base conflicts exist between transcript and genome sequences in patients’ tumors, back-to-back exome sequencing and RNA-seq would be required. Using Tag-seq, 1,123 RT had greater SBS in healthy than in cancer cells, which calls the reliability of this result into question. In fact, ST had been removed from Tag-seq experiments recorded in GEO and thus 30% of the tag data was unavailable. As ST represent a reservoir of SBS, their removal may have introduced a bias in sbsRT accounting. Moreover, slight heterogeneity of sequencing quality between platforms cannot be excluded. Some Tag-seq experiments carried out in healthy cells may have been produced with poor sequencing quality and thus may have introduced more SBS than in cancer cells. Finally, sequence biases such as read redundancy have been reported in deep sequencing. Using RNA-seq, read redundancy can be cleaned by bioinformatics programs. Conversely, tag redundancy produced by deep sequencing bias cannot be cleaned in Tag-seq experiments. ST have been considered as low quality sequences, i.e. enriched in sequencing errors, and may be excluded from analysis by standard bioinformatics procedures. Here, we agree with previous statements that valuable information is in fact available in ST. Furthermore, L-SAGE and Tag-seq may be so sensitive that they can detect base errors introduced by the cell transcriptional machinery or RNA-editing.
ST are thus an archive of mRNA sequence alterations due to sequencing errors, TI or RNA-editing, and should not be discarded merely to save disk space. Moreover, the proportion of ST per experiment has proved to be an accurate indicator of SBS frequency in transcripts. An unexpectedly high level of SBS in tags produced by L-SAGE experiments had already been reported in a previous study. That study used 29 publicly available L-SAGE libraries, which were also used in our study, and aligned the tags on the human genome sequence; because a large number of tags did not match the genome even after taking into account the currently accepted 1% base error rate of L-SAGE tags, its authors concluded that the sequencing error rate might have been underestimated. However, in this previous study healthy and cancer experiments were mixed, i.e. cancer was not suspected to introduce additional SBS in transcripts. The molecular mechanism underlying increased TI in cancer is still elusive. Brulliard et al. speculated that increased TI might be due to defective transcription assisted proofreading activity. In fact, transcriptional fidelity relies on the ability of RNA polymerases i) to select the correct base before incorporation, ii) to impair RNA extension beyond a mismatch and iii) to cleave a mismatched base at the RNA 3’ boundary and resume RNA synthesis [29, 30]. Dysfunction at any of these 3 crucial steps is likely to compromise RNA sequence integrity. However, cancer related somatic mutations have not been reported so far in genes coding for RNA polymerases. Conversely, mice deficient for DNA polymerase δ proofreading activity have been associated with a high incidence of epithelial cancer. Mutations in genes that code for proteins involved in mRNA synthesis could be searched for in patients showing an increased SBS frequency. In ESC, the transcription of the genome is globally hyperactive. No information has been made available on transcriptional fidelity in ESC.
Comparing SBS frequencies across different cell types, we found that ESC had a very low SBS frequency. This finding suggests that transcriptional fidelity might be greater in ESC than in differentiated cells. We provided strong evidence that SBS frequency is significantly increased for a population of transcripts expressed in cancer cells. However, further investigations are required to determine whether this feature is common to all cancers or whether it is only present in some malignancies or in a subset of patients.
SBS frequency in transcript sequences is heterogeneously distributed across cells, i.e. ESC have the lowest, cancer cells have the greatest and healthy differentiated cells may lie “in between”. Therefore, SBS frequency in transcript sequences could represent a new cancer specific biomarker which may be useful to characterize patients’ tumors. As sequencing costs fall, cancer diagnosis could be aided by the determination of SBS frequency in transcripts expressed in tumors. In the future, drugs or gene therapies that prove particularly efficient against tumors showing increased SBS frequency in transcripts could be valuable and thus worth searching for intensively.
L-SAGE: Long serial analysis of gene expression
SBS: Single base substitution
sbsRT: Single base substituted reference tag
SNP: Single nucleotide polymorphism
ESC: Embryonic stem cells
WBC: White blood cells
GEO: Gene expression omnibus
EST: Expressed sequence tag
LGD: Low grade dysplasia
This work was supported by the National Institute of Health and Medical Research (INSERM), the National Center of Scientific Research (CNRS) and the Strasbourg University (UdS). We are grateful to Prof. Anirban Maitra and Dr. Hector Alvarez who produced the single patient L-SAGE experiments and to Dr. Anamaria Camargo who provided us with L-SAGE tags associated with SNP. We thank Dr. Susan Park and Dr. Julie D. Thompson for helpful comments and manuscript correction. We also express our gratitude to Dr. Wolfgang Raffelsberger and Dr. Céline Keime for help on biostatistics. Finally, L. Bianchetti would like to thank Dr. Christelle Thibault-Carpentier, Madame Marie-Ange Luc (INSERM) and Madame Anne Bara (INSERM) for their support.
1. Vogelstein B, Kinzler KW: Cancer genes and the pathways they control. Nat. Med. 2004, 10 (8): 789-799. 10.1038/nm1087.
2. McCulloch SD, Kunkel TA: The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 2008, 18 (1): 148-161. 10.1038/cr.2008.4.
3. Alic N, Ayoub N, Landrieux E, Favry E, Baudouin-Cornu P, Riva M, Carles C: Selectivity and proofreading both contribute significantly to the fidelity of RNA polymerase III transcription. Proc. Natl. Acad. Sci. USA. 2007, 104 (25): 10400-10405. 10.1073/pnas.0704116104.
4. The International SNP Map Working Group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001, 409: 928-933. 10.1038/35057149.
5. Levanon K, Eisenberg E, Rechavi G, Levanon EY: Letter from the editor: adenosine-to-inosine RNA editing in Alu repeats in the human genome. EMBO reports. 2005, 6 (9): 831-835. 10.1038/sj.embor.7400507.
6. Rosenberg BR, Hamilton CE, Mwangi MM, Dewell S, Papavasiliou FN: Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing in transcript 3’ UTRs. Nature Struct. & Mol. Biology. 2010, 18 (2): 230-238.
7. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell. 2000, 100 (1): 57-70. 10.1016/S0092-8674(00)81683-9.
8. Martin SA, Hewish M, Lord C, Ashworth A: Genomic instability and the selection of treatments for cancer. J. Pathol. 2010, 220 (2): 281-289.
9. Davies JJ, Wilson IM, Lam WL: Array CGH technologies and their applications to cancer genomes. Chromosome Res. 2005, 13 (3): 237-248. 10.1007/s10577-005-2168-x.
10. Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordonez GR, Bignell GR, et al: A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2010, 463 (7278): 191-196. 10.1038/nature08658.
11. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, et al: The genomic landscapes of human breast and colorectal cancers. Science. 2007, 318 (5853): 1108-1113. 10.1126/science.1145720.
12. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nature Reviews. 2004, 4: 177-183. 10.1038/nrc1299.
13. Brulliard M, Lorphelin D, Collignon O, Lorphelin W, Thouvenot B, Gothié E, Jacquenet S, Ogier V, Roitel O, Monnez JM, et al: Non-random variations in human cancer ESTs indicate that mRNA heterogeneity increases during carcinogenesis. Proc. Natl. Acad. Sci. USA. 2007, 104 (18): 7522-7527. 10.1073/pnas.0611076104.
14. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, Guliany R, Senz J, et al: Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009, 461 (7265): 809. 10.1038/nature08489.
15. Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, Chen R, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, et al: Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012, 148: 1293-1307. 10.1016/j.cell.2012.02.009.
16. Paz N, Levanon EY, Amariglio N, Heimberger AB, Ram Z, Constantini S, Barbash ZS, Adamsky K, Safran M, Hirschberg A, et al: Altered adenosine-to-inosine RNA editing in human cancer. Genome Res. 2007, 17: 1586-1595. 10.1101/gr.6493107.
17. Saha S, Sparks BA, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE: Using the transcriptome to annotate the genome. Nat. Biotechnology. 2002, 20 (5): 508-512. 10.1038/nbt0502-508.
18. Morrissy S, Zhao Y, Delaney A, Asano J, Dhalla N, Li I, McDonald H, Pandoh P, Prabhu AL, Tam A, et al: Digital Gene Expression by Tag sequencing on the Illumina Genome Analyzer. Curr. Protoc. Hum. Genet. 2010, 65: 11.11.1-11.11.36.
19. Nielsen KL, Hogh AL, Emmersen J: DeepSAGE: digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res. 2006, 34 (19): e133. 10.1093/nar/gkl714.
20. Berriz GF, Roth FP: The Synergizer service for translating gene, protein and other biological identifiers. Bioinformatics. 2008, 24 (19): 2272-2273. 10.1093/bioinformatics/btn424.
21. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Philippy KH, Sherman PM, et al: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009, 37: 885-90. 10.1093/nar/gkn764.
22. Bianchetti L, Wu Y, Guérin E, Poch O: SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages. Nucleic Acids Res. 2007, 35 (18): e122. 10.1093/nar/gkm648.
23. Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, Menzies A, et al: COSMIC (The Catalogue of Somatic Mutations in Cancer) a resource to investigate acquired mutations in human cancer. Nucleic Acids Res. 2010, 38: 652-657.
24. Silva AP, De Souza JE, Galante PA, Riggins GJ, De Souza SJ, Camargo AA: The impact of SNPs on the interpretation of SAGE and MPSS experimental data. Nucleic Acids Res. 2004, 32 (20): 6104-6110. 10.1093/nar/gkh937.
25. Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resource. Nature Protocols. 2008, 4 (1): 44-57. 10.1038/nprot.2008.211.
26. Alvarez H, Montgomery EA, Karikari C, Canto M, Dunbar KB, Wang JS, Feldmann G, Hong SM, Haffner MC, Meeker AK, et al: The Axl receptor tyrosine kinase is an adverse prognostic factor and a therapeutic target in esophageal adenocarcinoma. Cancer Biology & Therapy. 2010, 10 (10): 1009-1018. 10.4161/cbt.10.10.13248.
27. Wang SM: Understanding SAGE data. Trends in genetics. 2006, 3 (1): 42-50.
28. Keime C, Sémon M, Mouchiroud D, Duret L, Gandrillon O: Unexpected observations after mapping LongSAGE tags to the human genome. BMC Bioinformatics. 2007, 8 (154): 1471-2105.
29. Thomas MJ, Platas AA, Hawley DK: Transcriptional fidelity and proofreading by RNA polymerase II. Cell. 1998, 93 (4): 627-37. 10.1016/S0092-8674(00)81191-5.
30. Sydow JF, Cramer P: RNA polymerase fidelity and transcriptional proofreading. Current Opinion in Structural Biology. 2009, 19 (6): 732-739. 10.1016/j.sbi.2009.10.009.
31. Goldsby RE, Hays LE, Chen X, Olmsted EA, Slayton WB, Spangrude GJ, Preston BD: High incidence of epithelial cancers in mice deficient for DNA polymerase δ proofreading. PNAS. 2002, 99 (24): 15560-15565. 10.1073/pnas.232340999.
32. Efroni S, Duttagupta R, Cheng J, Dehghani H, Hoeppner DJ, Dash C, Bazett-Jones DP, Le Grice S, McKay RD, Buetow KH, et al: Global transcription in pluripotent embryonic stem cells. Cell. 2008, 2: 437-447.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2407/12/509/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.