Different regression equations relate age to the incidence of Lauren types 1 and 2 stomach cancer in the SEER database: these equations are unaffected by sex or race

Background Although the impacts of race, age, sex, and Lauren type on gastric cancer incidence have been explored individually, neither their importance when evaluated together nor the presence or absence of interactions among them has been fully described. Methods This study, derived from data of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute, analyzed the incidence of gastric cancer between 1992 and 2001. There were 7882 patients who developed gastric cancer. The total denominator population was 145,155,669 persons (68,395,787 for 1992–1996, 78,759,882 for 1997–2001). Patients with multiple tumors were evaluated according to the default of the SEER*Stat program. Incidences specific for age group, five-year period (1992–1996 vs 1997–2001), sex, race (Asian vs non-Asian), and Lauren type were derived for 160 groups (2 five-year periods × 2 race groups × 2 sexes × 2 Lauren types × 10 age groups), forming the stratified sample evaluated by linear regression. Linear regression was used to assess the importance of each of these explanatory variables and to determine whether there were interactions among them. Results Race, sex, age group, and Lauren type were found to be important explanatory variables, as were the interactions between Lauren type and each of the other important explanatory variables. In the final model, the contribution of each explanatory variable was highly statistically significant (t > 5, d.f. 151, P < 0.00001). The regression equation for Lauren type 1 had different coefficients for the explanatory variables race, sex, and age than did the regression equation for Lauren type 2. Conclusion The change in the incidence of stomach cancer with respect to age for Lauren type 1 stomach cancer differs from that for Lauren type 2 stomach cancer. The relationships between age and Lauren type do not differ across gender or race. 
The results support the notion that Lauren type 1 and Lauren type 2 gastric cancers have different etiologies and different patterns of progression from pre-cancer to cancer. The results should be validated by evaluation of other databases.

The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute is an authoritative source of information on cancer incidence and survival in the United States. It currently collects and publishes cancer incidence and survival data from 14 population-based cancer registries and three supplemental registries covering approximately 26 percent of the US population; the SEER website provides extensive information about the program [42].
This study used SEER to evaluate the contributions of age, sex, race (Asian vs non-Asian), year of diagnosis (1992-1996 vs 1997-2001), and Lauren type to gastric cancer incidence. It showed that Lauren type 1 tumor incidence increased with age in a different way than did Lauren type 2 tumor incidence; the regression equations describing these relationships were the same for men and women and for Asians and non-Asians. Incidence was analyzed as the natural logarithm of the rate at which stomach cancer developed over a five-year period.

Data acquisition
The SEER database SEER 11 Regs + AK Public-Use, Nov 2003 Sub for Expanded Races (1992-2001) was used [44]. The analysis was limited to persons 40 years and older. The SEER*Stat program stratified persons who developed stomach cancer, and the underlying population from which they came, into ten age groups, two sexes, two races, two Lauren types, and two five-year periods. This produced two numbers for each of 160 age, race, sex, Lauren type, and five-year period groups: the number of persons who developed stomach cancer and the number of persons in the underlying (denominator) population. Lauren type 1 was defined as cancers with intestinal morphology (M-8144); Lauren type 2 was defined as cancers with diffuse (M-8145), signet ring cell (M-8490), or linitis plastica (M-8142) morphology. The ten age groups were defined as follows: 40-44 (group 1), 45-49 (group 2), 50-54 (group 3), 55-59 (group 4), 60-64 (group 5), 65-69 (group 6), 70-74 (group 7), 75-79 (group 8), 80-84 (group 9), and 85+ (group 10). Because relatively few black persons had stomach cancer, many analyzed cells would have been zero, making such an analysis suspect; the two races evaluated were therefore Asian and non-Asian. Persons of unknown race were excluded. Multiple primaries were handled by the default set by the SEER*Stat program. The two five-year periods comprised the years 1992-1996 and 1997-2001.
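The stratification can be sketched as follows. This is a minimal Python illustration, not the SEER*Stat procedure: the names `age_group`, `strata`, and `LAUREN_BY_MORPHOLOGY` are hypothetical, and only the age cut points, factor levels, and morphology codes are taken from the text above.

```python
from itertools import product
from typing import Optional

# Factor levels as described in the Data acquisition section.
PERIODS = ("1992-1996", "1997-2001")
SEXES = ("men", "women")
RACES = ("non-Asian", "Asian")
LAUREN_TYPES = (1, 2)
AGE_GROUPS = range(1, 11)  # 1 = 40-44, 2 = 45-49, ..., 9 = 80-84, 10 = 85+

# Hypothetical lookup from morphology code to Lauren type (codes from the text).
LAUREN_BY_MORPHOLOGY = {
    "M-8144": 1,  # intestinal
    "M-8145": 2,  # diffuse
    "M-8490": 2,  # signet ring cell
    "M-8142": 2,  # linitis plastica
}


def age_group(age: int) -> Optional[int]:
    """Return the five-year age group (1-10), or None for persons under 40."""
    if age < 40:
        return None
    # 40-44 -> 1, 45-49 -> 2, ...; everyone 85 or older is pooled into group 10.
    return min((age - 40) // 5 + 1, 10)


def strata():
    """Enumerate every period x sex x race x Lauren type x age group cell."""
    return list(product(PERIODS, SEXES, RACES, LAUREN_TYPES, AGE_GROUPS))


print(len(strata()))  # 2 * 2 * 2 * 2 * 10 cells
```

Each of these cells then carries a case count and a denominator population, which is what the rate calculation below consumes.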

Statistical methods
The R software environment was used for data analysis.

Data transformation
Counts and populations provided by SEER*Stat were used to calculate rates. Preliminary data analysis showed that the rates were not normally distributed, so their natural logarithms were analyzed. One cell had no persons with cancer; to take the natural logarithm of rates when a zero cell is present, one may add 0.5 to the numerator and to the denominator of every cell [44]:

ln(ca) = ln[(persons with cancer + 0.5) / (persons in denominator + 0.5)]
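A minimal sketch of this transformation for a single cell; `ln_rate` is a hypothetical helper name, and the 0.5 correction is the one given in the formula above.

```python
import math


def ln_rate(cases: int, population: int, correction: float = 0.5) -> float:
    """Natural log of a cancer rate with `correction` added to numerator and denominator.

    Adding 0.5 to every cell keeps the logarithm finite when a cell has zero cases.
    """
    return math.log((cases + correction) / (population + correction))
```

For the zero cell, `ln_rate(0, population)` is finite, whereas `math.log(0 / population)` would raise a `ValueError`. Applying the same correction to every cell, rather than only to the zero cell, keeps all 160 responses on a common scale.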

Model selection
All models tested were linear regression models with the natural logarithm of the cancer rate, as defined above, as the response variable. Analysis of variance (ANOVA) determined which model best reflected the data. Covariates were always added to the regression; the change in the residual sum of squares (∆RSS) relative to the RSS before the covariate was added was assessed with an F test. Independent (explanatory or predictor) variables included: A) five-year period (1992-1996 = 0, 1997-2001 = 1), B) sex (men = 0, women = 1), C) race (non-Asian = 0, Asian = 1), D) Lauren type (type 1 = 0, type 2 = 1), and E) age group (40-44 = 1, 45-49 = 2, 50-54 = 3, etc.). All ten possible two-way interactions among these variables were assessed. The null hypothesis was rejected if P < 0.05. Neither higher-order interactions nor non-linear relationships between the explanatory variables and the response variable were considered in this analysis.
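The ten candidate interactions and the nested-model comparison can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' R code: the helper names are hypothetical, and the sketch uses the conventional partial F statistic, which divides by the residual mean square of the larger model (the form R's `anova()` reports when two fitted linear models are compared).

```python
from itertools import combinations

import numpy as np

# The five candidate explanatory variables (coded as in the text).
PREDICTORS = ["period", "sex", "race", "lauren", "age_group"]

# All two-way interactions among five variables: C(5, 2) = 10 pairs.
TWO_WAY = list(combinations(PREDICTORS, 2))


def rss(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partial_f(X_small: np.ndarray, X_big: np.ndarray, y: np.ndarray):
    """Partial F statistic for nested models (columns of X_small lie in the span of X_big).

    F = ((RSS_small - RSS_big) / df_num) / (RSS_big / df_den), where df_num is the
    number of added parameters and df_den = n - parameters of the larger model.
    """
    n = len(y)
    p_small = np.linalg.matrix_rank(X_small)
    p_big = np.linalg.matrix_rank(X_big)
    df_num, df_den = p_big - p_small, n - p_big
    rss_small, rss_big = rss(X_small, y), rss(X_big, y)
    f_stat = ((rss_small - rss_big) / df_num) / (rss_big / df_den)
    return f_stat, int(df_num), int(df_den)
```

An interaction covariate is the product of its two columns (for example, Lauren type × age group), so adding one interaction gives `df_num = 1`. A P value could then be read from the F distribution, for example with `scipy.stats.f.sf(f_stat, df_num, df_den)`.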
The data precluded the use of Poisson regression, which assumes that the variance of the counts equals their mean: the mean number of patients with cancer per cell was 49, whereas the variance was 2224. When Poisson regression was tried with population as an offset, with or without the zero cell, all five potential explanatory variables (five-year period, sex, race, Lauren type, and age) and all ten potential first-order interactions were associated with the counts; each explanatory variable had an associated z > 600, P < 1 × 10^-10. A residual plot showed that the model fit poorly.
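The overdispersion argument can be checked with a variance-to-mean ratio, which should be close to 1 for Poisson counts. A minimal sketch; `dispersion_ratio` is a hypothetical helper that would be applied to the 160 cell counts.

```python
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; roughly 1 if the counts are Poisson."""
    mean = statistics.mean(counts)
    return statistics.variance(counts) / mean


# With the summary values reported above (mean 49, variance 2224):
print(round(2224 / 49, 1))
```

With those reported figures the ratio is about 45, so the counts vary far more than a Poisson model allows.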

Model adequacy
Standardized residuals were calculated in R. To test the assumption that the standardized residuals were normally distributed, a Shapiro-Wilk test was performed. To test the assumption that the mean of the standardized residuals was 0, a t-test was performed. To test the assumption that the standardized residuals had constant variance, Bartlett's test was performed.
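Standardized residuals scale each raw residual by its estimated standard error, which depends on the observation's leverage. The sketch below assumes the internally studentized definition, e_i / (s·sqrt(1 − h_ii)), which is what R's `rstandard()` returns for a linear model. The helper names are hypothetical, and the design matrix `X` is assumed to have full column rank.

```python
import numpy as np


def standardized_residuals(X, y):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii)) of an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^-1 X',
    # computed stably as the row sums of Q**2 from a reduced QR decomposition.
    q, _ = np.linalg.qr(X)
    h = np.sum(q**2, axis=1)
    s = np.sqrt(resid @ resid / (n - p))  # residual standard error
    return resid / (s * np.sqrt(1.0 - h))


def one_sample_t(values, mu=0.0):
    """t statistic for H0: mean == mu (compare with t on n - 1 degrees of freedom)."""
    values = np.asarray(values, dtype=float)
    return (values.mean() - mu) / (values.std(ddof=1) / np.sqrt(values.size))
```

The normality and constant-variance checks could then be run on these residuals with `scipy.stats.shapiro` and `scipy.stats.bartlett`.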

Raw data
The distribution of persons who developed cancer, stratified by Lauren type, five year period, sex, and race, is displayed in Table 2. The distribution of the denominator population, stratified by five year period, sex, and race, is shown in Table 3. Age group distributions for the persons who developed cancer and the denominator population are displayed in Table 4.

Summary of model building
Initial evaluation showed that the rates were not normally distributed (Shapiro-Wilk W = 0.76, P < 0.0001). As discussed in the model adequacy section, using the logarithms of the rates yielded a model that fulfilled the assumptions of linear regression once an outlier was removed: tests rejected neither normality of that model's residuals, nor a residual mean of zero, nor homogeneity of variance. Table 5 displays the results of the univariate analyses. Significant associations were identified between the natural logarithms of cancer rates and age group, race, Lauren type, and sex, but not five-year period. Table 6 displays the results, with and without the outlier identified at model adequacy assessment, of ANOVA comparisons of sequential models; these results confirmed those of the univariate analyses. Table 7 displays the results of the analysis of interaction covariates. ANOVA demonstrated significant decreases in the residual sum of squares when the interactions of Lauren type with age group, with race, and with sex were added to the regression. No other first-order interaction decreased the RSS sufficiently to reject the null hypothesis.

Final model
The final model did not include five-year period as an explanatory variable because 1) there was no association between the natural logarithms of the cancer rates and the five-year period and 2) there was no demonstrated interaction between five-year period and any other explanatory variable. The final model, displayed in Table 8, included the four other main-effect (ME) covariates and the three interaction covariates found to significantly reduce the residual sum of squares (RSS). Lauren type interacted with each of the explanatory variables race, age, and sex, which did not interact with one another. Hence, Lauren type 1 and type 2 cancers differ in their dependence on age, sex, and race. Setting Lauren type to 0 (type 1) or to 1 (type 2) in the final model yields a separate regression equation for each Lauren type. Removing the outlier yielded the same choice of covariates.
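How the interaction terms split the final model into two equations can be written out explicitly. The symbols below are assumed here for illustration (the fitted coefficients are those of Table 8); the model has the four main effects and the three Lauren type interactions listed in the model-building summary, with the 0/1 and age-group coding defined in the Model selection section.

```latex
% Assumed notation: y = ln(ca); L = Lauren type (0 = type 1, 1 = type 2);
% R = race (0 = non-Asian, 1 = Asian); S = sex (0 = men, 1 = women); A = age group (1, ..., 10).
\begin{aligned}
y &= \beta_0 + \beta_R R + \beta_S S + \beta_A A + \beta_L L
     + \gamma_R\, L R + \gamma_S\, L S + \gamma_A\, L A + \varepsilon \\[4pt]
\text{Type 1 } (L = 0):\quad y &= \beta_0 + \beta_R R + \beta_S S + \beta_A A + \varepsilon \\
\text{Type 2 } (L = 1):\quad y &= (\beta_0 + \beta_L) + (\beta_R + \gamma_R) R
     + (\beta_S + \gamma_S) S + (\beta_A + \gamma_A) A + \varepsilon
\end{aligned}
```

Because the model contains no race × sex, race × age, or sex × age terms, the age slope within each Lauren type (β_A for type 1, β_A + γ_A for type 2) is the same for men and women and for Asians and non-Asians. Race and sex shift only the intercept of each equation.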
Quantitative assessments of model adequacy are displayed in Table 9. By Bartlett's test, the standardized residuals of the final model with the outlier had non-constant variance; those of the model without the outlier did not show this failure of model assumptions.
By the Shapiro-Wilk test, the standardized residuals of the final model with the outlier were not normally distributed; those of the model without the outlier did not show this failure of model assumptions. Neither the model with nor the model without the outlier had standardized residuals whose mean differed from zero.
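Bartlett's statistic can be computed directly from group sizes and sample variances. This is a minimal standard-library sketch. The text does not state which groups of standardized residuals were compared, so grouping them (for example, by Lauren type or by age group) is an assumption here, and `bartlett_statistic` is a hypothetical helper.

```python
import math
import statistics


def bartlett_statistic(groups):
    """Bartlett's test statistic for equal variances across k groups.

    Under the null hypothesis of equal variances (with normal data), the statistic
    is approximately chi-square distributed with k - 1 degrees of freedom.
    Returns (statistic, degrees_of_freedom).
    """
    k = len(groups)
    n = [len(g) for g in groups]
    var = [statistics.variance(g) for g in groups]  # sample variances (n - 1 divisor)
    big_n = sum(n)
    pooled = sum((ni - 1) * vi for ni, vi in zip(n, var)) / (big_n - k)
    numerator = (big_n - k) * math.log(pooled) - sum(
        (ni - 1) * math.log(vi) for ni, vi in zip(n, var)
    )
    correction = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (big_n - k)) / (3 * (k - 1))
    return numerator / correction, k - 1
```

`scipy.stats.bartlett` computes the same statistic and also returns a P value.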

Discussion
This study found, in the SEER database, that Lauren type 1 and Lauren type 2 stomach cancers differ to such a degree that different regression equations are required to explain variations in their incidences. Sex, race (Asian or non-Asian), and age are explanatory variables, but the equations that relate these explanatory variables to the incidence of each Lauren type differ. Recent epidemiologic studies strongly support the rationale of the current study, namely to evaluate year of diagnosis (here, five-year period), sex, race, age, and Lauren type. These articles also support the need to evaluate interactions and offer useful insights into the limitations of administrative databases and other factors that should be considered in future studies.
Year of diagnosis
Boyle [4] found stomach cancer in general to be declining in incidence in Europe, as did Faycal [5], Pineros [6] in Colombia, Ardanaz [7] in Navarra, and Stracci [8] and Crocetti [9] in Italy. Liu's [10] results showed that over time, for both sexes, stomach cancer trends differed depending on the third of the stomach involved and the age of the patient. Henson [11] studied Lauren type 2 and Lauren type 1 incidences over time, showing that the changes over time differed between the two Lauren types. Sunny's [12] study of Indian men and women revealed different rates of decline in stomach cancer for the two sexes, demonstrating an interaction of time of diagnosis and sex. Levi [13], using joinpoint regression analysis, found that the fall over time in gastric cancer rates was proportionally greater for older than for younger persons, although all age groups showed a decline. By contrast, Kobayashi [14] attributed the lion's share of the decrement in gastric cancer to the young. For the Greenland Inuit, stomach cancer rates appear to have increased [15]. This study of SEER data did not show time of diagnosis, or an interaction of any factor with time of diagnosis, to be an important explanatory variable for stomach cancer incidence over the decade 1992-2001 in the United States.

Figure 1. Plot of the natural logarithms of cancer rates, denoted as ln(ca), as a function of age in years for Asian men.

Sex
Marmo [16], Turkdogan [17], Faycal [5], Bani-Hanu [21], and Dobru [18] all showed men to be at greater risk for stomach cancer than women. Among Epstein-Barr virus (EBV)-positive cancers, the gender difference exists but is statistically significant only for Lauren type 2 gastric cancers [19]. A family history of stomach cancer would appear to place women, but not men, at increased risk [20]. Alaskan Native American men differ less from other American men than do Alaskan Native American women, demonstrating an interaction of sex and race [22]. Green tea consumption appears to protect women, but not men, from gastric cancer [23]. Japanese men appear to have a greater increase in risk as they age than do Japanese women [24]. Khan [25] found that different foods increased the risk of gastric cancer in men and in women. This study of SEER data identified gender as an important explanatory variable for the incidence of stomach cancer.

Race
Ciliated metaplasia, a precursor to stomach cancer, occurs at different rates in the Pacific and Atlantic basins [26].
Yao [27] showed that Hispanics with stomach cancer differed in age from other persons and that Asians differed in survival from other persons. Multiple studies have placed Asians at greater risk of gastric cancer [28][29][30]. When a known risk factor, such as H. pylori, becomes universally acquired, it ceases to be a risk factor; this has been shown to have occurred in Korea [31]. This study of SEER data did not find an interaction between race and sex, but did identify race as an explanatory variable for stomach cancer incidence.

Figure 2. Plot of the natural logarithms of cancer rates, denoted as ln(ca), as a function of age in years for Asian women.
As to black race, some have suggested that Caucasians are more likely than blacks to develop gastric cancer that arises in the cardia and that blacks are more likely than Caucasians to develop gastric cancer that arises outside the cardia [32,33]. There were insufficient patients in this study to subdivide the analysis by site within the stomach or to separately analyze black persons. To evaluate the importance of black race, studies would need 1) to have more black patients and 2) to take into account whether the cancer originated in the gastric cardia or not. The location of origin would be of interest in itself as immunohistochemical patterns of cardia and non-cardia gastric cancers differ [34].

Age
Older persons are more likely than young persons to develop ciliated metaplasia [26]. Multiple studies have shown that age is a vitally important factor in the risk of stomach cancer [18,35,21]. As for other risk factors, there is good reason to think that the effect of H. pylori declines with age [36] and that acquisition of H. pylori after age 1 may be less important in carcinogenesis than acquisition before age 1 [37]. Marmo [16] and Tanaka [24] demonstrated an interaction of age and sex. Levi [13], Kobayashi [14], and Liu [10] showed an interaction of age and time of diagnosis. A prior study [38] suggested that the effect of environmental carcinogens is largely limited to childhood. This is in accord with our study, which would suggest that a person's risk of cancer is set by the age of 40 and that its expression occurs at predictable increments thereafter. This study of SEER data found age to be an explanatory variable for stomach cancer incidence.

Figure 3. Plot of the natural logarithms of cancer rates, denoted as ln(ca), as a function of age in years for non-Asian men.
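Because the response is ln(ca) and age group enters the model linearly, a fitted age coefficient corresponds to a constant multiplicative change in the rate for each five-year step. This is one way to read the "predictable increments" above. The slope value in the usage line is hypothetical; the fitted coefficients are in Table 8, and the helper names are illustrative.

```python
import math


def fold_change_per_age_group(slope: float) -> float:
    """Rate ratio between adjacent five-year age groups for a slope on the ln(rate) scale."""
    return math.exp(slope)


def fold_change_between(slope: float, group_a: int, group_b: int) -> float:
    """Rate ratio of age group `group_b` relative to `group_a` under a log-linear age trend."""
    return math.exp(slope * (group_b - group_a))


# Hypothetical slope: ln(rate) rising by ln(2) per age group doubles the rate every five years.
print(fold_change_between(math.log(2), 1, 4))
```

Under such a model, the ratio between any two age groups depends only on their distance apart. Because the Lauren type 2 slope differs from the type 1 slope by γ_A, the two types also differ in this fold change.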

Lauren type
Loss of CDX2 may represent a marker of tumor progression in early gastric cancer and in carcinomas with an intestinal, but not a non-intestinal, phenotype [39]. The frequency of ciliated metaplasia differs between intestinal and non-intestinal stomach cancer types [26]. In some nations, Lauren type 1 cancer was more common than Lauren type 2 [18,40]. Yao [27] showed that Hispanics differed from other persons in Lauren type. An interaction between time of diagnosis and Lauren type exists [5,11]. van Beek [41] found that Epstein-Barr virus-associated cancer was more frequently associated with Lauren type 1 than with Lauren type 2 adenocarcinoma. This study of SEER data found the interactions of Lauren type with race, age, and gender to be so important that two different regression equations were required to describe the data. The importance of separating the Lauren types lies in part in the many interactions between Lauren type and other variables, both those found in this study and those found in recent epidemiologic studies.
Figure 4. Plot of the natural logarithms of cancer rates, denoted as ln(ca), as a function of age in years for non-Asian women.

The above discussion allows one to appreciate the limitations and utility of the study. SEER, like many of the sources of the other studies, is an administrative database. Administrative databases lack a review of histopathology; the resulting loss of precision is unavoidable, because such a review would increase the expense of any such study and decrease participation by hospitals, largely invalidating its results. As expected, specific program codes are not available online for investigators, reviewers, readers, and editors to explore issues that may be important to them, such as how the denominator in rate calculations is created. The website is excellent, but it might also include readily accessible links to the data registries themselves and to their policies and procedures, so that investigators, reviewers, editors, and readers can answer any questions they might have about such matters as data collection or the handling of multiple primaries in a particular study. No administrative database can record such things as H. pylori rates, genetic markers, food intake, or the other miscellaneous factors identified above. As with any study, the number of factors that can be evaluated is limited both by data collection and by statistical considerations of sample size; for this reason, a global explanation encompassing all potential factors cannot be expected. The most any epidemiologic study can offer is a partial explanation of complex phenomena. Most importantly, the recent studies referenced above show that any conclusion derived from a particular population must be verified in multiple populations. 
This is because factors that are important in one population may be unimportant in another; only by repeating an analysis in multiple populations can an epidemiologic conclusion be considered verified. Notwithstanding these caveats, such epidemiologic studies have great practical significance; Marmo [16] used such results to design a screening protocol for stomach cancer based on age and sex so as to reduce cost.

Conclusion
In summary, two regression equations were derived from the SEER database to explain differences in stomach cancer incidence, one for Lauren type 1 stomach cancers, one for Lauren type 2 stomach cancers. Each regression equation revealed a simple relationship between the natural logarithm of stomach cancer incidence rates and age. These equations were the same for men and women and for Asians and non-Asians. These results should be verified by similar evaluations conducted in other populations.