Geospatial analysis, web-based mapping and determinants of prostate cancer incidence in Georgia counties: evidence from the 2012–2016 SEER data

Prostate cancer (CaP) incidence is high in the United States. According to the American Cancer Society, there were an estimated 174,650 new CaP cases in 2019. The estimated number of deaths from CaP in 2019 is 31,620, making CaP the second leading cause of cancer death among American men, after lung cancer. Our goal is to estimate and map prostate cancer relative risk in order to identify counties at higher risk where interventions and further research can be targeted. The 2012–2016 Surveillance, Epidemiology, and End Results (SEER) Program data were used in this study. Analyses were conducted on 159 Georgia counties. The outcome variable is incident prostate cancer. We employed a Bayesian geospatial model to investigate both measured and unmeasured spatial risk factors for prostate cancer. We visualised the risk of prostate cancer by mapping the predicted relative risk and exceedance probabilities. Finally, we developed interactive web-based maps to guide optimal policy formulation and intervention strategies. The number of persons aged above 65 years, the number of persons below poverty, higher median family income, the number of foreign born and the number of unemployed were risk factors independently associated with prostate cancer risk in the non-spatial model. Except for the number of foreign born, all these risk factors were also significant in the spatial model, with the same direction of effects. Substantial geographical variations in prostate cancer incidence were found in the study. The predicted mean relative risk was 1.20, with a range of 0.53 to 2.92. Individuals residing in Towns, Clay, Union, Putnam, Quitman, and Greene counties were at increased risk of prostate cancer incidence, while those residing in Chattahoochee were at the lowest risk.
Our results can be used as an effective tool in the identification of counties that require targeted interventions and further research by program managers and policy makers as part of an overall strategy in reducing the prostate cancer burden in Georgia State and the United States as a whole.


Background
Prostate cancer is the most commonly diagnosed malignancy and the second leading cause of cancer death among American men, with an estimated national annual health care cost of $9.8 billion [1,2]. The United States Cancer Statistics reported 192,443 new cases of prostate cancer in 2016, with an incidence rate of 101 per 100,000 men, and 30,370 prostate cancer deaths, or 19 deaths per 100,000, during the same year [3]. Despite an overall decline in incidence across the United States since the early 1990s [4], there remain pockets of high prostate cancer burden.
In the United States, the state of Georgia has the second highest annual incidence rate of prostate cancer [3]. In 2016, there were 7160 reported new cases and 889 deaths in the state, with associated incidence and mortality rates of 133 and 23 per 100,000 men, respectively [3]. African American (AA) men not only have a higher incidence of prostate cancer but also experience 60% higher mortality than white men, after controlling for incidence [5]. As AA residents make up 32% of Georgia's population [6], the state represents an unusual opportunity to investigate community factors associated with a high-risk population. Although a few studies have identified high prostate cancer incidence in the southwest of the state [7,8], the sociodemographic characteristics of these regions are not well described.
To plan prostate cancer interventions with limited health resources, it is important to characterize and identify predictors of high prostate cancer burden at the community level. The present study, therefore, aims to 1) model and map the county-level incidence of prostate cancer in Georgia, and 2) evaluate county sociodemographic factors associated with high incidence of prostate cancer.

Data source and study population
We used the Surveillance, Epidemiology, and End Results (SEER) population-based cancer registry, a publicly available data source, to investigate the county-level distribution of prostate cancer cases in the state of Georgia. For this ecological study, only newly diagnosed cases among men aged 40 years and older from January 1, 2012 through December 31, 2016 were used, because case reporting to SEER from the greater Georgia area started in 2010 and, at the time of analysis, SEER's most current county attributes data spanned the 2012 to 2016 period. The greater Georgia area includes all counties in the state except the 15 represented by the older Atlanta and Rural Georgia areas previously reported to SEER [9]. Therefore, since 2010 SEER has captured cancer data from all 159 counties in Georgia. The SEER Georgia registry reports clinical or, preferentially, pathologic diagnoses of cancer from eligible patient records in hospitals, laboratories and physician offices [10,11]. Patients must be Georgia residents at the time of diagnosis, even though the address of residence is not reported in the registry. Only patients with International Classification of Diseases for Oncology, third edition (ICD-O-3) topography code C61 and behaviour code 3 were included in the analysis. SEER, being one of the oldest registries in the country, represents the gold standard in reporting standards and data quality, with completeness rates of more than 97% [12–14].
SEER data are publicly available deidentified records of cancer cases. Permission was sought from and granted by the SEER Program to access and use the data for this study. We did not attempt to identify or contact patients, or to link records to identifiable health information.

Outcome variable
The outcome variable is the number of incident prostate cancer cases per county. Detailed information is provided under the statistical analysis section.
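The statistical analysis compares each county's observed count with an expected count. The text does not state how the expected counts were derived; one common choice, assumed here purely for illustration, is internal indirect standardisation, in which the overall rate across all counties is applied to each county's population at risk. The counts and populations below are made-up values, not SEER data, and in practice the calculation is often stratified by age.

```python
import numpy as np

def expected_counts(cases, population):
    """Expected cases per county if every county shared the overall rate.

    Assumption: unstratified internal indirect standardisation.
    """
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    overall_rate = cases.sum() / population.sum()
    return population * overall_rate

def standardised_incidence_ratio(cases, expected):
    """Crude relative risk estimate: observed divided by expected."""
    return np.asarray(cases, dtype=float) / np.asarray(expected, dtype=float)

# Hypothetical toy data for three counties (not SEER data)
cases = np.array([30, 5, 120])
population = np.array([10_000, 4_000, 50_000])

expected = expected_counts(cases, population)
sir = standardised_incidence_ratio(cases, expected)
```

A ratio above 1 indicates more cases than expected. For counties with small expected counts this crude ratio is unstable, which is one motivation for the model-based smoothing described under the statistical analysis.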

Covariates
The covariates used in this study were county-level variables for the period 2012–2016 identified in the literature as being associated with prostate cancer incidence [2,15–17]. These included the percentage of blacks in the counties, number of persons above 65 years of age in the counties, number of persons having at least a bachelor's degree in the counties, mean age at diagnosis, number of persons living below poverty in the counties, number of foreign born persons in the counties, percentage of the rural population in the counties, median family income in the counties, and number of unemployed.

Statistical analysis
We employed a Bayesian geospatial model to investigate both measured and unmeasured spatial risk factors for prostate cancer among men residing in 159 counties in Georgia State.

Model formulation
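We set $Y_i$ to be the observed count of prostate cancer cases in county $i$ and $E_i$ to be the expected number of prostate cancer cases in county $i$. We implemented the Besag-York-Mollié (BYM) model [18] to analyse the data. We assumed that the $Y_i$ are conditionally independent and Poisson distributed, and modelled as:

$$Y_i \mid \theta_i \sim \text{Poisson}(E_i \theta_i), \quad i = 1, \ldots, n,$$

where $n$ is the number of counties (i.e. $n = 159$) and $\theta_i$ is the relative risk in county $i$. We expressed the logarithm of $\theta_i$ as:

$$\log(\theta_i) = \beta_0 + d(\cdot)^{\top}\beta + u_i + v_i,$$

where $\beta_0$ is the intercept parameter that represents the overall risk, $d(\cdot)$ is a vector of observed covariates, $\beta$ is a vector of regression coefficients for the covariates, and $u_i$ is a spatially structured effect component. We modelled the $u_i$ using a conditional autoregressive (CAR) distribution which, in the standard intrinsic form used by the BYM model, is given as:

$$u_i \mid u_{-i} \sim N\left(\bar{u}_{\delta_i}, \frac{\sigma_u^2}{n_{\delta_i}}\right),$$

where $\bar{u}_{\delta_i}$ is the mean of the structured effects over the set $\delta_i$ of counties neighbouring county $i$ and $n_{\delta_i}$ is the number of those neighbours, and $v_i$ is an unstructured spatial effect defined as $v_i \sim N(0, \sigma_v^2)$. The relative risk $\theta_i$ quantifies whether county $i$ has higher ($\theta_i > 1$) or lower ($\theta_i < 1$) risk than the average risk in the reference population. We produced the probabilities of the predicted relative risk being greater than a given threshold $c$ (exceedance probabilities, i.e. $P(\theta_i > c)$).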
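To make the role of the neighbourhood structure concrete, the sketch below evaluates the conditional mean and variance of a structured effect under the intrinsic CAR form commonly used in the BYM model: given all other counties, the effect for county i is normal with mean equal to the average of its neighbours' effects and variance equal to the structured variance divided by the number of neighbours. The adjacency list, effect values and variance are hypothetical, and this is an illustration of the idea rather than the R-INLA implementation.

```python
import numpy as np

def icar_conditional(i, u, neighbours, sigma2_u):
    """Conditional mean and variance of u[i] given its neighbours' effects."""
    idx = neighbours[i]
    if not idx:
        raise ValueError(f"area {i} has no neighbours")
    mean = float(np.mean([u[j] for j in idx]))
    variance = sigma2_u / len(idx)
    return mean, variance

# Hypothetical 4-area map: borders shared by 0-1, 1-2, 1-3 and 2-3
neighbours = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
u = np.array([0.4, 0.1, -0.2, 0.3])  # made-up structured effects

mean_1, var_1 = icar_conditional(1, u, neighbours, sigma2_u=0.5)
```

Areas with more neighbours receive a smaller conditional variance, so their effects are pulled more strongly towards the local average.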
Finally, we visualised the risk of prostate cancer by mapping the predicted relative risk and exceedance probabilities. We developed interactive web-based maps to guide optimal policy formulation and intervention strategies targeted at improving the survival of prostate cancer patients and the overall health of men in Georgia.
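Exceedance probabilities of this kind can be approximated from posterior draws of the relative risk as the share of draws above the threshold. The sketch below assumes an array of posterior samples is already available; the simulated log-normal draws and the three counties are placeholders, not the study's posterior.

```python
import numpy as np

def exceedance_probability(theta_samples, threshold):
    """Estimate P(theta_i > threshold) for each county from posterior draws.

    theta_samples has shape (n_draws, n_counties).
    """
    theta_samples = np.asarray(theta_samples, dtype=float)
    return (theta_samples > threshold).mean(axis=0)

rng = np.random.default_rng(0)
# Placeholder draws for three hypothetical counties, centred near
# relative risks of 0.6, 1.2 and 2.9 on the log scale
log_means = np.log([0.6, 1.2, 2.9])
theta_samples = np.exp(rng.normal(loc=log_means, scale=0.15, size=(4000, 3)))

p_gt_1_5 = exceedance_probability(theta_samples, 1.5)
p_gt_2 = exceedance_probability(theta_samples, 2.0)
```

Mapping these per-county probabilities, rather than the point estimates alone, shows where the evidence for elevated risk is strong and where a high estimate is still uncertain.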
Using the Bayesian framework, we implemented our Poisson model through recommended strategies (i.e. Integrated Nested Laplace Approximation (INLA) with Stochastic Partial Differential Equation (SPDE)) [19,20]. We followed a non-informative approach in choosing our priors owing to the lack of reliable prior information about the parameters, and thus used the default priors available in the R-INLA package. All analyses were implemented in the R-INLA package [21,22]. We used 95% Bayesian credible intervals to declare statistical significance.
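The significance rule can be illustrated with posterior draws of a regression coefficient: an effect is declared significant when its 95% credible interval excludes zero. The sketch assumes equal-tailed intervals computed from samples; the draws are simulated placeholders and the coefficient names are hypothetical, not estimates from the study.

```python
import numpy as np

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(np.asarray(samples, dtype=float), [tail, 1.0 - tail])
    return float(lower), float(upper)

def is_significant(samples, level=0.95):
    """True when the credible interval lies entirely above or below zero."""
    lower, upper = credible_interval(samples, level)
    return lower > 0.0 or upper < 0.0

rng = np.random.default_rng(1)
# Hypothetical posterior draws for two coefficients on the log relative risk scale
beta_age65 = rng.normal(0.30, 0.05, size=5000)
beta_rural = rng.normal(0.01, 0.05, size=5000)

age_significant = is_significant(beta_age65)
rural_significant = is_significant(beta_rural)
```

Exponentiating the interval limits gives the corresponding interval for the multiplicative effect on the relative risk.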

Sample characteristics
On average, 31.6% of Georgia county residents were African American or black, while the percentage of persons aged ≥65 years was 15.6%. The mean percentage of persons having at least a bachelor's degree in the counties was 17.5%, while the overall percentages of persons below poverty and foreign born were 21.6% and 4.6%, respectively, with an average of 60.% rural population among all counties. Overall, the median annual family income was $51,116 and the mean percentage of unemployed was 9.1% (Table 1).
Risk factors from non-spatial and spatial models
Number of persons above age 65 years and below poverty, higher median family income, number of foreign born and unemployed were risk factors independently associated with prostate cancer risk in the non-spatial model (Fig. 1).
Except for number of foreign born, all these significant risk factors in the non-spatial model were also significant in the spatial model with the same direction of effects (Fig. 2).
Mapping predicted risk of prostate cancer incidence from the Bayesian spatial model
Substantial geographical variations in prostate cancer incidence were found in the study (Fig. 3). In addition, we presented the web-based interactive map of Fig. 3 in the supplementary material online. The predicted mean relative risk (RR) was 1.20 with a range of 0.53 (95% CI: 0.34, 0.78) to 2.92 (95% CI: 2.13, 3.86). Individuals residing in Towns, Clay, Union, Putnam, Quitman, and Greene counties were at increased risk of prostate cancer incidence while those residing in Chattahoochee were at the lowest risk of prostate cancer incidence.
Figs. 4 and 5 present the predictive maps of the probability that the relative risk exceeds 1.5 and 2, respectively, in a given county in Georgia. We also presented web-based interactive versions of these maps. The probability that the relative risk exceeds 1.5 is highest in Union, Towns, Putnam, Greene and Quitman counties (Fig. 4). The probability that the relative risk exceeds 2 is highest in Towns County, with a probability of 0.99 (Fig. 5).

Discussion
This study set out to use Bayesian geospatial methods to model and map prostate cancer incidence in Georgia counties, and to evaluate county sociodemographic factors associated with high incidence of prostate cancer, for the purpose of optimal planning of prostate cancer interventions amidst limited public health resources. Critical risk factors for prostate cancer identified in the present study included the number of persons above 65 years of age, the number below poverty, median family income, and the numbers of foreign born and unemployed persons in counties. In contrast to previous studies [5,7], our study did not find an association between prostate cancer incidence and the proportions of blacks and rural population.
One of the important aims of this study is the identification of high-risk counties for public health interventions amidst limited public health resources. This is critical because people's residential location could act as a marker for the socioeconomic, personal, and climatic/environmental factors that influence access to healthcare services and general health. Thus, spatial modelling and mapping provide the tools required to obtain an improved understanding of health outcomes by place for targeted public health interventions [7,23–27]. The predicted relative risk ranged from 0.53 (95% CI: 0.34, 0.78) in Chattahoochee to 2.92 (95% CI: 2.13, 3.86) in Towns, with a mean of 1.20. The study identified Towns (RR = 2.92) as the county with the highest prostate cancer incidence. Other counties with relatively high incidence included Clay (RR = 2.55), Quitman (RR = 2.39), Union (RR = 2.30), Greene (RR = 2.14) and Putnam (RR = 2.13). On closer examination of the high-risk counties, we observed that, despite Towns County in the north of Georgia being predominantly white and better educated (25.1% with a Bachelor's degree), the main driver of its risk was its older population, with the largest proportion of persons at least 65 years of age (33.1%). While advancing age is a well-known risk factor for prostate cancer, Clay and Quitman Counties also suggest that low educational attainment (7.4% and 8.5% with a Bachelor's degree), high unemployment (18.9% and 18.5%) and individual poverty (39.8% and 25.6%) may be additional risk factors in black communities. Exactly how these socioeconomic indices may impact prostate cancer risk within older black populations is not well known, but high cigarette use and alcohol consumption as well as poor diet have been hypothesized to mediate or moderate this risk [28].
Furthermore, exposure to water, air and soil pollution from the farming of cash crops such as cotton, from the southwest through to central Georgia, may also be involved [29]. As neighbouring lower-risk counties with large or predominantly black populations likely shared these environmental conditions with Clay and Quitman, our modelling suggests that prostate cancer risk in both communities is multifactorial, resulting from a possible confluence of negative lifestyle, economic and environmental factors experienced over long periods of time.
In comparing the high-risk counties with Chattahoochee and rural low-risk counties, we observed that population age was the single most obvious distinction. Low risk counties had a smaller proportion of elderly persons, irrespective of whether they were classified as rural, and in particular, Chattahoochee had the youngest population (3.8% 65 years and older) with the highest educational attainment (30% with a Bachelor's degree).
Our study supports the findings of others who reported geographical differences in health outcomes such as prostate and lung cancers, malaria, malnutrition and mortality, among others [5,7,23–25,30]. Against the backdrop of a national reduction in incident prostate cancer, there remain pockets of high risk in the north, southwest and central areas of Georgia. The present study suggests that there may be racial differences in prostate cancer risk within counties. The aging population may be the main risk factor in overwhelmingly white counties, while limited education and poverty may play a larger role in black counties. It should be noted that although several counties with large African American populations were observed to have a high risk of prostate cancer incidence, the present study found no association between race and prostate cancer risk, in part because these counties tended to be considerably smaller than predominantly white counties. Importantly, this is an ecological study, and the associations discussed herein should not be regarded as causal or necessarily significant at the level of individual prostate cancer patients.
Prostate Specific Antigen (PSA) screening has driven prostate cancer diagnosis since the 1980s [31,32]. However, this reliance on PSA has come at the cost of overtreatment and its complications among many low-risk men, and in May 2012 the US Preventive Services Task Force (USPSTF) recommended against routine PSA screening for all men [32,33]. While current diagnostic practices among prostate cancer patients may be of interest, and the period covered by the present study may represent a substantial post-recommendation period, our study design prevents comparisons that are better made over time among individual patients managed by primary care physicians [32]. Furthermore, we did not include individual-level diagnostic data in our analysis. With these constraints in mind, our results are best suited for hypothesis generation.

Strengths and limitations
The use of Bayesian spatial analysis methods in this study provided an essential tool for investigating prostate cancer incidence in relation to risk factors, helping to improve understanding of the spatial distribution and potential etiologic mechanisms of prostate cancer using SEER data, an internationally recognised gold standard. Our modelling approach also allowed counties with small counts to borrow information from their neighbouring counties, thereby reducing the risk of inflated relative risk estimates due to small expected counts. Furthermore, unlike the frequentist spatial modelling approach, our Bayesian spatial modelling approach allowed graphical presentation of the posterior distribution of risk factor effects on prostate cancer incidence, as presented in Figs. 1 and 2. The present study might have left out some potential risk factors that could explain some of the geographical differences in prostate cancer observed in the study, so the findings should be interpreted with caution. Our findings broadly support previous studies [2,15–17,34] reporting that older age (≥65 years), income (number below poverty and median family income), race (being foreign born) and unemployment are critical risk factors for prostate cancer. For example, the finding that the number of persons aged 65 years or older increased the risk of the disease supports previous studies that reported that prostate cancer risk increases with age, with an incidence rate of over 60% [34–36]. The finding that an increased number of foreign born increases the risk of prostate cancer supports previous studies that reported prostate cancer inequality by race [7].

Conclusion
Our modelling approach captured variation in prostate cancer risk over the whole of Georgia. The risk maps indicate substantial geographical variations in the risk of prostate cancer. These maps can be used as an effective tool in the identification of counties that require targeted interventions and further research by program managers and implementers as part of an overall strategy for reducing the prostate cancer burden in Georgia and the U.S. as a whole. For example, further research could aim at identifying as yet unidentified risk factors that might account for the geographical differences in prostate cancer that we observed among Georgia counties after accounting for the risk factors in our model.
Fig. 5 Predictive map of the exceedance probability of relative risk of 2 (i.e. P(RR > 2)). Source: This map was produced by the authors