Study population
Case-probands were selected if they met the following eligibility criteria: all farmers who had died of a primary malignancy of the lung (recorded as the underlying cause of death in their death records) in three counties in Anhui during the ten-year period 1995–2004. Spouses of probands were included in the control group if they were themselves farmers and unaffected by lung cancer. The families of probands and controls had lived in rural regions for more than 20 years.
As recently as 20 years ago, the rural population of Anhui experienced very little migration, and its cultural patterns and ethnicity have retained distinct demographic characteristics. The use of spouse-matched controls can therefore be expected to control for potential confounding by differences in cultural and residential environment, e.g. diet, drinking water and socioeconomic status.
First-degree relatives (parents, full siblings) of probands were designated as case families. The control families comprised first-degree relatives (parents, full siblings) of the spouses of probands. Thus, in the following sections of this paper, the term "family", when applied to the study population, never includes the proband or spouse.
Data collection
A complete list of all deaths satisfying the eligibility criteria was obtained from the local Tumor Prevention and Treatment Office of the Health Bureau in the rural study area. Standard demographic characteristics of the probands and the identities of their next-of-kin were obtained from the probands' death records. Local community doctors were recruited to help initiate contact with the family members of each case-proband. Using the identity of the proband's next-of-kin and the usual address of the deceased, the doctors compiled a list of addresses of all family members in each pedigree.
Trained interviewers following standard protocols obtained information on each family member through face-to-face interviews with (in order of preference) the family member concerned (unless deceased, as was the proband), his or her spouse, a parent, a sibling, or an adult offspring. Cancer histories were verified by two methods: 1) review of death certificates for a sample (90.4 and 88.4 percent, respectively) of the relatives of probands and of spouses who were reported to have died in rural regions of Anhui, and 2) corroborating information from additional family contacts. Because only first-degree relatives (parents and siblings) were included in the study, bias introduced by the inability to verify all cancer deaths should be minimal[12].
To protect human subjects, all participants in this study signed an informed consent form in accordance with the guidelines of the World Medical Association Declaration of Helsinki.
Data analysis
Frequency statistics of the study population were computed. Differences in mean age between proband and spouse relatives were tested for significance using two-sided t tests. Contingency table chi-square tests were used to determine whether the distribution of relatives was equivalent between study groups. For each family, design variables were coded 1 for the presence and 0 for the absence of each of the following among the proband's or spouse's relatives: one, two, three, and four or more cancers. Logistic regression analysis was applied to the data to predict whether a family belonged to the case or control group from these design variables. The resulting regression coefficients ($\beta_i$) were used to calculate the relative risks of cancer by the formula $\mathrm{OR}_i = e^{\beta_i}$, where $\mathrm{OR}_i$ is the odds ratio for the $i$th variable. To determine whether differences in environmental exposure between the case and control families could explain the difference in cancer occurrence, a second logistic model was fitted to predict cancer occurrence in each family member from age, sex, occupational/industrial exposure, pack-years of tobacco exposure, and a variable indicating family membership (case or control). Because the excess risk of cancer in the proband families remained statistically significant after this adjustment, a test of homogeneity was performed to determine whether the study groups differed in their distribution of cancer types. Contingency table chi-square tests were then applied to isolate where these differences occurred.
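The coding of the design variables and the conversion $\mathrm{OR}_i = e^{\beta_i}$ can be illustrated with a minimal Python sketch. It assumes the four design variables are the only covariates; in that case the logistic model is saturated and its maximum likelihood estimates have a closed form, so each $e^{\beta_k}$ equals the crude odds ratio of category $k$ against families with no cancers. With additional covariates, as in the second model described above, an iterative fit would be needed instead. The function names and family counts are hypothetical illustrations, not the authors' SAS code or study data.

```python
import math
from collections import Counter

# Hypothetical names for the indicator (design) variables; a family with
# 0 cancers among its first-degree relatives is the reference (all zeros).
DESIGN_NAMES = ("one", "two", "three", "four_plus")


def design_variables(n_cancers):
    """Return the 1/0 design variables for one family's cancer count."""
    return {
        "one": int(n_cancers == 1),
        "two": int(n_cancers == 2),
        "three": int(n_cancers == 3),
        "four_plus": int(n_cancers >= 4),
    }


def category_index(x):
    """Map a design-variable dict back to 0 (reference) or 1..4."""
    for k, name in enumerate(DESIGN_NAMES, start=1):
        if x[name]:
            return k
    return 0


def saturated_logit_fit(case_families, control_families):
    """Closed-form MLE of logit P(case) = b0 + sum_k b_k * x_k.

    Valid only when the mutually exclusive design variables are the sole
    covariates (a saturated model), so fitted probabilities equal the
    observed case proportions. Every category needs at least one case
    family and one control family.
    """
    cases = Counter(category_index(design_variables(n)) for n in case_families)
    controls = Counter(category_index(design_variables(n)) for n in control_families)

    def log_odds(k):
        return math.log(cases[k] / controls[k])

    b0 = log_odds(0)
    betas = {k: log_odds(k) - b0 for k in range(1, 5)}
    return b0, betas


def odds_ratios(betas):
    """OR_i = exp(beta_i) for each design variable."""
    return {k: math.exp(b) for k, b in betas.items()}


# Hypothetical cancer counts per family (not study data).
case_families = [0, 0, 1, 1, 1, 2, 2, 3, 4, 5]
control_families = [0, 0, 0, 0, 0, 1, 1, 2, 3, 6]
b0, betas = saturated_logit_fit(case_families, control_families)
print(odds_ratios(betas))  # ORs of about 3.75, 5.0, 2.5 and 5.0 for 1, 2, 3 and >=4 cancers
```

The closed form is used only to keep the sketch dependency-free; a general-purpose logistic regression routine would give the same coefficients for this saturated coding.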
Crude odds ratios (ORs) were calculated as estimates of the relative risks, and Woolf's method was used to determine 95% confidence intervals (CIs)[13]. Maximum likelihood estimates of adjusted ORs were obtained from unconditional logistic regression analysis using the PHREG procedure in SAS software[14].
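Woolf's interval is computed on the log-odds scale, using $\mathrm{SE}(\ln \mathrm{OR}) = \sqrt{1/a + 1/b + 1/c + 1/d}$ for a 2×2 table with cells $a, b, c, d$, and is then exponentiated back to the OR scale. A minimal Python sketch follows; the cell labels, function name and counts are hypothetical, and it covers only the crude ORs, not the adjusted ORs obtained in SAS.

```python
import math


def woolf_or_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with Woolf (logit) confidence interval for a 2x2 table.

    a = exposed cases,    b = unexposed cases
    c = exposed controls, d = unexposed controls
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf's method requires all four cells to be non-zero")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not study data.
or_, lo, hi = woolf_or_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which is the expected behavior of Woolf's method.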
Variables examined as possible confounders and effect modifiers included: number of first-degree relatives (2–4, 5–7, 8–10, >10); smoking status (never, ex-smoker, current smoker); smoking duration (never, 1–29 yr, 30–45 yr, >45 yr); amount smoked (none, <20 cigarettes per day (cpd), 20–30 cpd, 31–40 cpd, >40 cpd); pack-years of smoking (packs of cigarettes smoked per day multiplied by years of smoking, with leaf tobacco converted to cigarette equivalents assuming 1 g per cigarette; categories: none, >0–20, >20–40, >40 pack-years); residence (females only; the three selected counties vs other counties); age (<55 yr, 55–65 yr, >65 yr); ethnicity (Han vs others); sex; passive smoking exposure (ever/never and by total years); education (high school or less vs more than high school, and by level: none, primary school, middle school, high school, >high school); marital status (never married; married; widowed, divorced or separated); high-risk industry/occupation (employment in jobs with exposure to asbestos, benzene, beryllium, bis(chloromethyl) ether, ceramic dust, talc, chemical fertilizers, chromium or chromates, coke oven emissions, dyes, glues, lacquers, fiberglass, cotton dust, insecticides, pesticides, herbicides, fungicides, isopropyl oil, paint sprays, petroleum products, radioactive materials, vinyl chloride or explosives); cumulative smoky coal use for each individual (the annual rate of smoky coal use multiplied by the number of years of use; household coal consumption was generally constant over the life cycle of the family, and three exposure categories were formed: >0–70, 70–140, and >140 tons); alcohol consumption (ever/never); vital status; type of respondent (self/spouse/other); and history of COPD (yes/no).
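The two derived exposure measures above, pack-years and cumulative smoky coal use, are simple products followed by categorization. The Python sketch below shows one way to compute them. It assumes 20 cigarettes per pack, treats each stated upper bound as inclusive (the text's smoky coal categories share the boundary at 70 tons, so assigning 70 tons to the lowest category is an assumption), and labels non-users of smoky coal as a hypothetical "unexposed" reference group. All names and values are illustrative.

```python
# Hypothetical helpers; category boundaries follow the text, with each upper
# bound treated as inclusive (an assumption where the text's ranges overlap).
CIGARETTES_PER_PACK = 20   # assumption: standard 20-cigarette pack
GRAMS_PER_CIGARETTE = 1.0  # leaf tobacco conversion stated in the text


def pack_years(cigarettes_per_day, years, leaf_tobacco_g_per_day=0.0):
    """Packs smoked per day multiplied by years of smoking, counting leaf
    tobacco as cigarette equivalents at 1 g per cigarette."""
    cigarette_equivalents = (
        cigarettes_per_day + leaf_tobacco_g_per_day / GRAMS_PER_CIGARETTE
    )
    return cigarette_equivalents / CIGARETTES_PER_PACK * years


def pack_year_category(py):
    """Categories from the text: none, >0-20, >20-40, >40 pack-years."""
    if py <= 0:
        return "none"
    if py <= 20:
        return ">0-20"
    if py <= 40:
        return ">20-40"
    return ">40"


def cumulative_smoky_coal(tons_per_year, years):
    """Annual smoky coal use multiplied by the number of years of use (tons)."""
    return tons_per_year * years


def smoky_coal_category(total_tons):
    """Categories from the text: >0-70, 70-140, >140 tons."""
    if total_tons <= 0:
        return "unexposed"  # assumption: reference group for non-users
    if total_tons <= 70:
        return ">0-70"
    if total_tons <= 140:
        return "70-140"
    return ">140"


# Hypothetical individuals, not study data.
print(pack_year_category(pack_years(20, 30)))  # 30 pack-years -> ">20-40"
print(pack_year_category(pack_years(10, 50, leaf_tobacco_g_per_day=10)))  # 50 -> ">40"
print(smoky_coal_category(cumulative_smoky_coal(2.5, 40)))  # 100 tons -> "70-140"
```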