When considering the results presented, a number of limitations should be borne in mind. Considering first the lung cancer mortality data extracted from the WHO database, one should note that they are available only for all lung cancer and not by histological type, and that diagnosis may be inaccurate, with misdiagnosis rates varying by country and time. Although the definition of lung cancer under the various revisions of the ICD relevant to this report is essentially unchanged, coding practices may have varied. Excessive use of codes for ill-defined and unknown causes and incomplete death registration coverage may have detracted from the quality of the data, with only 33% of relevant countries recently assessed as providing “high quality” data. For some countries, data relate only to selected regions (Table 1), with data for China derived from a sample registration scheme including less than 10% of all deaths occurring in the country.
Furthermore, though survival rates remain very poor, trends in mortality may not necessarily reflect trends in disease incidence. Cancer incidence rates are available, but for a far narrower range of countries and time periods.
There are also a number of limitations with the data on relative risk by smoking habit obtained from the IESLC database. These include variations in definition of smoking, definition of disease and extent of adjustment for confounders, and bias due to misclassification of smoking status. These and some other issues are also discussed in the first paper on IESLC, but some of the principal points are considered below.
As regards definition of smoking, relative risks were selected for smoking of any product, if available, and of cigarettes (or cigarettes only) otherwise. In countries where pipe and cigar smoking is rare, this distinction may be of little consequence, but it may be more important in some countries. The type of cigarette smoked is also relevant. Though no clear difference in risk has been noted between the flue-cured cigarettes smoked in the UK and various other (mainly Commonwealth) countries [2, 32] or between mentholated and unmentholated cigarettes, there is clear evidence that risk is greater for handrolled than for manufactured cigarettes, for black than for blond tobacco cigarettes, and for higher tar plain cigarettes than for lower tar filter cigarettes.
As can be seen in Table 3, variation exists in the definition of all lung cancer, squamous and adeno. While for the great majority of studies the definitions include, respectively, all cases, only cases of squamous cell carcinoma, and only cases of adenocarcinoma, in a small number of studies alternative definitions were allowed. Thus, for all lung cancer our definitions also include (i) all cases other than alveolar cell cancer, (ii) all cases except lung cancers of mixed cell types, (iii) only cases of squamous cell carcinoma and adenocarcinoma, (iv) as definition (iii) but also small cell carcinoma, and (v) as definition (iv) but also large cell carcinoma. Definitions of “squamous” also include (i) Kreyberg I lung cancers, (ii) all lung cancers except adenocarcinoma, (iii) squamous cell and differentiated carcinomas, and (iv) squamous cell and small cell carcinomas. Definitions of “adeno” also include (i) Kreyberg II lung cancers, (ii) adenocarcinomas and large cell carcinomas, (iii) all lung cancers except squamous cell and undifferentiated carcinomas, and (iv) all lung cancers except squamous cell and small cell carcinomas. While it would have been possible to make the data “purer” by omitting such alternative definitions (and also by allowing only data for smoking of any product), this would have reduced the number of studies available and so lost power.
A related issue is change over time in the diagnosis of lung cancer types. Though it is generally recognized that the relative frequency of adenocarcinoma to squamous cell carcinoma has changed over time (e.g. [16, 36]), there are reports [37, 38] of studies which re-evaluated diagnoses conducted in previous years, finding that many lung cancers initially considered to be squamous cell carcinomas should, according to more modern criteria, be considered adenocarcinomas.
Although we preferred to use unadjusted relative risks as being directly relevant to the national mortality rate, we did include adjusted relative risks for squamous and adeno due to the scarcity of unadjusted data. This is unlikely to have had any major effect as we previously demonstrated that adjustment had little effect on the relative risks.
The issue of misclassification of smoking status is perhaps more serious. Some years ago, we carried out extensive work on the misclassification of smoking status and the effect it has in biasing the estimates of the association between environmental tobacco smoke exposure and lung cancer [39–42]. For many of our calculations we assumed that, in Western populations, the bias may be equivalent to that caused by 2.5% of ever smokers with average lung cancer risk reporting that they have never smoked. For Asian populations, the percentage is clearly higher (see e.g. ), perhaps 10% or 20%. If these rates apply, and there are considerable uncertainties [39, 44], misclassification will have a marked effect on the estimated lung cancer death rates in never smokers.
To illustrate this, consider a population in which 50% have ever smoked, and in which the true relative risk for ever vs never smoking is 8. Suppose also that the overall lung cancer death rate is 45. Based on these “true” data, the indirect estimates of rates by our method would be 10 in never smokers and 80 in ever smokers. If in fact 2.5% of ever smokers are misclassified as never smokers, one can then readily show that one will observe 48.75% to have smoked, and a relative risk of 6.83. Based on the “observed” data, the estimated rates will then still be 80 in ever smokers but will be 11.7, not 10, in never smokers. For misclassification rates of 10% and 20%, the estimated rates in never smokers will be higher still, respectively, 16.4 and 21.7, corresponding to “observed” relative risks of 4.89 and 3.69. The extent of the bias increases, not only with the misclassification rate, but also with the true proportion of ever smokers.
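The arithmetic in this illustration can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, and it assumes that the indirect estimate takes the form never-smoker rate = overall rate / (1 − p + p × RR), with p the proportion of ever smokers, and that misclassified ever smokers carry the average ever-smoker risk. Both assumptions are consistent with the figures quoted above.

```python
def indirect_rates(overall_rate, prop_ever, rr):
    """Split an overall rate into (never, ever) smoker rates.

    Assumed form: overall = never * ((1 - p) + p * RR).
    """
    never = overall_rate / ((1 - prop_ever) + prop_ever * rr)
    return never, never * rr


def observed_inputs(true_prop_ever, true_rr, miscl):
    """Observed ever-smoker proportion and RR when a fraction `miscl`
    of true ever smokers report that they have never smoked."""
    moved = true_prop_ever * miscl          # ever smokers reported as never
    obs_prop_ever = true_prop_ever - moved
    obs_prop_never = 1 - obs_prop_ever
    # Risk of reported never smokers relative to true never smokers (risk 1).
    never_risk = ((1 - true_prop_ever) + moved * true_rr) / obs_prop_never
    return obs_prop_ever, true_rr / never_risk


def biased_estimates(overall_rate, true_prop_ever, true_rr, miscl):
    """Rates estimated from the observed (misclassified) inputs."""
    obs_prop, obs_rr = observed_inputs(true_prop_ever, true_rr, miscl)
    never, ever = indirect_rates(overall_rate, obs_prop, obs_rr)
    return never, ever, obs_prop, obs_rr


if __name__ == "__main__":
    for miscl in (0.0, 0.025, 0.10, 0.20):
        never, ever, prop, rr = biased_estimates(45, 0.5, 8, miscl)
        print(f"misclassified {miscl:.1%}: observed ever {prop:.2%}, "
              f"RR {rr:.2f}, never {never:.1f}, ever {ever:.1f}")
```

Changing the second argument of `biased_estimates` lets one explore how the bias depends on the true proportion of ever smokers.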
Other limitations concern combining the relative risk data from IESLC with the national rates from WHO. One relates to the fact that most of the relative risk estimates derive from studies that are not nationally representative but are drawn from populations of a variety of types. We have sought to minimize this problem by excluding studies conducted in populations that were grossly unrepresentative, as described in the Methods section. Relative risks based on a variety of populations are frequently subject to meta-analysis in an attempt to get an overall average risk which can be taken to apply generally, and our use of relative risks derived from somewhat unrepresentative populations involves essentially the same underlying assumption.
Lack of national representativeness of the IESLC study populations will also mean that the estimated distribution of smoking habits may not be the same as that seen in the country where the study was conducted. If the at-risk population in a cohort study (or the control population in a case-control study) contains too low a proportion of ever smokers, national rates in both ever and never smokers will be overestimated, and if it contains too high a proportion they will be underestimated. For example, assuming that the relative risk is 9, the national lung cancer rate is 100 and the national population actually contains 50% ever smokers, the true rates of 20 in never smokers and 180 in ever smokers will be estimated as 23.8 and 214.3 if the control/at-risk population contains 40% ever smokers, and as 17.2 and 155.2 if the population contains 60% ever smokers. Such biases seem unlikely to affect our conclusions, as they seem much smaller than the marked differences seen by region and period. In any case, it is unclear why such biases should cause spurious regional differences or trends.
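The example figures in this paragraph can be checked with a minimal Python sketch. The function name is hypothetical, and the formula (never-smoker rate = national rate / (1 − p + p × RR), with p the proportion of ever smokers in the control or at-risk population) is an assumed form of the indirect estimate that reproduces the numbers quoted.

```python
def estimated_rates(national_rate, study_prop_ever, rr):
    """Estimated (never, ever) smoker rates when the study population's
    ever-smoker proportion is used in place of the national one."""
    never = national_rate / ((1 - study_prop_ever) + study_prop_ever * rr)
    return never, never * rr


if __name__ == "__main__":
    # National rate 100, relative risk 9; true national proportion is 50%.
    for prop in (0.4, 0.5, 0.6):
        never, ever = estimated_rates(100, prop, 9)
        print(f"study ever smokers {prop:.0%}: "
              f"never {never:.1f}, ever {ever:.1f}")
```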
Another issue relates to which WHO 5-year period to use for a given study. For case-control studies we use the midpoint year of the interviews, while for prospective studies we use a survival-adjusted midpoint of the follow-up period. Although both choices are open to question, they are unlikely to cause any major error. The same applies to the use of substitute years (see Table 1), which was needed relatively rarely and sometimes involved only quite small differences in time.
A major feature of our methodology is that it applies all-age relative risks from studies based on populations of varying ages to estimate lung cancer rates by smoking habit for age 70–74, based on overall WHO rates for that age group. This issue is discussed in the Methods section “Testing the validity of the method with respect to age”, which justifies our decision to select age 70–74 rather than any other age range and points out that studies of young populations were excluded from consideration. It should also be noted that age-specific data on lung cancer relative risks are very limited, and even then are not for five-year age groups. Any weaknesses resulting from the decision to use age 70–74 rates seem likely to apply similarly in the various studies considered, and should therefore not affect conclusions regarding variations by sex, region and time period.
We should also point out that our meta-regressions are relatively limited. Better understanding of patterns in rates over time and region may be gained by additional analyses which take into account aspects of the studies used to generate the rates. The relevant data for others to attempt this are available from the Tables in this report and from our original paper based on the IESLC database.