Systematic review with meta-analysis of the epidemiological evidence in the 1900s relating smoking to lung cancer

Background: Smoking is a known lung cancer cause, but no detailed quantitative systematic review exists. We summarize evidence for various indices.
Methods: Papers published before 2000 describing epidemiological studies involving 100+ lung cancer cases were obtained from Medline and other sources. Studies were classified as principal, or subsidiary where cases overlapped with principal studies. Data were extracted on design, exposures, histological types and confounder adjustment. RRs/ORs and 95% CIs were extracted for ever, current and ex smoking of cigarettes, pipes and cigars and indices of cigarette type and dose–response. Meta-analyses and meta-regressions investigated how relationships varied by study and RR characteristics, mainly for outcomes exactly or closely equivalent to all lung cancer, squamous cell carcinoma (“squamous”) and adenocarcinoma (“adeno”).
Results: 287 studies (20 subsidiary) were identified. Although RR estimates were markedly heterogeneous, the meta-analyses demonstrated a relationship of smoking with lung cancer risk, clearly seen for ever smoking (random-effects RR 5.50, CI 5.07-5.96), current smoking (8.43, 7.63-9.31), ex smoking (4.30, 3.93-4.71) and pipe/cigar only smoking (2.92, 2.38-3.57). It was stronger for squamous (current smoking RR 16.91, 13.14-21.76) than adeno (4.21, 3.32-5.34), and evident in both sexes (RRs somewhat higher in males), all continents (RRs highest for North America and lowest for Asia, particularly China), and both study types (RRs higher for prospective studies). Relationships were somewhat stronger in later starting and larger studies. RR estimates were similar in cigarette only and mixed smokers, and similar in smokers of pipes/cigars only, pipes only and cigars only. Exceptionally no increase in adeno risk was seen for pipe/cigar only smokers (0.93, 0.62-1.40). RRs were unrelated to mentholation, and higher for non-filter and handrolled cigarettes. RRs increased with amount smoked, duration, earlier starting age, tar level and fraction smoked and decreased with time quit. Relationships were strongest for small and squamous cell, intermediate for large cell and weakest for adenocarcinoma. Covariate-adjustment little affected RR estimates.
Conclusions: The association of lung cancer with smoking is strong, evident for all lung cancer types, dose-related and insensitive to covariate-adjustment. This emphasises the causal nature of the relationship. Our results quantify the relationships more precisely than previously.


Background
It has been known for many years that smoking causes lung cancer. An association was clearly documented in case-control studies conducted in Germany in the 1930s [1], and in the United States and Great Britain [2,3] in the 1950s, and was strengthened by surveys of large cohorts. This led the US Surgeon General to conclude in 1964 [4] that "cigarette smoking is a cause of lung cancer in men, and a suspected cause of lung cancer in women". Further reports [5,6] have defined the relationship in more detail, and it has been estimated that, in the United States, 90% of male lung cancer deaths and 75%-80% of female lung cancer deaths are caused by smoking [7].
While some meta-analyses of the evidence have been published in recent years [8][9][10], none consider more than a relatively small fraction of the published evidence. We attempt to rectify this omission, though the sheer extent of the available data, and the resources available, have meant limiting attention to papers published in the last century and studies involving over 100 lung cancer cases. As will be seen, this still gives us an extensive database involving almost 300 studies.
Because the relationship of smoking to the two major types of lung cancer (squamous cell carcinoma and adenocarcinoma) is known to vary [5,6], we present detailed results relating, not only to total lung cancer risk, but also to these two histological types of lung cancer. We also present some more limited results for other lung cancer types. To provide a broad description of the relationship of smoking to lung cancer, we do not concentrate on a single primary analysis, but quantify the relationships to each of a range of indices of smoking, investigating how these relationships vary according to characteristics such as sex, age, location, study design, period considered, definition of exposure and extent of confounder adjustment. The style of this systematic review is similar to one we have recently published for smoking and COPD, chronic bronchitis and emphysema [11].

Methods
Full details of the methods used are described in Additional file 1: Methods, and are summarized below. Throughout this paper, we use the term relative risk (RR) to include its various estimators, including the odds ratio and the hazard ratio.

Inclusion and exclusion criteria
Attention was restricted to epidemiological prospective or case-control studies published up to and including 1999, which involved 100 lung cancers or more, and which provided RR estimates for one or more defined major, cigarette-type or dose-related smoking indices.
The "major indices" compare ever, current or ex smoking with never or non-current smoking, and refer to smoking of any product, cigarettes, pipes, cigars and combinations, or of specific types of cigarette. The "cigarette type indices" compare smokers of different types of cigarette: filter with plain, manufactured with handrolled and mentholated with non-mentholated. The "dose-related indices" concern amount smoked, age of starting to smoke, duration of smoking, duration of quitting, tar level, butt length or fraction smoked. Pack-years was not considered as it was felt more important to separate effects of extent and duration of exposure. Uncontrolled case studies were not included. There were no further exclusion criteria.

Literature searching
Between 1997 and 2001 potentially relevant papers were sought from Medline and Emtree searches, from British Library monthly bulletins, from files on smoking and health accumulated over many years by P N Lee Statistics and Computing Ltd, and from references cited in papers obtained, until ultimately no paper examined cited a paper of possible relevance not previously examined.

Identification of studies
Relevant papers were allocated to studies, noting multiple papers on the same study, and papers reporting on multiple studies. Each study was given a unique reference code (REF) of up to 6 characters (e.g. COMSTO or LUBIN2), based on the principal author's name and distinguishing multiple studies by the same author.
Some studies were noted as having overlaps with other studies. To minimize problems in meta-analysis arising from double-counting of cases, overlapping studies were divided into two categories, as shown in Additional file 2: Studies. The first category involved minor overlap, which could not be disentangled, and which it was decided to ignore. The second category contains sets of studies which probably or definitely overlap. Here the set member containing the most comprehensive data (e.g. largest number of cases or longest follow-up) was called the 'principal study', other members being 'subsidiary studies' only considered in meta-analyses where the required RR was unavailable from the principal study.

Data recorded
Relevant information was entered onto a study database and two linked RR databases. Data entry was carried out in two stages. In 1997-2002, data were entered on the first RR database for the major smoking indices, cigarette type indices, and amount smoked. In 2009-2010, data were entered on the second RR database for the remaining dose-related indices.
The study database contains a record for each study, describing the following aspects: relevant publications; study title; study design; sexes considered; age range, race(s) and other details of the population studied; location; timing and length of follow-up; whether principal or subsidiary, with details of overlaps or links with other studies; number of cases and extent of histological confirmation; number of controls or subjects at risk; types of controls and matching factors used in case-control studies; use of proxy respondents, interview setting and response rates; confounding variables considered; availability of results by histological types; and availability of results for all smoking indices (including those indices not considered here, such as pack-years).
The RR databases hold the detailed results, typically containing multiple records for each study. Each record is linked to the relevant study and refers to a specific RR, recording the comparison made and the results. This record includes the sex, age range, race, lung cancer type, and (for prospective studies) the follow-up period. The smoking exposure of the numerator of the RR is defined by the smoking status (ever, current or ex), smoking product (e.g. any, cigarettes, cigarettes only, pipes only) and cigarette type (e.g. any, mainly handrolled cigarettes, filter cigarettes only, mentholated cigarettes). Similar information is recorded about the denominator of the RR. For dose-related indices, the level of exposure is recorded. The source of the RR is also recorded, as are details on adjustment variables. Results recorded include numbers of cases for the numerator and denominator, and, for unadjusted results, numbers of controls, persons at risk or person-years at risk. The RR itself and its lower and upper 95% confidence limits (LCL and UCL) are always recorded. These may be as reported, or derived by various means (see below), with the method of derivation noted.

Identifying which RRs to enter
RRs were entered relating to defined combinations of lung cancer type, smoking index (major, cigarette type or dose-related), confounders adjusted for, and strata, as described below.

Lung cancer type
Results were entered for all lung cancer, for Kreyberg I (as originally presented, or by combining squamous, small and large cell carcinoma) and Kreyberg II (as originally presented, or by combining adenocarcinoma and others not in Kreyberg I), and for squamous, small, and large cell carcinoma and for adenocarcinoma separately. Additionally, the following groups were constructed if not originally presented: all lung cancer or nearest equivalent, but at least squamous cell carcinoma and adenocarcinoma; squamous cell carcinoma or nearest equivalent; adenocarcinoma or nearest equivalent.

Major and cigarette type smoking indices
The intention was to enter RRs comparing current smokers, ever smokers or ex smokers with never or non smokers. Near-equivalent definitions were accepted when stricter definitions were unavailable, so that, for example, never smokers could include occasional smokers (or exceptionally, light smokers), while current smokers could include, and ex-smokers exclude, recent quitters. RRs were to be entered relating to smoking of defined products and, when the product related to cigarette smoking, to defined cigarette types (see also Additional file 1: Methods). If available, results (for each of current, ex and ever smoking) were entered for five comparisons: any product vs. never any product, cigarettes vs. never any product, cigarettes only vs. never any product, cigarettes vs. never cigarettes, and cigarettes only vs. never cigarettes (and also for five equivalent comparisons for current vs non smoking). Here "cigarettes" ignores whether other products (i.e. pipes and cigars) are also smoked, while "cigarettes only" excludes mixed smokers. Additionally, when the numerator related to the smoking of filter, handrolled or mentholated cigarettes, RRs were entered with the denominator defined as relating to plain, manufactured or non-mentholated smokers respectively.

Dose-related smoking indices
RRs were entered for seven measures: amount smoked, age of starting, duration of smoking, duration of quitting, tar level, butt length and fraction smoked. RRs were expressed relative to never smokers (or near equivalent), if available, or relative to non smokers otherwise. For duration of quitting, RRs were also expressed relative to current smokers. Except for amount smoked, further RRs were entered, restricted to smokers, and expressed relative to the level expected to have the lowest risk (e.g. shortest duration or latest age started).

Confounders adjusted for
For case-control studies, results were entered adjusted for the greatest number of potential confounding variables for which results were available, and also unadjusted (or adjusted for the smallest number of confounders). For prospective studies, results were entered adjusted for age and the greatest number of confounders, and for age only or age and the smallest number of confounders, with unadjusted results entered only if no age-adjusted results were available. These alternative RRs are subsequently referred to as "most-adjusted" and "least-adjusted". For dose-related RRs restricted to smokers, results with "most adjustment" but without adjustment for other aspects of smoking were also entered if available.

Strata
Three strata were considered: sex, age and race. Results were entered for males and females separately when available, with combined sex results only entered when sex-specific results were not available. Results were entered for all ages combined and for individual age groups, and for all races and for individual racial groups.

Derivation of RRs
Adjusted RRs and their 95% CIs were entered as provided, when available. Unadjusted RRs and CIs were calculated from their 2 × 2 table, using standard methods (e.g. [12]), noting any discrepancies between calculated values and those provided by the author. Sometimes the 2 × 2 table was constructed by summing over groups (e.g. adding current and ex smokers to obtain ever smokers) or from a percentage distribution. Various other methods were used as required to provide estimates of the RR and CI. Some more commonly used methods are summarized below, fuller details being given in Additional file 1: Methods.

Correction for zero cell
If the 2 × 2 table has a zero cell, 0.5 was added to each cell, and the standard formulae applied.
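The calculation of an unadjusted estimate from a 2 × 2 table, including the zero-cell correction, can be illustrated as follows. This is a minimal sketch only: it assumes the "standard formulae" are the odds ratio with the logit (Woolf) variance of its logarithm, which may differ in detail from the methods of [12], and the function name and argument layout are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2 x 2 table.

    a: exposed cases,    b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    Assumption: logit (Woolf) variance, var(log OR) = 1/a + 1/b + 1/c + 1/d.
    """
    # Zero-cell correction as described in the text: add 0.5 to every cell.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lcl = odds_ratio * math.exp(-z * se_log)
    ucl = odds_ratio * math.exp(z * se_log)
    return odds_ratio, lcl, ucl
```

The correction is applied only when a zero cell is present, so tables without zero cells are left unchanged.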

Combining independent RRs
RRs were combined over ℓ strata (e.g. from a 2 × 2 × ℓ table) using fixed-effect meta-analysis [13], giving an estimate adjusted for the stratifying variable.
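Combining independent stratum-specific RRs by fixed-effect meta-analysis can be sketched as below. The sketch assumes inverse-variance weighting on the log scale, with each standard error recovered from the 95% CI; the helper names are hypothetical and the implementation in [13] may differ in detail.

```python
import math

def se_from_ci(lcl, ucl, z=1.96):
    """Standard error of log RR implied by a CI symmetric on the log scale."""
    return (math.log(ucl) - math.log(lcl)) / (2 * z)

def fixed_effect(estimates, z=1.96):
    """Pool independent (rr, lcl, ucl) estimates with inverse-variance weights."""
    weights = [1 / se_from_ci(lcl, ucl, z) ** 2 for _, lcl, ucl in estimates]
    log_rrs = [math.log(rr) for rr, _, _ in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total
    se = math.sqrt(1 / total)
    return (math.exp(pooled),
            math.exp(pooled - z * se),
            math.exp(pooled + z * se))
```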

Combining non-independent RRs
The Hamling et al. method [14] was used (e.g. to derive an adjusted RR for ever smokers from available adjusted RRs for current and ex smokers, each relative to never smokers, or to combine adjusted RRs for several histological types, each relative to a single control group).

Estimating CI from crude numbers
If an adjusted RR lacked a CI or p-value but the corresponding 2 × 2 table was available, the CI was estimated assuming that the ratio UCL/LCL was the same as for the equivalent unadjusted RR.
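As an illustration of this rule, the sketch below transfers the UCL/LCL ratio of the unadjusted RR to the adjusted RR. It assumes the CI is symmetric about the RR on the log scale, so that the adjusted RR is the geometric mean of its limits; the function name is hypothetical.

```python
import math

def ci_from_unadjusted_ratio(adjusted_rr, unadjusted_lcl, unadjusted_ucl):
    """Estimate a CI for an adjusted RR so that UCL/LCL matches the unadjusted RR.

    Assumption: the CI is symmetric on the log scale.
    """
    ratio = unadjusted_ucl / unadjusted_lcl
    half_width = math.sqrt(ratio)  # multiplicative half-width of the CI
    return adjusted_rr / half_width, adjusted_rr * half_width
```

For a hypothetical adjusted RR of 3.0 and unadjusted limits of 1.0 and 4.0, the ratio is 4, giving estimated limits of 1.5 and 6.0.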

Data entry and checking
Master copies of all the papers in the study file were read closely, with relevant information highlighted to facilitate checking. Where multiple papers are available for a study, a principal publication was identified, although details described only in other publications were also recorded. Preliminary calculations and data entry were carried out by one author and checked by another, and automated checks of completeness and consistency were also conducted. RR/CIs underwent validation checks [15].

Meta-analyses conducted: overview
A pre-planned series of meta-analyses was conducted for various smoking indices for each of the three main outcomes (all lung cancer, squamous cell carcinoma, and adenocarcinoma) and also for some indices for two other outcomes (large cell carcinoma and small cell carcinoma). Nearest equivalent definitions are allowed for the three main outcomes, with the terms "squamous" and "adeno" used subsequently to distinguish these results from those specifically for these cell types. Each meta-analysis was repeated, based on most-adjusted RRs and on least-adjusted RRs. For each meta-analysis conducted, combined estimates were made first for all the RRs selected, then for RRs subdivided by level of various characteristics, testing for heterogeneity between levels.

Selecting RRs for the meta-analyses
All meta-analyses are restricted to records with available RR and CI values. The process of selecting RRs for inclusion in a meta-analysis must try to include all relevant data and to avoid double-counting. For a given analysis (e.g. of current cigarette smoking), several definitions of RR may be acceptable (e.g. cigarette smoking, or cigarette only smoking), so, for studies with multiple RRs, the one to be used is determined by a preference order defined for the meta-analysis. Preference orders may be required for smoking status, smoking product, the unexposed base, and extent of confounder adjustment. As the definitions of RR available may differ by sex (e.g. a study may provide RRs for any product smoking for males, but only for cigarette smoking for females), the RRs chosen for each sex may not necessarily have the same definition. Sexes combined results are only considered where sex-specific results are not available. Similarly, RRs from a subsidiary study are only used where eligible RRs are unavailable from the principal study. When multiple preference orders are involved, the sequence of implementation may affect the selection, so preferences for the most important aspects, usually concerning smoking, are implemented first.

Carrying out the meta-analyses
Fixed-effect and random-effects meta-analyses were conducted using the method of Fleiss and Gross [13], with heterogeneity quantified by H, the ratio of the heterogeneity chi-squared to its degrees of freedom, which is directly related to the statistic I² [16] by the formula I² = 100 (H − 1)/H. For all meta-analyses, Egger's test of publication bias [17] was also included.
Meta-analyses were conducted in various sets (A to N) corresponding to the sub-sections of the results section of the paper. A full list of the analyses is given in Additional file 1: Methods.
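The pooling and heterogeneity statistics described above can be sketched as follows. This is not the ROELEE implementation: it assumes inverse-variance weighting on the log scale and the DerSimonian-Laird moment estimator of the between-study variance for the random-effects estimate, and it truncates I² at zero when H is below 1. All names are hypothetical.

```python
import math

def meta_analysis(estimates, z=1.96):
    """Fixed- and random-effects pooling of (rr, lcl, ucl) estimates, with H and I2."""
    y = [math.log(rr) for rr, _, _ in estimates]
    v = [((math.log(ucl) - math.log(lcl)) / (2 * z)) ** 2 for _, lcl, ucl in estimates]
    w = [1 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Heterogeneity chi-squared (Q) and H = Q / degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    h = q / df if df > 0 else float("nan")
    i2 = 100 * (h - 1) / h if h > 1 else 0.0  # I2 = 100 (H - 1) / H, floored at 0

    # Assumed DerSimonian-Laird estimate of the between-study variance.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    swr = sum(w_re)
    rand = sum(wi * yi for wi, yi in zip(w_re, y)) / swr

    def with_ci(log_est, total_weight):
        se = math.sqrt(1 / total_weight)
        return (math.exp(log_est), math.exp(log_est - z * se), math.exp(log_est + z * se))

    return {"fixed": with_ci(fixed, sw), "random": with_ci(rand, swr), "H": h, "I2": i2}
```

When the estimates are homogeneous the between-study variance is zero and the two pooled estimates coincide; with marked heterogeneity the random-effects CI is wider than the fixed-effect CI.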

The major smoking indices
For the major smoking indices, the first four sets of meta-analyses relate to: A ever smoking, B current smoking, C ever smoking (but with current smoking used if ever smoking not available), referred to subsequently as "ever/current" smoking, and D ex smoking.
In what is referred to as the main analysis in each set, smoking of any product is preferred by selecting RRs in the following preference order: 1. smoking of any product vs. never smoked any product; 2. smoking of cigarettes vs. never smoked any product, 3. smoking of cigarettes only vs. never smoked any product; 4. smoking of cigarettes vs. never smoked cigarettes; 5. smoking of cigarettes only vs. never smoked cigarettes; with options 6-10 the same as options 1-5 except that "never smoked" is replaced by "never smoked near equivalent". A variant analysis prefers cigarette smoking (by changing the preference order to 4,5,2,3,1,9,10,7,8,6). In meta-analyses of type C, a further variant analysis reverses the preference so current smoking results are preferred to those for ever smoking, referred to subsequently as "current/ever" smoking. Other variant analyses are based on RRs for specified age ranges.
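The selection of a single RR per study by preference order can be sketched as below. The two orders are those stated in the text (options 1 to 10 for the main analysis, and 4,5,2,3,1,9,10,7,8,6 for the variant preferring cigarette smoking); the data structure and function name are hypothetical.

```python
# Definition codes 1-10 as listed in the text.
MAIN_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CIGARETTE_ORDER = [4, 5, 2, 3, 1, 9, 10, 7, 8, 6]

def select_rr(candidates, order):
    """Return (code, record) for the most preferred definition available.

    candidates: dict mapping a definition code (1-10) to the RR record
    available for one study and sex. Returns None if nothing is eligible.
    """
    for code in order:
        if code in candidates:
            return code, candidates[code]
    return None
```

For a hypothetical study offering definitions 1 and 4, the main analysis would select definition 1 and the cigarette-preferring variant would select definition 4, so each study contributes one RR to each analysis.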
A further set of meta-analyses, E, concerns smoking of pipes and/or cigars (but not cigarettes), referred to subsequently as smoking of "pipes/cigars only", smokers of pipes only, smokers of cigars only, and smokers of cigarettes and pipes/cigars ("mixed" smokers). Separate meta-analyses were conducted for ever smoking, current smoking, ever/current smoking, current/ever smoking and ex smoking.

The cigarette type indices
Meta-analyses were conducted, in set F, for only filter vs. only plain, ever filter vs. only plain, only filter vs. ever plain, handrolled vs. manufactured, and mentholated vs. non-mentholated. These were only conducted for ever/current smoking, and preferring RRs for cigarettes over RRs for cigarettes only. The analyses with only filter as the numerator used the preference order of filter only, always, mainly, both, equally, and ever, while the analyses with ever filter as the numerator used the reverse preference. Similar preference orders applied to the denominators. The analyses of handrolled vs. manufactured cigarettes used the preference order of any, both, mainly, and only for handrolled, and only ever, only current, any and ever for manufactured.

The dose-related smoking indices
For the dose-related indices, sets of meta-analyses were conducted for: G amount smoked, H age of starting to smoke, I duration of smoking, J duration of quitting compared to never smokers (or long-term ex smokers), K duration of quitting compared to current smokers (or short-term quitters), L tar level, and M butt length or fraction smoked (taking short butt length as being equivalent to a large fraction smoked). For any measure, a study typically provides a set of non-independent RRs, one for each dose-category, expressed relative to a common base. To avoid double-counting, only one RR from each set was included in any one meta-analysis. Two approaches were adopted. The first involves specifying a scheme with a number of levels of exposure ("key values"), then carrying out meta-analyses for each level in turn, expressed relative to never smokers. For an RR to be allocated to a key value, its dose-category has to include that key value and no other. Schemes with a few, widely spaced, key values tend to involve more studies, whereas schemes with more key values, closely spaced, involve RRs from fewer studies, but ones with dose-categories more closely clustered around the key value. The sets of key values used (with 999 indicating an open-ended category) were 5, 20, 45 and 1, 10, 20, 30, 40, 999 for amount smoked; 26, 18, 14 and 30, 26, 22, 18, 14, 10 for age of starting to smoke; 20, 35, 50 and 5, 20, 30, 40, 50, 999 for duration of smoking; 12, 7, 3 and 20, 12, 3 for duration of quitting vs. never; and 3, 7, 12 and 3, 12, 20 for duration of quitting vs. current. No key value analysis was conducted for tar level, or for butt length/fraction smoked. The second approach (not conducted for amount smoked) involves meta-analysis of RRs for the highest compared with the lowest categories of exposure within smokers available for each study.
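The allocation rule for key values can be sketched as below, using the amount smoked schemes given in the text. The sketch assumes that category boundaries are inclusive and that an open-ended category can be represented by an upper bound of 999; how boundaries were actually handled is described in Additional file 1 and may differ. The function name is hypothetical.

```python
def allocate_key_value(low, high, key_values):
    """Return the single key value inside the dose-category [low, high].

    Returns None if the category contains no key value or more than one.
    high=None denotes an open-ended category (assumed upper bound 999).
    """
    upper = 999 if high is None else high
    hits = [k for k in key_values if low <= k <= upper]
    return hits[0] if len(hits) == 1 else None
```

With the scheme 5, 20, 45, a hypothetical category of 1-9 cigarettes per day is allocated to key value 5, whereas a category of 10-19 contains no key value and is not used. With the finer scheme 1, 10, 20, 30, 40, 999, a broad category of 20-40 spans three key values and is also excluded, which illustrates why closely spaced schemes draw on fewer studies.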

Meta-regression analyses
While full multivariable analysis of the data is considered beyond the scope of this report, meta-regression analyses were also carried out using the sets of RRs selected for the main meta-analyses for ever smoking and for current smoking. Following preliminary meta-regressions (not shown), a "fixed model" was fitted to examine the effect on the results of six different categorical variables (sex, location, start year of study, major study type, number of lung cancer cases and number of adjustment factors). Note that the number of lung cancer cases (in the study as a whole), which is referred to subsequently as "number of cases", is used as an indicator of study size. The significance of each of these variables was estimated by an F-test based on the increase in deviance resulting from its exclusion from the basic model. A list of secondary variables was also defined (relating to more detailed aspects of location, outcome, study type and confounder adjustment, national cigarette tobacco type, the product smoked, the denominator used in the RR, use of proxy respondents, whether the study required 100% histological confirmation of lung cancer, whether the population studied worked in risky occupations, the age of the subjects, and the derivation of the RR) with the significance of adding each characteristic to the fixed model estimated by an F-test based on the increase in deviance. Fuller details are given in Additional file 1: Methods.

Additional analyses
Additional tests of the relationship of lung cancer risk to various characteristics of interest were based on identifying corresponding pairs of RR and CI estimates within the same study for the same definition of outcome and exposure, and deriving the ratio of the two RRs. Where the pairs involved independent sets of subjects, the variance of the ratio was also derived, and meta-analyses of the ratio were conducted. Where the pairs involved non-independent sets of subjects, the numbers of ratios greater and less than 1 were compared using the sign test. Tests of independent pairs related to sex (males vs. females), age (oldest vs. youngest age group) and race (white people vs. non-white or black people). Tests of non-independent pairs related to level of adjustment (most-adjusted vs. least-adjusted), and to comparisons of product smoked (mixed smokers vs. cigarette only smokers, and vs. smokers of pipes/cigars only). Tests were always carried out for all lung cancer and ever/current smoking. For sex, additional analyses were conducted for current and for ever smoking, for squamous and adeno, and also within level of amount smoked. For level of adjustment, two sets of analyses were run. The first, relating to RRs for ever/current smoking, was based on the most-adjusted/least-adjusted ratio, while the second, for highest vs. lowest RRs for age of starting to smoke, duration, years quit and tar level, compared RRs that were most- or least-adjusted for other aspects of smoking.
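The two kinds of within-study comparison can be sketched as follows. For independent pairs, the sketch assumes the variance of the log ratio is the sum of the variances of the two log RRs, each recovered from its 95% CI; for non-independent pairs it uses an exact two-sided binomial sign test that ignores ratios equal to 1. The function names are hypothetical and the details may differ from those in Additional file 1.

```python
import math

def ratio_of_rrs(rr1, ci1, rr2, ci2, z=1.96):
    """Ratio of two independent RRs with an approximate 95% CI.

    ci1 and ci2 are (lcl, ucl) tuples for rr1 and rr2 respectively.
    """
    se1 = (math.log(ci1[1]) - math.log(ci1[0])) / (2 * z)
    se2 = (math.log(ci2[1]) - math.log(ci2[0])) / (2 * z)
    log_ratio = math.log(rr1) - math.log(rr2)
    se = math.sqrt(se1 ** 2 + se2 ** 2)  # variances add for independent estimates
    return (math.exp(log_ratio),
            math.exp(log_ratio - z * se),
            math.exp(log_ratio + z * se))

def sign_test(n_above, n_below):
    """Exact two-sided sign test p-value for counts of ratios above and below 1."""
    n = n_above + n_below
    k = min(n_above, n_below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The ratios and CIs produced by `ratio_of_rrs` for each study could then be pooled in the same way as ordinary RRs.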

Software
All data entry and most statistical analyses were carried out using ROELEE version 3.1 (available from P.N. Lee Statistics and Computing Ltd, 17 Cedar Road, Sutton, Surrey SM2 5DA, UK). Some analyses were conducted using Quattro Pro 9 or Excel 2003.

Studies identified
A total of 5,993 potentially relevant papers were identified, providing information on 287 eligible studies (Table 1). Table 2 presents selected details of the 287 studies while Table 3 gives the distribution of their major characteristics. Additional file 2: Studies gives fuller descriptions of the studies.
Of the 287 studies, 267 are classified as principal, 209 (78.3%) of these being case-control studies, 52 (19.5%) prospective, 5 (1.9%) nested case-control and 1 (0.4%) case-cohort. Note that the last three study designs, where exposure was determined before diagnosis, are combined into one category in Table 3 (and the text below based on it). The other 20 studies are classified as subsidiary. Of the principal studies, 262 provide data for all lung cancer, 84 for squamous and 86 for adeno. Only rarely did these studies provide data only for squamous (1 study) or adeno (3 studies). The data come less often from case-control designs for all lung cancer (77.9%) than for squamous (86.9%) and adeno (87.2%).
Of the 267 principal studies, 158 (59.2%) provide results for both sexes, 90 (33.7%) for males only, and 19 (7.1%) for females only. One hundred and ninety-six (73.4%) of the studies included subjects who are under 30 years old (or allowed their inclusion by having no age restriction), while only 31 (11.6%) were restricted to subjects aged 40 or more. Subjects aged 80 years or more were included by 200 (74.9%), while only 16 (6.0%) were restricted to subjects aged 60 or less. Prospective studies were much more likely than case-control studies to specify age restrictions, e.g. 62.1% vs. 16 [...]
Of the total RRs, 5,061 relate to the major smoking indices, where the denominator is never or non smoking, with 3,614 of these relating to smoking of any product or cigarettes (regardless of pipe or cigar smoking), 678 to cigarette only smoking and 769 to pipe, cigar or mixed smoking. Four hundred and forty-eight relate to cigarette type comparisons, most commonly (303 RRs) to the filter vs. plain comparison. All the 25 RRs for the mentholated/non-mentholated comparison come from North American studies, while none of those for the handrolled/manufactured comparison do. There are 10,921 RRs for dose-related indices, based mainly on 3,625 sets, 2,047 vs. never or non smoking, 1,327 vs. the low level, and 251 vs. current smoking. There are most sets for amount smoked (1,145) and least for butt length (5). For amount smoked, age of starting, duration of smoking, years quit (vs. never and vs. current) there are sufficient numbers of dose-response sets to study variation in RR by sex, study type and continent.
None of the RRs included in the meta-analyses and meta-regressions show more than minor failures of the validation tests used, attributable to rounding errors or small imprecisions or uncertainties in estimating the RRs and CIs. Additional file 3: RRs provides further detail.
For dose-related indices, Additional file 4: Dose Not Meta gives results originally presented in forms unsuitable for meta-analysis.

The meta-analyses and meta-regressions
The main findings are summarized in the following sections, with tables and forest plots. Additional file 5: Detailed Analysis Tables fully presents all the meta-analyses and meta-regressions conducted. The interested reader should first see Additional file 1: Methods, which lists the other files, and describes their content and structure.
Findings are generally presented for three outcomes, referred to as "all lung cancer", "squamous" or "adeno". These outcomes are defined in the Methods section, and also in the footnotes to the tables, and allow the inclusion of results based on alternative similar definitions. (Note that the terms "squamous cell carcinoma" and "adenocarcinoma" are only used when reference is made to results specifically for the particular cell type).

A. Risk from ever smoking
Table 5 presents additional results subdivided by level of certain characteristics, while Table 6 presents results of some alternative meta-analyses of ever smoking. From these findings, various observations can be made.
First, the RRs for all three outcomes are markedly heterogeneous. As shown in Table 5 [...]
As shown in Table 6, the overall estimates for each outcome were virtually unchanged by using least-adjusted [...]
Returning to the main meta-analysis (most-adjusted and preferring ever smoking any product), there is a [...]
a RRs relating to all lung cancer (all), squamous cell carcinoma (squamous) or adenocarcinoma (adeno) or to a near equivalent definition.
b "never/non" indicates never smoker or non smoker.
c Sets are dose response results for an identical definition of smoking product, strata and confounding variables. Sets vs. low are sets where the RR has been calculated compared to the level with the lowest expected risk, i.e. the smallest amount smoked, or the latest age of starting. Non-categorical RRs are typically where RRs and CIs are not available, but some other information was provided.
d RRs from principal studies.
Figure 1 Forest plot of ever smoking of any product and all lung cancer, part 1. Table 5 presents the results of a main meta-analysis for all lung cancer based on 328 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 1, 2 [...]
Figure 2 Forest plot of ever smoking of any product and all lung cancer, part 2. This is a continuation of Figure 1, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 5. For study DORGAN separate estimates, within sex, are shown for whites then blacks. For study HUMBLE they are shown for non-hispanic whites then hispanics, and for study KELLER for whites then non-whites.
Figure 5 Forest plot of ever smoking of any product and all lung cancer, part 5. This is a continuation of Figure 4, presenting the remaining individual study data included in the main meta-analysis for all lung cancer shown in Table 5. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI. Note that the sizes of the squares for the two estimates from study LIU4 indicate the relative weight of the male and female data, but are not comparable with the sizes of the squares for the other estimates. Plot annotations: Total (95% CI) 5.50 (5.07, 5.96); * symbol size scaled separately from other studies.
Figure 6 Forest plot of ever smoking of any product and squamous, part 1. Table 5 presents the results of a main meta-analysis for squamous based on 102 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available [...]
[...] for the 107 RRs (mean 14.1). Again, BROWN2 and LUBIN2 were the largest contributors, providing, respectively, 24% and 6% of the total weight.
In investigating sources of heterogeneity, variation was studied firstly using a univariable approach, the results for the characteristics considered in Table 5 being summarized below, based on the random-effects estimates.

Sex
For all three outcomes, RRs were always somewhat lower for females than for males or for sexes combined, though the variation by sex was not significant (p ≥ 0.1) for squamous.
Figure 7 Forest plot of ever smoking of any product and squamous, part 2. This is a continuation of Figure 6, presenting the remaining individual study data included in the main meta-analysis for squamous shown in Table 5. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.
Figure 8 Forest plot of ever smoking of any product and adeno, part 1. Table 5 presents the results of a main meta-analysis for adeno based on 107 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 8, 9. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. For study SCHWAR separate estimates, within sex, are shown for whites then blacks.
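The figure captions define each study's weight as the inverse variance of its log RR. As a minimal sketch, assuming that a reported 95% CI is symmetric on the log scale and based on the usual normal multiplier of 1.96, the standard error and weight can be recovered from a published RR and CI as follows. The function name and the example values are hypothetical and are not taken from the review.

```python
import math

def log_rr_and_weight(rr, lower, upper, z=1.96):
    """Return (log RR, SE of log RR, inverse-variance weight).

    Assumes the 95% CI is symmetric on the log scale (an assumption,
    not something the review states for every extracted estimate).
    """
    log_rr = math.log(rr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    weight = 1.0 / se ** 2
    return log_rr, se, weight

# Hypothetical study: RR 2.0 with 95% CI 1.0 to 4.0
log_rr, se, weight = log_rr_and_weight(2.0, 1.0, 4.0)
print(f"log RR = {log_rr:.3f}, SE = {se:.3f}, weight = {weight:.2f}")
```

A narrower CI gives a smaller SE and therefore a larger weight, which is why the large studies are drawn with the largest squares in the forest plots.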

Location
For all three outcomes, RRs were lower from studies conducted in Europe and Asia than from studies conducted in North America. While for all lung cancer and adeno RRs were noticeably lower in Asia than in Europe, this difference was not evident for squamous. The difference in RRs by continent was very marked and highly significant (p < 0.001) for all lung cancer and adeno, but less marked, though still significant (p < 0.01) for squamous.

Start year of study
For all lung cancer and squamous, variation by start year was not significant (p ≥ 0.05) although there was some tendency for RRs to be higher in more recent studies. For adeno, the variation was significant (p < 0.01) but there was no clear trend.

Study type
For all three outcomes, RRs were somewhat lower for case-control studies than for prospective studies (or other study designs where the smoking data were collected before lung cancer diagnosis). However, the difference was never statistically significant (p ≥ 0.05).
Figure 9 Forest plot of ever smoking of any product and adeno, part 2. This is a continuation of Figure 8, presenting the remaining individual study data included in the main meta-analysis for adeno shown in Table 5. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

National cigarette tobacco type
For all three outcomes, there was significant (p < 0.01 or p < 0.001) variation. This was mainly due to low estimates in the "other" group, which mainly included results from China. For all lung cancer, RRs for Virginia (6.24, 5.16-7.54, n = 50) and blended (6.30, 5.79-6.87) were quite similar. For squamous and adeno, there were limited results for Virginia, and no clear difference from blended was evident.

Any proxy use
There was some evidence that RRs were higher where proxy respondents were used for squamous (p < 0.05) and adeno (p < 0.1), but not for all lung cancer.

Full histological confirmation
RR estimates were somewhat higher where full histological confirmation of diagnosis was a study requirement, but this was only significant at p < 0.05 for all lung cancer.

Number of cases
Some tendency for RRs to increase with increasing number of cases was evident for all three outcomes, but variation in number of cases was only significant for all lung cancer (p < 0.01).

Smoking product
The analyses in Table 5 are based on a preference order of any product, cigarettes (ignoring other products) and cigarettes only. For all lung cancer, where 205 of the 328 estimates were for any product, 114 were for cigarettes and 9 for cigarettes only, there was no evidence that the RRs included varied by smoking product. For squamous and adeno (both p < 0.001), however, RRs were lowest for smoking any product, intermediate for cigarettes, and highest for cigarettes only (though based on only two RRs for cigarettes only for each outcome).

Unexposed base
RRs were somewhat higher where the unexposed base group was never cigarettes than when it was never any product, though this was only significant (p < 0.05) for adeno. This result is somewhat counter-intuitive, as lower RRs might be expected where the base (never cigarettes) includes some smokers (pipe/cigar only), and probably arises from the strong correlation between the definitions of smoking product and unexposed base. Two combinations, any product vs. never any product (n = 203) and cigarettes vs. never cigarettes (n = 90), form a large proportion of the total RRs (with any product vs. never cigarettes not a valid possibility).

Number of adjustment factors
There was no evidence that RR estimates varied by whether they were adjusted for 0, 1 or 2+ potential confounding variables.
Between levels P B NS NS NS
a Within each study, results for ever smokers are selected in the following preference order, within each sex, for: smoking product - any, cigarettes (ignoring other products), cigarettes only; cigarette type - any, manufactured (with or without handrolled), manufactured only; unexposed group - never any product, never cigarettes, near equivalent (see Methods); follow-up period - longest available; lung cancer type - see notes c to e; race - all or nearest available, otherwise by race; overlapping studies - principal, subsidiary; age - whole study, widest available age group. Results are then selected for: sex - single sex results, combined sex results; adjustment for potential confounders - most available.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P H = probability value for heterogeneity expressed as p < 0.001, p < 0.01, p < 0.05, p < 0.1 or NS (p ≥ 0.1), P E = probability value for Egger's test of publication bias similarly expressed, P B = probability value for between levels (see Methods) similarly expressed.
c All or nearest available, must include at least squamous cell carcinoma and adenocarcinoma.
d Squamous cell carcinoma or nearest available, but not including adenocarcinoma.
e Adenocarcinoma or nearest available, but not including squamous cell carcinoma.
f Or nested case-control or case-cohort in the case of 5 estimates for all lung cancer, 4 for squamous and 4 for adeno.
g Or not known in the case of 10 estimates for all lung cancer, 2 for squamous, and 2 for adeno.
h In the study as a whole.
Table 6 Some alternative meta-analyses for ever smoking compared to those in Table 5 (columns: Analysis description; Statistic b; All lung cancer c; Squamous d; Adeno e)
The full meta-analysis (see Additional file 5: Detailed Analysis Tables) also includes results by levels of some other characteristics. In an attempt to evaluate the independent role of a whole range of characteristics, preliminary meta-regression analyses were conducted for each outcome (results not shown).
As a result, it was decided to present findings for a fixed model involving six major characteristics (see Table 7), test the effect of each by deleting each of the six individually from the fixed model (and also by allowing each to enter a step-wise model in order of significance), and test the effect of a range of other characteristics by adding each individually into the
Figure 10 Forest plot of current smoking of any product and all lung cancer, part 1. Table 8 presents the results of a main meta-analysis for all lung cancer based on 195 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 10, 11
Figure 11 Forest plot of current smoking of any product and all lung cancer, part 2. This is a continuation of Figure 10, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 8.
Figure 12 Forest plot of current smoking of any product and all lung cancer, part 3. This is a continuation of Figure 11, presenting the remaining individual study data included in the main meta-analysis for all lung cancer shown in Table 8.
Figure 13 Forest plot of current smoking of any product and squamous.
Figure 14 Forest plot of current smoking of any product and adeno. Table 8 presents the results of a main meta-analysis for adeno based on 44 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available).
including location only into the model. As noted earlier this was mainly due to relatively high RRs in North America and low RRs in Asia.
Other clear effects were also associated with start year of study (p < 0.001, higher risks in later studies, much more clearly evident than in the univariable analyses in Table 5), study type (p < 0.01, higher risks in prospective studies) and number of cases (p < 0.001, higher risks in larger studies). There was no significant effect of sex, and the weakly significant (p < 0.05) effect for number of adjustment factors was associated with an erratic pattern, with lower RRs where the number of factors was 1, and higher RRs where it was 0 or 2+. The heterogeneity for the fixed model including all the six characteristics included in Table 7  (i.e. other than China or Japan) into three smaller regions, with risk higher in India compared to Hong Kong and the rest of Asia (Taiwan, Thailand, Singapore and South Korea). No independent effect was evident for national cigarette tobacco type. Additional analysis (data not shown) confirmed the strong independent effect of start year of study separately within studies conducted in North America, Europe and Asia, though the tendency for higher RRs in more recent studies was stronger in North America than in Europe, and the pattern of variation was more erratic for Asia. It also confirmed the strong independent effects of location and start year of study separately for males and for females. For squamous, start year of study was the most important factor, on its own reducing the heterogeneity from 5.17 to 4.33 per d.f. (p < 0.001). Other significant characteristics included location (p < 0.001), with RRs high in North America and low in China, and number of cases (p < 0.05), with higher RRs in larger studies. Number of adjustment factors was also significant (p < 0.05), but the pattern was erratic and not the same as for all lung cancer. Though the pattern of results by study type was similar to that for all lung cancer, this characteristic did not contribute significantly to the model. 
The heterogeneity for the fixed model (
Table 9 Some alternative meta-analyses for current smoking compared to those in Table 8 (columns: Analysis description; Statistic b; All lung cancer c; Squamous d; Adeno e)
RRs higher where flue-cured Virginia tobacco was smoked, than where blended tobacco was smoked. Also, RRs were higher (p < 0.01) where they had been derived by a relatively complex method (see Methods) than where they were as reported originally, or derived by more standard methods.
For adeno, location was the most important factor, on its own reducing the heterogeneity from 8.78 to 4.36 per d.f. (p < 0.001), with the pattern of results (RRs high in North America and low in Asia) similar to that for all lung cancer. As for all lung cancer, there was variation by start year of study (p < 0.05) and number of cases (p < 0.05), with RRs higher for recent and larger studies. RRs were again higher for prospective studies, but here the difference was not significant (p ≥ 0.05). Here, variation by sex was significant (p < 0.05) with RRs higher . Four other characteristics significantly improved the model fit. One was "Other Asia" (p < 0.05) where RRs were high in India (based on a single RR from JUSSAW) and relatively low in Hong Kong, Taiwan, Thailand, Singapore and South Korea. National cigarette tobacco type was also significant (p < 0.05), with RRs for blended higher than for Virginia, opposite to the finding for squamous. RRs were also lower where there was any use of proxy respondents (p < 0.05). Also, RRs varied (p < 0.001) by the detailed definition of adenocarcinoma used. This appeared to be mainly because of a low RR for "not squamous or undifferentiated", a definition used only for LOMBA2/females, where the standardized residual of −3.721 SEs was the largest for any RR (see also above).
The fixed model (Table 7) considered how RR estimates varied by six main characteristics, and additional analyses (see Additional file 5: Detailed Analysis Tables) tested whether adding in further characteristics improved the model fit. Characteristics which did not improve the fit for any of the three outcomes considered included whether there was adjustment for specific factors (such as age), the age of the subjects studied, the definition of smoking product, the definition of the unexposed base, whether the study was conducted in a population working in a risky occupation, and whether the study procedures required full histological confirmation.
Figure 13 (squamous) and Figure 14 (adeno) present the results of the main meta-analyses for current smoking of any product. As before, RRs for smoking of cigarettes are used if RRs for any product smoking are not available, and RRs are most-adjusted. For prospective studies, current smoking refers to smoking status as at baseline. Table 8 presents additional results by level of the same set of characteristics considered in Table 5, while Table 9 presents results of alternative meta-analyses of current smoking.

B. Risk from current smoking
As for ever smoking, the RRs for all three outcomes are heterogeneous (p < 0.001), with the largest estimates seen being 104
Figure 15 Forest plot of ex smoking of any product and all lung cancer, part 1. Table 12 presents the results of a main meta-analysis for all lung cancer based on 182 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 15, 16
Figure 16 Forest plot of ex smoking of any product and all lung cancer, part 2. This is a continuation of Figure 15, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 12. For study KELLER the estimate shown for females is for non-whites.
Figure 17 Forest plot of ex smoking of any product and all lung cancer, part 3. This is a continuation of Figure 16, presenting the remaining individual study data included in the main meta-analysis for all lung cancer shown in Table 12. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI. For study KREUZE separate estimates, within sex, are shown for age ≤ 45 and 55-69.
clearly positive, larger than the corresponding estimates for ever smoking, and also show a stronger relationship with squamous than adeno. Similarly to ever smoking, the individual RRs are virtually all above 1.0, though varying substantially. The estimates are again little affected (Table 9) by preferring least, rather than most, adjusted RRs, by restricting to a more precise outcome definition, or by preferring RRs for current smoking of cigarettes to those for current smoking of any product. Again estimates based specifically on cigarette only smoking were slightly higher than those shown in
Figure 18 Forest plot of ex smoking of any product and squamous. Table 12 presents the results of a main meta-analysis for squamous based on 33 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available).
were markedly lower than the corresponding estimates for current vs. never smokers, reflecting the increased risk in ex-smokers described later (see section D below). For the main meta-analysis, the studies contributing most to the total weight for current smoking for all lung cancer were STOCKW/sexes combined (17.8% of the total of 6,750) followed by BROWNS/males (6.0%) and BROWNS/females (5.4%). BROWNS was the major contributor for both squamous and adeno, with the two sex-specific results contributing 36.0% of the total weight of 646 for squamous, and 30.0% of the total weight of 1,017 for adeno. The huge LIU4 study did not provide results for current smoking.
For the characteristics considered in Table 8, the pattern of variation has a number of similarities to that for ever smoking in Table 5
Figure 19 Forest plot of ex smoking of any product and adeno. Table 5 presents the results of a main meta-analysis for adeno based on 34 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available).
for all three outcomes tend to be higher for males, for North American studies, and where the unexposed base is never cigarettes, and smaller for older studies and smaller studies, with no clear variation by extent of adjustment. A tendency for RRs to be higher where data may be reported by proxy respondents seems somewhat stronger for current smoking, although based on few estimates for squamous and adeno. A tendency for RRs to be higher where the smoking product is cigarettes or cigarettes only than when it is any product is also evident, though not for squamous, whereas it was seen most clearly in squamous for ever smoking. There is also some indication that RRs are higher in prospective studies, though interestingly not for all lung cancer. Whereas for ever smoking, RRs for studies requiring full histological confirmation were higher than for those that did not for all three outcomes, the tendency was in the reverse direction for squamous and adeno for current smoking. For national cigarette tobacco type, current smoking RRs for squamous and adeno are virtually all for blended, so are unhelpful. For all lung cancer, RRs are quite similar for Virginia and blended, the significant (p < 0.001) variation shown in Table 8 arising because of the low RRs in the "Other" group, mainly for China. As for ever smoking, meta-regression analyses were conducted to give further insight, the results from the same fixed model including six characteristics being summarized in Table 10.
Based on these results and those for other characteristics in Additional file 5: Detailed Analysis Tables, various conclusions can be drawn. For all lung cancer, as was the case for ever smoking RRs, by far the strongest source of variation in current smoking RRs was location, with relatively high risks in North America and low risks in Asia. The overall heterogeneity reduced from 13.76 per d.f. to 6.73 per d.f. after including location only in the model. Higher risks were also seen in the fixed model in more recent studies (p < 0.001) and for males than females (p < 0.01). There was some evidence (p < 0.1) of higher RRs in larger studies and in prospective studies, but no association was seen with the number of adjustment factors. The heterogeneity for the fixed model shown in Table 10  No other characteristic significantly improved the fit when added to the fixed model. Additional analysis (data not shown) confirmed the effect of start year of study separately for North America and Europe (though no such relationship was seen in Asia) and also confirmed that the effects of location and start year of study were evident separately for males and for females.
For squamous and adeno, numbers of current smoking RRs (41 and 44 respectively) were much lower than those for all lung cancer, with no data for China or the United Kingdom, or for national cigarette type "other". For squamous, only two characteristics in the fixed model (Table 10) were significant, and then only at p < 0.05, and one of these was number of adjustment factors, where the pattern of response was erratic. Location was the other, with RRs again highest in North America and lowest in Asia. There were no estimates with large standardized residuals, and no other characteristic improved the model fit.
For adeno, three of the characteristics considered in Table 10 contributed significantly to the model, sex (p < 0.001), location (p < 0.001) and start year of study (p < 0.05), with the direction of effect similar to that noted earlier for ever smoking. There were no large standardized residuals, and the only additional characteristic which improved the model fit (p < 0.05) related to somewhat lower RRs being seen for studies with full histological confirmation.
For none of the three outcomes did characteristics associated with detailed location, national cigarette tobacco type, the precise definition of the outcome, adjustment for specific factors, the definitions of smoking product or of the unexposed base, whether the study was conducted in a population working in a risky occupation or whether proxy respondents were used, add significantly to the model.
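The reductions in heterogeneity per d.f. quoted above (for example, after including location only in the model) can be illustrated with a small sketch. It assumes that heterogeneity is the inverse-variance weighted sum of squared deviations of the log RRs from their fitted values, divided by the residual degrees of freedom, and it fits only a single categorical characteristic. The function name and all data are hypothetical; the review's own models (see Methods) involve six characteristics and are not reproduced here.

```python
import math
from collections import defaultdict

def heterogeneity_per_df(log_rrs, ses, levels=None):
    """Weighted residual chi-squared per d.f. around level-specific means.

    With levels=None a single overall mean is fitted (no characteristic
    in the model); otherwise one inverse-variance weighted mean is
    fitted per level of the characteristic.
    """
    weights = [1.0 / s ** 2 for s in ses]
    if levels is None:
        levels = ["all"] * len(log_rrs)
    sum_w = defaultdict(float)
    sum_wy = defaultdict(float)
    for w, y, g in zip(weights, log_rrs, levels):
        sum_w[g] += w
        sum_wy[g] += w * y
    means = {g: sum_wy[g] / sum_w[g] for g in sum_w}
    chi2 = sum(w * (y - means[g]) ** 2
               for w, y, g in zip(weights, log_rrs, levels))
    df = len(log_rrs) - len(means)
    return chi2 / df

# Hypothetical RRs: three from a high-risk region, three from a low-risk one
rrs = [8.0, 10.0, 9.0, 2.0, 3.0, 2.5]
log_rrs = [math.log(r) for r in rrs]
ses = [0.2] * 6
regions = ["A", "A", "A", "B", "B", "B"]

h_null = heterogeneity_per_df(log_rrs, ses)
h_region = heterogeneity_per_df(log_rrs, ses, regions)
print(f"per d.f.: {h_null:.2f} without region, {h_region:.2f} with region")
```

In this toy example most of the heterogeneity is explained by region, mirroring the direction, though not the magnitude, of the location effect reported in the text.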

C. Risk from ever or current smoking
In an attempt to incorporate data from a greater number of studies, additional analyses were carried out for ever/current smoking and for current/ever smoking. The meta-analysis RRs are shown in Table 11. The number of studies included increased from 236 to 242 for all lung cancer, from 73 to 78 for squamous and from 75 to 81 for adeno, compared with Table 5. Note that the slightly higher number of RR estimates in the current/ever analysis arises from inclusion there of more sex-specific results. As many of the RRs are common between the specific ever smoking analyses in Table 5 and the ever/current smoking analyses in Table 11, the meta-analysis RRs tend to be quite similar. However those for current/ever smoking are intermediate between those specifically for ever smoking (Table 5) and those specifically for current smoking (
a Within each study, results for the relevant smoking product and smoking status are selected in the following preference order, within each sex, for: unexposed group - never any product, near equivalent (see Methods); follow-up period - longest available; lung cancer type - all or nearest available, must include at least squamous cell carcinoma and adenocarcinoma; race - all or nearest available, otherwise by race; overlapping studies - principal, subsidiary; age - whole study, widest available age group; sex - single sex results, combined sex results; adjustment for potential confounders - most available.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P H = probability value for heterogeneity expressed as p < 0.001, p < 0.05, p < 0.1 or NS (p ≥ 0.1), P B = probability value for between levels (see Methods) similarly expressed.
c Results differ from those shown earlier (Tables 6, 9) because RRs with unexposed group "never cigarettes" are excluded here.
Figure 20 Forest plot of ever pipe and/or cigar smoking and all lung cancer.
a Within each study, results for the relevant smoking product and smoking status are selected in the following preference order, within each sex, for: cigarette type - see notes f to i; unexposed group - see notes f to i; smoking product - any, cigarettes (ignoring other products), cigarettes only; smoking status - ever, current; lung cancer type - see notes c to e; race - all or nearest available, otherwise by race; follow-up period - longest available; overlapping studies - principal, subsidiary; age - whole study, widest available age group. Results are then selected for: sex - single sex results, combined sex results; adjustment for potential confounders - most available.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P H = probability value for heterogeneity expressed as p < 0.001, p < 0.05, p < 0.1 or NS (p ≥ 0.1), P B = probability value for between levels (see Methods) similarly expressed.
c All or nearest available, must include at least squamous cell carcinoma and adenocarcinoma.
Figure 18 (squamous) and Figure 19 (adeno) present the results of the main meta-analyses for ex smoking of any product (or cigarettes if any product was not available), based on most-adjusted RRs. Some results by levels of characteristics are shown in Table 12.
Again the RRs are markedly heterogeneous (p < 0.001 for all three outcomes), ranging up to 135.69 for all lung cancer (STUCKE/males), 22.90 for squamous (OSANN/males) and 13.10 for adeno (OSANN/males). The random-effects estimates (all lung cancer 4.30, 95% CI 3.93-4.71, n = 182, squamous 8.74, 6.94-11.01, n = 33, and adeno 2.85, 2.20-3.70, n = 34), though all clearly positive, are smaller than the corresponding estimates for current smoking. Individual RRs are only very occasionally below 1.0 and never significantly so. Estimates are little affected by using the more specific definition of each outcome, preferring least-adjusted RRs to most-adjusted RRs, or preferring RRs for ever smoking cigarettes to those for ever smoking any product. RRs for ever smoking cigarettes only were too few for useful analysis for squamous and adeno, but for all lung cancer were similar to those for ever smoking any product. Fuller details are given in the Additional file 5: Detailed Analysis Tables.
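To make the fixed-effect (F), random-effects (R) and heterogeneity (H) statistics concrete, the following is a minimal sketch of inverse-variance pooling on the log RR scale. It assumes the DerSimonian-Laird estimator for the between-study variance, which is a common choice but is an assumption here, since this section does not name the estimator used. The function name and the three input RRs are hypothetical.

```python
import math

def pool(log_rrs, ses):
    """Fixed-effect and DerSimonian-Laird random-effects pooling of log RRs."""
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sw
    # Cochran's Q and heterogeneity chi-squared per degree of freedom
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    df = len(log_rrs) - 1
    # DerSimonian-Laird between-study variance (assumed estimator)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled_re = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "fixed_rr": math.exp(fixed),
        "random_rr": math.exp(pooled_re),
        "random_ci": (math.exp(pooled_re - 1.96 * se_re),
                      math.exp(pooled_re + 1.96 * se_re)),
        "h": q / df,
        "tau2": tau2,
    }

# Hypothetical estimates: RRs of 2, 4 and 8, each with SE(log RR) = 0.2
result = pool([math.log(2.0), math.log(4.0), math.log(8.0)], [0.2, 0.2, 0.2])
print(result)
```

When heterogeneity is large, the random-effects weights become more nearly equal across studies and the CI widens, which is why the random-effects estimates are the ones emphasized where the RRs are markedly heterogeneous.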
For the main meta-analysis of ex smoking, the studies contributing most to the total weight for all lung cancer were STOCKW/sexes combined (22.4% of the total of 4,739), followed by BROWNS/males (8.5%) and BROWNS/females (6.5%). BROWNS was the major contributor for both squamous and adeno, with the two sex-specific results contributing 49.4% of the total weight of 446 for squamous, and 45.2% of the total weight of 619 for adeno.
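As a worked illustration of what such a share implies, the sketch below converts a percentage of the total weight back into an individual weight and an approximate standard error of the log RR. It assumes that the weights are inverse variances of log RR, as stated in the figure captions; the function name is hypothetical and the result is approximate because the published percentages are rounded.

```python
import math

def implied_weight_and_se(share_percent, total_weight):
    """Weight and SE(log RR) implied by a percentage share of total weight."""
    weight = share_percent / 100.0 * total_weight
    se = 1.0 / math.sqrt(weight)
    return weight, se

# STOCKW/sexes combined: 22.4% of a total weight of 4,739 (ex smoking)
weight, se = implied_weight_and_se(22.4, 4739)
print(f"weight = {weight:.1f}, SE(log RR) = {se:.4f}")
```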
For the characteristics considered in Table 12 the sources of variation for all lung cancer are generally quite similar to those seen for ever smoking in Table 5 and for current smoking in Table 8
Figure 22 Forest plot of handrolled vs. manufactured cigarette smoking and all lung cancer. Table 14 presents the results of a meta-analysis for all lung cancer based on 20 relative risk (RR) and 95% confidence interval (CI) estimates for handrolled vs. manufactured cigarette smoking. The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted on sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.
clearly lower for prospective than for case-control studies. Numbers of ex smoking RRs are less for squamous (33) and for adeno (34) than for all lung cancer (182), but nevertheless some associations are evident in relation to location for adeno, to study type for squamous, to number of adjustment factors for adeno, and to number of cases, smoking product and unexposed base for both squamous and adeno. Meta-regression analyses were not attempted for ex smoking.

E. Risk from smoking specific products compared to smoking of any product
Table 13 summarizes the results of meta-analyses for all lung cancer for cigarette only smokers, smokers of pipes/cigars only, smokers of pipes only, and smokers of cigars only. In each analysis, the base is never smokers of any product. The results for ever smoking of pipes/cigars only are also shown in Figure 20.
For ever smoking, current smoking and ex smoking the random-effects RRs are similarly elevated for pipes/cigars, pipes only and cigars only, but to a markedly lesser extent than for cigarettes only. As for cigarette smoking, RRs for pipe and cigar smoking are clearly higher for current smokers than for ex smokers.
Available results for squamous and adeno are limited, and mainly for ever smoking. For pipe and/or cigar smoking, the RR for squamous (3.72, 95% CI 1.95-7.10, n = 8) is somewhat higher than that for all lung cancer (2.92, 2.38-3.57, n = 38), but the RR for adeno is not elevated (0.93, 0.62-1.40, n = 7). The lack of association of adeno with pipe and cigar smoking is also evident in the RRs for pipes only (0.50, 0.23-1.10, n = 4) and for cigars only (0.55, 0.11-2.88, n = 3).
The results for pipe and cigar smoking mainly apply to males, as the few available estimates for females have wide variability. The increased risk in smokers of pipes and cigars is evident in each location studied, though data for Asia are extremely sparse. Unlike for cigarettes, higher RRs are seen for Scandinavia (7.02, 4.72-10.44, n = 6) and for Other Europe (5.17, 2.91-9.19, n = 8) than for North America (2.27, 1.79-2.89, n = 26) or the UK (4.32, 2.73-6.84, n = 11). These results are for ever/current smoking, with the full results given in Additional file 5: Detailed Analysis Tables.
Table 13 also shows results for lung cancer for mixed smokers. For ever, current and ex smoking, the random-effects RRs are slightly, but not significantly, higher than those for smokers of cigarettes only. Available results for squamous and adeno are again limited, and mainly for ever smokers. The RRs for squamous (9.78, 4.94-19.35, n = 6) and for adeno (2.48, 1.25-4.95, n = 6) do not clearly differ from the RRs for squamous (11.09, 7.19-17.09, n = 10) and for adeno (2.63, 1.32-5.24, n = 10) for smokers of cigarettes only.
Figure 23 Forest plot of mentholated vs. non-mentholated cigarette smoking of any product and all lung cancer. Table 14 presents the results of a meta-analysis for all lung cancer based on six relative risk (RR) and 95% confidence interval (CI) estimates for mentholated vs.
for three comparisons, including, for studies where there is a choice, the nearest available equivalents to only filter vs. only plain (with results for all lung cancer also shown in Figure 21), ever filter vs. only plain, and only filter vs. ever plain. Results are also shown for the comparison of handrolled and manufactured cigarette smoking, and for mentholated vs. non-mentholated cigarette smoking, with results for all lung cancer also shown in Figures 22 and 23.

F. Risk by type of cigarette smoked
The random-effects RRs show a reduction in risk for only filter vs. only plain cigarette smoking that is significant for all lung cancer (RR 0.69, 95% CI 0.61-0.78, n = 42), and squamous (0.52, 0.40-0.68, n = 13), though not for adeno (0.84, 0.66-1.08, n = 10). The alternative comparisons for filter and plain, where only a third to a half of the RRs included actually differ, show clear reductions for all lung cancer and squamous associated with filter cigarette smoking, though no difference for adeno (see Table 14). The reductions for all lung cancer and squamous are evident in both sexes and all continents (see Additional file 5: Detailed Analysis Tables).
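Whether a pooled RR differs significantly from 1.0 can be read directly from whether its 95% CI excludes 1.0. As a rough check, the sketch below converts an RR and its CI into an approximate z statistic and two-sided p-value, assuming a normal distribution for the log RR and a CI based on the 1.96 multiplier. The function name is hypothetical; the three inputs are the only filter vs. only plain estimates quoted above.

```python
import math

def z_and_p_from_ci(rr, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided p-value for H0: RR = 1."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p

estimates = {
    "all lung cancer": (0.69, 0.61, 0.78),
    "squamous": (0.52, 0.40, 0.68),
    "adeno": (0.84, 0.66, 1.08),
}
results = {name: z_and_p_from_ci(*vals) for name, vals in estimates.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3g}")
```

The all lung cancer and squamous estimates give p-values well below 0.05, while the adeno estimate, whose CI includes 1.0, does not, in line with the text.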
The risk associated with handrolled smoking is greater than that with manufactured cigarette smoking, with RRs of 1.29 (1.12-1.49, n = 20) for all lung cancer and 1.62 (1.18-2.21, n = 5) for squamous. The RR of 2.09 (0.83-5.25, n = 4) for adeno is based on very heterogeneous estimates, varying from 0.43 to 8.76, and allows no clear conclusion. As results for females are limited, and have wide variability, the conclusions mainly apply to males. The estimated RR for all lung cancer is greater than 1 in all locations studied, though not always statistically significant. However, there are no data from North America.
Data on mentholated cigarette smoking are limited, particularly by histological type. For all lung cancer, the RR of 0.98 (0.80-1.20, n = 6) is consistent with no effect of mentholation on risk, five RR estimates close to or below 1.0 counterbalancing one reported …
Table footnotes:
a Within each study, results are selected in the following preference order, within each sex, for: smoking status - ever, current; smoking product - any, cigarettes (ignoring other products), cigarettes only; cigarette type - any, manufactured (with or without handrolled), manufactured only; unexposed group - never any product, never cigarettes, near equivalent (see Methods); follow-up period - longest available; lung cancer type - see notes c to e; race - all or nearest available, otherwise by race; overlapping studies - principal, subsidiary; age - whole study, widest available age group. Results are then selected for: sex - single sex results, combined sex results; adjustment for potential confounders - most available.
Results are by number of cigarettes or cigarette equivalents.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P_H = probability value for heterogeneity expressed as p < 0.001, p < 0.05, p < 0.1 or NS (p ≥ 0.1).
c All or nearest available, must include at least squamous cell carcinoma and adenocarcinoma.
d Squamous cell carcinoma or nearest available, but not including adenocarcinoma.
e Adenocarcinoma or nearest available, but not including squamous cell carcinoma.
f Number of sets of RRs available for the key value analysis, where base for comparison is never smoked.
g Category for which results are provided includes 5 cigs/day but does not include 20 cigs/day.
h Category for which results are provided includes 20 cigs/day but does not include 5 or 45 cigs/day.
i Category for which results are provided includes 45 cigs/day but does not include 20 cigs/day.
Table footnotes:
a Within each study, results are selected in the following preference order, within each sex, for: smoking status - ever, current; smoking product - any, cigarettes (ignoring other products), cigarettes only; cigarette type - any, manufactured (with or without handrolled), manufactured only; unexposed group - never any product, never cigarettes, near equivalent (see Methods), but see also footnote j; follow-up period - longest available; lung cancer type - see notes c to e; race - all or nearest available, otherwise by race; overlapping studies - principal, subsidiary; age - whole study, widest available age group. Results are then selected for: sex - single sex results, combined sex results; adjustment for potential confounders - most available.
comparison is with an inappropriate base group). These results do not appear inconsistent with those summarized in Table 15. Dose-response by amount smoked was investigated for pipe and cigar smoking, but the number of estimates available was small, and referred only to males. However, there was some evidence of dose-response. Thus for all lung cancer, one can compare RRs for cigar only smoking for the highest (8.21, 4.36-15.49, n = 6) and lowest exposure groups (1.84, 1.22-2.79, n = 5), and can also compare RRs for pipe only smoking for the highest (5.99, 3.57-10.04, n = 9) and lowest exposure groups (3.68, 2.75-4.93, n = 8).
Table footnotes:
a Within each study, results for ex smokers are selected in the following preference order, within each sex, for: smoking product - any, cigarettes (ignoring other products), cigarettes only; cigarette type - any, manufactured (with or without handrolled), manufactured only; unexposed group - never any product, never cigarettes, near equivalent (see Methods), but see also footnote j; follow-up period - longest available; lung cancer type - see notes c to e; race - all or nearest available, otherwise by race; overlapping studies - principal, subsidiary; age - whole study, widest available age group. Results are then selected for: sex - single sex results, combined sex results; adjustment for potential confounders - most available.
g Category for which results are provided includes 12 years but does not include 7 years.
h Category for which results are provided includes 7 years but does not include 3 or 12 years.
i Category for which results are provided includes 3 years but does not include 7 years.
j For this analysis only, the exposed and unexposed group have the same smoking status, product and cigarette type. There is an additional preference to select the results with least adjustment for other aspects of smoking, followed by a preference to select the results for the shortest (=exposed) and longest (=unexposed) duration quitters. Alternatively preferring results with most adjustment for other aspects of smoking gives n = 65, F = 3.59 (…
Table footnotes:
g Category for which results are provided includes 3 years but does not include 7 years.
h Category for which results are provided includes 7 years but does not include 3 or 12 years.
i Category for which results are provided includes 12 years but does not include 7 years.
j For this analysis only, the exposed and unexposed group have the same smoking status (i.e. ex-smokers), product and cigarette type. There is a preference (instead of that for comparison group) to select the results for the longest (=exposed) and shortest (=unexposed) duration quitters. Note that (unlike the inverse results shown in Table 18), the "shortest" quitters here may omit recent quitters, but subject to a limit of no more than two years.
Table footnotes:
a … smoking product - any, cigarettes (ignoring other products), cigarettes only; cigarette type - any, manufactured (with or without handrolled), manufactured only; unexposed group - never any product, never cigarettes, other; follow-up period - longest available; race - all or nearest available, otherwise by race; overlapping studies - principal, subsidiary; age - whole study, widest available age group. Results are then selected for: sex - single sex results, combined sex results; adjustment for potential confounders - most available.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P_H = probability value for heterogeneity expressed as p < 0.001, p < 0.05, p < 0.1 or NS (p ≥ 0.1), P_B = probability value for between levels (see Methods) similarly expressed.
c All or nearest available, must include at least squamous cell carcinoma and adenocarcinoma.

K. Risk by duration of quitting (vs. current smoking)
For duration of quitting compared to current smoking the number of data sets available is somewhat less than the corresponding number for duration of quitting compared to never smoking. Results included in the longest vs. shortest analysis shown in Table 19 are generally the inverse of those in the shortest vs. longest analysis in Table 18 (exceptions arising for studies which combined current smokers and recent quitters of more than 2 years). While the key value analyses shown in Table 19 echo the trends shown in … is slightly elevated. Longer quit durations are, however, clearly associated with a reduction in risk. For all lung cancer, almost 40% of the RRs used in the key value analyses included short-term quitters (of up to 2 years) in the current smoker base. No difference was seen between those RRs and those with a more precisely defined current smoker base.
Figure caption (fragment): … Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.

L. Risk by tar level
Due to the variety of different methods of quantifying tar levels, only highest vs. lowest analyses have been carried out. No data were available by histological type, and all data relate to cigarette smoking. For all lung cancer and for ever/current smoking of cigarettes the 14 available estimates, from 9 studies, showed some evidence of heterogeneity (H = 2.29, p < 0.01). However, 12 of the estimates showed a higher risk in the higher tar group, and the random-effects estimate (1.42, 1.18-1.71) confirmed the relationship between risk and tar level. The increase was evident for males (1.29, 1.08-1.53, n = 7) and females (1.48, 1.05-2.09, n = 6). There was no evidence of heterogeneity by any specific characteristic, including extent of adjustment, 7 of the 14 estimates being adjusted for one or more aspects of smoking. These results are based on RRs that are selected as being least adjusted for other aspects of smoking. Alternatively, using RRs selected as most adjusted for other aspects of smoking, the overall estimate was 1.34 (1.16-1.56, n = 14).
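As an illustration of how a set of RRs and 95% CIs can be combined into fixed-effect and random-effects estimates together with H (heterogeneity chi-squared per degree of freedom), the following is a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which is a common choice but is not confirmed by the text here, and the input values are hypothetical rather than taken from the database.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def pool(estimates, z=1.96):
    """Pool (rr, lower, upper) tuples; return fixed and random-effects RRs and H."""
    y = [math.log(rr) for rr, _, _ in estimates]
    v = [se_from_ci(lo, hi, z) ** 2 for _, lo, hi in estimates]
    w = [1.0 / vi for vi in v]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # heterogeneity chi-squared
    df = len(y) - 1
    h = q / df if df > 0 else 0.0                    # chi-squared per degree of freedom
    # Assumed DerSimonian-Laird between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    rand = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_rand = math.sqrt(1.0 / sum(w_re))
    return {
        "fixed_rr": math.exp(fixed),
        "random_rr": math.exp(rand),
        "random_ci": (math.exp(rand - z * se_rand), math.exp(rand + z * se_rand)),
        "H": h,
        "tau2": tau2,
    }

# Hypothetical highest vs. lowest RRs (RR, lower 95% limit, upper 95% limit)
hypothetical = [(1.20, 0.90, 1.60), (1.60, 1.10, 2.33), (1.40, 1.00, 1.96)]
result = pool(hypothetical)
```

When the estimates are homogeneous the between-study variance is zero and the two pooled estimates coincide; as H rises above 1 the random-effects weights become more equal across studies and the confidence interval widens.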

M. Risk by butt length and fraction smoked
All the available data relate to cigarette smoking. As the number of available estimates was quite limited, particularly for butt length, they have been combined into a single analysis including RRs for shortest vs. longest butt lengths and for greatest vs. smallest fraction smoked, and including results for ever smoking and current smoking. The combined estimates were 1.43 (1.14-1.79, n = 11) for all lung cancer, 1.39 (1.04-1.86, n = 7) for squamous, and 1.30 (1.07-1.58, n = 6) for adeno. There was some evidence of heterogeneity for all lung cancer (H = 2.29, p < 0.05) and for squamous (H = 2.96, p < 0.01), though not for adeno (H = 0.75), but a clear majority (18/24 = 75.0%) of the estimates indicated a higher risk associated with smoking more of the cigarette.

N. Further analyses by histological type
The results so far have been restricted to all lung cancer, squamous or adeno. Table 20 gives results for ever, current and ever/current smoking of any product (or cigarettes if not available) for small cell carcinoma and large cell carcinoma, with corresponding results also shown for all lung cancer, squamous cell carcinoma and for adenocarcinoma. For ever/current smoking, the RR for large cell carcinoma (5.33, 4.02-7.07, n = 29) is quite similar to that for all lung cancer (5.48, 5.07-5.93, n = 342), while the RR for small cell carcinoma (11.14, 8.59-14.46, n = 61) is markedly higher, and similar to that for squamous cell carcinoma (11.62, 9.80-13.78, n = 82). This pattern is also true for current smoking, where RR estimates are higher than for ever/current smoking, and for ever smoking. Additional file 5: Detailed Analysis Tables gives results by level of the various characteristics studied. As for all lung cancer, squamous and adeno, RRs for small …
As sex differences may reflect greater cigarette consumption in males, meta-analysis estimates of the sex ratio for ever/current smokers and for all lung cancer were also calculated within levels of amount smoked (as defined in section G). The sex ratio is 1.33 …
A number of studies provide RR estimates for ever/current smoking separately by age, and random-effects meta-analyses were conducted, based on the ratio of the estimate for the oldest age group for which data were available compared to that for the youngest. Despite only 22 of the 45 ratios (48.9%) showing a greater risk in the oldest age group, the meta-analysis showed a significantly higher risk in the oldest age group (ratio 1.17, 95% CI 1.10-1.25), the seven ratios with most weight all being greater than 1.0.
There were also eight studies, all conducted in the US, which provide comparable sex-specific results for ever/current smoking separately for white people and black people (or non-white people). Random-effects meta-analyses of the white/black race ratio showed no difference between the races (1.05, 0.90-1.23, n = 14).

P. Further analyses based on non-independent pairs of relative risks
Some studies also provide separate non-independent least-adjusted and most-adjusted RRs for the same definition of exposure. There is little evidence that adjustment reduces the RR for ever/current smoking. Using the same preferences as in Table 11, the most-adjusted estimate is lower than the least-adjusted estimate for 57 of the 126 (45.2%) pairs for all lung cancer, for 14 of the 36 (38.9%) pairs for squamous, and for 21 of the 41 (51.2%) pairs for adeno. In no case do the percentages differ from 50% (at p < 0.05), and in each case the random-effects meta-analysis estimate based on the most-adjusted pair members is similar to the corresponding estimate based on the least-adjusted pair members (data not shown).
RRs for a dose-related index of smoking may be adjusted for other such indices. For all lung cancer, and for four dose-related indices of smoking, pairs of otherwise similar highest vs lowest RRs were identified in which one of the pair was adjusted for the most available other aspects of smoking, and the other had no such adjustment. Both were also chosen as adjusted for the most possible other variables (although those other variables may differ between the pair). There was a clear tendency for the additional adjustment for other aspects of smoking, typically including amount smoked, to produce lower RR estimates. This was true for 18/22 (81.8%, p < 0.01) of the pairs of estimates for age of starting to smoke, 12/15 (80.0%, p < 0.05) of the pairs for duration of smoking, all 17 (100%, p < 0.001) of those for years quit, and 5/7 (71.4%, NS) of those for tar level.
Based on results for ever/current smoking and for all lung cancer, RRs for mixed smokers were compared with those for smokers of cigarettes only. For 22 of the 34 (64.7%) pairs, the RR was lower for mixed smokers, but this tendency was not significant (p = 0.12). RRs for mixed smokers were also compared with those for smokers of pipes/cigars only. Here 23 of the 24 (95.8%, p < 0.001) pairs showed a lower risk in the smokers of pipes/cigars only.
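The significance levels quoted for these paired comparisons can be approximated with a sign test. The sketch below assumes an exact two-sided binomial test against a 50:50 split; the text does not state which test was actually used, so this is an illustration rather than a description of the authors' method.

```python
from math import comb

def sign_test_p(k, n):
    """Exact two-sided binomial p-value for k of n pairs in one direction (null: p = 0.5)."""
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)

# Pairs quoted in this section: (number in the stated direction, number of pairs)
pairs = {
    "age of starting": (18, 22),
    "duration": (12, 15),
    "years quit": (17, 17),
    "tar level": (5, 7),
    "mixed vs cigarettes only": (22, 34),
    "mixed vs pipes/cigars only": (23, 24),
}
p_values = {name: sign_test_p(k, n) for name, (k, n) in pairs.items()}
```

Under this assumption the 22 of 34 comparison gives a p-value of about 0.12, and the other comparisons fall in the same significance bands as those quoted in the text.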

Q. Publication bias
Some results of Egger's test [17] for publication bias are presented in Tables 5, 8 and 12, with further results given in Additional file 5: Detailed Analysis Tables, but have not previously been referred to in the text. For ever smoking there is evidence of publication bias for all lung cancer (p < 0.001) and adeno (p < 0.01), but not for squamous (p ≥ 0.1). For current smoking, some evidence of publication bias is seen for all lung cancer (p < 0.05), but not for squamous or adeno (p ≥ 0.1). For ex smoking, there is again evidence of bias for all lung cancer and for adeno (p < 0.001) but not for squamous. Figure 24 (all lung cancer), Figure 25 (squamous) and Figure 26 (adeno) show funnel plots for ever smoking. Where asymmetry is seen, this is in the direction of there being more higher-weight RRs above the mean. This is consistent with the evidence in Table 5 of higher RRs for larger studies. Inspection of a funnel plot for ex-smoking for all lung cancer (data not shown) also showed the high weight RRs tended to be above the mean.
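For readers unfamiliar with Egger's test, the sketch below shows its classical regression form: each log RR divided by its standard error is regressed on the precision (the reciprocal of the standard error), and an intercept away from zero suggests funnel plot asymmetry. This is an assumption about the variant used, as the exact implementation is not described in this section, and the data are hypothetical.

```python
import math

def egger_regression(log_rrs, ses):
    """Ordinary least squares of (log RR / SE) on (1 / SE).

    Returns (intercept, standard error of intercept, slope).
    Needs at least three estimates with differing standard errors.
    """
    x = [1.0 / s for s in ses]                      # precision
    z = [y / s for y, s in zip(log_rrs, ses)]       # standardized effect
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    resid = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s2 = sum(r * r for r in resid) / (n - 2)        # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, se_intercept, slope

# Hypothetical symmetric case: the same log RR in studies of differing size
intercept, se_intercept, slope = egger_regression([0.5] * 4, [0.1, 0.2, 0.3, 0.4])
```

In the symmetric case above the intercept is zero and the slope recovers the common log RR. The p-value would come from comparing the intercept divided by its standard error with a t distribution on n - 2 degrees of freedom.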

Evidence of a relationship
The meta-analyses carried out demonstrate a clear relationship of smoking to overall lung cancer risk. This is evident for ever, current and ex smoking, for pipes and cigars, and for all types of cigarette studied. The increased risk in smokers is evident in both sexes, in younger and older subjects, in all continents studied and in prospective and case-control studies. That this relationship is causal is supported by the evidence of a dose-response, risk increasing with increasing amount smoked, duration of smoking, tar level and fraction smoked, and with earlier age of starting to smoke, and decreasing with duration of quitting. It is also supported by the similarity of results based on most-adjusted and least-adjusted RRs (though adjustment for amount smoked reduces the association with other dose-response indices of smoking). The association is clearly evident with each of the major histological types of lung cancer studied, being stronger for squamous and small cell carcinoma, intermediate for large cell carcinoma, and weakest for adenocarcinoma. Exceptionally, no relationship is seen between adenocarcinoma and pipe or cigar smoking.

Heterogeneity
The studies are remarkably consistent in reporting an increased risk in ever smokers. Only two of the 328 all lung cancer RRs, none of the 102 squamous RRs, and nine of the 107 adeno RRs considered in Figures 1 to 9 are less than 1.0. However, studies also vary markedly in the magnitude of the estimated RR, as illustrated by the high values of H seen in the meta-analyses of the major smoking indices, which often exceed 5 and sometimes exceed 20 (H values of 5, 10 and 20 are the same as I² values [16] of 80%, 90% and 95%). This heterogeneity is perhaps unsurprising given the many sources of variation involved, including sex, location, timing, study design and populations, definition of outcome and type of product smoked, and extent of confounder adjustment.
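The correspondence between H and I² quoted above follows from the definitions: with H taken as the heterogeneity chi-squared Q divided by its degrees of freedom (as in the table footnotes), I² = (Q - df)/Q = 1 - 1/H. A minimal sketch of the conversion, with a function name chosen here purely for illustration:

```python
def i_squared_from_h(h):
    """I-squared (as a percentage) from H = Q / df; set to 0 when H is below 1."""
    if h <= 0:
        raise ValueError("H must be positive")
    return max(0.0, 1.0 - 1.0 / h) * 100.0

# The three values quoted in the text
conversions = {h: i_squared_from_h(h) for h in (5, 10, 20)}
```

Note that H as used here is not the square-root form of the H statistic that appears in some of the meta-analysis literature, for which the conversion would differ.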
Using univariable and multivariable (meta-regression) methods, we investigated variation in risk by a number of characteristics of the study and the RR for the outcomes all lung cancer, squamous and adeno. While our "fixed" multivariable models involving six characteristics (sex, location, start year of study, study type, number of cases and number of adjustment factors) explained a substantial proportion of the variation (e.g. reducing H from 22.84 to 4.72 for all lung cancer for ever smoking), there was always substantial residual heterogeneity (with H varying from 2.43 to 4.72 in the six analyses in Tables 7 and 10). Of the six characteristics studied, location was generally the most important characteristic, with RR estimates for ever and for current smoking and for all three outcomes always highest in North America, and lowest in China, and (with the exception of ever smoking for squamous) lower in the rest of Asia than in Europe, with no consistent differences seen between results for the United Kingdom, Scandinavia and the rest of Europe. Another consistently seen relationship was the tendency for RRs to vary by start year of study, with higher RRs seen in more recent studies. Three other tendencies were generally seen, though the level of significance varied according to the analysis. One was the tendency for RRs to vary by number of cases, with the lowest estimates always seen for the smaller studies (involving 100 to 249 cases), another was the tendency for RRs to be higher in prospective studies than in case-control studies, and the third was the tendency for RRs to be somewhat higher in males than females. The final characteristic included in the fixed model, number of adjustment factors, showed no clear relationship with the RR, with significance either not present or weak (0.01 < p < 0.05), and the direction of effect inconsistent.
We also tested for the effect of a number of other characteristics on the estimated RR. A number of relationships were seen in the univariable models that were significant. However, these mainly became nonsignificant in the multivariable models, presumably due to correlations between the characteristics. Where a characteristic was significant, this tended to be only in one of the six analyses, so not providing convincing evidence of a true effect. It would have been possible, for each of the six combinations of smoking status and outcome we considered, to present analyses of "best" models, based on forward stepwise regression, that each included a different set of predictive characteristics. However we felt that the regressions we presented based on a fixed model were more useful. Sources of variation are discussed further in the following paragraphs.

Sex
Where available, sex-specific results are included in the meta-analyses, with combined sex results included only otherwise. Though variation by sex was not significant in all the main analyses, risk estimates generally tended to be higher for males than females. This is supported by additional analyses comparing RRs within study for the same outcome and exposure definition. Somewhat higher RRs were found in males even in analyses where comparisons were made within the same levels of daily cigarette consumption (about 5, 20 or 45 cigs/day). Even so, the existence of somewhat higher RRs for males does not necessarily indicate any greater susceptibility, as it may reflect their increased exposure to occupational carcinogens, or other differences in smoking history such as greater duration of smoking or increased use of plain and higher tar cigarettes. It should be noted however that in prospective studies where smoking habits were determined at baseline, the greater tendency of males to quit during follow-up may cause bias in the reverse direction. It should also be noted that comparison of smoker/never smoker RRs for men and women does not take account of possible differences in risk between male and female never smokers, the base groups for these comparisons. A detailed overall assessment of this aspect is beyond the scope of this paper, and ideally would involve direct comparison of risk in male and female smokers, with detailed adjustment for age, smoking characteristics and major potential confounding variables. We note that Bain et al. [18] concluded, based on analysis of two large prospective studies and review of results from six other such studies, that "women do not appear to have a greater susceptibility to lung cancer than men, given equal smoking exposure".

Age
While it is clear that absolute risk of lung cancer rises markedly with age, both in smokers and never smokers, it is far less clear whether the smoker/never smoker RR also does. Predictions based on the multistage model [19] suggest that there should be a modest rise, but there is difficulty in establishing this, especially when the great majority of the studies do not give results by age. Possible effects of age were investigated in two ways. The first method (see Tables 6 and 9) was to compare RRs which were specific to subjects in specific age groups. Data here were limited for squamous and adeno, and for all lung cancer suggested a possible increase in RR with age for current smoking, but not for ever smoking. More reliable are the comparisons (described in results section O), of RRs for the highest and lowest age groups within study for ever/current smoking; between-study differences are automatically controlled for under this approach. These showed a 17% greater risk for the highest age group (95% CI 10% to 25%). Whether or not a RR was adjusted for age was considered as a characteristic in the meta-regression analyses, but it never added significantly to the fixed model for either ever or current smoking for any of the three outcomes.

Race
Although race-specific RRs were entered onto the database, if available, there were few studies that provided such data. For eight studies which provided pairs of comparable RRs for ever/current smoking, there was no indication that RRs for white people differed systematically from those for black people (or non-white people). This, of course, does not rule out the possibility that absolute risks for white people and black people with similar smoking habits may differ. As our concern was only with RRs for smoking, and whether these vary by other characteristics, we have not attempted to collect data comparing absolute risk according to these characteristics, such as white/black RRs within never smokers, or within smokers. Detailed analysis and discussion of racial differences in lung cancer risk between black people and white people is therefore beyond the scope of this paper. Elsewhere Lee [20] points out that in the USA black men have a higher risk of lung cancer than do white men. However, interpretation of this difference in terms of effects of smoking is not straightforward for various reasons. Thus Lee notes that though black people are more often current smokers, are less likely to quit smoking, smoke cigarettes with a higher tar level, and have higher cotinine levels, all characteristics predictive of a higher risk of lung cancer, they are also less likely to have ever smoked, smoke fewer cigarettes a day and start to smoke later, all characteristics predictive of a lower risk. Also little or no difference in lung cancer rate is seen between black and white women. Black people are much more likely than white people to use mentholated cigarettes, but no evidence of a difference in lung cancer risk associated with mentholation was found, either in the present analysis or in other reviews [20,21].

Location and national cigarette tobacco type
A consistent tendency in our meta-analyses was for RRs to be highest in studies in North America, intermediate in Europe and lowest in Asia, particularly in China. There was no very clear evidence of a difference between European countries, or between other countries in Asia, though some of the analyses suggested relatively lower RRs in Greece and Turkey than in the rest of Europe, and higher RRs in India than in the rest of Asia. In an attempt to study a possible explanation for this difference we divided countries into three groups by national cigarette tobacco type. One was the countries (Australia, Canada, India, South Africa, UK and Zimbabwe) which typically use flue-cured Virginia tobacco, another was the countries (all except those in the other two groups) which typically use blended tobacco, and the third included Taiwan and China (countries which used both types quite commonly or where we lacked confirmed information). Including this variable into the meta-analyses did not consistently improve the prediction of our model, a finding which is consistent with the conclusions of other analyses we have conducted based on national data on lung cancer rates and smoking frequency [22]. There are, of course other possible explanations of the clear differences in lung cancer RRs between continents, including genetic differences, and differences in baseline rates of the disease.

Study timing
Our meta-regressions generally showed a tendency for RRs to be lower in studies which started earlier. There may be a number of reasons for this, such as changes in the relative use of cigarettes and pipes or cigars, and improvement of study quality, with better standardization of questionnaires and definition of products smoked. However we consider the most plausible reason to be changes in patterns of uptake of smoking, with smokers in earlier born cohorts being less likely to have a lengthy smoking career than smokers in later born cohorts.

Study type
Though this was only clearly significant in the analyses of ever smoking for all lung cancer, there was a consistent tendency for RRs to be somewhat higher from prospective studies than from case-control studies. If this is a true effect, the explanation for it is unclear.

Number of cases
In order to limit the considerable amount of work needed, we limited attention to studies involving at least 100 lung cancer cases. Given that smaller studies would have contributed much less weight to the meta-analyses than would the studies that were included, we consider this restriction unlikely to have any material effect on our conclusions. The meta-regression analyses did show a consistent tendency for RRs to be higher in larger studies, though this was only significant for ever smoking (all lung cancer p < 0.001, squamous and adeno p < 0.05). This tendency is in the opposite direction to that predicted from publication bias. The explanation is unclear.

Adjustment for other factors
Generally our analyses showed that adjustment for age and other factors had very little effect on the meta-analysis estimates of smoking-related RR, whether one considered the total number of adjustment factors, or the effect of specific factors. This conclusion of a minimal effect of confounding is consistent with that of a detailed analysis of data from the huge CPSII prospective study [23], and means that though the main results we report are based on most-adjusted estimates, this decision had little or no effect on our conclusions or on the magnitude of our estimates.
Adjustment for other aspects of smoking is, however, important when considering the dose-related variables. Though studies rarely, if ever, present results to allow detailed analysis of the effect of adjustment for one specific aspect of smoking on RRs for another aspect, we have shown that adjustment for other aspects of smoking (which typically includes amount smoked) consistently tends to reduce associations with age of starting to smoke, duration of smoking, years quit and tar level. This is presumably due to the tendency for earlier starters and high tar smokers to smoke more heavily than do later starters and low tar smokers, and for lighter smokers to be more ready to quit smoking. Below, we further discuss the effect of adjustment on results for type of cigarette.

Product smoked
There was consistent evidence that risk of lung cancer was higher for cigarette only smokers than for smokers of any product, and substantially higher than for smokers of pipes only, cigars only or pipes/cigars only. For current smokers, for example, RRs were 9.57 (7.90-11.59) for cigarettes only, as compared to 4.76 (3.44-6.59) for pipes/cigars only. Mixed smokers tended to have similar risks to cigarette only smokers. Interpretation of this finding is difficult as mixed smokers and cigarette only smokers may have a different total exposure to tobacco, as well as a different cigarette consumption. Data on the types of cigars or pipes smoked have not been recorded on the database, but the increased risk is evident in each continent. The results for pipes and cigars mainly apply to males and to RRs for all lung cancer. Though there are only limited results by histological type, it is interesting that there is no indication of an increased risk of adenocarcinoma for pipe and cigar smokers.

Type of cigarette smoked
The conclusions drawn from the results in Table 14 are consistent with those drawn by one of us in a review of the relationship between lung cancer and type of cigarette conducted in 2001 [24]. This is unsurprising, because the data sets considered are very similar. The conclusions are also very similar to those of a review by Kabat carried out in 2003 [25].
Comparisons between filter and plain smoking are made more difficult by the variety of ways in which different reports present their results, but based on the index most closely equivalent to only filter vs. only plain, the present report shows a reduction in risk that is significant for all lung cancer (0.69, 95% CI 0.61-0.78) and for squamous (0.52, 0.40-0.68), though not for adeno (0.84, 0.66-1.08). Significant reductions in risk for all lung cancer and squamous, but not for adeno were also evident for the alternative comparisons ever filter vs. only plain, and only filter vs. ever plain. Our analyses were based on most-adjusted RR estimates, with many of the estimates adjusted for other aspects of smoking, such as number of cigarettes smoked. In 2001, a National Cancer Institute monograph [26] claimed that apparent benefits of filter vs. plain and of low tar vs. high tar cigarettes may be illusory if RRs are adjusted for daily consumption, as switching to cigarettes with a lower machine-smoked delivery of tar and nicotine leads to "compensation" for the reduced nicotine intake by increasing numbers of cigarettes smoked. Lee and Sanders [27] investigated this claim in detail by comparing RRs for all lung cancer adjusted and unadjusted specifically for daily cigarette consumption, and concluded that "whether or not relative risk estimates are adjusted for cigarette consumption is not crucial to the conclusion of a clear advantage to filter cigarettes and tar reduction". This analysis is more precise than that used in this report, but its conclusions are similar, as we also found adjustment not to affect our overall conclusion that filter vs. plain cigarette smoking was associated with a lower risk of all lung cancer and of squamous. It should be noted that although no significant reduction in risk for filter cigarette smoking was seen for adeno, there was also no evidence of an increase. 
This would seem to argue against the claim often made that the observed rise over time in the incidence of adenocarcinoma relative to squamous cell carcinoma seen in many countries is due to changes in cigarette design increasing the risk of smoking-related adenocarcinoma. In this context, it should be noted that though our database contains evidence by histological type for filter vs. plain cigarette smoking, no such data were found relating to tar level.
Our conclusion of a higher RR in handrolled vs. manufactured cigarette smokers is consistent with that of the 2001 review [24], with the increased risk evident, despite the limited amount of data, for squamous and adeno as well as for all lung cancer.
Our review also found no difference in risk between smokers of mentholated and non-mentholated cigarettes, though this was based on data from only three studies, only one of which provided results by histological type. Though no more recent studies have reported results by histological type, five further studies have reported results for all lung cancer, and a recently published systematic review [20] confirms the lack of apparent effect of cigarette mentholation on the lung carcinogenicity of cigarettes.

Dose-response relationships
We have investigated the relationship of lung cancer risk to various indices of the dose-response relationship. We did not record data on our database for pack-years, as we wished to investigate the separate roles of daily amount smoked and duration of smoking. Indeed, previous work (e.g. [19,28]) has suggested that pack-years is not a valid measure, as, for example, smokers of 20 cigs/day for 40 years and smokers of 40 cigs/day for 20 years have very different smoking RRs despite their identical pack-years. For those indices that we did consider where there were substantial amounts of data (daily amount smoked, duration, age of starting to smoke, and time of quit, relative both to current smoking and to never smoking), there was very clear evidence that greater exposure leads to greater risk, not only for all lung cancer, but also for squamous and adeno. The results by time of quit extend the observation that RRs in ex smokers are intermediate between those of never smokers and current smokers. Because dose-response results are expressed in categories of exposure which vary from study to study, there are difficulties in combining the evidence over studies. We have used two approaches. One is to consider the RR for the highest vs. lowest level of exposure (where highest and lowest refer to expected risk, so that early ages of starting, for example, are considered highest). The other is the key value approach, where we consider categories including a specified level of exposure and not including another specified level. Both approaches have limitations. In the highest vs. lowest approach, the ratio of exposures compared varies between studies, while the key value approach, although combining results relating to different exposures in different studies to a lesser extent, necessarily omits results from studies with broader categories while somewhat arbitrarily selecting or discarding RRs from studies with narrow categories.
Work is ongoing on a third approach, which fits a dose-response curve to the RRs and estimated dose midpoints of the categories for each study. This approach is complex, and was considered outside the scope of the current paper, which was intended more to summarize major features of the data. However, a future paper is planned which will describe the shape of these dose-response relationships, including characteristics of the curves such as the estimated time after quitting by which half the excess risk associated with continued smoking has disappeared. We note that, when considering RRs by time of quitting, the problem of "reverse causation" needs to be taken into account, as evidenced by the data in Table 19 showing no decrease in risk compared to current smokers for quitters of about 3 years. Our analyses also showed that for all lung cancer, risk increased with increasing tar level and with increasing fraction smoked (or equivalently short butt length), data here being more limited and non-existent by histological type. As noted earlier, when discussing cigarette type, the relationship with tar level is not an artefact of inappropriate adjustment for amount smoked [27], as has been claimed [26].
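The pack-years point and the key value approach described in this section can be illustrated with a minimal Python sketch. The function names, the category layout (lower bound, upper bound, RR) and all RR values below are hypothetical and chosen only for illustration; they are not taken from the review's database or software.

```python
def pack_years(cigs_per_day, years):
    """Pack-years, assuming the usual convention of 20 cigarettes per pack."""
    return cigs_per_day / 20 * years


def key_value_categories(categories, include, exclude):
    """Key value selection: keep categories whose exposure range contains
    the level `include` but does not contain the level `exclude`.

    `categories` is a list of (low, high, rr) tuples with inclusive bounds.
    """
    def contains(low, high, level):
        return low <= level <= high

    return [
        (low, high, rr)
        for (low, high, rr) in categories
        if contains(low, high, include) and not contains(low, high, exclude)
    ]


# Hypothetical studies reporting RRs by cigarettes/day (made-up numbers).
NARROW_STUDY = [(1, 9, 2.0), (10, 19, 5.0), (20, 99, 10.0)]
BROAD_STUDY = [(1, 99, 6.0)]  # a single broad category
```

With these made-up inputs, `pack_years(20, 40)` and `pack_years(40, 20)` both give 40 pack-years, which is why pack-years cannot separate the roles of amount and duration. Calling `key_value_categories(NARROW_STUDY, include=20, exclude=5)` keeps only the 20+ category, whereas the same call on `BROAD_STUDY` returns nothing, mirroring the limitation that studies with broader categories drop out of a key value analysis.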

Derivation of RRs
Almost a third of RRs used in meta-analyses were not directly available from the source or calculated directly from cross-tables of exposure by outcome, and required more complex methods to derive the required RR. It was reassuring that whether or not the RR was derived did not (with one minor exception) add predictive power to the main meta-regression models, suggesting that our extensive use of derived RRs caused no material bias.

Effect of studies with high RRs or large weight
The statistical analyses investigated the role of various characteristics on the estimated risk of all lung cancer, squamous and adeno in relation to ever and current smoking, but generally did not formally test the effect of exclusion of specific studies with extreme RRs or large weights. An exception was the case of study LIU4 for ever smoking and all lung cancer, this study not giving data for current smoking or by histological type. The two sex-specific RRs for this study together contributed 50.9% of the weight for the 328 available RRs from all the studies, and its exclusion increased the overall fixed-effect RR from 4.22 (95% CI 4.16-4.28) to 6.47 (95% CI 6.34-6.60). However, there was little difference in the random-effects estimates, and in the meta-regression analysis the two LIU4 RRs did not produce unusual standardized residuals, suggesting that the relatively low RRs from this study (2.76, 2.69-2.83 for males, and 2.86, 2.77-2.95 for females) were due to the characteristics of the study included in the model (in particular that it was conducted in China) and not due to its unusual results. While there are other large studies, none involved nearly as many lung cancer cases as LIU4, and we feel it unlikely that excluding other specific studies would have had a major effect on our meta-analysis estimates or on our conclusions as to how RRs varied by exposure, outcome and study and RR characteristics.
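Why a single very precise study can dominate a fixed-effect estimate while having much less influence on a random-effects estimate can be sketched as follows. This is an illustrative Python sketch, not the software used for the review: it assumes that the variance of each log RR can be recovered from its 95% CI using a normal quantile of 1.96, and it uses the DerSimonian-Laird estimator as one common random-effects method, which may differ from the estimator actually applied. The two LIU4 RRs are those quoted above; the third, small study is entirely hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile assumed for a 95% CI


def log_rr_and_variance(rr, lower, upper):
    """Return (log RR, variance of log RR) recovered from an RR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(rr), se ** 2


def fixed_effect(estimates):
    """Inverse-variance fixed-effect pooled RR and the weights used.

    `estimates` is a list of (rr, lower, upper) tuples.
    """
    pairs = [log_rr_and_variance(*e) for e in estimates]
    weights = [1.0 / v for _, v in pairs]
    pooled = sum(w * y for w, (y, _) in zip(weights, pairs)) / sum(weights)
    return math.exp(pooled), weights


def random_effects_dl(estimates):
    """DerSimonian-Laird random-effects pooled RR and the weights used."""
    if len(estimates) < 2:
        raise ValueError("need at least two estimates")
    pairs = [log_rr_and_variance(*e) for e in estimates]
    y = [yi for yi, _ in pairs]
    w = [1.0 / v for _, v in pairs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_star = [1.0 / (v + tau2) for _, v in pairs]
    pooled = sum(ws * yi for ws, yi in zip(w_star, y)) / sum(w_star)
    return math.exp(pooled), w_star


LIU4_MALES = (2.76, 2.69, 2.83)          # from the text
LIU4_FEMALES = (2.86, 2.77, 2.95)        # from the text
HYPOTHETICAL_SMALL = (8.0, 6.0, 10.67)   # made-up study, illustration only
STUDIES = [LIU4_MALES, LIU4_FEMALES, HYPOTHETICAL_SMALL]
```

In this toy setting the two narrow-CI estimates carry almost all of the fixed-effect weight, so the fixed-effect RR stays close to them, whereas adding the between-study variance to every study's variance makes the random-effects weights far more equal and moves the pooled RR towards the smaller study.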

Representativeness
We did not exclude studies on the basis of the population studied. However, most studies include subjects broadly representative of the general population. A small number of studies were conducted in miners or in other occupations with a known or suspected lung cancer risk, such as welding or foundry working. Risky occupation was considered as a characteristic in the meta-regression models but was never found to be an independent predictor of RRs associated with ever or current smoking.

Publication bias
It is well known that researchers are more likely to wish to publish, and editors more likely to accept for publication, studies finding a statistically significant association between exposure and disease. The published literature may therefore overstate any true association or produce a false-positive relationship. As part of each meta-analysis we have carried out Egger's test of publication bias, though results are generally shown only in the detailed tables. While evidence for such bias generally is mixed, the results for all lung cancer suggest that, where significant bias is seen, it is not in the direction of smaller studies with lower-weight RRs producing higher RRs. Rather it is, as noted above, the larger studies that tend to produce higher RRs. The reason for this finding is unclear. It should also be noted that our analyses are based only on those studies satisfying the inclusion criteria, and that one of these criteria restricted attention to studies with at least 100 lung cancer cases.
We have not attempted to correct for publication bias, for four reasons. First, we feel that evidence for its existence is not strong. Second, any adjustment for it seems unlikely to affect our main conclusions. Third, any adjustment for it would be complicated by the restriction on study size. Finally, any correction for publication bias would be open to question, as it inevitably involves assumptions that are impossible to verify.
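For readers unfamiliar with Egger's test, the following Python sketch shows the regression on which it is usually based: each study's standardized effect (log RR divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero indicates funnel-plot asymmetry. The function name and return values are assumptions made for illustration, and the sketch is not the implementation used in the review; a full test would also compare the intercept divided by its standard error with a t distribution on n - 2 degrees of freedom.

```python
import math


def egger_regression(log_rrs, ses):
    """Ordinary least squares of (log RR / SE) on (1 / SE).

    Returns (intercept, slope, intercept_se). A positive intercept means
    smaller studies tend to show higher RRs; a negative intercept means
    larger studies tend to show higher RRs.
    """
    n = len(log_rrs)
    if n < 3 or n != len(ses):
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / s for s in ses]                    # precision
    z = [y / s for y, s in zip(log_rrs, ses)]     # standardized effect
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = rss / (n - 2)
    intercept_se = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, intercept_se
```

As a check on the sign convention, synthetic data in which the log RR falls as the standard error rises (that is, larger studies give higher RRs, the pattern noted above for all lung cancer) produce a negative intercept, while data with a common log RR across all study sizes produce an intercept of zero.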

Bias due to misclassification of smoking status
Another source of bias is misclassification of smoking status. Random misclassification would dilute the association, as would any tendency for cases to deny or understate their smoking more than for the general population. Any tendency for current smokers to claim to be ex-smokers, as might happen in a study conducted in a clinical setting or where patients have been advised to stop smoking, would tend to inflate the risk for ex smoking. Adjustment for misclassification would be difficult, as denial rates are likely to vary by aspects of the study design, the way questions are asked, and also by sex, age, location and other demographic variables.

Limitations
This review has various limitations, many unavoidable. Lack of access to individual subject data limits the ability to carry out meta-analyses using similar exposure indices and confounder adjustment throughout, but obtaining such data was not feasible given many studies were conducted years ago. Obtaining a reliable definition of outcome and exposure is often hindered by incomplete information in the source papers. We do not consider that limiting attention to studies of 100 cases or more is of particular importance as results from smaller studies would contribute little weight to the overall meta-analyses. Limiting attention to studies conducted up to 1999 may be more relevant for some exposures and issues (particularly the trend in RR over time), though we feel that our consideration of data from 287 published studies should give a very reliable overall picture. The problem is that the procedures conducted for this review were extremely time-consuming and it would take some years to update the database and include smaller and more recent studies.
It may also be argued that the analyses presented here do not make full use of all the data collected. This is inevitable, given the extensive amount of information collected and the need to present the findings in a paper of reasonable length. As noted, when discussing dose-response, we do plan further analyses. We would also be willing to make the database available to bona fide researchers for further analysis.

Conclusions
After excluding studies involving fewer than 100 lung cancer cases, we identified 287 epidemiological studies of lung cancer which provided information on risk in relation to one or more of a defined list of smoking indices [2,3,6, ...]. Of the 267 independent principal studies, 262 provided RRs relating to all lung cancer, 84 provided RRs relating to squamous cell carcinoma, and 86 provided RRs relating to adenocarcinoma (or to outcomes that are closely equivalent). One major conclusion is that for each outcome the RRs for all major smoking indices were markedly heterogeneous.
Another conclusion is that RR estimates for ever, current or ex smoking of any product (or cigarettes if not available) are clearly elevated for all three outcomes. Individual study RRs virtually all exceed 1.0, and based on random-effects meta-analyses of most-adjusted RRs, increases were seen for ever smoking (all lung cancer 5.50, CI 5.07-5.96 ... 91-4.56). While pipe and cigar smoking is associated with an increased risk for squamous, there is no increase for adeno. The strength and consistency of the relationships are compatible with a causal relationship (except for pipe and cigar smoking and adenocarcinoma). A causal relationship is also supported by the fact that estimates are generally not materially affected by adjustment for confounding variables, and by the strong evidence of a dose-response relationship, with RRs for all outcomes clearly increasing with amount smoked, duration and earlier starting age, and decreasing with time quit, and for all lung cancer increasing with tar level and fraction smoked. Relationships were also clearly seen between smoking and RRs for the other major histological types, small cell carcinoma and large cell carcinoma.
Our review also provides evidence that risk varied by type of cigarette smoked, with filter cigarette smokers having lower risks than plain cigarette smokers (a conclusion not explained by "over-adjustment" for amount smoked), and that handrolled cigarette smokers have higher risks than manufactured cigarette smokers, though mentholation of cigarettes seems unrelated to risk. It also shows that various characteristics of the study and of the RR affect risk estimates. Thus RRs were generally highest for studies in North America and lowest for Asia, particularly in China, and higher in later starting, larger and prospective studies. RRs were also somewhat higher in males than in females, though this may be related to differences in their detailed smoking habits. There is no clear tendency for the smoking/lung cancer relationship to vary with age.
This comprehensive review provides further insight into the relationship of smoking to lung cancer and its major histological types.

Additional file 3: RRs.
Additional file 4: Dose-response data, not eligible for inclusion in meta-analysis. Tables (Individual file names

Competing interests
PNL, founder of P.N.Lee Statistics and Computing Ltd., is an independent consultant in statistics and an advisor in the fields of epidemiology and toxicology to a number of tobacco, pharmaceutical and chemical companies. This includes Philip Morris Products S.A., the sponsor of this study. BAF and KJC are employees of P.N.Lee Statistics and Computing Ltd.