Data source of depression
Summary statistics for depression were retrieved from the largest GWAS meta-analysis of depression to date, conducted by Howard et al. [16]. The meta-analysis combined three large-scale GWAS, from 23andMe, the Psychiatric Genomics Consortium (PGC) and UK Biobank, comprising 807,553 individuals in total (246,363 cases and 561,190 controls). For the 23andMe cohort, Hyde et al. used self-reported clinical diagnoses of depression collected through web-based surveys by 23andMe, Inc., a consumer genetics company, providing 75,607 cases and 231,747 controls (n = 307,354) for analysis [17]. Within UK Biobank, Howard et al. used a broad definition of depression based on participants’ responses to the questions ‘Have you ever seen a general practitioner for nerves, anxiety, tension or depression?’ or ‘Have you ever seen a psychiatrist for nerves, anxiety, tension or depression?’, providing 127,552 cases and 233,763 controls (n = 361,315) for analysis. Within the PGC cohorts, depression was diagnosed according to international consensus criteria (DSM-IV, ICD-9, or ICD-10), and the cohorts provided 12,149,399 variant calls for 43,204 cases and 95,680 controls (n = 138,884) for analysis. All participants in these cohorts were of European ancestry. The meta-analysis identified 102 independent SNPs associated with depression. Full summary statistics for all assessed genetic variants were publicly available only for UK Biobank and PGC, so we used the summary statistics from these two cohorts, provided by Howard et al., to perform the bi-directional MR analysis. Because excluding the 23andMe cohort might reduce statistical power, we additionally used the summary statistics for depression as exposure from the full meta-analysis of the 23andMe, PGC and UK Biobank cohorts as a replication set in a sensitivity analysis, to examine the validity of the causal effect of depression on certain types of cancer.
Data source of different types of cancer
Summary statistics from GWAS of multiple cancer types were retrieved from the publicly available MRC IEU OpenGWAS (MR-Base) database [18]. The two-sample MR method requires two independent samples from the same population. We therefore excluded any cancer GWAS whose participants were not of European ancestry. In addition, to reduce bias from sample overlap between the exposure and outcome datasets, we excluded any cancer GWAS that included UK Biobank participants.
Supplementary Table S1 summarizes the data sources for the different traits, including the number of SNPs, number of cases, number of controls, sample size, etc. The estimates for the association between the genetic variants and risk of ovarian, breast, lung, glioma, and pancreatic cancer were obtained, respectively, from the publicly available summary statistics of the Ovarian Cancer Association Consortium (OCAC) [19], Breast Cancer Association Consortium (BCAC) [20], International Lung Cancer Consortium (ILCCO) [21], Cohort-Based Genome-Wide Association Study of Glioma (GliomaScan) [22], and Pancreatic Cancer Cohort Consortium (PanScan) [23]. The estimates for the association between the genetic variants and risk of lymphoma, colorectal cancer, thyroid cancer, bladder cancer, and kidney cancer excluding renal pelvis were all obtained from the publicly available summary statistics of the FinnGen consortium (www.finbb.fi). All of the above studies included participants of European ancestry only.
As all data used in this study are publicly available, we did not seek additional ethical review or consent from the participants of the GWAS above.
Statistical analysis
To assess the causal relationship between depression and multiple kinds of cancers, we conducted a bidirectional two-sample MR analysis for each pair of exposure and outcome. Figure 1 presents the workflow of our study.
For depression as exposure, we used 96 of the 102 independent SNPs identified in the meta-analysis by Howard et al. as genetic instruments [16]. For each type of cancer as exposure, we selected the SNPs associated with that cancer at genome-wide significance (P < 5 × 10⁻⁸) in the corresponding GWAS. To mitigate bias from linkage disequilibrium (LD), we clumped SNPs located within 5 kb of one another and in LD at r² > 0.001, and retained only the SNP with the strongest effect on the exposure in each clump as a genetic instrument.
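The clumping step can be illustrated with a minimal greedy sketch. It is not the procedure implemented in TwoSampleMR or any other package; the `Snp` record, the `clump` function, the `r2` lookup and the toy values are all hypothetical. Ranking SNPs by P-value as a stand-in for "strongest effect on exposure" is also an assumption.

```python
from typing import Callable, List, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int      # base-pair position
    pval: float   # P-value for association with the exposure


def clump(
    snps: List[Snp],
    r2: Callable[[str, str], float],   # hypothetical pairwise LD lookup
    window_bp: int = 5_000,            # 5 kb window, as stated in the text
    r2_threshold: float = 0.001,
    p_threshold: float = 5e-8,
) -> List[Snp]:
    """Greedy clumping: walk SNPs from most to least significant and keep a
    SNP only if it is not in LD with an already-kept SNP inside the window."""
    candidates = sorted(
        (s for s in snps if s.pval < p_threshold), key=lambda s: s.pval
    )
    kept: List[Snp] = []
    for snp in candidates:
        in_ld_with_kept = any(
            snp.chrom == k.chrom
            and abs(snp.pos - k.pos) <= window_bp
            and r2(snp.rsid, k.rsid) > r2_threshold
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


# Toy data (entirely made up): rsB sits 2 kb from rsA and is in LD with it.
toy_snps = [
    Snp("rsA", "1", 1_000, 1e-12),
    Snp("rsB", "1", 3_000, 1e-9),
    Snp("rsC", "1", 900_000, 2e-10),
    Snp("rsD", "2", 1_500, 1e-3),   # not genome-wide significant
]
toy_ld = {frozenset(("rsA", "rsB")): 0.6}


def toy_r2(a: str, b: str) -> float:
    return toy_ld.get(frozenset((a, b)), 0.0)


instruments = clump(toy_snps, toy_r2)
print([s.rsid for s in instruments])
```

In this toy input, rsD fails the significance filter and rsB is dropped because it lies within the window of the more significant rsA and exceeds the r² threshold, leaving rsA and rsC as instruments.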
Summary statistics for these SNPs were retrieved from the GWAS meta-analysis of depression by Howard et al. and from the GWAS of the different types of cancer, respectively. For SNPs without a matching record in the outcome GWAS or GWAS meta-analysis, we searched for a proxy SNP in high LD (r² > 0.8); SNPs for which no proxy could be identified were excluded from the analysis. Supplementary Tables S2 and S8 list all SNPs included in the MR analysis of each pair of exposure and outcome.
We used the conventional fixed-effect inverse-variance weighted (IVW) method to estimate the causal effect of exposure on outcomes [24]. For MR analyses with high heterogeneity among variants, as measured by Cochran’s Q statistic, we used the random-effect IVW method to account for the heterogeneity [25]. For exposures with only one associated SNP as genetic instrument, we used the Wald ratio method to estimate the causal effect. IVW is the most efficient MR method and has the greatest statistical power, but it assumes that all instrumental variables are valid, and it is biased if the average pleiotropic effect differs from zero. The weighted median method is more robust to outliers and only assumes that the majority of the instrumental variables are valid [26]. We therefore performed sensitivity analyses to assess the robustness of the causal effect estimates, including the weighted median method [27], the leave-one-out sensitivity test [28], and Steiger filtering [29]. In Steiger filtering, we first calculated R², the proportion of variance in the exposures and outcomes explained by the SNPs, and removed the SNPs that explained less variance in the exposure than in the outcome. Causal effect estimation with the IVW method was then repeated after filtering. We also performed the MR Steiger directionality test to confirm that the direction of effect was oriented from exposure to outcome.
For exposures with at least 5 associated SNPs as genetic instruments, we used the MR-Egger intercept test [30] to evaluate horizontal pleiotropy across all genetic instruments. However, this test is sensitive to outliers and to violations of the Instrument Strength Independent of Direct Effect (InSIDE) assumption, and is therefore less efficient. We therefore also conducted the MR pleiotropy residual sum and outlier (MR-PRESSO) global test [31], which is more robust to outliers [26].
Furthermore, where there was any evidence of horizontal pleiotropy, we performed the MR-PRESSO outlier test, which identifies genetic instruments exhibiting horizontal pleiotropy as outliers and re-estimates the causal effect with the IVW method after their removal. We also performed the MR-PRESSO distortion test to assess whether the causal effect estimates before and after removal of outliers differed significantly.
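For orientation, the core estimators named above (the Wald ratio, fixed-effect IVW, Cochran's Q and the random-effect correction) can be written in a few lines. The sketch below uses the common textbook form with first-order weights and a multiplicative random-effects model. It is not a reproduction of the TwoSampleMR code, and the function names and input values are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(bx: float, by: float, se_y: float) -> Tuple[float, float]:
    """Single-instrument estimate beta_Y / beta_X with a first-order SE."""
    return by / bx, se_y / abs(bx)


def ivw(
    bx: Sequence[float],    # SNP-exposure effects
    by: Sequence[float],    # SNP-outcome effects
    se_y: Sequence[float],  # standard errors of the SNP-outcome effects
    random_effects: bool = False,
) -> Tuple[float, float, float]:
    """Return (estimate, standard error, Cochran's Q) for the IVW method."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]   # 1 / var(ratio)
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)                          # fixed-effect SE
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    if random_effects and len(ratios) > 1:
        # Multiplicative random effects: inflate the SE when Q exceeds its
        # degrees of freedom, never shrink it below the fixed-effect SE.
        se *= math.sqrt(max(1.0, q / (len(ratios) - 1)))
    return beta, se, q


# Illustrative (made-up) summary statistics for three instruments.
bx = [0.10, 0.20, 0.10]
se_y = [0.01, 0.01, 0.01]
by_homogeneous = [0.05, 0.10, 0.05]     # every ratio equals 0.5
by_heterogeneous = [0.05, 0.02, 0.09]   # ratios disagree

print(ivw(bx, by_homogeneous, se_y))
print(ivw(bx, by_heterogeneous, se_y, random_effects=True))
print(wald_ratio(0.10, 0.05, 0.01))
```

When the per-SNP ratios agree, Q is close to zero and the two IVW variants coincide. When they disagree, only the standard error changes under the random-effect model, while the point estimate stays the same.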
Causality was concluded when the IVW and weighted median methods gave causal effect estimates that were consistent in direction and magnitude, the Steiger test confirmed the correct orientation of the causal relationship, and the P-value of the IVW method, after correction for heterogeneity and horizontal pleiotropy, was below the Bonferroni-corrected significance level of 1.2 × 10⁻³ (P-value threshold = 0.05/43, corrected for 43 pairs of exposure and outcome). A P-value between 1.2 × 10⁻³ and 0.05 was considered suggestive evidence of causality.
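The threshold arithmetic and the decision rule can be made concrete with a short sketch. The `classify` helper is hypothetical and makes two simplifying assumptions that the text does not state: consistency between IVW and weighted median is reduced to agreement in sign, and the same consistency and Steiger checks are required for the suggestive tier.

```python
ALPHA = 0.05
N_PAIRS = 43                    # exposure-outcome pairs tested
BONFERRONI = ALPHA / N_PAIRS    # about 1.2e-3


def classify(
    p_ivw: float,
    beta_ivw: float,
    beta_weighted_median: float,
    steiger_direction_ok: bool,
) -> str:
    """Label one exposure-outcome pair using the rule sketched above."""
    same_sign = beta_ivw * beta_weighted_median > 0   # assumption: sign only
    if not (same_sign and steiger_direction_ok):
        return "no evidence"
    if p_ivw < BONFERRONI:
        return "causal"
    if p_ivw < ALPHA:
        return "suggestive"
    return "no evidence"


print(f"Bonferroni threshold: {BONFERRONI:.2e}")
print(classify(5e-4, 0.20, 0.18, True))
print(classify(1e-2, 0.20, 0.18, True))
```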
Power and F-statistics calculation
We first calculated the power of our IVW analyses using an online tool (http://cnsgenomics.com/shiny/mRnd/) [32], supplying the type-I error rate (α = 0.05), the proportion of cases in the corresponding study (Supplementary Table S1) and the point estimate of the odds ratio calculated by the fixed-effect IVW method (Supplementary Tables S3 and S9). The F-statistic, a measure of the strength of the genetic instruments, was calculated as F = ((N − k − 1)/k) × (R²/(1 − R²)), where N is the sample size, k is the number of SNPs and R² is the proportion of variance in the exposure explained by the SNPs [33]. An F-statistic below 10 usually indicates weak instrument bias.
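The F-statistic formula translates directly into code. The sketch below is only a transcription of the expression above; the function name and the example inputs are illustrative and are not values from this study.

```python
def f_statistic(n: int, k: int, r2: float) -> float:
    """F = ((N - k - 1) / k) * (R^2 / (1 - R^2)).

    n  : sample size of the exposure GWAS
    k  : number of SNPs used as instruments
    r2 : proportion of exposure variance explained by those SNPs
    """
    if k <= 0 or n - k - 1 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))


# Hypothetical inputs, chosen only to show the weak-instrument check.
f_value = f_statistic(n=100_000, k=96, r2=0.02)
print(f"F = {f_value:.1f}; weak instruments: {f_value < 10}")
```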
All statistical analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria) using the MR-Base ‘TwoSampleMR’ package v0.5.5 and the ‘MRPRESSO’ package v1.0.