Table 1 The estimated proportions with each health behaviour, the phi coefficient between imputed values and the estimated excess matches for each analysis

From: Augmenting cancer registry data with health survey data with no cases in common: the relationship between pre-diagnosis health behaviour and post-diagnosis survival in oesophageal cancer

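| Behaviour 5 years before diagnosis | Group | N | Estimated proportion with behaviour, \( \hat{p_i} \): median | \( \hat{p_i} \): 95% CI | Estimated phi coefficient, \( \hat{\rho} \) = φ: median | \( \hat{\rho} \): 95% CI | Estimated excess matches, \( n\hat{p_i}\left(1-\hat{p_i}\right)\hat{\rho} \): median | Excess matches: 95% CI |
|---|---|---|---|---|---|---|---|---|
| Current smoking | Overall | 27,835 | 0.159 | 0.157, 0.162 | 0.071 | 0.059, 0.084 | 262.2 | 220.1, 312.2 |
| Current smoking | ESCC | 8914 | 0.166 | 0.162, 0.170 | 0.077 | 0.061, 0.097 | 94.8 | 74.5, 120.7 |
| Current smoking | EAC | 15,726 | 0.157 | 0.153, 0.159 | 0.066 | 0.052, 0.081 | 137.0 | 107.4, 169.5 |
| Binge drinking | Overall | 27,750 | 0.100 | 0.098, 0.102 | 0.060 | 0.049, 0.077 | 150.5 | 121.5, 192.1 |
| Binge drinking | ESCC | 8891 | 0.086 | 0.082, 0.089 | 0.060 | 0.042, 0.086 | 42.2 | 29.8, 61.1 |
| Binge drinking | EAC | 15,673 | 0.109 | 0.106, 0.111 | 0.058 | 0.042, 0.079 | 88.6 | 63.6, 120.3 |
| Heavy drinking | Overall | 27,749 | 0.048 | 0.047, 0.050 | 0.011 | 0.002, 0.025 | 14.3 | 2.7, 32.0 |
| Heavy drinking | ESCC | 8888 | 0.046 | 0.043, 0.049 | 0.015 | −0.002, 0.036 | 5.7 | −0.7, 14.2 |
| Heavy drinking | EAC | 15,676 | 0.050 | 0.048, 0.052 | 0.008 | −0.004, 0.028 | 6.0 | −3.0, 20.8 |
| Physical activity | Overall | 27,830 | 0.737 | 0.734, 0.740 | 0.034 | 0.026, 0.046 | 185.1 | 139.4, 247.4 |
| Physical activity | ESCC | 8912 | 0.716 | 0.709, 0.721 | 0.036 | 0.016, 0.056 | 64.7 | 29.6, 100.2 |
| Physical activity | EAC | 15,724 | 0.750 | 0.746, 0.754 | 0.031 | 0.013, 0.047 | 91.4 | 40.0, 138.4 |
| Obese | Overall | 27,796 | 0.257 | 0.254, 0.261 | 0.030 | 0.020, 0.042 | 160.2 | 108.4, 226.8 |
| Obese | ESCC | 8898 | 0.262 | 0.255, 0.268 | 0.045 | 0.024, 0.061 | 77.0 | 41.4, 104.6 |
| Obese | EAC | 15,709 | 0.256 | 0.251, 0.261 | 0.023 | 0.012, 0.041 | 67.8 | 35.0, 122.4 |
| Current smoking with regular drinking | Overall | 27,735 | 0.034 | 0.033, 0.035 | 0.022 | 0.009, 0.038 | 19.8 | 8.0, 34.2 |
| Current smoking with regular drinking | ESCC | 8883 | 0.031 | 0.029, 0.033 | 0.024 | −0.000, 0.049 | 6.2 | −0.0, 13.5 |
| Current smoking with regular drinking | EAC | 15,670 | 0.035 | 0.034, 0.037 | 0.021 | 0.004, 0.042 | 11.5 | 2.1, 22.4 |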

  1. \( \hat{p_i} \) = proportion of imputed values where the health behaviour is present
  2. \( \hat{\rho} \) = φ = correlation between the pairs of imputed values (calculated as the phi coefficient)
  3. \( n\hat{p_i}\left(1-\hat{p_i}\right)\hat{\rho} \) = number of correct matches in excess of the number expected through chance alone
  4. Median = median of 100 repetitions of the imputation algorithm
  5. 95% CI = empirical 95% confidence interval formed from the 2.5th and 97.5th percentiles of 100 repetitions of the imputation algorithm
  6. N = number of SEER oesophageal cancer cases receiving data from two donor records from the BRFSS health behaviour datasets
  7. ESCC = oesophageal squamous cell carcinoma
  8. EAC = oesophageal adenocarcinoma
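
The quantities defined in the notes above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`phi_coefficient`, `excess_matches`, `empirical_ci`) are hypothetical, the imputed values are assumed to be coded 0/1, and the choice of percentile interpolation (`method="inclusive"`) is an assumption, since the table does not say how the percentiles were computed.

```python
import math
import statistics


def phi_coefficient(pairs):
    """Phi coefficient for a list of (a, b) pairs of binary values coded 0/1."""
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Undefined when either variable is constant.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom


def excess_matches(n, p, rho):
    """Estimated excess matches, n * p * (1 - p) * rho, as in the table header."""
    return n * p * (1 - p) * rho


def empirical_ci(values):
    """Empirical 95% interval: 2.5th and 97.5th percentiles of the values.

    statistics.quantiles with n=40 returns cut points at 2.5%, 5%, ..., 97.5%.
    The interpolation method is an assumption of this sketch.
    """
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Plug the reported medians for current smoking (overall) into the formula.
    print(round(excess_matches(27835, 0.159, 0.071), 1))
```

Plugging the reported medians for current smoking (overall) into the formula gives roughly 264, close to but not identical to the tabulated 262.2. A small gap of this kind is to be expected if the tabulated value is the median of the per-repetition products rather than the product of the medians.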