Objectives: To examine the effect on estimated levels of health conditions produced from large-scale surveys when either list-wise respondent deletion or standard demographic item-level imputation is employed, and to assess the degree to which further bias reduction results from the inclusion of correlated ancillary variables in the item imputation process.

Design: Large cross-sectional (US level) household survey.

Participants: 218 726 US adults (18 years and older) in the 2006 Behavioral Risk Factor Surveillance System Survey. This survey is the largest US telephone survey conducted by the Centers for Disease Control and Prevention.

Primary and secondary outcome measures: Estimated rates of severe depression among US adults.

Results: The use of list-wise respondent deletion and/or demographic imputation results in the underestimation of severe depression among adults in the USA. List-wise deletion produces relative underestimates of 9% (8.7% vs 9.5%). Demographic imputation produces relative underestimates of 7% (8.9% vs 9.5%). Both of these differences are significant at the 0.05 level.

Conclusion: The use of list-wise deletion and/or demographic-only imputation may produce significant distortion in estimating national levels of certain health conditions.
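The mechanism behind the list-wise deletion result can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 1000 synthetic respondents, the nonresponse pattern, and the helper names `prevalence` and `relative_underestimate` are hypothetical and are not taken from the study or from BRFSS data. It only shows that, when respondents with the condition are less likely to answer, dropping incomplete cases pulls the estimated rate below the true rate.

```python
# Toy illustration only: synthetic respondents, not BRFSS data.

def prevalence(flags):
    """Share of respondents flagged as severely depressed."""
    return sum(flags) / len(flags)


def relative_underestimate(estimate, reference):
    """Shortfall of an estimate, as a fraction of the reference value."""
    return (reference - estimate) / reference


# Hypothetical population: 1000 respondents, the first 95 severely depressed.
depressed = [i < 95 for i in range(1000)]

# Assumed nonresponse pattern (not missing at random): depressed respondents
# skip the depression items twice as often as everyone else (1 in 5 vs 1 in 10).
answered = [
    (i % 5 != 0) if is_depressed else (i % 10 != 0)
    for i, is_depressed in enumerate(depressed)
]

# List-wise deletion keeps only respondents with complete answers.
complete_cases = [d for d, a in zip(depressed, answered) if a]

true_rate = prevalence(depressed)
deleted_rate = prevalence(complete_cases)
shortfall = relative_underestimate(deleted_rate, true_rate)

print(f"true rate:          {true_rate:.3f}")
print(f"list-wise deletion: {deleted_rate:.3f}")
print(f"relative shortfall: {shortfall:.1%}")
```

The size of the shortfall in this sketch depends entirely on the assumed nonresponse rates, so it should not be compared with the figures reported above.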
Frankel, M. R., Battaglia, M. P., Balluz, L. & Strine, T. (2012). When data are not missing at random: implications for measuring health conditions in the Behavioral Risk Factor Surveillance System. BMJ Open, 2(4), e000696. doi:10.1136/bmjopen-2011-000696.