Income Imputation Strategies for the European Health Interview Survey in Luxembourg
Mauro Baldacchini 1, Maria Ruiz-Castell 1, Gwenaëlle Le Coroller 1
1 : Luxembourg Institute of Health
1 A-B Rue Thomas Edison, 1445 Strassen, Luxembourg

In the third wave of the European Health Interview Survey (EHIS), conducted in Luxembourg in 2019, 24.4% of responses lacked household income data (either “I do not wish to answer” or a missing value). This made the variable unsuitable for analysis and prompted the need for an imputation process (Lee and Huber 2021). However, the EHIS questionnaire contains no additional information on income or household assets, which made income estimation particularly challenging. Household income was reported in classes ranging from “Less than €1,000” to “More than €12,500”. Combined with the fact that the data in Luxembourg were collected through a self-administered survey (without an interviewer), this introduced various errors and imprecisions in the answers. An unbiased estimate of income is crucial for producing policy-relevant results that can be effective for a specific stratum of the population. The purpose of the present work was to find a feasible way of imputing this information using data gathered from the EHIS. For the analysis, missing income information was assumed to be Missing Not At Random (MNAR), i.e. income is missing because of the income level itself, in addition to other respondent characteristics (Kim et al. 2007). The work was divided into four steps. First, we conducted a literature review of income imputation methods, focusing especially on imputation of MNAR data. Second, after an initial examination of the variables, we analyzed missing data patterns as well as inconsistencies among answers. Third, we applied the relevant methodologies to simulated data and evaluated their effectiveness in terms of weighted Cohen's kappa. Finally, we selected Random Forest as the best method for our data and compared the proportions obtained from the imputation with those obtained from the complete cases.
The results of this work will be beneficial for other surveys in which income may serve as a crucial variable for the analysis, as well as for future waves of the EHIS, such as the one scheduled for 2025, where the household income variable will be imputed when the rate of missing values exceeds 5%.
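
As an illustration of the evaluation step described above, the following is a minimal sketch of weighted Cohen's kappa for ordinal income classes. It is not the authors' implementation: the function name `weighted_kappa`, the integer coding of income classes (0 for the lowest class upward), and the choice between linear and quadratic disagreement weights are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter


def weighted_kappa(true_classes, imputed_classes, n_classes, weights="linear"):
    """Weighted Cohen's kappa for ordinal classes coded 0..n_classes-1.

    Hypothetical helper: the coding and the weighting scheme are
    assumptions, not details taken from the abstract.
    """
    if len(true_classes) != len(imputed_classes) or not true_classes:
        raise ValueError("inputs must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(true_classes)
    power = 1 if weights == "linear" else 2

    # Joint and marginal counts of (true class, imputed class).
    observed = Counter(zip(true_classes, imputed_classes))
    true_marginal = Counter(true_classes)
    imputed_marginal = Counter(imputed_classes)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight grows with the distance between classes.
            w = abs(i - j) ** power
            observed_disagreement += w * observed[(i, j)] / n
            expected_disagreement += (
                w * (true_marginal[i] / n) * (imputed_marginal[j] / n)
            )

    if expected_disagreement == 0:
        # Both series sit in one identical class: kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Toy data: known classes that were masked, and two candidate imputations.
    truth = [0, 1, 2, 3]
    near_miss = [1, 1, 2, 3]  # one value off by a single class
    far_miss = [3, 1, 2, 3]   # one value off by three classes
    print(weighted_kappa(truth, near_miss, n_classes=4))
    print(weighted_kappa(truth, far_miss, n_classes=4))
```

Because the weights depend on the distance between classes, an imputed value that falls in an adjacent income class is penalized less than one that falls several classes away, which is why a weighted rather than unweighted kappa suits ordinal income brackets.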

