General guidance notes on pooling HSE data from 1999 and 2004 for comparing ethnic differences
This note gives guidance to users who want to pool the 1999 and 2004 HSE datasets in order to increase the number of respondents from minority ethnic groups.
There are differences in how ethnic minorities were sampled in the two years. In particular, there was no boost sample for Black Africans in 1999, so this group cannot be included in a pooled analysis. It should also be noted that the sampling methodology for the Chinese group differed between 1999 and 2004. For more information on the sampling design in each year, see Volume 2 (methodological report) of the HSE reports for 1999 and 2004.
The 2004 HSE contains a number of weights, whereas the 1999 weights are less complex. Our website contains two detailed FAQs (FAQ1, FAQ2) on using the weights within the 1999 and 2004 datasets. These include information on how to prepare the weights for pooling the 1999 and 2004 datasets.
Primary Sampling Units (PSUs)
If you are planning to use the PSU variables to account for the survey design, bear in mind that PSUs with the same number in 1999 and 2004 do not represent the same PSU in the two years. You can make the PSU values unique across the pooled dataset by adding 1000 to each PSU in 1999 and 2000 to each PSU in 2004.
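The renumbering described above could be sketched in Stata as follows. The variable name psu and the file names are placeholders rather than the actual HSE names. The assert lines reflect an assumption that the original PSU numbers are all below 1000, which is what keeps the two renumbered ranges from overlapping.

```stata
* Sketch only: "psu" and the file names are hypothetical placeholders.
use "<insert name of 1999 file>", clear
* Assumption: original PSU numbers are below 1000, so the ranges cannot overlap
assert psu < 1000
replace psu = psu + 1000
save "<insert name of prepared 1999 file>", replace

use "<insert name of 2004 file>", clear
assert psu < 1000
replace psu = psu + 2000
save "<insert name of prepared 2004 file>", replace
```

If either assert fails, a larger offset would be needed for the two years.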
Merging the datasets
First it is necessary to prepare the weights in the 1999 and 2004 data – see the FAQ on using the HSE individual weights if pooling the 1999 and 2004 data.
Having prepared the weights, you can simply append the 1999 and 2004 datasets together. An example of the Stata syntax for appending the two datasets is given below:
use "<insert name of 1999 file>"
append using "<insert name of 2004 file>"
save "<insert name of new 1999/2004 dataset>"
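Once the pooled file has been saved, the survey design can be declared before running analyses. The following is a minimal sketch, assuming the PSU variable has already been made unique across the two years. The names psu, pooled_wt and ethnic are hypothetical, standing in for the PSU identifier, the prepared pooled weight and the ethnic group variable. Stratification is left out of this sketch; see the methodological reports for the full design.

```stata
* Sketch only: psu, pooled_wt and ethnic are hypothetical variable names.
use "<insert name of new 1999/2004 dataset>", clear
* Declare the PSUs and the pooled probability weight
svyset psu [pweight = pooled_wt]
* Summarise the declared design (number of PSUs and observations)
svydescribe
* Example of a design-based estimate
svy: proportion ethnic
```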
Differences in topics between years
Extra topics in 1999
- Use of health services: GP, hospital, dental services
- Demispan measurement
- Mid-upper arm circumference measurement
Extra topics in 2004
- Heating/cooking appliances, mould and dampness, household pets
- Fruit and vegetable consumption: detailed section in the individual questionnaire in 2004 but only four questions in 1999 in the self-completion booklet on ‘eating habits’
- Complementary and alternative medicines
- Parental health
- Cycling safety
- EuroQol general health (EQ-5D)
- Social capital
- Infant length
- Urine sample
Please note that not all respondents were asked about each of these topics. For example, in 2004 the nurse visit was only given to those from minority ethnic groups. Appendix A of the HSE User Guide for 1999 and for 2004 gives a detailed list of topics and the population/sub-sample of respondents covered by each topic.
Differences in classifications
1. Ethnicity
The questions on ethnicity within the 1999 individual section and the 2004 individual and household ethnicity sections are very similar. The following differences are noted:
- Irish has a slightly different definition
- Asian/Asian British – 2004 includes Indian Caribbean (not in 1999)
- Mixed – more detailed questions in 2004 if ‘other’ (mother’s and father’s cultural background)
- In 2004, if more than one answer is given for ‘Other family origins’, the respondent is asked for their mother’s cultural background (not in 1999)
The questions on ethnicity in the 1999 household section differ from those in the 1999 individual section and the 2004 household and individual sections. Please refer to the 1999 and 2004 questionnaires for more detail.
2. Socio-economic classification
In 1999, Socio-Economic Group (SEG) and Registrar General’s Class (RG Class) were used; in 2004, NS-SEC was used (SEG and RG Class are also available in 2004). The 3- or 5-category version of NS-SEC gives reasonable comparability with RG Class. See Appendix 2 of The National Statistics Socio-economic Classification: Origins, Development and Use.
3. Index of Multiple Deprivation (IMD)
IMD is available in the 2004 dataset but not in the 1999 dataset.
NB: there may be other differences that are not noted here.