Health Survey for England Frequently Asked Questions
Questions:
Which years of the HSE are available?
Annual datasets are available from 1993 onwards. The 1991 and 1992 data are only available as a combined file. For a list of up-to-date datasets please refer to the Dataset Titles page.
What format is the data available in?
Most years of the HSE are available in SPSS, STATA and ASCII formats.
Where can I obtain statistics and tables from the HSE?
These are published in annual reports and are available from a good academic library or from The Stationery Office. Please also refer to the HSE online reports and results on the NHS Information Centre website (see the Department of Health website for earlier publications) and the HSE reports on the NatCen Health and Lifestyle web page.
Does the HSE include information at the person level?
Yes. For example, the 2001 data consists of an individual and a household file. The individual file contains records for all individuals in co-operating households who gave a full interview. It also contains information from the household questionnaire, main individual schedule, self-completions and the nurse visit (where one occurred).
The 2001 household file contains records on household composition and sex, age and marital status for all individuals in co-operating households. It is provided as an aid to household level analysis. Other household level variables are stored on the individual file.
What is the most detailed geography I can analyse the data at?
The most detailed named geography is the District Health Authority. There is also a variable that corresponds to 720 postcode sectors. Although its values are not labelled, it allows you to analyse the responses of small groups of people living in the same postcode sector. The HSE also includes information on the type of area in which an individual lives, including an indicator of the level of urbanisation, the index of multiple deprivation and the ONS area classification scheme.
Before I order, how do I find out what questions/variables are included?
Variable lists and PDF user guides (including questionnaires) are freely available via the Doc column in the Dataset Titles page.
Are the same questions asked each year?
The Health Survey for England contains a 'core' which is repeated each year, and each survey year also has one or more modules on subjects of special interest. Modules are often repeated periodically, allowing analysis over time.
The 'core' includes: questions on general health and psycho-social indicators, smoking, alcohol, demographic and socio-economic indicators, questions about use of health services and prescribed medicines and measurements of height, weight and blood pressure.
The modules may be about a single topic, several topics or about population groups. The modules to date have been: 1993 cardiovascular disease; 1994 cardiovascular disease; 1995 asthma, accidents, and disability; 1996 asthma, accidents, and special measures of general health (EuroQol, SF-36); 1997 children and young people; 1998 cardiovascular disease; 1999 ethnic groups; 2000 older people, and social exclusion; 2001 respiratory disease and atopic conditions, disability and non-fatal accidents; 2002 children and young people (aged 0-24); 2003 cardiovascular disease; 2004 ethnic minority groups; 2005 older people; 2006 cardiovascular disease; 2007 knowledge and attitudes; 2008 physical activity and fitness.
Why is a case coded as -1 (not applicable) when the question/variable is clearly applicable to that case?
The HSE generally uses the standard missing value conventions shown below (a few variables do not conform to this scheme, but their value labels are clearly marked). However, missing values are sometimes coded incorrectly. For example, all missing values for the 2002 and 2003 variable WhVal (Waist-hip measurement value) are coded as -1 when they should be coded as either -6 or -7.
Missing value conventions on the HSE:
-1 Not applicable: Used to signify that a particular variable did not apply to a given respondent, usually because of internal routing. For example, men are coded -1 on women-only questions.
-2 Schedule not applicable: Used mainly for variables on the self-completions when the respondent was not in the given age range. Also used for children without legal guardians in the home, who could not participate in the nurse schedule.
-6 Schedule not obtained: Used to signify that a particular variable was not answered because the respondent did not complete or agree to a particular schedule (i.e. the nurse schedule or self-completions).
-7 Refused/not obtained: Used only for variables on the nurse schedules. This code indicates that a respondent refused a particular measurement or test, or that the measurement was attempted but not obtained, or was not attempted.
-8 Don't know, Can't say.
-9 No answer/Refused.
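As a minimal sketch of how these codes might be handled before analysis, the Python below treats the six standard codes as missing and computes a mean over the valid values only. The function names and the example values are hypothetical, and the sketch assumes the variable follows the standard scheme; variables that do not conform need their own value labels checked first.

```python
# Standard HSE missing-value codes, copied from the conventions listed above.
HSE_MISSING_CODES = {
    -1: "Not applicable",
    -2: "Schedule not applicable",
    -6: "Schedule not obtained",
    -7: "Refused/not obtained",
    -8: "Don't know, Can't say",
    -9: "No answer/Refused",
}


def recode_missing(values):
    """Replace standard HSE missing-value codes with None (hypothetical helper)."""
    return [None if v in HSE_MISSING_CODES else v for v in values]


def valid_mean(values):
    """Mean of the non-missing values, or None if every value is missing."""
    valid = [v for v in recode_missing(values) if v is not None]
    return sum(valid) / len(valid) if valid else None


# Illustrative values only, not real HSE data.
example = [0.85, -1, 0.92, -6, 0.78, -7]
print(recode_missing(example))
print(valid_mean(example))
```

Keeping the codes in one dictionary also makes it easy to tabulate why values are missing before they are dropped.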
Does analysis of the HSE require the use of weights?
The use of weighting variables depends on the year. Where a population has been oversampled (e.g. the population in care homes in 2000), weights are required to compensate for this. Weights are also required for analysis of children in the HSE. More generally, however, the HSE gives a good match to the total population, and so weights are often not required. For example, in 2002 no weights need to be applied if using the adult sample only. Since 2003 non-response weights have been introduced, in line with changes on many other large-scale surveys, with the aim of reducing possible biases.
More information on the use of weights for analysis of social surveys is available in the analysis guide Weighting the Social Surveys.
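To illustrate why weights matter when a group has been oversampled, here is a minimal Python sketch comparing an unweighted and a weighted proportion. The data, the weights and the function name are invented for illustration and are not taken from any HSE file.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight divided by the sum of weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical indicator: 1 = reports a condition, 0 = does not.
# The last two respondents stand for an oversampled group, so each
# carries a smaller weight than the others.
values = [0, 0, 1, 1, 1]
weights = [1.0, 1.0, 1.0, 0.5, 0.5]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(unweighted, weighted)
```

In this made-up example the unweighted proportion overstates the condition because the oversampled respondents count fully; the weighted figure scales them back down.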
How do I use the weights in the 2004 ethnic boost dataset if combining the general population and ethnic boost files?
We have written a detailed briefing note on how to use the 2004 Health Survey for England (HSE) weights.
How do I use the Health Survey for England individual weights if I’m pooling the data from the 1999 and 2004 datasets (ethnic boost years)?
The 1999 and 2004 Health Surveys for England (HSE) both contained ethnic boost samples in order to increase the number of ethnic minority respondents in the survey.
It is possible to pool the 1999 and 2004 datasets to further increase the number of ethnic minority respondents for your analyses. When pooling the two datasets you need to ensure that you use the weights from 1999 and 2004 in the correct way.
First, prepare the 2004 weights; see the detailed briefing note on how to use the 2004 Health Survey for England (HSE) weights.
Next, you need to prepare the 1999 weights which are less complex than the 2004 weights.
For the 1999 dataset you need to follow the same procedure as for 2004: drop the ethnic minority cases from the general population dataset, merge the two files together and adjust the weight so that the weighted total equals the actual total. All of this is illustrated in the detailed briefing note on how to use the 2004 Health Survey for England (HSE) weights. You can check that the population is in the correct proportion for 1999 (e.g. that the white group reflects the correct proportion of the population) by comparing with the 2001 Census estimates.
You should then scale the weights for both years (i.e. divide each year by the mean weight). Scaling the weights for both years means you're not giving prominence to any one year - both years have a mean weight of one. You then simply give the new scaled weight in each year the same name (e.g. wt_intnew) and then combine the datasets and use the new scaled weight (e.g. wt_intnew) as described in the detailed briefing note on how to use the 2004 Health Survey for England (HSE) weights.
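The scaling step can be sketched in Python as follows. This is only an illustration under stated assumptions: the weight values are invented, the weights for each year are assumed to have already been prepared as described above, and wt_intnew is used as the common name following the example in the text.

```python
def scale_weights(weights):
    """Divide each weight by the mean weight so the scaled mean is one."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Hypothetical prepared weights for each year (not real HSE values).
weights_1999 = [0.5, 1.0, 1.5, 3.0]
weights_2004 = [0.8, 1.2, 2.0]

# Scale each year separately, give the result the same name in both
# years (wt_intnew), then combine the two sets of records.
pooled = (
    [{"year": 1999, "wt_intnew": w} for w in scale_weights(weights_1999)]
    + [{"year": 2004, "wt_intnew": w} for w in scale_weights(weights_2004)]
)

for year in (1999, 2004):
    year_weights = [r["wt_intnew"] for r in pooled if r["year"] == year]
    print(year, sum(year_weights) / len(year_weights))
```

Scaling within each year before combining is what keeps either year from dominating: after this step the weights in each year sum to that year's number of cases.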
We have also written some general guidance notes for users who want to pool the 1999 and 2004 HSE datasets.
Can I use multiple imputation to deal with missing data in the Health Survey for England?
Multiple Imputation (MI) for missing data can be carried out using the ice and mim commands in STATA (version 9.2 onwards). The results from the MI can then be compared with the results obtained before MI is applied. However, the ice and mim commands cannot be used in conjunction with the STATA svyset commands, which adjust the analyses to take account of the complex survey design. As the HSE has a complex survey design, it is important that you use the svyset commands for analyses; using them might have a larger effect upon the results than using the ice and mim commands. For more information on missing data (including MI using alternative packages) go to: http://www.missingdata.org.uk/
What population is covered by the HSE?
In most years the HSE covers the private household population. In 2000, however, older people in residential and care homes were also included in the survey. Children were not included in the HSE between 1991 and 1994. Since 1995 the survey has been extended to include those aged 2-15, and since 2001 data are available for all ages. It should be noted that not all topics are considered suitable for children, and in such cases data are not available for the child sample.
Are health surveys available for other UK countries?
Yes. Wales (Welsh Health Survey (WHS)), Scotland (Scottish Health Survey (SHeS)) and Northern Ireland (Northern Ireland Health and Wellbeing Survey (NIHW)) all have health surveys that are downloadable from our website. However, it is important to note that there are issues of compatibility between these surveys. The questions used are not always identical, and there are methodological differences between surveys. For example, the WHS currently uses a paper questionnaire to collect information, whilst the HSE, SHeS and NIHW gather data through interviews. Further information on the large-scale surveys that contain information suitable for health research can be found in the Introductory guide to using large-scale government surveys for health research.
How is the Government Office Region (gora) variable coded in the HSE 2001?
The gora variable does not have labels in the deposited data in the HSE 2001. The coding for this variable is as below:
A North East
B North West
D Yorkshire
E East Midlands
F West Midlands
G East England
H London
J South East
K South West
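One way to apply these labels yourself is a simple lookup table. The Python sketch below copies the codes and labels from the list above; the function name is hypothetical, and it assumes the gora values are held as single-letter strings, which you should check against your copy of the data.

```python
# gora codes and labels for the HSE 2001, as listed above.
GORA_LABELS = {
    "A": "North East",
    "B": "North West",
    "D": "Yorkshire",
    "E": "East Midlands",
    "F": "West Midlands",
    "G": "East England",
    "H": "London",
    "J": "South East",
    "K": "South West",
}


def label_gora(code):
    """Return the region label for a gora code, or None if it is not listed."""
    return GORA_LABELS.get(str(code).strip().upper())


print(label_gora("H"))
print(label_gora("j "))
```

Codes that are not in the list return None rather than raising an error, so unexpected values can be counted and inspected.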
Is there a more general FAQ?
There is a generic FAQ for all surveys available.