The UK is fortunate in its wealth of available major cross-sectional surveys,
with most government surveys available for secondary use since their inception.
These surveys share some common features:
- they are large micro data files containing many detailed variables, and
  need to be analysed in an appropriate statistical package
- they form series of repeated cross-sections, which enable comparisons over
  time for groups
- they are nationally representative, although 'nation' may mean the United
  Kingdom as a whole, Great Britain, or constituent countries: England, Wales,
  Scotland or Northern Ireland
- they are sample survey data, which may involve a degree of complexity, both
  in their structure (many are household files, in which data are collected
  for all household members) and in their sampling strategy
- the data holdings and documentation are extensive; users are more likely to
  be overwhelmed than starved of detail
Archived datasets offer a wealth of data for studying topics of contemporary
social interest and concern.
What can a user do with the data?
Government survey datasets such as the Health Survey for
England (HSE) and the General Household Survey (GHS) are well suited to
particular research uses, including multivariate analysis, analyses that look
within households, and analyses that look at change over time. Because they
are micro data, they allow users to examine relationships between multiple
individual characteristics.
The depth of many questionnaires allows users to explore the validity of
existing means of operationalising concepts, or to use new ones.
Primatesta et al. (2001), for example, use the HSE to explore the relationship
between smoking and blood pressure. Adda and Cornaglia (2005) use saliva test
data from the HSE to demonstrate that while cigarette consumption declines for
some groups when tax is increased, the intensity with which the cigarette is
smoked increases to compensate.
Household datasets like the GHS enable household members to be associated with
each other by means of a household ID and inter-person relationship data.
Jarvis (1996) has used this aspect of the GHS to look at the association
between parenthood and smoking behaviour. Having controlled for a range of
socio-economic factors, he finds that parents with dependent children are more
likely than their childless peers to give up smoking. Sample size was increased
by ‘pooling’ several consistent datasets together.
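The household linkage described above can be sketched as follows. This is a minimal illustration, not the method used by Jarvis (1996): the field names (`hh_id`, `person_no`, `parent_nos`), the sample records and the age cut-off for a dependent child are all hypothetical and do not correspond to actual GHS variable names or definitions.

```python
from collections import defaultdict

DEPENDENT_AGE = 16  # assumed cut-off for a dependent child (illustrative only)

# Hypothetical person-level records: one row per household member.
# "parent_nos" holds the person numbers of that member's parents
# within the same household (the inter-person relationship data).
persons = [
    {"hh_id": 101, "person_no": 1, "age": 36, "parent_nos": ()},
    {"hh_id": 101, "person_no": 2, "age": 38, "parent_nos": ()},
    {"hh_id": 101, "person_no": 3, "age": 7, "parent_nos": (1, 2)},
    {"hh_id": 102, "person_no": 1, "age": 52, "parent_nos": ()},
    {"hh_id": 102, "person_no": 2, "age": 24, "parent_nos": (1,)},
]


def flag_parents(records):
    """Return (hh_id, person_no) keys of people with a dependent child at home."""
    # Group members by household ID so relationships are resolved per household.
    households = defaultdict(list)
    for rec in records:
        households[rec["hh_id"]].append(rec)

    parents = set()
    for hh_id, members in households.items():
        for child in members:
            if child["age"] < DEPENDENT_AGE:
                for parent_no in child["parent_nos"]:
                    parents.add((hh_id, parent_no))
    return parents


parents_with_dependants = flag_parents(persons)
```

Here the two adults in household 101 are flagged, while the parent in household 102 is not, because the co-resident child is an adult. A real analysis would then compare smoking outcomes between flagged and unflagged adults, controlling for socio-economic factors.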
A relatively high degree of consistency over time within survey series enables
trends to be monitored. Researchers can produce their own summary statistics
across time to generate their own time series (for example smoking by social
class for men and women (Evandrou and Falkingham 2002)), or may pool data over
time, to allow the data to be analysed not only by period but also by
pseudo-cohort (e.g. Kemm (2001) combined data for the period 1974 to look at
smoking by age and by birth cohort to find that smoking falls with age for all
cohorts).
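The pseudo-cohort idea can be sketched in a few lines: because each cross-section records age at interview, an approximate birth year can be derived as survey year minus age, and respondents from different survey years can then be grouped by birth cohort. The records, field names and ten-year cohort width below are hypothetical, and the sketch is unweighted; a real analysis would normally apply the survey weights described in the documentation.

```python
from collections import defaultdict


def birth_cohort(survey_year, age, width=10):
    """Return the start year of the birth cohort band (e.g. 1950 for 1950-59)."""
    birth_year = survey_year - age  # approximate: ignores month of interview
    return (birth_year // width) * width


# Hypothetical pooled records drawn from several survey years.
pooled = [
    {"year": 1980, "age": 25, "smoker": True},
    {"year": 1980, "age": 45, "smoker": True},
    {"year": 1990, "age": 35, "smoker": False},
    {"year": 1990, "age": 35, "smoker": True},
    {"year": 2000, "age": 45, "smoker": False},
]


def prevalence_by_cohort_and_year(records):
    """Return {(cohort, survey_year): proportion smoking}, unweighted."""
    counts = defaultdict(lambda: [0, 0])  # [smokers, respondents]
    for rec in records:
        key = (birth_cohort(rec["year"], rec["age"]), rec["year"])
        counts[key][0] += int(rec["smoker"])
        counts[key][1] += 1
    return {key: smokers / total for key, (smokers, total) in counts.items()}


table = prevalence_by_cohort_and_year(pooled)
```

Reading `table` along a single cohort (for example the keys beginning with 1950) follows that group as it ages across survey years, even though different individuals are interviewed each time.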
But how does a potential user locate, understand and use data such as these for
a topic like smoking?
Finding data
Naïve users may simply start their search with a web search: a Google search on
'health survey' or 'smoking survey' will result in links to appropriate ESDS
web pages within the first page or two of hits. From the ESDS home page users
can readily access a range of tools to find data and/or information. These
include:
- a catalogue search tool (the Data Catalogue), which enables users to perform
  keyword searches on the metadata information about the survey such as its
  abstract (with the ability to restrict results to ESDS sub-services and to
  order results both by relevance and date)
- an ESDS web site search which allows users to locate information about
  surveys
- a list of major studies which allows users to quickly locate the best known
  datasets
- a browse by subject facility which allows users to drill down to look at
  datasets for particular sub-topics such as 'Drug abuse, alcohol and smoking'
Data search results link to the appropriate catalogue records, which provide
summary information about the surveys and links to the full documentation, the
data, and specialist support and/or registration facilities as appropriate.
The data links cover both downloads in standard formats such as SPSS and Stata
for offline analysis and the online data analysis tool, Nesstar.
Using data
If researchers wished to use the GHS to analyse people who would like to give
up smoking, they would need to know whether the dataset contains a
sufficiently large number of people who smoke but would like to give up. The
screenshot below shows the Nesstar distribution of the GHS 'giveup' variable
in 2004/5; it gives the wording and applicability of the question as well as
the distribution of the variable. Users can see that this dataset contains
2,438 individuals who smoke but would like to give up.
See Analysing health data for more information and movie tutorials on how to
analyse health data using Nesstar.
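For users working offline with a downloaded file rather than in Nesstar, the same sample-size check can be sketched as a simple frequency count. The records, the category labels and the minimum threshold below are hypothetical; only the variable name 'giveup' comes from the GHS example above, and its actual coding should be taken from the survey documentation.

```python
from collections import Counter


def category_counts(records, variable):
    """Count respondents in each category, skipping 'not applicable' (None)."""
    return Counter(
        rec[variable] for rec in records if rec.get(variable) is not None
    )


def enough_cases(counts, category, minimum):
    """Return True if the category has at least `minimum` respondents."""
    return counts[category] >= minimum


# Hypothetical records: None marks respondents not asked the question
# (for example non-smokers); the labels are illustrative, not GHS codes.
sample = [
    {"giveup": "yes"},
    {"giveup": "yes"},
    {"giveup": "no"},
    {"giveup": None},
    {"giveup": "yes"},
]

counts = category_counts(sample, "giveup")
usable = enough_cases(counts, "yes", minimum=3)
```

The threshold is a judgment for the analyst; it depends on how many subgroups the analysis will split the cases into.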
For further information or to access the datasets referred to in this case study
see: