The use of 'real', as opposed to synthetic, datasets in teaching adds interest and relevance to courses, and, if the
data are updated on a regular basis, ensures that the courses are pertinent to current issues. Students who gain their
experience of data analysis from the use of specially constructed data rarely have a good understanding of the complexity
of data analysis in the 'real' world. An appreciation of this complexity through the use of unadulterated data can give
appropriate training for applied careers. Students also have the opportunity to understand the rationale for collecting
data and can develop critical faculties to judge the strengths and weaknesses of particular data. Data can be chosen to be
of particular relevance to the subject being taught and thus can bring both substantive and methodological topics alive.
If the data are accessed via the Data Catalogue, information is also provided on the main publications resulting from
the data, so students can use the data in conjunction with the associated publications.
Documentation
Access is provided to both data and documentation, and increasingly the value of the documentation is being
recognised as a resource in its own right. It can be used to train students in data collection methods and to provide
model surveys which the students might copy or adapt. For example, it might be feasible to involve students in conducting
their own mini British Social Attitudes Survey and then comparing their results with the main survey.
Substantive research
The Data Catalogue is a rich source of demographic, behavioural and attitudinal data with which to address many substantive
topics. Such data may be used in conjunction with the main publications arising from the analysis conducted by the study's
originators. Students could be asked to replicate research already conducted, to extend this research or to examine the data
from an entirely different perspective. By using the data directly, students gain a good appreciation of the limitations and
variations amongst different measures. For example, they might work through the method of calculating the weights for the
retail price index or they might explore the relationship between the estimate of unemployment arising from the
Labour Force Survey and that from unemployment benefit claimant counts. Several datasets are good sources for comparative
analysis, most notably the Eurobarometer Survey Series, conducted across all EU countries simultaneously using the same
questionnaires, and the International Social Survey Programme, in which an identical core of questions is included in
surveys in twenty-two countries. Students may find the
British Social Attitudes Survey of particular interest since the topics change in each annual survey but are repeated
periodically, thus enabling the analysis of the change in attitudes over time. There is an annual report which provides
excellent material for further exploration by students at a range of levels of statistical expertise. In fact, a large
number of data sources in the Data Catalogue permit the exploration of change over time, including change in the demographic or
other aspects of the structure of society, change in behavioural patterns or in attitudes. Longitudinal data can result
from a number of different designs: fresh cross-sectional designs such as the British Social Attitudes Survey; panels such
as the British Household Panel Survey; rotating panels such as the Labour Force Survey; and cohorts such as the
National Child Development Study. Such material can thus be used for an exploration of the various methods of collecting
data over time and their implications for the analysis and interpretation of both point-in-time estimates and estimates of
change.
Methodological issues
As indicated above, there are rich veins of data to be tapped to assist in the teaching of the analysis of data and their
use particularly within the policy process. In addition, the data can be used in teaching about various methodological
issues connected with the collection of data. The potential is substantial and the list here does not aim to be
comprehensive but is intended to give a flavour of what might be done:
- Sample size
Students could be asked to explore the relationship between sample size and precision using a small number of key datasets
or to discuss why different datasets are based on very different sample sizes.
- Sampling frame and the study methodology
The datasets held in the Data Catalogue have been generated from samples drawn from a range of different sampling frames, which
provides an opportunity for the exploration of the effects of different sampling frames on the distribution of the achieved
samples. Several of the key official surveys have changed their sampling frame from the Electoral Register to the Postcode
Address File in recent years and could be used to explore the effects of such changes. Similarly there are other aspects of
the methodology of the study which might change over time, such as the method of data collection, thus permitting an
analysis of these effects. In a small number of cases changes have been introduced on a split panel basis which gives more
precise estimates of the methodology effects.
- Coder variability
Datasets sometimes contain the text of verbatim material in addition to coded material. It can be a valuable exercise to
have students code a sample of the material themselves and to explore different aspects of coder variability such as the
differences between the original coding and the students' coding, the within coder variability, the between coder
variability, and the variability in the use of a particular code. More advanced students might then examine the likely
impact of such variability on estimates, particularly estimates of change over time.
- Non-response and missing data
Surveys suffer from widely varying levels of non-response, and there is a wealth of hypotheses which
can be explored by students in examining why non-response varies, how it may be reduced and what factors correlate with
non-response. The large scale Government surveys which, in Census years, were subjected by ONS to detailed non-response
checks (all of which are published) are particularly rich sources to exploit. Students might go on to look at the use of
different weighting schemes to adjust for non-response. Similarly, students might explore the level and distribution of
missing data on studies, and evaluate the effects of this on the analysis and interpretation of the data. Advanced students
could devise imputation procedures to adjust for missing data and, where imputation has been used by the original
researchers and the imputed values are flagged, examine the data with and without these values.
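As one illustration of the weighting exercise suggested under non-response, the sketch below applies a simple weighting-class adjustment: within each class, respondents are weighted up by the ratio of units sampled to units responding. It is a minimal teaching sketch, not the scheme used by any particular survey. The records, the field names (region, responded, employed) and the function names are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical issued sample: 4 urban units (2 respond), 4 rural units (4 respond).
# Field names and values are invented for illustration only.
sample = [
    {"region": "urban", "responded": True, "employed": 1},
    {"region": "urban", "responded": True, "employed": 0},
    {"region": "urban", "responded": False, "employed": None},
    {"region": "urban", "responded": False, "employed": None},
    {"region": "rural", "responded": True, "employed": 1},
    {"region": "rural", "responded": True, "employed": 1},
    {"region": "rural", "responded": True, "employed": 1},
    {"region": "rural", "responded": True, "employed": 0},
]


def nonresponse_weights(units, class_key="region"):
    """Weight per class = units sampled / units responding in that class."""
    sampled = defaultdict(int)
    responding = defaultdict(int)
    for unit in units:
        sampled[unit[class_key]] += 1
        if unit["responded"]:
            responding[unit[class_key]] += 1
    weights = {}
    for cls, n_sampled in sampled.items():
        if responding[cls] == 0:
            raise ValueError(f"no respondents in class {cls!r}")
        weights[cls] = n_sampled / responding[cls]
    return weights


def respondent_mean(units, value_key, weights=None, class_key="region"):
    """Mean over respondents; unweighted if no weights are supplied."""
    numerator = 0.0
    denominator = 0.0
    for unit in units:
        if not unit["responded"]:
            continue
        w = 1.0 if weights is None else weights[unit[class_key]]
        numerator += w * unit[value_key]
        denominator += w
    return numerator / denominator


weights = nonresponse_weights(sample)
unadjusted = respondent_mean(sample, "employed")
adjusted = respondent_mean(sample, "employed", weights)
print(weights, round(unadjusted, 3), round(adjusted, 3))
```

In this made-up sample the urban respondents each stand in for two sampled units, so the adjusted estimate gives the under-responding class its full share of the sample. Students could compare the adjusted and unadjusted estimates under different choices of weighting class and discuss the assumption that respondents and non-respondents within a class are alike.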