Data archives
What are data archives?
What is the Economic and Social Data Service?
What service does the UK Data Archive provide?
What is meant by data?
Where have the data come from?
How to find and access the data
How do I find a particular dataset?
How do I find studies on a particular subject?
Who can obtain data?
What formats are the data available in?
How do I gain access to the data?
How much will it cost?
Can I check the contents of a dataset before I obtain it?
Can I obtain data from other archives?
Using data
Can I obtain guides to using the data and questionnaires?
What help is available?
Are there any restrictions on the use of the data?
Can I use the data to identify individuals, households or organisations or for tracing family histories?
What is the most detailed geographical level I can analyse the data at?
Can I obtain publications?
Can I obtain statistics?
Publication
Why cite data?
How do I acknowledge and cite data?
Some journals ask authors to make available the data used for a publication. How do I comply with this?
Depositing data
What are the benefits of depositing data with the UK Data Archive?
Are there guidelines on creating and depositing data?
What are data archives?
Data archives are resource centres for analysts who use data for research and
teaching. Their functions usually include:
- ensuring that data are preserved against technological obsolescence and physical damage
- checking, validating and preparing data and accompanying user documentation
- cataloguing their technical and substantive properties for information and retrieval
- supplying them in an appropriate form to secondary users
- supporting users in using the data
The social science data archiving movement began in the 1960s within a number of key social science departments
in the United States that stored original coded interview data deriving from academic surveys. The movement
spread across Europe and in 1967 the UK Data Archive (the Archive) was established by the UK Social Science Research
Council (now the Economic and Social Research Council (ESRC)). In the late 1970s many national archives joined wider professional organisations to foster co-operation
on key archival strategies, procedures and technologies; encourage the exchange of data and technology
across national boundaries; and promote the acquisition, archiving and distribution of electronic data for social
science teaching and research.
What is the Economic and Social Data Service?
The Economic and Social Data Service (ESDS) is a national data service
that brings together the following centres of expertise in data creation, preservation and use:
UK Data Archive (the Archive), Institute for Social and Economic Research (ISER), Manchester Information and Associated Services (MIMAS) and
Cathie Marsh Centre for Census and Survey Research (CCSR). The ESDS promotes and supports the use of social science data in
research and teaching by providing preservation, dissemination, user support and training.
What service does the UK Data Archive provide?
The UK Data Archive (the Archive) acquires, preserves and disseminates current
and historical quantitative (numerical) and qualitative data and descriptive information about those data (metadata) in the fields of social science,
economics, and humanities.
The Archive is a service provider for the ESDS and is responsible for the overall management function and the
central activities of data acquisition, processing, preservation and dissemination.
What is meant by data?
In the context of data archives, data means computer-readable data.
Quantitative (numeric) data can be either microdata or macrodata. Microdata consist of the coded responses to survey
questions (for example a 'yes' response could be coded as '1' and a 'no' response as '0') where each row of data
corresponds to an individual, household, family, or organisation and each column corresponds to a survey question.
Microdata are usually made available in SPSS, Stata and tab-delimited formats. Macrodata consist of aggregate
figures (for example country-level economic indicators) and can usually be viewed and analysed using MS Excel.
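The distinction between microdata and macrodata can be illustrated with a minimal Python sketch. The column names, codes and values below are invented for illustration and do not come from any real dataset; the sketch only assumes the layout described above (one row per respondent, one column per question, with 'yes' coded as '1' and 'no' as '0').

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited microdata: one row per respondent,
# one column per survey question. All names and values are invented.
MICRODATA = (
    "respondent_id\tregion\towns_home\n"
    "1\tNorth East\t1\n"
    "2\tNorth East\t0\n"
    "3\tSouth West\t1\n"
    "4\tSouth West\t1\n"
)

def read_microdata(text):
    """Parse tab-delimited text into a list of dicts, one per respondent."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def frequency(rows, variable):
    """Count how often each coded value of a variable occurs."""
    return Counter(row[variable] for row in rows)

def aggregate_share(rows, group_var, coded_var):
    """Collapse microdata into aggregate figures: share coded '1' per group."""
    totals, yes = Counter(), Counter()
    for row in rows:
        totals[row[group_var]] += 1
        yes[row[group_var]] += int(row[coded_var])
    return {group: yes[group] / totals[group] for group in totals}

rows = read_microdata(MICRODATA)
counts = frequency(rows, "owns_home")                  # frequency count of coded responses
shares = aggregate_share(rows, "region", "owns_home")  # macrodata-style aggregate
```

Here `rows` plays the role of microdata, while `shares` is a simple aggregate of the kind that macrodata consist of, with individual respondents no longer visible.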
Qualitative material includes in-depth interviews, diaries, anthropological field notes and the complete answers
to survey questions. Qualitative material is typically available as word-processed documents or databases.
To analyse the data, users usually need to have access to the appropriate software.
However, a range of data is available via
Online data analysis tools.
Where have the data come from?
Datasets are deposited from a variety of sources, including academic researchers,
government departments, intergovernmental organisations, independent research institutes, and market research organisations.
Academic research funded by the Economic and Social
Research Council (ESRC) is an
important source of data, as the council operates a mandatory Datasets Policy whereby all award holders are expected to offer
data to the UK Data Archive for archiving. Examples of large-scale ESRC datasets are the British Social Attitudes Survey (BSAS), the British Election
Studies (BES) and the British Household Panel Study (BHPS).
Censuses and large surveys carried out by governments
for their own policy purposes are particularly rich sources of data for further
exploration. Central government, and in particular the
Office for National Statistics (ONS), is a major and
regular supplier of data series, including the General Household Survey (GHS), the Labour Force Survey (LFS), and the Health Survey
for England (HSE).
Some datasets may not have been collected specifically for research purposes. Administrative databases, such as
the National Health Service Patient Re-registrations, although collected for a very different purpose, can provide valuable
and timely information for researchers.
Many of the multi-nation aggregate databanks are the current releases of the major statistical
publications produced by intergovernmental organisations such as the World Bank, International Monetary Fund or United
Nations.
How do I find a particular dataset?
Datasets can be found in the Data Catalogue
by searching for specific information such as the title, the person or organisation associated
with the study (e.g. data creator or sponsor), or the date of the study.
The Help on searching guide suggests the kind of searches
that can be performed and gives tips on how best to formulate queries.
How do I find studies on a particular subject?
Each dataset in the Data Catalogue is assigned one or more subject
categories to reflect the overall subject area of the data at study level.
The Browse by subject
option enables a user to browse through the data holdings by broad subject area e.g. education, or at a very specific level such as literacy.
Studies are also assigned keywords at question or variable level for survey data, or at a more
descriptive conceptual level for qualitative data. Keywords can be searched using the
Data Catalogue by selecting 'Keyword' from the drop-down list.
The keywords are taken from a controlled vocabulary list held in the UK Data Archive's
HASSET thesaurus which can also be browsed.
Who can obtain data?
Researchers, students and teachers from any field, organisation or country may register with the ESDS and obtain
data. However, some datasets have restrictions on access.
For example:
- ESDS International macrodata and Census data (from 1971 onwards) are only made available to users from UK higher or further education institutions
- commercial usage may be restricted - further details can be found on the Commercial users page
- permission may be required from the depositor
- publications may need to be vetted by the sponsoring organisation
Details are available in the individual records of the Data Catalogue.
What formats are the data available in?
Most survey datasets are available to download in SPSS, Stata and tab-delimited
(suitable for use in MS Excel) formats and can also be requested in other formats such as SAS. Other ESDS systems,
including the ESDS Nesstar Catalogue,
provide those and additional data formats, such as Statistica and Dbase. ESDS International multi-nation aggregate databanks
are made available online via Beyond 20/20 software and tables can be downloaded in Beyond 20/20, MS Excel and comma-separated
formats. Qualitative data formats include MS Excel, MS Word and RTF.
How do I gain access to the data?
Access to data requires registration and uses federated access management (Shibboleth) user authentication. You will need to have
a username and password to register - see How to
access data and Login and registration help for further
information.
Registered users can access or request data directly from the Data Catalogue
and via the Major Studies pages.
How much will it cost?
Data required for non-commercial purposes can be downloaded at no cost. If data are
requested on portable media (e.g. CD), handling, postage and packing fees will apply.
See Charges for further details.
Can I check the contents of a dataset before I obtain it?
The Data Catalogue contains a full study description for each dataset
and also provides access to online documentation and variable lists. Online documentation includes user guides that contain information
on how to use the data, how the data were collected, and usually the original questionnaires or topic guides.
Online variable lists provide the variable names and variable and value labels.
The ESDS Nesstar Catalogue
provides details for all the variables within datasets available from the Nesstar system, displaying the full question text, frequency
counts and other summary statistics.
Access to both catalogues does not require registration. However, registration is required to conduct online data analysis or
to download data.
Can I obtain data from other archives?
Support is available to help users locate and acquire data from other archives within Europe and worldwide,
for example through reciprocal agreements with a network of social science data archives including the
Inter-university Consortium for Political
and Social Research (ICPSR) at the University of Michigan in the USA, and members of
the Council of European Social Science Data Archives (CESSDA).
Data for several key international series can be found and requested via a
Data Catalogue search. Users can also search for data at individual archives via the clickable maps
at Other data archives. Additionally users are able to search seamlessly via
the CESSDA catalogue
to locate data and variables within a selection of datasets stored at a number of European social science data archives.
UK users can request data located at other archives through their account - see How to access data.
Can I obtain guides to using the data and questionnaires?
User Guides accompanying the data contain information on how to use the data, how the data were collected,
usually the original questionnaires, and occasionally frequency counts. These are freely available via the
Data Catalogue and the Major Studies
pages and, where available, are supplied with orders/downloads.
What help is available?
Dedicated User Support is available to help users with
selecting data, ordering data and using data.
Are there any restrictions on the use of the data?
Restrictions on the use of the surveys are outlined in the
End User Licence (EUL)
that all users agree to when registering. In particular there is a fundamental restriction concerning the
confidentiality of data. Users should not attempt to use the data to deliberately compromise the confidentiality of
individuals, households or organisations and are required to abide by the current Data Protection Act. The EUL also covers
requirements for citing the data in publications and for safeguarding data.
The sharing of data with other researchers or students and the re-use of data for a new purpose is restricted by the
terms and conditions outlined in the EUL. See Restrictions on use
for further information.
Certain datasets/usages may also require depositor permission and details are available in the 'Access' section of each catalogue
record in the Data Catalogue.
Can I use the data to identify individuals, households or organisations or for tracing family histories?
Data are anonymised unless respondents have given their permission or the data are in the public domain.
When registering, users agree to:
- preserve at all times the confidentiality of information pertaining to individuals and/or households in the data collections (where the information is not in the public domain)
- not use the data to attempt to obtain or derive information relating specifically to an identifiable individual or household, nor claim to have obtained or derived such information
- preserve the confidentiality of information about, or supplied by, organisations recorded in the data collections
Some History Data Service datasets in the Data Catalogue that are in the public
domain may be of interest to family historians/genealogists. However, the History Data Service is funded to preserve and disseminate
electronic data created by or for historians, and genealogical research is not part of their remit - see
Links for family historians.
What is the most detailed geographical level I can analyse the data at?
Most survey datasets contain one or more geographical variables e.g. place of residence, place of work. In most cases the most detailed
geographical variable available is a Government Office Region (GOR) variable which
allows researchers to identify broad regions, for example 'South East', 'South West', 'North East', 'North West'.
See Government Office Regions on the National
Statistics website for more information.
Most survey participants are informed that their responses will only be passed on to researchers under certain conditions
and that the data will be fully anonymised. Including more detailed geographical variables, even when the data are still anonymised,
can increase the risk of data disclosure.
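The link between geographical detail and disclosure risk can be illustrated with a toy Python sketch. The respondents, area names and occupations below are invented, and counting combinations that describe exactly one respondent is only a simple illustration of the idea; it is not presented as the disclosure-control method used by the ONS or the UK Data Archive.

```python
from collections import Counter

# Invented respondents: (region, local area, occupation).
RESPONDENTS = [
    ("South East", "Area A", "teacher"),
    ("South East", "Area A", "nurse"),
    ("South East", "Area B", "teacher"),
    ("South East", "Area B", "nurse"),
    ("South East", "Area C", "teacher"),
    ("North West", "Area D", "teacher"),
    ("North West", "Area D", "nurse"),
    ("North West", "Area E", "teacher"),
    ("North West", "Area E", "nurse"),
]

def unique_cells(records, key):
    """Return the key combinations that describe exactly one respondent."""
    counts = Counter(key(record) for record in records)
    return sorted(cell for cell, n in counts.items() if n == 1)

# Broad geography only: region combined with occupation.
by_region = unique_cells(RESPONDENTS, key=lambda r: (r[0], r[2]))

# Detailed geography: region and local area combined with occupation.
by_area = unique_cells(RESPONDENTS, key=lambda r: (r[0], r[1], r[2]))
```

In this invented example no region and occupation combination is unique, whereas every combination becomes unique once the local area is added, even though no names or addresses appear in the records.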
However, it is recognised that some researchers need access to more detailed data. To facilitate this and to increase
the range of data available for statistical research, the Office for National Statistics (ONS), in collaboration with the
UK Data Archive and ESDS Government, developed a strategy to provide access to social survey datasets that are
detailed, yet anonymised. Since these data pose a higher risk of disclosure, they have additional special conditions
attached to them that take the form of a Special Licence (SL).
Other data depositors may have their own mechanism in place for access to more detailed data or may use a form of the Special Licence.
To find out which geographical variables are available in a particular dataset, users should consult the relevant
catalogue record in the Data Catalogue and scroll down to the
'Spatial units' field within the 'Coverage' section. Variable and value labels, or further information on standard codings
and where to find them, are usually available in the associated documentation (freely downloadable via the catalogue record).
A 'Variables' search, for a key range of data, is available from the Data Catalogue
or via the ESDS Nesstar Catalogue.
Users may also find the
Beginner's Guide to UK Geography on the Office for National Statistics website helpful.
Can I obtain publications?
ESDS and UK Data Archive are not able to supply copies of publications (other than User Guides accompanying the data).
However, references to publications and journal articles produced by the data creators as well as those produced by secondary
analysts are available in the 'Publications' section (at the top) of the Data Catalogue
record for each dataset. There are also a number of searchable databases of publications which cite
particular datasets in the collection - see Publications citing ESDS Government Surveys and
Publications citing ESDS International data.
Can I obtain statistics?
The survey datasets we supply are usually computer-readable data files that require specialist software, such as SPSS or Stata, to analyse.
A number of datasets are available to most registered users to analyse and subset online using
Nesstar via the
ESDS Nesstar Catalogue, where basic frequency counts are
freely available to all users.
Links to sources of ready-made statistics can be found at Links to statistics.
Why cite data?
By establishing datasets as bibliographic entities and 'publishing' them as such, and by offering advice on citation, the UK Data Archive
plays a major role in extending research and scholarship. The creation of a dataset which is properly documented and usable by other researchers
deserves equivalent recognition and acknowledgement to a printed work of scholarship. Citation identifies sources for validation and further
research by different researchers. Failure to cite datasets means that valuable data sources will not be indexed by bibliographical services
such as social science citation indexes, and, more importantly, other researchers who would like to analyse these data may not have sufficient
information to acquire them.
How do I acknowledge and cite data?
Information on citing and acknowledging data in publications is set out in the 'Study information and citation'
file, available from the online documentation table in the relevant record in the Data Catalogue.
An acknowledgement is a general statement giving credit to the source and distributor and includes copyright
information. It can be given at the start of, or within, the text, or at the end of the article before the bibliographic
references/citations. The information required (e.g. depositor, sponsor) can be found in the study description.
A suggested format for acknowledging data, using the example of research using the Health Survey for England 1995 and 1997, is:
This research was based on the Health Survey for England, 1995 and the Health Survey for England, 1997,
produced by the Joint Health Surveys Unit of Social and Community Planning Research and University College London,
sponsored by the Department of Health, and supplied by the UK Data Archive. The data are Crown Copyright.
A citation is more formal than an acknowledgement. It follows a standard format and should include enough information so that the exact version of the data being cited can be located. It does
not include information on sponsor or copyright. A recommended form of the bibliographic citation is set out in the
'study information and citation' file. Each dataset should have a separate citation.
As an example, the recommended citations for the Health Survey for England 1995 and 1997 are:
Joint Health Surveys Unit of Social and Community Planning Research and University College London, Health Survey for England, 1995 [computer file]. 3rd Edition. Colchester, Essex: UK Data Archive [distributor], March 2001. SN: 3796.
Joint Health Surveys Unit of Social and Community Planning Research and University College London, Health Survey for England, 1997 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], December 2000. SN: 3979.
Also see Citing
international data on the ESDS International website and Citing data on the Census.ac.uk website.
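The pattern followed by the two Health Survey for England citations above can be sketched in Python. The function name and parameter names are hypothetical, and the fixed place of publication and distributor are taken from those two examples only; the authoritative wording for any dataset remains the one given in its 'study information and citation' file.

```python
def format_citation(creator, title, edition, distributor_date, study_number):
    """Assemble a citation following the pattern of the examples above.

    The place of publication and distributor are fixed here because both
    examples use 'Colchester, Essex: UK Data Archive [distributor]'.
    """
    return (
        f"{creator}, {title} [computer file]. {edition} Edition. "
        f"Colchester, Essex: UK Data Archive [distributor], {distributor_date}. "
        f"SN: {study_number}."
    )

citation = format_citation(
    creator=(
        "Joint Health Surveys Unit of Social and Community Planning Research "
        "and University College London"
    ),
    title="Health Survey for England, 1995",
    edition="3rd",
    distributor_date="March 2001",
    study_number=3796,
)
```

Keeping the edition, distributor date and study number (SN) as separate fields reflects the requirement that a citation should identify the exact version of the data used.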
Some journals ask authors to make available the data used for a publication. How do I comply with this?
Some journals require authors to submit data alongside a publication so that the published results can be replicated
by others. Note that data obtained from the UK Data Archive/ESDS, including subsets and derived data, cannot be submitted to journals alongside
publications as this would be a breach of the End User Licence (EUL) that users agree to when they register. However, in most cases it is sufficient for the
author of the publication to supply information to the journal about how an individual can register and access the data.
The Help desk can provide this information if necessary.
In addition, for derived data there are a number of options available:
- the author can supply the syntax used to the journal
- the author can offer the data to the UK Data Archive/ESDS - it is a requirement of the EUL that any derived data be
offered for deposit - see Depositing data
- the author can request that anyone wishing to replicate the results should apply to the author; before passing on
the data, it is essential for the author to first check with the UK Data Archive/ESDS that any applicant is a
registered user and also entitled to access the data
What are the benefits of depositing data?
Data archiving has great benefits for data owners and creators. Deposit ensures the safekeeping of data
with control maintained on behalf of the data owner. This can include informing the owner of applications for use and
maintaining registers of users and usage. The ability to demonstrate continued usage of the data after the original
research is completed can influence funders to provide further research money.
Depositing data allows data owners to avoid the administrative tasks associated with external users and their queries.
At the same time the data holders can foster a fruitful dialogue between original and secondary researchers by running user
groups and data-use workshops while shielding the original researchers from the more tedious aspects of dissemination. It
is also an essential part of the scholarly research process to be able to identify information sources. Bibliographic
control of books, papers, journals and other printed sources is taken for granted: they are identifiable in library
and publishers' catalogues and when used as source material in scholarly publications
are fully referenced.
The deposit of data enables research datasets to be as fully identifiable as printed materials by
ensuring that: they are fully documented with all bibliographical details (statements of responsibility, titles, dates of issue, names of
distributors); they are fully catalogued in the Data Catalogue; users of the data are
aware of the need to reference the data sources in publications.
Are there guidelines on creating and depositing data?
Further information is available from the creating
and depositing data pages.