UK DATA ARCHIVE: IMPORTANT STUDY INFORMATION

Study Number 6697 - Business Structure Database, 1997-2011: Secure Data Service Access


LEGAL AGREEMENT ON CONDITION OF USE

Users should note that this dataset is subject to restrictive Secure Data Service Access conditions (see catalogue record for full details).

NEW EDITION INFORMATION

The third edition (October 2012) includes data files for the year 2011.

DATA PROCESSING NOTES


Data Archive Processing Standards

The data were processed to the UK Data Archive's A standard. A rigorous and comprehensive series of checks was carried out to ensure the quality of the data and documentation. Firstly, checks were made that the number of cases and variables matched the depositor's records. Secondly, checks were made that all variables had variable labels and all nominal (categorical) variables had value labels. Where possible, with reference to the documentation and/or in communication with the depositor, absent labels were created. Thirdly, logical checks were performed to ensure that nominal (categorical) variables had values within the range defined (either by value labels or in the depositor's documentation). Lastly, any data or documentation that breached confidentiality rules were altered or suppressed to preserve anonymity.

All notable and/or outstanding problems discovered are detailed under the 'Data and documentation problems' heading below.
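
Users who wish to repeat the third check (values of nominal variables falling within the labelled range) on their own extracts could do so along the following lines. This is a minimal sketch in Python, not the Archive's processing code: the function name, the example values and the placeholder labels are all assumptions made for illustration, and the real labels for any variable should be taken from the study documentation.

```python
from collections import Counter

def unlabelled_values(values, value_labels):
    """Count the values that have no entry in value_labels.

    Missing values (None) are ignored, since they are not unlabelled codes.
    """
    return Counter(
        v for v in values if v is not None and v not in value_labels
    )

# Placeholder labels and data, invented for illustration only.
status_labels = {1: "label for code 1", 2: "label for code 2"}
status_values = [1, 2, 8, 1, None, 8]

print(unlabelled_values(status_values, status_labels))  # Counter({8: 2})
```

A non-empty result corresponds to the kind of problem listed under the heading below, for example unlabelled values of 8 for the variable 'status'.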

Data and documentation problems

First Edition:
Some of the data files are missing anonymised postcodes for a number of cases.
For the 2005 Enterprise level data files, there are no 'Birth' values and numerous 'Death' values for variable 'demvar'.
For the 1998-2004 and 2006 Enterprise level data files and 1998-2004 Local Unit data files, most variables have missing values for the first few thousand cases.
The variable 'demvar' has missing values for all cases in the Enterprise data files for 2006-2009.
The variables 'demvar' and 'demvarred' have missing values for all cases in the Local Unit data files for 2006-2009. For the 1997 Local Unit data file, the variable 'demvarred' has values of 0 for all cases.
The Local Unit data files have unlabelled values of 'D' for the variable 'death_code' for many cases.
The variable 'status' has unlabelled values of 8 in a number of data files.
All data files have unlabelled values for the sic, foc and gor variables.
There are errant values for the variables 'birth' and 'death' in a number of the data files.

Second Edition:
The variables 'demvar' and 'deprivation' have missing values for all cases in the Enterprise data file.
The variables 'demvar' and 'demvarred' have missing values for all cases in the Local Unit data file.
The sic, foc and gor variables have unlabelled values.
The data files are missing anonymised postcodes for a number of cases.

Third Edition:
The variables 'demvar' and 'prev_wow' have missing values for all cases in the Enterprise data file.
The variables 'imm_foc', 'ult_foc', 'demvar' and 'demvarred' have missing or zero values for all cases in the Local Unit data file.
The sic, foc, gor, ua, coa and soa_lower variables have unlabelled values in the Enterprise data file.
The sic, gor, ttwa, ua, coa and soa_lower variables have unlabelled values in the Local Unit data file.

Useful Notes

Real and pseudo-anonymised postcodes
The postcodes available in the first and second editions of these data (i.e. data files prior to 2011) are pseudo-anonymised postcodes. Real postcodes are available for the 2011 data (third edition).

Number of observations for 2011 data
The number of observations in the 2011 data files is considerably reduced compared with the 2010 files. This is because the files have been cleaned up by removing redundant data.

Values of 'D' for the variable 'death_code'
The variable 'death_code' in the local unit files takes the value 'D' if the enterprise that owns the local unit is 'dead'. This necessarily means that the local unit is also dead. There may be a time lag: the death date of the associated enterprise may not be recorded until the next available extract (i.e. the next year of the BSD). For example, if a local unit has a death_code of 'D', the death date for the associated enterprise may be missing in the data for that year and become available only the following year.

Suppose that in 1998 a local unit has a death_code of 'D'. The enterprise may not have a valid death date in the 1998 data (it will appear as a missing value). If one examines the same enterprise in the 1999 or 2000 data, a valid death date of 1998 or earlier may now exist. This reflects the time lag in updating the status of an enterprise that arises from the administration of the IDBR.

Please note that the 'D' value is missing for the local unit file after 2005, except in 2008 when it is present.
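
The time lag described above suggests looking forward through later extracts when an enterprise death date is missing. The following is a minimal sketch in Python under stated assumptions: the yearly extracts are represented as a plain dictionary, and the function name, the enterprise identifier 'E1' and the two-year search window are hypothetical and do not come from the BSD documentation.

```python
from typing import Dict, Optional, Tuple

# Assumed layout: {extract_year: {enterprise_id: death_year or None}}
EnterpriseExtracts = Dict[int, Dict[str, Optional[int]]]

def first_recorded_death(
    extracts: EnterpriseExtracts,
    enterprise_id: str,
    from_year: int,
    max_lag: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return (extract_year, death_year) for the earliest extract, from
    from_year up to from_year + max_lag, in which the enterprise has a
    non-missing death date. Return None if no such extract is found."""
    for year in range(from_year, from_year + max_lag + 1):
        death_year = extracts.get(year, {}).get(enterprise_id)
        if death_year is not None:
            return year, death_year
    return None

# Invented example mirroring the 1998 case in the text: the death date is
# missing in the 1998 extract and appears in the 1999 extract.
extracts = {
    1998: {"E1": None},
    1999: {"E1": 1998},
}

print(first_recorded_death(extracts, "E1", 1998))  # (1999, 1998)
```

In this example the local unit would carry a death_code of 'D' in 1998, while the enterprise death date of 1998 only becomes visible in the 1999 extract.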

Variables 'live_vat' and 'live_paye'
The variables 'live_vat' and 'live_paye' refer to the number of VAT or PAYE registrations that an enterprise has.

Data conversion information

From January 2003 onwards, almost all data conversions have been performed using software developed by the UK Data Archive. This enables standardisation of the conversion methods and ensures optimal data quality. In addition to its own data processing/conversion code, this software uses the SPSS and StatTransfer command processors to perform certain format translations. Although data conversion is automated, all data files are also subject to visual inspection by a member of the Archive's Data Services team.

With some format conversions, data, and more especially internal metadata (i.e. variable labels, value labels, missing value definitions, data type information), will inevitably be lost or truncated owing to the differential limits of the proprietary formats. A UK Data Archive Data Dictionary file (generally in Rich Text Format (RTF)) is usually provided for each data file, enabling viewing and searching of the internal metadata as it existed in the originating format. These files are called: [data file name]_UKDA_Data_Dictionary.rtf
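
The naming pattern above can be applied programmatically when matching data files to their dictionaries. This is a small sketch in Python; it assumes that "[data file name]" means the file name without its extension, which the text does not state, and the helper name and example file name are hypothetical.

```python
from pathlib import Path

def data_dictionary_name(data_file: str) -> str:
    """Build the expected Data Dictionary file name for a data file.

    Assumption: the data file's extension is dropped before the suffix
    '_UKDA_Data_Dictionary.rtf' is appended.
    """
    return f"{Path(data_file).stem}_UKDA_Data_Dictionary.rtf"

print(data_dictionary_name("example_data_file.dta"))
# example_data_file_UKDA_Data_Dictionary.rtf
```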

Important information about the data format supplied

The links below provide important information about the Archive's data supply formats. Some of this information is specific to the ingest format of the data, i.e. the format in which the Archive received the data from the depositor. The ingest format for this study was STATA.

Please follow the appropriate link below to see information on your chosen supply (download) format.

SPSS (*.sav)
STATA (*.dta)
Tab-delimited text (*.tab)
MS Excel (*.xls/*.xlsx)
SAS (*.sas7bdat and *.sas)
MS Access (*.mdb/*.mdbx)

Conversion of documentation formats

The documentation supplied with Archive studies is usually converted to Adobe Portable Document Format (PDF), with documents bookmarked to aid navigation. The vast majority of PDF files are generated from MS Word, RTF, Excel or plain text (.txt) source files, though PDF documentation for older studies in the collection may have been created from scanned paper documents. Occasionally, some documentation cannot be usefully converted to PDF (e.g. MS Excel files with wide worksheets) and this is usually supplied in the original or a more appropriate format.