Labour Force Survey Frequently Asked Questions
• What population is covered by the LFS?
• Where can I obtain statistics and tables from the LFS?
• Before I order, how do I find out what questions/variables are included?
• When did the LFS move from seasonal to calendar quarters?
• Does the LFS include information at the person level?
• What format is the data available in?
• Can I use the LFS for teaching purposes?
• What are the differences between the various Labour Force Survey datasets?
• What is the relationship between the APS, the IHS and the LFS?
• Do I need to use weights to analyse the LFS?
• What shall I do with negative values in the weight variable for the longitudinal LFS dataset?
• What was the 2011 reweighting project for and which datasets does it affect?
• Can I use the LFS to analyse change over time?
• Do the longitudinal LFS datasets contain all the variables in the quarterly LFS datasets?
• How can I link the longitudinal, quarterly and household editions of the LFS?
• What is the lowest geographical area for which LFS data is available?
• Why is there a combined missing value category ‘-10’ in some of the household datasets?
• What are the access restrictions to LFS data under the End User Licence due to privacy protection reasons?
• Are age (continuous) and SOC occupation (4 digits) variables available in the QLFS dataset?
• When will the transition to the new SOC 2010 classification of occupations happen for the LFS?
• I cannot find what I am looking for in the documentation for pre 1990 editions of the LFS. What can I do about it?
• What should I do about missing variables in Labour Force Survey Calendar Datasets?
• What are the ID variables on the Quarterly Labour Force Survey datasets?
• Is there a more general FAQ?
What population is covered by the LFS?
The LFS is intended to be representative of the total UK population. The survey covers all adults in private households. It also includes those living in NHS accommodation and young people (aged 16-24) in student halls of residence or similar institutions. Analyses that need to be representative of the population should use the weighted data.
Where can I obtain statistics and tables from the LFS?
Latest key statistics are published by ONS and detailed results are published quarterly in the Labour Market Statistics Bulletin. For older statistics see the Economic & Labour Market Review. Hard copies are available from good academic libraries or from The Stationery Office.
Before I order, how do I find out what questions/variables are included?
Variable lists and PDF user guides (including questionnaires) are freely available via the Doc column on the LFS catalogue page (you will need to select a flavour of the LFS first). A list of the variables in the QLFS Special Licence Dataset is available in this Excel file. A variable search tool is also available which allows users to find out whether a specific topic or question is covered in each of the LFS surveys (or any of the other large-scale government surveys).
The Social Surveys Data Service at ONS can also provide customised LFS data analysis and/or tabulations (including dialup access) according to customer requirements. Data runs are costed according to the specifications of individual requests, subject to a minimum fee of £135 incl. VAT. Requests or enquiries should be addressed to email@example.com.
NOMIS is a database of labour statistics run on behalf of ONS by the University of Durham. Tel: +44(0)191 334 2680, email: firstname.lastname@example.org.
When did the LFS move from seasonal to calendar quarters?
In accordance with EU regulations, the LFS moved from seasonal (spring, summer, autumn, winter) quarters to calendar quarters (January-March, April-June, July-September, October-December) in 2006. The last seasonal quarter dataset issued was the Quarterly Labour Force Survey, December 2005 - February 2006 (SN 5356) and the first calendar quarter dataset was the Quarterly Labour Force Survey, January - March, 2006 (SN 5369). Users should note that there is some overlap between these two datasets. ONS have produced a limited series of historical LFS datasets on a calendar-quarterly basis. This will allow users to make meaningful comparisons of labour market statistics from the LFS microdata over time. Further information on the seasonal to calendar quarter change and its impact on LFS data may be found in the following online article:
Madouros, V. (2006) Impact of the LFS switch from seasonal to calendar quarters: an overview of the switch of the LFS to calendar quarters and the potential effects of this change on users, London: ONS.
Does the LFS include information at the person level?
Yes. The Quarterly Labour Force Survey and the 2 quarter and 5 quarter longitudinal datasets only contain information at the person level. Users should download the Quarterly Labour Force Survey Household dataset for analysis at the household level.
What format is the data available in?
Most years of the LFS are available in SPSS, STATA, SAS and Tab-Delimited format. Most of these datasets can also be used with R. Although one can also use Excel to analyse the data, we do not recommend it: users might find it difficult to have to constantly refer to the codebook while handling the data, or to control for missing data.
Can I use the LFS for teaching purposes?
Yes. The LFS teaching dataset (2002) gives a subset of data drawn from the UK Labour Force Survey, containing data from all four quarters of the 2002/3 LFS, for respondents aged 16-65 and resident in the UK (n=63,559). For ease of use within a teaching context, the dataset is restricted to a subset of 58 key (mainly individual level) variables. A Quarterly Labour Force Survey, June - August, 2005: Ethnicity Teaching Dataset is also available.
What are the differences between the various Labour Force Survey datasets?
(Quarterly) Labour Force Survey
The Labour Force Survey is a unique source of information using international definitions of employment, unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 and over. (Q)LFS is the main dataset, including results from all questionnaire items, and is available at the level of the individual, with Government Office Region being the lowest geographical variable for analysis. It is available as an annual dataset from 1975 (excluding 1976, 1978, 1980 and 1982) until 1991. From June-August 1992 it is available as a quarterly dataset and can be accessed here.
QLFS Special Licence Datasets
From the March-May 2005 quarter, a Special Licence version of the QLFS data is also available in addition to the version of the QLFS data made available under the standard End User Licence. This contains extra variables, including more detailed geographical information, and therefore is subject to more restrictive access conditions. Prospective users of the Special Licence edition will need to complete the Approved Researcher form and demonstrate to the data owners exactly why they need access to the extra variables, in order to get permission to use that version. Most users should order the standard version of the data. In order to help users choose the correct dataset, a Special Licence Access section has been added to the dataset title page.
QLFS Local Area Data
Available only between 1992 and 2006, the data provided aggregate estimates of the following subjects for Local Authority Districts and Training and Enterprise Council areas (not at the individual level):
• employment (by age groups)
• ILO unemployed (by age groups)
• economically active (by age groups)
• economically inactive (by age groups)
• industry sectors (by Standard Industrial Classification codes)
• occupations (by Standard Occupational Classification codes)
• ethnic minorities
• full-time education
• job related training
Due to perceived confidentiality issues, this series was discontinued in 2006. Users interested in producing analysis for Local Authority Districts should consider using the Special Licence version of the quarterly datasets and derive their own estimates.
QLFS Household Datasets
The LFS household datasets are produced twice a year (April-June and October-December) from the corresponding quarter's individual-level LFS data. They include a number of new derived variables at household and family unit level only, to facilitate the analysis of the economic activity patterns of whole households. For 1992 to 1995, the household datasets include adjusted variables for household and family type, which are adjusted for several inconsistencies and discontinuities during this period.
It is recommended that the existing individual-level (Q)LFS datasets continue to be used for any analysis at individual level and that these household datasets be used for any analysis involving household or family level data. The lowest geographical variables for analysis are government office region and a county level indicator.
Two- and Five-quarters LFS Longitudinal Datasets
Each respondent to the LFS is interviewed in five consecutive quarters. Data on each individual have been linked together to provide longitudinal information. These longitudinal data consist of two types of linked datasets, created using a weighting method to adjust for non-response bias. The two-quarter datasets link data from two consecutive waves, while the five-quarter datasets link across a whole year (for example summer 1999 to summer 2000 inclusive) and contain data from all five waves. A full series of the data has been produced going back to winter 1992. The longitudinal datasets include a subset of the most commonly used variables from the QLFS, covering the main areas of the survey.
LFS Eurostat Datasets
Since their origin in 1973, the UK Labour Force Surveys have been conducted under an initiative of the European Union and were further expanded to provide comparable labour market data across member states. They remain regulated by European law; the latest EU LFS regulation was passed in 1998. EU regulations require only a core set of the variables available in the LFS, and specify the nomenclatures to be used (ISCO occupations / NACE industry codes). As a result, LFS Eurostat datasets contain only a subset of the variables accessible in the quarterly LFS. The list is available here.
In addition, occasional ad-hoc modules are included in the LFS as a result of Eurostat requirements, such as the 2003 Regular National Education System, the 2006 Transition from Work into Retirement, or the 2007 Accidents at Work and Work-Related Health Problems modules. More information on the background of the EU LFS and the variable list may be found here.
What is the relationship between the APS, the IHS and the LFS?
The Annual Population Survey is made up of several components, including waves 1 and 5 of the Labour Force Survey together with the Welsh and Scottish Labour Force Surveys and the English Local Labour Force Survey. Its sample size is much bigger than that of the LFS, but the APS uses only a subset of the LFS variables.
Together with several other government surveys, the Annual Population Survey is part of the Integrated Household Survey. As a result, the IHS includes a much broader set of variables than either the LFS or the APS.
Do I need to use weights to analyse the LFS?
Yes. The Quarterly LFS (QLFS) datasets have two weights (Pwt07 and Piwt07). Pwt07 is the weight for individual data: it compensates for non-response and grosses to population estimates. Piwt07 is the weight for income data: it is constructed so that the weight of a sub-group corresponds to that sub-group's size in the population, and it also grosses to give estimates of the number of people in certain groups. The income weight is restricted to employees' earnings; other income data are not (yet) weighted.
The QLFS household datasets contain individual level data for households, but have been designed for household analyses. They have one weight to gross to population estimates. The weight is the same for all household members. The 2009 weighting variable is called phhwt09.
The QLFS longitudinal datasets (2-quarter and 5-quarter) contain one weight to compensate for attrition and to produce population estimates. The weighting variable is called LGWT.
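As a rough illustration of why weighting matters, the sketch below compares an unweighted and a weighted employment rate. It is written in Python purely for illustration. The four records, the `employed` flag and the weight values are invented; only the weight name Pwt07 (written here as the key `PWT07`) comes from the text above.

```python
# Invented person records: the employed flag and weight values are made up.
records = [
    {"employed": True,  "PWT07": 400.0},
    {"employed": True,  "PWT07": 350.0},
    {"employed": False, "PWT07": 900.0},
    {"employed": True,  "PWT07": 350.0},
]


def employment_rate(rows, weight=None):
    """Share of rows with employed == True, optionally weighted."""
    def w(row):
        # Without a weight variable every record counts once.
        return row[weight] if weight else 1.0

    total = sum(w(r) for r in rows)
    employed = sum(w(r) for r in rows if r["employed"])
    return employed / total


unweighted = employment_rate(records)
weighted = employment_rate(records, weight="PWT07")
# Summing the weights gives the grossed (population) total.
grossed_total = sum(r["PWT07"] for r in records)

print(f"unweighted rate: {unweighted:.2f}")
print(f"weighted rate: {weighted:.2f}")
print(f"grossed population: {grossed_total:.0f}")
```

In this made-up sample the non-employed respondent carries a large weight, so the weighted rate is noticeably lower than the raw share of respondents.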
What shall I do with negative values in the weight variable for the longitudinal LFS dataset?
Some longitudinal LFS datasets have negative weights (LGWT). These are the results of the algorithm used to produce the weights. However, there appear to be a relatively small number of cases with negative weights and not all datasets are affected by it. The ONS is aware of this problem and will monitor all future files for negative weights. Meanwhile, users are advised to set all negative weights to 1.
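The advice above can be sketched as follows. This is an illustrative Python example, not ONS code: the records are invented and `fix_negative_weights` is a hypothetical helper name. It also counts how many cases were changed, so the adjustment can be reported.

```python
def fix_negative_weights(rows, weight="LGWT"):
    """Return copies of the rows with negative weights set to 1,
    together with the number of weights that were replaced."""
    fixed = []
    n_replaced = 0
    for row in rows:
        new_row = dict(row)          # leave the input rows untouched
        if new_row[weight] < 0:
            new_row[weight] = 1.0
            n_replaced += 1
        fixed.append(new_row)
    return fixed, n_replaced


# Invented weights for illustration only.
sample = [{"LGWT": 512.0}, {"LGWT": -37.0}, {"LGWT": 0.0}, {"LGWT": 841.5}]
cleaned, replaced = fix_negative_weights(sample)

print("weights replaced:", replaced)
print([r["LGWT"] for r in cleaned])
```

Only strictly negative values are changed; a weight of zero is left as it is.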
What was the 2011 reweighting project for and which datasets does it affect?
During 2011, the Office for National Statistics (ONS) undertook a project to reweight LFS data to 2010 population estimates. Reweighted QLFS data are planned for release in due course, but it is not yet known when these data will be deposited at ESDS. Currently, LFS household datasets from late 2006 onwards remain weighted to 2009 population figures, apart from April-June quarters from 1997-2006 and October-December quarters for 2004-2005, which remain weighted to 2007-2008 figures. Datasets prior to 1997/2004 will remain weighted to Census 2001 figures or earlier; users should consult the individual datasets for details.
Can I use the LFS to analyse change over time?
Yes. The LFS contains both panel and (short-term) repeated cross-section elements. For more information see the Analysing change over time guide.
Do the longitudinal LFS datasets contain all the variables in the quarterly LFS datasets?
No. Because of the resources involved in production and the size of the resultant datasets, the longitudinal datasets include only a subset of the full LFS variable set. This subset has been agreed in consultation with users and represents the most important and commonly used variables covering the main areas of the survey.
How can I link the longitudinal, quarterly and household editions of the LFS?
Users who have access to the Special Licence quarterly datasets and who want to add extra variables to the longitudinal data can do so by creating the PERSID variable, which uniquely identifies each individual, and merging the variables on. Instructions on how to create the PERSID variable using SPSS can be found in the User Guide for Two- and Five-Quarter LFS Longitudinal Datasets. If using Stata, it is important to ensure that PERSID is stored in double format when creating this variable (see syntax below):
gen double persid = (quota*10000000000) + (week*100000000) + ///
    (w1yr*10000000)
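The reason for the double format is that identifiers of this size exceed what single precision can store exactly, and Stata's default numeric storage type (float) is single precision. The Python sketch below only illustrates that floating-point point; it is not a substitute for the Stata syntax. It uses just the three terms shown above, the input values are made up, and `composite_id`, `as_float32` and `as_float64` are hypothetical helper names.

```python
import struct


def composite_id(quota, week, w1yr):
    """Combine the three terms shown in the syntax above into one integer."""
    return quota * 10_000_000_000 + week * 100_000_000 + w1yr * 10_000_000


def as_float32(value):
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def as_float64(value):
    """Round-trip a number through IEEE 754 double precision."""
    return struct.unpack("d", struct.pack("d", float(value)))[0]


person_id = composite_id(quota=123, week=7, w1yr=5)  # made-up values

print("exact integer:   ", person_id)
print("single precision:", int(as_float32(person_id)))
print("double precision:", int(as_float64(person_id)))
```

The single-precision value differs from the exact integer, while the double-precision value matches it, which is why an identifier built this way needs double storage before it is used as a merge key.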
What is the lowest geographical area for which LFS data are available?
The quarterly LFS (QLFS) datasets are available for Government Office Regions (North East, North West, Merseyside, Yorkshire and Humberside, East Midlands, West Midlands, Eastern, London, South East, South West, Wales, Scotland and Northern Ireland), with additional identifiers for Inner and Outer London, the West Midlands Metropolitan County, Greater Manchester, Merseyside, South Yorkshire, West Yorkshire, and Strathclyde in Scotland.
Datasets containing lower-level geography are available under a Special Licence, which requires accreditation by the UK Statistics Authority as an Approved Researcher. Three datasets based on the LFS allow users to obtain finer geographical detail. The QLFS Special Licence access data distinguish Unitary / Local Authorities in England and Wales and also include an urban/rural indicator. Alternatively, the Annual Population Survey (APS), which combines results from a number of sources including waves 1 and 5 of the Labour Force Survey (LFS), contains data for unitary/local authorities in the UK. Its large sample size makes it well suited to detailed geographical analysis.
Finally, between 1992 and 2006, the QLFS Local Area Data provided data aggregated by area for Counties/Scottish Councils, Local Authority Districts (LADs), Training and Enterprise Councils (TECs), Local Enterprise Companies (LECs), Government Offices (GOs) and Training, Enterprise and Education Directorate (TEED) regions.
Why is there a combined missing value category ‘-10’ in some of the household datasets?
All missing values in the data have been set to one '-10' “No answer/Does not apply” category instead of the previous separate '-8' “No answer” and '-9' “Does not apply” categories. The ONS has introduced a new imputation process for the LFS household datasets and it has thus been necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. The '-10' categorisation is in line with the Annual Population Survey household series. The change also applies to LFS household data back to April-June 1997 (currently being updated at the UK Data Archive), to ensure continuity for analytical purposes. The change only applies to the household data, and there are no current plans to extend it to other LFS series.
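Users who work with files that still carry the separate '-8' and '-9' codes alongside files that use the combined '-10' category may want to treat all three consistently. The Python sketch below shows one way this could be done. The `hours` values are invented for a hypothetical variable, and the helper names are illustrative rather than part of any LFS tooling.

```python
COMBINED_MISSING = -10        # "No answer/Does not apply"
OLD_MISSING_CODES = {-8, -9}  # "No answer" and "Does not apply"


def harmonise_missing(values):
    """Map the older -8/-9 codes onto the combined -10 category."""
    return [COMBINED_MISSING if v in OLD_MISSING_CODES else v for v in values]


def valid_mean(values):
    """Mean of the values that are not coded as missing."""
    valid = [v for v in harmonise_missing(values) if v != COMBINED_MISSING]
    return sum(valid) / len(valid) if valid else None


hours = [37, -9, 40, -8, 16, -10]  # invented values

print(harmonise_missing(hours))
print(valid_mean(hours))
```

Excluding the missing codes before computing a statistic avoids treating -8, -9 or -10 as real values.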
What are the access restrictions to LFS data under the End User Licence due to privacy protection reasons?
Due to the perceived risk of identifying respondents to the LFS, codes for geography levels smaller than Government Office Regions are only available in the Special Licence version of the LFS. The ONS has also recently placed additional restrictions on the End User Licence datasets: it is no longer possible to link quarterly and longitudinal editions by computing unique identifiers. Users interested in doing so need to apply for a Special Licence.
Are age (continuous) and SOC occupation (4 digits) available in the QLFS dataset?
Yes. The ONS has reversed its previous decision to remove AGE and SC2KMMJ from the End User Licence LFS datasets.
When will the transition to the new SOC 2010 classification of occupations take place in the LFS?
Fieldwork for the LFS using the ONS 2010 Standard Occupational Classification began in January 2011. The first dataset to include the SOC 2010 variables is the January-March 2011 edition, which is due to be released in May 2011.
I cannot find what I am looking for in the documentation for pre 1990 editions of the LFS. What can I do about it?
Early editions of the LFS documentation did not offer the same level of detail as current ones. Variable names and definitions changed several times, and pre-1985 classifications of occupations, education subjects and industries are not always well documented. If you cannot find the information you are looking for, please contact the Help Desk.
What should I do about missing variables in Labour Force Survey Calendar Datasets?
The LFS moved from seasonal to calendar quarters in 2006 and the seasonal datasets have been removed and replaced by the calendar datasets. As a result a number of variables have been dropped from the datasets. A page which lists each calendar dataset with the corresponding seasonal datasets is now available, together with more information about the changes.
Unfortunately the LFS User Guides so far do not reflect the changes. As a result you will find that certain variables noted in the User Guides as being present in the datasets are not actually in the datasets.
What are the ID variables on the Quarterly Labour Force Survey datasets?
a. Datasets available under End-User Licence (EUL)
The ID variables available under End-User Licence (EUL) have changed over time. The sections below outline these changes with the most recent first.
From January - March 2011 (Study Number 6782)
From Jan-March 2011 (6782), the QLFS data files contain the ID variables/numbers:
CASENOP - ‘Case Identifier - pseudoanonymised’
HSERIALP - ‘Number uniquely identifies a household - pseudoanonymised’
These variables are designed to improve the confidentiality of respondents. These ID numbers are designed to permit users to link household members together, but not to link across waves to create longitudinal datasets (users who wish to do this should use the LFS longitudinal files).
Summary of identifier variables in the QLFS files available under End-User Licence
The household identifier (hserialp) is an identifier which starts at 1 and simply counts up. The numbers are no longer based on administrative data and common values of this variable across waves no longer indicate that the case is from the same household. It is still possible to combine datasets in order to increase sample size as variables indicating which wave of the survey the respondent is in and when the respondent entered are still present. However, it is no longer possible to check that individual cases have not been duplicated as a result of failures in the merge process. Users are reminded that when undertaking household analysis, or analyses which involve household context, the most suitable data are the household LFS files.
A variable called QUOTAP is available which is the ‘stint number where interview took place – pseudoanonymised’.
All future QLFS datasets will contain these two new identifier variables.
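Because the household identifier restarts in each file, household members are best grouped within one quarterly file at a time. The Python sketch below illustrates this with invented records. The `quarter` label is a hypothetical field that a user would add when pooling files; it is not an LFS variable. Keying on the pair (quarter, HSERIALP) keeps households from different files apart even when their HSERIALP values coincide.

```python
from collections import defaultdict

# Invented records from two pooled quarterly files.
# "quarter" is a user-added label, not an LFS variable.
people = [
    {"quarter": "JM11", "HSERIALP": 1, "CASENOP": 1},
    {"quarter": "JM11", "HSERIALP": 1, "CASENOP": 2},
    {"quarter": "JM11", "HSERIALP": 2, "CASENOP": 3},
    {"quarter": "AJ11", "HSERIALP": 1, "CASENOP": 1},
]


def group_households(rows):
    """Group case identifiers by (quarter, household identifier)."""
    households = defaultdict(list)
    for row in rows:
        key = (row["quarter"], row["HSERIALP"])
        households[key].append(row["CASENOP"])
    return dict(households)


households = group_households(people)
for key, members in sorted(households.items()):
    print(key, members)
```

Here HSERIALP 1 appears in both files but is treated as two separate households, in line with the note above that common values across waves no longer indicate the same household.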
October - December 2009 (Study number 6404) to Oct - Dec 2010 (6715)
CASENO was removed from the EUL datasets from October 2009. Between Oct-December 2009 and Oct-December 2010, CASENO was replaced by IDREF, which was designed to improve the confidentiality of respondents.
There is no household identifier in these datasets. However, July-September 2010 (6632), Jan-March 2010 (6457) and Oct-December 2009 (6404) contain the administrative variables (Quota, Week, Add etc) needed to generate individual and household identifiers. The datasets for Oct-December 2010 (6715) and April-June 2010 (6548) do not include quota, which means it is not possible to generate individual and household identifiers that can be used to identify members of the same household or link individuals across waves.
July - September 2009 (Study number 6334) and before
QLFS datasets under EUL from July-Sep 2009 and earlier contain the original ID variables (CASENO and the variables needed to generate persid).
||Dataset||Casenop and hserialp||Quota and other administrative variables||Quotap and other administrative variables||Can identify members of the same household?||Can link across waves?||
|Jan-March 2011 (Study Number 6782)|Yes| |Yes|Yes|No|
|October-December 2010 (6715)|No|No| |No|No|
|July-September 2010 (6632)|No|Yes| |Yes|Yes|
|April-June 2010 (6548)|No|No| |No|No|
|Jan-March 2010 (6457)|No|Yes| |Yes|Yes|
|October-December 2009 (6404)|No|Yes| |Yes|Yes|
|(Up until) July-September 2009 (6334)|No|Yes| |Yes|Yes|
b. Special Licence (SL)
All Special Licence QLFS datasets should still contain the original ID variables (caseno, add, quota etc) so they are not affected by the above changes.
Is there a more general FAQ?
There is a generic FAQ for all surveys available.