The first decision to be made is what data to use. There are two routes to sourcing microdata for analysis.
Unit 1 of this series has already described many of the data sources available. The first route is to use international datasets such as those available on the ESDS - these are compiled for cross-national analysis and are described in overview in Unit 1. Such datasets usually contain data for many countries, either in a single data file or in separate data files with a similar structure. They will have been through a process of preparation, which means that they are standardised in format and that metadata is provided to guide the user. The extent to which this is done varies, so the usual rule is "user beware".
A second approach is to use datasets from different countries that are either from the same series or on the same topic. Such surveys are not necessarily designed to support cross-national analysis, nor presented as an international dataset. However, on some topics they may be just as good as surveys intended to be international, or indeed may be the only data available. Examples include Household Income and Expenditure Surveys (HIES) and Labour Force Surveys, which are conducted in most countries, and the Demographic and Health Surveys (DHS).
With this approach you will need to do quite a bit of additional work to get the data into a suitable format, and you must make your own decision about whether the datasets are sufficiently similar for the analysis intended. This of course also applies when combining sub-national surveys, such as in federal systems where data may be available from separate state surveys but not compiled into a single dataset. Examples include comparisons between England, Scotland, Wales and Northern Ireland, or analysis of the territories in Australia, the provinces in Canada, or the regions of other countries with a federal structure such as India.
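One early check when combining separate surveys is to see which variables the files actually share. The sketch below is only an illustration: the survey names, the variable names and the `compare_variables` helper are all hypothetical, and in practice the variable lists would come from each survey's documentation or data files.

```python
# Hypothetical variable lists for three separate surveys (illustrative only).
variables = {
    "England": ["id", "age", "sex", "income", "region"],
    "Scotland": ["id", "age", "sex", "income", "council_area"],
    "Wales": ["id", "age", "sex", "region"],
}


def compare_variables(variables):
    """Return the variables common to all surveys, and those each survey lacks."""
    sets = {name: set(cols) for name, cols in variables.items()}
    common = set.intersection(*sets.values())
    union = set.union(*sets.values())
    # For each survey, list the variables found elsewhere but absent here.
    missing = {name: sorted(union - cols) for name, cols in sets.items()}
    return sorted(common), missing


common, missing = compare_variables(variables)
print("Common to all surveys:", common)
for name, cols in missing.items():
    print(f"{name} lacks: {cols}")
```

A shared variable name does not guarantee a shared definition, so a check like this narrows down where to look rather than settling whether the surveys are comparable.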
It is very easy to fall into simple traps by not checking that your assumptions hold true. For example, when localising a questionnaire some countries will change the order of the codes, either to fit in with the norm for that country or simply to take ownership of the localised version. So in Country A it might be that 1 = female and 2 = male, while in Country B the opposite applies. Such subtle changes can completely wrong-foot your analysis, and not spotting the difference can waste a lot of time.
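One way to guard against this kind of trap is to recode each country's values through its own codebook before pooling the data, rather than assuming the numeric codes mean the same thing everywhere. The following is a minimal sketch under assumed names: the `CODEBOOK` contents, the record layout and the `harmonise_sex` function are hypothetical and would need to be replaced with the value labels documented for each real survey.

```python
# Hypothetical value labels, as they might appear in each country's codebook.
CODEBOOK = {
    "A": {1: "female", 2: "male"},
    "B": {1: "male", 2: "female"},
}


def harmonise_sex(country, code):
    """Translate a country-specific numeric code into a common text label."""
    try:
        return CODEBOOK[country][code]
    except KeyError:
        # Fail loudly on undocumented codes instead of guessing their meaning.
        raise ValueError(
            f"No label for code {code!r} in country {country!r}"
        ) from None


# Illustrative records with the raw numeric codes from each country.
records = [
    {"country": "A", "sex": 1},
    {"country": "A", "sex": 2},
    {"country": "B", "sex": 1},
    {"country": "B", "sex": 2},
]

# Add a harmonised label alongside the original code so both can be checked.
harmonised = [
    {**r, "sex_label": harmonise_sex(r["country"], r["sex"])} for r in records
]

for row in harmonised:
    print(row)
```

Keeping the original code next to the harmonised label makes it easy to tabulate one against the other for each country and confirm that the recoding did what was intended.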