Learning from Cross-National Research
Suppose we are interested in the impact of a particular policy or treatment 'x' on a specific outcome 'y'. As an example, we might think of y as an active health index and x as a particular diagnostic treatment, say, screening for some symptom. In any national data set we observe x, y, and a set of covariates z. The covariates z include individual and local variables (for example, age, education, local unemployment).
Ignoring, for a moment, the observable covariates, without loss of generality we may write
(1) y(i,j,t) = b(i,j,t) x(i,j,t) + u(i,j,t), for individual i in country j and time period t
where b(i,j,t) measures the response by individual i in country j at time t to the policy intervention x. If the effect of the policy given by b(i,j,t) varies freely across countries and time periods, evidence from one country or period tells us nothing about another, and there is little to be gained from cross-country, longitudinal, cross-cohort, or repeated cross-section analyses. Thus, one of the basic hypotheses underlying a call for cross-national longitudinal data collection or cross-national analysis of repeated cross-sections is the assumption that basic behavioral responses are stable across countries and time.
Making this assumption, we rewrite equation (1) as
(2) y(i,j,t) = b(i) x(i,j,t) + u(i,j,t)
where b(i) is the individual response coefficient to the policy or treatment x.
An extreme version of equation (2) assumes a common response effect across all individuals, i.e., a “homogeneous effects” model. An intermediate specification might allow the response parameters to vary according to observed covariates (the z variables defined above). In general, however, the “heterogeneous effects” model of equation (2) has become the standard reference model for evaluating policy interventions.
PARAMETERS OF INTEREST
To fully understand the impact of a policy intervention or treatment 'x' on the outcome measure 'y', the best-case situation would be to know the full distribution of the response parameters b(i). For example, although the mean or median response may be positive, the lower quartile of the response distribution could still show a negative impact. However, we do not see the same individual with and without the treatment at the same time and in the same country. Typically, therefore, we must settle for the average effect.
A properly designed experiment measures the expected impact of the treatment on individuals drawn at random from the population. Again, this can usually be broken down into the average response for subgroups according to observed covariates 'z'.
For nonexperimental data, a popular alternative parameter of interest is the average impact of the intervention on those who are included in the program, that is, the average treatment effect on the treated. Suppose we select a particular group according to the observed variables z; for example, we might choose women who are between 50 and 60 years of age who live in a high-unemployment area. Suppose further that some subsample of these women is subject to the treatment. The average response for this subsample is the impact of the treatment on the treated.
When the treated and comparison groups are chosen randomly, as in an experiment, the average treatment effect on the treated equals the average treatment effect. But when the treatment group arises by self-selection or by some other nonrandom mechanism, we are measuring only the average treatment effect among those who were treated. This is a much less interesting parameter but one that is used regularly in the ex-post evaluation of policy interventions.
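The distinction between the average treatment effect and the average effect among the treated can be illustrated with a small simulation. This is only a sketch under invented assumptions: the response parameters b(i) are drawn from a normal distribution with mean 1 and standard deviation 2, and the self-selection rule (only individuals with a positive response enroll) is hypothetical. None of these numbers comes from the text.

```python
import random
from statistics import mean


def simulate_effects(n=50_000, seed=0):
    """Compare the average treatment effect (ATE) with the average
    effect among the treated under two assignment mechanisms.

    All parameters are hypothetical: b(i) ~ Normal(mean=1, sd=2).
    """
    rng = random.Random(seed)
    b = [rng.gauss(1.0, 2.0) for _ in range(n)]

    # Population average response and share of individuals who are harmed.
    ate = mean(b)
    share_negative = sum(bi < 0 for bi in b) / n

    # Random assignment: a coin flip decides treatment, independent of b(i).
    att_random = mean(bi for bi in b if rng.random() < 0.5)

    # Assumed self-selection rule: only those who gain (b(i) > 0) enroll.
    att_selected = mean(bi for bi in b if bi > 0)

    return ate, share_negative, att_random, att_selected


ate, share_negative, att_random, att_selected = simulate_effects()
print(f"ATE                              : {ate:.2f}")
print(f"share with negative response     : {share_negative:.2f}")
print(f"effect on treated, randomized    : {att_random:.2f}")
print(f"effect on treated, self-selected : {att_selected:.2f}")
```

Under these assumed parameters the randomized figure should sit close to the ATE, while the self-selected figure comes out at roughly twice the ATE. A sizable minority of individuals has a negative response even though the mean is positive.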
One simple measure of the average response parameter is the difference in outcomes between the treated group and the comparison group. Suppose x(i,j,t) = 1 for those who are treated and x(i,j,t) = 0 for the comparison group. Also, suppose y(1) and y(0) represent the average outcome measures for each of these groups, respectively. Then we have
(3) y(1) − y(0) = b(1) + u(1) − u(0)
Provided the bias term [u(1) − u(0)] is zero in the subpopulation, equation (3) consistently estimates the average treatment effect on the treated b(1) for this subpopulation. But how do we guarantee that this bias term is zero?
(A): If the comparison group (x(i,j,t) = 0) is chosen by randomized control, then the bias term u(1) − u(0) is zero in expectation by design and becomes negligible in large enough samples.
(B): If the bias term [u(1) − u(0)] is constant before and after the treatment, then comparing the difference in the outcome variable before the reform [y'(1) − y'(0)] with the difference after the reform [y*(1) − y*(0)] again consistently estimates b(1).
(4) [y*(1) − y*(0)] − [y'(1) − y'(0)] = b(1)
Unfortunately, the conditions for equation (4) are difficult to satisfy in a nonexperimental setting. Three conditions are required:
There is a sizable comparison group with similar observable characteristics.
The comparison group is completely unaffected by the reform.
The treatment group and the comparison group are subject to the exact same trends over time.
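The contrast in equation (4), and its dependence on common trends, can be sketched numerically. The example below is illustrative only: the true effect b(1) = 2, the baseline levels, the trends, and the noise are all invented. The treated group starts at a higher level, which stands in for a constant bias term u(1) − u(0).

```python
import random
from statistics import mean


def group_means(rng, n, level, trend, effect):
    """Mean outcome before and after the reform for one group."""
    before = mean(level + rng.gauss(0.0, 1.0) for _ in range(n))
    after = mean(level + trend + effect + rng.gauss(0.0, 1.0) for _ in range(n))
    return before, after


def contrasts(treated_trend, comparison_trend, n=20_000, seed=1):
    """Return the simple difference (3) and the before-after contrast (4)."""
    rng = random.Random(seed)
    b1 = 2.0  # hypothetical true effect on the treated
    # The treated group starts 2.0 higher: a constant bias u(1) - u(0).
    t_before, t_after = group_means(rng, n, 5.0, treated_trend, b1)
    c_before, c_after = group_means(rng, n, 3.0, comparison_trend, 0.0)

    naive = t_after - c_after                           # equation (3)
    did = (t_after - c_after) - (t_before - c_before)   # equation (4)
    return naive, did


# Common trends: equation (4) removes the constant bias.
naive, did = contrasts(treated_trend=1.0, comparison_trend=1.0)
# Unequal trends: the third condition fails and the contrast is off.
_, did_unequal_trends = contrasts(treated_trend=1.5, comparison_trend=1.0)

print(f"simple difference after reform : {naive:.2f}")
print(f"before-after contrast          : {did:.2f}")
print(f"contrast with unequal trends   : {did_unequal_trends:.2f}")
```

With common trends the simple difference overstates the assumed effect by the size of the baseline gap, while the before-after contrast recovers it. When the treated group's trend is steeper, the extra trend is attributed to the treatment.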
Suppose all that is available are national samples. Where treatments are global for a particular subpopulation, such as the introduction of national screening (or a universal pension provision), the first condition fails immediately; no suitable comparison group exists. When there are spillover effects on the rest of the community, the second condition fails. Finally, if a comparison group can be chosen within a country but the two groups are sufficiently different that they have systematically different health experiences over time, the final condition fails.
Cross-national comparisons can help in all three of the above cases. First, interventions or policies (for example, a universal health insurance scheme) often occur in one country and not another. So even if they are global within a country, there is variation across countries. The before and after contrasts can then be drawn across countries. Alternatively, different countries may introduce similar interventions or treatments but with different timings, so that the contrast in equation (4) can still be made. Second, spillover effects are typically limited to within national boundaries, so that the contrast across countries is still valid. Finally, similar comparison groups can be chosen across countries, e.g., high-income and well-educated individuals who are likely to experience the same overall trends. Note also that even where within-country variation is informative, cross-country comparisons can add substantially to the informative variability in the data, and therefore considerably improve the precision of estimates of the impact of such interventions.
The hypothesized stability of responses that allows us to move from the general but vacuous equation (1) to the stable form of equation (2) assumes that the measurements of the variables y(i,j,t) and x(i,j,t) are comparable across time and space. Again restricting attention to linear relations, the general relationship between measurements in two countries, j and j', and two time periods, t and t', may be written as
(3) x(i,j',t') = d(j,t) + c(j,t) x(i,j,t)
Substituting equation (3) into equation (1), we obtain
(4) y(i) = b(i)[d(j,t) + c(j,t) x(i)] + u(i) = b(i)d(j,t) + b(i)c(j,t) x(i) + u(i)
Ideally, if we could measure x(i,j,t) comparably across countries and time, we could assume that d = 0 and c = 1 for all j and t and thus (potentially) test the hypothesis of behavioral stability using approaches described in the preceding section (e.g., test the hypothesis that b(i,j) = b(i,j')). If we cannot make this assumption, it is easy to see that the estimated effect of x on y in country j' may differ from its estimated effect in country j even if behavior is the same in both countries, simply because the measurement of x differs (b(i) is not equal to b(i)c(j)). Failure to have comparable measures of important variables can severely reduce the possibility of exploiting cross-national variations in policies and other variables to enhance scientific knowledge of behavioral responses.
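A short simulation makes the point concrete. It is a sketch under invented assumptions: behavior is identical in both countries with a homogeneous response b = 0.5, one country records x on the reference scale (d = 0, c = 1), and the other records it on a different scale with d = 2 and c = 4. The helper names `ols_slope` and `simulate_country` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_country(rng, n, b, d, c):
    """Behavior is y = b * x_ref + u in every country.

    The survey records x_rec, related to the reference measure by
    x_ref = d + c * x_rec (the linear relation assumed in the text).
    """
    x_recorded, y = [], []
    for _ in range(n):
        x_ref = rng.gauss(10.0, 3.0)
        y.append(b * x_ref + rng.gauss(0.0, 1.0))
        x_recorded.append((x_ref - d) / c)
    return x_recorded, y


rng = random.Random(2)
b, d, c = 0.5, 2.0, 4.0  # hypothetical values

x_j, y_j = simulate_country(rng, 20_000, b, d=0.0, c=1.0)  # comparable measure
x_jp, y_jp = simulate_country(rng, 20_000, b, d=d, c=c)    # rescaled measure

slope_j = ols_slope(x_j, y_j)
slope_jp = ols_slope(x_jp, y_jp)

# Harmonizing the measure (possible only if d and c are known) restores b.
x_harmonized = [d + c * xi for xi in x_jp]
slope_harmonized = ols_slope(x_harmonized, y_jp)

print(f"country j slope             : {slope_j:.2f}")
print(f"country j' slope (raw x)    : {slope_jp:.2f}")
print(f"country j' slope (harmonized): {slope_harmonized:.2f}")
```

The raw slope in the second country should come out near b × c rather than b, even though behavior is the same by construction. Only after the recorded variable is converted to the common scale do the two estimates agree.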
In some cases, these measurement problems are trivially easy to correct. For instance, temperature measured in Fahrenheit in the United States can be converted to centigrade to conform to European measurements. In other cases, the theoretical idea is well understood but not trivial to implement. An example is the conversion of monetary measures into a common value. Here, observed foreign exchange rates may convert francs into dollars, but this conversion may not conform to a purchasing power parity rate that could be used to equate the true purchasing power of given incomes in France and the United States. For many variables used in studies of health, psychology, and economics, methods for obtaining common measurements are not well understood, in part because they have received inadequate systematic attention from the scientific community. Progress is currently being made on a number of fronts. For example, there is a continuing large-scale, cross-national effort to create instruments that can produce valid measures of depression that are comparable across countries, cultures, and language groups. To continue making progress along these lines, the active collaboration of scientists from different disciplines and countries is imperative.