PART II
PAPERS



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Demographic Analysis of Community, Cohort, and Panel Data from Low-Income Countries: Methodological Issues

Andrew Foster
Department of Economics and Community Health
Brown University

In any examination of the methodological issues related to the use of longitudinal data, it is helpful to consider three distinct uses for these data: measurement, evaluation, and structural analysis. As discussed in greater detail later in this paper, measurement focuses on the description of patterns of demographic change; evaluation is concerned primarily with measuring the consequences of policies or programs; and structural analysis involves testing and measuring underlying mechanisms or structures. This grouping is somewhat arbitrary, and, arguably, most demographic analyses involve more than one of these purposes, but the categorization is useful, because it helps to clarify the extent to which different forms of demographic data are linked to their eventual use. In particular, dissimilarities in the appropriateness of different data collection strategies are highest in areas of measurement, more moderate for evaluation, and somewhat limited in the context of structural analysis. This approach also helps to highlight the essential role played by longitudinal data collection in general as a basis for undertaking structural analysis. Specifically, it is argued that longitudinal data are critical for the modeling and analysis of the temporal antecedents of current behaviors and that clarifying the nature of these temporal antecedents provides important leverage for disentangling the mechanisms underlying demographic behavior.

This paper examines three categories of longitudinal data collection: panel, cohort, and community studies. These forms of data collection are

distinguished as a group because they follow respondents over time. They are distinguished from each other by the criterion used to include individuals in the sample. Panel studies generally use a representative sample of households from a broad region or nation at a particular point in time. Household members are then tracked and reinterviewed over time, regardless, given logistical considerations, of whether they have moved or otherwise changed their living arrangements. Such studies tend to be large in size and general-purpose in nature. A cohort study follows a similar procedure in tracking individuals but is directed toward a specific group of individuals such as those born or married at a particular point in time, although supplementary information on other people residing with the primary respondent also may be collected. These studies tend to focus initially on relatively specific research questions, but they sometimes broaden over time to become multipurpose in nature. A community study, by contrast, interviews individuals or households in a particular location and tracks entries and exits into this population, but it undertakes limited, if any, follow-up of participants leaving the region. Like cohort studies, community surveys usually are designed with a fairly narrow purpose in mind, such as the design and testing of a particular set of interventions, but they can later evolve into or provide a basis for more general-purpose data collection activities.

MEASUREMENT

The role of longitudinal data in measurement, as noted, is one of describing patterns of demographic change. Measurement, as the term is used in this paper, includes the computation of vital rates and other aspects of individual and household welfare, the distribution of these measures both within and across populations, and, most critically for a discussion of longitudinal data, changes in these measures over time.
Although measurement in this sense may be considered a kind of precursor or motivation for more detailed evaluative or structural analysis, it has important purposes in its own right in characterizing and comparing different societies and for purposes of planning.

Measurement is not often listed as one of the key benefits of longitudinal data collection, but for certain purposes longitudinal data are critical, such as to track mobility. For example, a recent study examined the distribution of welfare changes in India over the last 30 years using panel data from rural India (Foster and Rosenzweig, 2001a). A series of census and other surveys such as the National Sample Survey for India provide the opportunity to measure changes in the distribution of income over time. From these surveys, investigators may, for example, establish whether rural poverty rates in India are increasing or decreasing over time and whether poverty rates are declining differentially in areas with higher agricultural productivity growth. For attributes such as sex, cohort, or, to a lesser extent, schooling that are fixed over time, one also can examine whether members of the groups as a whole did relatively well or poorly over a given interval. Such data, however, can say little about the relative gains or losses of groups in society that are more fluid over time. Longitudinal data seem critical in evaluating whether income in poor households, for example, has declined relative to income in better-off households.

Retrospective data can be useful for examining mobility with respect to relatively salient and well-defined events such as place of residence. In some cases, retrospective data may be the only or the most cost-effective way to construct such measures. But retrospective data seem particularly problematic for the evaluation of differential income mobility, because it would be a significant challenge to obtain accurate retrospective reports of income and expenditures. Moreover, because of attrition and household division, it would be difficult to use retrospective data to establish appropriate weights for the purpose of obtaining population estimates of previous income or income growth over a previous interval. For example, the average 1970 household income of individuals in a representative sample of individuals in 1980 will tend to exceed the actual average 1970 household income to the extent that large households are more likely than small households to divide.
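The overweighting behind this example can be illustrated with a small arithmetic sketch. Everything in it is an assumption made for illustration: the incomes and division counts are invented, the function names are hypothetical, and the 1980 sample is taken to be a sample of households, each of which recalls the income of its 1970 household of origin.

```python
# Hypothetical 1970 households: (1970 household income, number of
# successor households observed in 1980). All values are invented.
households_1970 = [
    (300.0, 3),  # large household that divided into three by 1980
    (200.0, 2),  # divided into two
    (100.0, 1),  # did not divide
    (80.0, 1),   # did not divide
]


def true_mean_1970(households):
    """Average 1970 income over the actual 1970 households."""
    return sum(income for income, _ in households) / len(households)


def retrospective_mean(households):
    """Unweighted average of recalled 1970 income across 1980 households.

    A 1970 household is counted once per successor, so households that
    divided are over-represented.
    """
    total = sum(income * n for income, n in households)
    count = sum(n for _, n in households)
    return total / count


def reweighted_mean(households):
    """Same recalled reports, each weighted by 1 / (number of successors)."""
    total = 0.0
    weight_sum = 0.0
    for income, n in households:
        for _ in range(n):  # one recalled report per successor household
            weight = 1.0 / n
            total += weight * income
            weight_sum += weight
    return total / weight_sum


if __name__ == "__main__":
    print("true 1970 mean:      ", true_mean_1970(households_1970))
    print("retrospective mean:  ", retrospective_mean(households_1970))
    print("reweighted mean:     ", reweighted_mean(households_1970))
```

In this invented case the unweighted retrospective mean exceeds the true 1970 mean because the two higher-income households divided. A 1970 household with zero successors in 1980 would generate no recalled report at all, so no reweighting of the reports that do exist could bring it back.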
With retrospective data on household division rates, one could in principle correct these estimates, but there would be no way of correcting for households whose members had all migrated or died in the intervening years. The ability to capture changes over time in individual, household, or community attributes does not appear to be an important distinguishing characteristic of the three types of data collection considered here. But a key requirement in the use of longitudinal data for measurement is a clear delimitation of the population being measured, and it is evident that these three forms of longitudinal data collection capture mobility in very different sets of populations. Not only is this true in the sense that different groups of people are being followed over time using the three approaches,

but also important differences arise in the extent to which data collected from the relevant sample may be used to draw appropriate inferences about larger populations. The issue of representativeness is dealt with most cleanly in the context of national panel surveys, though the discussion here can be easily modified to refer to large-scale surveys at less than a national level (such as a rural panel). As long as sufficient care is taken in drawing an appropriately representative sample in the first place, appropriately weighted rates of mobility, for example, starting from the initial period for a panel, are representative of population mobility. Even in this case, however, important caveats must be addressed. The first is well understood: the quality of measurement of transitions over time will depend on the investigator’s ability to follow individuals over time. Fortunately, recent experience from several panel surveys has provided substantial insight into the causes and consequences of attrition in panel surveys. The experience of the Indonesian Family Life Survey (IFLS) is particularly well documented (Thomas et al., 2002). This survey provides ample evidence that attrition is nonrandom for key observables in the baseline survey and that departure from the sample region, a key cause of attrition, is nonrandom for subsequent outcomes. In particular, as might be anticipated, the people who are lost to migration tend to have extreme outcomes—that is, substantially better or worse outcomes than for those who stay. The costs of following study participants who have left their hometown but not the country is high but not prohibitive, amounting to about a 20 percent increase in cost per interview in the context of the IFLS. The second issue, which is more problematic, lies in the need to augment a panel survey so that it remains representative over time. 
For example, in a panel survey conducted in 1980, 1990, and 2000, population-level estimates of mobility between 1980 and 1990 and between 1980 and 2000 may be constructed using the 1980 sample weights. However, unless the 1990 panel is itself representative of the relevant population, inferences from this panel about mobility between 1990 and 2000 may be inappropriate. For some purposes, it may be possible to fix this problem by appropriately adjusting the sample weights, but for units of analysis such as the village in settings in which in-migration is common, it is necessary to include a sample of in-migrant households whose 1990-2000 mobility would not otherwise be captured. Even when an appropriate weighting system is available, the fact that the weights applied to particular households are likely to vary increasingly over time implies that population estimates tend to

become increasingly variable and particularly sensitive to the presence of outliers. Third, analysis of panel data is especially complex when considering changes over time in measures that are most naturally aggregated to the level of the household. For example, how does one measure income change in subsistence economies or other settings in which individual wage or salary data are unlikely to be available? If a landed agricultural household divides, the corresponding subhouseholds that started as a single unit with a single income measure may end up with very different levels of income in a subsequent round. Under these circumstances, estimates of income mobility at the individual, household, or dynastic (i.e., by adding together the incomes of the split-off households) levels may be quite different. Finally, for some purposes it may be useful to construct a panel based on physical rather than social households. Such an approach would have some procedural advantages in that points in space are by definition immobile and therefore problems of attrition do not typically arise (although changes in topography such as the movement of the river may make a particular location inherently uninteresting for most purposes). This approach has been recommended by the World Bank as a desirable sampling scheme in developing countries (Glewwe and Grosh, 2000). However, whether this is indeed sensible depends on the questions to be answered. It seems a reasonable approach for asking questions about transitions that are spatially defined—such as whether relatively poor or isolated villages grow more or less rapidly on average over a particular interval. However, if an investigator is fundamentally interested in measuring the transitions faced by individuals, which seems reasonable in the context of most demographic analysis, then sampling social households and tracking individuals seems critical. 
Many of these panel data issues apply in some degree to measurement in the context of community datasets and cohort studies. However, there are key differences between panel data and these other forms of longitudinal data collection in the extent to which they are generalizable to large populations. A cohort study can, of course, be representative of a group of people born at a particular point in time (or married at a particular point in time if that is the basis on which the cohort is drawn), but its findings cannot be easily generalized to the experience of other cohorts. Depending on the relative importance of changes that are common to cohorts (such as the process of aging) versus changes that are time-dependent and thus affect cohorts at different ages, the experience of one cohort may or may not

be similar to those of other neighboring cohorts. Other, more subtle aspects of generalizability exist as well. To return to income mobility, average income growth in households of a particular cohort need not be representative of income mobility in the population as a whole at a particular point in time, because households with one member born in a particular year do not constitute a randomly selected sample of households. Similarly, a community-level survey will be representative of the experience of a particular community at a particular point in time and can be used to establish changes in that community over time. But the experience of one community is not necessarily representative of other communities, and anyone using data from a single community has no obvious way of obtaining statistical measures of the extent of cross-community variability and thus of how different a particular community is likely to be from some average community in the relevant region. Compared with panel and cohort surveys, community surveys also have particular disadvantages in their treatment of out-migrants. Any measure of income mobility for individuals or communities, for example, will be problematic, because a community survey does not measure the experience of individuals and households who leave the community. In practice, one would expect the mobility experience of these individuals to be quite different from those who stay. Measurement on a limited geographic scale also may have disadvantages from the perspective of measurement on a repeated cross-sectional basis given mobility. The idea is that even if one does not follow out-migrants from a national panel survey, a reasonable sample of both the sending and receiving areas is achieved. Thus, as long as the panel is augmented with a new sample to be representative at, say, the village level, it will capture on average the experience of out-migrants from the original sample areas. 
The panel component for these individuals is lost (because the in-migrants to sample villages will not be linked with out-migrants from other sample villages), but this is not a problem from the perspective of measurement on a repeated cross-sectional basis. The more limited the geographic area being considered, the more likely one is to lose even this type of information on mobile members of society. But nonrepresentativeness need not imply that community surveys are unimportant for purposes of measurement. The frequency with which data from the Matlab study area in rural Bangladesh are used to characterize and validate overall changes in demographic rates in Bangladesh (see, for example, Cleland et al., 1994) suggests that such surveys play an important role in measurement. One of several reasons for this role is that survey costs

in a contiguous geographic area are clearly lower than those in a nationally representative sample. Second, a study’s investment in fixed resources within a particular community and long-run employment for workers may be more feasible in a community survey, possibly increasing the quality of workers who can be retained as well as the incentives to maintain quality. Long-term employment of survey workers also may increase data quality by increasing trust between survey workers and respondents. Third, the presence of an accurate census and persistent infrastructural support in a particular region at a particular point in time provides a useful basis for complementary analyses such as the development of specific-purpose longitudinal studies or qualitative analysis. Clearly, then, community surveys have had and will continue to have a critical role to play in measurement in developing countries without adequate censuses and vital registration systems.

EVALUATION

The second category of longitudinal data analysis, evaluation, provides the most commonly cited rationale for the collection of such data. At the heart of this rationale is the practical need to obtain, for policy design, estimates of program effects, coupled with the recognition that policies and programs are, in the absence of carefully designed intervention studies, likely to be systematically placed. This likelihood is known in the economics literature as the problem of endogenous program placement and incorporates deliberate attempts to target particular programs to particular areas (such as placing family planning clinics in high-fertility areas; see Gertler and Molyneaux, 1994), the tendency of people to live in places providing services they are likely to use (Rosenzweig and Wolpin, 1988), and correlations between outcome measures and programmatic variables that arise indirectly from underlying community attributes such as accessibility by road.
Central to this approach is the notion that measurement of the effect is more important than understanding the underlying mechanisms responsible for the effect. A short methodological digression on the underlying approach provides a useful foundation for a discussion later in this paper of the relative merits of different sorts of data. The underlying principle of this work is that some basic outcome of interest that is measured at two points in time, $y_{it}$, is a roughly linear function of a series of attributes $x_{it}$, the presence or absence of a program $p_{it}$, and a residual capturing fixed ($\mu_i$) and time-variant ($\varepsilon_{it}$) unobservables influencing the outcome in question:

$$y_{it} = \beta_x x_{it} + \beta_p p_{it} + \mu_i + \varepsilon_{it} \tag{1}$$

The basic statistical problem is that the correlation between the program variable $p_{it}$ and the unobservables implies that standard linear methods will yield a biased measure of the effect of the program. In the special case in which the allocation of the program is correlated with the underlying fixed effects, in which the program is introduced or modified over time, and in which the linearity assumption is plausible, program effects may be measured through a differencing procedure or by including dummy variables at the level of aggregation of the community. Alternatively, if there is reason to suspect that changes in the program between periods $t$ and $t + 1$ are correlated with the initial time-varying component $\varepsilon_{it}$, it is possible to combine differencing with an instrumental variables procedure, using as instruments the initial values of the $x_{it}$ variables.

A close look at the assumptions that must be made to apply this approach suggests that the principle of using over-time variation to evaluate program effects does not intrinsically require access to individual-level longitudinal data. To the extent that the program being evaluated is available to all members of a community, little may be gained from being able to follow study participants over time. A comparison of the fertility behavior of a given woman before and after the introduction of a family planning clinic is likely to be less informative about the effects of the family planning program than a comparison between women of similar ages at two points in time. Indeed, what appears to be the earliest application of this approach to the evaluation of demographic programs considered the effects of family planning expenditures on fertility at the level of the district in Taiwan (Schultz, 1973). Other work in this area has made effective use of repeated cross-sectional data aggregated to the level of the community (Pitt et al., 1993).
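The differencing procedure described for equation (1) can be sketched on simulated community-level data. This is an illustration under stated assumptions rather than any cited study's method: the attributes term is omitted, the program is absent everywhere in the first period and placed in the second period wherever the fixed effect is above zero, and the true effect `BETA_P`, the sample size, and the noise levels are all invented. The helper `ols_slope` is a hypothetical name for a one-regressor least-squares fit.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor least-squares fit with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
BETA_P = -0.5  # assumed true program effect (illustrative only)
N = 2000       # number of simulated communities

# Fixed community unobservable mu_i.
mu = [random.gauss(0.0, 1.0) for _ in range(N)]

# Endogenous placement: no program in period 1; in period 2 the program
# is placed in communities with a high fixed effect.
p1 = [0] * N
p2 = [1 if m > 0 else 0 for m in mu]

# Outcomes follow equation (1) without the x term.
y1 = [BETA_P * p + m + random.gauss(0.0, 0.1) for p, m in zip(p1, mu)]
y2 = [BETA_P * p + m + random.gauss(0.0, 0.1) for p, m in zip(p2, mu)]

# Cross-sectional comparison in period 2: contaminated by mu_i.
naive = ols_slope(p2, y2)

# First differences: mu_i cancels out of y2 - y1.
dy = [b - a for a, b in zip(y1, y2)]
dp = [b - a for a, b in zip(p1, p2)]
first_diff = ols_slope(dp, dy)

if __name__ == "__main__":
    print("true effect:          ", BETA_P)
    print("cross-section estimate:", round(naive, 3))
    print("first-difference est.: ", round(first_diff, 3))
```

Under these assumptions the cross-sectional estimate has the wrong sign, because program communities start from a higher fixed effect, while the differenced estimate is close to the assumed effect. The sketch does not cover the case where program changes respond to the initial time-varying shock, which is where the text suggests instrumental variables.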
Because the underlying methodology effectively incorporates the assumption of a single, well-defined coefficient reflecting program impact, the issue of representativeness that plays a critical role in a comparison of different forms of longitudinal data collection for the purpose of measurement does not play a decisive role here. If program effects are importantly heterogeneous, then different types of longitudinal data will yield different measures of the effect. However, admitting that the effects may be importantly heterogeneous and that one cares about the magnitude of the effect rather than just, for example, its sign, raises a host of new difficulties. In particular, the process of differencing or the application of instrumental

variables will yield a coefficient from a particular dataset that is not representative of the average effect of the program on the corresponding population. For example, the process of differencing over time to measure the effects of a family planning clinic effectively removes from the dataset all villages with family planning clinics in both the initial and final periods, and thus a program effect obtained from differencing village data yields at best an estimate of the program effect in those villages that experienced a change, not the average effect on all villages in the relevant population. Although representativeness does not therefore helpfully distinguish the three forms of longitudinal data collection, significant differences can be found in the relative suitability of these approaches to the assumptions and data requirements of the proposed methods. For a panel survey, to the extent that the evaluation is of a community-level program, panel data may not be ideal for evaluating program effects; one needs instead a representative sample of the population at two points in time. Thus if there is sufficient migration or change in household composition, the panel data will not yield representative estimates unless investigators deliberately try to resample and apply the appropriate weights. Cohort data present similar problems for the evaluation of community-level programs. But additional issues arise if program effects are likely to be significantly age-related. As noted, changes in fertility in a given cohort over time are likely to have much more to do with changes in age and the stock of children than with the introduction of a family planning program. At the very least, the linearity assumption embodied in equation (1) is likely to be violated in cohort data if a differencing approach is used to measure the impact of family planning programs on fertility. 
Yet cohort studies may be particularly well suited to stratified sampling designs in which treated individuals (i.e., those making use of the program) are relatively rare within a population. By ensuring that treated individuals are oversampled within a particular cohort, an investigator can maximize statistical power for a given sample size. From this perspective, community-based longitudinal data collection seems ideal for purposes of evaluation. By deliberately tracking entrants, exits, and relevant behaviors in a particular community, researchers can address problems of shifting population and compare appropriate groups at different points in time. They are also able, in the context of a community-based survey, to deliberately design interventions, such as the Matlab family planning program in rural Bangladesh (see Menken and Phillips, 1990) and the nutrition supplement experiments in rural Guatemala (see Pollitt

et al., 1995). As with measurement, a focus on a particular region enhances the opportunity to introduce complementary studies such as longitudinal surveys or qualitative work that may inform the evaluative process.

Set against these positive attributes is the relatively limited geographic coverage generally provided by community-level surveys. The spatially correlated variables and logistical considerations that place program villages in close proximity to each other could produce random shocks that yield misleading estimates of program effects. An overlap in treatments within a study area also could make it difficult to isolate the partial effects of any particular treatment. Finally, the absence of detailed follow-up on those leaving a village may make it difficult to measure the full effects of the program, particularly if the program, such as one for educational interventions, affects the opportunities for migration among village residents. The broader geographic scale in panel and cohort studies and the emphasis on migrant follow-up substantially diminish the extent of these problems. The broader scale of panel and cohort studies also permits an assessment of whether there is regional heterogeneity in the treatment effects of various programs and, if so, helps to identify possible sources of this heterogeneity.

The discussion in this section has addressed primarily the evaluation of programs at the community level. In some situations, however, the primary concern is to identify the effects of programs on particular individuals. For example, while it may be interesting to know whether the introduction of village-level, small-scale group credit affects fertility or child health in a village, it may be especially interesting to know whether such effects are particularly prevalent among those actually participating in the program.
Critical in these contexts is the ability to track individuals over time, and thus some of the limitations discussed above of panel and cohort longitudinal data are diminished. However, in any examination of individual effects there are important issues of selective participation in programs that cannot be approached purely as issues of evaluation.

STRUCTURAL ANALYSIS

Structural analysis as applied to empirical research means a variety of different things to scholars in different disciplines and even to researchers within the same disciplines. For some economists, the term applies only to detailed models incorporating maximizing behavior, which are then fit to data to yield fundamental parameters that capture the underlying preferences, constraints, and information of the relevant actors. For other social scientists, structural analysis refers to a detailed characterization of the underlying data-generating process or the extent to which covariates interact at various levels of aggregation. For the purpose of this paper, however, structural analysis is distinguished from measurement and evaluation in terms of its primary focus on trying to uncover the mechanisms underlying observed outcomes. As such, this definition would include both the above uses of the term as well as an intermediate approach of specifying a behavioral model and drawing out implications of this model for specification and testing, but then relying to the extent possible on estimation of simple linear models.

Organizing a discussion of structural analysis around the issue of different forms of longitudinal data collection is substantially more difficult than doing so for measurement or evaluation. Not only is there more variation in the types of techniques used to extract structural information from longitudinal data than for the other objectives, but also less can be said in general about the suitability of particular types of data to specific methodologies. In short, differences in the suitability of any given form of longitudinal data collection for undertaking structural analysis are greater than the differences in the average suitability of these forms. As a result, the rest of this section focuses on the usefulness of longitudinal data in general and less specifically on the relative merits of the different forms of longitudinal data collection.

A debate is currently under way in the field of empirical economics about the merits of structural analysis as defined here.
One view claims that longitudinal data, at least when coupled with sufficient naturally or artificially introduced experimentation, substantially limit the need for structural analysis.1 In an abstract sense, this may be true: by manipulating the environment of particular individuals and communities and then waiting long enough, one could in principle uncover anything one wished to know about the merits of alternative policies and even fundamental aspects of human behavior. In a practical sense, however, this sentiment is clearly wrong. Given the constraints of time and money as well as the ethical limitations on the treatment of human subjects, structural analysis and longitudinal data are very much complementary. To obtain meaningful and generalizable insight into demographic responses to economic and social conditions, a researcher would have to disentangle alternative potential mechanisms, and this process of disentangling is greatly aided by the presence of longitudinal data.

1 Rosenzweig and Wolpin (2000), in a recent review article for the Journal of Economic Literature, characterized and critiqued this perspective.

Nonrandom selection provides a first case in which longitudinal data greatly simplify structural analysis. Examples of selection include the possibility that participants in a program are not randomly selected from all relevant individuals and the notion that those women giving birth in a particular year do not necessarily constitute a random sample of women. A series of both parametric and nonparametric selection techniques are available for cross-sectional data analysis, but these techniques are dependent on functional form, the presence of variables that influence selection but not outcomes net of selection, the extent of conditional independence, or the availability of discontinuities in program access. Addressing selectivity problems tends to be much more straightforward in the context of longitudinal data.

An example is the question of whether the Matlab family planning program had an impact on child health in the period immediately after the introduction of that program in 1978. The dramatic effects of the program on fertility are well known, but it is less well known that the measured effects of the treatment program on mortality were relatively small until 1982 when an intensive maternal and child health program was increasingly integrated into the treatment area (Menken and Phillips, 1990). But does this lack of a substantial differential in mortality during this period reflect the fact that the intensive family planning services and low-level maternal and child health services had little impact on mortality risk for children? Or does it indicate that there was a change in the composition of births because of the introduction of the treatment program, which offset a pronounced reduction in mortality?
Results from Foster (1994) that incorporate maternal fixed effects suggest that the latter interpretation is correct: the estimates indicate that the program led to an approximately 20 percent drop in mortality for the children of a given woman. Because mothers with relatively low risks of child loss were among the first to adopt the family planning program, high-risk mothers were differentially represented among the children born after the introduction of the program, thereby masking the favorable effects of the program on mortality risk when viewed from an aggregate perspective. Through the use of longitudinal data the complex problem of fertility selection can be addressed simply with a minimal imposition of structure.
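The masking effect of fertility selection can be worked through with a short arithmetic sketch. Only the 20 percent within-mother reduction comes from the text. The two maternal risk groups, their baseline mortality risks, and the shares of births before and after the program are hypothetical numbers chosen for illustration, and they are not estimates from Foster (1994) or the Matlab data.

```python
EFFECT = 0.20  # proportional within-mother mortality reduction (from the text)

# Hypothetical baseline child mortality risk by maternal type.
baseline_risk = {"low_risk": 0.05, "high_risk": 0.20}

# Hypothetical shares of births by maternal type. Low-risk mothers are
# assumed to adopt family planning first, so their share of births falls.
birth_share_before = {"low_risk": 0.50, "high_risk": 0.50}
birth_share_after = {"low_risk": 0.30, "high_risk": 0.70}


def aggregate_risk(risk, shares):
    """Population mortality risk as a birth-share-weighted average."""
    return sum(risk[group] * shares[group] for group in risk)


# Risk for the children of a given mother falls by EFFECT in both groups.
risk_after = {group: r * (1.0 - EFFECT) for group, r in baseline_risk.items()}

before = aggregate_risk(baseline_risk, birth_share_before)
after = aggregate_risk(risk_after, birth_share_after)

# Proportional change seen in aggregate data versus within mothers.
aggregate_change = after / before - 1.0
within_change = {
    group: risk_after[group] / baseline_risk[group] - 1.0
    for group in baseline_risk
}

if __name__ == "__main__":
    print("aggregate change:    ", round(aggregate_change, 4))
    print("within-mother change:", {g: round(v, 4) for g, v in within_change.items()})
```

With these invented shares, aggregate mortality falls by under 1 percent even though risk falls by 20 percent for every mother, because births shift toward the high-risk group. Comparing children of the same mother, which is what maternal fixed effects do, removes the composition shift.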

Nonrandom selection also comes into play in the context of attrition, which was discussed in some detail in the section on measurement. Even if complete follow-up of participants in a panel, cohort, or community study is impossible, it may be possible to at least partly assess the implications of attrition through use of the appropriate longitudinal data. By modeling the process of attrition and determining which baseline features predict the probability of participants leaving the sample, some insight may be gained into the possible biases introduced by nonrandom attrition. In repeated cross-sectional data, by contrast, the investigator generally does not know which individuals have left a given area, which limits the scope for analyzing any selectivity introduced by the differential departure of particular types of individuals from a given region.

A second area in which longitudinal data can aid in the identification of underlying mechanisms is intrahousehold allocation. Researchers interested in testing whether bargaining plays an important role in household decision making have used unearned income that is assignable to spouses as a source of identification (e.g., Thomas, 1990). The premise of this approach is that under the unitary or common-preference model a reallocation of the source of financial resources should not change household allocations net of total income. But unearned income may itself be a consequence of unequal household allocation: men who have more control over household decision making may have more control over assets acquired by the household over time and thus have higher unearned income. Even premarital assets may be related, through the process of marital sorting, to attributes of the partners that are unobserved by the researcher and that also influence household allocations.
Thus a correlation between the distribution of unearned income and household allocations net of total income need not imply that the unitary household model must be discarded in favor of a more complex alternative incorporating bargaining. Analysis of this question of whether control over resources affects household allocations is more palatable, however, if individually assignable consumption or nutritional data are available over time along with measures of unanticipated shocks to income. Duflo and Udry (2001), for example, use rainfall shocks that differentially affect crops cultivated by men and women to test alternative models of intrahousehold allocation utilizing community-based longitudinal data from rural Côte d’Ivoire. The key insight here is that by following households over time it is possible to control for fixed unobservables governing patterns of behavior within a particular household and thus to mimic relatively closely the effect of exogenously redistributing income within the household from one member to the other with total income held fixed.

A third area in which longitudinal data play a critical role is the evaluation of interhousehold allocations. In the cross section, for example, it is difficult to distinguish financial transfers between households that are, in effect, fixed regular remittances from those that play an important role in insuring against risk. An examination of how transfers change over time produces direct evidence on this point, however. Longitudinal data also can be used to better understand why insurance-based transfers are likely to be imperfect. Recent game-theoretic models of transfer behavior have suggested that, given the difficulty of writing formal enforceable contracts governing transfer behavior, one should expect transfers to exhibit credit-like aspects in the sense that transfers between two households would be negatively autocorrelated across time. Longitudinal transfer data from panel datasets in South Asia tend to support this conclusion (Foster and Rosenzweig, 2001b).

A fourth area in which structural analysis is aided by the presence of longitudinal data is where there may be important lags between the time at which a particular program or source of economic change is introduced and the time at which its effect is actually realized. Foster and Rosenzweig (2000), for example, use data on agricultural productivity growth from a national longitudinal panel to examine the extent to which male-female mortality differentials are sensitive to the relative returns to male and female human capital.
The authors argue that much of the existing literature, which downplays the potential for economic change to influence sex differences in mortality, is misleading because it focuses on income effects (which may be contemporaneous with any consequent changes in mortality differentials) rather than incentive effects, which take years to accrue; for example, the benefits of investing in sons and daughters may not be realized until they marry and set up separate households. Anyone evaluating these incentive effects needs data spanning a considerable period as well as a methodological approach that captures forward-looking behavior on the part of parents.

A fifth area in which longitudinal data are critical for structural analysis is the evaluation of social or community effects on individual behavior. As clearly articulated by Manski (1993), the reflection problem raises serious identification issues because of the difficulty of distinguishing between shared random shocks and social influence. By paying careful attention to the nature of likely effects and the processes underlying them, however, a researcher can make some progress in this regard using longitudinal data. Foster and Rosenzweig (1995), for example, modeled social influence on the adoption of new agricultural technologies in green revolution India as a kind of learning-by-doing effect in which previous experience by oneself and one’s neighbors increased the profitability and thus the adoption of these new technologies. Critical to the identification of this model is the idea that, net of individual-specific fixed effects, the recent history of price and weather shocks affects the current profitability of new technologies only through its effect on current experience with the new technologies. This source of identification would have been unavailable in the absence of longitudinal data on planting and profitability.

A key requirement of any analysis of social influence is, of course, tracking the social context of people over time, something whose feasibility differs across the different forms of longitudinal study. Panel data capture a sample of the relevant social network in a particular village. But information on such social networks and community effects in general will necessarily be more limited for those leaving the study area. Cohort studies also provide a sample of the relevant social network within a study’s cohorts but provide very limited information on social contacts across cohort lines. Whether this loss is important will depend critically on the nature of the social influence being studied. With their relatively comprehensive coverage, community studies are particularly well suited to comprehensive analysis of social networks. Not only can investigators track the behaviors of most of the community members likely to influence a given person (instead of just a sample as in panel and cohort studies), but they also can ask people to identify other people in the community with whom they have had contact and link data on these nominated community members back to the individual in question.
The disadvantage of community studies here, however, is, as discussed in the context of evaluation and measurement, their limited geographic coverage, which may yield relatively little independent variation in social effects across the study population.

SYNTHESIS

Given the variety of ways in which longitudinal data are used, little can be said in general from a methodological perspective about how longitudinal data should be collected. Nonetheless, some common threads seem to run through the cases just cited. A particularly prominent thread is that longitudinal data have a particular role to play when trying to disentangle the relationship between a series of interdependent choices. In selection, this interdependence arises, for example, between giving birth and using maternal and child health services; in transfers, it involves choices made by both sending and receiving households; and in social networks, the behaviors of different individuals are importantly related. The fundamental problem with analyzing interrelated choices is that it is hard to imagine, at least in the cross section, an exercise in which one manipulates one such choice without directly affecting the other. The advantage of longitudinal data is that, given the appropriate history dependence, it may in fact be possible to simulate the desired experiment.

A simple example from economics involving estimation of a conditional demand function may be useful.2 In a consumer choice problem with three or more goods, it is sometimes desirable to assess how the consumption of one of the goods, say good 2, affects the consumption of one of the other goods, say good 1. More specifically, let us think of good 2 as the health of a child, good 1 as the schooling of a child, and good 3 as other consumption, and let us ask whether increases in child health are likely to result in better school attendance for a given income and prices. From an economic perspective, this effect, if present, is complex: it reflects not only the extent to which better health enhances performance in school but also how parents respond to changes in health by putting fewer (or more) resources into their child’s education. In a simple one-period world, this appears to be an intractable problem: given that health and schooling are chosen simultaneously, any household or community variables that influence health also will influence schooling.
To mimic with cross-sectional data the experiment of providing a given child with better health and then observing whether education is influenced, one needs some household or community variable that affects health but does not affect schooling directly. An obvious choice might be the cost of health care, but this is not without difficulty: a change in the cost of health care would indeed affect health, but it also would have a direct effect on schooling net of health by influencing the resources available to the household after health expenses are incurred.

The introduction of multiple periods into the analysis can help solve this problem. If, for example, period 2 health is determined by choices made in period 1, then variation in the period 1 price of health care will influence period 2 health without directly affecting resource availability in period 2 (at least in the absence of savings which, in any case, can be measured directly) and thus will affect schooling only through its effect on health. A change in the period 1 price can thus be used to examine the effect of changing the period 2 health of a child without directly influencing schooling. Thus the introduction of history dependence, coupled with the appropriate longitudinal data, solves what seems to be an inherently intractable problem in the cross section. Longitudinal data therefore allow antecedents of current outcomes to be used to simulate variation in these outcomes and thereby to disentangle the relationships between interdependent choices.

2 The analytic details for the model are presented in the annex to this paper.

CONCLUSION

Longitudinal data have the potential to substantially increase understanding of human behavior and the impacts of programs and policies. The extent to which this potential is realized varies greatly according to the ways in which the data are collected, the purpose of the analysis, the methodologies employed, the substantive issues being considered, the statistical and survey capacity of the area being examined, and the availability of other data sources in that area. It is not clear that it is desirable to focus data collection in any one particular way. But there does seem to be a need to consider carefully how history dependence is captured in longitudinal data and how this dependence can best be exploited to better understand the mechanisms underlying important demographic outcomes.

ANNEX

Consider a utility function $u(c_{t1}, c_{t2}, c_{t3})$ and a budget constraint

$$p_{t1}c_{t1} + p_{t2}c_{t2} + p_{t3}c_{t3} = y_t, \tag{1}$$

where $c_{ti}$ denotes consumption of good $i$ at time $t$, $p_{ti}$ denotes prices, and $y_t$ denotes income.
If the consumer can freely choose levels of consumption of each of the three goods, the choices are interdependent and the relevant demand functions are of the form

$$c_{ti} = C_i(p_{t1}, p_{t2}, p_{t3}, y_t). \tag{2}$$

Asking the question of how an increase in good 2 affects good 1, however, in effect imposes an additional artificial constraint on the consumer by setting the value of $c_{t2}$ to some level $c_{t2}^*$ and then considering the effect of an increase in $c_{t2}^*$ on the consumption of good 1. Solving this new problem as one of constrained maximization yields a function for $c_{t1}$ that depends on $c_{t2}^*$ and on all prices and income, so that

$$c_{t1} = C_1^*(c_{t2}^*, p_{t1}, p_{t2}, p_{t3}, y_t). \tag{3}$$

This equation looks estimable in principle, but because the constraint on $c_{t2}$ is not in fact faced by consumers, the levels of consumption actually observed are fully determined by the arguments of (2), which appear as well in (3). Thus there is no feasible way to change $c_{t2}$ with $p_{ti}$ and $y_t$ fixed: a change in $c_{t2}$ can come about only through a change in one of these other variables.

If good 2 (health) is determined according to the resources and information available in period 0 and cannot otherwise be altered in period 1, then $c_{12} = c_{02}$. The period 0 budget constraint is that just described, but the period 1 budget constraint is

$$p_{11}c_{11} + p_{13}c_{13} = y_1. \tag{4}$$

The unconditional demand for $c_{12}$ will then depend on period 0 and period 1 (to the extent these are known at time 1) prices and income, so that

$$c_{12} = C_{12}(p_{01}, p_{02}, p_{03}, y_0, p_{11}, p_{13}, y_1). \tag{5}$$

The conditional demand corresponding to (3) is, however, unchanged except that $p_{12}$ is omitted:

$$c_{11} = C_{11}^*(c_{12}^*, p_{11}, p_{13}, y_1). \tag{6}$$

A comparison of (5) and (6) indicates that the problem arising in (2) and (3) no longer obtains. It is straightforward to imagine an increase in consumption of good 2 in period 1 (as a consequence, say, of changes in period 0 prices) while leaving the other arguments of (6) unchanged.
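The identification argument of this annex can be made concrete with a small simulation. The Python sketch below is an illustration under stated assumptions, not part of the original analysis: the linear functional forms, the parameter values, the unobserved household "taste" term, and all variable names are hypothetical. Period 1 prices and income are held constant across households, so they drop out, and the period 0 price of health care is the only observed variable that moves period 1 health without entering the schooling equation.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n_households=20000, true_effect=0.5, seed=0):
    """Simulate period 1 health and schooling with a period 0 price shifter.

    All functional forms and parameter values are hypothetical.
    """
    rng = random.Random(seed)
    lagged_price, health, schooling = [], [], []
    for _ in range(n_households):
        p0 = rng.uniform(1.0, 3.0)    # period 0 price of health care
        taste = rng.gauss(0.0, 1.0)   # unobserved household preference for child quality
        # Health is fixed by period 0 choices, so it responds to the period 0 price.
        h = 5.0 - 1.0 * p0 + taste + rng.gauss(0.0, 0.5)
        # Schooling depends on health and on the same unobserved taste, but not on p0.
        s = 1.0 + true_effect * h + taste + rng.gauss(0.0, 0.5)
        lagged_price.append(p0)
        health.append(h)
        schooling.append(s)
    return lagged_price, health, schooling


lagged_price, health, schooling = simulate()

# Naive slope of schooling on health: contaminated by the shared taste term.
ols_slope = covariance(schooling, health) / covariance(health, health)

# Instrumental-variable slope: only health variation driven by the period 0 price is used.
iv_slope = covariance(schooling, lagged_price) / covariance(health, lagged_price)

print(f"naive slope: {ols_slope:.2f}")
print(f"lagged-price IV slope: {iv_slope:.2f}")
```

In this assumed setup the naive slope overstates the effect of health on schooling because both respond to the same unobserved household trait, whereas the slope based on the lagged price should lie close to the assumed value of 0.5. The sketch relies on the exclusion restriction stated in the annex: if savings carried period 0 price shocks into period 1 resources, the lagged price would enter the schooling equation directly and the comparison would no longer be valid without controlling for savings.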

REFERENCES

Cleland, J., J.F. Phillips, S. Amin, and G.M. Kamal
1994 The Determinants of Reproductive Change in Bangladesh: Success in a Challenging Environment. Washington, DC: World Bank, Regional and Sectoral Studies.

Duflo, E., and C. Udry
2001 Risk and Intrahousehold Resource Allocation in Côte d’Ivoire. Unpublished manuscript, Massachusetts Institute of Technology.

Foster, A.D.
1994 Program Effects and the Allocation of Resources within the Household. Unpublished manuscript, Brown University. Available online at: http://adfdell.pstc.brown.edu/papers/gwu.pdf

Foster, A.D., and M.R. Rosenzweig
1995 Learning by doing and learning from others: Human capital and technical change in agriculture. Journal of Political Economy 103(6):1176-1209.
2000 Missing women, marriage markets, and economic growth. Unpublished manuscript, Brown University. Available online at: http://adfdell.pstc.brown.edu/papers/sex.pdf
2001a Household division, inequality and rural economic growth. Review of Economic Studies. Available online at: http://adfdell.pstc.brown.edu/papers/split.pdf
2001b Imperfect commitment, altruism, and the family: Evidence from transfer behavior in low-income rural areas. Review of Economics and Statistics 83(3):389-407 (August).

Gertler, P.J., and J.W. Molyneaux
1994 How economic-development and family-planning programs combined to reduce Indonesian fertility. Demography 31(1):33-63.

Glewwe, P.W., and M. Grosh, eds.
2000 Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study. New York: Oxford University Press.

Manski, C.F.
1993 Identification of endogenous social effects: The reflection problem. Review of Economic Studies 60(3):531-542.

Menken, J., and J. Phillips
1990 Demographic change in rural Bangladesh: Evidence from Matlab. Annals of the American Academy of Political and Social Science 510:87-101.

Pitt, M.M., M.R. Rosenzweig, and D.M. Gibbons
1993 The determinants and consequences of the placement of government programs in Indonesia. World Bank Economic Review 7(3):319-348.

Pollitt, E., K.S. Gorman, P.L. Engle, J.A. Rivera, and R. Martorell
1995 Nutrition in early life and the fulfillment of intellectual potential. Journal of Nutrition 125(4 Suppl):1111S-1118S.

Rosenzweig, M.R., and K.I. Wolpin
1988 Migration selectivity and the effects of public programs. Journal of Public Economics 37(3):265-289.
2000 Natural “natural experiments” in economics. Journal of Economic Literature 38(4):827-874.

Schultz, T.P.
1973 Explanation of birth rate changes over space and time: A study of Taiwan. Journal of Political Economy 81(2, Part 2):S238-S274.

Thomas, D.
1990 Intrahousehold resource allocation: An inferential approach. Journal of Human Resources 25(fall):635-664.

Thomas, D., E. Frankenberg, and J.P. Smith
2001 Lost But Not Forgotten: Attrition in the Indonesian Family Life Survey. RAND RP-965. RAND, Santa Monica, CA.