Using Estimates in Allocation Formulas
In this chapter we return to the topic of using estimates from the Census Bureau's Small Area Income and Poverty Estimates (SAIPE) Program (or other sources) for program purposes, specifically, for allocation of funds by the use of formulas. The chapter illustrates some of the problems for allocations that errors in estimates (not only persistent biases, but also random variability across areas and over time) may cause, and how some kinds of formula provisions may exacerbate the effects of such errors.1
As discussed in Chapter 2, many federal programs include small-area income and poverty estimates as factors in formulas to allocate funds to states or other areas, such as school districts and service delivery areas. Many state programs also allocate funds to substate areas by formulas that use measures related to poverty or income. Typically, such funding formulas are complex: they often include more than one factor in addition to poverty or income, such as population of a certain age, total population, condition of housing stock, or relevant expenditures by the jurisdiction. They also often have other complex provisions, such as thresholds for eligibility, minimum allocation amounts, or hold-harmless requirements (that jurisdictions receive not less than all or some fraction of their allocation amounts of the preceding year). The Title I education program provides an example of formulas with multiple provisions; see Box 6-1.

1. See Fellegi (1981) for a discussion of similar issues in the context of whether census estimates should be adjusted for measured undercount for use in allocation formulas.
Box 6-1 Formula Provisions for Title I Basic and Concentration Education Grants
The Title I program, which supports compensatory education programs to benefit educationally disadvantaged children, currently funds two types of allocations to school districts: basic grants and concentration grants. The formulas for both types of grants allocate funds to school districts on the basis of their numbers of formula-eligible children: poor school-age children (as estimated by the Census Bureau) and other small groups of children, namely, those in foster homes, in families above the poverty level that receive Temporary Assistance for Needy Families benefits, and in local institutions for neglected and delinquent children. The formulas also take account of state per-pupil expenditures. Both formulas include thresholds and hold-harmless provisions, as well as state minimum allocation amounts.
Thresholds Basic grants allocate funds to school districts that meet two threshold criteria: at least 10 formula-eligible children and formula-eligible children who constitute more than 2 percent of their total school-age children. The thresholds for basic grants are low and exclude only about 10 percent of school districts. Concentration grants, in contrast, allocate funds only to school districts with high numbers (more than 6,500) or high proportions (more than 15 percent) of formula-eligible children; less than half of all school districts are eligible for concentration grants.
Hold Harmless The Title I legislation specified a 100 percent guarantee of the prior year's amount for basic and concentration grants for school year 1996-1997. For later years, it specified a sliding hold-harmless provision for basic grants and no hold-harmless provision for concentration grants. Under the sliding provision school districts with 30 percent or more formula-eligible children are guaranteed at least 95 percent of the prior year's grant; the guarantee is 90 percent for districts with 15-30 percent formula-eligible children and 85 percent for districts with fewer than 15 percent formula-eligible children. For school years 1998-1999, 1999-2000, and 2000-2001, Congress passed legislation providing a 100 percent guarantee for all eligible school districts for both basic and concentration grants. In addition, beginning with the 1999-2000 school year, Congress extended the concentration grant hold-harmless provision to eligibility as well as amounts: that is, any school district that was eligible for a concentration grant in the previous year would continue to receive the amount of that grant even if it was no longer eligible on the basis of the new SAIPE estimates of poor school-age children for school districts.
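As a concrete illustration, the sliding hold-harmless scale just described can be written as a simple rule. The Python sketch below is purely illustrative; the function name and interface are our own, not part of the Title I legislation:

```python
def hold_harmless_floor(prior_grant, pct_formula_eligible):
    """Minimum basic-grant amount under the sliding hold-harmless
    provision: 95 percent of the prior year's grant for districts with
    30 percent or more formula-eligible children, 90 percent for 15-30
    percent, and 85 percent for fewer than 15 percent."""
    if pct_formula_eligible >= 30.0:
        rate = 0.95
    elif pct_formula_eligible >= 15.0:
        rate = 0.90
    else:
        rate = 0.85
    return rate * prior_grant
```

For example, a district with 20 percent formula-eligible children and a prior grant of $1,000,000 would be guaranteed at least $900,000 the following year.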
Formulas are complex because legislators and other policy makers often seek to satisfy multiple, sometimes conflicting, objectives. For example, they may wish both to target funds to poorer jurisdictions and to provide some funds to as many jurisdictions as possible. They may also wish to respond to changes in short-term need but not cause a disruption by suddenly and sharply cutting back funding to jurisdictions where needs have declined. There may also be a desire to provide incentives to localities to contribute more funding of their own. Budget constraints overlay all of these considerations, further complicating matters.2
In considering how to structure fund allocation formulas to satisfy various objectives, it is important to consider the properties of the estimates that will be used for the formula factors and how features of those estimates may interact with formula provisions (see Federal Committee on Statistical Methodology, 1978; National Research Council, 2000b). It should not be assumed that estimates, even if they meet requirements of timeliness, geographic specificity, population specificity, and concept of poverty or income desired, are entirely accurate or unbiased.
Indeed, income and poverty estimates, whether from the decennial census, a household survey, an administrative records file, or a model like those in the SAIPE Program that uses multiple data sources, are just what the term implies: they are estimates that are subject to error. Survey estimates (from the census long form and other surveys) are subject to variability from sampling error. Model-dependent estimates are subject to both model error and sampling error. Income and poverty estimates from all sources are subject to other kinds of error as well. For example, they may exhibit variability due to random reporting errors (e.g., random misreporting of income in a survey or administrative records file). Estimates may also exhibit systematic bias for many reasons: they may be out of date, represent a somewhat different concept from that desired (e.g., participants in a program may not be a good proxy for people in poverty), or not pertain to the specified population group (e.g., estimates for poor children aged 5-17 may not be a good proxy for poor children aged 15-19). Furthermore, the estimates may be biased because of problems in data collection–for example, because people who fail to answer a survey or fill out an administrative form differ systematically from those who respond or because the question wording on a survey consistently elicits underreporting of income.
The extent of error in estimates can almost never be known precisely. Error, too, must be estimated. It is usually straightforward to estimate the sampling variability in direct estimates from a survey, but an estimate of sampling error understates the total variability in the estimates and does not address the issue of systematic bias. The magnitude of nonsampling errors is much harder to estimate. Users should require from producers as much information as possible about the error properties of small-area income and poverty estimates and use that information to assess the implications of using alternative estimates for fund allocation. It is particularly important to conduct such assessments when a new allocation program is being developed, an existing formula is being modified, or consideration is being given to changing from one source of estimates to another. Users need to recognize that errors in the estimates may have unintended consequences when they are used with a particular formula specification.

2. The history of changes to the matching formula for the now-defunct Aid to Families with Dependent Children program illustrates some of the competing goals that legislators often confront (see Peterson and Rom, 1990; see also National Research Council, 1995a:Ch.8).
Ideally, users would consider both the benefits and costs of alternative sources of estimates for fund allocation, although it can be difficult to develop and implement an appropriate metric for doing so. It is not straightforward to estimate the costs of producing estimates or of improving their accuracy or other features (e.g., timeliness) that can affect accuracy. For example, it is not clear how much of the costs of collecting the survey and administrative data that are used in the SAIPE Program estimates should be assigned to those estimates. It is also not straightforward to estimate the benefits of improved estimates in terms of the effects on formula allocations. Nonetheless, users should consider the effects of error in estimates on allocations and the costs of alternative ways of reducing error, which may include replacing one set of estimates with another set, improving the accuracy of a given set of estimates, or changing a provision in the formula so that errors in estimates are less consequential for the resulting allocations.
As noted in Chapter 2, persistent bias in estimates is of particular concern because it means that, over time, some areas may consistently receive more or less funding than they would with unbiased estimates. Users may determine that, over time, one type of bias is less serious than another (e.g., that it is preferable to use more up-to-date estimates of poor school-age children as a proxy for poor children aged 15-19 instead of using decennial census estimates for the intended age group). That determination should be made, as much as possible, on the basis of careful analysis and consideration of alternatives.
Users should also recognize that some formula provisions may exacerbate the effects of bias in the estimates on fund allocations. For example, if there is a bias such that income is underreported and, hence, poverty is overestimated, and if the allocation formula includes a threshold for eligibility, then the use of biased estimates will likely lead to a larger number of jurisdictions receiving funds than would occur with unbiased estimates. The effect of allocating funds to jurisdictions that are truly not eligible is to reduce the amount of funding that is available for truly eligible districts. This outcome probably occurred for Title I concentration grants in instances when states used school lunch counts to suballocate county amounts to school districts, given that students approved for free or reduced-price school lunches include near-poor as well as poor children.3
It is likely that biases will be greater for some types of areas than others. For example, if poverty is consistently overestimated for urban areas and consistently underestimated for rural areas, then urban areas will likely receive a greater proportion of total funding than the formula intends. Moreover, if the formula has a threshold, some rural areas will receive no funding even though they are truly eligible, and, conversely, some urban areas will receive funds even though they are truly ineligible.
An alternative that policy makers could consider instead of thresholds, particularly when there is reason to suspect bias in the estimates, would be to make fund allocations a smooth function of the estimates. For example, the dollar amount allocated per poor child could increase with the proportion of poor children in the area. In this way, there would be no danger of a poor district receiving nothing, yet funds would still be targeted toward poorer areas.4
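The difference between the two approaches can be seen in a stylized comparison of a threshold rule with a smooth allocation function. The functional form and all parameter values in this Python sketch are hypothetical and do not correspond to any actual program:

```python
def threshold_allocation(n_poor, poverty_rate, cutoff=0.15, per_child=1000.0):
    """Sharp cutoff: a fixed amount per poor child, but nothing at all
    for areas whose poverty rate falls below the cutoff."""
    return n_poor * per_child if poverty_rate >= cutoff else 0.0

def smooth_allocation(n_poor, poverty_rate, base=500.0, slope=5000.0):
    """Smooth alternative: the per-child amount rises with the area's
    poverty rate, so poorer areas are still favored but no truly poor
    district risks receiving nothing."""
    return n_poor * (base + slope * poverty_rate)
```

Under the threshold rule, an area with a 14 percent poverty rate receives nothing; under the smooth rule it still receives funds, while areas with higher poverty rates receive a higher per-child amount.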
While a persistent and sizable bias is generally of most concern for the use of small-area income and poverty estimates in fund allocation formulas, variability in the estimates, due to sampling error and other sources, can have unintended effects on allocations as well. Panel members conducted simulations to illustrate the effects of variability in estimates on fund allocations under several different scenarios.
The analysis by panel members Alan Zaslavsky and Allen Schirm (reported in the Appendix) was originally prepared for a workshop on methodological issues for the planned American Community Survey (ACS) (see National Research Council, 2000b). Their work considered the effects of changing from using outdated decennial census estimates to using more current ACS estimates with higher sampling error. The analysis illustrates the problems that variability can cause, particularly when formulas include thresholds and hold-harmless provisions. It also suggests that alternative forms of estimates (such as moving averages) may reduce variability and be as effective as hold-harmless requirements in cushioning areas against sharp declines in funding. The analysis does not strictly apply to model-dependent estimates, such as those in SAIPE; nonetheless, the general conclusions are likely to hold.

3. The formula for Title I basic grants also includes a threshold, but it is very low (see Box 6-1).

4. However, high variability in the estimates due to sampling error could reduce the targeting of funds to poorer areas with either the use of a smooth function or a threshold (see Betson, 1999a; see also “Variability,” below). As an alternative approach for better targeting of funds to poorer areas, it might be possible to keep a threshold for eligibility and use a different type of estimator that reflects uncertainty in the estimates (see National Research Council, 2000b).
Panel member David Betson (1999a) conducted related analyses that further illustrate the unintended consequences that variability in estimates can have on fund allocations. Some of these analyses are summarized below (see “Illustrative Scenarios”).
Census Versus More Current Survey Estimates
Traditionally, the decennial census long-form survey has supplied many of the income and poverty estimates used in allocation formulas. Census estimates have the advantage of comparatively small sampling error for many areas, although even census estimates have high sampling error for very small areas, such as many school districts. Census estimates are also subject to other kinds of variability and to bias from several sources, including the bias that arises for annual allocations because the census measures poverty and income only at 10-year intervals, although both can change markedly over shorter periods.
The use of census data in funding formulas provides a fixed stream of allocations to an area over 10 or more years (assuming no changes in appropriation levels) with no recognition of changes in need during that period. Moreover, even though, in the long run, some areas may receive funding on the basis of census data that is equivalent to the funding they would receive on the basis of their true average income or poverty over the period, this result will almost certainly not occur for all areas. Because a census takes place only once every decade, there may be areas that receive more (less) than their fair long-run share over, say, a 30-year period because their poverty rate in the 3 census years is above (below) their average rate, either due to chance variability or a systematic upward (downward) bias in their census measurements. Also, the use of census estimates in formulas that include hold-harmless provisions at a fairly high level can favor areas that have a higher-than-typical poverty rate (or lower-than-typical median income) in a census year. It could take decades for the allocations for such an area to return to a level that is more appropriate to the area's typical income or poverty level.
Measuring income and poverty more frequently, as is planned for the ACS, can make it possible for allocations to respond more quickly to changes in need. However, household surveys have considerably higher sampling error than the census and are subject to other kinds of error as well (although for some areas the mean square error of the survey estimates may be smaller than that of the census estimates). If separate samples are drawn each year, as is planned for the ACS, then sampling error can be reduced by cumulating the data over more than 1 year to calculate moving averages, but this approach again makes funding less responsive to changes in need.
There may also be other problems with using averages of estimates over time. For example, changes in appropriation levels could mean that the average funding shares received by an area over a long period, calculated by using a weighted moving average of estimates with fixed weights for each year combined with a linear allocation formula (i.e., a formula with no thresholds or other nonlinear provisions), may be larger (or smaller) than the average funding shares that would obtain with annual estimates (see Appendix). When a formula has a substantial threshold (like that for Title I concentration grants), the use of moving averages may also lead to a different allocation than would obtain with annual estimates (e.g., an area that experiences an increase in need in a particular year may not cross the threshold with a moving average). However, the use of moving averages may be advantageous to the extent that localities value continuity of funding. Detailed analysis of these and other situations is needed to fully understand the implications for allocations of various formula provisions and sources of error in income and poverty estimates that are used in formulas.
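The variance-reducing effect of cumulating annual estimates can be illustrated with a small simulation. In this Python sketch, the window length, noise level, and random seed are arbitrary choices made for illustration:

```python
import random
import statistics

def moving_average(series, window=3):
    """Trailing moving average; early years use the shorter window available."""
    return [statistics.mean(series[max(0, t + 1 - window): t + 1])
            for t in range(len(series))]

random.seed(12345)
true_rate = 0.20          # constant true poverty rate for the area
# Noisy annual direct estimates of the rate over 30 years
annual = [random.gauss(true_rate, 0.03) for _ in range(30)]
smoothed = moving_average(annual)
```

For a stable true rate, the 3-year average fluctuates substantially less than the annual series (in theory by a factor of about the square root of 3); the price, as noted above, is that the average lags behind genuine changes in need.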
To look at more complex interactions of nonlinear funding formula provisions with variability in estimates of income and poverty, panel members developed several illustrative scenarios for which simulations were run, some of which we summarize here (see also the Appendix). These scenarios necessarily incorporate simplifying assumptions. Yet they call attention to how there can be unintended consequences for allocations due simply to sampling error in the estimates used for formula factors and to certain formula provisions.
The scenarios focused on two types of nonlinear formula provisions: thresholds and hold-harmless provisions. Thresholds are used in some allocation formulas to target areas most in need while meeting a budget constraint, and hold-harmless provisions are used in many allocation formulas to cushion the effects of a decrease in funds due to a decline in measured need (see Box 6-1). Summarized below are three kinds of scenarios: for a single area in a single year, assuming open-ended funding; for a single area for more than 1 year, assuming open-ended funding; and comparisons of open- and closed-ended funding.
Single Area, Single Year, Open-Ended Funding
One scenario (see Appendix:Table A-1) looked at the effects of different levels of sampling error in the direct estimate of a poverty rate on allocations for an area when the formula includes a threshold poverty rate below which the area receives zero funding. If an area's estimated rate exceeds the threshold, it receives funds directly in proportion to the estimated rate. Four different true poverty rates were used in the simulations, two above and two below the threshold rate. These simulations ignore the fact that the allocation for a single area typically depends, at least to some extent, on the allocations for other areas because the total funding amount for a program is usually fixed and not open-ended.
The results showed that the higher the sampling error, the greater the expected value of the funding that an ineligible area (i.e., one with a low true poverty rate) would receive, when with an exact measurement it should receive no funding at all. Conversely, the higher the sampling error, the smaller the expected value of the funding that an eligible area (i.e., one with a high true poverty rate) would receive compared with the amount it would receive with an exact measurement.5 These results occur because, as a result of sampling error, the estimate for an ineligible area will sometimes lead to its being classified as eligible, and the estimate for an eligible area will sometimes lead to its being classified as ineligible. The negative relationship between sampling error and the expected value of funding for an eligible area is not monotonic: at very high levels of sampling error, the expected value of funding for an eligible area increases again instead of continuing to decline, although it always remains below the allocation that would be received with an exact measurement.
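This mechanism can be reproduced with a small Monte Carlo sketch. The threshold, rates, and error levels in the following Python fragment are illustrative only and do not correspond to any actual program:

```python
import random

def expected_funding(true_rate, se, threshold=0.15, trials=100_000, seed=1):
    """Average allocation (per unit of population) when an area's funding
    is proportional to its estimated poverty rate if the estimate exceeds
    the threshold and zero otherwise; the estimate equals the true rate
    plus normally distributed sampling error with standard error se."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimate = rng.gauss(true_rate, se)
        if estimate > threshold:
            total += estimate
    return total / trials
```

With a true rate of 10 percent (below the 15 percent threshold), expected funding is essentially zero when the standard error is small but clearly positive when it is large; with a true rate of 20 percent (eligible), expected funding falls below the exact-measurement allocation as the standard error grows.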
The above results apply only in expectation. The expected value of funding that an area would receive is the average value over the set of values in the simulation. The particular allocation that an area will receive is subject to chance variability: it will be a single value, not the expected value. It is clearly desirable that the variability of the individual values around the average value not be large. For example, it is problematic for an eligible area to have a sizable chance of receiving either no funds or a very large amount.

5. The expected value of funding is the average amount from a large number of simulations for a given level of assumed sampling error. See Fuller (1995) for a mathematical demonstration that is related to this result.
The above results also apply only to a single year. Over time, eligible areas are likely to value continuity in their levels of funding. Here, too, they clearly benefit from lower levels of sampling error: for an eligible area whose poverty rate does not change over time, the year-to-year variability of funding increases directly with the level of sampling error (see Betson, 1999a).
Overall, under this first scenario, as sampling error increases, the sharp cutoff envisioned by the threshold in the formula (zero funds, some funds) is replaced with a relationship that is almost linear between an area's poverty rate (or other measure) and its expected funding. Because smaller areas will have higher sampling error than larger areas for most survey-based estimates, it is more likely that smaller areas, if they are truly ineligible, will incorrectly obtain some funding, or, if they are truly eligible, will obtain less funding than intended by the formula. The relationship of error to size of area is not so clear for model-dependent estimates, for which, in general, errors will tend to vary less across areas than they will for direct survey estimates.
Models may often provide a more cost-effective means of reducing variability than the alternative of paying to increase the sample size in a survey. However, an assessment would be needed of whether the total error (mean square error) is less for model or survey estimates.
Single Area, More than 1 Year, Open-Ended Funding
One scenario with a time dimension (see Appendix:Figure A-1) looked at the effects of different levels of sampling error on allocations over a 4-year period for a single area when the formula includes an 80 percent hold-harmless provision and there is no change in the poverty rate for the area. In this scenario an area receives funds in direct proportion to its estimated poverty rate without a threshold constraint. For an area with high sampling error, there is a considerably higher probability that the area will receive more funding in the second year than it would with an exact measurement and that the area will increasingly benefit from this windfall for years 3 and 4. By decreasing sampling error, the use of a 3-year moving average greatly reduces this effect. The results for another such scenario in which the formula includes both a threshold and a hold-harmless provision were even more pronounced than the results for each provision alone (see Appendix:Table A-2).
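The windfall mechanism in this scenario can be sketched as follows. The rates, error levels, and number of replications in this Python fragment are arbitrary choices for illustration, not the panel's actual simulation parameters:

```python
import random
import statistics

def grant_path(true_rate, se, years=4, floor=0.80, seed=0):
    """Funding path for one area whose grant is proportional to its
    estimated poverty rate, subject to an 80 percent hold-harmless
    floor. The true rate never changes, so any upward drift in funding
    comes from sampling error interacting with the floor."""
    rng = random.Random(seed)
    prior = true_rate               # year-0 grant from an exact measurement
    path = []
    for _ in range(years):
        estimate = max(0.0, rng.gauss(true_rate, se))
        prior = max(estimate, floor * prior)   # never below 80% of last year
        path.append(prior)
    return path

def mean_final_grant(se, reps=2000):
    """Average year-4 grant over many replications for a true rate of 0.20."""
    return statistics.mean(grant_path(0.20, se, seed=r)[-1] for r in range(reps))
```

Averaged over many replications, the final-year grant exceeds the exact-measurement allocation, and the excess grows with the level of sampling error, echoing the results described above.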
Yet other scenarios with a time dimension (see Appendix) looked at allocations for an area experiencing a downward trend in poverty rates and compared the effectiveness of hold-harmless provisions and moving-average estimates in dampening the magnitude of declines in funding from year to year. The results suggested that 3-year moving-average estimates could be as effective as a hold-harmless provision in moderating downswings in allocations.
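A stylized version of this comparison can be run directly. In the Python sketch below, the declining trend, noise level, and all parameters are invented for illustration:

```python
import random
import statistics

def max_drop(path):
    """Largest year-over-year proportional decline along a funding path."""
    return max((prev - cur) / prev for prev, cur in zip(path, path[1:]))

def one_replication(seed, se=0.03):
    """Noisy annual estimates for a true poverty rate falling from 0.30
    to 0.16 over eight years, and three funding paths based on them:
    raw annual estimates, a 3-year moving average, and annual estimates
    with an 80 percent hold-harmless floor."""
    rng = random.Random(seed)
    true_rates = [0.30 - 0.02 * t for t in range(8)]
    est = [max(0.01, rng.gauss(r, se)) for r in true_rates]
    ma3 = [statistics.mean(est[max(0, t - 2): t + 1]) for t in range(8)]
    hh, prior = [], est[0]
    for e in est:
        prior = max(e, 0.80 * prior)
        hh.append(prior)
    return max_drop(est), max_drop(ma3), max_drop(hh)

drops = [one_replication(s) for s in range(500)]
mean_raw, mean_ma3, mean_hh = (statistics.mean(d[i] for d in drops)
                               for i in range(3))
```

On average, both devices moderate the worst year-over-year decline relative to raw annual estimates; the hold-harmless floor caps any single-year drop at exactly 20 percent by construction, while the moving average smooths the whole path.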
Open-Ended and Closed-Ended Funding
Betson (1999a) ran simulations for scenarios with open-ended and closed-ended funding formulas and an 80 percent hold-harmless provision. The assumption for the open-ended formula was that additional funds would be appropriated to cover any increase needed because some areas received more funds with the hold-harmless provision than they would have otherwise. The results for closed-ended (i.e., fixed) funding showed that the operation of the hold-harmless provision would work against higher-poverty areas in comparison with lower-poverty areas. The disadvantage for the higher-poverty areas was greater with higher sampling error.

Betson's analyses showed that, in general, higher sampling error, together with a threshold, a hold-harmless provision (for a closed-ended program), or both, tended to equalize the funding amount per eligible person (poor child in the Title I program) across areas. This result runs counter to the goal of a program, such as Title I concentration grants, that is designed to provide extra funding (beyond the basic grant) to needier areas.
The analyses conducted by panel members of the interactions of sampling error in poverty estimates with such provisions of funding formulas as eligibility thresholds and hold-harmless provisions are just a first step in probing all of the issues involved in specifying formulas that can achieve their intended goals. Complicating the problem is that, as we note above, programs generally have multiple and often competing goals that can make it difficult to specify an effective formula even without the added effects of errors in the estimates used for allocations.
Some level of error in estimates is inevitable. While further analysis of the effects of error is needed, the panel's work strongly suggests that policy makers need to take account of expected levels of bias and variability in the estimates that are considered for formulas. Policy makers need to ask analysts to evaluate both alternative formulas and alternative estimates to determine the formula provisions and kinds of estimates that are best able to achieve such goals as targeting funds to needier areas and avoiding sudden, large decreases in funding for local budgets. For example, the panel's work suggests that moving-average estimates could serve the goal of cushioning budgets against funding decreases without misallocating funds as much as a hold-harmless provision does. These and other options deserve a full-scale research effort that can inform policy makers about the likely advantages and disadvantages of alternative funding formulas and of the sources and kinds of estimates to use in them.
The Committee on National Statistics is planning to conduct more work in this area. With the participation of our panel, it held a workshop in spring 2000 on issues in using estimates for fund allocation, and a more intensive study of the interactions of properties of estimates with features of funding formulas is planned to begin this year. We believe such an effort can usefully inform both users and producers of small-area estimates.