Data Sources for Estimating Formula Components
Components of allocation formulas fall into three broad categories: need, fiscal capacity, and effort. These components were defined and several examples given in Chapter 4; here we consider the sources of data used to estimate these components. Specifically, we discuss how data sources for each formula component are determined, which sources are most commonly used, and what considerations are relevant in choosing from among alternative data sources.
WHO DETERMINES WHAT DATA SOURCES ARE TO BE USED?
For some programs, the original authorizing legislation or various amendments to it are very specific about the form of the allocation formula and data sources for each formula element. The current legislative requirement for the State Children’s Health Insurance Program (SCHIP), for example, specifies that the estimate of the number of uninsured children for each state is to be based on the three most recent March supplements to the Census Bureau’s Current Population Survey (CPS) before the beginning of the calendar year in which the fiscal year begins. For other programs, Congress delegates authority for such decisions to the program agency or the secretary of the department in which the agency resides. For example, a provision of the Individuals with Disabilities Education Act, the authorizing legislation for the special education program, states that: “For the purpose of making grants under this paragraph, the Secretary [of Education] shall use the most recent population data, including data on children living in poverty, that are available and satisfactory to the Secretary” (20 U.S.C., Ch. 33, Subchapter II, Section 1411(a)).
Even when a data source is clearly specified in legislation (such as the CPS in the SCHIP example), the actual estimates from that source will be affected by survey design and procedures. These factors are largely determined by the program agency and the agency responsible for collecting the data, subject to budgetary constraints. For example, to reduce the high sampling variability of CPS state-specific estimates of uninsured low-income children, starting with FY 2000, Congress provided an annual appropriation of $10 million to increase the sizes of the relevant samples.
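The two devices mentioned above, averaging three annual survey estimates and enlarging the sample, can be illustrated with a minimal sketch. The rates and sample sizes are hypothetical, and the standard error formula assumes simple random sampling, which is much simpler than the actual CPS design.

```python
import math

def three_year_average(estimates):
    """Average of three annual estimates (hypothetical inputs)."""
    if len(estimates) != 3:
        raise ValueError("expected exactly three annual estimates")
    return sum(estimates) / 3

def srs_standard_error(p, n):
    """Standard error of an estimated proportion p from a simple
    random sample of size n (an assumption; the CPS design is more complex)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical state: uninsured rates from three successive March supplements.
rates = [0.12, 0.15, 0.09]
avg_rate = three_year_average(rates)

# Doubling a hypothetical sample of 1,000 children reduces the
# standard error by a factor of 1/sqrt(2), roughly 29 percent.
se_small = srs_standard_error(avg_rate, 1000)
se_large = srs_standard_error(avg_rate, 2000)
```

Under these assumptions, a larger sample buys precision only at the rate of the square root of the sample size, which is one reason sample expansions are costly.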
For the Title I education program, Congress has given the U.S. Department of Education considerable flexibility in deciding on data sources. Prior to the mid-1990s, estimates of the number of poor school-age children by state were required by law to be taken from the most recent decennial census.1 By the end of each decade, these estimates could be seriously deficient as a basis for estimating the current distribution of need. The Improving America’s Schools Act, passed in 1994, called for the use of updated Census Bureau estimates of poor school-age children to allocate Title I funds, provided the estimates were found to be sufficiently reliable by a panel of the National Research Council (NRC). In response to the 1994 act, the Census Bureau established a small-area income and poverty estimates (SAIPE) program to develop estimates by state, county, and ultimately by school district, using a model-based approach that combined data from the decennial census, the CPS, and administrative records. With review and eventual advice from the NRC Panel on Estimates of Poverty for Small Geographic Areas, these estimates were adopted for use in allocating Title I funds (National Research Council, 2000).
As noted in Chapter 1, for the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), Congress defined the program objectives in legislation and left it to the agency to develop the allocation formula. Thus, the Food and Nutrition Service of the U.S. Department of Agriculture was responsible for determining what data sources and procedures should be used to estimate formula components.
Since the allocation formula was first used in 1979, the program agency has made several improvements in the estimates (National Research Council, 2001:32-34).
A wide variety of data sources are used to estimate formula components for the more than 180 federal formula allocation programs. These sources are grouped below into four major categories.
Decennial Census and Current Population Estimates
Every 10 years, the decennial census provides counts of population by age, sex, race, and Hispanic origin for states, counties, cities, and other political subdivisions such as school districts. For censuses through 2000, a long-form sample has provided additional data, at the same level of geographic detail, for persons, families, and housing units. These decennial census sample data, in particular information on income, have been widely used as inputs to allocation formulas. The Census Bureau has recently announced plans to eliminate the long-form sample from the 2010 decennial census. A large-scale continuing household survey, the American Community Survey (ACS), is intended to replace it, producing continuously updated data of similar content. When cumulated over 5-year periods, the ACS data will provide estimates of roughly the same precision as the decennial long form.
For several decades the Census Bureau has provided intercensal estimates of population, using data on births, deaths, immigration, and internal migration to “walk” the estimates from the most recent census to the current date. Expansion of this current estimates and projections program has been driven to a substantial degree by the requirements of formula allocation programs. From 1972 through 1986, estimates of population and per capita income for approximately 39,000 units of local government were required for allocations under the general revenue sharing program. Current population estimates by state serve as the denominator for the estimates of per capita income used in the formula for the federal medical assistance percentage (FMAP), used in Medicaid and several other formula allocation programs to determine federal matching rates by state. More recently, the SAIPE program was established to produce current estimates of school-age children in poverty by county and school district for use in the Title I education allocations.
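The idea of walking a census count forward with component data can be sketched as a simple accounting identity. This is only an illustration with hypothetical figures and function names; the Census Bureau's actual procedures are considerably more detailed, and emigration is ignored here for simplicity.

```python
def walk_one_year(population, births, deaths, immigration, net_internal_migration):
    """Advance a population estimate by one year using components of change."""
    return population + births - deaths + immigration + net_internal_migration

def intercensal_estimate(census_count, annual_components):
    """Apply successive years of component data to the latest census count."""
    population = census_count
    for components in annual_components:
        population = walk_one_year(population, **components)
    return population

# Hypothetical state: census count of 1,000,000 and two years of component data.
components_by_year = [
    {"births": 14_000, "deaths": 9_000, "immigration": 2_000,
     "net_internal_migration": -1_500},
    {"births": 14_200, "deaths": 9_100, "immigration": 2_100,
     "net_internal_migration": -1_000},
]
current_population = intercensal_estimate(1_000_000, components_by_year)
```

Because each year's estimate builds on the previous one, errors in the component data accumulate as the estimate moves further from the census base.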
A study by the U.S. General Accounting Office (1990) determined that nearly two-thirds of federal formula funds were distributed either wholly or in part using population data from the decennial census. Due to their lack of timeliness and the availability of more current estimates from the Census Bureau, decennial census data are now used less as formula inputs.2 The community development block grants program provides an exception; several of the elements in the two alternative formulas use data from the most recent decennial census. Also, Census data are required for metropolitan cities and urban counties for such characteristics as population in poverty, the number of housing units with more than 1.01 persons per room, and the number of housing units built before 1940. Such data have not been readily available from any current household surveys; however, a fully operational ACS should provide more current data for some of these variables.
Household Surveys
The CPS provides monthly estimates of employment and unemployment for various population subgroups. Its annual March supplement provides data on individual and family income for the preceding calendar year. CPS income data are an important input to the SAIPE model-based estimates of school-age children in poverty that are used in the Title I education allocations. Data from the March supplement are also used in model-based estimates of infants and children eligible for the WIC program by state, as well as in state estimates of uninsured and total low-income children for SCHIP.
The National Household Survey on Drug Abuse, which was recently expanded to provide direct estimates for eight states and synthetic estimates for the remaining states, has the potential to provide better estimates of need than those currently used in the allocation formula for the substance abuse block grant program. However, like most household surveys, it does not cover institutionalized populations.
Other Statistical Programs
As part of its system of national income and product accounts, the Bureau of Economic Analysis publishes annual estimates of per capita personal income for the nation, for states, and for selected metropolitan areas and counties. These estimates are based on a combination of survey and administrative data. The state estimates, averaged over the latest three-year period, are used as a proxy measure of relative fiscal capacity in the FMAP formula. Thus, this data source affects distributions in programs that account for more than half of the total federal funds allocated each year. The Treasury Department’s series on total taxable resources, which is based on data from the Bureau of Economic Analysis and the Internal Revenue Service’s Statistics of Income Division, is used as a measure of state fiscal capacity in the substance abuse and mental health block grant programs.
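The role of per capita income in the FMAP can be made concrete with a short sketch. The 0.45 multiplier and the 50 and 83 percent bounds reflect the statutory Medicaid formula as it is commonly stated; they do not come from this chapter and should be treated as assumptions here, as should the income figures, which are hypothetical.

```python
def three_year_mean(values):
    """Mean of per capita income over the latest three-year period."""
    if len(values) != 3:
        raise ValueError("expected three annual values")
    return sum(values) / 3

def fmap(state_pci, us_pci, floor=0.50, ceiling=0.83):
    """Federal matching rate: 1 - 0.45 * (state PCI / U.S. PCI)^2,
    bounded below and above (multiplier and bounds are assumptions)."""
    raw = 1.0 - 0.45 * (state_pci / us_pci) ** 2
    return min(max(raw, floor), ceiling)

# Hypothetical three-year per capita income series.
us_pci = three_year_mean([29_000, 30_000, 31_000])
low_income_state = three_year_mean([20_000, 21_000, 22_000])
high_income_state = three_year_mean([35_000, 36_000, 37_000])

rate_average = fmap(us_pci, us_pci)           # state at the national average
rate_low = fmap(low_income_state, us_pci)     # lower income, higher match
rate_high = fmap(high_income_state, us_pci)   # floor applies
```

Because the income ratio is squared, errors in the per capita income estimates, including errors in the population denominator, are magnified in the resulting matching rate unless a bound applies.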
Wage and price statistics from several sources are used in some programs to account for geographic differences in the cost of program services. The cost of services index in the formulas for the substance abuse and mental health block grant programs uses data on manufacturing wages from the Bureau of Labor Statistics’ Current Employment Statistics Survey and data on fair-market rents from the U.S. Department of Housing and Urban Development. The food grant portion of the school lunch program uses data from the food away from home component of the Bureau of Labor Statistics’ consumer price index for annual updates of national average prices for free, reduced price, and paid lunches. The cost factor in the allocation formula for SCHIP is based on Bureau of Labor Statistics data on mean annual wages in the health services industry.
Allocations in the Environmental Protection Agency’s (EPA) Drinking Water State Revolving Fund are based on information about infrastructure needs of public water systems identified in periodic drinking water needs surveys. Prior to fiscal year 1988, allocations in EPA’s state capitalization grants program were based in part on specific needs for waste water treatment and water pollution control, as identified in the periodic clean water needs surveys.
Administrative and Program Records
Administrative and program records play a major role in the determination of amounts allocated to states and other recipients in many programs. For open-ended matching grant programs such as Medicaid and foster care, formula-based matching proportions are applied to state records of eligible program expenditures to determine the amounts to which the states are entitled. For the school lunch program’s food grants, allocations to states are based on their records of the number of paid, reduced price, and free lunches served. In the initial years of the special education program, allocations to states depended on the number of children participating in their programs. Data on expenditures and enrollment in elementary and secondary public schools, collected from the states by the National Center for Education Statistics, are used to calculate the state per pupil expenditure component of the Title I education allocation formula. State-provided data on vehicle miles traveled on the interstate system, lane miles and vehicle miles traveled on principal arterial routes (excluding the interstate system), and diesel fuel used on highways are inputs to allocation formulas used for subprograms of the federal aid highway program.
Model-based estimates of need, which combine data from several different sources, have made substantial use of administrative data. Data on tax returns from the Internal Revenue Service and on participation in the food stamp program from the U.S. Department of Agriculture have been used to estimate need components in allocation formulas for the Title I education and WIC programs. WIC has also made use of data on unemployment insurance claims.
CONSIDERATIONS IN THE SELECTION OF DATA SOURCES
As previously indicated, data sources to be used in estimating formula inputs are sometimes specified in the authorizing legislation; sometimes the choice is left to the program agency. In all situations, the factors to consider in deciding what data sources to use relate to data quality and to evaluation of the costs and benefits associated with the use of alternative data sources.
Formula allocation programs provide for annual allocations for a specified or indefinite number of years. Choices of data sources for the estimation of formula inputs can influence both the initial allocations and the way allocations change from year to year. These choices, in turn, may be influenced by how program designers evaluate trade-offs between relative stability in annual funding and responsiveness to changes in the distribution of true need among recipients.
Data Quality Considerations
Data from a census, a survey, or other statistical or administrative record source have several attributes that may be relevant to their suitability for use in formula allocations:
The conceptual fit between currently available data and the formula elements, as defined in authorizing legislation or administrative regulations. If the definitions of the elements or program goals lack specificity, evaluation of fit may require subjective judgments. Even if the primary goal of program designers is to arrive at a predetermined allocation, choice of data sources that provide a good conceptual fit may improve the initial and ongoing credibility of the allocation process.
The level of geographic detail at which data are provided. Most programs allocate appropriated funds to the state level; a few allocate funds to smaller areas, such as metropolitan areas, counties, or school districts. The decennial census can provide estimates for areas as small as school districts (although with substantial sampling variability for the smaller districts), whereas the Survey of Income and Program Participation (believed to provide more precise estimates of family income) can provide reasonably stable direct estimates for only a few large states.
The timeliness of the data, that is, the elapsed time between the reference period for the estimates and the period for which the allocations are being made. Late in a decade, decennial census data are at an obvious disadvantage compared with continuing or periodic sample surveys and administrative record sources.
The levels of sampling variability and bias associated with the data. It is important that these factors be evaluated in terms of their expected effects both on initial distributions and on year-to-year changes in allocations.
The susceptibility of the data to manipulation by program recipients. Of necessity, such data as state program expenditures in matching grant programs must be generated by recipients of the grant funds. In such circumstances, to ensure accurate reporting, program agencies issue regulations that define standard concepts and definitions for use in reporting and develop quality control and audit procedures.
There are many trade-offs among these quality considerations, and it is unlikely that any single data source will be uniformly superior to others.
Trade-offs can be illustrated by comparing alternative sources of income data. At the national level, the most comprehensive, individual-level data on income by source come from the Survey of Income and Program Participation, but it has the smallest sample size. CPS data are somewhat less detailed but are based on a larger sample. Individual income tax data are not subject to sampling error but cover only about 90 percent of the total population and are based on income concepts that differ from those used in most fund allocation programs. However, their utility would be improved if they were geocoded to the county and school district levels.3 Decennial census data cover a larger proportion of the population and provide more geographic detail, but they lack timeliness and are subject to greater underreporting of some types of income. Model-based estimates that combine information from these data sources have the potential to take advantage of the strengths of each source. The SAIPE estimates developed by the Census Bureau for use in the Title I education formula program provide an excellent example of the modeling approach.
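One generic way to combine sources with different strengths is to weight each estimate by its precision. The sketch below is not the SAIPE methodology; it is a minimal illustration that assumes two independent, unbiased estimates with known variances, and all figures are hypothetical.

```python
def composite_estimate(direct, direct_var, model, model_var):
    """Precision-weighted combination of a direct survey estimate and a
    model-based prediction (assumes independent, unbiased inputs)."""
    weight_direct = model_var / (direct_var + model_var)
    estimate = weight_direct * direct + (1 - weight_direct) * model
    variance = direct_var * model_var / (direct_var + model_var)
    return estimate, variance, weight_direct

# Hypothetical county poverty rate: a noisy direct survey estimate
# (standard error 0.04) and a steadier prediction from administrative
# records (standard error 0.02).
estimate, variance, weight_direct = composite_estimate(
    direct=0.20, direct_var=0.04 ** 2,
    model=0.16, model_var=0.02 ** 2,
)
```

In this example the noisier direct estimate receives a weight of 0.2, and the combined variance is smaller than that of either input. For an area with a large survey sample, the weight would shift toward the direct estimate.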
Obtaining data to be used as inputs to allocation formulas is by no means cost free. Even when data sources created for other purposes are used, there may be significant costs of obtaining data in a suitable format, mapping variable definitions into those needed, and evaluating the performance of the inputs. Hence, the potential benefits conferred by improving conceptual fit or other aspects of data quality have to be weighed against the cost of such improvements. Spencer (1985) discusses some ways that a cost-benefit analysis for a statistical data program can take into account the trade-offs between nonoptimality of an allocation formula and the cost of improving it.
If a formula is designed to produce a predetermined allocation, one may question the need for high-quality statistical data to serve as inputs to the formula. However, if a formula is designed to meet specified program goals, it is possible that social welfare is increased when high-quality data are used. For instance, high-quality data may help maintain support for a program if they convey the sense that the allocations are fair and responsive. But these benefits must be assessed, especially when a closed-ended formula is used to distribute a fixed total, so that underallocations to some areas are matched algebraically by overallocations to other areas. Relevant assessments include simulations to determine how the use of higher quality data is likely to change the allocations. Determining the likely program effects of such changes is much more difficult. The sometimes tenuous links between improved inputs to formulas and attainment of goals for increases in social welfare have implications for how much to invest in developing new or improved data for allocation purposes. If the link is strong, then the benefit from spending more money to improve the statistics will be greater than if the link is weak.
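A simulation of the kind described can be sketched for a closed-ended program that distributes a fixed total in proportion to estimated need. The proportional formula, area names, and figures below are hypothetical and chosen only to show that allocation errors cancel across areas.

```python
def allocate(total, need):
    """Distribute a fixed total in proportion to each area's need."""
    total_need = sum(need.values())
    return {area: total * n / total_need for area, n in need.items()}

budget = 1_000_000

# Hypothetical true need and estimates in which only area A is overstated.
true_need = {"A": 100, "B": 200, "C": 700}
estimated_need = {"A": 150, "B": 200, "C": 700}

ideal = allocate(budget, true_need)
actual = allocate(budget, estimated_need)

# Positive values are overallocations, negative values underallocations.
errors = {area: actual[area] - ideal[area] for area in ideal}
```

Area A gains at the expense of both B and C, even though the estimates for B and C are exactly right, and the errors sum to zero because the total is fixed. Rerunning such a comparison with alternative data sources shows how much the allocations would shift, although it does not by itself show the effect on program outcomes.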