
Reengineering the Census Bureau's Annual Economic Surveys (2018)


Suggested Citation:"6 Editing, Imputation, Disclosure Control, and Quality Standards." National Academies of Sciences, Engineering, and Medicine. 2018. Reengineering the Census Bureau's Annual Economic Surveys. Washington, DC: The National Academies Press. doi: 10.17226/25098.

6

Editing, Imputation, Disclosure Control, and Quality Standards

This chapter addresses the final stages of processing for the annual economic surveys before usable estimates are produced: (1) editing the raw data for logical errors and responses that do not appear to make sense; (2) imputing values for missing responses; (3) reviewing estimates to minimize the risk of disclosing confidential information and, as needed, taking appropriate corrective action; and (4) assessing whether estimates meet established standards for reliability and, as needed, flagging or suppressing problematic estimates. Recommendations throughout the chapter are designed to improve quality. While the panel’s recommendations are focused on the existing surveys, the overarching goal is for methods in all four areas to be harmonized across the surveys as an important step toward an integrated Annual Business Survey System (ABSS).

Many of the annual economic surveys use updated administrative data from the U.S. Census Bureau’s Business Register for imputation and nonresponse adjustment. These data are integrated from the Internal Revenue Service (IRS) and other sources into the register on a yearly basis (see Chapter 3). Administrative data on receipts, payroll, and inventory values, as well as expense values for select industries, are used extensively in editing and imputation for many of the annual economic surveys as described below.

6.1 CURRENT EDITING PRACTICES

With the exception of a few surveys that collect classification or structural information, all of the annual economic surveys use both micro and macro edits. Micro edits are performed on the individual records for the reporting unit (establishment or company), which are evaluated for internal consistency, inappropriate responses, and missing values. Macro edits are performed by examining aggregate estimates for all variables in a survey for reasonableness, followed by searching for influential records that may distort aggregate estimates.

The annual economic surveys use a variety of micro edits. There are rules-based edits that identify logical errors, such as occur when individual items do not sum to the reported total; when there are inconsistencies among the item responses; when a respondent selects more than one check box when only one should be checked; and when items are inappropriately skipped (missing). All of the annual economic surveys use some set of rules-based edits, usually to flag errors for an analyst’s review. Edits that use the relationship between variables, such as a ratio, are used extensively, generally citing the Hidiroglou and Berthelot (1986) approach to identifying outliers. In addition to the items that are flagged by these rules-based edits, in most of the surveys, the responses for the largest 20 companies or the companies with the largest percentage changes between the prior and current year also are reviewed. Analysts review any outliers that are identified, and, in some instances, outliers are excluded from the imputation base during estimation. Procedures for identifying and resolving outliers are specific to each survey or group of surveys.
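The Hidiroglou and Berthelot (1986) approach mentioned above can be illustrated with a short sketch. The parameter values (u, a, c) below are common illustrative defaults from the literature, not the settings of any particular Census Bureau survey, and the sketch assumes strictly positive reported values:

```python
import numpy as np

def hb_outliers(current, prior, u=0.5, a=0.05, c=4.0):
    """Flag units whose period-to-period ratio is anomalous, following
    the Hidiroglou-Berthelot transformation (illustrative parameters;
    assumes all values are strictly positive)."""
    current = np.asarray(current, dtype=float)
    prior = np.asarray(prior, dtype=float)
    r = current / prior                       # period-to-period ratios
    med_r = np.median(r)
    # Symmetric centering: ratios above and below the median are
    # treated comparably.
    s = np.where(r >= med_r, r / med_r - 1.0, 1.0 - med_r / r)
    # Size-adjusted "effects": larger units get more weight via u.
    effect = s * np.maximum(current, prior) ** u
    q1, med_e, q3 = np.percentile(effect, [25, 50, 75])
    d_lo = max(med_e - q1, abs(a * med_e))    # guard against tiny spreads
    d_hi = max(q3 - med_e, abs(a * med_e))
    lower, upper = med_e - c * d_lo, med_e + c * d_hi
    return (effect < lower) | (effect > upper)
```

For example, a unit whose value jumps tenfold while comparable units move only a few percent would fall outside the acceptance interval and be flagged for analyst review.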

Magnitude comparisons between reported and comparable administrative data fields are also part of the editing process for a number of the surveys. Large differences may trigger automatic substitution of administrative data for reported data. For example, the Annual Survey of Manufactures (ASM) does this for single-unit payroll.

Formal macro-level edits have been used in the past in the Annual Capital Expenditures Survey (ACES; see Thompson, 2007), but the Economic Statistical Methods Division at the Census Bureau was not able to verify for the panel whether any surveys are currently employing formal macro edits. Ad hoc macro-level editing generally is performed after the micro-level review phase. Distributions of tabulated cell estimates are scrutinized, for the current collection period and in contrast to corresponding prior period estimates. Further checks can lead to the identification of influential records and records with large differences in values from prior years that appear to drive the aggregate discrepancies. Macro edits frequently lead to additional review by an analyst, and written explanations may be added to Census Bureau publications that document discrepant estimates from prior years.

The manual editing task is enormous. Many of the surveys have more than 100 rules-based edits, most requiring analysts’ reviews. The number of full-time equivalent (FTE) staff dedicated to editing, imputation, and review for five of the annual economic surveys—ASM, Annual Retail Trade Survey (ARTS), Annual Wholesale Trade Survey (AWTS), Service Annual Survey (SAS), and ACES—makes up about 40 percent of the total FTEs for these surveys (see Table 6-1). The Census Bureau’s high cost of editing is consistent with the historical experience of statistical agencies in other countries (see U.N. Statistical Commission and Economic Commission for Europe, 1997). Freeing staff from editing duties might provide time for them to work on other important activities, including implementation of the recommendations in this report and, in general, make more efficient use of the Bureau’s constrained resources.

Extensive manual editing also may raise the risk of introducing errors. Business responses may be generated by different people according to their expertise in the company (Smith, 2013, p. 493). If follow-up calls are not routed to the original respondents, they may result in different and less accurate information. For large business enterprises, the risk of introducing editing errors can be reduced by assigning account managers as the panel recommends (see Recommendation 3-3, in Chapter 3).

Additionally, manual editing of all sample cases is inefficient in that some edits may have limited effects on the survey estimates. Allocating effort to areas that have the biggest effects could improve efficiency. For example, concentrating on cases that have large weights or contribute substantially to important subgroup estimates may be more efficient than 100-percent editing of all sample cases. Errors in quantitative variables that affect estimates will be more important to correct than ones in qualitative fields. The Census Bureau is well aware of the expense of manual editing and has already studied how to make the process more efficient in ACES (Nguyen et al., 2017). Similar studies would likely have major payoffs for the other annual economic surveys, with the added benefit that the freed-up personnel could be assigned to work on the development of an ABSS.
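The idea of concentrating effort on cases with the biggest effects is often formalized as "selective editing," in which each record receives a score measuring its potential impact on a key estimate and only the highest-scoring records are reviewed manually. A minimal sketch follows; the score formula and the anticipated values are illustrative, not the approach studied in Nguyen et al. (2017):

```python
import numpy as np

def selective_editing_scores(reported, anticipated, weights):
    """Score each record by its potential impact on the weighted total:
    the weighted absolute difference between the reported value and an
    anticipated value (e.g., prior-year or administrative data),
    relative to the anticipated weighted total (illustrative formula)."""
    reported = np.asarray(reported, dtype=float)
    anticipated = np.asarray(anticipated, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = np.sum(weights * anticipated)          # reference total
    return weights * np.abs(reported - anticipated) / total

# Review only the highest-impact records first (toy values):
scores = selective_editing_scores([100, 5000, 120], [110, 800, 118], [10, 2, 10])
to_review = np.argsort(scores)[::-1]               # highest impact first
```

In this toy example the second record dominates the review queue: even with a small weight, its large departure from the anticipated value would move the estimated total far more than the other two records combined.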

RECOMMENDATION 6-1: The Census Bureau should review the edit processes in the annual economic surveys with a goal of focusing the edits on important variables for estimates, automating the remaining edits as much as possible, and reducing the personnel time spent on edits while maintaining the quality of the survey estimates. The Bureau should thoroughly test streamlined edit processes with the goal of developing a standard set of edits that will work for most of the current annual economic surveys and for the panel’s recommended Annual Business Survey System.

6.2 IMPUTATION FOR MISSING DATA

The annual economic surveys use a variety of methods for imputing for missing respondents and data items; Table 6-2 summarizes the general methods used for imputation in the surveys.


TABLE 6-1 Estimated Fiscal 2015 FTEs and Costs for Five of the Annual Economic Surveys

FTEs by Activity                        ASM       ARTS      AWTS      SAS       ACES      Total FTEs
                                                                                          Number    Percent
Program Management                      5.4       1.6       1.5       5.6       2.4       16.4      10.4
Sample Design                           1.6       3.7       3.2       4.0       1.5       13.9       8.8
Data Collection                         10.9      5.8       4.4       10.9      6.2       38.2      24.1
Data Editing, Imputation, Review(a)     19.1      9.0       8.0       15.4      12.5      63.9      40.4
Dissemination                           5.8       1.1       0.9       4.9       2.3       15.0       9.5
Information Technology (IT) Support     3.5       1.1       1.1       2.5       2.6       10.8       6.8
Total FTEs                              46.2      22.2      19.2      43.2      27.5      158.3     100.0
Total Fiscal 2015 Estimated Cost
  (thousands of dollars)                $7,439.9  $4,355.6  $2,072.3  $10,359.7 $5,212.1  $29,439.5

NOTES: See text for full names of the surveys. FTE, full-time equivalent staff person. Staff numbers do not include Jeffersonville, Indiana, processing center or IT resources other than those that directly support these surveys.

(a) Numbers in this row are italicized to underscore the high numbers and percentages devoted to these functions; see text for discussion.

SOURCE: Information provided to the panel by the Census Bureau.


TABLE 6-2 Methods of Imputation and Use of Administrative Data in the Annual Economic Surveys

Sector Survey Method of Imputation for Missing Items Use of Administrative Data
Manufacturing Annual Survey of Manufactures (ASM) Missing items imputed in various ways:
  • substitution of administrative data for payroll when not reported;
  • imputation of missing receipts based on payroll done sequentially: payroll, then receipts, then materials;
  • derivation of totals from reported details;
  • imputation of totals (and detailed items, if any) using regression models with current year data or regression models using both current year and prior year data;
  • raking to derive detailed items that sum to the total for a given complex;
  • use of historically based ratios to impute detailed items of a specific complex; and
  • rounding checks on specific items when reported data fail specific limit checks.
See item imputation.
Manufacturers’ Unfilled Orders Survey (M3UFO) No item imputation; editing only Receipts from BR are used to check reported receipts. In some instances, receipts total is used to impute for annual value of shipments for missing/incorrectly reported values. Analysts use annual reports (10-Ks) when available, which are treated as other sources of reported data.
Management and Organizational Practices Survey (MOPS) No item imputation; editing only ASM data are used to supplement collected data: tabulation status, employment, and NAICS codes are added from ASM. Establishment age is added from the Longitudinal Data Base.
Trade Annual Retail Trade Survey (ARTS) Missing items imputed in various ways:
  • simple imputation—single item (free form) or total with detail (balance complex);
  • general imputation—using subset of administrative data for current year;
  • substitution of sum of subannual (monthly, quarterly) data for the current year;
  • multiplication of unit’s prior period value by a ratio of current-to-prior period data, where the data item used in the ratio is assumed to be highly correlated with the data item of interest (e.g., inventory to sales);
  • imputation of annual revenue from a regression model;
  • multiplication of a unit’s prior period value by a “ratio of identicals”; and
  • raking of detailed items (reported or imputed) to their total.
Administrative data updates from BR are used for:
  • current year administrative receipts;
  • prior year administrative receipts;
  • current survey year Q1, Q2, Q3, and Q4 payroll;
  • prior survey year Q1, Q2, Q3, and Q4 payroll;
  • current year administrative expenses;
  • prior year administrative expenses;
  • current survey year beginning inventory;
  • current survey year ending inventory;
  • prior survey year beginning inventory; and
  • prior survey year ending inventory.
BR data are used to compare reported data for the company to administrative receipts. Administrative data sometimes used in place of items that are missing/imputed for the company.
Analysts use annual reports (10-Ks) when available for some items (e.g., sales, inventory) for nonresponding companies.
Monthly retail data are used for some data items (sales, e-commerce, or inventories) for nonresponding companies.
Annual Wholesale Trade Survey (AWTS) Same as ARTS Same as ARTS
Services Service Annual Survey (SAS) Same as ARTS Administrative data updates from BR are used to validate reported data for the company (e.g., payroll, revenue, expenses). Administrative data are sometimes used to impute data for missing items.
Annual reports (10-Ks) are used for some items.
Data from the Quarterly Services Survey (QSS) are used for some items.
Multiple Annual Capital Expenditures Survey (ACES) Items imputed in two ways: balance complex (values adjusted so that all or a subset of the items add to a corresponding total); and free form (value adjusted for a single item). None
Information and Communication Technology Survey (ICTS) Same as ACES None
Demographic Annual Survey of Entrepreneurs (ASE) Hot-deck item imputation of business and owner characteristics: gender/ethnicity/race/veteran status of the owners. In non-census years, payroll and employment data are from County Business Patterns (CBP) and BR for cases not available in CBP. Because receipts are not available in CBP and deemed unreliable in the BR, they are modeled for ASE published tables. The model multiplies the payroll obtained from CBP or BR and a receipts-to-payroll ratio calculated using data from prior economic censuses.
Frame Business and Professional Classification Survey (SQ-CLASS) No item imputation; editing only NAICS code initially from an administrative source. No estimates are produced if sales are not reported; administrative payroll is used to generate a measure of size.
Company Organization Survey (COS) Item imputation during closeout for all employment and payroll items in a nonresponding unit None
Business Register Missing items imputed in various ways:
  • substitution of administrative data for current year;
  • substitution of the sum of subannual (monthly, quarterly) data for current year;
  • multiplication of unit’s prior period value by a ratio of current-to-prior period data, where the data item used in the ratio is assumed to be highly correlated with the data item of interest (e.g., inventory-to-sales);
  • imputation of annual revenue from a regression model;
  • multiplication of a unit’s prior period value by a “ratio of identicals”; and
  • raking of detailed items (reported or imputed) to their total.
Not applicable

NOTES: BR, Business Register; NAICS, North American Industry Classification System.

SOURCE: Information provided to the panel by the Census Bureau.

In general, the surveys use one or more of the following approaches:

  • fill in with administrative data;
  • impute using a regression model (simple ratio or more elaborate model);
  • derive from other current period reported items;
  • derive from prior period items; and
  • derive totals of missing quantitative variables by summing values for detailed categories.

An inventory of editing and imputation methods conducted in 2008 by the Economic Directorate found that 21 surveys use administrative data as a direct substitute for missing survey data (Ozcoskun and Hayes, 2009). For example, ASM uses administrative data to impute for missing payroll values and derives totals by summing the values of their component categories. Some surveys, including ARTS, AWTS, and SAS, impute missing values for some items by multiplying a prior period’s value by a ratio of current-to-prior period data for a closely related item; for example, current period inventories may be imputed by multiplying last period’s inventories by the ratio of current period sales to prior period sales.
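The ratio method just described is simple enough to sketch directly. The function and variable names below are illustrative:

```python
def ratio_impute(prior_value, aux_current, aux_prior):
    """Impute a missing current-period item by carrying the unit's prior
    value forward with the growth of a closely related auxiliary item,
    e.g., impute current inventories as prior inventories times the
    ratio of current sales to prior sales (sketch of the general method;
    assumes the auxiliary prior value is nonzero)."""
    return prior_value * (aux_current / aux_prior)

# Example: inventories missing this year; related sales grew from 200 to 220,
# so prior inventories of 50 are carried forward by the same 10 percent.
imputed_inventory = ratio_impute(prior_value=50.0, aux_current=220.0, aux_prior=200.0)
```

The implicit model is that the item of interest moves proportionally with the auxiliary item, which is why the text stresses that the two must be highly correlated.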

6.2.1 Use of Administrative Data for Imputation

Many, but not all, of the annual economic surveys make use of administrative data to impute values for missing items (see Section 4.1, “Data Sources,” and Table 4-1 in Chapter 4). For example, analysts who work on the Manufacturers’ Unfilled Orders Survey (M3UFO) use annual 10-K filings, which are required by the U.S. Securities and Exchange Commission and give a comprehensive summary of a publicly listed company’s financial performance, to impute values for some items in that survey. ARTS, AWTS, and SAS also use 10-K forms, while ASM does not.

Many of the annual surveys use administrative data from the Business Register for imputation. For example, ASM uses payroll data from the Business Register when payroll is not reported in the survey. The register payroll data come from several sources, including the IRS, the economic censuses, the Company Organization Survey (COS), ASM, and, for select multi-unit locations, the Bureau of Labor Statistics’ Quarterly Census of Employment and Wages. ARTS, AWTS, and SAS make substantial use of administrative data from the register to impute for missing items or as a source of primary data for items not collected in those surveys.

In other cases, administrative data are used only as edit checks on the reasonableness of data that are reported. M3UFO, for example, uses receipts from the Business Register to check reported receipts. ARTS, AWTS, and SAS also use register data as checks on reported data. ACES and the Information and Communication Technology Survey (ICTS) use Business Register payroll data to make adjustments for unit nonresponse. This use is equivalent to unit-level imputation in which the weights of responding units are increased to implicitly impute for nonresponding units. Finally, ASE uses only economic census data for imputation; it does not use any of the information in the Business Register.

Harmonizing the uses of administrative data for missing and inconsistent responses across the annual economic surveys would improve consistency among the surveys and will be important for an ABSS. It would also be useful to investigate the feasibility and reliability of using administrative data to replace survey questions, as the panel recommends (see Recommendation 4-1, in Chapter 4).

RECOMMENDATION 6-2: The Census Bureau should harmonize the use of administrative data from the Bureau’s Business Register and other sources across the annual economic surveys for the purposes of imputing values for missing responses and crosschecking the reliability of reported data.

6.2.2 Use of Models in Item Imputation

The extent of the use of models and the type of model used to impute for missing values varies across the surveys. ASM, ARTS, AWTS, SAS, the Management and Organizational Practices Survey (MOPS), and the Business Register itself impute some totals and details when applicable, using regression or ratio models based on current-year data or models using both current-year and prior-year data.

The ratio and regression imputations each have an underlying structural model under which the imputations are unbiased. A ratio imputation implies that the model between the variable being imputed and the assisting variable is a straight line through the origin. For example, the model for the unfilled orders ratio imputation is that unfilled orders for the current period can be predicted by a through-the-origin regression on the total value of shipments. The imputation models may or may not be producing unbiased predictions of missing values, but studies can be devised to evaluate this question. Basic model-fitting or simulation studies can be done in which missing data are artificially created and imputations made using the current techniques. The mechanisms that are generating missing data also can be estimated from the current data and used to generate missing data patterns for a simulation study.
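A simulation study of the kind described can be sketched in a few lines. Everything below is illustrative: the population is simulated from a through-the-origin model (mirroring the unfilled orders example), 20 percent of the item is deleted completely at random, and a ratio imputation fitted on respondents is evaluated against the known true total:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated population in which shipments predict orders through the
# origin (all numbers are illustrative, not survey data).
n = 2000
shipments = rng.lognormal(mean=3.0, sigma=1.0, size=n)
orders = np.clip(0.6 * shipments + rng.normal(0.0, 0.5, size=n) * np.sqrt(shipments),
                 0.0, None)

# Artificially delete 20 percent of the item completely at random; an
# estimated response mechanism could be substituted here.
missing = rng.random(n) < 0.2

# Ratio (through-the-origin) imputation fitted on respondents only.
beta = orders[~missing].sum() / shipments[~missing].sum()
completed = orders.copy()
completed[missing] = beta * shipments[missing]

# Compare the completed-data total with the known true total.
relative_bias = (completed.sum() - orders.sum()) / orders.sum()
```

Because the imputation model matches the data-generating model and the missingness is random, the relative bias of the total should be close to zero here; repeating the exercise under misspecified models or nonrandom missingness mechanisms is what makes such studies informative.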

Studying alternative model-based ways of imputing for missing items would also be worthwhile. Alternatives may produce more nearly unbiased estimates and may better reflect the underlying variability in the population. For example, White, Reiter, and Petrin (2016) demonstrate that when means, a very simple form of imputation, were used to impute for missing items in the 2002 and 2007 Census of Manufactures, the dispersion among imputed values was substantially less than for the nonimputed data. They recommend an imputation method based on classification and regression trees that, in their study, produced a more realistically dispersed set of values. Other stochastic methods that retain reported levels of dispersion are available, such as adding a random residual to the modeled expected value. Analysts who use the data would want item imputations to be carried out in a way that preserves multivariate relationships, which are important for research purposes.
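The "random residual" idea can be sketched as follows. This is a minimal illustration, assuming a through-the-origin ratio model and resampling the respondents' empirical residuals; the function name and setup are illustrative:

```python
import numpy as np

def stochastic_ratio_impute(aux_missing, aux_resp, y_resp, rng):
    """Impute missing y values from an auxiliary variable, then add a
    residual resampled from the respondents' empirical residuals so that
    imputed values retain realistic dispersion instead of collapsing
    toward the model's expected value (illustrative sketch)."""
    beta = y_resp.sum() / aux_resp.sum()          # through-the-origin fit
    residuals = y_resp - beta * aux_resp          # respondents' residuals
    draws = rng.choice(residuals, size=len(aux_missing), replace=True)
    return beta * aux_missing + draws
```

Because the residuals of a ratio fit sum to zero over the respondents, the added draws leave aggregate estimates approximately unchanged while restoring unit-level scatter; a mean imputation would instead place every imputed unit exactly on the fitted line.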

The best that can be hoped for any imputation procedure is that, when combined with reported data, the imputation method yields approximately unbiased estimates for aggregate statistics, like means, totals, variances, and model parameter estimates, rather than unbiased predictions for individual establishments or companies. In addition, imputation procedures should, to the extent possible, preserve bivariate and multivariate relationships. Evaluation studies can be designed with that in mind.

RECOMMENDATION 6-3: The Census Bureau should systematically evaluate all item imputation models used for missing data in the annual economic surveys to determine whether improvements can be made in either the form of the model or the covariates used for prediction. Methods of imputation should be standardized across the surveys to the extent possible, looking toward an integrated Annual Business Survey System.

The Census Bureau has already carried out some standardization. For example, imputation methods and processes in three of the annual economic surveys—ARTS, AWTS, and SAS—are coordinated to ensure consistency where possible.

6.2.3 Imputing for Large Units that Do Not Respond

In some cases, large units are nonrespondents, do not respond in a timely way, or fail to provide responses for some survey items. Such large units may require special imputation procedures based on administrative data or past data for the particular unit, perhaps in combination with statistical procedures, to index prior data forward to the survey time period. Using models or other methods that are reasonable for smaller units may not work well for large units, which tend to be unique in their economic behavior. Account managers for large units (see Recommendation 3-3, in Chapter 3) may be able to provide advice for the best imputation method for the nonresponding units in their areas.


RECOMMENDATION 6-4: The Census Bureau should systematically evaluate the imputation procedures for large business units in the annual economic surveys to determine the most effective procedures and apply them across the surveys to the extent feasible.

6.2.4 Reflecting the Effects of Imputations on Estimates

Imputed values for items are not real values. Consequently, imputation is a source of error in survey estimates. Fully reflecting this error in estimates of survey precision is difficult because imputations can be a source of bias (which is hard to measure), variance, or both. As noted above, the most practical approach for bias control is to use methods that yield (approximately) unbiased estimates of aggregate statistics.

In contrast, the contribution of item imputation to the variability of estimates often can be included in standard error estimates. Techniques exist for incorporating the variance due to deterministic imputation methods, such as nearest-neighbor imputation, and due to methods that involve randomized imputations, such as the hot deck (see, e.g., Im, Cho, and Kim, 2017; Kim and Shao, 2013). For techniques that involve random draws, like the random-draw hot deck used in the Annual Survey of Entrepreneurs (ASE), one option is multiple imputation (see Little and Rubin, 2002). Because there are instances in which multiple imputation does not appropriately estimate variances (see, e.g., the discussion papers accompanying Rubin [1996]), caution is needed if this method is used. Nevertheless, the current procedure of treating item imputations as real data, which inevitably leads to underestimated standard errors, needs to be replaced, and research conducted into alternative methods.
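The multiple-imputation combining step ("Rubin's rules") is straightforward once M completed data sets have been analyzed. A minimal sketch, assuming each completed data set yields a point estimate and its complete-data variance:

```python
import numpy as np

def mi_combine(estimates, variances):
    """Combine M completed-data estimates with Rubin's rules: the MI
    point estimate is the mean of the M estimates, and its total variance
    is the within-imputation variance plus (1 + 1/M) times the between-
    imputation variance, so uncertainty due to imputation enters the
    standard error (see Little and Rubin, 2002)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()                  # combined point estimate
    w_bar = variances.mean()                  # within-imputation variance
    b = estimates.var(ddof=1)                 # between-imputation variance
    total_var = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total_var
```

The between-imputation term is exactly what treating a single imputed data set as real data omits, which is why single-imputation standard errors are understated.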

RECOMMENDATION 6-5: The Census Bureau should conduct research into and adopt a variance estimation method for the annual economic surveys that will allow the effects of item imputation to be included in published standard error estimates and by data users in their own analyses.

6.2.5 Informing Users

Users of the data from the annual economic surveys should be able to tell whether they are dealing with reported or imputed data, through flags that indicate whether an item was imputed and, if so, which method was used. Documentation needs to include statistics on the aggregate amount of imputation for each variable in a format that is readily interpretable by users.

To further assist users and facilitate the development of an ABSS, the Census Bureau needs to strive for consistent terminology for types of imputation. At present, different annual surveys use different terminology to refer to the same techniques. For example, several surveys apply “balancing” in the imputation process, meaning that they force imputations for detailed, quantitative items to sum to reported, higher-level aggregates. Balancing also is applied as a type of edit. ARTS and ACES use the term “balance complex” as the label for this forced consistency. ASM also requires this type of balancing between detailed items and aggregates, but does not use the “balance complex” term. ARTS and ACES use the term “free form” to mean any method, like simple hot-deck imputation, that operates on one item at a time. ASE’s hot-deck imputation is an example of free form, although that term does not appear to be used by ASE.

Adopting common terminology for types of imputation across the surveys will facilitate communication with users and among managerial and technical staff. Making it clearer when surveys are using the same or similar methods can also be expected to assist in developing best practices that span all surveys and can eventually be used in an ABSS.

RECOMMENDATION 6-6: The Census Bureau should use common terminology across the annual economic surveys to describe imputation methods. These terms should be included in the glossary of terms related to sample design features, sampling units, and estimation methods proposed in Recommendation 5-1 (in Chapter 5). Microdata from the surveys should include flags to denote whether and what type of imputation was used, and documentation for users should provide statistics on the extent of imputation.

6.3 DISCLOSURE CONTROL

Title 13 of the U.S. Code not only provides authority for the Census Bureau’s work; it also requires the Bureau to fully protect the confidentiality of information collected from individuals and businesses.1 More than 20 years ago, the Census Bureau developed a cell suppression procedure to protect tabular data from direct or inferential disclosure (Cox, 1995). The Census Bureau did not publish data in a table for a cell in which there were fewer than three respondents: publishing a cell with a single respondent clearly would be unacceptable, and publishing a cell with just two respondents also would be problematic, as either respondent could then deduce the value reported by the other. A cell with fewer than three respondents was defined as a suppressed cell. In addition to the information in the individual cells, published tables also have marginal totals that aggregate across all cells. If only one cell in a table is suppressed, its value can be deduced from the values in the other cells and the marginal totals. To avoid revealing the value of the suppressed cell, a secondary cell also must be suppressed. The Bureau developed an optimization program that minimized the impact of the cell suppressions within a table and across the tables for a given survey.

___________________

1 See https://www.gpo.gov/fdsys/pkg/USCODE-2009-title13/html/USCODE-2009-title13.htm [November 2017]; Subchapter 1, Section 9, covers confidentiality protection.

Currently, the Census Bureau’s annual economic surveys use a cell suppression procedure for the tabular data that are published, with the exception of the two surveys that are used to update the Business Register—COS and the Business and Professional Classification Survey. These surveys do not publish tables or other data products.

The cell suppression optimization program has been refined over the years to minimize the percentage of cells that are suppressed. The panel does not know the specific criteria used by each survey for cell suppression to control disclosure, as releasing the specific criteria might supply information that would reduce the protection provided to respondents’ data. We observe, however, that the percentage of cells suppressed varies considerably across surveys. The 2015 ASM, for example, suppressed more than 17 percent of its published table cells for disclosure, while the percentage for other surveys ranged from 0 to 4 percent. Although these large differences could reflect a variety of factors, including differences in the distribution of respondents across publication cells, there is no obvious reason for suppression criteria to differ across surveys.

Noise infusion is another method of disclosure control used by the Bureau. Noise infusion, or perturbation of some values for responding units, can help protect the identities of individual units and may reduce the percentage of cells that need to be suppressed for disclosure reasons. ASE, for example, adds noise to receipts, payroll, and employment. County Business Patterns, which tabulates and publishes data from the Business Register, uses noise infusion by applying a random noise multiplier to employment, first quarter payroll, and annual payroll for each establishment.

Noise infusion does have limitations in applications involving skewed data. In business surveys, the survey cases have a skewed size distribution, and most data items requested from respondents also will have a skewed distribution. Noise infusion is not likely to be particularly effective for estimates based on such items. Still, the Census Bureau may wish to review its current uses of noise infusion in the surveys noted above to determine whether broader application across the annual economic surveys would be beneficial.
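The noise-infusion idea can be sketched as applying a random multiplier, bounded away from 1, to each establishment's values before tabulation. The noise distribution and magnitudes below are illustrative assumptions, not the Bureau's confidential production parameters.

```python
import random

def noise_multiplier(rng, a=0.05, b=0.10):
    """Draw a multiplier from [1 - b, 1 - a] or [1 + a, 1 + b], so every
    record is perturbed by at least a and at most b (a and b are assumed)."""
    delta = rng.uniform(a, b)
    return 1.0 + delta if rng.random() < 0.5 else 1.0 - delta

rng = random.Random(2018)
payroll = [1_000_000, 250_000, 40_000]          # establishment-level values
perturbed = [round(v * noise_multiplier(rng)) for v in payroll]
# Published cell totals are then tabulated from the perturbed microdata,
# which can reduce the number of cells that must be suppressed.
```

Bounding the multiplier away from 1 ensures that no individual record is published undistorted, while keeping the expected distortion of aggregates small.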

Another initiative under way at the Census Bureau is a project to develop synthetic industry-level microdata from the economic censuses for a subset of industries. Synthetic data are data that have been altered to reproduce specified relationships in the underlying microdata in a way that both more fully protects confidentiality and provides more information in comparison with traditional disclosure protection methods (see Kinney et al., 2011). By experimenting with synthetic files, researchers can decide whether it is worth their time and effort to pursue their analysis with the underlying microdata (see Chapter 7). Researchers may even be able to use the synthetic files themselves for some kinds of analysis. If this work is successful, a next step would be for the Census Bureau to determine the applicability and usefulness of developing and making available synthetic files for the annual economic surveys.

The Census Bureau currently has important initiatives under way to improve and harmonize its disclosure control procedures across a variety of programs. Staff in the Bureau’s Center for Disclosure Avoidance Research (CDAR), along with university researchers, are working on implementing state-of-the-art methods that embody formal measures of privacy loss to protect Business Dynamic Statistics and 2020 decennial census data products. In addition, research is being conducted on new and improved formal disclosure protection methods for the American Community Survey; this research includes both formal privacy and model-based synthetic data methods.

The Economic Directorate is currently partnering with CDAR on a study to create synthetic data for a subset of data products from the economic censuses. This study is in the early stages, so most data products from the 2017 economic censuses will use cell suppression for disclosure protection. The suppressions will be determined by software that was rewritten and applied in the 2012 economic censuses and is currently used in ASM, ACES, and several programs for other agencies. Many improvements in the software, which handles large, complex tables, have been implemented since the 2012 economic censuses to reduce computational time and optimize cell suppression patterns.

The Census Bureau is actively working on migrating disclosure work for SAS to the cell suppression software, and has plans to migrate ARTS and AWTS. The long-term goal is to apply formal privacy protection methods to all of the Bureau’s economic programs, but considerable research is needed to fully realize this plan. These are important initiatives, and the panel commends the Census Bureau’s efforts to improve disclosure control procedures.

RECOMMENDATION 6-7: The Census Bureau should continue the work it has begun on improving and harmonizing disclosure control procedures for the annual economic surveys. As part of this work, it should investigate why rates of cell suppression vary across the surveys and whether there are ways to standardize the cell suppression rules if different rules are part of the explanation.

6.4 QUALITY STANDARDS

The data tables released from the annual economic surveys are often quite detailed, with classifications by industry, geography, and product type. Because of the detail being provided, the sampling variability or relative standard error for an individual data item may be such that the reliability of the estimate for the item is questionable. The panel observed four different approaches across the annual economic surveys for handling data items that may be unreliable:

  1. annotating the cell with an “S” if 40% < coefficient of variation (CV) < 100% and “A” if CV > 100% (ASM);2
  2. suppressing the cell if total quantity response rate (TQRR)3 < 50% or CV > 30% (SAS, ARTS, AWTS);
  3. no procedure for denoting quality of an estimate (ACES, ICTS); and
  4. suppressing the cell if the CV exceeds a high value (ASE).

These different procedures resulted in 7.3 percent of the cells in the most detailed SAS table for 2014 being suppressed for quality, and up to 4 percent in tables from other economic surveys. The Census Bureau needs to consider whether a consistent procedure for quality suppression is appropriate across all of the annual economic surveys.
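The four treatments above can be summarized in a single flagging function. The flag codes and function name are illustrative, and the ASE threshold is an assumption (the text describes it only as "a high value"); each survey's production rules are more detailed.

```python
def quality_flag(survey, cv, tqrr=None):
    """Publication decision for one estimate; cv and tqrr are in percent.
    Thresholds follow the four approaches described in the text."""
    if survey == "ASM":                       # approach 1: annotate
        if cv > 100:
            return "A"
        if cv > 40:
            return "S"
        return "publish"
    if survey in ("SAS", "ARTS", "AWTS"):     # approach 2: suppress
        if (tqrr is not None and tqrr < 50) or cv > 30:
            return "suppress"
        return "publish"
    if survey in ("ACES", "ICTS"):            # approach 3: no procedure
        return "publish"
    if survey == "ASE":                       # approach 4: assumed cutoff
        return "suppress" if cv > 100 else "publish"
    raise ValueError(f"unknown survey: {survey}")

flag = quality_flag("ASM", cv=55)             # CV falls in the 40-100 band
```

Writing the rules side by side in this way makes the inconsistency across surveys concrete: the same estimate could be annotated, suppressed, or published unflagged depending only on which survey produced it.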

The annual economic surveys use cell suppression for two purposes: (1) to reduce the risk of releasing confidential information from reporting units, as discussed above; and (2) to indicate that the variability of an estimate is such that the estimate is unreliable. Just as the panel finds no compelling reason to have different disclosure criteria for cell suppression across the annual economic surveys, we also find no compelling reason to have different criteria for suppression of an estimate due to quality concerns.

In cases in which estimates can be published based on disclosure standards, the precision of some estimates may still be poor. In such cases, users may prefer to have the point estimate and its standard error or CV so that they can make their own decision about whether to use the estimate for analyses. Such estimates could be footnoted as “unreliable” but still published. Regardless of the procedure adopted, an ABSS will need to have

___________________

2 The CV is the standard error as a percentage of the estimate: CVs of 10–12 percent or less are considered reliable (see National Research Council, 2007, p. 64), although the Census Bureau does not use that number as a threshold for release.

3 The TQRR is defined as the percentage of the weighted estimated total of a key data item that is either reported by a survey unit or acquired from some other information source (such as a publicly available quarterly or annual report) determined to be equivalent in quality to reported data and not imputed.

consistent standards that adhere to the Bureau’s own published quality standards (U.S. Census Bureau, 2013).

RECOMMENDATION 6-8: The Census Bureau should use consistent procedures to denote the quality (or lack thereof) of estimates in tables developed from the annual economic surveys.

6.5 REFERENCES

NOTE: All URL addresses were active as of November 2017.

Cox, L.H. (1995). Network models for complementary cell suppression. Journal of the American Statistical Association, 90(432), 1453–1462.

Hidiroglou, M.A., and Berthelot, J.-M. (1986). Statistical editing and imputation for periodic business surveys. Survey Methodology, 12(1), 73–83.

Im, J., Cho, I., and Kim, J. (2017). FHDI: Fractional Hot Deck and Fully Efficient Fractional Imputation. R Package Version 1.0. Available: https://CRAN.R-project.org/package=FHDI.

Kim, J., and Shao, J. (2013). Statistical Methods for Handling Incomplete Data. Boca Raton, FL: CRC Press.

Kinney, S., Reiter, J., Reznek, A.P., Miranda, J., Jarmin, R.S., and Abowd, J.M. (2011). Towards unrestricted public use business microdata: The synthetic Longitudinal Business Database. International Statistical Review, 79(3), 362–384. doi: https://doi.org/10.1111/j.1751-5823.2011.00153.x.

Little, R.J.A., and Rubin, D.B. (2002). Statistical Analysis with Missing Data. New York: John Wiley & Sons.

National Research Council. (2007). Using the American Community Survey: Benefits and Challenges. Panel on the Functionality and Usability of Data from the American Community Survey. C.F. Citro and G. Kalton (Eds.). Committee on National Statistics, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/11901.

Nguyen, J.L., Diamond, L.K., Newman, B., Dumbacher, B., Hill, G., Dalzell, J., and Hogue, C. (2017). Economic Editing Reduction Research Report. Version 1.1. Washington, DC: U.S. Census Bureau.

Ozcoskun, L., and Hayes, M. (2009). The Economic Directorate’s Editing and Imputation Inventory. Washington, DC: U.S. Census Bureau.

Rubin, D.B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–489.

Smith, P. (2013). Sampling and estimation for business surveys. In G. Snijkers, G. Haraldsen, J. Jones, and D.K. Willimack (Eds.), Designing and Conducting Business Surveys (Ch. 5). Hoboken, NJ: John Wiley & Sons.

Thompson, K.J. (2007). Investigation of macro editing techniques for outlier detection in survey data. In Proceedings of the Third International Conference on Establishment Surveys (ICESIII) (pp. 1186–1193). Available: https://ww2.amstat.org/meetings/ices/2007/proceedings/ICES2007-000071.PDF.

U.N. Statistical Commission and Economic Commission for Europe. (1997). Data Editing Methods and Techniques (Volume 2). Available: https://webgate.ec.europa.eu/fpfis/mwikis/essvalidserv/images/3/3f/SDEVolume2.pdf.

U.S. Census Bureau. (2013). U.S. Census Bureau Statistical Quality Standards. Washington, DC: Author. Available: https://www.census.gov/content/dam/Census/about/about-the-bureau/policies_and_notices/quality/statistical-quality-standards/Quality_Standards.pdf.

White, T.K., Reiter, J.P., and Petrin, A. (2016). Imputation in U.S. Manufacturing Data and Its Implications for Productivity Dispersion. Working Paper No. 22569. Cambridge, MA: National Bureau of Economic Research. Available: http://www.nber.org/papers/w22569.pdf.
