9 Combining Data Sources for National Statistics: Next Steps
Pages 187-200

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 187...
... tax data and transfer program benefits data has provided valuable insights about the accuracy of survey responses and alternative perspectives on key measures such as poverty and income distribution. Linking health survey records with the National Death Index has allowed researchers to evaluate mortality risks associated with health conditions.
From page 188...
... The other major source, the Uniform Crime Reporting Program, compiles crime statistics from data submitted by states and individual law enforcement agencies. This program faces challenges similar to those of other programs that compile administrative records from states: missing data from states and agencies that do not make submissions, the need to assess and improve quality of data that are supplied, and the need to resolve measurement differences among data suppliers.
From page 189...
... Challenges of using private-sector data for official statistics are greater than the challenges of using government-collected administrative records, in part because of the limited history of public-private data cooperation. However, private-sector data such as those collected through precision agriculture programs or private health insurance companies could potentially improve federal statistics and create new data resources for social and economic research -- if these data can be shown to be reliably available, accurate, and cost-effective sources of information.
From page 190...
... Linkage error can result, for example, in appending the wrong person's race to a data record, or in coding a person as living when in fact that person is in the National Death Index but the link was missed. Harmonization error, which arises when sources have different units or definitions for data elements (e.g., pixels in satellite data might not match up with farms or fields in another dataset; data sources might report information for nonsynchronous time periods, or sources may use different definitions for seemingly identical concepts)
From page 191...
... As work on combining data sources progresses, it is important to continue to invest in improving the individual data sources -- probability surveys, administrative records, and other data -- that feed into a new data infrastructure.
From page 192...
... Transparency and Documentation
CNSTAT's Principles and Practices for a Federal Statistical Agency emphasized the importance of transparency and documentation of data products: Federal statistical agencies must have credibility with those who use their data and information. The value of a statistical agency rests fundamentally on the accuracy and credibility of its data products.
From page 193...
... Data Equity
The use of multiple data sources to advance data equity is a major theme of this report. As Leary (2022)
From page 194...
... Record linkage can add variables needed for producing statistics that are disaggregated by race, ethnicity, or other characteristics measured in a linked data source. While combining data sources can enhance knowledge about subpopulations, there is also the potential that combining data will increase bias.
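The point about linkage and disaggregation can be illustrated with a minimal sketch. Everything in it is hypothetical: the records, the identifier `person_id`, the group labels, and the function `link_and_disaggregate` are invented for illustration, and exact matching on a shared identifier is an assumption, not the linkage method of any agency. The sketch appends a group variable from an administrative file to survey records, computes a disaggregated mean, and reports the share of survey records that could not be linked, since records that fail to link drop out of the disaggregated estimate and are one route by which combining data can introduce bias.

```python
from collections import defaultdict

# Hypothetical survey records: income is measured, group membership is not.
survey = [
    {"person_id": "A1", "income": 42000},
    {"person_id": "B2", "income": 58000},
    {"person_id": "C3", "income": 31000},
    {"person_id": "D4", "income": 75000},
]

# Hypothetical administrative file carrying the variable needed for
# disaggregation. Person D4 is absent, so that survey record cannot be linked.
admin = {
    "A1": "Group X",
    "B2": "Group Y",
    "C3": "Group X",
}


def link_and_disaggregate(survey_rows, admin_lookup):
    """Exact-match linkage on person_id (an assumption for this sketch).

    Returns mean income by linked group and the share of survey
    records that found no match in the administrative file.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [income sum, record count]
    unlinked = 0
    for row in survey_rows:
        group = admin_lookup.get(row["person_id"])
        if group is None:
            # Unlinked records are excluded from the disaggregated estimate.
            unlinked += 1
            continue
        totals[group][0] += row["income"]
        totals[group][1] += 1
    means = {group: total / count for group, (total, count) in totals.items()}
    return means, unlinked / len(survey_rows)


means, unlinked_share = link_and_disaggregate(survey, admin)
print(means)
print(unlinked_share)
```

If the unlinked records differ systematically from the linked ones, the group means computed this way will not represent the full survey population, which is why the unlinked share is worth reporting alongside any disaggregated statistic.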
From page 195...
... Improving data equity across the federal statistical system will be challenging and will require a broad-based approach that integrates perspectives of federal statistical agencies, other data producers, data users, and community members.
From page 196...
... Beyond the technical challenges, there are challenges for promoting data equity and public trust in data, and these areas require additional resources and expertise. Statistical agencies will need investments in personnel, training, and computer infrastructure to take advantage of new data resources.
From page 197...
... It is important to examine the representativeness and coverage of combined data sources to ensure data equity. CONCLUSION 3-2: Record linkage can merge information from separate data sources and add variables that are needed to produce disaggregated statistics.
From page 198...
... However, combined data sources do not necessarily have either full population coverage for generating national statistics or sufficient sample sizes to investigate differences among population subgroups. CONCLUSION 4-3: The National Vital Statistics System can serve as a model for assembling state-administered data programs into coordinated, standardized national databases of administrative records that can be linked to other data sources.
From page 199...
... But the transition to NIBRS is still under way and variations in measurement and data reporting across jurisdictions need further study. CONCLUSION 7-2: Improving crime statistics will require coordination of the National Crime Victimization Survey and Uniform Crime Reporting Program with new data sources that can provide timely and detailed information about crimes, including those measured in the current classification systems and those that are currently unmeasured.
From page 200...
... were the work of pioneers, and that one component of CNSTAT's vision of a redesigned national data infrastructure concerns how to "make this type of work, now done by pioneers, routine rather than cutting-edge." Today's data world contains amounts of digital information that were inconceivable when the theory of probability sampling was developed in the 1930s. Arora (2022b, p.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.