Suggested Citation:"3 Proprietary Data Sources." National Research Council. 2005. Improving Data to Analyze Food and Nutrition Policies. Washington, DC: The National Academies Press. doi: 10.17226/11428.

3
Proprietary Data Sources

A number of rich datasets on food purchases and consumption are produced outside the federal government’s data collection efforts. These datasets comprise information collected by private market research firms in order to analyze food and related markets. Two types of data are typically collected: scanner data, which record sales of food purchased at stores or food used by consumers in their homes; and surveys of households that collect information on what is consumed by the household, either through direct questions about food consumption or through food diaries (see Table 1-1 in Chapter 1). These datasets contain an enormous amount of information about food purchases and consumption, including prices paid by consumers for food. Because the analyses conducted using these datasets are for firms interested in understanding the latest market trends, the data are usually available within a couple of weeks of collection. This chapter briefly describes some of these datasets and their key attributes, potential uses, and limitations for food and nutrition policy planning and research by the U.S. Department of Agriculture (USDA) and other agencies.

SCANNER DATA

Scanner data come from two types of data collections: (1) point-of-sale (retail) collections, which use the universal product code (UPC) of products sold at retail checkout counters to identify products and quantities sold and their prices; and (2) household scanner panels, which are usually random samples of households in which household members are asked to scan in the UPC of the items they have purchased, using scanners provided to them (see Box 3-1 for a summary of the data content of household scanner panels).

ACNielsen (formerly A.C. Nielsen Company) and Information Resources, Inc. (IRI), are the two major producers of these types of datasets. For point-of-sale data, ACNielsen and IRI purchase price and item data from the scanner systems of cooperating retail outlets (the ACNielsen data collection is called Scantrack Services; the IRI collection is called Custom Store Tracking). Supermarket scanner data do not include fresh fruits and vegetables, some prepared foods, and other products that lack UPCs. They also do not cover restaurants or other food outlets.

Household scanner panel data are generated by randomly selected households, in which a household member scans in the household’s food purchases from all types of stores over a week’s time. As currently designed, these data provide limited demographic characteristics. Information collected on products with a UPC includes price, quantity, and promotional information. For items that lack a UPC, such as meat and fresh produce, participants are asked to identify the type of item and its weight. Both ACNielsen and IRI conduct these types of panel surveys for nationally representative samples of more than 61,500 and 50,000 households, respectively (the ACNielsen data collection is called the HOMESCAN Consumer Panel; the IRI collection is called the Combined Outlet Consumer Panel).¹

Researchers inside or outside the government must purchase scanner data, although the cost need not be high, depending on the amount of data required. A study by the Food and Nutrition Service estimated the cost of 2 months of scanner data collection for a supermarket chain to be $35,000. To consider this figure in context, the study indicated that the National Survey of Food Stamp Program Participants in 1996 cost $1.7 million. Purchase of the necessary scanner data for applications that required many months or years of observations for many outlets could, of course, entail substantial costs.

¹	Only one-quarter of households in the HOMESCAN Consumer Panel are asked to record items that lack a UPC code.

BOX 3-1
Household Panel Scanner Data

What?
• Price Paid
• Quantity Purchased
• Purchase Date
• Product Category
• Brand
• Size
• Universal Product Code (UPC)
• UPC Description
• Coupon Information
• Product Attributes
  —Flavor
  —Form
  —Fat Content
  —Sodium
  —Cholesterol
  —Organic
  —Container Type
• Store Name Identifier
• Channel Type Identifier

Who?
• Household Size
• Household Income
• Female and Male Head Characteristics
  —Age
  —Education
  —Employment
  —Occupation
  —Marital Status
  —Race
• Household Composition
• Presence and Age of Children
• Local Market Identifier
• Region
• Projection Factor (Weight)

Where?
• Grocery Store
  —Kroger, Safeway, etc.
• Drugstore
  —CVS, Walgreens, etc.
• Mass Merchandiser
  —Target, Value City, Wal-Mart, etc.
• Supercenter
  —Big K, Super Target, Wal-Mart Supercenter
• Warehouse Club
  —Costco, Sam’s Club, etc.
• Convenience, Gas
• Other
  —Dollar Store
  —Farmers’ Market
  —Online Purchase
  —Etc.

Uses

Scanner data have been used in published economic studies for over a decade to answer a variety of questions about food consumption, food pricing, and the operation of retail food markets. Most applications to date have used retail data; a few have used household data or a combination of the two. Scanner data have been used most often to examine pricing behavior in particular product markets, including the influence of private-label foods on name-brand pricing (Putsis and Cotterill, 2001; Ward et al., 2002), strategic pricing responses in markets supplied by only one or two firms (Vickner and Davies, 2002), and the effect of political pressure on breakfast cereal prices (Cotterill and Franklin, 1999). Scanner data have also been used to measure the value of product attributes (Bonnet and Simioni, 2001; Unnevehr and Gouzou, 1998), assess bias in the construction of the Consumer Price Index (Reinsdorf, 1999), analyze seasonality in prices and consumption (Chevalier, Kashyap, and Rossi, 2004; MacDonald, 2000; Thompson and Wilson, 1997), and develop basic estimates of price elasticities for specific food products (Jones, 1997; Maynard and Veeramani, 2003). In studies that estimate price elasticities, income is controlled only imperfectly, through store location. Scanner data have also been used for policy-relevant food and nutrition research, such as studying the effects of mandatory nutrition labeling (Mathios, 1998, 2000) and the redemption activity of food stamp and cash assistance clients in conjunction with the Maryland demonstration project on electronic benefit transfer (Cole, 1997).

Finally, scanner data have been used for general descriptive work to answer questions such as whether fresh fruits and vegetables are more expensive than processed fruits and vegetables and how much it costs to meet guidelines for daily intake of fruits and vegetables (Reed, Frazao, and Itskowitz, 2003). Thus, scanner data have the potential to address questions related to market sales, price response in markets for very specific products, how pricing relates to product characteristics including specific nutrition characteristics, firm behavior in concentrated processed food product markets, and consumer demand for specific kinds of food products.²

²	A workshop on the uses of scanner data in policy analysis, organized by the Economic Research Service, USDA, and the Farm Foundation, included useful reviews of the advantages and limitations of scanner data (www.farmfoundation.org/projects/documents/ScannerDataWorkshopSummaries2_000.pdf [June 2005]); see also Appendix A in this report.
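
As a rough illustration of the price elasticity estimates mentioned in the discussion of uses above, one common approach is to regress the logarithm of quantity sold on the logarithm of price; the slope can then be read as an own-price elasticity. The sketch below is not the method of any study cited here. The function name and the price and quantity figures are hypothetical, and the regression omits the income, promotion, and seasonality controls that a real analysis would need.

```python
import math


def log_log_elasticity(prices, quantities):
    """Return the OLS slope of ln(quantity) on ln(price).

    Under a constant-elasticity demand model, this slope is the
    own-price elasticity. All inputs must be positive numbers.
    """
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need two or more matched price/quantity pairs")
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("prices must vary to identify a slope")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical weekly store observations (not real scanner data),
# generated from a demand curve with elasticity -1.5.
weekly_prices = [1.00, 1.25, 1.50, 2.00]
weekly_units = [100 * p ** -1.5 for p in weekly_prices]
elasticity = log_log_elasticity(weekly_prices, weekly_units)
```

Because the made-up quantities lie exactly on a constant-elasticity curve, the estimate recovers the assumed value; with real scanner data the fit would be noisy and sensitive to omitted factors such as household income.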

Scanner datasets contain several valuable attributes that make them attractive for some specific uses. One of those key attributes is detailed information on the product that was purchased, including the brand name of the product, the exact description of the product (for example, for orange juice, whether it is calcium enriched), the quantity of the product, and the price for which the product was purchased, including whether it was on sale or part of a promotion. This linkage of price and detailed quantity and product data for individual household purchases is unique among all the datasets reviewed in this report. Another key attribute of these data is that they are produced in a timely manner, unlike those from federal surveys. Furthermore, the household scanner panel samples are much larger than those for the Consumer Expenditure Survey (CE).

Limitations

Scanner data do have limitations. Coverage is a key issue for point-of-sale scanner datasets. Although most major retailers, including warehouse clubs (Sam’s Club, B.J.’s, Costco), participate in both the ACNielsen and IRI point-of-sale collections, the largest retailer in the nation, Wal-Mart, does not. Some smaller mom-and-pop grocery stores do not participate, either. In addition, as noted above, some items do not have UPC codes, including fresh vegetables and fruits, meats, baked goods, and other prepared foods.

The household scanner panels cover only food and other items purchased in retail stores, not food purchased in restaurants. Moreover, many households in a given week will not have purchased specific products, raising problems for how analyses should deal with infrequent purchases and the frequency with which people shop. Studies that have linked retail and household scanner information have encountered inconsistent data between the two data sources.

In addition, the household scanner surveys place a big burden on respondents. A respondent is asked to scan in all the items purchased after each shopping occasion and report the results to the collecting firm. Households are sent scanners with guidelines or training videos on how to use them. Unlike in the CE survey, which is conducted as an in-person interview, no interviewer goes through the data collection procedure with the household members, although a telephone helpline is available.

Households in the IRI Combined Outlet Consumer Panel are asked to scan all their purchases from stores. For every shopping trip for which purchases are scanned, they receive points that can be exchanged for prizes or vacations or redeemed in restaurants, and they can participate in the survey as long as they wish. Participants in ACNielsen’s HOMESCAN panel are asked to transmit data on scanned purchases through a regular telephone line once a week. HOMESCAN panelists also receive points that can be redeemed for prizes for each data transmittal (personal communication, G. Crusafulli, ACNielsen). Typical response rates for the HOMESCAN panel are around 85 percent. IRI does not publicly release response rates for its household scanner panel survey.

Representatives from ACNielsen presented information about the HOMESCAN survey at the panel’s workshop (see Appendix A). They reported that they have trouble recruiting some groups to participate in the survey, specifically, young single adults, people in low-income households, and minorities. Jensen (2003) compared a sample from the HOMESCAN panel to U.S. national averages from the 2000 census and found that the HOMESCAN sample households by comparison had higher incomes, were smaller, were more likely to be married couples, were more likely to be white, and were less likely to be Hispanic (see also Appendix A in this report). Thus, these data may not be useful for specific analyses of underrepresented groups. The IRI Combined Outlet Consumer Panels also tend to overrepresent higher income households in comparison with 2000 census data.
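
One standard way to adjust for this kind of over- and underrepresentation is post-stratification: each group is weighted by the ratio of its population share to its sample share, which is the general idea behind a projection factor such as the one listed in Box 3-1. The sketch below is illustrative only. The income groups, counts, and shares are hypothetical, and it is not a description of how ACNielsen or IRI actually construct their weights.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share.

    sample_counts: households observed in the panel, by group.
    population_shares: benchmark shares (e.g., from a census), by group.
    """
    total = sum(sample_counts.values())
    if total <= 0:
        raise ValueError("sample must contain at least one household")
    weights = {}
    for group, count in sample_counts.items():
        if count <= 0:
            # A group with no panel members cannot be weighted up.
            raise ValueError(f"no sample households in group {group!r}")
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


# Hypothetical panel that overrepresents higher-income households.
panel_counts = {"low": 100, "middle": 400, "high": 500}
census_shares = {"low": 0.25, "middle": 0.45, "high": 0.30}
weights = poststratification_weights(panel_counts, census_shares)
```

In this made-up example each low-income household counts 2.5 times and each high-income household 0.6 times. Weighting restores the aggregate shares, but it cannot add information: estimates for an underrepresented group still rest on the small number of households actually observed in it.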

The household scanner panels are not designed to collect much information on the households selected for the sample. Some basic demographic information is collected, but it is not very detailed. No information is collected on health, physical activity, or diet and health knowledge. Although data on employment status, total household income, and vehicle ownership are collected, information about assets, sources of income, and participation in food assistance programs is not collected.

One general limitation of point-of-sale scanner data and household scanner data is that the UPCs do not always clearly identify items. The core of each code is a 10-digit number that is intended to be a universal guide to products sold (the printed 12-digit UPC-A symbol adds a leading number-system digit and a trailing check digit). The first five digits for an item are assigned by the centralized Uniform Code Council and identify the manufacturer, and the last five digits are assigned by the corporations that make the product. Guidelines are given to corporations to help them assign the last five digits, but there is evidence that these guidelines are not necessarily followed and that codes change for some products. This makes it harder to place products into specific categories accurately. For example, Mladenic, Eddy, and Ziolko (2001), in an analysis of more than 280,000 UPCs for grocery products, found that 44 different codes were used for fresh 2 percent milk. Thus, any effort to identify specific products consumed would need to work through potentially difficult coding issues.
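
To make the coding problem concrete, the following sketch splits a code into its parts, validates it, and counts how many distinct codes map to one product category. It assumes the 12-digit UPC-A layout (a number-system digit, five manufacturer digits, five item digits, and a check digit), which goes slightly beyond the description in the text. The function names are hypothetical, and the crosswalk entries are made-up placeholders, not real product codes.

```python
from collections import defaultdict


def upc_check_digit(first_eleven):
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first_eleven]
    # Counting positions from 1, odd positions carry a weight of 3.
    odd_sum = sum(digits[0::2])
    even_sum = sum(digits[1::2])
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def split_upc(upc):
    """Split a 12-digit UPC-A string into its assumed components."""
    if len(upc) != 12 or not (upc.isascii() and upc.isdigit()):
        raise ValueError("expected exactly 12 digits")
    return {
        "number_system": upc[0],
        "manufacturer": upc[1:6],  # five digits assigned centrally
        "item": upc[6:11],         # five digits assigned by the manufacturer
        "check_digit": upc[11],
        "valid": upc_check_digit(upc[:11]) == int(upc[11]),
    }


def codes_per_category(crosswalk):
    """Count distinct UPCs mapped to each analyst-defined category."""
    grouped = defaultdict(set)
    for upc, category in crosswalk.items():
        grouped[category].add(upc)
    return {category: len(upcs) for category, upcs in grouped.items()}


parts = split_upc("036000291452")

# Placeholder codes and categories, for illustration only.
example_crosswalk = {
    "000000000017": "milk_2pct",
    "000000000024": "milk_2pct",
    "000000000031": "orange_juice",
}
counts = codes_per_category(example_crosswalk)
```

The structural split is mechanical, but the crosswalk from codes to product categories is not: it has to be built and maintained by hand or by a classifier, which is where the difficulty described above arises.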

Future Potential

Despite the limitations of the retail and household panel scanner datasets, many researchers, both in USDA and in academic and private research organizations, have begun to exploit scanner data because they provide such extensive detail on food products, quantities purchased, and prices. None of the food and nutrition-related datasets produced by the federal government can match the scanner datasets on this type of content, nor on the timeliness with which they are produced. Because of these advantages, it is hard to exclude them as potential sources of information for USDA policy and decision making.

Before placing significantly greater reliance on scanner data, however, additional work must be done to examine the characteristics and representativeness of the population covered by the data and other possible sources of bias (see Kirlin and Cole, 1999). If the research on data quality supports the usefulness of scanner data, they could be drawn on to examine a wide range of issues.

Specifically, the retail scanner datasets could be used to examine short-run and long-run aggregate market trends. They could also be used to compare aggregate totals on food purchases with other sources of data on food expenditures—for example, from the CE survey and from the national food disappearance data, which measure the flow of raw and semiprocessed foods through the U.S. marketing system.³ In addition, the retail scanner datasets could be linked with data on club card members to obtain some very basic information on the households that purchase the goods.

³	The USDA “national disappearance” estimates, produced annually, provide estimates with a 2-year lag of commodities that are available for food purchase and consumption. They are developed on the basis of production estimates adjusted for inventory changes, exports, imports, and nonfood uses (see www.ers.usda.gov/Data/FoodConsumption/FoodAvailDoc.htm [June 2005]).

The household panel scanner datasets could be used to understand short-run and long-run trends in foods consumed by households and the relationship between price and consumption. The level of detail on products purchased could allow for analysis of consumption trends when new products are introduced or when minor changes to products are made. The household scanner data could also be used to understand aggregate changes in purchases in relation to changes in policies on healthy eating, such as changes in the food pyramid guidelines that were announced by the USDA in January 2005, or to food safety recalls.

If information on participation in food assistance programs (for example, food stamps, WIC, school breakfast and lunch) could be added to household scanner data, the augmented datasets could be used to track and compare expenditures of food assistance program recipients and of nonrecipients with similar incomes. With augmented household scanner data it might also be possible to address such questions as why the participation rates among the eligible population in the Food Stamp Program plummeted in the 1990s. Was this phenomenon a by-product of the expanding economy and welfare reform, or was it due to changes in food preparation and consumption behavior or both? Specifically, with the rise of labor force participation rates among women (both single and married) over the past decade, the time that is available to prepare foods for home consumption has declined, and major grocery stores have significantly expanded the quality and quantity of prepared food items. However, one cannot use food stamps to purchase prepared foods. Is part of the low rate of food stamp use a by-product of the fact that families have less time to prepare foods and that grocery stores provide attractive alternatives not available to food stamp recipients?

HOUSEHOLD FOOD CONSUMPTION SURVEYS

Two other major food consumption surveys are conducted by the NPD Group, a sales and marketing research firm. The National Eating Trends (NET) Survey obtains food intake data from a nationally representative sample, and the Consumer Report on Eating Share Trends (CREST) collects information from a large online sample of consumers on their purchases of prepared meals and snacks at commercial restaurants and other outlets. Both of these datasets are used in analyses by firms interested in food market trends.

The NET survey has been conducted since March 1980. Over the course of a year, 2,000 households record diaries of food and beverage consumption for 14 consecutive days for all individuals in the household. The survey questionnaire and diary are mailed to 60 new households every Monday. Data are usually processed and available for analysis within three months of collection.

In addition to the food intake diaries, the NET survey collects information on the types and brands of foods consumed, how they were prepared and served, the ingredients used in home-prepared meals, and who in the household consumed them. Information is obtained on whether the respondents were on a diet during the 14-day period and which type of diet they were on, whether they have any medical conditions, their height and weight, supplement use, exercise level, and attitudes on nutrition. Some demographic information is also obtained on respondents.

The CREST survey is an online survey that collects information from 3,000 adults and 500 teenagers on a daily basis (42,000 responses per month). Survey respondents are asked to report what they ate, where they purchased it, where they ate it, who they were with, and how much they spent for food at commercial outlets the day before the survey. The survey also includes behavioral and attitudinal questions.

These surveys collect unique information that could be useful in a number of policy environments. The CREST survey’s focus on food eaten away from home could fill gaps left by other surveys in what is known about eating out. However, because the survey is conducted online, it does not cover people without Internet access. Thus, these data may not be useful for low-income or elderly populations. The survey also has a low response rate, typically just over 40 percent.

The NET data are unique in providing 14 days of dietary recall, which is an extraordinary amount of information on food intake that is not matched in any other dataset. This information could be used to provide more stable estimates of consumption of different types of food than the two-day recalls from NHANES. It might also be useful for estimating consumption of foods that are eaten less regularly, which may be critical for certain food safety risk assessments. Information about preparation of food and ingredients used could also be used in food safety risk assessments. The other key attribute of these data is that they include information on attitudes towards food and dieting practices. This information, if released in a timely manner, could be useful in picking up market trends related to dieting practices. For example, the recent popularity of the Atkins and related diets is believed to have had large effects on major food purchases, such as meat, grains, and fruits. Timely information about dieting practices might be useful for analyses of these trends.
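
The claim that 14 days of records yield more stable estimates than two days can be illustrated with the standard error of a mean, which shrinks with the square root of the number of days averaged. The sketch below assumes that a person's daily intakes are independent draws with a common standard deviation. That is a simplifying assumption, not something established in this report, and the intake figure is hypothetical; if intakes on nearby days are correlated, the actual gain will be smaller.

```python
import math


def standard_error_of_mean(day_to_day_sd, n_days):
    """Standard error of a person's mean daily intake over n_days,
    assuming independent days with a common standard deviation."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    return day_to_day_sd / math.sqrt(n_days)


# Hypothetical day-to-day standard deviation of intake (grams per day).
sd = 60.0
se_two_day = standard_error_of_mean(sd, 2)
se_fourteen_day = standard_error_of_mean(sd, 14)
precision_gain = se_two_day / se_fourteen_day  # equals sqrt(14 / 2)
```

Under these assumptions, the 14-day mean is about 2.6 times as precise as the two-day mean, provided that reporting quality holds up over the longer period.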

Of greatest concern with the NET data is the quality of data collected through the 14 days of dietary recalls. Since this amount of recall places significant time and recall burdens on respondents, the quality of the data may suffer. This issue would need careful scrutiny before basing important public policy decisions on results from NET-based analyses.