Annex G:
Issues Related to Sampling

BACKGROUND:

Number of Phase II Awards (1992–2000): Until we receive and integrate all databases, we do not know this number exactly. Combining data from the SBA web site (1997-2000) and SBA published reports for 1992, 1993, and 1994, and extrapolating from DoD data for 1995 and 1996, I estimate this number to be about 10,800. Based on the three published reports, about 7 percent of these Phase II awards are from the smaller agencies. Thus, if we consider only the big five agencies, 10,000 is a good approximation.
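The arithmetic behind the 10,000 working figure can be checked in a few lines. This is only a sketch using the estimates quoted above; the variable names are illustrative and not part of any project database.

```python
# Estimates quoted in the text; the variable names are illustrative only.
total_phase2_estimate = 10_800   # estimated Phase II awards, 1992-2000
small_agency_share = 0.07        # share awarded by agencies outside the big five

# Awards remaining once the smaller agencies are set aside.
big_five_awards = total_phase2_estimate * (1 - small_agency_share)
print(round(big_five_awards))
```

The result is about 10,044, which is why 10,000 is used as a round number for the big five agencies.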

Number of Awards per Firm: Until we receive and integrate all databases, we do not know how many firms have only one Phase II award, or two, or three, etc. Thus I must estimate how many surveys the following approach will generate.

Existing Commercialization Data: DoD has data by project for 10,372 Phase II projects. (This includes projects from 1983 to 1991 and 2001.) Since 1999, firms that have submitted SBIR or STTR proposals to DoD have had to enter firm information and information on sales and investments for all of the Phase II awards that they received, regardless of awarding agency. As a percentage of the Phase II awards made by each agency from 1992 to 2000, we have data on approximately 75 percent of DoD, 67 percent of NASA and DOE, 54 percent of NSF, and 16 percent of NIH/HHS Phase II awards. DOE has provided commercialization data by product, which cannot be directly associated with projects due to double counting. NASA has collected data by project, which could be very useful to our examination of NASA.

Proposed New Commercialization Database: We may set up a database comparable to the DoD one, to collect initial data from firms not in the DoD Commercialization Database. The Commercialization Database includes substantial information about the firm, which will then not have to be collected on the firm survey. It provides a broad overview of all projects. This allows us to survey a sample rather than 100 percent of projects, yet still have info on a high percentage of projects and firms. It also reduces the chance that we will miss any high-performing projects when we sample.

Addresses: The use of a commercialization database ensures we have a point of contact, phone number, and email address, which is important, if not essential, to executing a good online survey.

SAMPLING APPROACH:

I propose several different samples described below.

Random Sample. After integrating the 10,000 awards into a single database, I will generate a random sample of some percent of the awards (for example, 20 percent) for each of the years; e.g., 20 percent of the 1992 awards, etc. Generating the total sample one year at a time will provide a balanced sample.
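A year-by-year draw of this kind might look like the sketch below. It assumes each award is a record with a `year` field; the record layout, the function name `sample_by_year`, and the toy data are hypothetical and are not taken from the integrated database.

```python
import random
from collections import defaultdict

def sample_by_year(awards, fraction=0.20, seed=0):
    """Draw `fraction` of the awards separately within each award year."""
    rng = random.Random(seed)          # fixed seed so the draw is repeatable
    by_year = defaultdict(list)
    for award in awards:
        by_year[award["year"]].append(award)
    sample = []
    for year in sorted(by_year):
        group = by_year[year]
        k = round(len(group) * fraction)   # awards to draw for this year
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical toy data: 100 awards per year, 1992 through 2000.
awards = [{"id": f"{year}-{i}", "year": year}
          for year in range(1992, 2001) for i in range(100)]
sample = sample_by_year(awards)
print(len(sample))
```

Drawing within each year, rather than from the pooled list, keeps every award year represented at the same rate.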

Random sample by agency. I would then group by agency and randomly select a few more as required to ensure each agency has at least 20 percent of its awards surveyed.
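The agency top-up step could be sketched as follows. The `agency` and `id` fields, the function name `top_up_by_agency`, and the toy data are assumptions made for illustration; integer arithmetic is used for the 20 percent floor to avoid rounding surprises.

```python
import random
from collections import defaultdict

def top_up_by_agency(awards, sampled_ids, min_percent=20, seed=0):
    """Add randomly chosen awards until each agency reaches min_percent coverage."""
    rng = random.Random(seed)
    selected = set(sampled_ids)
    by_agency = defaultdict(list)
    for award in awards:
        by_agency[award["agency"]].append(award["id"])
    for agency in sorted(by_agency):
        ids = by_agency[agency]
        needed = (len(ids) * min_percent + 99) // 100   # ceiling of the target
        have = sum(1 for i in ids if i in selected)
        if have < needed:
            pool = [i for i in ids if i not in selected]
            selected.update(rng.sample(pool, needed - have))
    return selected

# Hypothetical toy data: the year-based sample happened to miss NSF entirely.
awards = ([{"id": f"DoD-{i}", "agency": "DoD"} for i in range(10)]
          + [{"id": f"NSF-{i}", "agency": "NSF"} for i in range(5)])
initial = {"DoD-0", "DoD-1"}
selected = top_up_by_agency(awards, initial)
print(sorted(selected))
```

Awards already drawn in the year-based sample are kept, so the top-up only ever adds surveys.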

Top Performers. From the Commercialization database, we would identify the top projects in sales and investment. (Since the current DoD Commercialization data include 10,372 projects, they give us an approximation of how many projects this would entail.) If we select all projects that had at least $5,000,000 in sales or at least $5,000,000 in investment, this would entail about 385 projects.
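The selection rule is a simple either-or threshold, as in this sketch. The `sales` and `investment` field names and the sample records are hypothetical; only the $5,000,000 threshold comes from the text.

```python
THRESHOLD = 5_000_000  # dollars; the cutoff proposed in the text

def top_performers(projects, threshold=THRESHOLD):
    """Keep projects meeting the threshold in sales OR in investment."""
    return [p for p in projects
            if p["sales"] >= threshold or p["investment"] >= threshold]

# Hypothetical records for illustration only.
projects = [
    {"id": "A", "sales": 6_000_000, "investment": 0},
    {"id": "B", "sales": 100_000, "investment": 7_500_000},
    {"id": "C", "sales": 4_999_999, "investment": 250_000},
]
selected = top_performers(projects)
print([p["id"] for p in selected])
```

Project C is excluded because neither figure reaches the threshold on its own; the rule does not add sales and investment together.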

100 percent for Firms with a Small Number of Projects. I would like to survey 100 percent of the projects that went to firms with only one or two awards (perhaps three). I would estimate that about a third of the 10,000 awards went to firms with two or fewer awards. (This is based on data from 1983 to 1993, which show that 2/3 of all Phase II awards went to firms with four or fewer awards, with a roughly exponential distribution in which firms with a single award were most common, followed by firms with two, etc.) These are the hardest firms to find; address information is perishable, thus the response rate is much lower. We usually have good address info for multiple winners, thus a much higher level of response.
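Identifying these awards requires a count of awards per firm, which might be done as in the sketch below. The `firm` field, the function name `small_firm_awards`, and the toy data are assumptions; the cutoff is a parameter because the text leaves open whether it should be two or three.

```python
from collections import Counter

def small_firm_awards(awards, max_awards=2):
    """Return every award belonging to a firm with at most max_awards awards."""
    counts = Counter(a["firm"] for a in awards)
    return [a for a in awards if counts[a["firm"]] <= max_awards]

# Hypothetical toy data: firm X has 1 award, Y has 2, Z has 5.
awards = ([{"id": "X-0", "firm": "X"}]
          + [{"id": f"Y-{i}", "firm": "Y"} for i in range(2)]
          + [{"id": f"Z-{i}", "firm": "Z"} for i in range(5)])
selected = small_firm_awards(awards)
print([a["id"] for a in selected])
```

Raising `max_awards` to 3 would apply the "perhaps three" variant without other changes.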



Coding: The database will track which sample(s) each survey belongs to. A randomly sampled project could also be a top performer from a firm that had only two awards. It could thus be coded as random sample for the program, random sample for the awarding agency, top performer, and 100 percent of single or double winners. The database itself can group surveys that came from multiple winners once we establish how many awards we use as a cutoff for that designation.

How many surveys: I estimate that if the random sample were 20 percent, this approach would generate about 5,000 to 5,500 project surveys and about 3,000 firm surveys, assuming each firm that received at least one project survey also received a firm survey. Although we would be sampling over half of the awards, firms that had many awards would have surveys on slightly over 20 percent of their awards. The response rate depends on how much effort is spent before the survey in ensuring good addresses (do we create the new commercialization database?) and how many follow-ups and phone calls we make to people who do not respond. A representative of one agency mentioned that his survey had a 70-80 percent response rate, but until he began phone calls that rate was 15 percent.
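A rough back-of-the-envelope check of the project survey count is sketched below. It uses only figures quoted in this annex, plus two simplifying assumptions that are not in the original: that all top performers belong to firms with more than two awards, and that the random draw is independent of top-performer status. The agency top-up is ignored.

```python
# Figures quoted in the annex.
total_awards = 10_000
small_firm_awards = total_awards / 3   # about a third, surveyed at 100 percent
random_rate = 0.20                     # example random sampling rate
top_performers = 385                   # projects at or above the $5M thresholds

# Random sample drawn from awards not already covered by the small-firm rule.
random_from_rest = (total_awards - small_firm_awards) * random_rate

# ASSUMPTION: top performers all sit in multi-award firms and are hit by the
# random draw at the same 20 percent rate, so 80 percent are new surveys.
extra_top = top_performers * (1 - random_rate)

estimate = small_firm_awards + random_from_rest + extra_top
print(round(estimate))
```

Under these assumptions the total comes out just under 5,000, at the low end of the range given above; agency top-ups or a three-award cutoff for the 100 percent group would push it higher.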