Options for the Use of Sampling to Collect Merchandise Trade Data
This appendix discusses three major options for using sampling techniques for the compilation of U.S. merchandise trade data: cutoff sampling of low-value trade documents, probability sampling of documents, and collection of data from a sample of enterprises or establishments. The three options are not necessarily mutually exclusive: it would be possible, for example, to use a combination of cutoff and probability sampling of trade documents; another possibility would be to use an establishment sample to produce preliminary estimates of a few monthly aggregates, while continuing to rely on the present system for detailed data.
The first two options relate solely to the production of merchandise trade data based on the official import and export documents. The third option, collection of data from enterprises, has broader implications. This method could either supplement or replace the production of merchandise trade data based on the official documents. It also has the potential for collecting not only merchandise trade data, but also some kinds of services trade data from the same sample of enterprises.
CUTOFF SAMPLING OF LOW-VALUE TRADE DOCUMENTS
Cutoff sampling of low-value trade documents is currently used by the Census Bureau in compiling monthly merchandise trade data. The method consists of capturing the desired information from all import and export transactions for which the value equals or exceeds specified dollar cutoffs. Transactions below the cutoffs are not processed, but the Census Bureau publishes monthly estimates of the total value of the excluded low-value shipments by country (although not by commodity), using multiplicative estimation factors based on historical data.
For exports, the cutoff is $2,500 (raised from $1,500 as of October 1989 data) and shippers' export declarations (SEDs) are no longer required for transactions of less than that amount. For exports to Canada, the SED filing requirement has recently been eliminated for all except a few regulated transactions because, under the U.S.-Canada data exchange agreement, the United States is now relying on Canadian import data to report U.S. exports to Canada, just as Canada is using U.S. import data to report Canada's exports to the United States.
For imports, the current cutoffs are $250 for selected commodities for which there are import quotas and $1,250 for all others. (The latter value was raised from $1,000 as of October 1989.)
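The cutoff rule and the factor-based estimate of excluded low-value trade can be sketched as follows. This is a minimal illustration and not the Census Bureau's actual procedure: the function name, the record layout, and the factor value of 0.03 are hypothetical, and the sketch assumes that the historical factor is applied to each country's above-cutoff total.

```python
from collections import defaultdict

EXPORT_CUTOFF = 2500  # dollars; the export cutoff cited in the text


def tabulate_with_cutoff(transactions, cutoff, low_value_factors):
    """Sum values at or above the cutoff by country, then attach a
    factor-based estimate of the excluded low-value shipments.

    transactions: iterable of (country, value) pairs (hypothetical layout)
    low_value_factors: hypothetical mapping of country -> factor, assumed
        to have been derived from historical data
    """
    captured = defaultdict(float)
    for country, value in transactions:
        if value >= cutoff:  # "equals or exceeds" the cutoff
            captured[country] += value

    result = {}
    for country, total in captured.items():
        factor = low_value_factors.get(country, 0.0)
        result[country] = {
            "captured": total,
            "estimated_low_value": total * factor,
        }
    return result


# Illustrative data only.
sample = [("MX", 2500), ("MX", 1200), ("MX", 7500), ("JP", 400)]
print(tabulate_with_cutoff(sample, EXPORT_CUTOFF, {"MX": 0.03}))
```

In this sketch a country whose shipments all fall below the cutoff ("JP" in the illustrative data) does not appear in the output at all, because there is no captured total to which a factor could be applied.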
ADVANTAGES OF CUTOFF SAMPLING
An obvious advantage of cutoff sampling is that it is the present system: it does not require any resources for testing and development. The sampling process for the documents that are processed manually is straightforward: documents with only low-value transactions are set aside; if a document has both types of transaction, data are keyed only for those above the cutoffs. Since the main costs are for keying, savings are roughly proportional to the number of transactions below the cutoff figures. Significant savings in manual processing costs can be realized by raising the cutoff point, especially for exports, for which about 60 percent of the total value of exports comes from transactions that are not being reported electronically. (Data exchange with Canada covers about 20 percent of total value of exports; the other 20 percent is reported electronically.) For import and export transactions reported electronically, the “sampling” is even simpler: it consists of computer instructions to omit any transactions below the cutoffs from all tabulations. For exporters, the cutoff system reduces the reporting burden because they are not required to file SEDs for transactions below the cutoff (except for certain controlled commodities).
Last, but not least, for transactions above the cutoff figures, there are no limits on the amount of detail that can be tabulated (except as required by the confidentiality policy that groups together export data for detailed commodities if publication of data might reveal individual company activities). This advantage is somewhat illusory because for some cells the excluded (below cutoff) transactions constitute a substantial part or even all of the true total.
DISADVANTAGES OF CUTOFF SAMPLING
The use of cutoff sampling means that there is no direct information available on transactions below the cutoff point (except for exports to Canada, for which some information on transactions as low as about $770 in value is available as a result of the U.S.-Canada data exchange). As of October 1989, the Census Bureau estimated that low-value shipments accounted for about 0.7 percent of the value of imports and 2.9 percent of the value of exports. The effect of their omission on highly aggregated data is fairly small, but some of the thousands of cells of data published monthly are subject to much larger effects. Adams (1989) stated that the increase in the export cutoff (from $1,500 to $2,500) would lose data for approximately 15,000 (11 percent) of the commodity by country by method of transportation data cells. At the individual country level, the increase would lose 50 percent or more of the detail for six of our smallest trading partners. The proportion of the value of air shipments lost by raising the cutoff was about 5 times the proportion lost for vessel shipments (Dickerson, 1989).
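The kind of cell-loss count cited above can be illustrated with a short sketch. It is not the method used in the studies cited; the function name, record layout, and sample values are hypothetical. A cell is treated as lost when it has at least one transaction at or above the old cutoff but none at or above the new one.

```python
from collections import defaultdict


def cells_lost(transactions, old_cutoff, new_cutoff):
    """Return the cells that show data under old_cutoff but not under new_cutoff.

    transactions: iterable of (commodity, country, transport, value) tuples
        (hypothetical layout)
    """
    # Track the largest transaction value seen in each cell.
    max_value = defaultdict(float)
    for commodity, country, transport, value in transactions:
        cell = (commodity, country, transport)
        max_value[cell] = max(max_value[cell], value)

    visible_before = {c for c, v in max_value.items() if v >= old_cutoff}
    visible_after = {c for c, v in max_value.items() if v >= new_cutoff}
    return visible_before - visible_after


# Illustrative data only.
data = [
    ("0101", "FR", "air", 1800),
    ("0101", "FR", "air", 2200),
    ("0202", "FR", "vessel", 9000),
    ("0303", "TD", "air", 1000),
]
print(cells_lost(data, old_cutoff=1500, new_cutoff=2500))
```

Cells that were already invisible under the old cutoff, such as the last record in the illustrative data, are not counted as newly lost.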
Except for exports to Canada, the monthly estimates of low-value shipments by country are obtained by multiplying the current figures for transactions above the cutoffs by factors derived from earlier data. For exports, at least, there is no reliable way to evaluate the accuracy of these estimates because no documents are available for the transactions they include. (Theoretically, it would be possible to require exporters to periodically file SEDs for low-value shipments, but this would be operationally complex and could absorb much of the savings from using cutoff sampling.) A Census Bureau memorandum (Huang, 1987) states that “the validity of the estimated low value factors is in doubt.”
The use of cutoff sampling provides exporters with an opportunity to escape the SED filing requirement. For example, an exporter with a shipment valued at $6,000 could treat it as three separate shipments valued at $2,000 each. We have seen no evidence, however, that this is a common practice.
The potential and actual savings from the use of cutoff sampling are much greater for transactions reported on paper documents than they are for those reported electronically. Future costs of compiling merchandise trade data based on cutoff samples are difficult to predict because they are subject to opposing trends. Expected increases in the number of foreign trade transactions could lead to substantial increases in the cost of processing the documents if there were no change in the proportion of transactions covered by electronic reporting. However, there are three factors that can limit such cost increases or even lead to reductions: the substitution of foreign import data for U.S. export data; increases in the proportion of transactions reported electronically; and increases in the cutoff point.
Substitution of Canadian data on imports from the United States for U.S. data on exports to Canada began in January 1990. As a result, there is now no need to process paper documents for transactions covering about 20 percent of the value of U.S. exports. Similar substitution arrangements with other trading partners, including the European Community, are under consideration but the prospects of implementation within the next 5 years appear slight. The arrangement with Canada resulted from several years of negotiation and development work, and arrangements with other countries would require solutions to problems that did not exist for U.S. exports to Canada, such as shipments through intervening countries and elapsed time between date of export and date of import.
According to the Census Bureau, as of early 1991 electronic reporting covered about 78 percent of the total value of imports and about 40 percent of total value of exports. The prospects for increasing the proportion for imports are said to be good: there are some effective incentives for importers to shift to electronic reporting. For exports, the prospects will be good if exporters are provided incentives to shift to electronic reporting.
As has happened from time to time in the past, the cutoffs can be increased again in the future. Although there have been many studies to determine the effects on the data of proposed increases in cutoffs, there are no firm criteria used to determine how far the cutoffs can be raised. Decisions seem to be driven primarily by budgetary considerations.
PROBABILITY SAMPLING OF TRADE DOCUMENTS
In general terms, probability sampling requires that all units belonging to a defined universe have a known, nonzero chance of selection. Large units, if desired, can be given a selection probability of one. Probability sampling of foreign trade documents or transactions could be applied to all official documents submitted by U.S. importers and exporters or it could be applied to some subset, such as all transactions valued at or above specified cutoffs. The latter approach would be a combination of probability and cutoff sampling.
Starting in 1953 and continuing until 1982 for imports and 1985 for exports, the Census Bureau selected probability samples of low-value import and export transactions, processed these transactions, and used the data in its monthly estimates. Cutoffs and sampling fractions were changed frequently during this period. These sampling procedures were dropped and replaced by cutoff sampling primarily because funds available for processing paper documents were insufficient to process the increasing number of documents received each month.
ADVANTAGES OF PROBABILITY SAMPLING
As is the case for cutoff sampling, savings can be realized by processing only a probability sample of documents. If the sample selection and estimation procedures are simple, the savings can be roughly proportional to the reduction in the number of documents processed; if they are complex, the savings will be less. The size of the samples required to obtain sufficiently reliable estimates is a key determinant of cost savings. Sample size requirements are primarily a function of how many distinct data cells must be estimated for each month or other time period.
The use of sampling may permit more timely publication of data and reduction of nonsampling error through more thorough review of the data for sample transactions. However, these results are not guaranteed: they depend on the particular sampling plan and operational procedures used. The sampling errors resulting from the use of sampling can be estimated directly from the sample data, unlike the errors of estimates for transactions below the cutoff, which are difficult to determine with any degree of confidence.
DISADVANTAGES OF PROBABILITY SAMPLING
Probability sampling introduces some requirements that do not exist when cutoff sampling is used. An estimation procedure must be used, which usually involves weighting the data for transactions in the sampled portion of the universe. To comply with Census Bureau standards and generally accepted procedures, sampling errors must be estimated periodically and the estimates of sampling error included in publications or made available to data users by other means.
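The weighting and sampling-error requirements can be made concrete with a minimal sketch. It assumes stratified simple random sampling without replacement, which is only one of many possible designs and is not described in the text; the function name and the stratum values are hypothetical. A stratum sampled with probability one contributes to the total but not to the sampling error.

```python
import math
from statistics import mean, variance


def stratified_total(strata):
    """Estimate a total and its standard error from a stratified simple
    random sample drawn without replacement.

    strata: list of (N_h, sampled_values), where N_h is the number of
        transactions in the stratum and sampled_values are the values
        observed for the sampled transactions in that stratum.
    """
    total = 0.0
    var = 0.0
    for N_h, values in strata:
        n_h = len(values)
        if n_h == 0 or n_h > N_h:
            raise ValueError("each stratum needs between 1 and N_h sampled units")
        # Each sampled unit carries a weight of N_h / n_h.
        total += N_h * mean(values)
        if n_h == N_h:
            continue  # certainty stratum: no sampling error
        if n_h < 2:
            raise ValueError("at least 2 sampled units are needed to estimate variance")
        fpc = 1.0 - n_h / N_h  # finite population correction
        var += N_h ** 2 * fpc * variance(values) / n_h
    return total, math.sqrt(var)


# Illustrative data only: a certainty stratum of 3 large transactions and
# a stratum of 100 smaller transactions from which 4 were sampled.
estimate, std_error = stratified_total([
    (3, [50000, 80000, 120000]),
    (100, [3000, 4000, 5000, 4000]),
])
print(estimate, std_error)
```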
The main disadvantage, however, is the introduction of sampling error. For data cells or estimates based on large samples, this is a minor problem, and nonsampling errors are often much larger than sampling errors. About 90 percent of the detailed data cells for exports, however, are based on five or fewer transactions (Bureau of the Census, 1983). For these cells, any use of sampling would introduce high and probably unacceptable levels of sampling error. Many nonzero cells would have no sample transactions and would have to be shown in publications as estimated zero cells.
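The problem of nonzero cells appearing as estimated zeros can be quantified under a simple assumption. The sketch below assumes that each transaction is selected independently with the same probability (Bernoulli sampling); the 50 percent rate is purely illustrative and the function name is hypothetical.

```python
def prob_cell_unsampled(n_transactions, sampling_fraction):
    """Probability that none of a cell's transactions is selected, assuming
    each transaction is sampled independently with the same probability."""
    if not 0.0 <= sampling_fraction <= 1.0:
        raise ValueError("sampling_fraction must be between 0 and 1")
    return (1.0 - sampling_fraction) ** n_transactions


# Cells with 1, 2, and 5 transactions under an illustrative 50 percent rate.
for n in (1, 2, 5):
    print(n, prob_cell_unsampled(n, 0.5))
```

Under these assumptions a cell with a single transaction would be shown as zero half of the time, which illustrates why small cells cannot be estimated reliably from a sample.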
OPTIONS FOR PROBABILITY SAMPLING OF TRANSACTIONS
There are many different ways in which probability sampling could be introduced (or reintroduced) into the compilation of merchandise trade statistics. To narrow the possibilities, we eliminate several options:
Sampling of transactions that are reported electronically. Although some savings might accrue from sampling these transactions, the savings are likely to be quite small in relation to what could be saved by reducing the number of paper documents to be processed.
Sampling of import transactions not reported electronically. The Census Bureau has set a target for raising the level of automated reporting of imports to 90 percent of total value by 1991 (Adams, 1989). The costs of setting up and operating a sampling procedure for the remaining 10 percent would probably use up a substantial part of the potential cost savings.
Reintroduction of sampling of export transactions below the present cutoff without any sampling of transactions above the cutoff. This option would increase the reporting burden on exporters and would increase overall processing costs. Even if funds were available, they might better be used for other purposes.
Sampling paper export documents to produce more timely preliminary estimates of the key merchandise trade statistics, but processing all documents, possibly on a delayed schedule, to eventually produce the present level of detail. This option would add substantially to the present cost of the program.
Elimination of these options leaves the possibility of processing only a sample of the paper documents for export transactions, including some that are valued at or above the present $2,500 cutoff. If the goal of the program is to continue to produce monthly data by commodity, country, and U.S. customs district at the present level of detail, this is not a viable option. With about 90 percent of the nonzero data cells having five or fewer shipments and about 75 percent having only one or two shipments (Bureau of the Census, 1983), the sampling errors for any kind of sample would clearly be unacceptable. Therefore, in the following discussion it is assumed that the data requirements can be changed to reduce the amount of detail produced on a monthly basis for exports. Such a reduction would not require any change in the current statute, which requires substantial monthly detail on imports but only one overall figure for exports.
There are two ways of changing the data requirements so that estimates of acceptable precision could be produced by sampling. One is to reduce the number of data cells estimated for a given month or other reference period. The other is to reduce the frequency with which estimates are published—to change the frequency from monthly to quarterly or annual for some cells. Some combination of these changes is possible. During a budget crisis in the early 1980s, Census explored the possibility of using a 50 percent sample of export documents for transactions valued at or above the then cutoff of $500 and below a series of upper limits ranging from $5,000 to $100,000 (Puzzilla, 1983). At that time, the proposed sampling scheme would have reduced the number of documents to be processed by more than 40 percent, even with an upper limit of $5,000. The only data cells evaluated in that study were totals for nine broad commodity groups and three methods of transportation. The differences between the universe totals and the sample estimates for these 12 cells were all acceptably small.
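The relationship between the sampled value band and the reduction in documents processed can be sketched as follows. The function name and the document values are hypothetical; only the 50 percent rate and the $500 and $5,000 limits are taken from the text.

```python
def expected_reduction(values, lower, upper, rate):
    """Expected fractional reduction in documents processed when documents
    valued in [lower, upper) are sampled at `rate` and documents valued at
    or above `upper` are all processed.

    values: document values; those below `lower` are already excluded by
        the cutoff and are ignored here.
    """
    eligible = [v for v in values if v >= lower]
    if not eligible:
        raise ValueError("no documents at or above the lower cutoff")
    in_band = sum(1 for v in eligible if v < upper)
    expected_processed = (len(eligible) - in_band) + rate * in_band
    return 1.0 - expected_processed / len(eligible)


# Illustrative data only: 85 documents inside the band, 15 above it.
docs = [1000] * 85 + [20000] * 15
print(expected_reduction(docs, lower=500, upper=5000, rate=0.5))
```

With a 50 percent rate, the reduction can exceed 40 percent only if more than 80 percent of the eligible documents fall inside the sampled band, as they do in the illustrative data.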
We can conclude at this point only that sampling non-Canadian paper export documents at a rate that would lead to a significant reduction in processing costs would provide monthly estimates of acceptable reliability for an unknown number, x, of data cells, with x lying somewhere between 12 and the more than 200,000 export data cells for which data are published (or made available in CD-ROM format) monthly. This is, of course, a wide range of uncertainty. To narrow it would require a more comprehensive version of the 1983 study undertaken by the Census Bureau. Such a study would not be unduly expensive and might be carried out with assistance from the Census Bureau's Statistical Research Division. There are several general considerations for designing such a study:
The study should provide information on the reliability of monthly, quarterly, and annual estimates for the same data cells. With fixed sampling fractions, the accumulation of sample data over more than one month can provide estimates of acceptable reliability for longer periods.
The study should be designed to provide data for several alternative sample designs with varying stratum definitions and sampling fractions. It is likely that any acceptable design would include a “certainty stratum,” that is, a category of large transactions that would be included in the sample with a probability of one.
One would expect the results of a new study to be somewhat better than those of the 1983 study because Canadian and automated transactions would not be subject to sampling. Sampling would apply only to the approximately two-thirds of transactions not in those categories.
There are two different approaches to undertaking a study of this kind. The 1983 study compared estimates from one particular sample with corresponding data items tabulated from all transactions. An alternative would be to develop estimates of within-stratum variances for several variables for all of the potential design strata. These estimated variance components could then be used to evaluate the expected levels of reliability for several alternative sample designs. The latter approach would be more flexible for examining several alternatives.
The evaluation of alternative designs should take into account the actual selection process that would be required for each. If a design is too complex, a substantial part of the cost savings could be used up in selecting the sample.
It might be desirable to undertake the study with a database for a period prior to October 1989, so that export transactions in the $1,500-$2,500 range could be included. The additional cost of sampling documents in this range at a low rate could be quite small, and including these documents in the sampled universe would eliminate some of the uncertainty associated with estimates based on historical factors for low-value shipments.
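The variance-component approach and the effect of accumulating sample data over several months can be sketched together. The sketch assumes stratified simple random sampling without replacement and, for periods longer than one month, independent samples from months with similar totals and variances; these assumptions, the function name, and all stratum figures are hypothetical.

```python
import math


def design_cv(strata, fractions, months=1):
    """Expected coefficient of variation of an estimated total for one design.

    strata: list of dicts with keys "N" (transactions per month), "mean"
        (mean value), and "var" (within-stratum variance of values)
    fractions: sampling fraction for each stratum; 1.0 marks a certainty stratum
    months: number of months accumulated, assuming independent monthly
        samples and similar months
    """
    total = 0.0
    var = 0.0
    for stratum, f in zip(strata, fractions):
        if not 0.0 < f <= 1.0:
            raise ValueError("sampling fractions must be in (0, 1]")
        total += stratum["N"] * stratum["mean"]
        # N^2 * (1 - f) * S^2 / n with n = f * N simplifies to N * (1 - f) * S^2 / f.
        var += stratum["N"] * (1.0 - f) * stratum["var"] / f
    return math.sqrt(months * var) / (months * total)


# Illustrative strata: many small transactions and a few large ones.
strata = [
    {"N": 10000, "mean": 4000, "var": 1_000_000},
    {"N": 500, "mean": 60000, "var": 400_000_000},
]
for name, fractions in {"A": [0.5, 1.0], "B": [0.25, 1.0]}.items():
    monthly = design_cv(strata, fractions)
    quarterly = design_cv(strata, fractions, months=3)
    print(name, round(monthly, 5), round(quarterly, 5))
```

Once the variance components have been estimated, any number of alternative stratum definitions and sampling fractions can be compared in this way without drawing a new sample for each design.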
On the basis of such a study, the Census Bureau could develop for discussion with data users one or more options for sampling non-Canadian paper export documents. For each option, the description would identify the levels of detail that could be published monthly, quarterly, and annually, the expected levels of reliability for each type of estimate, and the expected cost savings.
Panel members were somewhat surprised at the amount of manual handling required by the present system for processing paper documents and noted that there may be document processing technology available that could be applied to reduce the volume and cost of manual operations. There are at least two possibilities directly relevant to a decision on whether to use extensive probability sampling of non-Canadian paper export documents.
The first would be to redesign the SED to incorporate machine-readable codes to allow fully or partly automated sorting and batching of incoming documents consistent with the different kinds of treatment they require in the sample selection and data entry procedures. If this could be done, the next step, sample selection, could be at least partially automated. The second possibility addresses a more difficult problem, but one that would certainly justify some systems development effort: if more extensive sampling of paper documents is adopted, is there some relatively low-cost method by which the nonsample documents could be stored and indexed to permit on-demand special 100 percent tabulations of documents in small cells identified by variables such as country, detailed commodity, and method of transportation? If this type of system could be developed, special needs not covered by the sample-based routine tabulations could be met at a reasonable cost on a reimbursable basis.
SAMPLING OF ENTERPRISES AND ESTABLISHMENTS
Some kinds of data on merchandise trade can be collected from business enterprises or establishments. For about 30 years, establishments included in the Census Bureau's Annual Survey of Manufactures (ASM) have been asked to report what proportion of their total production (plant value) has been exported. The 1987 economic censuses included a question about total value of exports for all establishments, and the 1987 ASM included a question on what percentage of the cost of materials was accounted for by materials of foreign origin. The information reported by establishments has been used for various purposes, including estimation of the number of jobs in manufacturing accounted for by exports, by state, and, more recently, as one of the inputs to an exporter database being developed by the Foreign Trade Division.
Until now, only annual data on total exports have been collected from establishments. Could detailed monthly data on merchandise imports and exports be collected from a sample of enterprises or establishments to supplement or replace the data currently being compiled from official foreign trade documents? If this is technically possible, would it be desirable in the short run or in the long run? How would the costs of the trade statistics program be affected? These are difficult questions. The remainder of this section attempts to identify the main issues that must be considered in arriving at answers and to eliminate some of the less realistic options.
What kinds of information would one want to obtain from monthly samples of importers and exporters? Would one want to collect data from importers and exporters as defined in the present system based on official trade documents or from the ultimate consumers of imports and the original producers of exports? Preliminary information from linkages of SEDs with data from the 1987 economic censuses shows that most exporters of record are manufacturers or wholesalers; only about 2 percent are freight forwarders. Less is known about the characteristics of importers, but it is likely that a substantial proportion of the importers of record are customs brokers who complete the entry forms and pay the duties but do not necessarily know the ultimate destination of the goods. Thus, the kind of information that could be collected would depend on how importers and exporters were defined for enterprise surveys. From some points of view, it might be preferable not to define them exactly as they are in the present system.
How much detail could be collected from enterprises? To reproduce the full detail available from the present system of official trade documents, it would be necessary to ask a large sample of enterprises to report full data for each of their import and export transactions or, at best, to report combined data for transactions during the reference period for the same country, commodity, and method of transportation. Alternatively, establishments with large numbers of transactions could be asked to report data for only a sample of their transactions.
On the basis of these considerations, we can readily conclude that it would not be possible, with collection of data from a sample of enterprises, to produce monthly merchandise trade data at the present level of detail, largely for the same reasons that it would not be possible to do this with more extensive probability sampling of official trade documents. In addition, the reporting burden on the enterprises must be considered. Even to provide summary data on a monthly basis might turn out to be a costly and time-consuming effort for the sample enterprises and one they would be unwilling to undertake. (Under current law, Census Bureau establishment surveys that require more frequent than annual reporting cannot be made mandatory.)
What kinds of merchandise trade data could reasonably be produced by collecting data from a sample of enterprises? There are at least three possibilities. One would be the production of preliminary estimates of a few summary values significantly in advance of the release of the detailed data, which occurs about 45 working days after the end of the reference month. A second would be the production of final estimates with much less detail than is provided by the current system. If this were the only source of merchandise trade information, however, a change in the law might be necessary, at least with respect to the requirements for monthly data on imports. A third possibility would be the collection of information not readily available from the present system, such as information on the characteristics of importing and exporting establishments (for example, size and industry classification) and on the ultimate destination of imports. In addition, some information on trade in services is currently being collected by the Bureau of Economic Analysis in enterprise surveys: to the extent that the same enterprises engage in both merchandise and service trade, both kinds of data could be collected from the same sample units.
A long-range consideration is that the structure of international trade may change in ways that eliminate the need, for administrative purposes, for much of the information presently captured from official documents. If this happens, collection of data from enterprises may be the only way to obtain certain kinds of data.
Given preliminary agreement on the data requirements for surveys of importers and exporters, the next steps would include development of a sampling frame, design of a sample of enterprises and establishments to meet the survey goals, and the development and testing of suitable data collection instruments and procedures.
The development and maintenance of a sampling frame of importers and exporters is made easier by the addition of employer identification numbers (EINs) to the official trade documents starting in 1985. Computer files of import and export records can be matched to EIN records from the economic censuses and the Census Bureau's Standard Statistical Establishment List to produce listings of importers and exporters (with their names and addresses, establishment detail, and characteristics such as industry classification and volume of imports and exports) that could be used to design an efficient sample. Such a matching process, based on 1987 trade documents and the 1987 economic censuses, is in fact under way for exporters (Farrell, 1989). (However, no matching is being performed for importers.) For intercensal use as a sampling frame, the resulting exporter database would have to be updated annually on the basis of the ASM and the Annual Company Organization Survey. (The latter updates the information on establishments for multiestablishment companies.) The annual cost of updating is estimated to be $500,000 (Farrell, 1989). One problem with this procedure is incomplete reporting of EINs on the SEDs, which exceeded 20 percent for the 1987 trade records. This problem could be dealt with in various ways if the exporter database were funded as an ongoing activity.
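The EIN-based matching step can be illustrated in miniature. This sketch is not the Census Bureau's matching procedure: the function name, field names, and records are hypothetical, and real matching would also have to deal with multiestablishment companies and EIN reporting errors.

```python
def build_exporter_frame(trade_records, establishment_list):
    """Match trade records to an establishment list by EIN.

    trade_records: iterable of dicts with "ein" (may be None) and "value"
    establishment_list: dict mapping EIN -> dict of establishment attributes
    Returns (frame, unmatched_share), where frame maps each matched EIN to
    its attributes plus accumulated export value, and unmatched_share is
    the fraction of records with a missing or unmatched EIN.
    """
    frame = {}
    unmatched = 0
    n_records = 0
    for record in trade_records:
        n_records += 1
        ein = record.get("ein")
        info = establishment_list.get(ein) if ein else None
        if info is None:
            unmatched += 1  # EIN missing or not found on the list
            continue
        entry = frame.setdefault(ein, {**info, "export_value": 0.0})
        entry["export_value"] += record["value"]
    unmatched_share = unmatched / n_records if n_records else 0.0
    return frame, unmatched_share


# Illustrative records only.
records = [
    {"ein": "12-3456789", "value": 5000},
    {"ein": None, "value": 3000},
    {"ein": "12-3456789", "value": 7000},
    {"ein": "98-7654321", "value": 2600},
]
establishments = {"12-3456789": {"name": "Example Co.", "industry": "3571"}}
frame, share = build_exporter_frame(records, establishments)
print(frame, share)
```

The unmatched share returned by the sketch corresponds to the incomplete-EIN problem noted above; any frame built this way understates the exporters whose records could not be matched.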
To determine the optimum design for a sample of enterprises for any purpose, it is necessary to have information about their distribution by some measure of size, such as total value of imports and exports, that is related to the variables of interest. If the distribution is highly skewed (that is, if a few units account for a large part of the volume of imports and exports), reliable estimates for some variables could be developed using a fairly small sample of enterprises. However, no reliable data on the distribution of imports and exports by company or establishment are presently available. For exports, such data will become available when the exporter database linkages have been completed; no comparable data sources for imports are under development.
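Once size data become available, the degree of skewness could be summarized by counting how few of the largest units account for a given share of trade. The sketch below is illustrative only; the function name and the enterprise values are hypothetical.

```python
def units_needed_for_share(values, share):
    """Smallest number of the largest units whose combined value reaches
    the given share (between 0 and 1) of the overall total."""
    ordered = sorted(values, reverse=True)
    target = share * sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target:
            return count
    return len(ordered)


# Illustrative trade values for seven enterprises.
enterprise_values = [500, 300, 100, 40, 30, 20, 10]
print(units_needed_for_share(enterprise_values, 0.75))
```

If a small count covers most of the total, those units are natural candidates for selection with certainty, and the remaining units can be sampled at a low rate.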
Data collection procedures would require extensive testing. The survey objectives and data requirements would have to be specified in detail. A series of feasibility tests and pretests would be needed to find out whether respondents are able and willing to report the desired information.
Important policy issues would have to be confronted. Would response to the planned surveys be mandatory and, if so, would new legislation be needed? What limits would have to be imposed on the release of survey results in order to preserve the confidentiality of responses for individual enterprises and establishments? How would new surveys conducted by the Census Bureau be coordinated or integrated with surveys presently conducted by the Bureau of Economic Analysis on related topics?
A new establishment sample is clearly not a viable option in the short term to replace the present system of compiling merchandise trade data from official trade documents, even if the amount of detailed data required monthly could be reduced somewhat. Substantial time and resources would be needed to decide what kind of information and how much detail could and would be reported on a monthly basis by importers and exporters. In addition, since all other countries presently compile their foreign trade statistics from official documents, shifting to an establishment basis for U.S. foreign trade statistics would complicate data exchange arrangements (such as the one between the United States and Canada) and might make U.S. data less comparable with those of other countries.
There are some kinds of data relevant to foreign trade, however, for which enterprises and establishments are the logical source of information. Examples are the data on exports by manufacturers and wholesalers and costs of imported materials being collected in the economic censuses and the ASM and the data on import and export prices collected from a sample of establishments by the Bureau of Labor Statistics. There are other kinds of useful information that might best be collected in new establishment surveys. For example, most low-value international shipments by air are carried by a relatively small number of shippers. In view of the interest that has been expressed by specialized user groups for more complete coverage of air shipments, the Census Bureau might want to investigate the possibility of collecting some information about air shipments in sample surveys of shipping companies, which would include all of the large shippers. Because of the specialized user groups involved, it would be appropriate for the Census Bureau to make such an activity contingent on receipt of at least partial funding from users.