Appendix B
Routine External Evaluation Protocol
Dorfman (2017) proposes that the Routine External Evaluation Protocol (REEP) be used as part of external evaluation.1 He argues that, as demand for small-area estimation keeps growing, statisticians have a responsibility to develop a protocol that sets criteria for, and enables detection of, “a tipping point” at which estimates produced using small-area models can no longer be regarded as satisfactory. That point is particularly difficult to determine for groups of areas that have little or no sample and that may behave differently from larger areas.
For NASS, REEP would be based on comparing county model estimates with the corresponding direct sample-based estimates where the latter are based on a large enough sample. The procedure would be implemented by selecting a random sample of counties within which a supplementary sample of farms would be selected and used to prepare direct estimates.
___________________
1 If NASS decides to research the feasibility of the REEP described here, a number of questions will have to be answered before implementation:
- the procedure for selecting the random sample at Step 1, the number of counties needed, and the size of the sample selected within each county;
- specific procedure to amend the current adaptive design for data collection;
- specific method of evaluation and diagnostics, including developing formal criteria;
- estimated additional cost;
- estimated benefits from having the additional sample (beyond the goal of the model evaluation);
- possibility of evaluation (simulations?) based on historical data;
- whether it is possible to construct a procedure in such a way as to be able to modify the set of publishable counties based on the test results, and in the case where a model fails, whether it can still be applied to a subset of counties.
Step 1. Select at random a certain number of counties from the set of all counties that are expected to be estimated using models because the regular sample size does not meet publication standards.
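The selection in Step 1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that "does not meet publication standards" can be reduced to a single minimum sample size per county and that the selection is a simple random sample without replacement; the function name, the threshold, and the data layout are hypothetical and are not part of the protocol.

```python
import random


def select_evaluation_counties(regular_sample_sizes, publication_min, n_select, seed=None):
    """Draw a simple random sample of counties expected to need model estimates.

    regular_sample_sizes: dict mapping county id -> regular sample size.
    publication_min: hypothetical minimum sample size standing in for the
        publication standard (an assumption of this sketch).
    n_select: number of counties to draw for external evaluation.
    """
    # Counties whose regular sample falls short of the (assumed) standard.
    eligible = sorted(
        county for county, n in regular_sample_sizes.items() if n < publication_min
    )
    if n_select > len(eligible):
        raise ValueError("n_select exceeds the number of eligible counties")
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return sorted(rng.sample(eligible, n_select))


# Hypothetical county ids and sample sizes, for illustration only.
sizes = {"A": 10, "B": 40, "C": 5, "D": 20, "E": 35}
evaluation_counties = select_evaluation_counties(sizes, publication_min=30, n_select=2, seed=1)
```

A stratified or unequal-probability design could be substituted for the simple random draw; the choice of selection procedure is one of the open questions listed in the footnote.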
Step 2. From each county selected at Step 1, take a supplementary sample. This sample should be large enough so that even with nonresponse, the direct sample-based estimates meet publication standards. These direct estimates and their variances will serve to test the quality of the model estimates.
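As a rough illustration of sizing the supplementary sample in Step 2, the sketch below inflates a target number of respondents by an expected response rate and subtracts the units already in the main sample. The function name and all inputs are hypothetical, and a real design would also have to consider the county's population of farms and the variance target behind the publication standard.

```python
import math


def supplementary_sample_size(target_respondents, expected_response_rate, main_sample_size):
    """Extra units to sample so the expected respondent count reaches the target.

    target_respondents: respondents needed for a publishable direct estimate
        (hypothetical input).
    expected_response_rate: anticipated response rate, in (0, 1].
    main_sample_size: units already allocated to the county in the main sample.
    """
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Total sample needed so that, on average, enough units respond.
    total_needed = math.ceil(target_respondents / expected_response_rate)
    # Only the shortfall relative to the main sample is supplementary.
    return max(0, total_needed - main_sample_size)


# Illustrative values: 30 respondents wanted, 50% response expected, 20 units in the main sample.
extra_units = supplementary_sample_size(30, 0.5, 20)
```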
Remark 1: For the County Agricultural Production Survey (CAPS), the current sample design already attempts to sample enough units from every county (the target sample size for each county is 70 for each commodity, with the goal of having 30 positive reports or at least 25% coverage within the county). The adaptive design approach used by NASS to target counties for survey nonresponse follow-up may need to be modified to give priority to the counties selected for external evaluation.
Remark 2: In the presence of high nonresponse, the supplementary sample may also be useful in assessing nonresponse bias, by comparing the estimates in the selected counties with those in all other counties. Because the counties at Step 1 were selected at random, the average estimate that includes the supplementary sample can be compared with the average from the rest of the sample; if there is no bias due to nonresponse, the two should be close.
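One possible way to make the comparison in Remark 2 concrete is a two-group comparison of county-level estimates. The sketch below uses an unequal-variance (Welch-type) standardized difference; this particular statistic is an assumption of the sketch, since no specific test is prescribed here, and the function name and inputs are hypothetical.

```python
import math
from statistics import mean, variance


def compare_group_means(selected_estimates, other_estimates):
    """Return (difference in means, standardized difference) for two groups.

    selected_estimates: county estimates that include the supplementary sample.
    other_estimates: county estimates from the rest of the sample.
    Each group needs at least two values so a sample variance can be computed.
    """
    diff = mean(selected_estimates) - mean(other_estimates)
    # Standard error of the difference, allowing unequal group variances.
    se = math.sqrt(
        variance(selected_estimates) / len(selected_estimates)
        + variance(other_estimates) / len(other_estimates)
    )
    return diff, diff / se
```

A standardized difference near zero is consistent with little nonresponse bias; a large one suggests that the less complete responses in the other counties may be distorting their estimates.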
Step 3. Use the “main sample” alone to compute the model-based estimates, and use the full sample (main plus supplementary) for evaluation and diagnostics. Note: The advantage of this “external” test is that it is formal (objective) and contemporaneous.
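To illustrate what an evaluation in Step 3 might look like, the sketch below checks, county by county, whether the model estimate falls inside an approximate 95 percent interval around the direct estimate from the full sample. The criterion, the 1.96 multiplier, and all names are assumptions of this sketch rather than the formal criteria, which remain to be developed; the sketch also ignores the uncertainty of the model estimate and its correlation with the direct estimate through the shared main sample.

```python
import math


def evaluate_model_estimates(model_estimates, direct_estimates, z_crit=1.96):
    """Compare model estimates with direct estimates in the evaluation counties.

    model_estimates: dict county -> model-based estimate (main sample only).
    direct_estimates: dict county -> (direct estimate, variance of the direct
        estimate), both computed from the full sample.
    Returns (share of counties covered, sorted list of counties not covered).
    """
    flagged = []
    for county, (direct, var_direct) in direct_estimates.items():
        half_width = z_crit * math.sqrt(var_direct)
        # Flag the county if the model estimate lies outside the interval.
        if abs(model_estimates[county] - direct) > half_width:
            flagged.append(county)
    coverage = 1 - len(flagged) / len(direct_estimates)
    return coverage, sorted(flagged)


# Hypothetical figures for two evaluation counties.
model = {"A": 100.0, "B": 150.0}
direct = {"A": (104.0, 16.0), "B": (120.0, 25.0)}
coverage, flagged = evaluate_model_estimates(model, direct)
```

Under this hypothetical criterion, a coverage share well below the nominal level across the randomly selected counties would be one signal that the model has passed the "tipping point".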
Step 4. Once the goal of model evaluation is accomplished, the supplementary sample can be combined with the main sample to produce better direct survey estimates and model-based estimates. Thus, even in the situation where model-based estimates fail, the estimates for counties selected at Step 1 will still be published.