Estimating Water Use in the United States: A New Paradigm for the National Water-Use Information Program
5 Stratified Random Sampling to Estimate Water Use
Several states with extensive water use databases rely upon the census approach. That is, data are collected for all water users withdrawing amounts greater than a specified threshold that varies from state to state. In some cases—New Jersey, for example—seasonal and annual water use data are collected. In other states, water use quantities may be estimated from empirical equations relating water use to other variables like population and economic activity. In these cases, the water use data may reflect few direct measurements.
The information obtained from the census approach is valuable in understanding water use patterns. However, census data collection may be costly. Indirect estimation techniques allow preparation of aggregated water use estimates, but the quality of the information may be low or uncertain. Therefore, methods that minimize data collection and compilation costs while producing water use estimates with the needed level of accuracy are preferred.
Random sampling is an alternative to exhaustively collecting and processing water use numbers from all users (the census approach) or indirectly estimating use from empirical equations such as coefficient methods. With random sampling, a subset of randomly selected users would complete water use surveys. Statistics derived from the survey results for sampled users would be used to estimate total water use for all users. Compared to the census approach, random sampling reduces the effort involved in collecting water use data, while allowing quantification of the introduced sampling error.
Random sampling is most likely to be useful when done within categories expected to show similarities in the nature of water use. Characteristics of users and geographical location form the basis for dividing users into mutually exclusive and collectively exhaustive categories, or strata. Within each category, water use data from surveys are averaged. This quantity multiplied by the total number of users in the category produces an estimate of total water use by category. The total water use estimate for a region is the sum of the total water use in the categories. Theory and techniques of stratified random sampling are discussed in Cochran (1977) and elsewhere. Water use categories are defined in Chapter 3.
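The estimator described above (category mean times category size, summed over categories) can be sketched in a few lines. The function name, category labels, and survey values below are hypothetical and serve only as an illustration.

```python
from statistics import mean

def stratified_total(samples_by_stratum, users_by_stratum):
    """Estimate total water use as the sum over strata of N_h * (sample mean)."""
    total = 0.0
    for stratum, samples in samples_by_stratum.items():
        total += users_by_stratum[stratum] * mean(samples)
    return total

# Hypothetical survey results (MG per year) and user counts, for illustration only.
samples = {"irrigation": [120.0, 180.0, 150.0], "industrial": [900.0, 1100.0]}
users = {"irrigation": 1000, "industrial": 20}

estimate = stratified_total(samples, users)
print(estimate)
```

Each category contributes its own product, so a category with few but very large users (here, the hypothetical industrial category) is not diluted by the many small users elsewhere.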
The stratified random sampling approach allows explicit estimation of the error due to sampling. Additional error may result from measurement inaccuracies, deliberate misrepresentation of water use, and the failure to identify or appropriately categorize all users. Cochran (1977, Chapter 13) describes some sources of error and discusses needed theoretical modifications.
Stratified random sampling is used successfully by other federal agencies to address similar sampling problems. For example, the U.S. Department of Agriculture (USDA) uses a stratified random sampling approach to estimate irrigated acreage nationwide. Irrigators are grouped into strata based on their past reports of acres irrigated. Strata boundaries are flexible and differ from state to state. The USDA publishes its methods and data in the Census of Agriculture (USDA, 1999a).
STRATIFIED RANDOM SAMPLING METHODOLOGY
The following discussion summarizes some notation and relationships used to estimate water use with stratified random sampling without providing details and derivations. Readers interested in a full analysis of the equations are referred to Cochran (1977).
The fundamental goal of the stratified random sampling methodology is to develop an estimate of total water use, $\hat{Y}$, using data collected within categories. Let the index h represent the hth stratum/category in a water use survey, where h = 1, 2, 3, ..., L and L is the total number of strata. $N_h$ is the number of users in category h, and $\sum_{h=1}^{L} N_h = N$ is the total number of users. Let $n_h$ be the number of samples taken from stratum h, and further, assume that water use sampled from a single water use site within each category has population variance $\sigma_h^2$.
Then from Cochran (1977, Equation 5.6) the error variance $V_T$ (the square of the standard error for the total water use) is:[1]
$$V_T = \sum_{h=1}^{L} \frac{N_h^2 \sigma_h^2}{n_h} - \sum_{h=1}^{L} N_h \sigma_h^2 \qquad (5.1)$$
[1] The variance in the mean has been written in terms of $V_T$, since $\hat{Y} = N\bar{y}_{st}$, where $\bar{y}_{st}$ is the estimate for the average water use as used by Cochran (1977).

Note that this equation differs from equations commonly used to estimate variance for samples drawn from infinite populations. The second term on the right side of Equation 5.1 represents the finite population correction, which is essential in this situation because populations of water users are finite. VT is the error variance attributable to sampling—i.e., the variance that can be controlled by increasing or decreasing sample size. Thus, if every user is sampled, nh = Nh and VT equals zero.
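As a minimal sketch of the finite population correction at work, the function below assumes Equation 5.1 takes the form obtained by scaling Cochran's Equation 5.6 to totals: $V_T = \sum N_h^2\sigma_h^2/n_h - \sum N_h\sigma_h^2$. The function name and the two-stratum numbers are hypothetical.

```python
def error_variance_total(N, sigma, n):
    """Assumed form of Equation 5.1: sum(N_h^2 s_h^2 / n_h) - sum(N_h s_h^2)."""
    first = sum(Nh**2 * sh**2 / nh for Nh, sh, nh in zip(N, sigma, n))
    fpc = sum(Nh * sh**2 for Nh, sh in zip(N, sigma))  # finite population correction
    return first - fpc

# Hypothetical strata: 100 small users and 10 large users.
N = [100, 10]
sigma = [5.0, 40.0]

v_partial = error_variance_total(N, sigma, [10, 5])  # sample a subset of each stratum
v_census = error_variance_total(N, sigma, N)         # sample every user
print(v_partial, v_census)
```

With every user sampled the two sums cancel, which is the $n_h = N_h$ case noted in the text.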
The optimal allocation of samples between strata depends upon the total number of users in each category and the population variance within each category.[2] For stratum h, the optimal number of samples, $n_h$, can be calculated with the following (notation modified from Cochran, 1977, Equation 5.26):[3]
$$n_h = n\,\frac{N_h \sigma_h}{\sum_{h=1}^{L} N_h \sigma_h} \qquad (5.2)$$
where n is the total number of samples needed to estimate water use to the desired precision. The total n depends on $N_h$, $\sigma_h$, and $V_T$, the error variance in the total $\hat{Y}$:[4]
$$n = \frac{\left(\sum_{h=1}^{L} N_h \sigma_h\right)^2}{V_T + \sum_{h=1}^{L} N_h \sigma_h^2} \qquad (5.3)$$
Equation 5.3 is only valid if samples are allocated in accordance with Equation 5.2.
In contrast, if random sampling is performed without stratification, the required number of samples is:[5]
$$n = \frac{N^2 \sigma^2}{V_T + N \sigma^2} \qquad (5.4)$$
where $\sigma^2$ is the population variance of samples taken from individual water use sites.
[2] Assuming that the cost of sampling does not vary from category to category. Cochran (1977) provides a modification if sampling costs do depend on category.
[3] In actuality, Cochran (1977) developed the result in Equation 5.2 by minimizing variance on the estimate of the mean. The result is identical if calculations are made to minimize variance on a total like total water use.
[4] This equation is modified from Cochran (1977, Equation 5.25), where $V_T$ has been used in place of V, the variance in the mean, since $V_T = N^2 V$.
[5] This equation is obtained by solving Cochran (1977, Equation 2.8) for n and using $V_T$ rather than V.
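The relationship between the total sample size and its allocation can be checked numerically. The sketch below assumes Equation 5.2 is $n_h = n N_h\sigma_h / \sum N_h\sigma_h$ and Equation 5.3 is $n = (\sum N_h\sigma_h)^2 / (V_T + \sum N_h\sigma_h^2)$. The function names and the two-stratum population are hypothetical.

```python
def stratified_sample_size(N, sigma, V_T):
    """Assumed Equation 5.3: total samples needed under optimal allocation."""
    num = sum(Nh * sh for Nh, sh in zip(N, sigma)) ** 2
    den = V_T + sum(Nh * sh**2 for Nh, sh in zip(N, sigma))
    return num / den

def optimal_allocation(N, sigma, n_total):
    """Assumed Equation 5.2: split n_total in proportion to N_h * sigma_h."""
    weights = [Nh * sh for Nh, sh in zip(N, sigma)]
    return [n_total * w / sum(weights) for w in weights]

# Hypothetical population: many low-variance users, few high-variance users.
N = [900, 100]
sigma = [10.0, 200.0]
V_T = 5000.0 ** 2   # target standard error of 5,000 units on the total

n = stratified_sample_size(N, sigma, V_T)
n_h = optimal_allocation(N, sigma, n)
print(round(n, 1), [round(x, 1) for x in n_h])
```

In this hypothetical case the smaller stratum receives the larger share of the samples because its $N_h\sigma_h$ product is larger. Substituting the unrounded allocation back into the error variance expression returns the target $V_T$, which is why the total from Equation 5.3 holds only under the allocation of Equation 5.2.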
EXAMPLE: DEVELOPMENT OF A SAMPLING PLAN FOR ARKANSAS
The purpose of this example is to illustrate how a sampling approach can be used to estimate total annual withdrawals of water in the state of Arkansas. The example utilizes the existing inventory of point withdrawals within the state, which in 1997 contained 44,670 individual withdrawal points.
The 1997 Database
The 1997 database contains monthly and annual values for 44,670 groundwater and surface water withdrawal points. The information used in the analysis below includes county name, annual withdrawal, source designation, category of use, and pipe diameter. Table 5.1 contains the summary statistics for the 1997 data.
Irrigation withdrawals accounted for nearly 71 percent of the total offstream withdrawals and 92 percent of the total withdrawal points in 1997. A single withdrawal point for nuclear power accounted for nearly 12 percent of the withdrawn volume of water.
Sampling Approach
The data in Table 5.1 were obtained by using a census approach. Although state law mandates the inventory, the inventory entails a cost borne by individual water users and the state government. For all practical purposes, the 1997 data can be taken to represent the entire population of individual withdrawal points in Arkansas, providing us with accurate knowledge of population variances needed to develop a sampling plan. In usual practice, the population standard deviations would be unknown and would be estimated with sample standard deviations or historical population standard deviations.
TABLE 5.1 Database of Point Withdrawals in Arkansas, 1997

| Category of Use | Number of Withdrawal Points | Mean Withdrawal (MG) | Standard Deviation (MG) | Coefficient of Variation | Total Annual Withdrawals (MG) | Points with Zero Use |
|---|---|---|---|---|---|---|
| Irrigation (IR) | 41,102 | 165 | 492 | 3.0 | 6,771,025 | 5,417 |
| Agriculture (AG) | 1,918 | 211 | 328 | 1.6 | 403,701 | 193 |
| Water Supply (WS) | 1,026 | 536 | 3,837 | 7.2 | 550,096 | 455 |
| Industrial (IN) | 200 | 959 | 3,829 | 4.0 | 191,812 | 41 |
| Commercial (CO) | 120 | 362 | 1,286 | 3.6 | 43,472 | 82 |
| Fossil Fuel Power (PF) | 49 | 8,520 | 32,979 | 3.9 | 417,487 | 19 |
| Minerals Extraction (MI) | 33 | 975 | 5,488 | 5.6 | 32,180 | 18 |
| Nuclear Power (PN) | 15 | 74,869 | 289,966 | 3.9 | 1,123,034 | 14 |
| Domestic (DO) | 4 | 2.5 | 5.0 | 2.0 | 10.0 | 3 |
| Waste Treatment (ST) | 4 | 98 | 113 | 1.2 | 390 | 2 |
| Hydropower (PH) | 2 | 1,560,228 | 267,112 | 0.2 | 3,120,455 | 0 |
| Unknown | 197 | 178 | 264 | 1.5 | 35,026 | 44 |
| All Sectors (combined) | 44,670 | 284 | 11,861 | 41.8 | 12,688,688 | 6,288 |

NOTE: Total offstream withdrawals (minus instream use for hydropower) in 1997 = 9,568,233 MG.
SOURCE: USGS Arkansas District Office.

Assume that the allowable standard error due to random sampling is approximately 10 percent of the total annual withdrawal from all categories (12,688,688 MG) or 1,268,868.8 MG. If the total withdrawal were estimated by taking a random sample from the population of all points, to measure the total withdrawal with a standard error of 10 percent, the sample size n would have to be (from Equation 5.4):
$$n = \frac{N^2\sigma^2}{V_T + N\sigma^2} = \frac{(44{,}670)^2\,(11{,}861)^2}{(1{,}268{,}868.8)^2 + (44{,}670)\,(11{,}861)^2} \approx 35{,}560$$
The sample size would be 35,560 withdrawal points or almost 80 percent of the population. When the population variance and the number of users are large (the typical case in water use estimation), the standard error for random sampling and the number of samples required are also very large.
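The figure of 35,560 can be reproduced from the "All Sectors" row of Table 5.1. The sketch assumes Equation 5.4 has the form $n = N^2\sigma^2/(V_T + N\sigma^2)$; the function name is hypothetical, and because the table values are rounded the result is approximate.

```python
def simple_random_sample_size(N_total, sigma, V_T):
    """Assumed Equation 5.4: samples needed without stratification."""
    return N_total**2 * sigma**2 / (V_T + N_total * sigma**2)

N_total = 44_670                  # withdrawal points, Table 5.1
sigma = 11_861.0                  # standard deviation (MG), all sectors combined
total_withdrawal = 12_688_688.0   # MG
V_T = (0.10 * total_withdrawal) ** 2   # 10 percent standard error, squared

n_srs = simple_random_sample_size(N_total, sigma, V_T)
print(round(n_srs), f"{n_srs / N_total:.1%}")
```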
However, the error can be reduced by dividing the population into distinct strata (with smaller strata variances). If optimal stratified sampling is employed, using the use categories in Table 5.1 as the strata, the total number of samples n needed to estimate water use with the same standard error is determined as follows. From the values in Table 5.1, $\sum N_h\sigma_h \approx 3.244 \times 10^{7}$ MG and $\sum N_h\sigma_h^2 \approx 1.487 \times 10^{12}$ MG², and from Equation 5.3,
$$n = \frac{(3.244 \times 10^{7})^2}{(1{,}268{,}868.8)^2 + 1.487 \times 10^{12}} \approx 340$$
Thus, stratification has the potential to substantially improve sampling efficiency. Stratified random sampling reduces the number of samples needed by grouping water use quantities likely to be similar. In this case study, for example, large uses by power plants are separated from smaller irrigation uses, removing some of the sampling variance or randomness.
Allocating the samples according to Equation 5.2 results in the required numbers of samples within each category, as shown in the "Calculated" column of Table 5.2. In two categories (hydropower and nuclear power), the calculated number of samples required exceeds the population size. This impossibility is corrected for such cases by requiring that all users be sampled. Adding a requirement that each category have a minimum of two samples results in the corrected values for $n_h$ in Table 5.2.
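The allocation and its two corrections can be sketched directly from Table 5.1. The forms assumed for Equations 5.2 and 5.3 are stated in the comments. Rounding each calculated value to the nearest integer before applying the limits is an assumption of this sketch, as are the variable names, and the rounded table inputs make the results approximate.

```python
# (N_h, sigma_h in MG) by category, from Table 5.1.
strata = {
    "IR": (41_102, 492.0), "AG": (1_918, 328.0), "WS": (1_026, 3_837.0),
    "IN": (200, 3_829.0), "CO": (120, 1_286.0), "PF": (49, 32_979.0),
    "MI": (33, 5_488.0), "PN": (15, 289_966.0), "DO": (4, 5.0),
    "ST": (4, 113.0), "PH": (2, 267_112.0), "Unknown": (197, 264.0),
}
V_T = (0.10 * 12_688_688.0) ** 2          # target: 10 percent standard error

S1 = sum(N * s for N, s in strata.values())       # sum of N_h * sigma_h
S2 = sum(N * s**2 for N, s in strata.values())    # sum of N_h * sigma_h^2
n = S1**2 / (V_T + S2)                            # assumed Equation 5.3

# Assumed Equation 5.2: n_h = n * N_h * sigma_h / S1.
calculated = {k: n * N * s / S1 for k, (N, s) in strata.items()}

# Corrections described in the text: at most N_h samples, at least two.
corrected = {k: min(N, max(2, round(calculated[k])))
             for k, (N, _) in strata.items()}

print(round(n, 1), sum(corrected.values()))
for k in strata:
    print(k, round(calculated[k], 4), corrected[k])
```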
TABLE 5.2 Number of Required Samples, by Category, to Achieve Approximately 10% Standard Error in the Water Use Estimate

| Category of Use | Number of Withdrawal Points | Samples Required, n_h (Calculated) | Samples Required, n_h (Corrected) | Samples Required, n_h (Final) | Standard Error (%) |
|---|---|---|---|---|---|
| Irrigation (IR) | 41,102 | 211.9 | 212 | 330 | 16 |
| Agriculture (AG) | 1,918 | 6.6 | 7 | 10 | 49 |
| Water Supply (WS) | 1,026 | 41.2 | 41 | 64 | 86 |
| Industrial (IN) | 200 | 8.0 | 8 | 12 | 112 |
| Commercial (CO) | 120 | 1.6 | 2 | 3 | 202 |
| Fossil Fuel Power (PF) | 49 | 16.9 | 17 | 26 | 52 |
| Minerals Extraction (MI) | 33 | 1.9 | 2 | 3 | 310 |
| Nuclear Power (PN) | 15 | 45.6 | 15 | 15 | 0 |
| Domestic (DO) | 4 | 2.10 × 10^-4 | 2 | 2 | 100 |
| Waste Treatment (ST) | 4 | 4.74 × 10^-3 | 2 | 2 | 58 |
| Hydropower (PH) | 2 | 5.6 | 2 | 2 | 0 |
| Unknown | 197 | 0.5 | 2 | 2 | 105 |
| All Sectors | 44,670 | 339.9 | 312 | 471 | 10 |

Equation 5.1 allows calculation of the standard error for any sample allocation. Using this equation for the corrected $n_h$ shows that the standard error in the estimate of total use is 1,592,912 MG, or about 12.6 percent of the total. If this value is unacceptable, additional samples could be taken from the 10 categories containing unsampled users. Solving iteratively for the samples required to result in a standard error of 10 percent, the final numbers of required samples (Table 5.2) are obtained. Thus, to achieve 10 percent standard error, the stratified random sampling approach requires 471 samples, less than 1.1 percent of the population. Within each category, random sampling is used to estimate water withdrawals.
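The chapter does not say how the iteration was carried out. The sketch below is one assumed scheme: raise the nominal total, re-allocate in proportion to $N_h\sigma_h$ with the same upper and lower limits, and stop when the standard error (from the assumed form of Equation 5.1, $V_T = \sum N_h^2\sigma_h^2/n_h - \sum N_h\sigma_h^2$) reaches the target. It is not guaranteed to reproduce the "Final" column exactly, and all function names are hypothetical.

```python
import math

# N_h and sigma_h (MG) in Table 5.1 order: IR, AG, WS, IN, CO, PF, MI, PN, DO, ST, PH, Unknown.
N = [41_102, 1_918, 1_026, 200, 120, 49, 33, 15, 4, 4, 2, 197]
sigma = [492.0, 328.0, 3_837.0, 3_829.0, 1_286.0, 32_979.0,
         5_488.0, 289_966.0, 5.0, 113.0, 267_112.0, 264.0]
total = 12_688_688.0
S1 = sum(Nh * sh for Nh, sh in zip(N, sigma))

def standard_error(n):
    """Square root of the assumed Equation 5.1 for allocation n."""
    v = sum(Nh**2 * sh**2 / nh - Nh * sh**2 for Nh, sh, nh in zip(N, sigma, n))
    return math.sqrt(max(v, 0.0))

def allocate(n_nominal):
    """Proportional-to-N_h*sigma_h allocation, limited to the range [2, N_h]."""
    return [min(Nh, max(2, round(n_nominal * Nh * sh / S1)))
            for Nh, sh in zip(N, sigma)]

corrected = allocate(339.9)
se_corrected = standard_error(corrected)

n_nominal = 340
while standard_error(allocate(n_nominal)) > 0.10 * total and n_nominal < sum(N):
    n_nominal += 1
final = allocate(n_nominal)

print(round(se_corrected), f"{se_corrected / total:.1%}")
print(sum(final), f"{standard_error(final) / total:.1%}")
```

The first printed line is directly comparable to the 1,592,912 MG quoted above. The second depends on the assumed iteration scheme and should be read only as a plausibility check on the total of roughly 470 samples.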
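The standard error for individual categories, expressed as a percentage of the category total, is calculated as $100\,\sqrt{N_h^2\sigma_h^2/n_h - N_h\sigma_h^2}\,/\,Y_h$, where the quantity under the square root is the stratum h term of Equation 5.1 and $Y_h$ is the total annual withdrawal in category h.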
The last column of Table 5.2 shows the standard error by category, as well as the standard error for all sectors combined. Obviously, in the two categories where all users would be sampled, the error is zero (except for unknown measurement errors). Note, however, in all other categories, the standard errors are greater than 10 percent. In some applications, sample planning may also include objectives on allowable errors for individual categories, or allowable errors for total withdrawals by county or region. If so, additional sampling would be required to meet these objectives.
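The category-level errors in Table 5.2 can be approximated from the tabulated values. The sketch assumes the category standard error is the square root of the stratum term $N_h^2\sigma_h^2/n_h - N_h\sigma_h^2$ divided by the category total. The function name is hypothetical, and rounded inputs mean the percentages agree with the table only to within about a point.

```python
import math

# Category: (N_h, sigma_h, total withdrawal in MG, final n_h) from Tables 5.1 and 5.2.
rows = {
    "IR": (41_102, 492.0, 6_771_025.0, 330),
    "AG": (1_918, 328.0, 403_701.0, 10),
    "WS": (1_026, 3_837.0, 550_096.0, 64),
    "IN": (200, 3_829.0, 191_812.0, 12),
    "CO": (120, 1_286.0, 43_472.0, 3),
    "PF": (49, 32_979.0, 417_487.0, 26),
    "MI": (33, 5_488.0, 32_180.0, 3),
    "PN": (15, 289_966.0, 1_123_034.0, 15),
    "DO": (4, 5.0, 10.0, 2),
    "ST": (4, 113.0, 390.0, 2),
    "PH": (2, 267_112.0, 3_120_455.0, 2),
    "Unknown": (197, 264.0, 35_026.0, 2),
}

def category_se_percent(N, s, Y, n):
    """Standard error of the category total as a percentage of that total."""
    v = N**2 * s**2 / n - N * s**2      # stratum term of the error variance
    return 100.0 * math.sqrt(max(v, 0.0)) / Y

se = {k: category_se_percent(*r) for k, r in rows.items()}
for k, value in se.items():
    print(k, round(value))
```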
SUBSTRATA DELINEATION
With the stratified random sampling approach, withdrawal measurements are made only on a subset of the population within a category/stratum. The effectiveness of the stratified random sampling approach depends on the variances of the populations within the strata. Where variances within strata are high, it may be useful to divide a stratum into two or more substrata. The equations described earlier in this chapter could be easily modified to address the situation where boundaries between substrata are obvious and chosen before sampling begins. In other cases, the decision about subdividing a stratum may seem arbitrary, e.g., dividing a range of users into two subcategories of major and minor users. It may be efficient in such a case to sample a smaller proportion of the minor users, particularly when the variability associated with minor users is low. This may occur, for example, when there are large numbers of users with zero withdrawal (see Table 5.1).
Example: Determination of Substrata Boundaries and Assessment of Errors
The main purpose of this example is to examine the precision of estimating stratum water withdrawals with an alternative sampling approach. This approach combines statistical sampling of minor withdrawal points with a census (i.e., a complete enumeration) of major withdrawal points. It is designed to overcome the problems presented by the very heterogeneous population from which the sample is taken. In contrast to a standard sampling problem in which the sizes of the strata are predefined, this example approach simultaneously selects the sample size and the boundary between the two strata. By using an electronic spreadsheet, it is possible to calculate the population statistics for each stratum while moving the stratum boundary from the largest withdrawal points to the smallest. Points in the top stratum are individually measured (i.e., through a census), and a random sample is taken from all the remaining points in the population. If we assume a target standard error of 10 percent of the total withdrawal for groundwater and for surface water, Equation 5.4 can be used to calculate the required sample size for the bottom stratum.
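The spreadsheet procedure can be sketched as a loop over candidate boundaries. Because the individual Arkansas withdrawals are not reproduced in this chapter, the data below are synthetic (a seeded lognormal draw) and purely illustrative. The function names, the cap of 200 censused points, and the assumed form of Equation 5.4, $n = N^2\sigma^2/(V_T + N\sigma^2)$, are assumptions of the sketch.

```python
import math
import random

def srs_sample_size(N, variance, V_T):
    """Assumed Equation 5.4 for a stratum of N points with the given variance."""
    return N**2 * variance / (V_T + N * variance)

def best_boundary(withdrawals, rel_se=0.10, max_top=200):
    """Return (k, total), where k censused points plus sampled points is smallest."""
    data = sorted(withdrawals, reverse=True)
    V_T = (rel_se * sum(data)) ** 2      # target error variance on the total
    s1 = sum(data)                       # running sum of the bottom stratum
    s2 = sum(x * x for x in data)        # running sum of squares
    best = None
    for k in range(min(max_top, len(data) - 1) + 1):
        m = len(data) - k                # size of the bottom stratum
        variance = max(s2 / m - (s1 / m) ** 2, 0.0)
        n_bottom = min(m, math.ceil(srs_sample_size(m, variance, V_T)))
        if best is None or k + n_bottom < best[1]:
            best = (k, k + n_bottom)
        s1 -= data[k]                    # move the largest remaining point
        s2 -= data[k] ** 2               # into the censused top stratum
    return best

random.seed(0)
synthetic = [random.lognormvariate(4.0, 1.5) for _ in range(5000)]

k_best, n_best = best_boundary(synthetic)
_, n_srs_only = best_boundary(synthetic, max_top=0)
print(k_best, n_best, n_srs_only)
```

The censused top stratum contributes no sampling error, so the whole allowable error variance is available to the bottom stratum. For a strongly skewed population, removing a few of the largest points lowers the bottom-stratum variance enough to more than pay for measuring them.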
The approach is applied to a single water use category: the population of Arkansas’ 1997 irrigation withdrawals. The 1997 data for Arkansas reported irrigation withdrawal volumes for 41,102 points, of which 5,417 show no water withdrawal in 1997. Table 5.3 summarizes the statistics for the subpopulations of groundwater and surface water withdrawals for irrigation.
Figure 5.1 shows the total sample size (i.e., all points in the top stratum plus sampled points in the bottom stratum) needed to achieve the target standard error as a function of top stratum size for Arkansas' groundwater and surface water irrigation withdrawals. Note that the minimum total sample size required for the groundwater subcategory is less than that for the surface water subcategory, even though there are over seven times as many groundwater withdrawal points. This occurs because the relative variability is much lower for the groundwater subcategory.
TABLE 5.3 Population Characteristics for Irrigation Withdrawals, Arkansas, 1997

| Characteristic | Groundwater | Surface Water |
|---|---|---|
| Number of water use sites | 36,053 | 5,049 |
| Total annual volume, MG | 5,492,734 | 1,278,291 |
| Mean annual withdrawal, MG | 152 | 253 |
| Median annual withdrawal, MG | 110 | 90 |
| Standard deviation, MG | 162 | 1,333 |
| Coefficient of variation | 1.1 | 5.3 |
| Number of samples required | 110 | 163 |
| Standard error of estimate, % | 10 | 10 |

FIGURE 5.1 Optimum sample size for the sampling of groundwater and surface water withdrawals for irrigation using substrata boundaries, Arkansas, 1997.

Still, the delineation of a substratum boundary is very effective in reducing the sampling required for surface water. To estimate the total annual surface withdrawal for irrigation with a 10 percent standard error using random sampling, the required number of samples is 1,789. Using the substratum boundary, the minimum total sample size is 163 (Figure 5.1); the 60 withdrawal points with the largest annual quantities should all be measured (approximately 1.2 percent of all surface withdrawal points for irrigation), and a random sample of 103 withdrawal points (or approximately 2 percent of all points) should be selected from the remaining 4,989 withdrawal points. In contrast, the use of a substratum boundary does not significantly reduce the sampling required for groundwater. Using random sampling, the required number of samples is 110. Using the substratum boundary, the minimum total sample size is 110; the single largest withdrawal point should be measured, and a random sample of 109 points (approximately 0.3 percent) should be selected from the remaining 36,052 groundwater withdrawal points. Hence, with groundwater withdrawals for irrigation, which account for over 80 percent of all withdrawal points in the state database, random sampling is sufficient for statewide water use estimation.

Overall, the division of irrigation withdrawals into two subcategories and the use of substrata boundaries (at least for surface water) greatly improve on random sampling within the irrigation category. As shown in Table 5.2, the use of 330 random samples as part of a statewide sampling plan results in a standard error of 16 percent for the irrigation category. Using the minimum sampling indicated above, a total of 273 samples is required (163 for surface water and 110 for groundwater), with a standard error of about 4.4 percent for the total irrigation withdrawal (using Equation 5.1). Still, it is worth noting that there are some practical problems associated with implementing such a plan. For instance, identifying the largest users could be difficult in the absence of census data. This and other issues would need to be considered as part of the sample planning process.
APPLICATION TO STATES THAT LACK WATER USE DATA
The example analysis used known statistics for 1997 water use in Arkansas to develop an optimal sampling plan that could be used in subsequent years. This example demonstrates the substantial, optimal benefit of stratified random sampling. However, random sampling is most needed for states or programs that currently lack data on water use. In these cases, USGS researchers would not have the information needed to develop an optimal sampling plan. However, stratified random sampling can still be employed as long as it is possible to estimate the number of users in a set of use categories. This is true even in the absence of prior knowledge of the site-specific statistics. The sampling plan developed for this situation would likely be nonoptimal, but stratified random sampling does not have to be optimal to be useful. Using Equation 5.1, it is possible to estimate error variance and standard error for water use estimated from any stratified sampling plan. Water use estimates developed from nonoptimal sampling plans will be expected to have larger standard errors than estimates developed from optimal sampling plans with the same number of total samples.
The sampling plan itself (values for all $n_h$) could be established in many different ways when site-specific statistical data are not available. One possibility is to conduct a small preliminary sampling effort. This would allow estimation of category variances needed to prepare an optimal sample plan in accordance with the procedure followed in the Arkansas example. The major difference is that the square roots of the preliminary sample variances, $s_h$, would be substituted for the square roots of the population variances, $\sigma_h$, in Equations 5.2 and 5.3. Sample variances derived from the full stratified sample would be substituted in Equation 5.1 to calculate the error variance for the water use estimate. Uncertainty in the sample variances can be estimated using standard statistical procedures and would contribute additional error to the total water use estimate. Uncertainties in the number of users in the categories could similarly be accounted for in each term in the sum. Thus, uncertainty in category size and variance will increase the standard error of the water use estimate. This situation will likely improve as time passes. The statistical data collected during the first complete stratified sampling effort would be used to design an improved sampling plan for future use.
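A minimal sketch of the preliminary-sample route follows, assuming the same forms for Equations 5.2 and 5.3 used for Arkansas, $n = (\sum N_h s_h)^2/(V_T + \sum N_h s_h^2)$ and $n_h = n N_h s_h / \sum N_h s_h$, with sample standard deviations in place of population values. The pilot data, category names, user counts, and function name are all hypothetical, and rounding up with a two-sample minimum is a choice of this sketch.

```python
import math
import statistics

def plan_from_pilot(pilot, users, rel_se=0.10):
    """Draft n_h for each category from pilot-sample standard deviations."""
    s = {h: statistics.stdev(x) for h, x in pilot.items()}   # sample s_h
    est_total = sum(users[h] * statistics.mean(pilot[h]) for h in pilot)
    V_T = (rel_se * est_total) ** 2                          # target error variance
    S1 = sum(users[h] * s[h] for h in pilot)
    S2 = sum(users[h] * s[h] ** 2 for h in pilot)
    n = S1**2 / (V_T + S2)                                   # Equation 5.3 with s_h
    return {h: min(users[h], max(2, math.ceil(n * users[h] * s[h] / S1)))
            for h in pilot}

# Hypothetical pilot survey (MG per year) and estimated category sizes.
pilot = {
    "irrigation": [0.0, 90.0, 150.0, 400.0, 210.0],
    "public_supply": [20.0, 600.0, 3500.0, 150.0],
}
users = {"irrigation": 5000, "public_supply": 300}

plan = plan_from_pilot(pilot, users)
print(plan)
```

With so few pilot observations the standard deviations are themselves uncertain, so a plan drafted this way is best treated as a starting point to be revised after the first full survey.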
A second possibility for developing a sampling plan in the absence of prior statistics is to take category variances from states with substantive data collection programs and use them, together with the target state's category counts, in the optimal sampling plan Equations 5.2 and 5.3. This will work best when similarities in the categorical water use between the two states are expected. Again, discrepancies between variances in the target state and the data-rich state will render the plan less than optimal, but will not impair the ability to assign an estimated error to the water use estimate.
A third possibility exists where the state has the legal authority to register or permit water withdrawals. Permitted withdrawals could be used as an approximation of actual withdrawals for the purpose of sampling plan design.
This discussion has described only a few ideas for setting up a sampling plan for the first survey when a good water use database is not available. More work is needed to expand this list and evaluate the utility of each option.
The sampling plan would also require information on the number of water users in each category. One approach would be to do a census of the water users. However, this approach might be prohibitive in most states. Alternate methods for estimating the number of water users need to be explored. The Census of Agriculture, the Population Census, manufacturing surveys, and other data sources may provide information necessary for water use population estimation (see Chapter 4). The USGS should also consider consulting with and using the services of experts in other federal or state agencies for help in estimating the number of water users within a state.
ISSUES FOR FURTHER RESEARCH
This is a preliminary study of data from a single state. Comparative studies of data from a number of states with differing degrees of data quality are needed to solidify the understanding of how stratified random sampling can best be applied to water use. Some of the questions that remain to be resolved are the following:
Most states have a trigger level for reporting water use, so the population of sites sampled is censored to omit the smallest users. How can stratified random sampling be used to estimate total water use from a sample censored in this manner?
A region may have a particular water use category such as irrigation, for which the total number of water withdrawal points is not known precisely and can only be estimated. How does uncertainty in the size of the total population affect the error of estimate of total water use in the region?
What is reported on most water use surveys as the amount withdrawn or used is not a measurement but is itself an estimate by the water user. Can a measure of these site-specific estimation errors be obtained and incorporated within the error estimates for the water use strata so as to adjust the number of users sampled to allow for the estimate of the error inherent in sampling each user?
The 50 states, the District of Columbia, and Puerto Rico can be thought of as a collection of 52 sampling units, each of which has particular characteristics. An attempt has been made in Chapter 2 to classify these sampling units for data quality. How can the data be examined state by state to more rigorously quantify data quality as a function of characteristics such as the type of laws pertaining to water use data collection?
In many states, the samples of water use sites, often obtained by voluntary submission of water use reports, are incomplete. Can incomplete samples of this kind be used in a stratified random sampling framework to arrive at reasonable estimates of the total water use and its standard error?
The examples presented in this chapter are all for annual withdrawals in Arkansas. Does the variability of monthly water use differ sufficiently from that of annual water use to require a different sampling plan?
Measurement errors for withdrawals were ignored in the Arkansas example. Although stratified sampling reduces the sampling requirements, it still requires quality data for water use estimation. Are modifications needed to the statistical approaches to deal with measurement uncertainty, which might vary from state to state?
The USGS has a strong group of water statisticians who have experience in examining the above issues in other contexts, such as analysis of floods, streamflow, and water quality (Helsel and Hirsch, 1992). The statistical studies recommended here are certainly within their range of competence.
CONCLUSIONS AND RECOMMENDATIONS
This chapter has described some of the potential benefits of using random sampling to estimate water use. Benefits may include reduced sampling and associated costs (compared to performing a full census), simple approaches to assessing the quality of water use estimates, as well as the ability to design a sampling plan to meet particular data-quality needs.
One important benefit of random sampling is a potential reduction in data collection efforts. By incorporating variance reduction techniques, particularly optimal stratified sampling, quality water use estimates may be obtained by sampling less than 5 percent to 10 percent of users. An example analysis for Arkansas showed that only approximately 1 percent of users needed to be sampled for estimating state-level water use, with a standard error of 10 percent. A greater percentage of users would need to be sampled to achieve the same standard error for regional or county-level estimates or for estimates for individual water use categories.
Random sampling facilitates the use of statistical approaches to calculate and report the quality of water use estimates. For example, in this chapter’s analysis, standard error in water use estimates was chosen as the measure of quality. Other measures, such as confidence intervals, could also be used to describe quality.
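As an illustration of reporting a confidence interval instead of a standard error, the sketch below applies a normal approximation with z = 1.96 for a 95 percent interval. That approximation, the function name, and the use of the 1997 Arkansas total as if it were a survey estimate are assumptions made for the example.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Normal-approximation interval: estimate +/- z * standard error."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width

estimate = 12_688_688.0    # MG, treated here as a hypothetical survey estimate
se = 0.10 * estimate       # 10 percent standard error
low, high = confidence_interval(estimate, se)
print(round(low), round(high))
```

A 10 percent standard error therefore corresponds to an interval of roughly plus or minus 20 percent around the estimate under this approximation.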
Another advantage of random sampling is that it allows water use program staff to readily design a sampling program to meet particular data-quality needs. In the study performed earlier in this chapter, the total number of samples and the allocation of samples among strata were chosen to meet an allowable standard error. In practice, quality targets could be based on the intended uses for the data, ensuring that sampling efforts and staff resources are directed where they are most needed. Furthermore, sampling programs need not be fixed; dynamic sampling could also be beneficial. For example, if a preliminary analysis of sample data shows that variability in a particular use category is substantially higher than expected, it may make sense to update the number and allocation of samples and perform additional sampling.
Many variations on the random sampling approach are possible. The case study demonstrated the use of a hybrid approach to random sampling within a single category or stratum—samples were drawn randomly for “minor” users in the category, and a census approach was used for “major” users. The study showed that for surface water irrigation withdrawals in Arkansas, the hybrid approach required far fewer surface water samples than with random sampling. However, for groundwater irrigation withdrawals, where the relative variability of withdrawals is much lower than for surface water, the hybrid approach offers no advantage over random sampling. Further guidance is needed to help USGS districts select appropriate statistical sampling techniques for water use estimation in individual states.
The quality of results from the random sampling approach is limited by some of the same issues that affect the census approach. Although full surveys are not required of every use category, it is still critically important to have an accurate count of the total number of users. In addition, when stratified sampling is used, all users must be accurately distributed into categories. Maintaining accurate user counts may require a substantial amount of effort. In addition, whether the census approach or random sampling is used, procedures must be established to manage the response rate (e.g., follow-up for surveys not returned). Depending upon the situation, these efforts could be substantial. As a result, further study is needed to determine whether substantial cost reductions could be achieved through stratified random sampling in actual practice. The examples in this chapter certainly support the possibility of reduced costs and justify an additional investigation.
A disadvantage of reducing the proportion of users sampled (by using random sampling rather than the census approach) is the possibility of increased uncertainty in water use estimates. Thus, it is important to ensure that sampled data are accurate and representative. However, as a result of diminishing returns (each additional sample progressively provides less information as the sample size increases), stratified random sampling has the potential for greatly reducing the data collection workload with small, acceptable increases in uncertainty. This beneficial outcome can be obtained by explicitly balancing costs and accuracy. Reducing the quantity of data collected may even allow increased attention to quality for the fewer data collected.
The committee recommends that the USGS develop statistical sampling approaches for water use estimation as part of the National Water-Use Information Program. Site-specific water use data from various states may be useful in developing and evaluating sampling approaches. The National Handbook of Recommended Methods for Water Data Acquisition (USGS, 2000) and the USGS’s internal guides for preparing water use estimates should be updated with a manual of procedures for statistical sampling of water use and determination of total water use estimates and their errors.