Appendix H
Defining and Managing Outliers in MRIP Output: An Order Statistics Approach
The Marine Recreational Information Program (MRIP) produces estimates of recreational fish catch and of the variance of catch by 2-month wave and by year. These estimates are produced by domain, such as by species, by geographic region, or by fishing mode. Fishing regulations are typically based on recent MRIP catch estimates or on statistics derived from them, such as the third-largest of the five most recent MRIP catch estimates. The influence of so-called outlier estimates on such derived statistics is an important issue. Fishery managers need to know how to define an outlier, how to decide whether an outlier of a given magnitude should trigger a change in management policy, and how to update management policy given a triggering outlier. This appendix presents a method for answering these questions based on the statistical concept of order statistics.
ORDER STATISTICS
The statistical concept of order statistics offers one approach to defining, identifying, and measuring outliers. Order statistics provide a method for determining the probabilities that the first-largest, second-largest, third-largest, etc., in a set of ordered numbers will take particular values.
For example, denote the i = 1…n annual MRIP catch estimates for a particular fish species as X1, X2, …, Xn. Assuming no change in the fish population or fishery from year to year (an assumption that can be tested later), the Xi are independent and identically distributed random variables having a common probability density f(X) and a common cumulative distribution function F(X). For nonrare fish species, f(X) is typically the density function for the normal distribution, and F(X) is the cumulative distribution function for the normal distribution. (For rare fish species, f(X) and F(X) could be the probability mass function and cumulative distribution function for the Poisson or negative binomial distribution [see Appendix E].)
Arrange the X1, X2, …, Xn values in order from smallest to largest, and use subscripts j = 1 to n in parentheses to denote the order of the values as shown below:
X(1) = the smallest of the set X1, X2, …, Xn.
X(2) = the second-smallest of the set X1, X2, …, Xn.
X(j) = the jth-smallest of the set X1, X2, …, Xn.
X(n – 1) = the second-largest of the set X1, X2, …, Xn.
X(n) = the largest of the set X1, X2, …, Xn.
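The ordering notation above can be illustrated with a minimal Python sketch. The catch values and the helper name order_statistic are hypothetical and chosen only for illustration; they are not MRIP estimates.

```python
def order_statistic(values, j):
    """Return X(j), the jth-smallest of the values (j = 1 is the smallest)."""
    if not 1 <= j <= len(values):
        raise ValueError("j must be between 1 and n")
    return sorted(values)[j - 1]


# Hypothetical annual catch estimates, for illustration only (not MRIP data).
catches = [2400, 3100, 2750, 2900, 2600]
n = len(catches)

smallest = order_statistic(catches, 1)           # X(1)
third_largest = order_statistic(catches, n - 2)  # X(n - 2)
largest = order_statistic(catches, n)            # X(n)
```

With n = 5, the third-largest value X(n – 2) is also the third-smallest value X(3), which is the statistic used in the regulatory example in this appendix.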
PROBABILITY DISTRIBUTIONS OF ORDER STATISTICS
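It can be shown (Ross, 1988, p. 225) that the joint density function fjoint of the order statistics is given by:
\[ f_{\text{joint}}(x_1, x_2, \ldots, x_n) = n! \prod_{i=1}^{n} f(x_i), \qquad x_1 < x_2 < \cdots < x_n \]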
The density function f(X(j) = x) of the jth-order statistic X(j) can be obtained by integrating the joint density function above to find (Ross, 1988, p. 227):
\[ f(X_{(j)} = x) = \frac{n!}{(j-1)!\,(n-j)!}\,[F(x)]^{j-1}\,[1 - F(x)]^{n-j}\,f(x) \]
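The cumulative distribution function F(X(j) ≤ b) of the jth-order statistic X(j) can be obtained by integrating the density function f(X(j) = x) of the jth-order statistic to find (Ross, 1988, p. 227):
\[ F(X_{(j)} \le b) = \int_{-\infty}^{b} \frac{n!}{(j-1)!\,(n-j)!}\,[F(x)]^{j-1}\,[1 - F(x)]^{n-j}\,f(x)\,dx = \sum_{k=j}^{n} \binom{n}{k}\,[F(b)]^{k}\,[1 - F(b)]^{n-k} \]
The sum on the right states that X(j) ≤ b exactly when at least j of the n values are less than or equal to b.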
For example, F(X(n – 2) ≤ 3,000) gives the probability that the third-largest catch out of n catches is less than or equal to 3,000. Similarly, 1 – F(X(n – 2) ≤ 3,000) gives the probability that the third-largest catch out of n catches is greater than 3,000.
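The probability in this example can be computed with a short Python sketch. It uses the fact that X(j) ≤ b exactly when at least j of the n independent values are at most b, so the cumulative distribution function is a binomial tail sum in F(b). The normal mean (2,800) and standard deviation (300) are assumed values for illustration only, and order_stat_cdf is a hypothetical helper name.

```python
from math import comb
from statistics import NormalDist


def order_stat_cdf(b, j, n, dist):
    """P(X(j) <= b) for n i.i.d. draws from dist.

    X(j) <= b exactly when at least j of the n draws are <= b,
    so the result is a binomial tail sum with p = F(b).
    """
    p = dist.cdf(b)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j, n + 1))


# Assumed catch distribution, for illustration only (not fitted to MRIP data).
catch_dist = NormalDist(mu=2800, sigma=300)
n = 5

# Probability that the third-largest of n = 5 catches is at most 3,000,
# and the complementary probability that it exceeds 3,000.
p_at_most = order_stat_cdf(3000, n - 2, n, catch_dist)
p_greater = 1 - p_at_most
```

For j = n, the sum reduces to the single term [F(b)]^n, which is the probability that all n catches are at most b.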
FISHERIES APPLICATIONS: DEFINING AN OUTLIER
First, consider the problem of determining whether the largest value of catch in n time periods is an outlier. Assume that catch data are available for i = 1 to n time periods for a nonrare fish species, where fish catch in each time period i, Xi, follows a normal distribution with density f(Xi) and the same mean μ and variance σ² for all i, and where F(Xi) is the normal CDF of Xi. Suppose fishery managers are trying to decide whether the largest value of catch from the n time periods, namely X(j = n), is an outlier. One possible definition of an outlier would be any value of X(j = n) so large that the probability of the largest catch exceeding it is less than the fishery manager’s preselected level of statistical significance (say, 5 percent). The “threshold” value of catch, denoted b, for this definition of outlier would be the solution to:
\[ 1 - F(X_{(n)} \le b) = 1 - [F(b)]^{n} = 0.05 \]
Hence, if the largest catch X(j = n) in the n time periods is greater than b, it would be considered an outlier because a value that large has a less than 5 percent probability of occurring by chance alone. Similarly, if a fishery regulation were based on the third largest of the five most recent MRIP catch estimates, then the threshold value c that the third-largest of n = 5 catch estimates has a 5 percent chance of exceeding is the solution to:
\[ 1 - F(X_{(3)} \le c) = 1 - \sum_{k=3}^{5} \binom{5}{k}\,[F(c)]^{k}\,[1 - F(c)]^{5-k} = 0.05 \]
where any value for the third-largest catch greater than c would be considered an outlier.
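The thresholds b and c described above can be found numerically. The sketch below is one possible approach, not a prescribed method: it solves P(X(j) > t) = 0.05 by bisection, using the fact that the cumulative distribution function of an order statistic increases with t. The normal parameters are assumed values for illustration, and the function names are hypothetical.

```python
from math import comb
from statistics import NormalDist


def order_stat_cdf(b, j, n, dist):
    """P(X(j) <= b): at least j of n i.i.d. draws from dist are <= b."""
    p = dist.cdf(b)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j, n + 1))


def outlier_threshold(j, n, dist, alpha=0.05, tol=1e-6):
    """Find t such that P(X(j) > t) = alpha, by bisection on the CDF."""
    lo = dist.mean - 10 * dist.stdev  # CDF is essentially 0 here
    hi = dist.mean + 10 * dist.stdev  # CDF is essentially 1 here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if order_stat_cdf(mid, j, n, dist) < 1 - alpha:
            lo = mid  # threshold lies above mid
        else:
            hi = mid  # threshold lies at or below mid
    return (lo + hi) / 2


# Assumed catch distribution, for illustration only (not fitted to MRIP data).
catch_dist = NormalDist(mu=2800, sigma=300)
n = 5

b = outlier_threshold(n, n, catch_dist)  # threshold for the largest catch
c = outlier_threshold(3, n, catch_dist)  # threshold for the third-largest catch
```

For the largest catch, the threshold also has a closed form, b = F⁻¹(0.95^(1/n)), which gives a check on the bisection result. The threshold c for the third-largest catch is lower than b, because the third-largest value is less likely than the largest value to exceed any given level.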
FISHERIES APPLICATIONS: DECIDING WHETHER AN OUTLIER SHOULD TRIGGER A MANAGEMENT CHANGE
If an outlier were to occur, fishery managers would first check to ensure that the outlier was not due to an error in the data or an error in data processing. If the outlier were not due to an error, managers would need to decide whether (1) the outlier occurred by chance alone, and so should not trigger a change in fishery management policies (e.g., a change in control rules); or (2) the outlier is an indication that either the fish population or the fishery is changing, and that as a result, the probability distribution of X is shifting, so the outlier should trigger a change in fishery management policies. Typically, fishery managers would use their prespecified level of statistical significance (say, 5 percent) to decide between (1) and (2). If the observed value did not exceed the threshold value of catch (such as b or c in the above examples), managers would attribute it to chance. If it exceeded the threshold, managers would conclude that either the fish population or the fishery was changing, and that as a result, the probability distribution of X was shifting, so fishery management policies should be changed (or at least that further investigation is warranted).
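The decision between (1) and (2) can be sketched in Python for the case of the largest catch. This is an illustrative sketch only: it assumes the data have already been checked for errors, the normal parameters and observed values are hypothetical, and the function names are not part of any MRIP software.

```python
from statistics import NormalDist


def exceedance_prob_largest(x_max, n, dist):
    """P(X(n) > x_max) if the catch distribution has not changed."""
    return 1 - dist.cdf(x_max) ** n


def triggers_review(x_max, n, dist, alpha=0.05):
    """True if the largest catch is too extreme to attribute to chance alone."""
    return exceedance_prob_largest(x_max, n, dist) < alpha


# Assumed catch distribution, for illustration only (not fitted to MRIP data).
catch_dist = NormalDist(mu=2800, sigma=300)
n = 5

# Two hypothetical observed values of the largest catch in n = 5 periods.
review_high = triggers_review(3600, n, catch_dist)      # exceeds the threshold
review_moderate = triggers_review(3300, n, catch_dist)  # within chance variation
```

Comparing the exceedance probability with the significance level is equivalent to comparing the observed largest catch with the threshold b, because the exceedance probability decreases as the observed value increases.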
FISHERIES APPLICATIONS: HOW TO UPDATE MANAGEMENT POLICY GIVEN A TRIGGERING OUTLIER
Given a triggering outlier, the outlier value of catch would be used to update the probability distribution of fish catch using Bayesian updating methodology as described under the Bayesian model of in-season management outlined in this report (see Appendix D). Other fisheries management policies (e.g., control rules) could then be updated based on the updated probability distribution of fish catch.
REFERENCE
Ross, S. 1988. A First Course in Probability, 3rd Edition. New York: Macmillan.