Parkfield prediction have responded to the failed prediction by modifying the parameters, rather than by abandoning the entire theory on which it was founded.

Some predictions (like Parkfield) are intended to be terminated as soon as they are satisfied by a qualifying earthquake, while others, such as those based on earthquake clustering, might be “renewed” to predict another earthquake if a qualifying earthquake occurs. Thus, when beginning an earthquake prediction test, it is helpful to describe in advance any conditions that would lead to a termination of the test.

Suppose there are *P* predictions, either for separate regions, separate times, separate magnitude ranges, or some combination thereof. To simplify the following discussion, we refer to the magnitude-space-time interval for each prediction as a “region.” Let *p*_{0i}, for (*i*=1,…, *P*), be the random probabilities of satisfying the predictions in each region, according to the null hypothesis, and let *c*_{i}, for (*i*=1,…, *P*), be 1 for each region that is “filled” by a qualifying earthquake, and 0 for those not filled. Thus, *c*_{i} is 1 for each successful prediction and 0 for each failure. According to this scheme only the first qualifying earthquake counts in each region, so that implicitly the prediction for each region is terminated as soon as it succeeds. A reasonable measure of the success of the predictions is the total number of regions filled by qualifying earthquakes. The probability of having as many or more successes at random is approximately that given by the Poisson distribution

*P*(*n* ≥ *N*) = Σ_{n=N}^{∞} λ^{n} e^{−λ}/n!  **[2]**

where *λ* is the expected number of successes according to the null hypothesis. The choice of the rate parameter *λ* requires some care. If multiple events in each region are to be counted separately, then *λ* is simply the sum of the rates in the individual regions, multiplied by the time interval. If, instead, only the first earthquake within each region counts, then *λ* is the sum of the probabilities, over all regions, of success during the time interval *t*. The difference between the two approaches is small when all rates are small, so that there is very little chance of having more than one event in a region. The difference becomes important when some of the rates are high; in the first approach, the sum of the rates may exceed *P/t,* whereas this is prohibited in the second approach because each region may have only one success. The first approach is appropriate if the physical hypothesis predicts an enhanced rate of activity that is expected to continue after the first qualifying event in each region. The second approach is more appropriate when the hypothesis deals only with the first event, which changes the physical conditions and the earthquake rates. An advantage of the second approach is that it is less sensitive to treatment of aftershocks. Under the first approach, aftershocks must be specifically included in the prediction model or else excluded from the catalog used to make the test. Whether aftershocks are explicitly predicted or excluded from the catalog, the results of the test may be very sensitive to the specific algorithm used to predict or recognize aftershocks.
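As a concrete sketch, the two choices of *λ*, and the tail probability of Eq. **2**, can be computed as follows. The per-region rates and the five-year interval are hypothetical values chosen for illustration, not taken from the text:

```python
import math

# Hypothetical per-region annual rates of qualifying earthquakes
# (illustrative values only, not from the text).
rates = [0.02, 0.10, 0.50, 1.50]  # events per year in each region
t = 5.0                           # test interval in years

# First approach: every qualifying event counts, so lambda is the
# sum of the expected counts and may exceed the number of regions P.
lam_counts = sum(r * t for r in rates)

# Second approach: only the first event in each region counts, so
# lambda is the sum of per-region probabilities of at least one
# event during t (1 - exp(-r*t) for a Poisson process); this sum
# is bounded above by P.
lam_first = sum(1.0 - math.exp(-r * t) for r in rates)

def poisson_tail(big_n, lam):
    """Probability of big_n or more successes (Eq. 2): the upper
    tail of a Poisson distribution with mean lam."""
    cdf = sum(lam**n * math.exp(-lam) / math.factorial(n)
              for n in range(big_n))
    return 1.0 - cdf
```

With these assumed rates, `lam_counts` is 10.6, exceeding the *P* = 4 regions, while `lam_first` is about 2.4, illustrating why the two approaches diverge when some rates are high.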

Eq. **2** is approximate, depending on *λ* being large compared to one. There are also a few other, more subtle, assumptions. A more robust, but also approximate, estimate of the probabilities may be obtained by simulation. Assume the second approach above, in which the outcome for each region is either no qualifying earthquake or at least one. Draw at random a large number of simulated catalogs, each also represented by *c*_{i} for (*i*=1,…, *P*). For each catalog and each region, draw a random number from a uniform distribution between 0 and 1; if that random number is less than *p*_{0i}, then *c*_{i}=1; otherwise, *c*_{i}=0. For each synthetic catalog, we count the number of successes and compile a cumulative distribution of that number over all synthetic catalogs. Then the proportion of simulated catalogs having *N* or more events is a good estimate of the corresponding probability. Simulation has several advantages: it can be tailored to specific probability models for each zone, it can be used to estimate other statistics as discussed below, and its accuracy depends on the number of simulations rather than on the number and type of regions. Disadvantages are that it requires more computation, and it is harder to document and verify than an analytic formula like Eq. **2.**
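The simulation procedure just described can be sketched in a few lines; the null-hypothesis probabilities below are hypothetical values chosen for illustration:

```python
import random

def simulated_tail(p0, n_observed, n_sims=100000, seed=0):
    """Estimate the probability of n_observed or more successes by
    simulating catalogs under the null-hypothesis probabilities p0:
    one uniform draw per region, success if the draw falls below
    that region's probability (the second approach above)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    exceed = 0
    for _ in range(n_sims):
        successes = sum(1 for p in p0 if rng.random() < p)
        if successes >= n_observed:
            exceed += 1
    return exceed / n_sims

# Hypothetical null probabilities for P = 6 regions (illustrative).
p0 = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
p_hat = simulated_tail(p0, 4)
```

With these assumed probabilities the expected number of successes is only 1.65, so observing four or more filled regions would be grounds for doubting the null hypothesis.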

The “M8” prediction algorithm of Keilis-Borok and Kossobokov (2) illustrates some of the problems of testing prediction hypotheses. The method identifies five-year “times of increased probabilities,” or TIPs, for regions based largely on the past seismic record. Regions are spatially defined as the areas within circles about given sites, and a lower magnitude threshold is specified. TIPs occur at any time in response to earthquake occurrence; thus, the probabilities are strongly time dependent. The method apparently predicted 39 of 44 strong earthquakes over several regions of the world, while the declared TIPs occupied only 20% of the available space-time. However, this apparently high success rate must be viewed in the context of the normal seismicity patterns. Most strong earthquakes occur within a fairly small fraction of the map, so that some success could be achieved simply by declaring TIPs at random times in the more active parts of the earth. A true test of the M8 algorithm requires a good null hypothesis that accounts for the spatial variability of earthquake occurrence. Constructing a reasonable null hypothesis is difficult because it requires the background or unconditional rate of large earthquakes within the TIP regions. However, this rate is low, and the catalog available for determining the rate is short. Furthermore, the meaning of a TIP is not explicit: is it a region with higher rate than other regions, or higher than for other times within that region? How much higher than normal is the rate supposed to be?

An earthquake prediction hypothesis is much more useful, and much more testable, if probabilities are attached to each region. Let the probabilities for each region be labeled *p*_{j}, for (*j*=1,…, *P*). For simplicity, I will henceforth consider the case in which a prediction for any region is terminated if it succeeds, so that the only possibilities for each region are a failure (no qualifying earthquake) or a success (one or more qualifying earthquakes). In this case, several different statistical tests are available. Kagan and Jackson (3) discuss three of them, applying them to the seismic gap theory of Nishenko (4). Papadimitriou and Papazachos (5) give another example of a long-term prediction with regional probabilities attached.

*(i)* The “*N* test,” based on the total number of successes. This number is compared with the distributions predicted by the null hypothesis (just as described above) and by the experimental hypothesis to be tested. Usually, the null hypothesis is based on the assumption that earthquake occurrence is a Poisson process, with rates determined by past behavior, and the test hypothesis is that the rates are significantly higher in some places. Two critical values of *N* can be established. Let *N*_{1} be the smallest value of *N* such that the probability of *N* or more successes according to the null hypothesis is less than 0.05. Then the null hypothesis can be rejected if the number of successes in the experiment equals or exceeds *N*_{1}. Let *N*_{2} be the largest value of *N* such that the probability of *N* or fewer successes according to the test hypothesis is less than 0.05. Then the test hypothesis can be rejected if *N* is less than or equal to *N*_{2}. If *N*_{2} is less than *N*_{1}, then there is a range of possible success counts for which neither hypothesis can be rejected. According to classical hypothesis testing methodology, one does not