Read "Spatial Statistics and Digital Image Analysis" at NAP.edu

« Previous: 5. Geostatistical Analysis of Spatial Data

Page 109 Cite

Suggested Citation:"6. Spatial Statistics in the Analysis of Agricultural Field Experiments." National Research Council. 1991. Spatial Statistics and Digital Image Analysis. Washington, DC: The National Academies Press. doi: 10.17226/1783.

Page 110 Cite

Page 111 Cite

Page 112 Cite

Page 113 Cite

Page 114 Cite

Page 115 Cite

Page 116 Cite

Page 117 Cite

Page 118 Cite

Page 119 Cite

Page 120 Cite

Page 121 Cite

Page 122 Cite

Page 123 Cite

Page 124 Cite

Page 125 Cite

Page 126 Cite

Page 127 Cite

Page 128 Cite

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

6 Spatial Statistics in the Analysis of Agricultural Field Experiments v 6.1 Introcluction Julian Besag University of Washingtoni The main purpose of agricultural field experiments is to compare the effec- tiveness of different treatments (e.g., fertilizers) on a particular crop variety or to make comparisons between different varieties of the same crop. Ac- curacy is paramount, but valid assessment of error is also important. A typical experimental layout consists of a linear or two-dimensional array of contiguous rectangular strips of lancl, called plots, each of which is (le- votec! to a single treatment or variety. Plots are usually Tong and narrow (e.g., 20 m x 2 m), partly as a trade-off between ease of management and compactness of the experiment. In a linear layout, the longer sides are chosen to abut one another, so as to minimize the impact of fertility gradients across plots (see Figure 6.1~. The most common measurement is that of plot yield at harvest, which in an ideal world would provide a direct assessment of the corresponding treatment or variety eject. However, yield is influenced by external factors such as weather and plot fertility. It can often be assumed that weather has a uniform effect, in which case comparisons remain valid under more general conditions; otherwise, several experiments may be required. On the other hand, variation in fertility over the experimental region is usually substantial ~ Now at University of Newcastle upon Tyne. 109

110 and if ignored can lead to quite erroneous comparisons. Proper design and analysis of field experiments aims to minimize such problems. By the term design, here we mean, somewhat narrowly, the rule by which treatments are allocated to plots (henceforth we include the possibility that treatments are in fact different varieties). Clearly, the task of controlling for variation in fertility is inherently spa- tial and is the focus of this chapter. In §6.2, we discuss general background, largely from a historical perspective, and in §6.3, we consider some recent progress and possible future directions at a more technical level. First, how- ever, it should be noted that there are other types of field experiments that are not strictly covered by the above description. For example, interest may center on a measurement other than yield, such as resistance to dis- ease or quality of product. Also there are experiments that are multisite or multistage or that involve mixtures, spacings, intercropping, competition, interference, and so on. Thus some experiments involve spatial considera- tions rather different from those on which we concentrate in this chapter; for example, in assessing resistance of different treatments to a particular pest, the main problem may concern patchiness of infestation over the experi- mental region. Nevertheless, we hope that in focusing on a single important topic, the richness of the subject as a whole will not be lost. 6.2 General Backgrounc! Methods of controlling for variation in fertility across the experimental re- gion have a long history. Perhaps the first was the use of check plots (e.g., Wiancko, 1914~; that is, plots interspersed regularly at frequent intervals throughout the experiment and containing a standard treatment. The yields from these plots can be used to calculate a local fertility index for each ex- perimental plot and to adjust its yield accordingly; the assumption is that variations in adjacent plots are relatively small. Check plots are still em- ploye<1 in early generation selection trials where it is required to choose say logo from many hundreds of varieties for further assessment. This selec- tion is done at a stage where only a single plot is available to each vari- ety because of restrictions of management, space, and the quantity of seed available. Besag and Kempton (1986) give an example involving 1560 (liffer- ent lines of winter wheat. However, for general use, check plots are rather crude, demand additional space, labor, and expense, and are somewhat self- defeating in that experimental plots become even more widely (lispersed

111 over the field. Furthermore, there is no obvious means of judging the relia- bility of estimates. This is also true of systematic designs, such as the Knut Vik square, in which several plots, regularly dispersed over the experimental region, are devoted to each treatment. In this case, control for fertility is implicit rather than explicit; each treatment set of plots should be subject to approximately the same variation. Systematic designs also have a long history (e.g., Beaven, 1909) and still find favor in some Nordic countries. Two other possibilities are (~) the use of soil analysis in each plot to con- struct a fertility index, although this appears to have little if any support in practice, and (2) the incorporation of data from previous experiments, although this may be awkward operationally and requires the generally du- _ _ _ O ~ bious assumption that fertility gradients remain approximately static from year to year. The method of control that has now become standard in most coun- tries was first proposed by R. A. Fisher in the 1920s. Fisher's triumph was to construct an entirely self-contained inferential framework that is valid whatever the pattern of fertility might be, subject only to an assumption of treatment additivity; that is, it is assumed that the relative eject of any particular treatment would be the same on any plot. The methodology relies on three key ingredients: replication, Mocking, and randomization. Replica- tion means that each treatment appears several times, usually with equal frequency, in the experiment; this generally improves accuracy and provides a basis for its assessment. Blocking implies that there are restrictions on the allocation of treatments to plots, which are imposed to counteract suspected fertility gradients. The intention is that fertility should be approximately constant within blocks, so that corresponding differences in plot yields are meaningful; this further improves the accuracy of treatment comparisons. Finally, randomization requires that treatments are allocated to plots en- tirely at random within the constraints of the blocking structure; it is this step that ensures the unbiasedness of treatment comparisons (contrasts) and the validity of the associated standard errors within the inferential frame- work. Since this framework is not at all obvious, we provide some brief discus- sion in the particular context of the simplest common example, namely, the randomized complete blocks design, for which blocks and replicates coincide. Each plot yield is assumed to be the sum of three components, a fixed block effect, a treatment effect, and a plot effect. The natural means of introduc- ing randomness into the mode! is to suppose that the plot effects represent a realization of a spatial stochastic process. The question, which is addressed

112 in §6.3, is, what process? Fisher brilliantly circumvented this problem by as- suming plot effects to be fixed and by introducing randomness solely through the act of allocating treatments to plots. In other words, it is the very act of randomization that alone induces a probability distribution in the yields, and hence a basis for inference. For details see Kempthorne (1952, Ch. S) but it turns out that the calculation of treatment estimates and associated standard errors coincides exactly with an ordinary least-squares analysis of the corresponding linear mode! assuming a fixed layout and uncorrelated random plot ejects. For this reason, it is sometimes assumed that the latter formulation underpins the Fisherian analysis, whereas the two moclels are quite distinct, with the first addressing a fixed field with a randomly chosen layout and the second a random field with a fixed layout. Of course, in the second, an assumption of uncorrelated or equi-correlated plot effects within blocks is untenable if fertility gradients exist, as is generally the case unless the number of treatments (i.e., blocksize) is very small. Finally, note that the "usual" construction of confidence intervals based on the t-distribution can be shown to be approximately valid under randomization. Thus, the main problem in adopting a randomized complete blocks cle- sign is not in the validity of the analysis (although one might challenge the relevance of the conceptual framework) but in its lack of sensitivity. A graph of estimated plot effects against their locations in the experiment will almost inevitably display substantial spatial structure; to ignore this is extremely wasteful. The classical remedy has been to develop much more sophisticated designs that employ a local blocking structure within which it can be more reasonably assumed that fertility is effectively constant. The most efficient designs, such as completely balanced lattice squares, are rarely practicable because of restrictions on the number of treatments and replicates. This difficulty has been met by the introduction of compromise designs, baser! on partially balanced incomplete blocks, and these are now used quite widely, especially in variety trials (Patterson and Hunter, 1983~. However, a new problem arises with sophisticated designs, for there no longer exists a proper justification for the use of Gaussian-theory confidence intervals, unless ad- ditional or different assumptions are made in the statistical formulation. Furthermore, despite the obvious merits of sophisticated designs, alarge proportion of experiments in the world employs nothing more complicated than randomized complete blocks, whether for reasons of tradition or ease of management. Thus it is important that methods of analysis be avail- able that adopt explicit spatial models for fertility, if only as a means of salvaging badly designed experiments (Bartlett, 197S, 1988~. Of course, it

113 would be unrealistic to expect a spatial mode} to provide more than a crude representation of a true fertility process, but this is probably unimportant, since (1) it is the replicated treatment effects rather than the individual plot fertilities that are of primary concern and (2) the purpose of the model is essentially one of interpolation rather than extrapolation. There is an interesting contrast here with the usual requirements in time series analysis. In fact, the idea of extracting information from neighboring experimental plots as a means of controlling for variation is not at ah new and was first proposed in the 1920s by J. S. Papadakis, a distinguished Greek agronomist. Unfortunately, his entirely empirical approach received very little attention by others; for some historicaIreflections, see Bartlett (1988~. Nevertheless, Papadakis himself continued to use and develop his method over several decades (see Papadakis, 1984), and it is instructive to consider one particular version below. Thus, let y denote the vector of observed yields, with plots indexed in some convenient manner, and suppose y = Tr+x+z, (6.1) where T denotes treatment effects, T is the corresponding fur-rank treatment design matrix, x represents the (fixed) fertility effects measured about zero, and z is residual error. If x* denotes a current assessment of x and is presumed to be correct, the corresponding ordinary least-squares estimate O : ~ IS ~ (x ~ = (TTT) 1TT(yx ). (6.2, This provides a reassessment yTT* of x but leads to circularity in the absence of some form of constraints on the parameter space. Papadakis resolved this difficulty by using as the new estimate of x X*(T*) = H(Y - TT*), (6.3) where ET is a matrix that reflects anticipated similarity in fertility between neighboring plots. For example, in a two-dimensional layout, the fertility in any particular plot might be estimated by the average of the residuals in the four adjacent plots, with an appropriate modification at the boundary of the experiment. Papadakis initiated (6.2) with x* = 0 and then iterated between (6.2) and (6.3), either for a prescribed number of cycles or until convergence. Here we concentrate exclusively on the latter option, referred to as the iterated! Papadakis procedure. It follows that the final estimate A*

114 satisfies so that T* = (~TTT>'ITTLY _ iI(yTT a)], TT(~_ B)TT* = TAT_ H\)Y. (~6.4) Thus an alternative viewpoint is that r* is the generalized least-squares estimate of T when z is negligible and x is interpreted as a realization from a spatial stochastic process with (possibly generalized) inverse covariance matrix proportional to ~At, assuming this is symmetric positive (semi-) definite. The implication is that Papadakis' empirical procedure may have a sepa- rate interpretation as a formal model-based approach to fertility adjustment. We investigate this and consider generalizations in §6.3. Of course, at this stage, there is no guarantee that r* has any particular merit, or that the ~~ induced by typical Papadakis adjustment holds appeal as inverse co- variance matrices in a random field formulation. Finally, note that, where in our discussion ~~ is singular, estimates of treatment contrasts rather than T* itself wiB be uniquely determined. 6.3 Some Recent Progress ant! Future Directions 6.3.1 Aims It has already been noted that, in most field experiments, plots are long and narrow. It follows that, even when the layout itself is two-dimensionaI, in- ternal control for fertility variation is usually profitable only in the direction of the shorter plot axis. Thus, in §6.3.2, we concentrate on one-dimensional adjustment. In particular, we first discuss the role of simple stochastic mocI- els in experiments that only involve a single linear array (column) of plots; the results extend immediately to trials that in effect employ several sepa- rate columns. Then, in §6.3.3, we tackle the less common but nevertheless important situation in which genuine two-dimensional adjustment is nec- essary; this is a topic that requires considerable further research. Finally, in §6.3.4, we briefly discuss some other approaches and some outstanding problems. In §§6.3.2 and 6.3.3, we assume a formulation that is in accordance with equation (6.~), where now y, x, and z are interpreted as realizations of spatial stochastic processes Y. X, and Z; thus, Y = TT+X+Z, (6.5)

115 where Y is the vector of random plot yields, T represents fixed treatment effects, and T is the design matrix for the experiment. We further suppose that the components of Z are uncorrelated, with zero means and common variance lo, and that Z and X are uncorrelated; Z takes account of residual errors and is often negligible in practice when compared with the variation in the fertility process X. 6.3.2 One-Dimensional Adjustment In discussing specification of the fertility process X for layouts that consist of a single column of n plots, it proves convenient initially to consider an ostensibly infinite column, with plots labelled i = 0,+1,..., according to their positions with respect to a reference plot 0. Let Xi denote the random fertility in plot i, measured about zero. The simplest specification of the Xi's that departs from independence is the classical first-order stationary autoregression in which the lag k autocorrelation is Pk = A~k~,k = 0,+1,.. where ~~ < 1. The corresponding autocorrelation generating function is ·7 oo ~ _ `2 C(u) ~ pku 1 + A2 _ A(u + u ) (6.6) In the present context, the above unilateral model is more naturally formu- lated as a bilateral autoregression, with E (Xi all xj, j if i) = ,B(Xi_1 + Xi+l ), var(Xi~ all xj,j 76 i) = K, (6.7, where ,B = A/(1 + A2) and ~,0~ < 2. The equivalence is confirmed by noting that (6.7) implies that the corresponding autocorrelations Pk satisfy Pk = 3(Pk1 + Pk+~), k = +1,+2, . . ., (6.S, and hence that Pk = inks. The Nudity between the unilateral and bilateral formulations does not generally extend to higher dimensions and rests on the factorization h(u~h(u-~) of the denominator in (6.6), where in(u) = 1Au see §6.3.4 for some further comments. Of course, in reality, neither X nor Z is observed but only Y over a finite column of plots, i = 1, 2, . . . ,n, say. If ~ = w/2n and ,5 were known, then estimation of T COUi] proceed by generalized least squares. Otherwise,

116 several methods of estimating at, a, and ,B are available. For example, one might assume additionally that X and Z are Gaussian and apply standard maximum-likelihood estimation (cf. Tiao and Ali, 1971, in a different con- text) or the residual maximum likelihood (REME) variation (Patterson and Thompson, 1971~. What is important here is that in practice it is common that the estimate of ax is zero or close to zero and that of,0 is very close to its maximum possible value, 2. It is therefore instructive to consider both or = 0 and ,(} ~ - in more detail. In each case, we again begin with an infinite line of plots. First suppose that ~ = 0 with ,(] known and let H denote the doubly- infinite matrix with (i, j) element H~ J ~ i = ii 1 ~ l O otherwise Then it is easily checked that the inverse covariance matrix (i.e., precision matrix) for Y is proportional to ~H. It follows that the generalized least squares estimate T* of T agrees with the iterated Papadakis estimate in (6.4) for which fertility in plot i is estimated at each stage by ,0 x {sum of the residuals in the two adjacent plots). This provides a useful connec- tion between the present model-based approach and that espoused empir- icaDy by Papadakis. Furthermore, the duality between bilateral modeling and Papadakis' method is perfectly general, provided H in (6.3) is sym- metric positive (semi-) definite, and extends not only to more complicated one-dimensional adjustment but also to higher dimensions (cf. §6.3.3~. Of course, for real experiments with finite numbers of plots, it holds only as an approximation because of edge effects. Second, suppose that ,l} ~ 2 for fixed ct. Though the distribution of X itself degenerates, the differences XiXj remain well behaved with zero means and with variances var(Xi Xj) ~ _ >2 Hi JO as ,0 ~ -. Equivalently, first differences XiXi+~ are, in the limit, un- correlated and have equal variance 2n, so that X can be thought of as a random walk. It is also of interest that, in the limit ~ = -, the conditional expectation formulation (6.7) remains valid, even though the marginal mean of Xi is undefined and the marginal variance is infinite. This is the most

117 basic example of an intrinsic autoregression (Kunsch, 1987) and provides a stochastic version of simple linear interpolation; see also §6.3.3. The degen- eracy is unimportant in practice because it is only comparisons of treatment effects that are being assessed. This becomes apparent algebraically if we return to the actual experiment. Thus, let Ui = Yi - Yin+, i = 1, . . ., n - 1, or, in vector notation, U = MY, where ~ is the n - 1 by n matrix taking first differences of the Yi'S. It is convenient at this point to single out the overall level ~ of the experiment. Suppose T iS a ~vector; then perhaps after reparameterization, we can write Tr = y1 + Do, (6.9) where 1 is an e-vector of l's, ~ is a p1 vector of (relative) treatment effects, and D is an ~ by p1 design matrix of rank p- 1. Since any treatment contrast ~ = Ate, where aT1 - 0, can be written in terms of i, it is sufficient to concentrate on the estimation of I. The mean and variance-covariance structures of U are given by E(U) = Ff, ~ , DfU) = 2nQ, (6.10) where F = AD, Q = ~ + orbit' and ~ is now the (n1) x (n1) identity. Note the absence of end-plot problems and the retention of information on treatment contrasts in the reduction of the data to n1 first differences. Finally, let Icy denote the generalized [Least-squares estimate of i, so that a* =(FTQiFy iFTQ u, where u is the observed value of U. (6.11) Two special cases of (6.~) merit attention. At one extreme, fO is the ordinary least-squares estimate of ~ based on a; at the other, bOO is the ordi- nary least-squares estimate of ~ based on y (cf. Besag and Kempton, 1986~. Thus, for any intermediate value of a, i,' provides a compromise between the ordinary least-squares estimates of ~ based on u and on y, respectively; this resembles the combination of intra- and interblock information in the classical analysis of incomplete block designs but here using the notion of a moving block of size two. The above discussion indicates that there is little point in retaining a flexible value off and that ,B = - should be adopted from the outset. In practice, it is still often found that the estimate of or is essentially zero,

118 TABLE 6.1: Layout and Yields (t/ha) for 62 Varieties of Winter Wheat Repl icat e 1 Variety Yield 10 7.32 50 6.34 18 7.44 58 8.54 42 7.26 26 7.50 2 6.92 34 8.46 56 7.22 32 8.22 16 7.15 8 6.90 24 7.48 48 7.02 40 8.16 41 7.92 9 7.56 33 8.57 49 8.57 1 7.35 25 8.85 57 8.06 17 8.10 45 8.64 61 8.41 13 8.95 37 7.39 5 7.30 29 8.31 53 8.87 21 7.83 20 8.45 36 7.07 52 8.00 44 8.28 4 7.48 60 8.69 28 6.88 12 8.17 62 7.99 14 7.48 54 7.17 22 7.67 6 7.60 38 7.40 46 8.55 30 7.50 39 7.75 15 4.82 55 8.40 31 9.02 47 7.57 23 9.12 7 9.96 11 8.51 19 8.55 59 9.14 3 7.70 43 8.43 27 7.98 51 7.66 35 8.24 Repl icat e 2 Variety Yield 58 7.52 31 7.50 40 6.61 41 7.00 4 5.59 22 6.01 13 7.52 51 7.07 27 7.63 18 7.28 8 6.32 45 7.43 55 7.31 62 8.36 36 7.04 9 7.23 56 8.16 28 6.86 46 8.21 19 8.54 10 7.56 37 7.41 1 7.35 44 8.50 26 9.01 17 8.60 54 7.25 61 8.39 35 8.45 16 7.70 7 9.46 48 7.28 57 7.71 3 7.46 30 7.63 50 6.32 21 7.29 12 8.29 39 7.09 49 8.30 2 7.49 20 8.94 11 8.09 29 7.99 47 8.05 38 7.38 32 9.00 52 9.24 59 9.60 33 9.13 42 8.20 14 7.90 23 9.26 5 7.80 15 6.28 43 8.95 53 8.96 25 9.32 24 8.24 34 9.15 60 9.40 6 8.17 Replicate 3 Variety Yield 32 7.29 14 6.70 23 7.57 51 7.38 33 7.71 2 6.78 60 8.44 45 7.61 48 6.93 9 7.23 36 6.76 5 6.19 27 7.86 18 7.82 54 6.69 34 8.35 25 8.35 24 7.69 52 8.25 3 7.77 15 4.38 61 8.31 46 8.45 11 8.30 42 7.43 7 9.51 38 6.87 20 8.69 57 7.69 56 7.84 29 7.96 6 7.75 19 7.98 55 7.82 28 6.58 37 7.51 10 7.31 41 7.90 35 8.60 62 8.22 16 7 ~ 55 53 8.52 26 8.58 47 7.15 4 8.37 17 7.89 59 9.16 13 9.04 1 7.72 40 8.15 31 8.98 50 7.10 44 8.20 22 8.20 58 -8.92 39 7.68 8 6.78 21 7.67 49 8.57 12 8.14 43 7.95 30 7.67

119 especially if due allowance for outTiers and jumps in fertility is made (cf. §6.3.4~; this leads to a very simple estimation procedure. Although design is not a consideration here, it is worth noting that if plots 1 and n contain the same treatment, {0 is a linear combination of the n - 2 second differences hi-2Yi + yin+ and is therefore invariant to linear trends in fertility and approximately so to locally linear trends (which are of greater concern). Incidentally, it also turns out that [0 is featured in several other proposals for fertility adjustment and provides an agreeable unity between ostensibly different approaches; again see §6.3.4. The general analysis with ,B = 12 extends immediately to several (effec- tively) independent columns of plots. The single parameter fly is replaced by a vector of separate column effects and, subject to the usual necessity of such terms, there is again no Toss of information on treatment contrasts by the reduction of the data to first differences within columns. For further details, including determination of stanciard errors, the analysis of a facto- rial experiment on triticale, and an assessment of accuracy and precision, see Besag and Kempton (1986~. Here we illustrate first-differences analysis on an official United Kingdom trial for winter wheat, carried out by the East of Scotland College of Agri- culture and involving final assessment of 62 different varieties. The layout of the experiment, in three physically separated complete replicates, and the corresponding yields (t/ha) are listed in Table 6.1. The yields are graphed against plot position in Figure 6.1 (top); there is clear evidence of mod- est fertility gradients within replicates. First-differences analysis (i.e., with ~ = 2 ~ produces the estimate ~ = 1.76 and the decomposition of yields into relative variety effects, fertility effects, and resicluals shown in the bottom three panels of Figure 6.~. The standard errors of pairwise differences be- tween varieties range from .202 to .238 with a mean of .223, compared with the value .418 for a complete block analysis. Thus, there is substantial im- provement in precision and presumably in accuracy of variety estimates; for objective methods of assessment, see the penultimate paragraph of §6.3.4. Moreover, fertility gradients are often more pronounced than in this partic- ular experiment (see, for example, the associated trials analyzed by Green et al., 1985), particularly in countries that do not have the temperate climate of the United Kingdom. In fact, because of the importance of the above trial, the within-replicate layout conformed to a generalized lattice design. This confers no particular benefit to the first-differences analysis, except perhaps to reduce the range O

120 replicate 1 2 ~_f raw yields + fertility + residuals f 'I' ~~ _ ~ 21· ~ ._ ,. <~,_, ,, I FIGURE 6.1: Raw yields and subsequent first-differences decomposition for 62 varieties of winter wheat. of the standard errors, but enables classical incomplete-blocks analysis to be carried out. The corresponding standard error of a varietal difference, kindly supplied by Professor H. D. Patterson, is .235. The fairly close agreement with spatial statistical analysis seems typical but of course the latter does not require a sophisticated design and applies equally to the simple layouts encountered more commonly in practice. 6.3.3 Two-Dimensional Formulation In this section, we seek to generalize the previous one-dimensional formula- tion. Thus, we again adopt equation (6.5) but now identify plots by integer

121 pairs of Cartesian coordinates i = (il,i2). Practical aspects are not devel- oped to the same extent in two dimensions, the main problem being that edge effects are much more important than in one dimension (Guyon, 1982; DahThaus and Kunsch, 1987) and must not be ignored, although their effects are sometimes overemphasized. The generalization of equation (6.7) to a two-dimensional model is given by E (Xii all Xj, j 7t i) = pl~xil_l,i2 + xil+l~i2) + p2(xi~,i2-l + xil,i2+l), var(Xi| all xj,j ~ i) = K, (6.12) where ~,B~ + ~2~ < 2; we assume neither 9~ nor p2 iS zero. It follows that the autocorrelations Pk satisfy Pk = ~l(Pkil,k2 + Pki+l.k2) + 32(pk~,k21 + Pki,k2+1) (6.13) for k 76 (O. O) and that the corresponding generating function is oo oo C(U) ~ ~ Pkul1u22 or [1,ll1(ul+ul 1),(12(U2+U2 1)] 1, (6.14) kl=ook2=oo which cannot be reproduced by any finite unilateral autoregression. For- mulae exist for the low-order autocorrelations but generally the pk'S are best calculated by recursive algorithms (these can be quite delicate) or ap- prox~mated using Bessel functions; for details, see Besag (1981~. Equation (6.12) can be easily extended to include more distant plot fertilities, with appropriate modification to (6.13) and (6.14~; see Rosanov (1967) or Besag (1974), though (6.12) itself dates back to Levy (1948~. If Z is negligible (i.e., ct = 0), the equivalent iterated Papadakis adjustment (6.3) for any particular bilateral autoregression can be written down immediately. Given a partial realization of X in (6.12) over a finite region, the parame- ters ,0~, 32, and ~ can be estimated by matching the theoretical variance and first-order autocovariances with the edge-corrected (i.e., unbiased) empirical values (Besag, 1974; Guyon, 1982~. This generalizes to arbitrary bilateral autoregressions and, if X is Gaussian, is equivalent to asymptotic maximum- likelihood estimation. However, here we are concerned with observation on Y in (6.5) rather than on X. The incorporation of treatment effects T iS straightforward but that of random error Z is more problematical, primarily because of edge effects. When Z is ignored, it is generally found that the estimate of ,B~ + p2 iS very close to -. One means of including Z is to make

122 toroidal assumptions, identifying opposite edges of the field (Besag, 1977~. Although this device is of minimal direct practical relevance, one might rea- sonably expect that it should depress the estimates of p~ and 3:, and yet the above behavior seems to be reproduced. Thus, we might well abandon the stationarity assumption and rather adopt (6.12) with ,0~ + p2 = 2; that is, on the infinite lattice, an intrinsic bilateral autoregression of class zero (Kunsch, 1987~. Such a formulation is of course entirely consistent with the one-dimensional development in §6.3.2; again, we may expect that Z will often be negligible, which if assumed from the start would lead to en- tirely straightforward estimation although we do not wish to exclude the possibility of a non-zero or. ~ v Certainly the problems of estimation are not insuperable but they require further research, especially as regards standard errors for treatment con- trasts; these must retain approximate validity somewhat outside the narrow confines of the model itself. We briefly consider the assessment of different methods in §6.3.4, but here we conclude with some remarks concerning the structure of intrinsic autoregressions. Thus, suppose ,( = ,5 and p2 = 2 - in (6.12~. In the absence of stationarity, we need a new measure of covari- ation and the obvious choice is the semi-variogram (of chapter 5), defined by Ok = 2 var(Xi - Xi+k), i, k ~ Z2 It follows that vo = 0 and, from (6.12), he =Wok t 4(Vk,l,k2 + Vk,+l,k2 ) + ( 2 Irk, ,k2 - 1 + Vkl ,k2+1 ), where Ek = 1 if k = (O. O) and is otherwise zero. It can be shown (Kunsch,1987) that the asymptotic growth of the semi-variogram is logarithmic. Although explicit results are not generally available, it may be noted that, when ,B = -, veto = vow = in, Vk,k = 1r (i + 3 + 5 + + 2~' ~ ) ~ k = 1, 2, . 1 ' (cf. Besag, 1981), and the semi-variogram can be easily evaluated for all lags. On the other hand, ,B = 2 Of course reverts to the model of §6.3.2 for independent columns of plots. At first sight, it might appear that the value of ~ should simply be determined by plot shape, assuming isotropy of the underlying fertility process, but in practice, fertility patterns are also induced by the management of the experiment.

123 6.3.4 Other Approaches ant! Further Research Several other methods of analysis for agricultural field experiments, adopting explicit spatial assumptions, have been proposed in the literature; see, for example, Wilkinson at al. (1983), Green et al. (1985), Williams (1986), Gleason and Cullis (1987), and Martin (1989~. Here we briefly consider two, the first based on a time-series formulation, the second on a data-analytic approach. We have already noted, in §6.3.1, the equivalence between first-order unilateral and bilateral autoregressions in one dimension. This extends to models of arbitrary order. Thus, for trials that only require one-dimensional adjustment, Martin (1989) proposes that classical time-series methodology should be used to select and fit an appropriate mode] in advance. He then extends this approach to two-dimensional adjustment by considering only the class of processes that, after row or column differencing, are separable; that is, are stationary and have interplot autocorrelations Pk satisfying PI ,k2 = Pki,0 Polka - (6.15) Separability leads to a considerable simplification in the computation of parameter estimates, though the advantage is diminished with the inclusion of superimposed random error; see Martin (1989) for details. At first sight, the Manhattan metric of (6.15) is unappealing but could be appropriate when fertility patterns are largely the product of cultivation practice and may in any case provide an adequate approximation. Model selection based on very limited data is perhaps the major handicap of the approach, though this aspect could be abandoned. As with other methods, there is a need for further research, including practical investigation. For a data-analytic viewpoint, we consider the approach proposed by Green et al. (1985~; this also supplies further insight into Papaciakis' and most other methods. Equation (6.1) again provides the starting point. How- ever, equations (6.2) and (6.3) are generalized to T (X ~ = f (yX ~ X (T ~ = S(YTT ), (6.16) where S and f may be linear or nonlinear operators; S is a smoother of fer- tility and other extraneous variation, whilst ~ allows (e.g., robust/resistant) alternatives to be substituted for the ordinary least-squares estimate (6.2~. Here we concentrate on adjustment along a single column of n plots, in which case the basic choices of S and ~ are made as follows. First it is as- sumed that fertility variation is approximately locally linear, so that second

124 differences A2x are small, where /~2 iS the n2 by n matrix taking second differences. Then x and ~ are estimated by [east-squares smoothing; that is, x* and T* minimize C~XT~2T~2z + zTz with the effect that smoothness of the fitted fertility pattern is offset against the residual variation, according to the value of the "tuning constant" or. Hence x* and T* satisfy (6.16) with f as in (6.2) and S=~+~2~2) (6.17) As might be anticipated, (6.16) determines estimates of treatment contrasts rather than T* itself; see below. Green et al. (1985) suggest several data- analytic prescriptions for the choice of or, including cross validation, and illustrate their methodology on data from three different trials. The anal- yses also include approximate standard errors for treatment contrasts ant! graphs of estimated treatment, fertility, and residual effects across each of the experimental areas. As with other methods of fertility adjustment that involve a tuning parameter, the exact value of c' does not seem to be critical. It is of interest that an alternative derivation of (6.17) is available through the random field formulation (6.5), with the assumption that second differ- ences in X are uncorrelated and have equal variance 2~. The generalized least-squares estimate of ~ in (6.9), based on second differences of the yi's, is then given by (6.11) but with i\ replaced by ~2 in the definitions of F. Q. and U. The equivalence follows since (6.17) implies that ~2~/S)+~2 = ~ Q ~ where A+ denotes any generalized inverse of A. (6.18) Note that, since (6.18) also holds if ~2 iS replaced by ~ throughout, the above argument can be inverted to provide a least-squares smoothing interpretation of the first- differences analysis in §6.3.2. In fact, the generalized equations (6.16) are of very wide applicability. For example, they include, on the one hand, the estimates obtained from a classical analysis of an incomplete block design and, on the other, those obtained from the "NN" methodology of Wilkinson et aL (1983~; for a comprehensive discussion, see Green (1985~. Incidentally, NN analysis over a finite region provides an example where a random field formulation is not strictly available: S. though linear, is not completely symmetric because of border plots. However, there are close asymptotic links between NN analysis and that based on first differences. A very important aspect of any model-based statistical procedure is its robustness to departures from the underlying assumptions. Our initial opti- mism concerning the adequacy of a fairly cru(le fertility mode: is supported

125 in practice by the general similarity between treatment estimates obtained from different spatial formulations and by close agreement with classical results when sophisticated design and analysis, such as balanced lattice squares, has been used. Furthermore, there is frequent disparity between spatial and conventional estimates when a poor design, such as randomized complete blocks, has been employed, so that fertility adjustment seems to be worthwhile. We briefly discuss quantitative assessment below but it may also be desirable to modify a model-based procedure to accommodate gross anomalies, particularly those caused by measurement errors or by abrupt jumps in fertility, which may be the product of a change in underlying geo- Togical structure, for example. Papadakis (1984) reviews his previous work on this topic and Besag and Seheult (1989) summarize a closely related ap- proach geared to first-differences analysis. In the context of (6.16), the two types of anomaly might be catered for by nonlinear resistant versions of and S respectively. How can quantitative assessments be made? Perhaps the only rigor- ous method is to use data from uniformity trials in which all plots receive common treatment. If a mock design is superimposed on such a trial, any particular procedure can be used to estimate the relative "treatment" ef- fects and, since these are known to be zero, an assessment of accuracy can be made. Furthermore, the results are relevant to a real experiment under the usual assumption of treatment additivity, provided the method of analy- sis also acts additively. Predicted standard errors can also be compared with actual variability of estimates. Unfortunately, in a random field framework, each trial provides but a single assessment and many sets of uniformity data are required for a proper evaluation. Moreover, uniformity trials are rarely carried out these days, though, for an early catalog, see Cochran (1937~. An alternative is to carry out an assessment within a randomization frame- work (e.g., Besag and Kempton, 1986, Appendix 2), although of course, this addresses a population for which the procedure was not designed. Finally, what of Bayesian formulations? They are indeed conspicuous by their virtual absence from the literature. The main difficulty is that of representing in probabilistic terms one's prior beliefs about fertility varia- tion. Thus, considerations are likely to be very similar to those arising in a random field formulation (6.5) but inferential aspects may be more akin to recent developments in Bayesian image analysis.

126 Bibliography [1] Bartlett, M. S., Nearest neighbour models in the analysis of field ex- periments (with discussion), ]. R. Stat. Soc., B 40 (1978), 147-174. t2] Bartlett, M. S., Stochastic models and field trials, '7. Appl. Prob. 25A (1988), 79-89. [3] Beaven, E. S., Pedigree seed corn, .7. R. Agric. Soc. 70 (1909), 119-139. t4] Besag, J. E., Spatial interaction and the statistical analysis of lattice systems (with discussion). ~I. R. Stat. Soc., B 36 (1974), 192-236. [5] Besag, J. E., Errors-in-variables estimation for Gaussian lattice schemes, ~7. R. Stat. Soc., B 39 (1977), 73-78. [6] Besag, J. E., On a system of two-dimensional recurrence equations, J. R. Stat. Soc., B 43 (1981), 302-309. [7] Besag, J. E., and R. A. Kempton, Statistical analysis of fielcl experi- ments using neighbouring plots, Biometrics 42 (1986), 231-251. [8] Besag, J. E., and A. H. Seheult, Contribution to discussion of Bruce and Martin, A. R. Stat. Soc., B 51 ~ 1989), 405-406. [9] Cochran, W. G., A catalogue of uniformity trial data, A. R. Stat. Soc. ('Suppl.,) 4 (1937), 233-253. 10] Dahiaus, R., and H. Kunsch, Edge effects and efficient parameter esti- mation for stationary random fields, Biometrika 74 (1987), 877-882. ~] Gleason, A. C., and B. R. CuDis, Residual maximum likelihood (REML) estimation of a neighbour model for field experiments, Biometrics 43 (1987), 277-287. 2] Green, P. J., Linear models for field trials, smoothing and cross- validation, Biometrika 72 (1985), 527-537. 13] Green, P. J., C. Jennison, and A. H. Seheult, Analysis of field ex- periments by least-squares smoothing, if. R. Stat. Soc., B 47 (1985), 299-315.

127 t14] Guyon, X., Parameter estimation for a stationary process on a d- dimensional lattice, Biometrika 69 ~ 1982), 95-105. t15] Kempthorne, O., The Design and Analysis of Experiments, John Wiley and Sons, New York, 1952. [16] Kunsch, H., Tntrinisic autoregressions and related models on the two- dimensional lattice. Biometrika 74 (1987), 517-524. [17] Levy, P., Charges doubles de Markoff et fonctions aleatoires de deux variables, C. R. Acad. Sci., Paris 226 (1948), 53-55. [18] Martin, R. J., The use of time-series models and methods in the analysis of agriculture field trials, Comm. Stat. Theor. Meth. 19 (1989), 55-81. [19] Papadakis, J. S., Advances in the analysis of field experiments, Pro- ceeds. Acad. Athens 59 (1984),326-342. [20] Patterson, H. D., and E. A. Hunter, The efficiency of incomplete block designs in National List and Recommended List cereal variety trials, ]. Agric. Sci. 101 (1983),427-433. [21] Patterson, H. D., and R. Thompson, Recovery of inter-block informa- tion when block sizes are unequal, Biometrika 58 (1971),545-554. [22] Rosanov, Yu. A., On the Gaussian homogeneous fields with given con- ditional distributions, Theor. Probability Appl. 12 (1967),381-391. [23] Tiao, G. C., and M. M. Ali, Analysis of correlated random effects: Linear mode! with two random components, Biometrika 58 (1971), 37-51. [24] Wiancko, A. T., Use and management of check plots in soil fertility investigations, if. Am. Soc. Agron. 13 (1914),368-374. [25] Wilkinson, G. N., S. R. Eckert, T. W. Hancock, and O. Mayo, Nearest-neighbour (NN) analysis of field experiments (with discussion), ]. R. Stat Soc., B 45 (1983),151-211. [26] Wilhams, E. R., A neighbour mode! for field experiments, Biometrika 73 (1986),279-287.

Next: 7. Spatial Statistics in Ecology »

Spatial Statistics and Digital Image Analysis (1991)

Chapter: 6. Spatial Statistics in the Analysis of Agricultural Field Experiments

Welcome to OpenBook!

Get Email Updates