Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.
OCR for page 109
6
Spatial Statistics in the
Analysis of Agricultural
Field Experiments
v
6.1 Introcluction
Julian Besag
University of Washingtoni
The main purpose of agricultural field experiments is to compare the effec-
tiveness of different treatments (e.g., fertilizers) on a particular crop variety
or to make comparisons between different varieties of the same crop. Ac-
curacy is paramount, but valid assessment of error is also important. A
typical experimental layout consists of a linear or two-dimensional array
of contiguous rectangular strips of lancl, called plots, each of which is (le-
votec! to a single treatment or variety. Plots are usually Tong and narrow
(e.g., 20 m x 2 m), partly as a trade-off between ease of management and
compactness of the experiment.
In a linear layout, the longer sides are chosen to abut one another, so as
to minimize the impact of fertility gradients across plots (see Figure 6.1~.
The most common measurement is that of plot yield at harvest, which in an
ideal world would provide a direct assessment of the corresponding treatment
or variety eject. However, yield is influenced by external factors such as
weather and plot fertility. It can often be assumed that weather has a
uniform effect, in which case comparisons remain valid under more general
conditions; otherwise, several experiments may be required. On the other
hand, variation in fertility over the experimental region is usually substantial
~ Now at University of Newcastle upon Tyne.
109
OCR for page 110
110
and if ignored can lead to quite erroneous comparisons. Proper design and
analysis of field experiments aims to minimize such problems. By the term
design, here we mean, somewhat narrowly, the rule by which treatments are
allocated to plots (henceforth we include the possibility that treatments are
in fact different varieties).
Clearly, the task of controlling for variation in fertility is inherently spa-
tial and is the focus of this chapter. In §6.2, we discuss general background,
largely from a historical perspective, and in §6.3, we consider some recent
progress and possible future directions at a more technical level. First, how-
ever, it should be noted that there are other types of field experiments that
are not strictly covered by the above description. For example, interest
may center on a measurement other than yield, such as resistance to dis-
ease or quality of product. Also there are experiments that are multisite or
multistage or that involve mixtures, spacings, intercropping, competition,
interference, and so on. Thus some experiments involve spatial considera-
tions rather different from those on which we concentrate in this chapter; for
example, in assessing resistance of different treatments to a particular pest,
the main problem may concern patchiness of infestation over the experi-
mental region. Nevertheless, we hope that in focusing on a single important
topic, the richness of the subject as a whole will not be lost.
6.2 General Backgrounc!
Methods of controlling for variation in fertility across the experimental re-
gion have a long history. Perhaps the first was the use of check plots (e.g.,
Wiancko, 1914~; that is, plots interspersed regularly at frequent intervals
throughout the experiment and containing a standard treatment. The yields
from these plots can be used to calculate a local fertility index for each ex-
perimental plot and to adjust its yield accordingly; the assumption is that
variations in adjacent plots are relatively small. Check plots are still em-
ploye<1 in early generation selection trials where it is required to choose say
logo from many hundreds of varieties for further assessment. This selec-
tion is done at a stage where only a single plot is available to each vari-
ety because of restrictions of management, space, and the quantity of seed
available. Besag and Kempton (1986) give an example involving 1560 (liffer-
ent lines of winter wheat. However, for general use, check plots are rather
crude, demand additional space, labor, and expense, and are somewhat self-
defeating in that experimental plots become even more widely (lispersed
OCR for page 111
111
over the field. Furthermore, there is no obvious means of judging the relia-
bility of estimates. This is also true of systematic designs, such as the Knut
Vik square, in which several plots, regularly dispersed over the experimental
region, are devoted to each treatment. In this case, control for fertility is
implicit rather than explicit; each treatment set of plots should be subject
to approximately the same variation. Systematic designs also have a long
history (e.g., Beaven, 1909) and still find favor in some Nordic countries.
Two other possibilities are (~) the use of soil analysis in each plot to con-
struct a fertility index, although this appears to have little if any support
in practice, and (2) the incorporation of data from previous experiments,
although this may be awkward operationally and requires the generally du-
_ _ _ O ~
bious assumption that fertility gradients remain approximately static from
year to year.
The method of control that has now become standard in most coun-
tries was first proposed by R. A. Fisher in the 1920s. Fisher's triumph was
to construct an entirely self-contained inferential framework that is valid
whatever the pattern of fertility might be, subject only to an assumption
of treatment additivity; that is, it is assumed that the relative eject of any
particular treatment would be the same on any plot. The methodology relies
on three key ingredients: replication, Mocking, and randomization. Replica-
tion means that each treatment appears several times, usually with equal
frequency, in the experiment; this generally improves accuracy and provides
a basis for its assessment. Blocking implies that there are restrictions on the
allocation of treatments to plots, which are imposed to counteract suspected
fertility gradients. The intention is that fertility should be approximately
constant within blocks, so that corresponding differences in plot yields are
meaningful; this further improves the accuracy of treatment comparisons.
Finally, randomization requires that treatments are allocated to plots en-
tirely at random within the constraints of the blocking structure; it is this
step that ensures the unbiasedness of treatment comparisons (contrasts) and
the validity of the associated standard errors within the inferential frame-
work.
Since this framework is not at all obvious, we provide some brief discus-
sion in the particular context of the simplest common example, namely, the
randomized complete blocks design, for which blocks and replicates coincide.
Each plot yield is assumed to be the sum of three components, a fixed block
effect, a treatment effect, and a plot effect. The natural means of introduc-
ing randomness into the mode! is to suppose that the plot effects represent a
realization of a spatial stochastic process. The question, which is addressed
OCR for page 112
112
in §6.3, is, what process? Fisher brilliantly circumvented this problem by as-
suming plot effects to be fixed and by introducing randomness solely through
the act of allocating treatments to plots. In other words, it is the very act
of randomization that alone induces a probability distribution in the yields,
and hence a basis for inference. For details see Kempthorne (1952, Ch. S)
but it turns out that the calculation of treatment estimates and associated
standard errors coincides exactly with an ordinary least-squares analysis of
the corresponding linear mode! assuming a fixed layout and uncorrelated
random plot ejects. For this reason, it is sometimes assumed that the latter
formulation underpins the Fisherian analysis, whereas the two moclels are
quite distinct, with the first addressing a fixed field with a randomly chosen
layout and the second a random field with a fixed layout. Of course, in the
second, an assumption of uncorrelated or equi-correlated plot effects within
blocks is untenable if fertility gradients exist, as is generally the case unless
the number of treatments (i.e., blocksize) is very small. Finally, note that
the "usual" construction of confidence intervals based on the t-distribution
can be shown to be approximately valid under randomization.
Thus, the main problem in adopting a randomized complete blocks cle-
sign is not in the validity of the analysis (although one might challenge the
relevance of the conceptual framework) but in its lack of sensitivity. A graph
of estimated plot effects against their locations in the experiment will almost
inevitably display substantial spatial structure; to ignore this is extremely
wasteful. The classical remedy has been to develop much more sophisticated
designs that employ a local blocking structure within which it can be more
reasonably assumed that fertility is effectively constant. The most efficient
designs, such as completely balanced lattice squares, are rarely practicable
because of restrictions on the number of treatments and replicates. This
difficulty has been met by the introduction of compromise designs, baser! on
partially balanced incomplete blocks, and these are now used quite widely,
especially in variety trials (Patterson and Hunter, 1983~. However, a new
problem arises with sophisticated designs, for there no longer exists a proper
justification for the use of Gaussian-theory confidence intervals, unless ad-
ditional or different assumptions are made in the statistical formulation.
Furthermore, despite the obvious merits of sophisticated designs, alarge
proportion of experiments in the world employs nothing more complicated
than randomized complete blocks, whether for reasons of tradition or ease
of management. Thus it is important that methods of analysis be avail-
able that adopt explicit spatial models for fertility, if only as a means of
salvaging badly designed experiments (Bartlett, 197S, 1988~. Of course, it
OCR for page 113
113
would be unrealistic to expect a spatial mode} to provide more than a crude
representation of a true fertility process, but this is probably unimportant,
since (1) it is the replicated treatment effects rather than the individual
plot fertilities that are of primary concern and (2) the purpose of the model
is essentially one of interpolation rather than extrapolation. There is an
interesting contrast here with the usual requirements in time series analysis.
In fact, the idea of extracting information from neighboring experimental
plots as a means of controlling for variation is not at ah new and was first
proposed in the 1920s by J. S. Papadakis, a distinguished Greek agronomist.
Unfortunately, his entirely empirical approach received very little attention
by others; for some historicaIreflections, see Bartlett (1988~. Nevertheless,
Papadakis himself continued to use and develop his method over several
decades (see Papadakis, 1984), and it is instructive to consider one particular
version below.
Thus, let y denote the vector of observed yields, with plots indexed in
some convenient manner, and suppose
y = Tr+x+z,
(6.1)
where T denotes treatment effects, T is the corresponding fur-rank treatment
design matrix, x represents the (fixed) fertility effects measured about zero,
and z is residual error. If x* denotes a current assessment of x and is
presumed to be correct, the corresponding ordinary least-squares estimate
O : ~ IS
~ (x ~ = (TTT) 1TT(y—x ).
(6.2,
This provides a reassessment y—TT* of x but leads to circularity in the
absence of some form of constraints on the parameter space. Papadakis
resolved this difficulty by using as the new estimate of x
X*(T*) = H(Y - TT*),
(6.3)
where ET is a matrix that reflects anticipated similarity in fertility between
neighboring plots. For example, in a two-dimensional layout, the fertility
in any particular plot might be estimated by the average of the residuals in
the four adjacent plots, with an appropriate modification at the boundary
of the experiment. Papadakis initiated (6.2) with x* = 0 and then iterated
between (6.2) and (6.3), either for a prescribed number of cycles or until
convergence. Here we concentrate exclusively on the latter option, referred
to as the iterated! Papadakis procedure. It follows that the final estimate A*
OCR for page 114
114
satisfies
so that
T* = (~TTT>'—ITTLY _ iI(y—TT a)],
TT(~_ B)TT* = TAT_ H\)Y. (~6.4)
Thus an alternative viewpoint is that r* is the generalized least-squares
estimate of T when z is negligible and x is interpreted as a realization from
a spatial stochastic process with (possibly generalized) inverse covariance
matrix proportional to ~—At, assuming this is symmetric positive (semi-)
definite.
The implication is that Papadakis' empirical procedure may have a sepa-
rate interpretation as a formal model-based approach to fertility adjustment.
We investigate this and consider generalizations in §6.3. Of course, at this
stage, there is no guarantee that r* has any particular merit, or that the
~—~ induced by typical Papadakis adjustment holds appeal as inverse co-
variance matrices in a random field formulation. Finally, note that, where
in our discussion ~—~ is singular, estimates of treatment contrasts rather
than T* itself wiB be uniquely determined.
6.3 Some Recent Progress ant! Future Directions
6.3.1 Aims
It has already been noted that, in most field experiments, plots are long and
narrow. It follows that, even when the layout itself is two-dimensionaI, in-
ternal control for fertility variation is usually profitable only in the direction
of the shorter plot axis. Thus, in §6.3.2, we concentrate on one-dimensional
adjustment. In particular, we first discuss the role of simple stochastic mocI-
els in experiments that only involve a single linear array (column) of plots;
the results extend immediately to trials that in effect employ several sepa-
rate columns. Then, in §6.3.3, we tackle the less common but nevertheless
important situation in which genuine two-dimensional adjustment is nec-
essary; this is a topic that requires considerable further research. Finally,
in §6.3.4, we briefly discuss some other approaches and some outstanding
problems.
In §§6.3.2 and 6.3.3, we assume a formulation that is in accordance with
equation (6.~), where now y, x, and z are interpreted as realizations of
spatial stochastic processes Y. X, and Z; thus,
Y = TT+X+Z, (6.5)
OCR for page 115
115
where Y is the vector of random plot yields, T represents fixed treatment
effects, and T is the design matrix for the experiment. We further suppose
that the components of Z are uncorrelated, with zero means and common
variance lo, and that Z and X are uncorrelated; Z takes account of residual
errors and is often negligible in practice when compared with the variation
in the fertility process X.
6.3.2 One-Dimensional Adjustment
In discussing specification of the fertility process X for layouts that consist
of a single column of n plots, it proves convenient initially to consider an
ostensibly infinite column, with plots labelled i = 0,+1,..., according to
their positions with respect to a reference plot 0. Let Xi denote the random
fertility in plot i, measured about zero. The simplest specification of the
Xi's that departs from independence is the classical first-order stationary
autoregression in which the lag k autocorrelation is Pk = A~k~,k = 0,+1,..
where ~~ < 1. The corresponding autocorrelation generating function is
·7
oo ~ _ `2
C(u)— ~ pku 1 + A2 _ A(u + u ) (6.6)
In the present context, the above unilateral model is more naturally formu-
lated as a bilateral autoregression, with
E (Xi all xj, j if i) = ,B(Xi_1 + Xi+l ),
var(Xi~ all xj,j 76 i) = K, (6.7,
where ,B = A/(1 + A2) and ~,0~ < 2. The equivalence is confirmed by noting
that (6.7) implies that the corresponding autocorrelations Pk satisfy
Pk = 3(Pk—1 + Pk+~), k = +1,+2, . . ., (6.S,
and hence that Pk = inks. The Nudity between the unilateral and bilateral
formulations does not generally extend to higher dimensions and rests on the
factorization h(u~h(u-~) of the denominator in (6.6), where in(u) = 1—Au
see §6.3.4 for some further comments.
Of course, in reality, neither X nor Z is observed but only Y over a
finite column of plots, i = 1, 2, . . . ,n, say. If ~ = w/2n and ,5 were known,
then estimation of T COUi] proceed by generalized least squares. Otherwise,
OCR for page 116
116
several methods of estimating at, a, and ,B are available. For example, one
might assume additionally that X and Z are Gaussian and apply standard
maximum-likelihood estimation (cf. Tiao and Ali, 1971, in a different con-
text) or the residual maximum likelihood (REME) variation (Patterson and
Thompson, 1971~. What is important here is that in practice it is common
that the estimate of ax is zero or close to zero and that of,0 is very close to
its maximum possible value, 2. It is therefore instructive to consider both
or = 0 and ,(} ~ - in more detail. In each case, we again begin with an infinite
line of plots.
First suppose that ~ = 0 with ,(] known and let H denote the doubly-
infinite matrix with (i, j) element
H~ —J ~ i = ii 1
~ l O otherwise
Then it is easily checked that the inverse covariance matrix (i.e., precision
matrix) for Y is proportional to ~—H. It follows that the generalized
least squares estimate T* of T agrees with the iterated Papadakis estimate
in (6.4) for which fertility in plot i is estimated at each stage by ,0 x {sum
of the residuals in the two adjacent plots). This provides a useful connec-
tion between the present model-based approach and that espoused empir-
icaDy by Papadakis. Furthermore, the duality between bilateral modeling
and Papadakis' method is perfectly general, provided H in (6.3) is sym-
metric positive (semi-) definite, and extends not only to more complicated
one-dimensional adjustment but also to higher dimensions (cf. §6.3.3~. Of
course, for real experiments with finite numbers of plots, it holds only as an
approximation because of edge effects.
Second, suppose that ,l} ~ 2 for fixed ct. Though the distribution of X
itself degenerates, the differences Xi—Xj remain well behaved with zero
means and with variances
var(Xi Xj) ~ _ >2 Hi JO
as ,0 ~ -. Equivalently, first differences Xi—Xi+~ are, in the limit, un-
correlated and have equal variance 2n, so that X can be thought of as a
random walk. It is also of interest that, in the limit ~ = -, the conditional
expectation formulation (6.7) remains valid, even though the marginal mean
of Xi is undefined and the marginal variance is infinite. This is the most
OCR for page 117
117
basic example of an intrinsic autoregression (Kunsch, 1987) and provides a
stochastic version of simple linear interpolation; see also §6.3.3. The degen-
eracy is unimportant in practice because it is only comparisons of treatment
effects that are being assessed. This becomes apparent algebraically if we
return to the actual experiment. Thus, let Ui = Yi - Yin+, i = 1, . . ., n - 1,
or, in vector notation, U = MY, where ~ is the n - 1 by n matrix taking
first differences of the Yi'S. It is convenient at this point to single out the
overall level ~ of the experiment. Suppose T iS a ~vector; then perhaps after
reparameterization, we can write
Tr = y1 + Do,
(6.9)
where 1 is an e-vector of l's, ~ is a p—1 vector of (relative) treatment effects,
and D is an ~ by p—1 design matrix of rank p- 1. Since any treatment
contrast ~ = Ate, where aT1 - 0, can be written in terms of i, it is sufficient
to concentrate on the estimation of I. The mean and variance-covariance
structures of U are given by
E(U) = Ff, ~ ,
DfU) = 2nQ,
(6.10)
where F = AD, Q = ~ + orbit' and ~ is now the (n—1) x (n—1) identity.
Note the absence of end-plot problems and the retention of information on
treatment contrasts in the reduction of the data to n—1 first differences.
Finally, let Icy denote the generalized [Least-squares estimate of i, so that
a* =(FTQ—iFy iFTQ u,
where u is the observed value of U.
(6.11)
Two special cases of (6.~) merit attention. At one extreme, fO is the
ordinary least-squares estimate of ~ based on a; at the other, bOO is the ordi-
nary least-squares estimate of ~ based on y (cf. Besag and Kempton, 1986~.
Thus, for any intermediate value of a, i,' provides a compromise between
the ordinary least-squares estimates of ~ based on u and on y, respectively;
this resembles the combination of intra- and interblock information in the
classical analysis of incomplete block designs but here using the notion of a
moving block of size two.
The above discussion indicates that there is little point in retaining a
flexible value off and that ,B = - should be adopted from the outset. In
practice, it is still often found that the estimate of or is essentially zero,
OCR for page 118
118
TABLE 6.1: Layout and Yields (t/ha) for 62 Varieties of Winter Wheat
Repl icat e 1
Variety Yield
10 7.32
50 6.34
18 7.44
58 8.54
42 7.26
26 7.50
2 6.92
34 8.46
56 7.22
32 8.22
16 7.15
8 6.90
24 7.48
48 7.02
40 8.16
41 7.92
9 7.56
33 8.57
49 8.57
1 7.35
25 8.85
57 8.06
17 8.10
45 8.64
61 8.41
13 8.95
37 7.39
5 7.30
29 8.31
53 8.87
21 7.83
20 8.45
36 7.07
52 8.00
44 8.28
4 7.48
60 8.69
28 6.88
12 8.17
62 7.99
14 7.48
54 7.17
22 7.67
6 7.60
38 7.40
46 8.55
30 7.50
39 7.75
15 4.82
55 8.40
31 9.02
47 7.57
23 9.12
7 9.96
11 8.51
19 8.55
59 9.14
3 7.70
43 8.43
27 7.98
51 7.66
35 8.24
Repl icat e 2
Variety Yield
58 7.52
31 7.50
40 6.61
41 7.00
4 5.59
22 6.01
13 7.52
51 7.07
27 7.63
18 7.28
8 6.32
45 7.43
55 7.31
62 8.36
36 7.04
9 7.23
56 8.16
28 6.86
46 8.21
19 8.54
10 7.56
37 7.41
1 7.35
44 8.50
26 9.01
17 8.60
54 7.25
61 8.39
35 8.45
16 7.70
7 9.46
48 7.28
57 7.71
3 7.46
30 7.63
50 6.32
21 7.29
12 8.29
39 7.09
49 8.30
2 7.49
20 8.94
11 8.09
29 7.99
47 8.05
38 7.38
32 9.00
52 9.24
59 9.60
33 9.13
42 8.20
14 7.90
23 9.26
5 7.80
15 6.28
43 8.95
53 8.96
25 9.32
24 8.24
34 9.15
60 9.40
6 8.17
Replicate 3
Variety Yield
32 7.29
14 6.70
23 7.57
51 7.38
33 7.71
2 6.78
60 8.44
45 7.61
48 6.93
9 7.23
36 6.76
5 6.19
27 7.86
18 7.82
54 6.69
34 8.35
25 8.35
24 7.69
52 8.25
3 7.77
15 4.38
61 8.31
46 8.45
11 8.30
42 7.43
7 9.51
38 6.87
20 8.69
57 7.69
56 7.84
29 7.96
6 7.75
19 7.98
55 7.82
28 6.58
37 7.51
10 7.31
41 7.90
35 8.60
62 8.22
16 7 ~ 55
53 8.52
26 8.58
47 7.15
4 8.37
17 7.89
59 9.16
13 9.04
1 7.72
40 8.15
31 8.98
50 7.10
44 8.20
22 8.20
58 -8.92
39 7.68
8 6.78
21 7.67
49 8.57
12 8.14
43 7.95
30 7.67
OCR for page 119
119
especially if due allowance for outTiers and jumps in fertility is made (cf.
§6.3.4~; this leads to a very simple estimation procedure. Although design is
not a consideration here, it is worth noting that if plots 1 and n contain the
same treatment, {0 is a linear combination of the n - 2 second differences
hi-—2Yi + yin+ and is therefore invariant to linear trends in fertility and
approximately so to locally linear trends (which are of greater concern).
Incidentally, it also turns out that [0 is featured in several other proposals
for fertility adjustment and provides an agreeable unity between ostensibly
different approaches; again see §6.3.4.
The general analysis with ,B = 12 extends immediately to several (effec-
tively) independent columns of plots. The single parameter fly is replaced
by a vector of separate column effects and, subject to the usual necessity
of such terms, there is again no Toss of information on treatment contrasts
by the reduction of the data to first differences within columns. For further
details, including determination of stanciard errors, the analysis of a facto-
rial experiment on triticale, and an assessment of accuracy and precision,
see Besag and Kempton (1986~.
Here we illustrate first-differences analysis on an official United Kingdom
trial for winter wheat, carried out by the East of Scotland College of Agri-
culture and involving final assessment of 62 different varieties. The layout of
the experiment, in three physically separated complete replicates, and the
corresponding yields (t/ha) are listed in Table 6.1. The yields are graphed
against plot position in Figure 6.1 (top); there is clear evidence of mod-
est fertility gradients within replicates. First-differences analysis (i.e., with
~ = 2 ~ produces the estimate ~ = 1.76 and the decomposition of yields into
relative variety effects, fertility effects, and resicluals shown in the bottom
three panels of Figure 6.~. The standard errors of pairwise differences be-
tween varieties range from .202 to .238 with a mean of .223, compared with
the value .418 for a complete block analysis. Thus, there is substantial im-
provement in precision and presumably in accuracy of variety estimates; for
objective methods of assessment, see the penultimate paragraph of §6.3.4.
Moreover, fertility gradients are often more pronounced than in this partic-
ular experiment (see, for example, the associated trials analyzed by Green et
al., 1985), particularly in countries that do not have the temperate climate
of the United Kingdom.
In fact, because of the importance of the above trial, the within-replicate
layout conformed to a generalized lattice design. This confers no particular
benefit to the first-differences analysis, except perhaps to reduce the range
O
OCR for page 120
120
replicate 1
2
~_f
raw yields
+ fertility
+ residuals
f
'I' ~~ _ ~ 21· ~ ._ ,. <~,_, ,, I
FIGURE 6.1: Raw yields and subsequent first-differences decomposition for
62 varieties of winter wheat.
of the standard errors, but enables classical incomplete-blocks analysis to be
carried out. The corresponding standard error of a varietal difference, kindly
supplied by Professor H. D. Patterson, is .235. The fairly close agreement
with spatial statistical analysis seems typical but of course the latter does
not require a sophisticated design and applies equally to the simple layouts
encountered more commonly in practice.
6.3.3 Two-Dimensional Formulation
In this section, we seek to generalize the previous one-dimensional formula-
tion. Thus, we again adopt equation (6.5) but now identify plots by integer
OCR for page 121
121
pairs of Cartesian coordinates i = (il,i2). Practical aspects are not devel-
oped to the same extent in two dimensions, the main problem being that
edge effects are much more important than in one dimension (Guyon, 1982;
DahThaus and Kunsch, 1987) and must not be ignored, although their effects
are sometimes overemphasized.
The generalization of equation (6.7) to a two-dimensional model is given
by
E (Xii all Xj, j 7t i) = pl~xil_l,i2 + xil+l~i2) + p2(xi~,i2-l + xil,i2+l),
var(Xi| all xj,j ~ i) = K, (6.12)
where ~,B~ + ~2~ < 2; we assume neither 9~ nor p2 iS zero. It follows that
the autocorrelations Pk satisfy
Pk = ~l(Pki—l,k2 + Pki+l.k2) + 32(pk~,k2—1 + Pki,k2+1) (6.13)
for k 76 (O. O) and that the corresponding generating function is
oo oo
C(U)—— ~ ~ Pkul1u22 or [1—,ll1(ul+ul 1)—,(12(U2+U2 1)] 1, (6.14)
kl=—ook2=—oo
which cannot be reproduced by any finite unilateral autoregression. For-
mulae exist for the low-order autocorrelations but generally the pk'S are
best calculated by recursive algorithms (these can be quite delicate) or ap-
prox~mated using Bessel functions; for details, see Besag (1981~. Equation
(6.12) can be easily extended to include more distant plot fertilities, with
appropriate modification to (6.13) and (6.14~; see Rosanov (1967) or Besag
(1974), though (6.12) itself dates back to Levy (1948~. If Z is negligible
(i.e., ct = 0), the equivalent iterated Papadakis adjustment (6.3) for any
particular bilateral autoregression can be written down immediately.
Given a partial realization of X in (6.12) over a finite region, the parame-
ters ,0~, 32, and ~ can be estimated by matching the theoretical variance and
first-order autocovariances with the edge-corrected (i.e., unbiased) empirical
values (Besag, 1974; Guyon, 1982~. This generalizes to arbitrary bilateral
autoregressions and, if X is Gaussian, is equivalent to asymptotic maximum-
likelihood estimation. However, here we are concerned with observation on
Y in (6.5) rather than on X. The incorporation of treatment effects T iS
straightforward but that of random error Z is more problematical, primarily
because of edge effects. When Z is ignored, it is generally found that the
estimate of ,B~ + p2 iS very close to -. One means of including Z is to make
OCR for page 122
122
toroidal assumptions, identifying opposite edges of the field (Besag, 1977~.
Although this device is of minimal direct practical relevance, one might rea-
sonably expect that it
should depress the estimates of p~ and 3:, and yet
the above behavior seems to be reproduced. Thus, we might well abandon
the stationarity assumption and rather adopt (6.12) with ,0~ + p2 = 2; that
is, on the infinite lattice, an intrinsic bilateral autoregression of class zero
(Kunsch, 1987~. Such a formulation is of course entirely consistent with
the one-dimensional development in §6.3.2; again, we may expect that Z
will often be negligible, which if assumed from the start would lead to en-
tirely straightforward estimation although we do not wish to exclude the
possibility of a non-zero or.
~ v
Certainly the problems of estimation are not insuperable but they require
further research, especially as regards standard errors for treatment con-
trasts; these must retain approximate validity somewhat outside the narrow
confines of the model itself. We briefly consider the assessment of different
methods in §6.3.4, but here we conclude with some remarks concerning the
structure of intrinsic autoregressions. Thus, suppose ,( = ,5 and p2 = 2 -
in (6.12~. In the absence of stationarity, we need a new measure of covari-
ation and the obvious choice is the semi-variogram (of chapter 5), defined
by
Ok = 2 var(Xi - Xi+k), i, k ~ Z2
It follows that vo = 0 and, from (6.12),
he =—Wok t 4(Vk,—l,k2 + Vk,+l,k2 ) + ( 2 —Irk, ,k2 - 1 + Vkl ,k2+1 ),
where Ek = 1 if k = (O. O) and is otherwise zero. It can be shown (Kunsch,1987)
that the asymptotic growth of the semi-variogram is logarithmic. Although
explicit results are not generally available, it may be noted that, when ,B = -,
veto = vow = in,
Vk,k = 1r (i + 3 + 5 + + 2~' ~ ) ~ k = 1, 2, .
1 '
(cf. Besag, 1981), and the semi-variogram can be easily evaluated for all
lags. On the other hand, ,B = 2 Of course reverts to the model of §6.3.2
for independent columns of plots. At first sight, it might appear that the
value of ~ should simply be determined by plot shape, assuming isotropy
of the underlying fertility process, but in practice, fertility patterns are also
induced by the management of the experiment.
OCR for page 123
123
6.3.4 Other Approaches ant! Further Research
Several other methods of analysis for agricultural field experiments, adopting
explicit spatial assumptions, have been proposed in the literature; see, for
example, Wilkinson at al. (1983), Green et al. (1985), Williams (1986),
Gleason and Cullis (1987), and Martin (1989~. Here we briefly consider two,
the first based on a time-series formulation, the second on a data-analytic
approach.
We have already noted, in §6.3.1, the equivalence between first-order
unilateral and bilateral autoregressions in one dimension. This extends to
models of arbitrary order. Thus, for trials that only require one-dimensional
adjustment, Martin (1989) proposes that classical time-series methodology
should be used to select and fit an appropriate mode] in advance. He then
extends this approach to two-dimensional adjustment by considering only
the class of processes that, after row or column differencing, are separable;
that is, are stationary and have interplot autocorrelations Pk satisfying
PI ,k2 = Pki,0 Polka -
(6.15)
Separability leads to a considerable simplification in the computation of
parameter estimates, though the advantage is diminished with the inclusion
of superimposed random error; see Martin (1989) for details. At first sight,
the Manhattan metric of (6.15) is unappealing but could be appropriate
when fertility patterns are largely the product of cultivation practice and
may in any case provide an adequate approximation. Model selection based
on very limited data is perhaps the major handicap of the approach, though
this aspect could be abandoned. As with other methods, there is a need for
further research, including practical investigation.
For a data-analytic viewpoint, we consider the approach proposed by
Green et al. (1985~; this also supplies further insight into Papaciakis' and
most other methods. Equation (6.1) again provides the starting point. How-
ever, equations (6.2) and (6.3) are generalized to
T (X ~ = f (y—X ~
X (T ~ = S(Y—TT ),
(6.16)
where S and f may be linear or nonlinear operators; S is a smoother of fer-
tility and other extraneous variation, whilst ~ allows (e.g., robust/resistant)
alternatives to be substituted for the ordinary least-squares estimate (6.2~.
Here we concentrate on adjustment along a single column of n plots, in
which case the basic choices of S and ~ are made as follows. First it is as-
sumed that fertility variation is approximately locally linear, so that second
OCR for page 124
124
differences A2x are small, where /~2 iS the n—2 by n matrix taking second
differences. Then x and ~ are estimated by [east-squares smoothing; that is,
x* and T* minimize
C~XT~2T~2z + zTz
with the effect that smoothness of the fitted fertility pattern is offset against
the residual variation, according to the value of the "tuning constant" or.
Hence x* and T* satisfy (6.16) with f as in (6.2) and
S=~+~2~2)
(6.17)
As might be anticipated, (6.16) determines estimates of treatment contrasts
rather than T* itself; see below. Green et al. (1985) suggest several data-
analytic prescriptions for the choice of or, including cross validation, and
illustrate their methodology on data from three different trials. The anal-
yses also include approximate standard errors for treatment contrasts ant!
graphs of estimated treatment, fertility, and residual effects across each of
the experimental areas. As with other methods of fertility adjustment that
involve a tuning parameter, the exact value of c' does not seem to be critical.
It is of interest that an alternative derivation of (6.17) is available through
the random field formulation (6.5), with the assumption that second differ-
ences in X are uncorrelated and have equal variance 2~. The generalized
least-squares estimate of ~ in (6.9), based on second differences of the yi's,
is then given by (6.11) but with i\ replaced by ~2 in the definitions of F. Q.
and U. The equivalence follows since (6.17) implies that
~2~/—S)+~2 = ~ Q ~
where A+ denotes any generalized inverse of A.
(6.18)
Note that, since (6.18)
also holds if ~2 iS replaced by ~ throughout, the above argument can be
inverted to provide a least-squares smoothing interpretation of the first-
differences analysis in §6.3.2. In fact, the generalized equations (6.16) are
of very wide applicability. For example, they include, on the one hand, the
estimates obtained from a classical analysis of an incomplete block design
and, on the other, those obtained from the "NN" methodology of Wilkinson
et aL (1983~; for a comprehensive discussion, see Green (1985~. Incidentally,
NN analysis over a finite region provides an example where a random field
formulation is not strictly available: S. though linear, is not completely
symmetric because of border plots. However, there are close asymptotic
links between NN analysis and that based on first differences.
A very important aspect of any model-based statistical procedure is its
robustness to departures from the underlying assumptions. Our initial opti-
mism concerning the adequacy of a fairly cru(le fertility mode: is supported
OCR for page 125
125
in practice by the general similarity between treatment estimates obtained
from different spatial formulations and by close agreement with classical
results when sophisticated design and analysis, such as balanced lattice
squares, has been used. Furthermore, there is frequent disparity between
spatial and conventional estimates when a poor design, such as randomized
complete blocks, has been employed, so that fertility adjustment seems to
be worthwhile. We briefly discuss quantitative assessment below but it may
also be desirable to modify a model-based procedure to accommodate gross
anomalies, particularly those caused by measurement errors or by abrupt
jumps in fertility, which may be the product of a change in underlying geo-
Togical structure, for example. Papadakis (1984) reviews his previous work
on this topic and Besag and Seheult (1989) summarize a closely related ap-
proach geared to first-differences analysis. In the context of (6.16), the two
types of anomaly might be catered for by nonlinear resistant versions of
and S respectively.
How can quantitative assessments be made?
Perhaps the only rigor-
ous method is to use data from uniformity trials in which all plots receive
common treatment. If a mock design is superimposed on such a trial, any
particular procedure can be used to estimate the relative "treatment" ef-
fects and, since these are known to be zero, an assessment of accuracy can
be made. Furthermore, the results are relevant to a real experiment under
the usual assumption of treatment additivity, provided the method of analy-
sis also acts additively. Predicted standard errors can also be compared with
actual variability of estimates. Unfortunately, in a random field framework,
each trial provides but a single assessment and many sets of uniformity data
are required for a proper evaluation. Moreover, uniformity trials are rarely
carried out these days, though, for an early catalog, see Cochran (1937~.
An alternative is to carry out an assessment within a randomization frame-
work (e.g., Besag and Kempton, 1986, Appendix 2), although of course, this
addresses a population for which the procedure was not designed.
Finally, what of Bayesian formulations? They are indeed conspicuous
by their virtual absence from the literature. The main difficulty is that of
representing in probabilistic terms one's prior beliefs about fertility varia-
tion. Thus, considerations are likely to be very similar to those arising in a
random field formulation (6.5) but inferential aspects may be more akin to
recent developments in Bayesian image analysis.
OCR for page 126
126
Bibliography
[1] Bartlett, M. S., Nearest neighbour models in the analysis of field ex-
periments (with discussion), ]. R. Stat. Soc., B 40 (1978), 147-174.
t2] Bartlett, M. S., Stochastic models and field trials, '7. Appl. Prob. 25A
(1988), 79-89.
[3] Beaven, E. S., Pedigree seed corn, .7. R. Agric. Soc. 70 (1909), 119-139.
t4] Besag, J. E., Spatial interaction and the statistical analysis of lattice
systems (with discussion). ~I. R. Stat. Soc., B 36 (1974), 192-236.
[5] Besag, J. E., Errors-in-variables estimation for Gaussian lattice
schemes, ~7. R. Stat. Soc., B 39 (1977), 73-78.
[6] Besag, J. E., On a system of two-dimensional recurrence equations,
J. R. Stat. Soc., B 43 (1981), 302-309.
[7] Besag, J. E., and R. A. Kempton, Statistical analysis of fielcl experi-
ments using neighbouring plots, Biometrics 42 (1986), 231-251.
[8] Besag, J. E., and A. H. Seheult, Contribution to discussion of Bruce
and Martin, A. R. Stat. Soc., B 51 ~ 1989), 405-406.
[9] Cochran, W. G., A catalogue of uniformity trial data, A. R. Stat. Soc.
('Suppl.,) 4 (1937), 233-253.
10] Dahiaus, R., and H. Kunsch, Edge effects and efficient parameter esti-
mation for stationary random fields, Biometrika 74 (1987), 877-882.
~] Gleason, A. C., and B. R. CuDis, Residual maximum likelihood (REML)
estimation of a neighbour model for field experiments, Biometrics 43
(1987), 277-287.
2] Green, P. J., Linear models for field trials, smoothing and cross-
validation, Biometrika 72 (1985), 527-537.
13] Green, P. J., C. Jennison, and A. H. Seheult, Analysis of field ex-
periments by least-squares smoothing, if. R. Stat. Soc., B 47 (1985),
299-315.
OCR for page 127
127
t14] Guyon, X., Parameter estimation for a stationary process on a d-
dimensional lattice, Biometrika 69 ~ 1982), 95-105.
t15] Kempthorne, O., The Design and Analysis of Experiments, John Wiley
and Sons, New York, 1952.
[16] Kunsch, H., Tntrinisic autoregressions and related models on the two-
dimensional lattice. Biometrika 74 (1987), 517-524.
[17] Levy, P., Charges doubles de Markoff et fonctions aleatoires de deux
variables, C. R. Acad. Sci., Paris 226 (1948), 53-55.
[18] Martin, R. J., The use of time-series models and methods in the analysis
of agriculture field trials, Comm. Stat. Theor. Meth. 19 (1989), 55-81.
[19] Papadakis, J. S., Advances in the analysis of field experiments, Pro-
ceeds. Acad. Athens 59 (1984),326-342.
[20] Patterson, H. D., and E. A. Hunter, The efficiency of incomplete block
designs in National List and Recommended List cereal variety trials,
]. Agric. Sci. 101 (1983),427-433.
[21] Patterson, H. D., and R. Thompson, Recovery of inter-block informa-
tion when block sizes are unequal, Biometrika 58 (1971),545-554.
[22] Rosanov, Yu. A., On the Gaussian homogeneous fields with given con-
ditional distributions, Theor. Probability Appl. 12 (1967),381-391.
[23] Tiao, G. C., and M. M. Ali, Analysis of correlated random effects:
Linear mode! with two random components, Biometrika 58 (1971),
37-51.
[24] Wiancko, A. T., Use and management of check plots in soil fertility
investigations, if. Am. Soc. Agron. 13 (1914),368-374.
[25] Wilkinson, G. N., S. R. Eckert, T. W. Hancock, and O. Mayo,
Nearest-neighbour (NN) analysis of field experiments (with discussion),
]. R. Stat Soc., B 45 (1983),151-211.
[26] Wilhams, E. R., A neighbour mode! for field experiments, Biometrika
73 (1986),279-287.
OCR for page 128
Representative terms from entire chapter:
treatment contrasts