Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.

Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 313

Predictability of Large-scale Spatially Embedded
Networks
Carter T. Butts;
10/23/02
Abstract
Although it is well-known that there is a relationship between soci~physical dim
tance and edge probability in interpersonal networks, the predictive power of such
distances for total network structure has not been established. Here, it is shown that
upper bounds on the marginal edge probabilities for far-flung dyads can be used to
place a lower bound on the predictive power of distance, and one such bound Is de-
rived. Application of this bound to the special case of uniformly placed vertices on
the plane suggests that only modest constraints are required for distance effects to
dominate at large physical scales.
Keywords: social networks, spatial models, Shannon entropy, predictive power, dis-
tance
~ Introcluction
Numerous studies show a strong relationship between physical distance and social structure
(e.g., Merton (1948~; Festinger et al. (1950~; Caplow and Forman (1950~; Blake et al. (1956~;
Whyte (1957~; Sommer (1969~; Snow et al. (1981~; Latane et al. (1995~; arguably, few other
findings in the social sciences can claim such a degree of strength and generality. While this
is an interesting and important result in and of itself, it begs a critical question: assuming
*This material is based upon work supported by the National Science Foundation under Grant No.
0100999, as well as an NSF Graduate Research Fellowship. Additional support was provided by tlie Center for
the Computational Analysis of Social and Organizational Systems and the Institute for Complex Engineered
Systems at Carnegie Mellon University.
tI would like to thank Kathleen Carley, David Krackhardt, Shelby Stewman, David Rode, and Katie
Faust for their comments and input.
"Institute for Mathematical Behavioral Sciences and Department of Sociology, University of California-
Irvine, Irvine, CA 92697, buttsc@uci.edu
DYNAMIC SOCIAL N-ETWORKMODELTNG~4ND A/JAl~YSIS
313

OCR for page 313

that the distance/edge probability relationship is as it appears to be, to what extent can this
account for the variability of social structure writ large? In his 1984 comment, "Chance and
Necessity in Sociological Theory," Bruce Mayhew makes the characteristically boIct claim
that well over 905 of the variation in social structure is determined by physical spacei.
If Mayhew's assertion is correct, then we would expect for network models based on vertex
position to allow us to develop extremely credible predictions of large-scale network structure.
Since individual positions can be inferred from population data, such a result should (in
principle, at least) allow us to reduce the problem of macrostructural prediction to one of
spatial demography. If the assertion is false, by contrast, then other approaches will be
required to effectively mode! the structure of social macrostructure.
While the "Mayhew question" is unlikely to be settled by a single paper, it is shown here
that the requirements for predicting network structure from vertex layout are fairly modest.
Fairly minimal constraints on the probability of edges between distant alters are sufficient
to establish a lower bound on the predictive power of distance, where predictive power is
defined in terms of reduction in the Shannon entropy of the total structure. Application of
this bound to the special case of vertices in a planar region suggests that the requirements
for strong distance effects (e.g., > 90'7o uncertainty reduction) are likely to be attainable
in practice for moderate to large physical scales, and thus that it is reasonable to expect
that large-scare spatially embedded social networks will be readily predictable from vertex
position data.
2 Notation en c} Basic Assumptions
For the results which follow, we will focus exclusively on the case of a loopless undirected
graph G = (V, E) with known vertex set V and uncertain (i.e., random) edge set E. (For
convenience, we denote the cardinality of V by N V.) It is assumed further that G is
spatially embedded, in the sense that there exists some space $ and set V {v~,...,vN)
such that V C $ and vi is the position of vertex vi. We assume that there exists some
distance function, cl, on $, but do not require it to be a metric (i.e., it need not satisfy the
triangle inequality). Substantively, the the most obvious interpretation of $ is as a social
physical space (often called a "Blau" space (Blau, 1977~; such a space may include both
physical and demographic dimensions, including gender, age, race, and primary language.
For the purposes of our present application, we will focus on a physical space interpretation
of $, but it should be emphasized that this is not required for the general result to apply.
iThis, he quips, being an artificially low estimate due to measurement error.
314
DYNAMIC SOCIAL N~TWORKMODELI?JG AND ANALYSIS

OCR for page 313

3 Predictability of Spatially Embedded Networks
What, precisely, is meant by the "predictability" of spatially embedded networks? In the
context of this paper, predictability is understood to be the extent to which our initial
uncertainty regarding network structure is reduced by the provision of new information.
Specifically, we are interested in the extent to which knowledge of vertex positions within $
reduces our uncertainty regarding the edge set of G. A natural measure of uncertainty - and
the one which we shall employ here - is the Shannon entropy, which can be interpreted as the
expected length of an optimally encoded signal expressing the value of a random variable.
Denoting the entropy function by I, we form the R2-like predictability measure
P(G~V)—~ _ i(G~V)
(I)
which expresses the extent to which knowledge of the vertex position set, V, can account
for the tote] information content of G. With P as our notion of predictability, we can state
the following general result:
Theorem 1 (Predictability). Let G = (V,E) be a spatially embedded random graph
with vertex position set V and distance function 1, and let G be distributed such that
p({vi, vj) ~ E (G)) ~ ~ bt vi, vj : ~ (vi, vj) > rc, for some ~ < 0.5. Then
P(G~V) > p( rc)~]—IBM), and lime ~oP(G~V) > p~d(vi,vj) > rc), where
P(G~V) = ~ _ ~v'), ~ is the Shannon entropy, and IBM) =—clog26—(1—6)iog2(1—61.
Proof. For convenience in notation, let clij - cl(vi,vj) and let eij _ {vi,vj) ~ E(G).
We begin by assuming that, for actors within radius rc, distance tells us nothing regard-
ing edge probability; that is to say, p(ez'- Idi~ < rc) = 0~5. Then it trivially follows from
the definition of the Shannon entropy that I (eij Idij < rc) - 1. For the complementary
case' we begin by noting that I (eij |dij > rC) —p (eij |dij > rC) log2 p (en |dij > rC ) - (1—
p(eij Idij > rc)) log2(1—p(eij |dij > rc)). The fact that p(eij Idij > rc) < ~ < 0.5 then
implies that I (eij idle > rc) <—~ log2 ~—(1—c) log2 (1—6) = IBM)
We now consider the entropy of the entire graph. Using the well-known result that
I(X, Y) < I(X) + I(Y) for (possibly dependent) random variables X, Y. we can bound the
DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS
315

OCR for page 313

entropy of the graph by the sum of the (independent) edgewise entropies. Therefore we have
I (G IV) ~ At, I (eel IV)
{i,j}
At, 1 + it, IBM)
{i~j}:dii rc
= (I (` )I)p(di; rC)IB(6)
= [p(dij rc) IB(~)1J (IV (\G)1)
= tp (dij < rC) + (1—p (dij < rC))]B(~)] (I ~ ) 1),
Since the uninformative entropy of G is given by I(G) = (IV(2G)I)7 it follows that
P(GIV) 1 _ I (G TV
> 1 -
[p (do < rc) + (1—p (dij < rC))IB (I)] ( 2 )
1—p (do ~ rc) - (1—p (A < rc))Is (a)
p (dij > rc)—p (dij > rc) IB (I)
p (all; ~ rc) (1—IB (I))
which demonstrates the first portion of Theorem 1. To complete the proof, we allow ~ ~ O
and take the limit:
imP(G~V) > limp (clij > rc) (1—IB (I))
= limp (dij > rc) (1 + clog2 ~ + (1—c) log2 (1—c))
P (all; ~ rc)
Lo
This is a powerful and general result: it tells us that whenever we can place a reasonable
upper bound on the marginal edge probability between distant vertices, we can use the
quartiles of the distance distribution to place a lower bound on the predictive power of V.
Furthermore, when this bound on marginal edge probability becomes small, the predictive
power of the position set becomes bounded by the probability that the distance between
two randomly selected vertices will exceed the critical threshold. Thus, where the threshold
316
DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS

OCR for page 313

distance is small relative to the overall distribution, we can guarantee that the total structure
will be easily predicted from vertex position alone
One important and somewhat counter-intuitive aspect of Theorem ~ is that it does not
depend on N: the predictability of the total structure can be bounded by a function which
depends only on the geometry of the population layout. Similarly we did not have to
assume dyadic independence to obtain this result (only bounds on the edgewise marginals).
These two facts greatly facilitate the application of Theorem ~ in the field. where population
distributions and some crude estimates of the distanceiedge probability relationship may be
all that is available. They also serve to reinforce the argument that the predictive power
of distance is robust to varying assumptions about the precise determinants of network
structure.
4 Uniform Population Distribution on the Plane
Consider the special case in which a population of arbitrary size is placed uniformly within
a square region of size ~ x {. Such a model may be thought of as a first approximation to
a sparse population distribution in physical space, particularly over large areas. Here, we
show the minimum threshold distances necessary to obtain a given level of predictive power
for a structure on the plane, as a function of the linear scale fed of the embedding region.
As will be shown, the only modest critical thresholds are required to guarantee high levels
of predictability under uniform vertex placement.
4.] Distribution of Inter-point Distances
In order to apply Theorem I, we must first know the distribution of inter-point distances
for square planar regions. Under the assumption that ~ is the euclidean distance, we derive
this distribution in the following lemma:
Lemma 1. Let vi,vj designate two randomly selected points on a two-dimensional plane,
with each coordinate being IID u(o,e). Then the density function of d (vi,vj) is given by
~ 2d [e2—Ed + e2]
f(d) = 1 2d (~sin 1 (I) _ e (I - i) + e4 ] ~ < d < It,
O _ d <
otherwise
DYNAMIC SOCIAL N~:TWORK MODELING AND ANALYSIS
317

OCR for page 313

with associates! distribution function
~0
1 2 And _ 83 + d4]
F

OCR for page 313

which, by symmetry, can be collapsed to
(for O < x < b2).
1 _ 2 (b—>/~)
2bb
b_ Aft
I_
b2
To obtain the associated density function, fr2 (X), we simply differentiate:
fT2 (X) C, FT2 (X)
~ (1_ (b—
fix ~ b2 J
_ ~ o < X < b2
O otherwise
Having derived the density of a single T2 variate, we now must consider the sum of two
such (IID) variates. Since the variates in question are independent, their joint density is
simply the product of their individual densities, and we may obtain the density of the sum
via convolution. In this case, however, we note that the domain of T2 + T2 depends on the
individual variates, and hence some care is needed in choosing the limits of integration We
divide the density at x—b2 (the point beyond which both variates must be greater than 0),
taking the tower region first:
rz
fT2+T2 (z 1° < x ~ b ) ~ fT2 (y) fT2 (x—y) dy
o
(bit b2)(b~ b2)
b2 sin ( Y ~)1 _2~I~+2v~I~ y1=
b2 [sin-1 (1)—sin-1 ( - 1)]— ~— ~ + b4
~ 4~ + x
b2 be b4
DYNAMIC SOCIAL NETWO=AJODELING AND ANALYSIS
319

OCR for page 313

Now, we consider the upper region:
b2
fT2+T2 (X |b < x ~ 2b ) / fT2 (y) fT2 (x y) dy
+
Jx - b2
i_b2 (bit b2) (be - b2) dy
1 1 (2y—z ) 162 24|b2 2~lb2 ~ lb2
b2 ~ X J |=—b2 b3 1=—b2 b3 Ix—be b4 1X—b2
1 [sin~1 (2b2 _ 2 ) _ sin~1 (2 - 2b2)] _ 2 (b—A/ )
b3(~ b)+2b2_ 2
2 1 2b2 - _ 4 b - —b2 + _
b2 ( 2 ) b3( ) b
b3 1=—h2
Finally, putting this together, we obtain the complete density for a sum of two squared
triangular variates:
fT2+T2 (X)
~ Or 4~ x
. - _ ~ _
1 b2 b3 1 b4
sin ~ (IBM ) —~ (b ·~) + 2b2_x
LO
O _ x < b2
b2 < x < 2b2
otherwise
From the above, we may now derive the density of d. The last step in this process
involves applying a positive square root transformation to the sum of squared triangular
variates. This is a monotonically increasing one-tmone transformation, and we can thus
derive the new density by a simple change of variables:
fJ~(X) = fT2+T2 (X ) ~J~
(where ~~ is the Jacobian determinant of the transformation)
fT2+T2 (X ) 2x
{2z t~—~ + ~b4 ~ O < ~ < b
= 2 - 1 2b2_ 2 _ 4 2b2_ 2
2x it sin (I) ~ (b ~ + b4 ~ b < x < Mob
LO
otherwise
To arrive at the distribution function of d, we need merely integrate fame over the desired
320
DYNAMIC SOCIAL N~:TWORK MODELING AND ANAl~YSIS

OCR for page 313

: - - -
Scale P(GIV)
(xt 90~0 95% 99~O
O.IxO.l
Next ~
lOxIO km
lOOxlOO km
0.019 0.013 0.006
0 192 0.133 0.058
1.924 1.331 0.578
_ 19.236 13.311 5.923
. .
All rC values given in kilometers.
Table 1: Maximum Critical Radius as a Function of Scale and Uncertainty Reduction, Uni-
form Vertex Placement
range.
F,7~(x) =
[O X < 0
to 2X ~b2 — 4b3 + ~b4 ~ my O < x < b
i0 2x ~b2 — b3 + b4 ~ my
+ ib 2x tb2 sin ~ =2 ~
—b3 (b—~ + b ala ~ dY b /7b
1
TO
2 [2b2 3b3 + 4b4]
2~2 [i 2 (but—b4) + sin 1 ( =2 )]
+8(22_b2)2 =4
1 b3 b4
x > Mob
Substitution of ~ for x and e for b completes the proof
4.2 Predictability
x < 0
O ~ x < b
b < x < Mob
x > pub
[1
Using Lemma ~ together with Theorem 1, we may determine the maximum rC value needed
to guarantee that a given fraction of the uncertainty in G can be accounted for by the
euclidean distances between vertex positions; these threshold values are shown in Table I.
As the table indicates, a critical radius at or below approximately ~o20 of the linear scale of
the region is adequate for a ~o90 uncertainty reduction, with a reduction of %99 possible for
radii of 0.06t or less.
DYNAMIC SOCKS NETWO~MODE~WG AND TRYSTS
321

OCR for page 313

How do these threshold values compare to empirical assessments of the distance/edge
probability relationship? Fits of dyadic edge models to existing data sets suggest that a tow-
probability threshold is attainable, but also clearly indicate that thresholds will depend upon
relational content (Butts, 2002~. Recalling that P(G~V) > p(d (vi, vj) > rc) (l—IB any), we
can express the adequacy of the threshold approximation in terms of JIB (~) In this regard,
~ ~ 0.001 is sufficient to bring the predictability bound within approximately Ado of the limit;
such a bound is not hard to achieve in practice. Based on the Butts models, thresholds of
as little as 0.05km may be reasonable approximations for face-t~face contact, with larger
thresholds of approximately 0.25km and 1Skm for social friendship and telephone contacts,
respectively. Although it may be possible to obtain more predictive power using these or
other models than Table ~ would suggest, the lower bounds alone indicate that physical
layout has the potential to account for the overwhelming majority of network structure at
even modest spatial scales.
5 Conclusion
To summarize, then, it would seem that even a very modest null mode] based on physical
distance (the threshold model) must account for the vast majority of network structure in
large-scale networks, under quite minimal assumptions. Since fitted models have the capacity
to be much more informative than the null model, they are expected to provide even more
information about network macrostructure at even smaller scales. Thus, not only might one
reasonably speculate that distance could account for most of the uncertainty in large-scale
interpersonal networks, it almost has to do so. This would seem to vindicate the intuition
of theorists such as Mayhew, who perceived that physical space was a critical structuring
force, but who did not demonstrate the extent of that result.
References
Blake, R. R., Rhead, C, Wedge, B., and Monton, ]. (1956~. Housing architecture and social
interaction. Sociometry, 19:133-139.
Blau, P M. (1977~. Inequality and Heterogeneity. Free Press, New York.
Butts, C. T. (2002~. Spatial Models of Large-scale Interpersonal Networks. Doctoral Disser-
tation, Carnegie MeHon University.
Caplow, T. and Forman, R. (1950~. Neighborhood interaction in a homogeneous community.
American Sociological Review, 15:357-366.
Evans, M., Hastings, N., and Peacock. B. (2000~. Statistical Distributions. John Wiley and
Sons, New York, third edition.
,
322
DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS

OCR for page 313

Festinger, L., Schachter, S., and Back, K. (1950~.
Stanford University Press, Stanford, California.
Social Pressures in Informal Groups.
Latane, B., Liu J. H., Nowak, A., Bonevento, M., and Zheng, L. (1995~. Distance matters:
Physical space and social impact. Personality and Social Psychology But7tetin, 21~:795-
805.
Mayhew, B. H. (1984~. Chance and necessity in sociological theory. Journal of Mathematical
Sociology, 9:305-339.
Merton,R. (1948~. The socialpsychologyof housing. InDennis,W.,editor, Current Trends
in Social Psychology, pages 163-217. University of Pittsburgh Press, Pittsburgh, PA.
Snow, D. A., Leahy, P. ]., and Schwab, W. A. (1981~. Social interaction in a heterogeneous
apartment: An investigation of the effects of environment upon behavior. Sociological
Focus, 14~4~:309-319.
Sommer' R. (1969~. Personal Space. Prentice-Hall, Englewood Cliffs, N].
Whyte, NAi. H. (1957~. The Organization Man. Doubleday, Garden City, NY.
DYNAMIC SOCIAL N~TWO=MODE~NG ED PHYSIC
323