| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.
OCR for page 229
Statistical Models for Social Networks:
Inference and Degeneracy
Mark S. Hancock
Department of Statistics
Center for Statistics arid the Social Sciences
University of Washington
Box 354322
Seattle, WA 98195-4322, USA
January ~ 8, 2003
Abstract
This paper presents recent advances in the statistical modeling of random graphs that have
an impact on the representation of social networks. We also consider issues related to the
estimation of random graph models. For concreteness the focus is cross-sectional social
networks.
Statistical exponential family models (Wasserman and Pattison, 1996) are a generalization
of the Markov random graph models introduced by Frank and Strauss (19861' which in turn
derived from developments in spatial statistics (Besag, 1974~. These models recognize the
complex dependencies within relational data structures.
To date. the use of stochastic graph models for networks has been limited by three
interrelated factors: the complexity of realistic models, lack of use of simulation studies, and
a poor understanding of the properties of inferential methods.
We discuss these factors and related issues of the degeneracy of inference for commonly
promoted models. As a cornerstone of this development we present a Markov Chain Monte
Carlo (MCMC) algorithm for general random graph models. We also review the role of these
MCMC algorithms in simulation, likelihood-based inference, identifying degeneracy of
inference, and Bayesian formulations.
KEYWORDS: Random graph models; Markov Chain Monte Carlo; Bayesian statistics.
I. Introduction
Networks are a form of "relational data". Relational data arise in many social science fields
and graph models are a natural approach to representing the structure of these relations. In
these applications, the nodes usually represent people, and the edges represent a specified
relationship between the people. This framework has many applications including, for
example, the structure of social networks, the behavior of epidemics, the interconnectedness
of the WWW, and lon=-distance telephone calling patterns.
We consider stochastic models for such graphs that form a statistical exponential family. This
class has been referred to as the " pi " class of models in the psychology and sociology
literatures (Wasserman and Pattison, 19961. Given their general nature and applicability, we
shall refer to them simply as random graph models. A much studied sub-class of models are
the Markov random graph models introduced by Frank and Strauss (19861. These models
DYNAMIC SOCIAL NETWORK MODELING ED ^^YSIS
229
OCR for page 230
230
attempt to model the stochastic mechanism that produces the social ties and so represent the
complex dependencies so induced. There is a large social network literature on this area
which we do not review here. See Wasserman and Faust (1994), and Pattison and Robins
(2000) for detailed information. The topic has connections to a broad array of literatures and
here we emphasize the links to spatial statistics, statistical exponential families, log-linear
models and statistical physics.
The exploration of the properties of graph models has been limited by three factors. First the
complexity of realistic models has limited the insight that has been gained by analytical
methods. Most analytical work has focused on simple one or two parameter models with
independence between the dyads (Wasserman and Faust 1994, Frank 1997~. Second
statistical methods for the stochastic simulation from general random graph models have not
existed. Because of this the properties of general models (e.g., range of graphs represented,
dependence among the parameters) can not be explored though simulation studies. Third the
properties of statistical methods for estimahn:, the parameters based on observed networks
are poorly understood (we consider inference in Section 3~. Hence the range of parameter
values relevant to real networks is largely unknown. In this paper we improve understanding
of the nature and properties of graph models important to social networks by further
considering methods for the stochastic simulation of, and inference for, random graphs.
We address one aspect of modeling, that has been a persistent obstacle in the work in this
area: inferential degeneracy. Many previous attempts to develop MCMC based estimation for
Markov models have found that the algorithms nearly always converge to degenerate graphs
- graphs that are either empty or full, or the algorithms do not converge consistently. Using
statistical exponential family theory, we show that this is a function of the form of the model
and algorithm used.
In the next section we review the statistical theory of social network models expressed in
exponential form. We also consider MCMC algorithms to simulate graphs from these models.
In Section 3 we consider forms of inference for the parameters of the models with focus on
likelihood-based forms and Bayesian methods in particular. In Section we consider model
degeneracy and some possible solutions to it.
2. Review of theory
Let the random matrix X represent the incidence matrix of an undirected graph onn
individuals. Thus X is an n x n symmetric matnx, each of whose entries is a Bernoulli
random variable; we assume further that the diagonal elements of X are 0, which is to say that
self-partnerships are disallowed. Suppose that X denotes the set of all possible graphs on the
given ': individuals. The multivanate distribution of X can be parameterized in the form:
P,9(X=x)= Pow' ~ xeX <~'
where ~ ~ ~ c i 4 iS the model parameter and text is a q -vector of statistics based on the
graph x. (Wasserman and Pattison, 1996~. The denominator c(~) is the constant that ensures
the distribution sums to one: c(~) = ~ exp :RTt~y)] . Note that X contains at most
.~
N = non -1~/2 graphs, and C) = l0: c(~) < oo3 . The dimension of ~ is at most 2N _ 1 (for
the "saturated" model). although is typically much smaller than this. The parameter and
DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS
OCR for page 231
OCR for page 233
OCR for page 234
OCR for page 235
OCR for page 236
OCR for page 237
OCR for page 238
OCR for page 239
OCR for page 240
Representative terms from entire chapter:
random graph
statistics that correspond to a Oven model for X can be identified usin;, the Hammersley-
Clifford theorem (Frank and Strauss 1986~. Conversely, each choice of parameter and graph
statistics text specifies a model for X . For example, if the dyads X~j are mutually
independent the model can be wntten: logEP'7(X = x)] = ~ Pijxij - K(~), X~ X where
icj
all = loait[P~(X~j = xij )] and K(~) = log[c(~)] = ~ logy ~ exp(6jj)] . Thus q=N and the
red
elements of text are just xjj . This model is often called a Bernoulli graph. In the special case
where the dyads have a common probability, lo~tP6,fX = x)] = ot(X)- 7~) where q = 1,
text is the number of partnerships in the graph. and ~ can be interpreted as the common lo=-
odds of partnership formation within a dyed. The mathematical properties of this
homogeneous Bernoulli graph model has been extensively studied (see, e.;,., Renyi and Erdos
1959), although its simplicity and homogeneity make it less usefu] as a realistic model for
social phenomena.
An alternative specification of the model (1) clarifies the interpretation of the parameters. Let
X,,. = 1' X,;,: kl ~ id, k < Ill ~ x`,. = ,f x,,': kl ~ id, k < i}, x`,. = {x`,. and xjj = l}, and
x`,. = {x`, and x,; = 0} . Thus, X`,: represents all elements in the graph excluding Xjj, while x`,
and x`, represent the graph with the xij equal to 1 and 0, respectively. The full conditional
distnbutions of Xij are given by
lo~t~pr(Xij = 1 ~ X,,. = x`,.~] = B7 3(x`,) xe X (2)
where 3(x,~.~=~(x,,.~-t~x,,) (S~aussandIkedal991~.Thestatistic §(x`,) is the changein
the graph statistics when xij changes from O to 1. Hence ~ can be interpreted as the increase
in the conditional log-odds of a partnership between individuals i and j induced by the
formation of the partnership and conditional on all other ties remaining unchanged. In the
homogeneous Bernoulli graph, for example. ~ is the common log-odds of individual
partnership formation.
Holland and Leinhardt (1981) appear to be the first to propose log-linear models for social
networks. Suppose that the dyads are independent with
| my if x=l,y=i
Pr(X`; =X,Xj, = y)=5 a;j if x=O,y=1 (3)
l nij if x=O,y=o
Thus each dyed can have its own probability distribution. Thus the mode] can represent
arbitrary attractiveness between individuals and degree of reciprocity within relationships.
However the dyed independence implies specific transitivity and higher-order interactions.
The model can be expressed in log linear form as:
log [Pr(X = x)] = ~ pijx~jxji ~ ~ 9,j x;,—K(~) XE X
i
232
analysis of graphs with "colors" on the nodes, the coloring indicating the attribute. Recent
developments have included new forms of dependency structures, to take into account social
settings, and on the other hand a relaxation of Markovian dependence assumptions, allowing
investigation of longer range configurations, such as longer paths in the network or larger
cycles (Pathson and Robins, 2000~. Models for bipartite (Faust and Skvoretz, 1999) and
tripartite (Mische and Robins, 2000) graph structures have also been developed.
The incorporation of attribute information into the random graph mode] is straightforward.
Suppose we wish to incorporate p exogenous covariates represented by a n x n x p array of
attributes W where the ijkth element is the covar~ate for the prh attribute on the id dyed.
Note that this allows the covanates to be attributes of the pairs of individuals (e.g., age
difference) as well as specific to the individual alone. The graph statistic text in (1) is then
replaced by t(x,W), indicating that the statistics depend on the attribute infonnation in
addition to the relationship information. In this sense, the statistic tf ~ can be defined to
reflect the attribute information. We consider an implementation of this in the next section.
2.1. Cross-sect~onal modelsfor random graphs
The most common form of random graph models exhibit Markov dependence in the sense of
Frank and Strauss (1986~. For these models, dyads that do not share an individual are
conditional independent: an idea analogous to the nearest neighbor concept in spatial
statistics. Typically a homogeneity condition is added: all isomorphic graphs have the same
probability under the model. Frank and Strauss (1986) show that homogeneous Markov
Graphs are exactly those having the triangle parametenzatior?: q = n, Be I) =; N and
I, (x) = k! an, x~,i, ...x;`y' k = I, ..., n—~ en (x) 6 ~ XijXjkXki
where [k (X) is called a k - star and In (x) is a count of Me complete mads. An equivalent
form is the degree distribution parametenzahon:
d'` (x) = the proportion of individuals with exactly k relationships k = 1, ..., n - 1
t'' (X) = 6 ~ x~jXj`Xk''
in which d~ (x) counts the proportion of individuals with degree k. The degree distribution
parameterization has the advantage that it is directly interpretable in terms of concurrency of
partnerships (i.e. dam) for m > 0 counts the number of individuals with m concurrent
partners).
A form of model that may realistically represent dependent social processes, reflect the
. ,~ . .
impact or covanates 1S:
exp ~xTZ,B + ~~0 ~; dl. (x) + 0tt(x)]
c(a,,B, B)
x ~ X (4)
where x is the N -vector of the unique elements of X, Z = {Z;j }N is a matrix of
(exogenous) covanates on the id"' dyed, text a q -vector of additional network statistics, ~
p -vector of regression parameters. ~ g -vector of degree parameters, and ~ a q -vector of
DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS
network structure parameters.
This mode] is an extension of equation (1) and contains a number of special cases that have
been considered in separate literatures. First the homogeneous Markov graphs of Frank and
Strauss (1986) correspond to the case where there are no covariates and the additional
network statistic is the count of complete triads. Second, if there are no covanates and no
additional network stausucs the model corresponds to the random graphs with arbitrary
degree distributions considered in Newman, Stro~,atz and Watts (20011. This model assumes
that all graphs with the same degree distribution are equally likely. On particular interest are
the various "scale-free," preferential attachment and power-law models popular in the physics
literature (see. e.:,., Newman 2002~. Third, if there are no decree distribution parameters and
no additional network statistics Me mode] corresponds to the random graphs with
independent dyads. The model Shll allows differential activity levels and assortative missing
via the covanate parameters. Such models have been traditionally specified as log-linear
models of mass action. See Morns. Goodreau and Koehly (2003) for an examination of the
connections between the two.
As an example, consider a random graph model for a social network based on heterosexual
partnerships. An example of a cross-sectional mode] that incorporates captures both
concurrency and the assortative mixing based on race is:
P''kX = x) = exp
~ ~ exude (x) + prace(x)] / C((X9 p, 6) X ~ X
k-=1
Po(X =x)=0 xe X/X
where race(x) is the proportion of partnerships where both partners say they are of the same
race, Xh is the subset of graphs in X in which all ties are heterosexual. and He j 4 . The
parameter ~x, represents the propensity to form monogamous relationships. The parameters
~x~, ..., ~x: represent the propensity to form concurrent partnerships with 2, 3 and 4 partners,
respectively. Hence positive values for the latter parameters indicate the propensity of the
graph toward higher degrees of concurrency. The parameter ~ represents the propensity for
partnerships to be formed between individuals of the same race. Note that the seraph and
model reflect the heterosexual nature of the graph (i.e., Xi, = 0 unless i and j are of
opposite sex). If A,..., a~ are zero the dyads are independent and the mode] is simply
assortative mixing by race. The model can then be re-wntten as a contingency table form
(Morris, Goodreau and Koehly 2003~. If ,6 is zero then there are no preferences for
partnerships based on race. We consider alternative parameterizations for this model in
Section 3.
These models have a form that is flexible enough to capture the structure of many real
networks. However to be practically useful we need to be able to simulate graphs with these
distributions, infer their parameters from real data and measure their overall and relative
goodness-of-fit. We now turn to each of these topics.
2.2. Simulating random graph models
The mathematical form of the models (1) and (4) allow graphs to be generated from them
using Markov Chain Monte Carlo (MCMC) algorithms. Indeed MCMC algorithms have been
much studied and are a nature way to simulate social networks (e.g. Gilks et. al. 1996,
DYNAMIC SOCIAL N~TWORKMODE~ING~D ISIS
233
Newman and Barkema 19991. The basic idea is to construct a Markov Chain onX with
P(X = x) as the equilibrium distribution. This is operational~zed by starting from a graph In
X, and making a large number of Markov transitions until approximate convergence to
P(X = x) is reached. Subsequent transitions are sampled and form a (dependent) sample
from the desired model. For details, see the extensive literature cited in the above books.
Many chains are possible with vastly different statistical properties. However convergence is
"ensured" under mild conditions on the Markov Chain (irreducibility and aperiodicity). For
the social network representation (1) this process has been studied by Crouch, Wasserman
and Trachtenberg (1998), Conander et. al. (1998), and Snijders (20029.
The full-conditional MCMC for the model (1) and (4) has a simple form and is motivated by
the auto-logistic representation (21. In this algorithm each dyed is updated separately based
on the conditional probabilities given in (2~. This so-called "Gibbs samplin;," or "heat bath.'
algorithm chooses the dyads randomly, sequentially, or using some mixture of the two. Each
update requires the difference S(x,, ~ = tax,,. ~ - text, ~ to be detennined. The speed of the
calculation of Six,, ~ is art important factor in the computational quality of the aIgonthm (i.e.,
speed of convergence to P(X = x), the relative frequency of producing each graph in X ).
Alternative Metropolis algorithms propose transitions from curry t to XP~OPOSC~J based on the
unconditional probabilities require the determination of t~xp~opoSca, ~ - t~xCur~er'`, ~ . These tend to
perform better for the same amount of computation. More general Metropolis-Hastings
algorithms choose XprOpOse`~ from an ancillary stochastic process dependent on XCUrren! and
aimed at either focusing the transitions or spreading them more broadly through X . The
behavior of these aIgonthms is very dependent on the choice of statistics text .
The papers that use MCMC algorithms to simulate social network models report difficulties
in convergence to realistic distributions (Crouch, Wasserman and Trachtenberg 1998
Conander et al 1998. and Snijders 2002). A typical occurrence is for the algorithm to produce
graphs that are complete, empty, or else are extreme in related ways. Conander et. al. 1998
consider al~onthms that condition on the number of ties in the graph. This ensure that the
realizations are not complete or empty graphs. However, in most circumstances the density of
the graph is a product of the social process that produced it and can not be assumed to be
known in advance. SnijUers 2002 reports on a number of related odd properties. In some
cases the sequences of realizations transition quickly between very different graphs after
periods of minor variation. In Section 3 we indicate why these algorithms behave in this way,
how it relates to the model and how it can be resolved.
3. Inference for random graph models
Statistical methods for estimating the parameters in a random seraph model are
underdeveloped. Developing inference with a likelihood framework has the advantage of
being able to draw upon a statistical theory for closely related models in statistical physics
and spatial statistics (Besag 1975, Geyer 1999~. This framework is also compatible with the
Bayesian paradigm (German, Carlin. Stern and Rubin 1995~. The likelihood framework
makes available exploratory graphical tools useful for inference about the underlying random
field (Handcock, Meier and Nychka 19941. These tools, for example, can identify when a
maximum likelihood approach is insufficient for inferential purposes.
234
DYNAMIC SOCKS NETWO=MODEL~G~D TRYSTS
Major barrier to the application of random graph models to social networks has been the lack
of a sound statistical theory to evaluate how closely the models capture the structure in the
observed graphs. This has two dimensions to it: the degree to which the graph structure of the
models matched that in the data, and second the degree to which disease propagation through
the model matches that of the data. The likelihood-based approach allow one to measure the
first dimension of "goodness-of-f~t" using Monte-Carlo p-value statistics and Bayes Factors
under the Bayesian paradigm (Gelfand 19961.
3. 1. Likelihood inference for random graph knowers
In principal, likelihood inference is straightforward for exponential random seraph models. In
this section we describe the most fundamental ideas and why direct computation is difficult.
The log-likelihood for model (1) is:
L(~;x) - log[P6,fX = x)] = B7t~x)- Axe X
where K(~) = logy. Likelihood inference is difficult as direct evaluation of L(~;x)
requires calculating
C(~) = ~ expEBTt(X)]
.~e x
For even simple and modest models this can be problematic. Table 1 Eves the number of
elements in X (i.e., 2N ~ as the number of actors grows.
Table 1: Number of Graphs for Different Numbers of Actors
Number of Actors. Number of Elements
3
6
in X
8
32,728
1.24 x 1027
2.307x109°
10
25
3.2. Alternative inference for random graph models
Clearly for realistic graphs direct enumeration is not feasible. Many alternatives have been
proposed. Frank (1971) and Frank and Strauss (1986) consider (linear) approximations to the
cumulant generating function:
as a means to solve the likelihood equations
C(!V) = E[log(ivt(X))]
E,3 [ t( X )] = t(Xobsen cd )
~ ~ ~ ~ 1
This approach is generally difficult to apply to general multiparameter models unless
supplemented by a means of simulation from the same network model (Coriander et al 19981.
One commonly used method is pseudolikelihood originally motivated by, and developed for,
spatial models by Besa=, (19751. The idea is to use an alternative local version of the
DYNAMIC SOCIAL NETWORK HODELI?JG AND ANALYSIS
(5)
(6)
235
likelihood function defined the pse~`a?olikelihood'. The pseudolikelihood for model (1) is
algebraically identical to the likelihood for a logistic regression of the unique elements of X
on the design matrix with i th row Six,,. ). The value of the maximum pseudolikelihood
estimator for social networks can then be expediently found by using logistic regression as a
computational device. Explicitly, the value of the maximum likelihood estimator for the
logistic regression will also be the maximum pseudolikelihood estimator. Note, however, that
the other characteristics of the maximum likelihood estimator do not necessarily carry over.
In particular the standard errors of the estimates of ~ from the logistic regression will not be
appropriate for the maximum pseudolikelihood estimator. The statistical properties of
pseudolikelihood estimators for social networks are only partially understood and are
discussed in Handcock (2000~. In the remainder of this paper we focus on likelihood based
estimation.
3.3. Existence and uniqueness of MLE
The maximum likelihood estimator (MLE) for ~ is:
~ = argmax,'~~,~,L(~;xObse,ve~ ~
Many properties of the MLE can be derived from statistical exponential family theory. Let C
be the closed convex support of t(X) - the convex hull of the discrete support points. Denote
the interior of C by ins(C) and the boundary by act
Result:
a) If (I) is open, the MLE exists if, and only if, t~xobsc,~'e`;~ ~ is inside the interior of C.
b) If it exists, it is unique. In addition, when it exists, it can be found as the unique solution
of (6) or equivalently as (7), the unique local minima of (5~.
c) As the support of t(X) is discrete, and l~(xObse,.ve`,,) is in it, a necessary and sufficient
condition for the MLE not to exist is that Ax ~ ~ falls on the boundary calf the Not
and that boundary be part of C.
~ A ~ov.~ervea ~ ~ ^^—~ ~ ~^ ma_ -~^ -
The result is a fundamental property of statishcat exponential families (Ban~dorff-Nielsen
19781.
In practice this means that attempt to numerically maximize the likelihood leads to
unbounded estimates when the observed graph has statistics falling on the boundary of C.
This typically means the optimization algorithm does not converge. Simulation studies in
Handcock (2000) show that this is common for realistic models. If t(xobs.ened ) falls on the
boundary of C it is still possible that subsets of 19 have MLEs that exist. Let
t(Xobserved ) (ta (Xobsc~ne~d )' Observed )) and ~ = (RO ~ P~) be similar partitions of t(Xobsen.~,d ) and
v. V110~, 11111" ~VIt"~Vl1~' 11 `~^he"m,"~) is on the bounds of C Ed tax ~ . ,) is in the
~O TInA^'~;1~ remits; ;( ~ {~
;~ ~~ +1~ I.. ~¢ rat _
, _
v ~ TV. rim cola ~
relative interior of the convex hull of {t(X): tU(X) = ta(Xo~,Se,`.ed )} then the MLE of 0~ exists
and is the unique local minima of the conditional likelihood equation:
CLIP; x, ~—log [Pi, f X = x ~ ta (X ~ = to (Xobs~ve`~ By] = 6' ~ (X) K(~' ta (Xvbser,~ ))
236
D YNAMIC SOCIAL f JETWORK HODE~G ED ^^ YSIS
(7)
X ~ X hi (X ) = Ice (subserved )
3.4. Likelihood inference based on MCMC algorithms
The difficulties in determining the normalizing function for the model (1) can be overcome
using MCMC algorithms. The approach is based on the following idea: If a large number of
simulations from a social network in the same ballpark as that of the observed network can be
generated there these can be used to approximate the normalization function to a desired
accuracy. Estimate the "population" value:
C(~) = ~ expL07t~x)
xeX
by the weighted "sample'' mean:
c(~=~X~
M sampled fray ·
expels t~y)]wty)
where the sample weights whys are internally normalized to sum to unity. We can use the
MCMC approach of Section 2.2 to simulate a sample of graphs from a model similar to the
true, but unknown, model. The approximation to the likelihood using this approach is called
the MCMC likelihood. The MLE for the data is them approximated by the maximum of the
of the MCMC likelihood (the MC-MLE). This idea has been made precise and studied by
Geyer and Thompson (1992~. They show the MC-MLE converges to the true MLE as the
number of simulations increases. The procedure also produces estimates of the asymptotic
covariance matrix, the size of the MCMC induced error, and other related quantities.
As the sampled graphs fond the basis of a statistical exponential family with sample space
the realized values and probabilities the empirical proportions, the existence and uniqueness
of the MC-MLE can be understood:
Result:
Let CO be the convex huI1 of sampled sufficient statistics. In practice, there are two
situations:
1 ~t(XobS,~,n,e`~E int(CO) C int(C): The MC-MLE exists and is unique. It is found as the unique
maximum of the MCMC likelihood.
2) to ~ ~ int(CO) but is in int(C): The MC-MLE does not exist, even though MLE
does.
3) t~xO,~Se,`'e`, ~ ~ int(C): Neither the MC-MLE nor the MLE exists.
This result explains why attempts to calculate MC-MLE estimates for social network models
fail. If the model used to simulate the graphs is not close enough to produce realizations that
cover the observed values of the statistics, the MC-MLE will not exist even in cases where
the MLE does. This behavior is quite common. As we shall see in Section 3, it is magnified
by properties of commonly used models that do not place probability mass broadly enough.
In sum. the MC-MLE may not exist for at least two reasons. First, the MLE itself may not
exist (in which case neither does the MC-MLE). Second, it is difficult to specify parameter
values for commonly used models to produce realizations that cover the observed values of
the network statistics. In particular simulations in Handcock (2000) show that simulations
based on the maximum pseudolikelihood estimates usually do not produce realizations
DYNAMIC SOCIAL NF:TWORKMODE:LING AND ANALYSIS
237
.
sufficiently similar to those of the true values to support MC-MLE estimation. These
properties of the models are very similar to those for spatial point processes considered in
Geyer (1999~.
3.5. Bayesian models and inference for random graph models
In random graph models of the form (1) the maximum likelihood estimator Is not necessary
optimal and, indeed. any point estimator of ~ can be very poor. As such we need to move
away from point estimators of ~ and to frameworks for inference that naturally allow the
uncertainty about the value of ~ to be expressed.
We do this by supplementing likelihood-based inference with the Bayesian paradigm, mainly
as it provides an elegant way of incorporating parameter uncertainty into the final inference
and the incorporation of expert knowledge when it exists (German, Carlin, Stern and Rubin
1995~. Inference with the Bayesian paradigm, implemented via the now standard Markov
Chain Monte CarIo (MCMC) methods can address, and even solve, many very difficult
inferential problems. often making it the only realistic option.
Bayesian approaches to random graph models for social networks have received little
attention (Wang and Wong 1987~. We are in the process of developing Bayesian
methodology for random graphs paying particular attention to the specification of prior
distributions for the parameters that are meaningful for social networks. Under a Bayesian
formulation. let pried be an arbitrary prior distribution for P. The posterior distribution of
~ is then pr(~.;X =x)= prkf;J)P~(X =x)/m~x) where mix)= ~pr(~)P(,fX =x)~¢ can tee
interpreted as the apriori probability that X = x pr(~;.X = x) captures our knowledge about
~ after the observed data (i.e., that is X = x ~ has been taken into account.
Note that pried may be chosen to restrict the parameter space to a subset of (9 by placing
zero prior mass on the excluded values. Under these circumstances the posterior mass on
these same values will always be zero regardless of their likelihood under the model and data.
Computationally, Bayesian inference for an arbitrary prior is straightforward if MCMC
simulation of the process is available. Conceptually this can be achieved by sampling a value
Am from the prior distribution for 0, proof, and then sampling Xm from the distribution
P`,,,, (X = x) using MCMC simulation. If this is repeated until Xm equals the observed graph
X`,b5 then the corresponding value of Em, Bm (say), is a random sample from the posterior
prig; X = x,,b5) . By repeating this process, a large sample of values of ~ from
prim; X = Xobs) cam be generated. The posterior mean of ~ can then be estimated by the
mean of the sample. The posterior density corresponding to Pratt.; X = xO','~ can be estimated
by a non-parametric density estimate based on the sample. Note that this applies even when
~ is multivariate, and so the posterior dependence between the components of 6 can also be
determined. There are alternative computational methods that are vastly superior in terms of
computational efficiency to the above al ,onthm. These alternatives will almost always be
used in practice. [Iowever, the above algorithm does show how Bayesian inference is
achieved usin;, MCMC simulation.
The definition and properties of families of conjugate priors for exponential families has been
addressed in Barndorff-Nielsen (1978), Diaconnis and Ylvisaker (1979) and. more recently,
238
DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS
GutieITez-Pena and Smith (1997). Of the many possible families possible, one of the most
richest classes is discussed by Gutierrez-Pena and Smith (1997). Let g(v, y) be the (q + 1)
parameter exponential family with distributions
exp Evri9 + W(~)]
Pr(~;v,y)= Pe(~),Y>0
c(y, v)
Thus g(v, A) has canonical statistics ~ and -() and is minimal unless ~(~) is a
linear function of B. Here ~f( ~ is a prespecified function. The hyperparameters v and y
specify. respectively, the location and scale of 0. If (I) =; 4 then this distribution is proper if,
and only if, fly is in the interior of the convex hull of the sample space of t(X) and A> 0.
Suppose the prior for ~ is chosen to be gin, y) then, formally, the posterior for 0, having
observed X = x, is go + text, A+ 1~. That is,
t(V + t(X))T ~ + ~ y + 11~( o) ]
This family can be generalized to a p-vanate y. Let g~v,y) tee the (quip)
parameter exponential family with distributions:
Prig; v by = eXP :V 0 + Y74v(~]
cry, v)
oe 0, y! >O,...,yp > e
Here Ivy ~ is a prespecified p -vanate function. Thus gin, y) has canonical statistics
~ and -me) and is minimal unless some component of the Ivy ~ is a linear function of 0.
Suppose the prior for ~ is chosen to be gin, y) then, formally, the posterior for 6, having
observed X = x, is go + text, A+ l).
While inference for this family is quite tractable it is unclear under which circumstances the
prior can be chosen to realistically represent knowledge about B. It is more natural to choose
the prior to reflect knowledge we do have and which is often quite strong. Examples include
selectivity of graphs, degeneracy and stability. For these cases the prior family is unlikely to
be conjugate.
4. Iden~ability, Degeneracy and Stability for social networks
models
The research reviews in the previous sections has allowed progress to be made in both the
estimation and simulation of random graphs models. This work, and that of Besa~ (2000), has
allowed provided insight into the properties of random graphs models. Two properties of
random graph models that have a big impact on practice are degeneracy and stability. This
builds on ideas of Ruelle (1969), Strauss (1986), Geyer (1999) and Baddeley (1999), Section
4. This work is in statistical physics and spatial point process theory.
The issue of what makes a useful model for a social network is a complex one. All models
are representations of the manifestations of the social processes that produced the social
network. Some models represent the proximate mechanisms of the formation of ties. This can
be done by representation the temporal dynamics of the social process that produced the
social network, or by direct representation of the network itself through appeal to concepts
like structural balance and transitivity. Alternatively models may only aim at describing the
probabilistic structure of the social ties. In all cases the models will be idealizations and
simplification of the actual processes. Ultimately the answer to the question of what is a
DYNAMIC SOCIAL N~:TWORKMODE~G ED ANALYSIS
239
useful model is fundamentally dependent on the purpose for which the model is intended.
Simple models are often adequate, and even desirable, if there is a simple objective (e.g.,
what is the density of ties in the network). For other purposes, such as understanding the
dynamics of the spread of STDs over networks, more sophisticated models are necessary.
Here we consider a basic property that a useful model should have. Broadly speaking, useful
stochastic models place a significant proportion of their probability mass on graphs that have
high probability of being produced by the underlying social process. We define a random
graph model to be degenerate if the model places almost all its probability mass on a small
number of seraph configurations inX . Hence degeneracy of a graph model occurs when the
model places disproportionate probability mass on only a few of the possible graph
configurations. A common case is where the distribution places almost all its mass on the
empty graph (i.e., Xi, = 0 Vi, j), andlor the complete graph (i.e., Xjj = 1 Vi, j). Such models
are not useful for modeling networks as almost all realizations from these models will be
empty or full. In addition is such a model is used for simulation and MCMC likelihood
inference the approximations to the tme model will be very poor.
4. Discussion and future work
While estimation techniques are often of little interest to non-statisticians, this case is an
exception. The complete enumeration of all graphs required by the denominator in (1 ) makes
simple maximum likelihood estimation impossible for graphs larger than about 30 nodes. In
early applications a form of pseudo-likelihood was used to solve this problem (Besag 1974,
Wasserman and Pattison 1996~. More recent approaches, however, employ Markov Chain
Monte Carlo (MCMC) methods (Geyer and Thompson 1992~. This is particularly interesting
for our purposes because MCMC methods effectively simulate the network over the space of
possible graphs in order to maximize the likelihood. One can, however' just as easily use the
MCMC algorithm to simulate the network given the parameter estimates, and this provides
the solution to the problem of linking network data to the network simulation.
This paper address the question of what is a good model and in doing, so helps resolve the
dilemma lack of convergence when estimation or simulating using, MCMC. Many of the
models that are proposed in literature suffer Tom degeneracy. Further algorithms used for
inferential purposes are often inadvertently based on degenerate forms resulting in inferential
degeneracy. One implication of these results is that the effective parameter space of
exponentially parametenzed random graph models is bounded and a small subset of the
theoretical parameter space. The Bayesian framework for inference promises to be very
powerful in social network modeling. It facilitates the propagation of parameter uncertainty
into the final inference and allowing the incorporation of expert prior knowledge when it
exists.
240
DYNAMIC SOCKS NETWO~MODEfING ED ANALYSIS