Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Statistical Models for Social Networks: Inference and Degeneracy Mark S. Hancock Department of Statistics Center for Statistics arid the Social Sciences University of Washington Box 354322 Seattle, WA 98195-4322, USA January ~ 8, 2003 Abstract This paper presents recent advances in the statistical modeling of random graphs that have an impact on the representation of social networks. We also consider issues related to the estimation of random graph models. For concreteness the focus is cross-sectional social networks. Statistical exponential family models (Wasserman and Pattison, 1996) are a generalization of the Markov random graph models introduced by Frank and Strauss (19861' which in turn derived from developments in spatial statistics (Besag, 1974~. These models recognize the complex dependencies within relational data structures. To date. the use of stochastic graph models for networks has been limited by three interrelated factors: the complexity of realistic models, lack of use of simulation studies, and a poor understanding of the properties of inferential methods. We discuss these factors and related issues of the degeneracy of inference for commonly promoted models. As a cornerstone of this development we present a Markov Chain Monte Carlo (MCMC) algorithm for general random graph models. We also review the role of these MCMC algorithms in simulation, likelihood-based inference, identifying degeneracy of inference, and Bayesian formulations. KEYWORDS: Random graph models; Markov Chain Monte Carlo; Bayesian statistics. I. Introduction Networks are a form of "relational data". Relational data arise in many social science fields and graph models are a natural approach to representing the structure of these relations. In these applications, the nodes usually represent people, and the edges represent a specified relationship between the people. This framework has many applications including, for example, the structure of social networks, the behavior of epidemics, the interconnectedness of the WWW, and lon=-distance telephone calling patterns. We consider stochastic models for such graphs that form a statistical exponential family. This class has been referred to as the " pi " class of models in the psychology and sociology literatures (Wasserman and Pattison, 19961. Given their general nature and applicability, we shall refer to them simply as random graph models. A much studied sub-class of models are the Markov random graph models introduced by Frank and Strauss (19861. These models DYNAMIC SOCIAL NETWORK MODELING ED ^^YSIS 229

230 attempt to model the stochastic mechanism that produces the social ties and so represent the complex dependencies so induced. There is a large social network literature on this area which we do not review here. See Wasserman and Faust (1994), and Pattison and Robins (2000) for detailed information. The topic has connections to a broad array of literatures and here we emphasize the links to spatial statistics, statistical exponential families, log-linear models and statistical physics. The exploration of the properties of graph models has been limited by three factors. First the complexity of realistic models has limited the insight that has been gained by analytical methods. Most analytical work has focused on simple one or two parameter models with independence between the dyads (Wasserman and Faust 1994, Frank 1997~. Second statistical methods for the stochastic simulation from general random graph models have not existed. Because of this the properties of general models (e.g., range of graphs represented, dependence among the parameters) can not be explored though simulation studies. Third the properties of statistical methods for estimahn:, the parameters based on observed networks are poorly understood (we consider inference in Section 3~. Hence the range of parameter values relevant to real networks is largely unknown. In this paper we improve understanding of the nature and properties of graph models important to social networks by further considering methods for the stochastic simulation of, and inference for, random graphs. We address one aspect of modeling, that has been a persistent obstacle in the work in this area: inferential degeneracy. Many previous attempts to develop MCMC based estimation for Markov models have found that the algorithms nearly always converge to degenerate graphs - graphs that are either empty or full, or the algorithms do not converge consistently. Using statistical exponential family theory, we show that this is a function of the form of the model and algorithm used. In the next section we review the statistical theory of social network models expressed in exponential form. We also consider MCMC algorithms to simulate graphs from these models. In Section 3 we consider forms of inference for the parameters of the models with focus on likelihood-based forms and Bayesian methods in particular. In Section we consider model degeneracy and some possible solutions to it. 2. Review of theory Let the random matrix X represent the incidence matrix of an undirected graph onn individuals. Thus X is an n x n symmetric matnx, each of whose entries is a Bernoulli random variable; we assume further that the diagonal elements of X are 0, which is to say that self-partnerships are disallowed. Suppose that X denotes the set of all possible graphs on the given ': individuals. The multivanate distribution of X can be parameterized in the form: P,9(X=x)= Pow' ~ xeX <~' where ~ ~ ~ c i 4 iS the model parameter and text is a q -vector of statistics based on the graph x. (Wasserman and Pattison, 1996~. The denominator c(~) is the constant that ensures the distribution sums to one: c(~) = ~ exp :RTt~y)] . Note that X contains at most .~ N = non -1~/2 graphs, and C) = l0: c(~) < oo3 . The dimension of ~ is at most 2N _ 1 (for the "saturated" model). although is typically much smaller than this. The parameter and DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS

statistics that correspond to a Oven model for X can be identified usin;, the Hammersley- Clifford theorem (Frank and Strauss 1986~. Conversely, each choice of parameter and graph statistics text specifies a model for X . For example, if the dyads X~j are mutually independent the model can be wntten: logEP'7(X = x)] = ~ Pijxij - K(~), X~ X where icj all = loait[P~(X~j = xij )] and K(~) = log[c(~)] = ~ logy ~ exp(6jj)] . Thus q=N and the red elements of text are just xjj . This model is often called a Bernoulli graph. In the special case where the dyads have a common probability, lo~tP6,fX = x)] = ot(X)- 7~) where q = 1, text is the number of partnerships in the graph. and ~ can be interpreted as the common lo=- odds of partnership formation within a dyed. The mathematical properties of this homogeneous Bernoulli graph model has been extensively studied (see, e.;,., Renyi and Erdos 1959), although its simplicity and homogeneity make it less usefu] as a realistic model for social phenomena. An alternative specification of the model (1) clarifies the interpretation of the parameters. Let X,,. = 1' X,;,: kl ~ id, k < Ill ~ x`,. = ,f x,,': kl ~ id, k < i}, x`,. = {x`,. and xjj = l}, and x`,. = {x`, and x,; = 0} . Thus, X`,: represents all elements in the graph excluding Xjj, while x`, and x`, represent the graph with the xij equal to 1 and 0, respectively. The full conditional distnbutions of Xij are given by lo~t~pr(Xij = 1 ~ X,,. = x`,.~] = B7 3(x`,) xe X (2) where 3(x,~.~=~(x,,.~-t~x,,) (S~aussandIkedal991~.Thestatistic §(x`,) is the changein the graph statistics when xij changes from O to 1. Hence ~ can be interpreted as the increase in the conditional log-odds of a partnership between individuals i and j induced by the formation of the partnership and conditional on all other ties remaining unchanged. In the homogeneous Bernoulli graph, for example. ~ is the common log-odds of individual partnership formation. Holland and Leinhardt (1981) appear to be the first to propose log-linear models for social networks. Suppose that the dyads are independent with | my if x=l,y=i Pr(X`; =X,Xj, = y)=5 a;j if x=O,y=1 (3) l nij if x=O,y=o Thus each dyed can have its own probability distribution. Thus the mode] can represent arbitrary attractiveness between individuals and degree of reciprocity within relationships. However the dyed independence implies specific transitivity and higher-order interactions. The model can be expressed in log linear form as: log [Pr(X = x)] = ~ pijx~jxji ~ ~ 9,j x;,K(~) XE X i<j ,~j They called this the p, model. Based on developments in spatial statistics (Besag 1974), Frank and Strauss (1986) extended this work and introduced forms of dependence with Markov structure. Further extension were made by Wasserman and Pattison (1996) to incorporate actor attributes (Pattison and Wasserman 1999) and to allow explanatory and response variables (Robins, Pattison and Wasserman 1999), resulting in social influence (Robins, Pattison and Elliott 7000) and social selection models (Robins, Elliott and Pattison, 20001. These generalizations essentially allow D YNAMIC SOCIAL f NETWORK MODELING ED ~~ YSIS 231

232 analysis of graphs with "colors" on the nodes, the coloring indicating the attribute. Recent developments have included new forms of dependency structures, to take into account social settings, and on the other hand a relaxation of Markovian dependence assumptions, allowing investigation of longer range configurations, such as longer paths in the network or larger cycles (Pathson and Robins, 2000~. Models for bipartite (Faust and Skvoretz, 1999) and tripartite (Mische and Robins, 2000) graph structures have also been developed. The incorporation of attribute information into the random graph mode] is straightforward. Suppose we wish to incorporate p exogenous covariates represented by a n x n x p array of attributes W where the ijkth element is the covar~ate for the prh attribute on the id dyed. Note that this allows the covanates to be attributes of the pairs of individuals (e.g., age difference) as well as specific to the individual alone. The graph statistic text in (1) is then replaced by t(x,W), indicating that the statistics depend on the attribute infonnation in addition to the relationship information. In this sense, the statistic tf ~ can be defined to reflect the attribute information. We consider an implementation of this in the next section. 2.1. Cross-sect~onal modelsfor random graphs The most common form of random graph models exhibit Markov dependence in the sense of Frank and Strauss (1986~. For these models, dyads that do not share an individual are conditional independent: an idea analogous to the nearest neighbor concept in spatial statistics. Typically a homogeneity condition is added: all isomorphic graphs have the same probability under the model. Frank and Strauss (1986) show that homogeneous Markov Graphs are exactly those having the triangle parametenzatior?: q = n, Be I) =; N and I, (x) = k! an, x~,i, ...x;`y' k = I, ..., n~ en (x) 6 ~ XijXjkXki where [k (X) is called a k - star and In (x) is a count of Me complete mads. An equivalent form is the degree distribution parametenzahon: d'` (x) = the proportion of individuals with exactly k relationships k = 1, ..., n - 1 t'' (X) = 6 ~ x~jXj`Xk'' in which d~ (x) counts the proportion of individuals with degree k. The degree distribution parameterization has the advantage that it is directly interpretable in terms of concurrency of partnerships (i.e. dam) for m > 0 counts the number of individuals with m concurrent partners). A form of model that may realistically represent dependent social processes, reflect the . ,~ . . impact or covanates 1S: exp ~xTZ,B + ~~0 ~; dl. (x) + 0tt(x)] c(a,,B, B) x ~ X (4) where x is the N -vector of the unique elements of X, Z = {Z;j }N is a matrix of (exogenous) covanates on the id"' dyed, text a q -vector of additional network statistics, ~ p -vector of regression parameters. ~ g -vector of degree parameters, and ~ a q -vector of DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS

network structure parameters. This mode] is an extension of equation (1) and contains a number of special cases that have been considered in separate literatures. First the homogeneous Markov graphs of Frank and Strauss (1986) correspond to the case where there are no covariates and the additional network statistic is the count of complete triads. Second, if there are no covanates and no additional network stausucs the model corresponds to the random graphs with arbitrary degree distributions considered in Newman, Stro~,atz and Watts (20011. This model assumes that all graphs with the same degree distribution are equally likely. On particular interest are the various "scale-free," preferential attachment and power-law models popular in the physics literature (see. e.:,., Newman 2002~. Third, if there are no decree distribution parameters and no additional network statistics Me mode] corresponds to the random graphs with independent dyads. The model Shll allows differential activity levels and assortative missing via the covanate parameters. Such models have been traditionally specified as log-linear models of mass action. See Morns. Goodreau and Koehly (2003) for an examination of the connections between the two. As an example, consider a random graph model for a social network based on heterosexual partnerships. An example of a cross-sectional mode] that incorporates captures both concurrency and the assortative mixing based on race is: P''kX = x) = exp ~ ~ exude (x) + prace(x)] / C((X9 p, 6) X ~ X k-=1 Po(X =x)=0 xe X/X where race(x) is the proportion of partnerships where both partners say they are of the same race, Xh is the subset of graphs in X in which all ties are heterosexual. and He j 4 . The parameter ~x, represents the propensity to form monogamous relationships. The parameters ~x~, ..., ~x: represent the propensity to form concurrent partnerships with 2, 3 and 4 partners, respectively. Hence positive values for the latter parameters indicate the propensity of the graph toward higher degrees of concurrency. The parameter ~ represents the propensity for partnerships to be formed between individuals of the same race. Note that the seraph and model reflect the heterosexual nature of the graph (i.e., Xi, = 0 unless i and j are of opposite sex). If A,..., a~ are zero the dyads are independent and the mode] is simply assortative mixing by race. The model can then be re-wntten as a contingency table form (Morris, Goodreau and Koehly 2003~. If ,6 is zero then there are no preferences for partnerships based on race. We consider alternative parameterizations for this model in Section 3. These models have a form that is flexible enough to capture the structure of many real networks. However to be practically useful we need to be able to simulate graphs with these distributions, infer their parameters from real data and measure their overall and relative goodness-of-fit. We now turn to each of these topics. 2.2. Simulating random graph models The mathematical form of the models (1) and (4) allow graphs to be generated from them using Markov Chain Monte Carlo (MCMC) algorithms. Indeed MCMC algorithms have been much studied and are a nature way to simulate social networks (e.g. Gilks et. al. 1996, DYNAMIC SOCIAL N~TWORKMODE~ING~D ISIS 233

Newman and Barkema 19991. The basic idea is to construct a Markov Chain onX with P(X = x) as the equilibrium distribution. This is operational~zed by starting from a graph In X, and making a large number of Markov transitions until approximate convergence to P(X = x) is reached. Subsequent transitions are sampled and form a (dependent) sample from the desired model. For details, see the extensive literature cited in the above books. Many chains are possible with vastly different statistical properties. However convergence is "ensured" under mild conditions on the Markov Chain (irreducibility and aperiodicity). For the social network representation (1) this process has been studied by Crouch, Wasserman and Trachtenberg (1998), Conander et. al. (1998), and Snijders (20029. The full-conditional MCMC for the model (1) and (4) has a simple form and is motivated by the auto-logistic representation (21. In this algorithm each dyed is updated separately based on the conditional probabilities given in (2~. This so-called "Gibbs samplin;," or "heat bath.' algorithm chooses the dyads randomly, sequentially, or using some mixture of the two. Each update requires the difference S(x,, ~ = tax,,. ~ - text, ~ to be detennined. The speed of the calculation of Six,, ~ is art important factor in the computational quality of the aIgonthm (i.e., speed of convergence to P(X = x), the relative frequency of producing each graph in X ). Alternative Metropolis algorithms propose transitions from curry t to XP~OPOSC~J based on the unconditional probabilities require the determination of t~xp~opoSca, ~ - t~xCur~er'`, ~ . These tend to perform better for the same amount of computation. More general Metropolis-Hastings algorithms choose XprOpOse`~ from an ancillary stochastic process dependent on XCUrren! and aimed at either focusing the transitions or spreading them more broadly through X . The behavior of these aIgonthms is very dependent on the choice of statistics text . The papers that use MCMC algorithms to simulate social network models report difficulties in convergence to realistic distributions (Crouch, Wasserman and Trachtenberg 1998 Conander et al 1998. and Snijders 2002). A typical occurrence is for the algorithm to produce graphs that are complete, empty, or else are extreme in related ways. Conander et. al. 1998 consider al~onthms that condition on the number of ties in the graph. This ensure that the realizations are not complete or empty graphs. However, in most circumstances the density of the graph is a product of the social process that produced it and can not be assumed to be known in advance. SnijUers 2002 reports on a number of related odd properties. In some cases the sequences of realizations transition quickly between very different graphs after periods of minor variation. In Section 3 we indicate why these algorithms behave in this way, how it relates to the model and how it can be resolved. 3. Inference for random graph models Statistical methods for estimating the parameters in a random seraph model are underdeveloped. Developing inference with a likelihood framework has the advantage of being able to draw upon a statistical theory for closely related models in statistical physics and spatial statistics (Besag 1975, Geyer 1999~. This framework is also compatible with the Bayesian paradigm (German, Carlin. Stern and Rubin 1995~. The likelihood framework makes available exploratory graphical tools useful for inference about the underlying random field (Handcock, Meier and Nychka 19941. These tools, for example, can identify when a maximum likelihood approach is insufficient for inferential purposes. 234 DYNAMIC SOCKS NETWO=MODEL~G~D TRYSTS

Major barrier to the application of random graph models to social networks has been the lack of a sound statistical theory to evaluate how closely the models capture the structure in the observed graphs. This has two dimensions to it: the degree to which the graph structure of the models matched that in the data, and second the degree to which disease propagation through the model matches that of the data. The likelihood-based approach allow one to measure the first dimension of "goodness-of-f~t" using Monte-Carlo p-value statistics and Bayes Factors under the Bayesian paradigm (Gelfand 19961. 3. 1. Likelihood inference for random graph knowers In principal, likelihood inference is straightforward for exponential random seraph models. In this section we describe the most fundamental ideas and why direct computation is difficult. The log-likelihood for model (1) is: L(~;x) - log[P6,fX = x)] = B7t~x)- Axe X where K(~) = logy. Likelihood inference is difficult as direct evaluation of L(~;x) requires calculating C(~) = ~ expEBTt(X)] .~e x For even simple and modest models this can be problematic. Table 1 Eves the number of elements in X (i.e., 2N ~ as the number of actors grows. Table 1: Number of Graphs for Different Numbers of Actors Number of Actors. Number of Elements 3 6 in X 8 32,728 1.24 x 1027 2.307x109° 10 25 3.2. Alternative inference for random graph models Clearly for realistic graphs direct enumeration is not feasible. Many alternatives have been proposed. Frank (1971) and Frank and Strauss (1986) consider (linear) approximations to the cumulant generating function: as a means to solve the likelihood equations C(!V) = E[log(ivt(X))] E,3 [ t( X )] = t(Xobsen cd ) ~ ~ ~ ~ 1 This approach is generally difficult to apply to general multiparameter models unless supplemented by a means of simulation from the same network model (Coriander et al 19981. One commonly used method is pseudolikelihood originally motivated by, and developed for, spatial models by Besa=, (19751. The idea is to use an alternative local version of the DYNAMIC SOCIAL NETWORK HODELI?JG AND ANALYSIS (5) (6) 235

likelihood function defined the pse~`a?olikelihood'. The pseudolikelihood for model (1) is algebraically identical to the likelihood for a logistic regression of the unique elements of X on the design matrix with i th row Six,,. ). The value of the maximum pseudolikelihood estimator for social networks can then be expediently found by using logistic regression as a computational device. Explicitly, the value of the maximum likelihood estimator for the logistic regression will also be the maximum pseudolikelihood estimator. Note, however, that the other characteristics of the maximum likelihood estimator do not necessarily carry over. In particular the standard errors of the estimates of ~ from the logistic regression will not be appropriate for the maximum pseudolikelihood estimator. The statistical properties of pseudolikelihood estimators for social networks are only partially understood and are discussed in Handcock (2000~. In the remainder of this paper we focus on likelihood based estimation. 3.3. Existence and uniqueness of MLE The maximum likelihood estimator (MLE) for ~ is: ~ = argmax,'~~,~,L(~;xObse,ve~ ~ Many properties of the MLE can be derived from statistical exponential family theory. Let C be the closed convex support of t(X) - the convex hull of the discrete support points. Denote the interior of C by ins(C) and the boundary by act Result: a) If (I) is open, the MLE exists if, and only if, t~xobsc,~'e`;~ ~ is inside the interior of C. b) If it exists, it is unique. In addition, when it exists, it can be found as the unique solution of (6) or equivalently as (7), the unique local minima of (5~. c) As the support of t(X) is discrete, and l~(xObse,.ve`,,) is in it, a necessary and sufficient condition for the MLE not to exist is that Ax ~ ~ falls on the boundary calf the Not and that boundary be part of C. ~ A ~ov.~ervea ~ ~ ^^~ ~ ~^ ma_ -~^ - The result is a fundamental property of statishcat exponential families (Ban~dorff-Nielsen 19781. In practice this means that attempt to numerically maximize the likelihood leads to unbounded estimates when the observed graph has statistics falling on the boundary of C. This typically means the optimization algorithm does not converge. Simulation studies in Handcock (2000) show that this is common for realistic models. If t(xobs.ened ) falls on the boundary of C it is still possible that subsets of 19 have MLEs that exist. Let t(Xobserved ) (ta (Xobsc~ne~d )' Observed )) and ~ = (RO ~ P~) be similar partitions of t(Xobsen.~,d ) and v. V110~, 11111" ~VIt"~Vl1~' 11 `~^he"m,"~) is on the bounds of C Ed tax ~ . ,) is in the ~O TInA^'~;1~ remits; ;( ~ {~ ;~ ~~ +1~ I.. ~¢ rat _ , _ v ~ TV. rim cola ~ relative interior of the convex hull of {t(X): tU(X) = ta(Xo~,Se,`.ed )} then the MLE of 0~ exists and is the unique local minima of the conditional likelihood equation: CLIP; x, ~log [Pi, f X = x ~ ta (X ~ = to (Xobs~ve`~ By] = 6' ~ (X) K(~' ta (Xvbser,~ )) 236 D YNAMIC SOCIAL f JETWORK HODE~G ED ^^ YSIS (7)

X ~ X hi (X ) = Ice (subserved ) 3.4. Likelihood inference based on MCMC algorithms The difficulties in determining the normalizing function for the model (1) can be overcome using MCMC algorithms. The approach is based on the following idea: If a large number of simulations from a social network in the same ballpark as that of the observed network can be generated there these can be used to approximate the normalization function to a desired accuracy. Estimate the "population" value: C(~) = ~ expL07t~x) xeX by the weighted "sample'' mean: c(~=~X~ M sampled fray · expels t~y)]wty) where the sample weights whys are internally normalized to sum to unity. We can use the MCMC approach of Section 2.2 to simulate a sample of graphs from a model similar to the true, but unknown, model. The approximation to the likelihood using this approach is called the MCMC likelihood. The MLE for the data is them approximated by the maximum of the of the MCMC likelihood (the MC-MLE). This idea has been made precise and studied by Geyer and Thompson (1992~. They show the MC-MLE converges to the true MLE as the number of simulations increases. The procedure also produces estimates of the asymptotic covariance matrix, the size of the MCMC induced error, and other related quantities. As the sampled graphs fond the basis of a statistical exponential family with sample space the realized values and probabilities the empirical proportions, the existence and uniqueness of the MC-MLE can be understood: Result: Let CO be the convex huI1 of sampled sufficient statistics. In practice, there are two situations: 1 ~t(XobS,~,n,e`~E int(CO) C int(C): The MC-MLE exists and is unique. It is found as the unique maximum of the MCMC likelihood. 2) to ~ ~ int(CO) but is in int(C): The MC-MLE does not exist, even though MLE does. 3) t~xO,~Se,`'e`, ~ ~ int(C): Neither the MC-MLE nor the MLE exists. This result explains why attempts to calculate MC-MLE estimates for social network models fail. If the model used to simulate the graphs is not close enough to produce realizations that cover the observed values of the statistics, the MC-MLE will not exist even in cases where the MLE does. This behavior is quite common. As we shall see in Section 3, it is magnified by properties of commonly used models that do not place probability mass broadly enough. In sum. the MC-MLE may not exist for at least two reasons. First, the MLE itself may not exist (in which case neither does the MC-MLE). Second, it is difficult to specify parameter values for commonly used models to produce realizations that cover the observed values of the network statistics. In particular simulations in Handcock (2000) show that simulations based on the maximum pseudolikelihood estimates usually do not produce realizations DYNAMIC SOCIAL NF:TWORKMODE:LING AND ANALYSIS 237

. sufficiently similar to those of the true values to support MC-MLE estimation. These properties of the models are very similar to those for spatial point processes considered in Geyer (1999~. 3.5. Bayesian models and inference for random graph models In random graph models of the form (1) the maximum likelihood estimator Is not necessary optimal and, indeed. any point estimator of ~ can be very poor. As such we need to move away from point estimators of ~ and to frameworks for inference that naturally allow the uncertainty about the value of ~ to be expressed. We do this by supplementing likelihood-based inference with the Bayesian paradigm, mainly as it provides an elegant way of incorporating parameter uncertainty into the final inference and the incorporation of expert knowledge when it exists (German, Carlin, Stern and Rubin 1995~. Inference with the Bayesian paradigm, implemented via the now standard Markov Chain Monte CarIo (MCMC) methods can address, and even solve, many very difficult inferential problems. often making it the only realistic option. Bayesian approaches to random graph models for social networks have received little attention (Wang and Wong 1987~. We are in the process of developing Bayesian methodology for random graphs paying particular attention to the specification of prior distributions for the parameters that are meaningful for social networks. Under a Bayesian formulation. let pried be an arbitrary prior distribution for P. The posterior distribution of ~ is then pr(~.;X =x)= prkf;J)P~(X =x)/m~x) where mix)= ~pr(~)P(,fX =x)~¢ can tee interpreted as the apriori probability that X = x pr(~;.X = x) captures our knowledge about ~ after the observed data (i.e., that is X = x ~ has been taken into account. Note that pried may be chosen to restrict the parameter space to a subset of (9 by placing zero prior mass on the excluded values. Under these circumstances the posterior mass on these same values will always be zero regardless of their likelihood under the model and data. Computationally, Bayesian inference for an arbitrary prior is straightforward if MCMC simulation of the process is available. Conceptually this can be achieved by sampling a value Am from the prior distribution for 0, proof, and then sampling Xm from the distribution P`,,,, (X = x) using MCMC simulation. If this is repeated until Xm equals the observed graph X`,b5 then the corresponding value of Em, Bm (say), is a random sample from the posterior prig; X = x,,b5) . By repeating this process, a large sample of values of ~ from prim; X = Xobs) cam be generated. The posterior mean of ~ can then be estimated by the mean of the sample. The posterior density corresponding to Pratt.; X = xO','~ can be estimated by a non-parametric density estimate based on the sample. Note that this applies even when ~ is multivariate, and so the posterior dependence between the components of 6 can also be determined. There are alternative computational methods that are vastly superior in terms of computational efficiency to the above al ,onthm. These alternatives will almost always be used in practice. [Iowever, the above algorithm does show how Bayesian inference is achieved usin;, MCMC simulation. The definition and properties of families of conjugate priors for exponential families has been addressed in Barndorff-Nielsen (1978), Diaconnis and Ylvisaker (1979) and. more recently, 238 DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS

GutieITez-Pena and Smith (1997). Of the many possible families possible, one of the most richest classes is discussed by Gutierrez-Pena and Smith (1997). Let g(v, y) be the (q + 1) parameter exponential family with distributions exp Evri9 + W(~)] Pr(~;v,y)= Pe(~),Y>0 c(y, v) Thus g(v, A) has canonical statistics ~ and -() and is minimal unless ~(~) is a linear function of B. Here ~f( ~ is a prespecified function. The hyperparameters v and y specify. respectively, the location and scale of 0. If (I) =; 4 then this distribution is proper if, and only if, fly is in the interior of the convex hull of the sample space of t(X) and A> 0. Suppose the prior for ~ is chosen to be gin, y) then, formally, the posterior for 0, having observed X = x, is go + text, A+ 1~. That is, t(V + t(X))T ~ + ~ y + 11~( o) ] This family can be generalized to a p-vanate y. Let g~v,y) tee the (quip) parameter exponential family with distributions: Prig; v by = eXP :V 0 + Y74v(~] cry, v) oe 0, y! >O,...,yp > e Here Ivy ~ is a prespecified p -vanate function. Thus gin, y) has canonical statistics ~ and -me) and is minimal unless some component of the Ivy ~ is a linear function of 0. Suppose the prior for ~ is chosen to be gin, y) then, formally, the posterior for 6, having observed X = x, is go + text, A+ l). While inference for this family is quite tractable it is unclear under which circumstances the prior can be chosen to realistically represent knowledge about B. It is more natural to choose the prior to reflect knowledge we do have and which is often quite strong. Examples include selectivity of graphs, degeneracy and stability. For these cases the prior family is unlikely to be conjugate. 4. Iden~ability, Degeneracy and Stability for social networks models The research reviews in the previous sections has allowed progress to be made in both the estimation and simulation of random graphs models. This work, and that of Besa~ (2000), has allowed provided insight into the properties of random graphs models. Two properties of random graph models that have a big impact on practice are degeneracy and stability. This builds on ideas of Ruelle (1969), Strauss (1986), Geyer (1999) and Baddeley (1999), Section 4. This work is in statistical physics and spatial point process theory. The issue of what makes a useful model for a social network is a complex one. All models are representations of the manifestations of the social processes that produced the social network. Some models represent the proximate mechanisms of the formation of ties. This can be done by representation the temporal dynamics of the social process that produced the social network, or by direct representation of the network itself through appeal to concepts like structural balance and transitivity. Alternatively models may only aim at describing the probabilistic structure of the social ties. In all cases the models will be idealizations and simplification of the actual processes. Ultimately the answer to the question of what is a DYNAMIC SOCIAL N~:TWORKMODE~G ED ANALYSIS 239

useful model is fundamentally dependent on the purpose for which the model is intended. Simple models are often adequate, and even desirable, if there is a simple objective (e.g., what is the density of ties in the network). For other purposes, such as understanding the dynamics of the spread of STDs over networks, more sophisticated models are necessary. Here we consider a basic property that a useful model should have. Broadly speaking, useful stochastic models place a significant proportion of their probability mass on graphs that have high probability of being produced by the underlying social process. We define a random graph model to be degenerate if the model places almost all its probability mass on a small number of seraph configurations inX . Hence degeneracy of a graph model occurs when the model places disproportionate probability mass on only a few of the possible graph configurations. A common case is where the distribution places almost all its mass on the empty graph (i.e., Xi, = 0 Vi, j), andlor the complete graph (i.e., Xjj = 1 Vi, j). Such models are not useful for modeling networks as almost all realizations from these models will be empty or full. In addition is such a model is used for simulation and MCMC likelihood inference the approximations to the tme model will be very poor. 4. Discussion and future work While estimation techniques are often of little interest to non-statisticians, this case is an exception. The complete enumeration of all graphs required by the denominator in (1 ) makes simple maximum likelihood estimation impossible for graphs larger than about 30 nodes. In early applications a form of pseudo-likelihood was used to solve this problem (Besag 1974, Wasserman and Pattison 1996~. More recent approaches, however, employ Markov Chain Monte Carlo (MCMC) methods (Geyer and Thompson 1992~. This is particularly interesting for our purposes because MCMC methods effectively simulate the network over the space of possible graphs in order to maximize the likelihood. One can, however' just as easily use the MCMC algorithm to simulate the network given the parameter estimates, and this provides the solution to the problem of linking network data to the network simulation. This paper address the question of what is a good model and in doing, so helps resolve the dilemma lack of convergence when estimation or simulating using, MCMC. Many of the models that are proposed in literature suffer Tom degeneracy. Further algorithms used for inferential purposes are often inadvertently based on degenerate forms resulting in inferential degeneracy. One implication of these results is that the effective parameter space of exponentially parametenzed random graph models is bounded and a small subset of the theoretical parameter space. The Bayesian framework for inference promises to be very powerful in social network modeling. It facilitates the propagation of parameter uncertainty into the final inference and allowing the incorporation of expert prior knowledge when it exists. 240 DYNAMIC SOCKS NETWO~MODEfING ED ANALYSIS