OCR for page 65
Network Science

D Questionnaire Data

In this appendix, the committee provides more detail on its analysis of responses to the questionnaire that underlies the results presented in Chapter 6, following the same order of presentation. After describing the questionnaire process and further characterizing the respondents, the committee draws on the responses to more fully characterize the research community’s approaches to the possible field of network science: both doubts about the existence of such a field and shared notions of what it would encompass. Although the full context from the corresponding sections of Chapter 6 is not repeated, brief summaries introduce specific details. The appendix also contains the analysis by Katy Börner, of Indiana University, of the social structure of network science revealed by the questionnaire responses.[1]

Network science has reached its present level of visibility from the convergence of two phenomena: the ever-rising importance of networks for the national well-being and security and the achievement of promising formal results (primarily from graph theory) relating aspects of network topology to network properties such as resilience. But long before this convergence, many disciplines studied phenomena arising from recognizable networks in their subject domains. While some of the studies inspired increased multidisciplinary research, their sheer diversity also raised significant doubts about the coherence and usefulness of an underlying “network science” that might investigate further substantive commonalities across the many specific uses of networks as we know them. If there are such commonalities, then there is great potential benefit in pursuing better theories, tools, and insights that address the network science core shared across so many critical domains, and such a pursuit becomes of pressing importance.
The committee adopted an empirical approach to determining the status of this emerging field, if there is one, to assessing a common core for the disparate fields of application, and to identifying the key challenges the science might address. The committee’s approach was to solicit the insights of researchers from the communities doing work in the related domains.

THE QUESTIONNAIRE PROCESS

As noted in Chapter 6, the questionnaire text was developed by the committee through an iterative process, starting with the statement of task and continuing through committee discussion and beta-testing. The focus of the committee discussion was clarity and brevity in the questionnaire and consideration of the kinds of information that might reasonably be elicited from the research community. The beta-testing allowed committee members to assess the ease of understanding and responding to the questionnaire text. In its final form, the questionnaire addressed four broad areas: the respondent, the respondent’s work, the respondent’s view of the potential for network science to exist as a discipline, and an open-ended opportunity to provide further information.

To avoid as far as possible prejudging the question of whether there exists a field of network science (and, if there is such a field, its nature), the questionnaire and the various solicitations of respondents intentionally did not define the term “network science.” The committee initially expected that this lack of definition might bring many requests for clarification from the research community; in practice, fewer than 1 percent of those solicited explicitly made such a request. Of those who responded to the questionnaire, 9 percent indicated that the term was unclear or had no well-defined core (see “Dissenting Voices,” below). The committee did not determine how many of those solicited decided not to respond because the term was not defined in advance.
Content of the Questionnaire

Box D-1 contains the complete online text of the resulting questionnaire, which was posted as a National Academies of Science (NAS) Web link on December 20, 2004.

[1] Katy Börner, associate professor, Indiana University, “Mapping the expertise and social network of network science researchers,” briefing to the committee on April 13, 2005.
OCR for page 71
Soliciting Responses to the Questionnaire

The goal of the ensuing solicitation was to reach as large, diverse, and representative a sample of the many relevant research communities as feasible within the study’s resources. Because both the full range of relevant communities and the populations of researchers within those communities could not be readily defined in advance, the primary process of choice was a snowballing outreach. To begin the process, 113 recognized researchers working on topics in candidate areas clearly relevant to the possible science of networks were sent e-mails asking them to complete the questionnaire. These solicitations briefly described the nature and purpose of the study, provided the URL (Uniform Resource Locator) of the online questionnaire, and invited the recipient to forward the announcement to other researchers who might be interested. The solicitation process was then continuously iterated, drawing on the responses to identify more people (collaborators, project principal investigators, and so on) to receive a solicitation. The snowballing process was stopped whenever it stepped outside the field of network science as indicated by the respondents; that is, if a respondent explicitly indicated that he or she was not working in “network science,” the respondent’s list of collaborators and principal investigators was not added to the pool of names. As of April 29, 2005, a total of 2,040 people had been directly contacted through the solicitation process.

Some heuristic sanity checks were performed to catch hoax entries; only three of the responses appeared suspect, and their content did not affect the conclusions of this study. A variety of spot checks were performed. For example: Is the response internally consistent? Does it appear to name nonexistent or wildly irrelevant people, programs, or organizations? Does the respondent appear as an author in the technical literature (as indexed by Google Scholar Beta)?
However, such checks were limited and are indicative at best; a thorough screening analysis was not attempted.

There are many inherent limitations to a snowballing process. For instance, poorly connected members of the underlying communities might be left out, or there could be under- or overrepresentation of communities or specific programs owing to differences in willingness to respond to such a questionnaire or to provide information that would allow further snowballing. In addition, the committee observed that a few highly connected people provided no information on collaborators or projects: Instead of listing names and projects, they sent replies such as “too many to list.” The committee was especially concerned about these outreach limitations, as the ongoing questionnaire analysis quickly demonstrated that many target communities were weakly interacting.

The committee followed several ancillary processes to offset the limitations of snowball-instituted coverage by bringing in additional sources of names throughout the solicitation process: literature citation studies, sequential tracing of collaborative ventures, conference attendance, mailing lists, and personal interviews with the authors of recent books and reviews. The citation study and analysis looked at some selected researchers’ work and collected the names of coauthors, cited authors, and authors who cited those researchers’ work. A key goal of this analysis was to improve the coverage of subject fields that had not yet seen many questionnaire responses. Although resource limitations constrained the amount of citation analysis that could be performed, the amount that was done succeeded in introducing several hundred names that had not been uncovered by the snowballing process to that point. Spot checks suggest that at least several thousand additional names might have been produced by more intensive citation tracking.
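The snowballing outreach and its stopping rule described above can be sketched in code. The following Python is an illustration only, not the committee's actual procedure: the function name `snowball`, the `get_response` callback, the reply fields `in_field` and `names`, and the toy names are all hypothetical.

```python
from collections import deque


def snowball(seeds, get_response):
    """Iterate a snowball solicitation starting from seed names.

    get_response is a hypothetical callback: it returns None when the
    person does not reply, or a dict with "in_field" (bool) and
    "names" (list of collaborators and investigators named).
    """
    contacted = set(seeds)   # everyone who has been sent a solicitation
    queue = deque(seeds)     # people whose replies are still to be read
    responses = {}
    while queue:
        person = queue.popleft()
        reply = get_response(person)
        if reply is None:
            continue                      # no response, nothing to expand
        responses[person] = reply
        if not reply["in_field"]:
            continue                      # stopping rule: do not add this respondent's names
        for name in reply["names"]:
            if name not in contacted:     # solicit each person only once
                contacted.add(name)
                queue.append(name)
    return contacted, responses


# Toy data (made up): "bo" says the work is not network science,
# so "dee" is never added to the pool.
toy_replies = {
    "ana": {"in_field": True, "names": ["bo", "cy"]},
    "bo": {"in_field": False, "names": ["dee"]},
    "cy": {"in_field": True, "names": ["ana", "eli"]},
}
contacted, responses = snowball(["ana"], toy_replies.get)
```

In this sketch the ancillary sources (citation studies, mailing lists, and so on) would simply be extra names appended to the queue during the run; that detail is left out for brevity.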
OCR for page 72
Network Science In view of the methodological concern over potentially uneven community representation due to systematically varying response rates, it might be worth noting that (1) the number of new names provided by each respondent was not strongly dependent on his or her self-identified field of study and (2) the overall rates of response to the committee’s solicitations were not strongly dependent on the fields of study of the respondents who provided the names to solicit. Similarly, the number of new names provided by respondents was independent of whether the respondents were from the United States or not. Regardless of field or location, each respondent provided a mean of 2.8 names that had not been previously identified in the study. The committee recognizes that it cannot quantify the completeness of the resulting coverage nor the degree to which the responses are statistically representative of the underlying communities of researchers (see also the discussion of coverage saturation below). For this reason, some classes of analysis could not be reliably performed and are not addressed in this report: For example, the committee explicitly chose not to attempt to identify a top-100 list of researchers, programs, or institutions. Nonetheless, its analysis of the key responses relating to the existence and nature of a possible field of network science appears solid: The responses are stable across all responses obtained when they are partitioned by such factors as when in the solicitation process the response was received, whether the response was directly solicited or not, whether a solicitation was generated by snowballing or from the ancillary sources, which research communities the respondents self-identified as their own, and what country the respondents worked in. Some distinct differences appear between respondents who believe there is a field of network science and those who do not; these differences are described below. 
The committee is confident that the solicitation process, despite the multiple approaches and continued effort, did not saturate the population of researchers whose work touches on the potential field of network science. A variety of heuristic measures contribute to this confidence. The results from the limited citation analysis have been mentioned above. Another reason for the committee’s confidence is that new names (that is, names not previously encountered in questionnaire responses or ancillary sources) continued to be provided by successive increments of responses without letup until the end of the study (see Figure D-1). The latest responses provided essentially as many new names as the earliest ones; in other words, the empirical probability of a response-provided name falling outside the set of already-known names did not decrease as the number of responses grew from 50 to over 600. A similar conclusion is suggested by the fact that once a name is cited by a respondent, it is unlikely to be cited by any other respondent: 71 percent of cited names are never cited again. In short, there is no indication of saturation in the coverage, so one may conclude that the questionnaire solicitation process does not approximate complete coverage of those who would be interested. THE RESPONDENTS Over half of the responses (57 percent) came from people who had been directly invited during the snowballing process; the remaining spontaneous responses are believed to have largely been induced by individuals forwarding the solicitation note and by its dissemination in online mailing lists (see Figure D-1). The questionnaire did not ask how the respondent learned of the study, but this substantial proportion of spontaneous responses sheds light on both the limitations of the committee’s explicit snowballing and the effectiveness of using additional solicitation mechanisms. 
In aggregate, names of 2,374 distinct people were provided by these responses and the ancillary sources, although valid e-mail addresses were identified for only 2,123 of them.

FIGURE D-1 New names by response ID.
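The two saturation heuristics mentioned above (the share of cited names that are new in each successive increment of responses, and the share of names cited by only one respondent) can be computed along the following lines. This is a minimal sketch under assumptions: the function names, the batch size, and the toy data are hypothetical, and responses are modeled simply as lists of cited names in order of arrival.

```python
from collections import Counter


def new_name_rate(responses, batch_size):
    """Fraction of cited names not seen before, per batch of responses."""
    known = set()
    rates = []
    for start in range(0, len(responses), batch_size):
        cited = new = 0
        for names in responses[start:start + batch_size]:
            for name in names:
                cited += 1
                if name not in known:
                    new += 1
                    known.add(name)
        rates.append(new / cited if cited else 0.0)
    return rates


def cited_once_fraction(responses):
    """Fraction of distinct names that are cited by exactly one response."""
    counts = Counter(name for names in responses for name in set(names))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy data (made up): four responses, each naming two people.
toy = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"]]
rates = new_name_rate(toy, batch_size=2)
once = cited_once_fraction(toy)
```

A rate that stays roughly flat across batches, rather than falling toward zero, is the pattern the text reads as a lack of saturation.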
OCR for page 73
Geographic Locales

Questionnaire responses were received from 29 countries; the two most frequent were the United States (497 responses representing 39 states) and Canada (23 responses representing 6 provinces) (see Figure D-2). In analyzing each question, the results for individual countries were compared against the aggregate results (see Tables D-1 and D-2). Because most countries had few entries, U.S. responses were compared with the aggregate figures for all non-U.S. responses. No significant differences appeared. For example, the percentages of those self-identifying their work as being in network science, of those stating there is an identifiable field of network science, and of those providing definitions, interests, applications, and challenges were closely comparable. Similarly, there were no significant differences in the mean number per response of fields selected; collaborators; projects; or new names (or of new names that later responded). This was also true when U.S. responses were analyzed by state (see Figure D-3 and Table D-3).

Fields of Study

The best-represented fields, as identified by the respondents, are computer science (and its closely related areas), other (described in more detail below), math, biology, and physics (see Figure D-4). The questionnaire was structured to allow each respondent to indicate more than one field of interest, and this opportunity was heavily used: the mean number of fields selected by a respondent was 3.6, and 80 percent of the respondents selected more than one field. For this reason, the response-per-field figures shown in Table D-4 sum to about 3.6 times the number of responses. These data also demonstrate the success of the solicitation process in bringing in research communities that had not been identified in advance as involved in network science: Some 159 (28 percent) of the responses indicated a field other than the fields initially provided by the online questionnaire (see Figure D-5).
Analysis of the free-form text entries describing these other fields shows great diversity, with the most numerous being engineering, geosciences, and human communication. The category labeled “Unclassified other” represents fields with single entries; examples include botany and economic history.

The respondents overwhelmingly came from academia; of the 619 respondents (98 percent of all responses) who indicated the type of organization they worked in, the questionnaire received only 46 (7 percent) from industry and only 12 (2 percent) from the military (see Table D-5). One significant contributor to the low representation from industry and the military was the comparative difficulty of finding e-mail addresses or other contact information for people outside academia. Compounding this problem, the effectiveness of snowballing was significantly greater for academic respondents: On average, each response from academia provided 3.1 new names, while responses from outside academia provided only 2.1 new names. In turn, the people identified by academic respondents were also 50 percent more likely to respond. The committee speculates that researchers in academia may perceive more incentive to respond and may attach more importance to influencing the study. Judging from personal experience and anecdotal evidence, industrial researchers today are under intense pressure to focus on near-term financial return.

The respondents were nearly unanimous in describing their own work as related to network science: Only 24 respondents (4 percent) did not so describe it. An additional 1 percent of the solicitations elicited personal e-mails to the committee indicating that the recipient declined to submit the questionnaire, usually because he or she did not work in the area (see Table D-6). The near unanimity on working in network science was independent of the specific fields that researchers worked in and of the country where they worked. Ninety-seven percent of respondents from academia said they worked in network science, as did 93 percent of the other respondents. Overall, this result is a reminder that the results of the questionnaire reflect self-selection on the part of those who responded; it must also be considered in light of the fact that only 70 percent of these respondents indicated that there was an identifiable field of network science (see “Dissenting Voices,” below).

FIGURE D-2 Countries where respondents were located.

TABLE D-1 Respondent’s Country
(Number and percent of respondents who selected the country)

Country           Number   Percent
Known non-U.S.    132      20.9
Argentina         1        0.2
Australia         5        0.8
Belgium           4        0.6
Brazil            4        0.6
Bulgaria          1        0.2
Canada            23       3.6
China             2        0.3
Denmark           3        0.5
France            5        0.8
Germany           14       2.2
Great Britain     19       3.0
Greece            1        0.2
Hungary           1        0.2
India             2        0.3
Israel            6        0.9
Italy             10       1.6
Japan             3        0.5
Korea             4        0.6
Mexico            1        0.2
Netherlands       2        0.3
New Zealand       1        0.2
Poland            1        0.2
Portugal          2        0.3
Russia            2        0.3
South Africa      1        0.2
Spain             8        1.3
Sweden            3        0.5
Switzerland       3        0.5
United States     497      78.5
Unknown           4        0.6

TABLE D-2 Canadian Respondent Provinces
(Number of respondents who selected the province, as a percent of all respondents and as a percent of respondents in Canada)

Province           Number   Percent of All   Percent of Canada
Alberta            1        0.2              4.3
British Columbia   7        1.1              30.4
Newfoundland       3        0.5              13.0
Nova Scotia        3        0.5              13.0
Ontario            8        1.3              34.8
Quebec             1        0.2              4.3

DISSENTING VOICES

The questionnaire analysis demonstrates that there is a widespread but not universal belief among the respondents that there is an identifiable field of network science. Although 95 percent classify their own work as potentially belonging to an emerging field of network science, only 70 percent state that such a field is currently identifiable. The main reasons for saying there is no such field are that the term has no coherent definition, that it is broad to the point of vacuity, that it is too soon to define the field, that the field is merely a new name for an already existing field, or that it represents the wrong approach.

The lack of consensus is shown clearly in the questionnaire responses: Of the responses that had been received as of April 29, 2005, only 442 (70 percent) answered yes to Q3a: Is there an identifiable field of network science? Of the remaining responses, 146 (23 percent) answered no and 45 (7 percent) did not answer.
These percentages proved stable as the number of responses grew and were only mildly dependent on the field of study of a respondent. Table D-7 lists the fields in decreasing order of the proportion of positive responses to the question. Only political science and public policy are notably more skeptical. Many more fields have yes percentages above the mean for the entire sample than below it: This “Lake Wobegon” effect primarily arises from a positive correlation between answering yes to this question and marking oneself in more fields of study.
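The “Lake Wobegon” effect described above can be reproduced with a small worked example. This is a sketch on made-up data, not the committee's analysis: the field labels A through E and the helper name `yes_rates` are hypothetical. Respondents who mark many fields are counted once in the overall mean but once per field in the per-field rates, so if they also tend to answer yes, most fields end up above the overall mean.

```python
def yes_rates(respondents):
    """Return (overall yes rate, per-field yes rate).

    respondents is a list of (set_of_fields, answered_yes) pairs.
    """
    overall = sum(1 for _, yes in respondents if yes) / len(respondents)
    totals, yeses = {}, {}
    for fields, yes in respondents:
        for field in fields:
            # A multi-field respondent contributes to every field marked.
            totals[field] = totals.get(field, 0) + 1
            yeses[field] = yeses.get(field, 0) + (1 if yes else 0)
    per_field = {f: yeses[f] / totals[f] for f in totals}
    return overall, per_field


# Toy data (made up): the two respondents who mark four fields say yes;
# the four single-field respondents say no.
toy = [
    ({"A", "B", "C", "D"}, True),
    ({"A", "B", "C", "D"}, True),
    ({"A"}, False),
    ({"B"}, False),
    ({"E"}, False),
    ({"E"}, False),
]
overall, per_field = yes_rates(toy)
above = sorted(f for f, r in per_field.items() if r > overall)
below = sorted(f for f, r in per_field.items() if r < overall)
```

Here the overall yes rate is one in three, yet four of the five fields sit above it and only one below, which mirrors the asymmetry noted for Table D-7.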
OCR for page 75
FIGURE D-3 States where respondents were located.

Although 23 percent of the respondents explicitly stated that there was no such identifiable field, there is some sign in the results that more than 23 percent felt that the question was debatable: 26 percent answered Q3e, “If no, what are the principal reasons for your answer?” and 33 percent answered Q3f, “Should there be such a field of study?”

Q3e allowed free-form descriptions of the principal reason for saying there was no identifiable field of network science. The committee’s analysis of the 163 responses to this question indicates five broad reasons; a given respondent often would offer more than one reason (see Figure D-6). In addition, respondents said that the field suffered from excessive hype. The five broad reasons given for saying there is no field of network science are these:

(1) The term is unclear or has no coherent core. For example, “Network science combines two words such that the resulting pair specifies less information than either individual word alone.”

(2) The term reflects a field that is still emerging, so it is too early to tell if it will bear substantive results. This phrasing may occasionally be a more tactful variation on the preceding wording, but often specific emerging application domains are mentioned.

(3) The work labeled network science is simply part of some specific existing fields under a new name. There is disagreement on just what existing field it is; frequent candidates include graph theory, complexity theory, systems theory, computer science, and control theory.

(4) The phrase is too broad, to the point of being vacuous: anything can be represented as a network, but doing so does not provide meaningful insight. For example, “Network theories turn out wrong when applied to particular application areas.”

(5) The idea of developing network science as a separate field is a wrong or barren approach. For example, “The interesting questions arise from function, rather than topology.” These answers agree that there is work that can be called network science but disagree that developing it as a discipline science will benefit the many other application domains that refer to networks in some form. When these comments indicate what the
OCR for page 82
Network Science driven by these defining constraint models through feedback or signaling mechanisms. The cost models and benefit models may have both structural components (determined by static attributes of the network) and dynamic components. “Cost” is an abstract term measuring consumption of resources or decrease in value; while engineered networks may have cost models that output actual dollar costs (among other penalty factors), many networks measure costs in other units. The dynamic components of a cost model reflect aspects of the temporal behavior of the network connectivity and exchange, including the degradation of an exchanged resource’s value and the consumption of resources on the nodes and links. As mentioned above, both nodes and links have dynamic components to their definition in that they transform resources passing through them. The transformation is reflected in both cost models and benefit models defining the network. Depending on a given network’s definition, a transformation may contribute to either cost or benefit or both. For a concrete example, the transformation of electrical energy into heat may appear in the constraint models defining a power distribution grid not as a contribution to the penalty figures but as a contribution to the benefit model of a heating system. The output penalty and merit figures generated by a network’s constraint models are where the end-to-end and systemwide attributes of a network first emerge into view as a network is defined. The concomitant feedback or signaling mechanisms may be implicit or explicit and may be any mixture of in-band and out-of-band. In-band feedback mechanisms are those exploiting signaling explicitly carried or implied by components of the resources exchanged. Note that the existence of constraint models is an inherent factor in network science research and one of the dominant reasons for interdisciplinary approaches. 
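The distinction between the structural and dynamic components of a cost model can be made concrete with a toy calculation. This is only a sketch under strong assumptions: the `Link` class, its two attributes, and the multiplicative loss rule are hypothetical illustrations and are not taken from the committee's definitions.

```python
from dataclasses import dataclass


@dataclass
class Link:
    fixed_cost: float     # structural: cost of the link existing at all
    loss_fraction: float  # dynamic: share of the resource degraded in transit


def structural_cost(links):
    """Cost determined by static attributes of the network alone."""
    return sum(link.fixed_cost for link in links)


def dynamic_cost(path, amount_sent):
    """Amount of an exchanged resource lost along a path of links."""
    remaining = amount_sent
    for link in path:
        remaining *= 1.0 - link.loss_fraction  # each link transforms the resource
    return amount_sent - remaining


# Toy path (made up): two links traversed in order.
path = [Link(fixed_cost=10.0, loss_fraction=0.1),
        Link(fixed_cost=5.0, loss_fraction=0.5)]
static_total = structural_cost(path)
lost = dynamic_cost(path, amount_sent=100.0)
```

Whether `lost` counts as a penalty or a benefit depends on how the network is defined: in the power grid example above, the same transformation of electrical energy into heat would be scored as a benefit if the network were a heating system.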
All networks implicitly or explicitly have one critical set of dynamic cost models: The links have finite speeds for exchanging resources and nodes have finite throughput. In particular, the costs derived from these models ensure that the processing of exchanges and the actions of the feedback mechanisms occur locally (see Table D-10).

TABLE D-9 Summary Decomposition of the Derived Properties of Networks
(Columns: Property, Structure, Dynamics. The cell text for each property is given in its original order.)

Characterization: Which nodes and links are “important”? What roles do they play in the network? How would one modify a network to change the role of a node in a specified way? What is the performance of the network, typically in terms of qualities of the resource exchange? What network attributes are required to achieve specified behavior? How do we measure performance and maintain it?

Cost: What is the cost of the aggregate suite of nodes and links of a network, given their defined attributes? What is the cost of the latency and degradation of the resource exchange realized by this network? What is the least cost network, given input constraints, to achieve a specified performance?

Efficiency: What is the expected utilization of the network nodes, links, and their limiting resources? What is the trade-off between cost and performance for the available design space? How could the behavioral attributes of the network components be modified to improve the efficiency of the network realization?

Evolution: Which structural attributes are preserved as the network evolves? What structure should be created to assure its stable evolution? What evolutionary path will emerge under specific rules and constraint models for the addition, modification, and deletion of nodes, links, and their attributes? How can one design or promote local behaviors that will result in a desired evolutionary path? How do failure modes and attacks evolve in response to the network evolution?

Resilience: What are the structural attributes that resist accidental or intelligently planned damage and overload? How does the behavior of a network change under damage and overload? What input behavioral rules induce better behavior under these scenarios?

Scalability: Which structures scale in terms of the measures of complexity? How does network behavior change as network scale changes? What rules and constraint models assure desired behavior across changes in scale?

TABLE D-10 Summary Decomposition of Constraint Models
(Columns: Constraint Models, Structure, Dynamics. The cell text is given in its original order.)

Cost models, benefit models, and the associated models of feedback effects: Models for determining the costs and benefits for a given node of the locally visible neighborhood. Models for determining the costs and benefits of adding, deleting, and modifying nodes and links. Models for determining aggregate costs and benefits of the network nodes, links, and their properties. Models for costs and benefits for the exchange of classes of resource, with its associated transformation. Models for how these costs and benefits affect network structure and its evolution. Models for how these costs and benefits affect network behavior and its evolution.

Driving Dimensions

The analysis of the questionnaire responses also identified three additional significant and common dimensions that are drivers of the difficulty of many associated challenges and of the research effort to address them: complexity, scale range, and network context. Note that these three dimensions, although widely mentioned in the responses and critical to the challenges and potential value of a possible discipline of network science, are not required for a system to be studied as a network. The greatest benefit from a rigorous network science, however, would lie in understanding the laws that drive the structure and dynamics of networks across the extremes of these three critical dimensions:

Complexity. This dimension includes issues such as large scale (a large number of nodes, links, classes of exchanged resources, or constraints), as well as how the nodes and links behave.

Scale range. This dimension reflects the wide range of interacting critical temporal and spatial scales in the structure and dynamics of a network.

Network context. This dimension addresses the environment of a network as it relates to other networks: Most networks exist in the context of a larger set of other networks on which they depend and with which they interact. These networks may be naturally captured as different levels of abstraction or as competing and cooperating networks at the same level of abstraction.
A social network, for example, may be strongly influenced by the characteristics of the communications, economic, and transportation networks in which the social organisms are embedded (and they, in turn, affect those networks), but each network is best dealt with to a first approximation as its own form of abstraction, using appropriate approximations to reflect how the other networks affect the exchange, storage, and transformation of its various classes of resource of interest (see Table D-11).

TABLE D-11 Summary Decomposition of the Problem Dimensions of Networks
(Columns: Dimension, Structure, Dynamics. The cell text for each dimension is given in its original order.)

Complexity: A high number of nodes, links, resource classes, or rules in the cost models and benefit models. High numbers of internal states and transition rules for the behavior of nodes and links.

Scale range: Dependencies across wide range of spatial dimensions. Highly disparate node and link attributes within one network. Dependencies across a wide range of timescales. Highly disparate rates of interaction and evolution in different regions of the same network.

Network context: Number and nature of peer networks and of networks at other levels of abstraction. Opacity: unavailability of information or resources across network boundaries. Definition of behavioral rules governing resource exchange, constraints, and interactions between nodes.

Driving Applications

The 427 responses through April 29, 2005, that proposed driving applications for network science (68 percent of all responses, including 92 percent of those who said an identifiable field exists and 10 percent of those who said it does not) described a highly disparate set of applications, generally tightly bound to specific other fields or problem areas (see Figure D-9). Most of the responses were fairly terse and high level and showed little consensus on specific applications. The committee’s analysis identified five major communities of research players: technological, biological, social sciences, interdisciplinary, and physical sciences and math. When viewed at a level high enough to allow identification of shared concerns, the current driving applications proved to be closely related to the description of the major research challenges (covered in the next section).

FIGURE D-9 Driving applications identified by respondents.

There were also a few voices dissenting on the question itself. For example, one response was that there are hundreds of applications called network science; another was that applications are not the drivers for network science, as it is still an emerging basic research field. In contrast to the view that there were too many applications to consider, another respondent’s view was that more effort has been spent on the search for universality principles in networks than on the rigorous study of stand-alone application areas. These dissents echoed the reasons given for saying there is no such field as network science.

Because so few responses were received from outside academia, no useful conclusions can be drawn about the interests of specific nonacademic communities. Within the academic world, the players are generally grouped into well-defined communities focused on particular domains of study. For this reason, the frequency with which particular classes of applications are cited in the questionnaire response closely tracks the response rate by field shown in Table D-4. Within each community, the leading applications in terms of the number of responses are shown in Figure D-10 and Table D-12.

FIGURE D-10 Number of responses to driving applications question.

TABLE D-12 Major Players and Cited Applications

Players                      Summary of Most Cited Applications
Technological                Distributed computing
                             Information sharing and discovery
                             Telecommunications
Biological                   Public health and disease transmission
                             Ecosystem modeling
                             Systems biology
Social sciences              Social network analysis
                             Economic models and resource distribution
Interdisciplinary            Understanding complex systems
                             Intersection of human interactions and networking technology
Physical sciences and math   High-energy physics
                             Mathematical models of networks

The leading concerns of the technology and engineering communities are closely related. Distributed computing focuses on the efficient realization of applications that are highly decentralized; many of these, in turn, are entwined with issues of networked information sharing, such as peer-to-peer methods. The classic network problems of telecommunications engineering (network design, reliability and resilience, cost-performance trade-offs) appear frequently, but generally with an emphasis on wireless infrastructures (including self-organizing ad hoc and sensor networks). Network collapse (meaning the collapse of the ability to transport the communications payload) is also involved.

The driving applications pursued in the biological sciences include models of disease transmission, ecological modeling and biodiversity, and systems biology. The focus of network understanding of disease transmission is both predictive (How will epidemics spread? What is the relation between the structure of the transmission network and the evolution of the disease?) and interventional (What changes to the underlying social and transportation networks would prevent or reduce epidemics?). Ecological applications focus on understanding the flows of energy and nutrients in ecological networks, the interdependence of organisms and species, and qualitative changes in an ecosystem (such as biodiversity stability and collapse).
Systems biology refers to the need to understand the system-level architecture of a cell or an organism, as well as to design drugs and interventions that can cause the desired effects and very few side effects. However, a few responses expressed concern about the time required to go from basic systems biology to specific medical applications.

The applications in the social sciences are affected by the deep history of the discipline’s analysis of social interaction and influence networks. Other respondents were concerned with understanding the well-known, heavy-tailed distributions in numerous social constructs in terms of the underlying social networks. Economic network issues such as the flow of capital also relate to the comparative impact of underlying infrastructure networks for communications, application information sharing, and transportation of people and material. Network science is being applied to distribution channel behavior, such as interpersonal ties within a market or interorganizational ties in a value chain. Smaller but still significant numbers of responses mentioned organization models and political applications, ranging from disrupting terrorist networks to supporting prodemocracy organizations under authoritarian regimes.

“Interdisciplinary communities” refers specifically to respondents who self-identified as being involved in several disparate fields and/or who proposed application topics that explicitly span or relate disparate fields. This distinction is necessary because many responses gave lists of unrelated applications drawn from different fields. In this category the committee included “complex systems,” which occurred commonly but was generally not given further definition; the term “emergent phenomena” is closely related and likewise undefined by the respondents. Descriptions of driving applications also commonly mentioned the need to understand the relationship between disciplines such as biological and computer networks.

The use of telecommunications networks for data-intensive computing applications was cited as one driving application in the physical sciences; other such applications in that area spanned physics and chemistry, such as understanding high-dimensional dynamical systems and the network structures underlying the energy landscapes that drive protein folding and similar optimization behaviors. One driving application in mathematics was theories for relating systemwide behavior and network structures.

RESEARCH CHALLENGES

The committee, aided by the questionnaire, identified a number of important research challenges that should be addressed if the field of network science is to be moved forward. The analysis of research challenges was based on the responses to question 3d (What are the key research challenges?) and was performed by reading through all of the responses to the questionnaire and binning the results to infer broad topics that recurred frequently. To ensure that the results were not being biased by individual committee members, the committee's binning was compared with an earlier, independent analysis of the responses. The responses that fit within a broad category of challenge were counted, and the seven most highly populated categories were selected for inclusion in the report. Each category had between 25 and 100 responses identifying it as a research challenge; the proportion of responses garnered by each challenge is shown in Figure D-11. The seven primary challenges that were identified were these:
Dynamics, spatial location, and information propagation in networks. A major need in network science is a better understanding of the relationship between the architecture of the network and its function. This is particularly important in networks where dynamics plays a large role, either through the flow of information around the network or through changes in the network structure (by evolution or adaptation). How the structure of a network relates to the behavior of the system is still not well understood and will be a major impediment to progress in many applications.

Modeling and analysis of very large networks. Present tools and approaches are designed to work with relatively small networks, but many of the most important problems involve much larger networks. Examples of such networks include cell regulatory networks in biology, social and economic networks, and computer communication networks (including military command-and-control networks). Abstractions and approximations are needed that allow reasoning about these large-scale networks, as well as techniques for modeling networks with noisy and incomplete data. Analysis techniques for such networks must have good scaling properties so that they can be applied to the very large networks that are key to network science.

Design and synthesis of networks. While a lot of engineers, scientists, mathematicians, and sociologists are simply trying to understand complex networks already in existence, in many application areas the goal is to design a network to obtain a desired behavior, for example, scalability, robustness, usability, resiliency, efficiency, and resolvability (or adaptability). It may be possible here to learn from biological systems how to design engineered systems that exhibit equally complex, adaptive, and robust behavior.

Increasing the level of rigor and mathematical structure. Many of the respondents to the questionnaire felt that the state of the art in network science did not have an appropriate mathematical basis. This level of mathematical rigor could be achieved by a combination of defining the appropriate levels of abstraction for analysis, developing better tools in graph theory and other relevant disciplines, and searching for fundamental limits of performance.

Abstracting common concepts across fields. Many members of the committee and respondents to the questionnaire cited the need to define common concepts across the disparate disciplines and applications that are part of network science. The multidisciplinary nature of the work is a challenge, but results could be transferable from one field to another if appropriate unifying principles can be developed.

Better experiments and measurements of network structure. Current data sets on large-scale networks tend to be sparse, and tools for investigating the structure and function of these networks are limited in many domains. There was a general feeling shared across many fields that there needs to be more and better access to data, which in some domains requires new measurement techniques to be developed, for example, to obtain a detailed spatiotemporal measurement of the operation of a cell. One respondent suggested the development of a so-called “macroscope” to detect, communicate, and understand the structure and dynamics of large-scale networks.

Robustness and security of networks. Finally, there is a clear need to better understand and design networked systems that are both robust to variations in the components (including localized failures) and secure against malicious intent. This requires a much more sophisticated understanding of the failure mechanisms in networked systems as well as better tools for predicting the impact of perturbations on networked systems.

FIGURE D-11 Major research challenges.

The Social Structure of Network Science

In addition to the analysis performed by the committee, the data were also analyzed by Katy Börner, who addressed the visible social structure of research in network science as indicated by the collaboration and invitation entries of each respondent. Her analysis is provided in Box D-2. Upon reviewing her analysis, the committee consensus was expanded to include the following two findings presented in Chapter 6 on the empirical state of the proposed field of network science:

Finding 6-7. Analysis of the social and collaboration networks of the respondents provides additional evidence that network science is an emerging area of investigation.

While the clusters within the network are only weakly connected, a large connected core spans many of them. Based on Dr. Börner’s extensive experience, and on the judgement of the committee, this pattern is characteristic of an emerging field and constitutes objective evidence that network science is a field, but an immature one whose future is still undecided.

Finding 6-8. Analysis of the social and collaboration networks of the respondents provides additional evidence of the multidisciplinary nature of network science.

Dr. Börner’s analysis of the social and collaboration networks provides additional evidence of the multidisciplinary nature of network science.
Researchers from any given discipline are distributed throughout the graph, and any given subcommunity includes researchers from multiple disciplines. This pattern is unlike any field previously analyzed by Dr. Börner. In the committee’s judgment, therefore, the pattern constitutes objective evidence that network science is a field that is distinctly interdisciplinary, with research concerns that support multiple application domains.
BOX D-2 Mapping the Social Network and Expertise of “Network Science” Researchers

This box presents the anonymized results of a bibliometric analysis [1, 2] of the social networks and expertise coverage of network science researchers prepared at the committee’s request by K. Börner and W. Ke, of the InfoVis Laboratory at Indiana University. All results are based on the self-reported data in the file named “cleaned_survey_as_of_050318_0910a_posted.xls.” Subsequently, the authors report the data cleaning and analyses performed, major results, and their interpretation. They conclude with a set of recommended topics for further study.

Data Set Used, Analysis Results, and Interpretation

The data file “cleaned_survey_as_of_050318_0910a_posted.xls” comprises 499 completed questionnaires that report 923 “collab_with” links reported under Q2c and 376 “invite” links reported under Q4a. To ensure a high quality of automatic data extraction and analysis, all names reported in free-form text as “Other collaborators” under Q2c and all “Other people to invite” reported under Q4a were not considered. Figure D-2-1 illustrates relationships among the initial invitees, respondents, and identified collaborators. In total, 1,241 unique names of network science researchers were identified. E-mail addresses were used to ensure that these names are truly unique and represent exactly one person. As requested by the National Research Council, author names were replaced by a unique identification number to preserve the anonymity of authors.

In addition, the 22 (checkable) fields of interest as well as the free-form text of “other” fields of interest reported in Q1c were analyzed. In total, 138 unique fields of interest were identified. Fields that were mentioned most often were computer science (mentioned 201 times), information technology (166), and Internet (156).

Data Quality Issues

The “collab_with” links are mostly made to researchers in spatial or thematic proximity.
Hence, these links help to grow the social network of network science researchers locally. Colleagues reported that they tried to “invite” people who were not yet in the data set. There was no question that asked users to identify “major players” or “gatekeepers.” There are many misspellings of names and disciplines in this data set. Information provided in the “other collaborators” and “other people to invite” section could not be used in this automatic analysis.

FIGURE D-2-1 Relationships among invitees, respondents, and collaborators.

Data Analysis Results

Here we report “major researchers” who are frequently mentioned in the data set, who act as gatekeepers, and who interlink many scientific fields. In addition, we extracted and will present existing social and collaboration networks. Researchers who are frequently mentioned in the complete data set and the number of times they are listed as a collaborator are given in Table D-2-1.

TABLE D-2-1 Researchers Who Are Frequently Mentioned and Listed as Collaborators

ID      No. Listed   No. Listed as Collaborator
1005    12           8
9       8            7
512     12           6
1009    7            5
139     7            5
1023    8            5
1047    5            4
784     6            4
455     6            4
814     4            4
1238    7            4
925     5            4

Figure D-2-2 shows the major components (size ≥ 10) of the network science researcher network (NSRN). The Pajek [3] plot shows exactly 630 of the 1,241 unique researchers, together with their “collab_with” links and “invite” links. Each researcher is represented by a node. Node color coding is used to identify researchers that submitted (brown) or did not submit (orange) questionnaires. Node area corresponds to the number of times a researcher is mentioned by other researchers. Each node with a betweenness centrality no less than 0.00001 or a size (number of appearances in the data set) ≥ 3 was labeled with the author’s name (ID). Links denote “collab_with” links (in orange) and “invite” links (green).

FIGURE D-2-2 Network science researchers network.

Subsequently, researchers who act as gatekeepers were identified based on an examination of the betweenness centrality (BC) values [4, 5] of nodes in the NSRN. The top 10 researchers are given in Table D-2-2. Figure D-2-3 indicates nodes with a BC value ≥ 0.00001 by a black ring and shows them in the context of the NSRN. To examine the community structure of network science researchers, we examined the different components in the NSRN. Table D-2-3 shows the size of existing components, the number of components that have this size, and the total number of nodes in these components. The largest component in the NSRN is shown in Figure D-2-4 using the color coding introduced in Figure D-2-2. It represents the current coherent core of the new field of network science.

TABLE D-2-2 Researchers Who Act as Gatekeepers

ID      No. Mentioned   Betweenness Centrality Value
1066    4               0.00020275
997     2               0.00017878
981     4               0.00015093
9       7               0.00013408
341     2               0.00012502
925     4               0.00010882
845     3               0.00010688
959     1               0.00009716
1225    2               0.00007773
162     3               0.00007060
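To illustrate the gatekeeper measure described above, the following Python sketch computes betweenness centrality with the breadth-first accumulation scheme of Brandes (2001) on a small hypothetical graph. The node labels, the toy edges, the treatment of links as undirected, and the normalization by the number of node pairs are all assumptions made for this example. The box does not state how the published BC values were normalized, so the numbers produced here are not comparable to those in Table D-2-2.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Betweenness centrality for an unweighted, undirected graph.

    adjacency maps each node to an iterable of its neighbours (symmetric).
    Returns a dict of values normalized by (n - 1)(n - 2) / 2, which is an
    assumed convention, not necessarily the one used in the box.
    """
    nodes = list(adjacency)
    bc = {v: 0.0 for v in nodes}

    for source in nodes:
        # Breadth-first search from source, counting shortest paths.
        order = []                         # nodes in order of discovery
        preds = {v: [] for v in nodes}     # predecessors on shortest paths
        sigma = {v: 0 for v in nodes}      # number of shortest paths
        dist = {v: -1 for v in nodes}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]

    n = len(nodes)
    pair_count = (n - 1) * (n - 2) / 2
    for v in bc:
        bc[v] /= 2.0                       # each unordered pair was counted twice
        if pair_count > 0:
            bc[v] /= pair_count
    return bc


# Hypothetical toy network: two triangles joined through a single node "g".
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "g"],
    "g": ["c", "d"],
    "d": ["g", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

scores = betweenness_centrality(toy_graph)
gatekeeper = max(scores, key=scores.get)   # "g" bridges the two triangles
```

In this toy graph every shortest path between the two triangles passes through "g", so it receives the highest score, which is the sense in which a high-BC researcher acts as a gatekeeper between otherwise separate groups.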
FIGURE D-2-3 Researchers with high BC values (in black) and low BC values (in gray).

TABLE D-2-3 Components in the NSRN

Size    No. of Components   No. of Nodes
1       77                  77
2       32                  64
3       25                  75
4       45                  180
5       10                  50
6       12                  72
7       7                   49
8       1                   8
9       4                   36
10      4                   40
11      2                   22
13      1                   13
14      1                   14
15      1                   15
17      1                   17
18      1                   18
30      1                   30
33      1                   33
73      1                   73
355     1                   355
Total                       1,241
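The component counts in Table D-2-3 can be checked against the figures quoted in the text. The Python sketch below encodes the published size distribution, confirms that it accounts for all 1,241 researchers, and confirms that components of size 10 or more contain the 630 researchers plotted in Figure D-2-2. It also includes a small helper that derives such a size distribution from an edge list. The helper, its names, and the toy edge list are hypothetical, and treating the "collab_with" and "invite" links as undirected is an assumption of this sketch.

```python
from collections import Counter, defaultdict

# Table D-2-3 as published: component size -> number of components.
TABLE_D_2_3 = {
    1: 77, 2: 32, 3: 25, 4: 45, 5: 10, 6: 12, 7: 7, 8: 1, 9: 4, 10: 4,
    11: 2, 13: 1, 14: 1, 15: 1, 17: 1, 18: 1, 30: 1, 33: 1, 73: 1, 355: 1,
}


def nodes_in_components(size_counts, min_size=1):
    """Total number of nodes in components of at least min_size nodes."""
    return sum(size * count
               for size, count in size_counts.items()
               if size >= min_size)


def component_size_counts(nodes, edges):
    """Histogram of connected-component sizes (links treated as undirected)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen = set()
    sizes = Counter()
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search over one component.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes[size] += 1
    return dict(sizes)


total_nodes = nodes_in_components(TABLE_D_2_3)               # 1241
major_nodes = nodes_in_components(TABLE_D_2_3, min_size=10)  # 630

# Hypothetical toy data: one chain of three, one pair, one isolated node.
toy_counts = component_size_counts(
    nodes=[1, 2, 3, 4, 5, 6],
    edges=[(1, 2), (2, 3), (4, 5)],
)                                                            # {3: 1, 2: 1, 1: 1}
```

Both totals agree with the text: the table sums to 1,241 researchers, and the components of size 10 or more hold 630 of them.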
Interpretation

Compared with maps of other scientific disciplines, the NSRN clearly exhibits the characteristics of a new and emergent research area: It consists of many unconnected networks of collaborating network science researchers, and the existing networks show a rather heterogeneous coverage of research topics. Figure D-2-5 is a map of all network science researchers visualized in VxInsight [6]. The map at the left-hand side shows the NSRN. On the right, the very same graph is shown in “landscape” mode, with colored dots representing the self-reported interest profiles of researchers. A white dot denotes that the researcher listed “biology” as a principal field of interest in Q1c. Yellow denotes “computer sciences,” light blue “Internet,” blue “physics,” and green “sociology.” As can be seen, there are no groupings of researchers with similar fields of interest. Instead, very different research interests seem to be almost equally distributed over the NSRN. As the field of network science matures, subareas devoted to the study of specific research fields are likely to emerge, and many of the separate components will exhibit collaboration links, weak or strong and temporary or stable.

FIGURE D-2-4 Largest component of the NSRN.
FIGURE D-2-5 Disciplinary heterogeneity of the NSRN.

Social Network of Network Science Researchers: Topics for Further Study

Increase our understanding of the interplay of affiliation, thematic, and social interrelations among today’s network science researchers. Invite key network science researchers to identify and label the main research groups shown in Figure D-2-2.

Bibliometric analysis of network science publications, patents, and funding data. The InfoVis Laboratory at Indiana University is developing the sociotechnical infrastructure to analyze the structure and evolution of scientific disciplines and all of science on a large scale [7]. Major publication, patent, and grant databases (covering mostly U.S. research) are available, as are scalable algorithms and compute power. A detailed, objective analysis of scholarly data would complement the self-reported, subjective data and its analysis reported here.

Development of an online portal that tracks and communicates the evolution of network science research and results. Geospatial and semantic maps of network science researchers and publications presented here and proposed in Shiffrin and Börner (2004) can be made available online as a unique interface to data sets, publications, and expertise related to network science research. Researchers interested in being “on the map” should be given the option to submit data about their publications, collaborators, etc. The incentives for researchers to contribute high-quality data can be further increased by using this online map to make funding decisions much as PIs’ resumes are used today. Assuming that a comprehensive set of high-quality data can be acquired on a continuous basis, an interactive, continuously evolving, weather-forecast-like map of network science research can be made available to grant agencies, researchers, practitioners, and society at large.

NOTE: The authors would like to thank Will E. Leland for compiling the data set used in this study and for insightful feedback on previous results. This work is supported by a National Science Foundation CAREER Grant under IIS-0238261.

1 Börner, K., C. Chen, and K. Boyack. 2003. Visualizing knowledge domains. In Annual Review of Information Science & Technology, B. Cronin, ed. Medford, N.J.: Information Today, Inc./American Society for Information Science and Technology.
2 Shiffrin, R.M., and K. Börner, eds. 2004. Mapping knowledge domains. Proceedings of the National Academy of Sciences of the United States of America 101 (Suppl. 1).
3 Batagelj, V., and A. Mrvar. 1997. Pajek: Program package for large network analysis. Available at http://vlado.fmf.uni-lj.si/pub/networks/pajek/.
4 Freeman, L.C. 1977. A set of measures of centrality based on betweenness. Sociometry 40:35–41.
5 Brandes, U. 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25(2): 163–177.
6 Davidson, G.S., B. Hendrickson, D.K. Johnson, C.E. Meyers, and B.N. Wylie. 1998. Knowledge mining with VxInsight: Discovery through interaction. Journal of Intelligent Information Systems 11(3): 259–285.
7 Available at http://iv.slis.indiana.edu.