Collaboratory Life: Challenges of Internet-mediated Science for Chemists
Thomas A. Finholt
University of Michigan
Since the birth of modern chemistry in the early 19th century there has been tremendous growth in the knowledge and the practical application of chemical principles. However, in many important ways, the practice of chemistry research and teaching has remained unchanged. The advent of the Internet as a worldwide mechanism for conducting scientific communication challenges this status quo. Specifically, innovations like collaboratories, or network-based virtual laboratories, remove constraints of distance and time on scientific collaboration. In particular, collaboratories increase access to scarce instruments, accelerate the flow of information, and place new demands on senior scientists to mentor students. Chemists need to appreciate how these new ways of doing scientific work will influence the conduct of chemistry research so that they can effectively anticipate and influence the development of emerging Internet technologies.
The Internet,1 the World Wide Web,2 and sophisticated collaboration technologies3 represent the raw ingredients for a revolution in the practice of chemistry. Yet today, for many chemists, this revolution is only partially realized or has not begun. As a result, the field of chemistry rests squarely on techniques and methods, many of which are over a century old. That is, while the content of chemical knowledge has advanced dramatically in the last 200 years, the organization of chemical research and education has remained relatively constant. By contrast, other disciplines race to embrace change, such as physicists' invention and rapid adoption of the World Wide Web and the widespread use of the Web for data dissemination among biomedical scientists. An apt metaphor to describe the challenge of the Internet for chemistry is Paul Gauguin's masterpiece (Figure 7.1), Where Do We Come From? What Are We? Where Are We Going?
In the title of his painting, Gauguin evokes the fear and uncertainty that accompany the transition from the past (Where do we come from?), through the present (What are we?), and into the unknowable future (Where are we going?). The style and content of the painting also underline Gauguin's personal status as a bridging figure between impressionism and modernist schools, such as cubism and fauvism. While Gauguin was captivated by the impressionists early in his career, and worked and showed with them, later in his career he broke away and defined a new kind of art, often labeled post-impressionism. In this later work, Gauguin experimented with the use of color and symbolism in a manner that paved the way for those who followed, including Matisse, Picasso, and Munch. Therefore, at many levels, this painting represents the tension of being caught between familiar traditions and the birth of new ways.
Chemists confront a similar tension between tried and true practices from the past and unknown alternative practices made possible through advances in information technology. In this sense Where do we come from? is a question about the traditions and conventions that have defined chemistry, especially with regard to the organization of research and education. The question What are we? offers an opportunity to reflect on the present state of the Internet, while the question Where are we going? forces consideration of the various new paths that Internet-mediated chemistry might follow into the future. The "Gauguin problem," then, is a statement about the difficulty any community faces when past and current success precludes full examination of, or experimentation with, potentially transformational practices and approaches. In chemistry, the Gauguin problem can be framed as the enduring legacy from innovation at the dawn of modern chemistry in the late 18th and early 19th centuries, the mixing of inherited tradition with capabilities provided by the Internet that is occurring today, and alternative views of the future defined by new uses of the Internet.
Where Do We Come From?
As an emergent discipline in the late 18th and early 19th centuries, chemistry had little received tradition in terms of either how to conduct chemical research or how to teach about chemical relationships. Therefore, the first 30 years of the 19th century saw development of many of the research and pedagogical practices that are still in use today. Three particularly important innovations were the creation and elaboration of the research laboratory, the use of laboratory classes in chemical education, and the use of lecture demonstrations to illuminate and clarify chemical principles.
The roots of the modern research laboratory, that is, a physical concentration of personnel and apparatus dedicated to systematic chemical research, can be found in Humphry Davy's laboratory at the Royal Institution at the turn of the 19th century. Under the patronage of Count Rumford, the founder of the Royal Institution, Davy simultaneously mastered the arts of building voltaic devices for the discovery of new elements as well as raising the funds to construct new devices.4 The impact of Davy's efforts in terms of new knowledge is obvious in his identification of sodium, potassium, and so forth. Nearly as important, however, is the model that Davy's lab established for subsequent scientists. That is, the halls of the Royal Institution defined not only a physical space that housed critical instruments but also a social organization that produced tradition, fostered networks, and became a place to train future generations of researchers. Indeed, among Davy's greatest contributions was his mentorship of Michael Faraday. The broader success of the Royal Institution as a research organization is represented in the work of the Institution's nine Nobel laureates. Davy's approach carries through to the present largely unchanged. That is, the imperatives that drive modern research laboratories (hiring good people, installing state-of-the-art equipment, and getting funding) differ in magnitude and sophistication, but not in fundamental character, when compared to Davy's day. In fact, it seems likely that were Davy to travel forward in time to a lab at Michigan, or DuPont, or Cambridge, he might be amazed at the focus of research and the instruments in use, but the organization of scientists and resources to conduct the research would be entirely familiar.
A second great innovation from chemistry's founding era was the invention of the laboratory class. The introduction of laboratory classes is associated with Justus von Liebig and the organic chemistry curriculum he developed at the University of Giessen, beginning in 1824. Prior to Liebig, chemistry was taught largely through example and demonstration and not through active participation at the lab bench. While the demonstration approach was developed to a high art, it did not produce many chemists, since few students gained hands-on experience in manipulating chemicals.5 By contrast, Liebig's "practical" approach immersed students in exercises that were reasonable analogs to techniques used by practicing chemists. Therefore, students in Liebig's lab gained not only experience but also a sense of the thrill and challenge of conducting research. Pictures of Liebig's laboratory drawn in 1842 show a scene not too different from a modern undergraduate chemistry lab. Students clustered around lab benches produce and analyze specified compounds, guided by an instructor or lab supervisor. Again, as with Davy, we could bring Liebig forward in time and most of what he would observe about the organization of modern lab courses would be similar to his own era.
A final innovation was the public lecture and accompanying demonstrations. This practice originated with Faraday's Christmas lectures at the Royal Institution, which began in 1826. The Christmas lectures have continued to the present and are now broadcast via television and available via download from the Web. The essence of this tradition is for prominent scientists to communicate the excitement and value of their research to a lay audience, particularly young people. The main mechanism used in the Christmas lecture is explanation accompanied by interesting demonstrations. This pedagogical approach is not just the foundation of the Christmas lecture series; it is also the continuing basis for most secondary and undergraduate instruction in chemistry. Indeed, most large university chemistry lecture halls have an accompanying room where special equipment, just for doing demonstrations, is prepared under the guidance of a demonstrations supervisor. Therefore, as in the two preceding examples, there is not much about contemporary lectures that would be surprising or strange to someone from Faraday's time.
These three innovations do not represent a comprehensive treatment of the historical tradition in chemistry. However, they do signify important cornerstones of past and present practice in chemical research and education. More important, the legacy of the research laboratory, the laboratory course, and the public lecture defines the starting place for thinking about the organization of alternative approaches. Specifically, with the advent of the Internet, the Web, and collaboration tools, chemists confront a choice between extrapolation from a known and successful past (i.e., joining the capabilities of these new technologies to familiar practices) and exploration of entirely new ways of doing and teaching chemistry.
What Are We?
The previous section examines chemistry's past and how that past determines the present. This section considers the present, particularly the new opportunities available to chemists through the expansion of computer and Internet technologies. These opportunities can be thought of in terms of the raw performance of computer processors, the capacity of communication networks, the scope of networks, and the evolution of software.
The engine of progress in computing is Moore's law, or the observation by former Intel CEO Gordon Moore that the performance of computer processor chips, when measured as the number of transistors per processor, doubles roughly every 2 years. Figure 7.2 illustrates the progress of processor development over the last 26 years, using Intel chips as a benchmark. A corollary of Moore's law says that for a constant price, a computer purchaser gets twice as much power every 2 years. Either way, this is a trend toward phenomenal increases in computing capability over time. For instance, it is often observed that current Pentium II workstations are roughly comparable in speed to supercomputers sold in the early 1980s.
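The doubling rule stated above can be turned into a quick calculation. The following is a minimal sketch, assuming an idealized, perfectly regular 2-year doubling period; the function name is hypothetical and the result is an upper-level illustration, not a measurement of any actual Intel chip.

```python
def moore_growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return the multiplicative growth after `years` of idealized doubling.

    Assumes a perfectly regular doubling period, which real processor
    generations only approximate.
    """
    return 2.0 ** (years / doubling_period_years)


# The 26-year span covered by Figure 7.2 corresponds to 13 doublings.
factor = moore_growth_factor(26)
print(f"Idealized growth over 26 years: {factor:,.0f}x")
```

Under these assumptions, 26 years of doubling every 2 years gives 2 to the 13th power, or a factor of roughly eight thousand in transistors per processor.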
Recent explosive growth in the size of the Internet points to a new metric of computing performance: network bandwidth. Contemporary performance, or lack of performance, within the Internet is legendary, hence the popular observation that WWW stands for "World Wide Wait." Examining plans for installation of high-capacity fiber across both the Pacific and the Atlantic oceans, however, suggests that current network delays may soon disappear. For example, capacity across the Pacific is slated to increase to 300 gigabits per second (Gbps) by the year 2000 from a 1998 level of 25 Gbps, while capacity across the Atlantic will increase to 250 Gbps from 110 Gbps.6 For comparison purposes, 1 Gbps is equivalent to 70,000 phone calls. Estimates are that the expansion of international bandwidth will easily meet projected growth in voice traffic and that the bulk of increased use will be data traffic. This means that many applications that impose prohibitive bandwidth overhead today (such as desktop video conferencing) may, in the near future, become more practical. Plans are already launched in the United States for next-generation network technologies that will exploit increasing bandwidth, such as the University Consortium for Advanced Internet Development (UCAID; <http://www.ucaid.org>). UCAID hopes to deliver network performance of 150 megabits per second, which in most cases will represent dramatic improvement in network throughput. This could mean, for instance, easy transfer of large data sets around the Internet, routine use of bandwidth-intensive applications (such as audio or video), and increased use of applications that require high quality of service (such as remote manipulation of instruments).
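To make the scale of these figures concrete, the capacities quoted above can be converted into growth factors and phone-call equivalents. This is a minimal sketch that uses only the numbers given in the text; the function and variable names are hypothetical, and the 70,000-calls-per-Gbps ratio is taken from the text rather than derived from any particular voice encoding.

```python
CALLS_PER_GBPS = 70_000  # equivalence quoted in the text

# (earlier capacity, planned capacity) in Gbps, as quoted in the text
CAPACITY_GBPS = {
    "Pacific": (25, 300),
    "Atlantic": (110, 250),
}


def summarize(before_gbps: float, after_gbps: float) -> tuple:
    """Return (growth factor, phone-call equivalent of the planned capacity)."""
    growth = after_gbps / before_gbps
    calls = after_gbps * CALLS_PER_GBPS
    return growth, calls


for ocean, (before, after) in CAPACITY_GBPS.items():
    growth, calls = summarize(before, after)
    print(f"{ocean}: {growth:.1f}x growth, about {calls:,} simultaneous calls")

# Per-call bandwidth implied by the quoted equivalence, in kilobits per second
print(f"Implied rate per call: {1e9 / CALLS_PER_GBPS / 1e3:.1f} kbps")
```

On these figures the Pacific route grows twelvefold while the Atlantic route slightly more than doubles, which is consistent with the expectation that most of the new capacity will carry data rather than voice.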
A third hallmark of change in the Internet is the tremendous expansion of hosts and the worldwide penetration of the Internet. Recent examination of connected countries shows host domains for all but three nations, with even the tiny island nations of Nauru and Comoros linked to the global network.7 This scope means that, to an unprecedented extent, scientists with access to the Internet can reasonably expect to communicate—and possibly collaborate—with colleagues located anywhere on the face of the Earth. Similarly, through the Internet, scarce resources, such as libraries and rare instruments, can be made available to larger populations of users.
The changes noted above are momentous, but each represents more potential progress than realized progress, at least in terms of practice and behavior. This may be because unprecedented increases in computing power, network bandwidth, and network scope have not been matched by corresponding improvement in the usability of applications and software. Figure 7.3 represents this trend. The y-axis indicates raw performance of computing technology, such as the benchmarks listed above (processor power, bandwidth capacity). The x-axis indicates time. The upward slope of each curve shows overall improvement. However, the contrast between the curve labeled "raw performance" and the curve labeled "real performance" reflects the difference between what we could do with information technology (shown at the extreme left with the "hype" curve) and what we can actually do. This difference, called the "reality gap," is in part why scientists may be reluctant to launch or adopt bold information technology innovations. For instance, an oft-cited reason for staying with a specific platform or application is the cost of learning a new program. Some have argued that this essential difficulty is the root of the apparent productivity paradox in computing, where massive investment in computing technology has often failed to produce significant increases in output or performance.8 A way out of this bind might be broader application of user-centered design philosophies that, in contrast to traditional development approaches, attempt to evolve applications with constant feedback drawn from users in authentic settings.
In summary, the present is a time of fantastic change in the raw capabilities of information and network technologies. However, the impact of these changes is somewhat reduced by the difficulty of effectively harnessing the potential of new technologies. For instance, producing usable software is still more difficult than it should be, and many software applications don't produce benefits to justify the often painful process of learning to use them. A particular challenge for chemists will be finding ways to train the next generation of chemical software designers to more effectively design code, such that broad populations of users can adopt applications quickly and easily—therefore more fully tapping the rich potential of advances in hardware and network systems.
Where Are We Going?
The Collaboratory Concept
One way to think about the future is to seek examples in the present of possible new modes of doing research and teaching. A dream of Internet proponents has been the creation of collaboratories, or centers without walls ". . . in which the nation's researchers can perform their research without regard to geographical location—interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries."9 The earliest and most extensive collaboratory R&D project is the Space Physics and Aeronomy Research Collaboratory (SPARC). This project began in late 1992, and by early 1993 produced the world's first operational collaboratory.10
The SPARC project started by addressing the needs of space physicists who used observational data collected from a suite of ground-based instruments at the Sondrestrom Upper Atmospheric Research Facility, located in Greenland. Between 1992 and 1995, SPARC evolved to provide data viewers for five Sondrestrom instruments: (1) the 60-meter incoherent scatter radar, (2) an all-sky camera, (3) a Fabry-Perot interferometer, (4) an imaging riometer, and (5) a local magnetometer. During this period, the primary use of the collaboratory involved real-time access to these instruments either for ionospheric observations or for instrument testing. At this early stage, collaboratory-based science resembled traditional research practices, although mediated by the Internet.
Between 1995 and 1997 SPARC transformed dramatically to accommodate three major changes. First, early success with the collaboratory led to increased interest from scientists and to demands to include more instruments. Second, the rapid emergence and adoption of the Web suggested the importance of a Web-based interface to SPARC. To accommodate this need, the core technology of SPARC was rebuilt in Java. Third, having seen what might be possible with the early SPARC system, users proposed three new types of uses: (1) expansion of the data sources to produce a global "field of view" in real time; (2) inclusion in real time of theoretical model output side by side with observational data; and (3) use of the SPARC technology to support distributed, online workshops or conferences.
Figure 7.4 shows a snapshot of the SPARC interface during a recent campaign. The SPARC interface has three main components. First, the SPARC "session manager," shown in the upper left of Figure 7.4, organizes scientific activity by topic into groups called "rooms."11 Within these rooms, scientists find useful URLs, chat streams specific to that room, and saved configurations for data viewers relevant to that room. Note that each room name is followed by a number in parentheses, which represents the number of scientists currently using that room, and that the names of participants within a selected room are displayed below the session manager. This information provides a crucial form of presence awareness in the virtual setting that would be obtained automatically in a shared physical setting. The chat window, shown in the lower left of Figure 7.4, is a text-based channel for communication among SPARC users. This chat application is persistent, meaning that scientists can join a conversation in progress and scroll back to review earlier comments. Finally, the bulk of the interface is devoted to data displays. In this case, Figure 7.4 shows time series plots of electron densities against altitude as observed by five incoherent scatter radars spanning the Northern Hemisphere from the Norwegian Arctic to Puerto Rico. An important feature of the data viewers is the presentation of observations from multiple instruments on a common time axis.
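The room, presence, and persistent-chat behavior described above can be sketched as a small data structure. This is an illustrative model only, assuming in-memory storage and hypothetical class and method names; it is not the SPARC implementation, which the text says was built in Java.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Room:
    """A topic-based room holding participants and a persistent chat log."""
    name: str
    participants: Set[str] = field(default_factory=set)
    chat_log: List[Tuple[str, str]] = field(default_factory=list)

    def label(self) -> str:
        # Room name followed by the current head count in parentheses
        return f"{self.name} ({len(self.participants)})"


class SessionManager:
    """Hypothetical session manager that groups activity into rooms."""

    def __init__(self) -> None:
        self.rooms: Dict[str, Room] = {}

    def join(self, room_name: str, scientist: str) -> List[Tuple[str, str]]:
        """Add a scientist to a room and return the earlier chat for scroll-back."""
        room = self.rooms.setdefault(room_name, Room(room_name))
        room.participants.add(scientist)
        return list(room.chat_log)

    def leave(self, room_name: str, scientist: str) -> None:
        self.rooms[room_name].participants.discard(scientist)

    def post(self, room_name: str, scientist: str, text: str) -> None:
        self.rooms[room_name].chat_log.append((scientist, text))

    def listing(self) -> List[str]:
        return [self.rooms[name].label() for name in sorted(self.rooms)]


manager = SessionManager()
manager.join("Radar campaign", "alice")
manager.post("Radar campaign", "alice", "Electron density is rising.")
history = manager.join("Radar campaign", "bob")  # bob joins mid-conversation
print(manager.listing())
print(history)
```

Because the chat log belongs to the room rather than to any one connection, a scientist who joins late still receives the earlier comments, which is the persistence property the text highlights.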
While SPARC has had many kinds of impact on the space physics community, two consequences are particularly noteworthy. First, by relaxing constraints of time and place, SPARC makes it possible to carry out collaborative campaigns with more flexibility in scheduling and participation. For example, SPARC makes it much easier to access complementary expertise and to mentor students. In the past, scientists were restricted to expertise available at the remote observatory site. Similarly, students gained the best opportunities to learn about data collection only by traveling to a remote observatory to participate in a campaign. Today, SPARC allows scientists with complementary expertise to work together, without imposing demanding travel burdens. For instance, Figure 7.5 shows the pattern of communication from a 1995 campaign. In this campaign, a Florida-based space physicist with no incoherent scatter radar experience was guided through a data collection run by a California-based colleague with an extensive background in radar operations (with the added benefit that the Florida physicist could have his students watch as well).
A second notable impact of SPARC is that the collaboratory has accelerated a paradigm shift in the orientation of space physicists to their data. Specifically, upper-atmospheric phenomena reflect a global system in which the atmosphere, the solar wind, and the magnetosphere produce effects over very broad regions. In the past, understanding this system took months of careful integration of multiple data sources. Today, SPARC makes it possible to examine real-time data coordinated on a common time scale from any source on the Internet. The common framework for viewing data means that events at one location are easily correlated with events at other locations. For example, in recent campaigns SPARC has provided simultaneous data from as many as six incoherent scatter radars, from spacecraft, and from unattended instrument arrays across Europe, Asia, and North America. In addition, SPARC provides a mechanism for the simultaneous display of data and model predictions. Traditionally, the substantial computational demands of such models have meant that most of this work was done long after the observational data were collected. Today, improvements in the models and less expensive supercomputing have made it possible to do data/theory evaluation in real time.
Chemistry on the Internet
The emergence of systems like collaboratories represents an opportunity for chemists but also poses a significant number of challenges. Assuming that many of the technical barriers to Internet use disappear with improved computational and network performance, these challenges can be summarized in terms of potential changes to existing practices. Three changes, in particular, demand attention. First, the introduction of collaboratories (specifically, collaboratories designed to provide remote access to scarce instrumentation) may transform traditional ideas about instrument ownership and control. Second, collaboratories represent new channels for communication; however, additional information flow may be undesirable in many areas of chemistry, particularly if the flow is unregulated (e.g., a threat to proprietary content) or unqualified (e.g., claims that haven't been reviewed or validated). Finally, collaboratories create new arenas for learning by expanding opportunities, especially for students, to join in with experienced scientists in the conduct of research projects. However, this new style of participatory education may require teaching and mentoring skills beyond the demands of familiar lecture and lab-style learning.
The Value of Collaboratories
A key component of the collaboratory concept, at least as realized in the SPARC project described above, is the use of media-rich information technologies to link scientists with each other and with instrument facilities, independent of distance and time. This idea is particularly attractive in fields like space physics, which rely on a limited number of observatories and spacecraft, and where the primary data collection mode is passive. By contrast, in a field like chemistry, experiments often involve direct manipulation of compounds by investigators. That is, while analytic instruments may be viewable and controllable at a distance via network interfaces, many kinds of sample preparation require close proximity between the lab bench and instruments. In these cases, collaboratory technology may not be that useful for chemists. However, there may be productive ways to use the Internet to link chemists to papers, results, or data. For instance, collaboratories may become the mechanism for ongoing electronic workshops where chemists can present and discuss findings while drawing on the tools and literature used to conduct the initial research, such as visualizations or analyses.
An important mechanism in an electronic workshop might be tools for presence awareness. That is, in a physical setting we can know who is present, paying attention, and so forth. In a virtual setting, particularly with participants drawn from multiple time zones, there is a need to more explicitly represent who is doing what and when. Visual Who (<http://judith.www.media.mit.edu/Judith/VisualWho/VisualWho.html>), developed at the MIT Media Lab, is one instance of a device for helping people navigate a virtual space.12 In Visual Who the user display indicates who is active (those with names shown in the display) as well as the recency of activity (indicated by color, where red is more recent, and blue is in the past). Another feature of Visual Who is that the display groups people by sub-dimensions, which could correspond to specialty, status, organizational affiliation, and so forth. Such a tool would help scientists identify experts in unfamiliar specialties, as well as help find old colleagues at distant sites.
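The recency-by-color idea described above can be illustrated with a short function. This is a minimal sketch, assuming a simple linear blend from red to blue and a hypothetical one-hour fade window; it does not reproduce how Visual Who itself computes its display.

```python
def recency_color(seconds_since_active: float, fade_seconds: float = 3600.0) -> tuple:
    """Map time since last activity to an (R, G, B) color.

    Red means just active; blue means idle for `fade_seconds` or longer.
    The linear blend and the default fade window are illustrative assumptions.
    """
    age = min(max(seconds_since_active / fade_seconds, 0.0), 1.0)
    red = round(255 * (1.0 - age))
    blue = round(255 * age)
    return (red, 0, blue)


# Hypothetical participants and seconds since each was last active
last_active = {"alice": 0, "bob": 900, "carol": 7200}
for name, idle in last_active.items():
    print(name, recency_color(idle))
```

A display built this way lets a participant arriving from another time zone see at a glance who is currently engaged and who has gone quiet.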
Free Flow of Information
The major impact of the Internet on chemistry so far is probably the use of electronic mail for scientific communication. As described by one chemist at Michigan, ". . . exciting things are happening. I have now published papers with people that I've never met, and in one case, never even talked with aside from electronic communications. Sort of a virtual person as far as I can tell, although real samples did arrive by real courier." It is not difficult to imagine this kind of process accelerating with the adoption of collaboratories, where remote collaborators might jointly analyze a sample, then share their results via an electronic workshop, and ultimately use collaboration tools for editing publications.
A variant of this process exists today in chemistry in the form of online conferences. For example, in chemical education, the series of Confchem "meetings" have all been conducted online (see <http://www.chem.vt.edu/confchem/1998>). In these sessions, papers are published on Web sites, and over a designated period other scientists read the papers and comment on them via chat rooms and e-mail distribution lists. The authors respond to these comments and, over the period of the conference, authors and readers engage in computer-mediated dialog. If ventures like electronic workshops and conferences are to succeed, chemists need to solve the problem of chemical mark-up languages. Today, equations can be represented on Web pages as graphical elements, such as bitmaps or GIF images, but these equations have no formal mark-up syntax, which means documents can't be searched for compounds or equations. However, efforts to base chemical mark-up on standards adopted by the World Wide Web Consortium, such as XML, suggest that in the future chemists will have convenient tools for writing and reading equations, and for searching (see <http://www.xml-cml.org>).
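The practical difference between an image of a formula and a marked-up formula is that the latter can be searched. The following is a minimal sketch, assuming a made-up XML vocabulary: the element name `molecule` and the attributes `title` and `formula` are illustrative choices for this example and are not claimed to match the actual CML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up paper; tag and attribute names are assumptions
DOCUMENT = """
<paper>
  <para>Electrolysis of molten potash yields
    <molecule id="m1" title="potassium" formula="K"/>.</para>
  <para>The lab exercise analyzes
    <molecule id="m2" title="ethanol" formula="C2H6O"/> and
    <molecule id="m3" title="water" formula="H2O"/>.</para>
</paper>
"""


def find_molecules(xml_text: str, formula: str) -> list:
    """Return the titles of all marked-up molecules with the given formula."""
    root = ET.fromstring(xml_text.strip())
    return [
        mol.get("title")
        for mol in root.iter("molecule")
        if mol.get("formula") == formula
    ]


print(find_molecules(DOCUMENT, "H2O"))
```

The same query against a page that renders its formulas as GIF images would find nothing, since the compound exists there only as pixels.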
Visible efforts to introduce new technology into the chemistry curriculum include the use of Web pages to present course content and the creation of CD-ROM supplements to textbooks. These innovations have their primary impact on individual learners, and even in this case there is some skepticism. For instance, using a CD-ROM instead of paper is largely a substitution of one medium for another and not a fundamental shift in pedagogical orientation. More exciting is the possibility that the Web, through collaboratories, may open new avenues for participation by a wide variety of students in chemistry research. An illustration of this approach is the Collaboratory for Undergraduate Research and Education experiment conducted at the Environmental Molecular Science Laboratory (EMSL) of the Pacific Northwest National Laboratory.13 In this setup, a class of honors chemistry students at the University of Washington used the EMSL collaboratory facility to use advanced analytic instruments at PNNL and to interact with expert users of these instruments at PNNL (see <http://www.emsl.pnl.gov:2080/docs/collab/projects/CURE/index.html>). Within SPARC, mentioned above, undergraduates used the collaboratory facility to participate "alongside" senior investigators during a combined optical/incoherent scatter radar campaign. As the diagram in Figure 7.5 shows, through the collaboratory students in Florida viewed live data and discussed it with scientists in Northern California, Michigan, and Greenland. For these students, the opportunity to view phenomena as they occurred brought to life material that previously was only the stuff of lectures and textbook explanations.
The Internet offers exciting new opportunities for chemists. The collaboratory concept is just one illustration of how Internet-mediated science may affect the relationship of researchers to instruments and data, of colleagues to each other, and of teachers and advisors to students. While Web-based tools may alter much of the current familiar landscape of practice and pedagogy, it is important to recognize what the Web cannot do. Specifically, simply "surfing" for information is not a replacement for learning. Amidst the temptation to browse endlessly among an ever widening array of online resources, students and researchers must still take time to absorb and reflect on ideas in order to master and understand key concepts.
Thanks to James Finholt, Albert Finholt, Peter Murray-Rust, and James Penner-Hahn for feedback from the chemistry perspective. Thanks to Dan Atkins for his ideas on the reality gap in information technology. And finally, thanks to Stephanie Teasley for helpful comments and suggestions on earlier drafts. Requests for reprints should be addressed to (a) Thomas A. Finholt, Collaboratory for Research on Electronic Work, C-2420, 701 Tappan St., Ann Arbor, MI 48109-1234; or (b) <firstname.lastname@example.org>.