9
Collaboratories: Building Electronic Scientific Communities

Raymond A. Bair

Pacific Northwest National Laboratory

Abstract

High-speed computation now provides the means to examine and simulate systems at unprecedented levels of detail and accuracy. The combination of computation with large-scale databases enables analysis of the prodigious volumes of data coming from today's experiments and simulations. However, when these enabling technologies are coupled with new capabilities in communications, an opportunity is created that can revolutionize not only the scope but also the process of scientific investigation. In physical science research, new distributed computing and communications technologies are being employed that enable researchers to access data, instruments, and expertise independent of their location.

While the term "collaboratory" (or "virtual laboratory") is often used to refer to a set of technologies, perhaps the most significant impact of collaboratories will be the generation of new opportunities to create and sustain active scientific communities. The development and adoption of electronic collaboration capabilities will give geographically distributed research teams a greater capacity for the organization, close-knit interaction, and rapid response needed to address increasingly challenging research problems. This paper examines some of the opportunities and challenges presented by scientific collaboratories, and the interplay between emerging collaboration technologies and the research communities they support. Experiences to date point to requirements and success factors for virtual facilities. Examples are drawn from technology development and chemical/materials pilot collaboratory projects of the U.S. Department of Energy.

NOTE: Pacific Northwest National Laboratory is a multiprogram national laboratory operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract DE-AC06-76RLO 1830.

Introduction

One of the scarce resources in chemical research is time—for scientists, instruments, and computers—to explore and understand complex phenomena and to unlock the principles governing them. Advances in computing and communications systems are having profound impacts on the capabilities that we can bring to bear on research and development problems, providing extraordinary instrument control and data acquisition capabilities, powerful data analysis and visualization capabilities, and simulations capable of ever more detail and scope. This aspect of the computer revolution is rapidly magnifying our science capabilities while reducing the time needed to perform measurements and simulations. However, outside of these areas wholly new impacts of computing advances are emerging, which will dramatically expand our options for using our time to organize and conduct scientific efforts.

The term "collaboratory" (collaborate + laboratory) is attributed to William Wulf, who envisioned the potential impact of the information age on science, creating a ". . . 'center without walls,' in which the nation's researchers can perform their research without regard to geographical location—interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries."1 Other terms that are used almost interchangeably with collaboratory are "virtual laboratory," "laboratory without walls," and "collaboratorium." They all encompass the use of information and communication systems to remove barriers of geographic distance and time from research collaborations, so that scientists are not just working remotely but working together regardless of their location.
A major emphasis of collaboratories is natural, informal work processes, going beyond text exchange and presentation metaphors, to in-depth, collaborative work.2 Collaboratories have potential roles in all stages of the scientific process, from the initial planning and organization of a new project idea and project team, to the design of the experiments and development of software, to the execution of those experiments and simulations and their analysis, to the preparation and dissemination of the results. However, one does not simply deploy a collaboratory like a desktop publishing program; one builds a collaboratory with scientists, information, and tools.

The collaboratory tools required are varied and challenging to develop, requiring both generic capabilities like video conferencing and screen sharing, and domain-specific capabilities to handle the manipulation and display of data types particular to each type of scientific work. Integration is a major component of collaboratory development, spanning groupware, legacy modeling and analysis applications, instrument software, files, and databases. Because of their unique requirements, collaboratories are often leading-edge examples of knitting together new distributed systems technologies.

Collaboratories are an emerging capability that provides new resources for chemical science. This paper provides an overview of collaboratories from the perspective of scientific research, discussing opportunities for collaboratories, examples of the use of collaboratories in chemistry and related disciplines, the kinds of software that are being developed for collaboratories, the impacts that collaboratories are having, and the requirements and prospects for the future.

Opportunities for Collaboratories

There are a number of arenas that are fertile ground for the development of collaboratories, particularly among scientific user facilities and institutes that provide unique and specialized resources to the scientific community. Although the examples given in this discussion are drawn from the U.S. Department of Energy (DOE) arena, there are many analogous scenarios in our university and industrial communities, as well as other government agencies.

1 National Research Council, Computer Science and Telecommunications Board, National Collaboratories: Applying Information Technologies for Scientific Research, National Academy Press, Washington, D.C., 1993, p. vii.
2 R.T. Kouzes, J.D. Myers, and W.A. Wulf, "Collaboratories: Doing Science on the Internet," IEEE Computer, 29(8), 40-46, August 1996.

DOE builds and supports a wide range of national scientific user facilities, ". . . built with the express purpose of being available for the performance of research by a broad community of qualified users."3 National scientific user facilities make unique research resources available to DOE scientists and researchers from academia, industry and other federal laboratories, and provide opportunities needed to educate and recruit young scientists to meet the demanding challenges of the future.

Figure 9.1 shows 19 of the DOE scientific user facilities most often used for chemical, biological, and materials research. They include four synchrotron radiation light sources, five high-flux neutron sources, four electron beam characterization centers, and five other centers, two specializing in DOE missions of environmental science and combustion. Eighteen of these facilities are operated by DOE's Office of Basic Energy Sciences. The Environmental Molecular Sciences Laboratory in Washington State is operated by the DOE Office of Biological and Environmental Research. The Spallation Neutron Source in Tennessee is under construction. Each of these facilities supports research by individual investigators and collaborative teams from across the country and around the world.

Today, scientists often travel to a facility to do their work. Many have limited time and resources to travel and must carefully optimize their experiment time, often limiting their benefit from the facility. The facilities themselves are widely dispersed, and increasingly often scientists need to use more than one facility in the course of a complex investigation. Collaboratories can increase the effectiveness and value of user facilities like these in many ways. More can be done before the scientific team arrives on site, e.g., detailed planning of the research campaign, and training for the specific equipment to be used there.
While the team is on site, communication is enhanced with colleagues at the home institution and collaborators at other institutions. For example, although a professor may not be able to stay long at the user facility to mentor his/her students, collaboratory capabilities facilitate following the detailed progress of the work remotely, and helping with problems as they arise. After scientists leave the user facility, shared analysis and discovery are enhanced through collaboratory capabilities. Thus, collaboratories enhance the productivity of research. More complex problems can also be taken on, as collaboratories support the assembly of interdisciplinary teams. Expertise can be drawn from many more sources, including industry and smaller colleges. This ability to handle more complex problems can also have an impact on what science is done. Thus, by enhancing collaboration, collaboratories enable new scientific processes and new science. Although there are enough commonalities between collaboratories to speak about them in general, scientific collaboratories are very individual, customized to the style of a research community and to the nature of their research. The essential suite of capabilities needed is particular to the kind of research being performed, as is the relative importance of those capabilities. The processes used to collect and analyze information are also diverse, and the collaboratory needs to reflect that. To date, we've only scratched the surface in building effective chemistry collaboratories; there's still much to learn. The development of a new chemistry collaboratory typically involves adding domain-specific capabilities to a base of collaboratory tools. However, the first step in the process is to learn about how the scientists work, what their information is like, how they need to store and share it, and where the communications and information management problems are that collaboratories might facilitate. 
Then the computer scientists can team with the chemists to develop and integrate the necessary tools and applications into a working collaboratory.

3 U.S. Department of Energy, "Office of Energy Research Facilities," in Pricing of Departmental Materials and Services, Order DOE 2110.1A, Change 2, May 18, 1992, Chapter III, section 10.

Figure 9.1 Distributed scientific facilities and researchers: selected DOE scientific user facilities often used for chemistry, biology, and materials research.

Examples of Collaboratories

Experience with many groups indicates that each collaboratory seems to have an overall organizing principle, one of a small number of ways that people create and interact with research results. For example, a couple of pioneering collaboratories had quite different approaches. The Worm Community System4 created an extensive centralized, shared data repository about a single biological organism, Caenorhabditis elegans.5 Today's human genome databases are likely to evolve into such collaboratories centered on a community effort to create and understand the human genome. Another very successful early effort, the Upper Atmospheric Research Collaboratory (UARC),6,7 is largely organized around community experimental campaigns, using instruments at a dozen remote experiment sites (including such inhospitable environments as Greenland). UARC continues into the next generation as SPARC.8

More recently, several pilot collaboratories have been set up by DOE projects. Three of them provide good examples of the diversity and impact of collaboratories on the chemical and materials sciences: the Diesel Combustion Collaboratory,9 the Materials Microcharacterization Collaboratory,10 and the Environmental Molecular Sciences Collaboratory.11

The Diesel Combustion Collaboratory (DCC) assists the partners of the long-standing Heavy Duty Diesel Combustion CRADA, a collaborative research and development agreement among DOE researchers and diesel engine manufacturers. It involves four national laboratories: Sandia (SNL), Lawrence Berkeley (LBNL), Lawrence Livermore (LLNL), and Los Alamos (LANL); scientists at the University of Wisconsin; and three companies: Caterpillar, Cummins Engine, and Detroit Diesel.
DCC is a part of the DOE2000 Collaboratory Pilot Projects program, and gives scientists capabilities that do not exist at any single location.12 The DCC enhances the flow of information between and among the experimentalists and modelers at the national laboratories and the engine designers at the industrial sites. Accordingly, this collaboratory is organized around the information base of results from experiments and modeling runs, providing visualizations that the investigators share. The DCC also provides capabilities for industrial researchers to run chemical and numerical models and simulations remotely on DOE computers.

Because proprietary industry research is involved, there is a significant concern about security, which must be addressed by the collaboratory tools. SNL has enabled the collaboratory partners to have a secure encrypted connection over which to discuss engine drawings, experimental data, output from a model, or results of a visualization. They may share secure information or applications from a collaborator's computer. DCC also provides a shared workspace via the BSCW product,13 an image library, a data archive, and shared electronic laboratory notebooks.

4 B.R. Schatz, "Building an Electronic Community System," J. Management & Information Systems, 1992.
5 The current Web site for C. elegans is <http://elegans.swmed.edu/>.
6 C.R. Clauer, D.E. Atkins, et al., "A Prototype Upper Atmospheric Research Collaboratory (UARC)," Visualization Techniques in Space and Atmospheric Science, E.P. Szuszczewicz and J.H. Bredekamp (eds.), pp. 105-112, NASA SP-519, NASA, Washington, D.C., 1995.
7 N. Ross-Flannigan, "The Virtues (and Vices) of Virtual Colleagues," Technology Review, pp. 52-59, March/April 1998.
8 See the Space Physics and Aeronomy Research Collaboratory (SPARC) Web site at <http://www.crew.umich.edu/UARC/>.
9 See the Diesel Combustion Collaboratory Web site at <http://www-collab.ca.sandia.gov/>.
10 See the Materials Microcharacterization Collaboratory Web site at <http://aem005.amc.anl.gov/MMC/>.
11 See the Environmental Molecular Sciences Collaboratory Web site at <http://www.emsl.pnl.gov:2080/does/collab/>.
12 The DOE2000 program is described on the Web site at <http://www.mcs.anl.gov/DOE2000/>.
13 The Basic Support for Cooperative Work (BSCW) product comes from the German National Research Center for Information Technology. See the BSCW Web site at <http://bscw.gmd.de/>.

The Materials Microcharacterization Collaboratory (MMC) has a different set of requirements.

This collaboratory was constructed by researchers at Argonne National Laboratory (ANL), LBNL, the National Institute of Standards and Technology (NIST), Oak Ridge National Laboratory (ORNL), the University of Illinois, and several instrument/computer manufacturers, including Gatan Inc., R.J. Lee Instruments Ltd., EMiSPEC Systems Inc., Philips Electron Optics, NSA—Hitachi Scientific Instruments, JEOL USA Inc., Sun Microsystems Inc., and Graham Technology Solutions Inc. MMC is also a part of the DOE2000 Collaboratory Pilot Projects program.14 Each of the research labs had diverse microscopy capabilities when the project began, and several had developed remote microscopy tools. A major goal of the MMC is to explore and develop a shared electronic virtual environment around a common theme of microscopy and microanalysis, encompassing leading-edge instrumentation and applied to both education and research. MMC has defined a common set of capabilities that are needed for electron microscopy, and is working on data models, graphical user interfaces, and application program interfaces (APIs) for the common architecture. Although essentially all commercial microscopes have computer control, previous generations had some features that were mechanically adjusted, and the instrument control programs were often difficult to drive from another application. MMC has been working with the instrument manufacturers to convey requirements for remote control. Essentially all are now implementing more useful and complete APIs in their products. MMC not only lets one control the microscopes, but also has established tools for sharing analyses among distributed collaborators, and comparing the images from multiple simultaneous experiments at different locations. The results of the work of the MMC are stored in electronic notebooks. 
The instrument control architecture that MMC employs (Figure 9.2) is very similar to that for other remote collaborative instruments being developed in the chemical sciences. A commercial microscope may have multiple devices that can be controlled through serial interfaces or network interfaces (TCP/IP protocols). A local server accepts commands, validates them (ensuring the instrument is not operated out of its ranges), issues commands to the appropriate instrument actuators and sensors, and collects data from the spectral and image data acquisition systems. The user interface uses Web technologies so that scientists can interact with the instrument through their Web browser. Note that the user interface (client) is separate from the microscopy server, and the microscopy server is different from the instrument computer. This is key to flexible and extensible designs. Of course, all of the client and server processes can (and sometimes do) run on the same computer. The Environmental Molecular Sciences Collaboratory is yet another type of collaboratory. The Environmental Molecular Sciences Laboratory (EMSL) is DOE's newest user facility, located at Pacific Northwest National Laboratory (PNNL). Instead of providing a particular kind of capability, EMSL is a collection of many unique capabilities and expertise for a particular mission, environmental molecular science. The focus is on developing a molecular-level understanding of the physical, chemical, and biological processes that underlie remediation of contaminated soils and groundwater, processing and disposal of stored waste materials, and human health and ecological effects of exposure to pollutants. EMSL has three major facilities: the Molecular Science Computing Facility, the High Field Magnetic Resonance Facility, and the High Field Mass Spectrometry Facility. 
Other specialized capabilities and facilities provide resources targeting research areas in nanostructural materials synthesis, interfacial structures and compositions, reactions at interfaces, and gas-phase monitoring and detection. Consequently many EMSL projects and collaborations cross disciplines.

Figure 9.2 Instrument control architecture for the telepresence microscopy instruments at Argonne National Laboratory.

The preparations for collaboratories at the EMSL began during its construction. EMSL's networks, computer security, shared file systems, and user services were designed to support both individual users and teams, internal and external.15 A Scientific Data Management (SDM) system was developed to manage EMSL's 20-terabyte robotic tape archive.16 SDM captures and stores metadata (information about the data) so that needed files can be located easily, even by scientists in other disciplines. Collaboratory tool development began in 1993 and is currently in its third generation, now part of the DOE2000 National Collaboratories program. User support is provided for new collaboratory tools as they are deployed. This is very important, as collaboratory tools operate in a complex environment and scientists are often not accustomed to using them. EMSL's in-house Instrument Development Lab (IDL) provides custom instrument electronics and software development capabilities. Remote operation, "fly by wire," automatic metadata capture, and automatic data archival capabilities are becoming a routine part of IDL instrument designs and upgrades. Together, these capabilities provide the facility computing infrastructure upon which collaboratories can be built.

14 The DOE2000 program is described on the Web site at <http://www.mcs.anl.gov/DOE2000/>.

Although many collaboratory projects have begun in EMSL, there are three highly developed examples. The Virtual NMR Facility provides a set of extensions to the generic collaboratory tools for NMR spectroscopy.17 This collaboratory provides secure remote access to operate EMSL NMR spectrometers, employing the commercial console software provided by Varian (VNMR). An NMR Spectroscopist's Notebook adds electronic notebook capabilities to view instrument parameters and three-dimensional molecular structures, and to capture NMR spectra from the spectrometers.

In the custom instrument arena, EMSL developed an On Line Radio Frequency Ion Trap Mass Spectrometer.18 All of the primary instrument parameters can be controlled remotely, including sample injection. The instrument server ensures that only reasonable parameters are passed to the spectrometer.
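
The validating instrument server described above (one that accepts commands, checks that they stay within the instrument's operating ranges, and only then passes them on) can be sketched in a few lines. This is a minimal illustration in Python, not the MMC or EMSL implementation; the class names, parameter names, and limits are all hypothetical, and the instrument itself is simulated by a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterLimit:
    """Allowed operating range for one instrument parameter (hypothetical values)."""
    low: float
    high: float


class InstrumentServer:
    """Sits between remote clients and the instrument computer.

    It accepts a command, checks it against the allowed ranges, and only
    then forwards it to the (here simulated) instrument.
    """

    def __init__(self, limits):
        self._limits = dict(limits)
        self._state = {}      # last accepted value for each parameter
        self.rejected = []    # audit trail of refused commands

    def set_parameter(self, name, value):
        limit = self._limits.get(name)
        if limit is None:
            self.rejected.append((name, value, "unknown parameter"))
            return False
        if not (limit.low <= value <= limit.high):
            self.rejected.append((name, value, "out of range"))
            return False
        self._state[name] = value  # stand-in for a call to the instrument driver
        return True

    def read_state(self):
        return dict(self._state)


# Hypothetical parameters and limits, for illustration only.
server = InstrumentServer({
    "trap_rf_amplitude_v": ParameterLimit(0.0, 500.0),
    "injection_time_ms": ParameterLimit(0.1, 100.0),
})
accepted = server.set_parameter("injection_time_ms", 5.0)
refused = server.set_parameter("trap_rf_amplitude_v", 5000.0)
```

Keeping the range checks in the server rather than in each client means that every user interface, Web-based or otherwise, is subject to the same safeguards.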
Two versions of the software are available, a Visual Basic version that is stand-alone and a Java version that permits shared remote operation as part of the CORE2000 tools, discussed in the next section.

Educational uses of collaboratories are also being explored in undergraduate and graduate programs. For example, the RF Ion Trap is regularly used in remote lectures and experiments by undergraduate chemistry classes in the Pacific Northwest. The Collaboratory for Undergraduate Research and Education is a consortium of colleges and universities formed to explore opportunities for collaboratories to have an impact in education through workshops and pilot programs.19 It is already clear that collaboratories will have a substantial impact on curriculum enhancement, faculty development, and undergraduate and graduate research.

15 The capabilities of the EMSL are described on the Web site at <http://www.emsl.pnl.gov/>.
16 D.R. Adams, D.M. Hansen, K.G. Walker, and J.D. Gash, "Scientific Data Archive at the Environmental Molecular Sciences Laboratory," Proceedings of the Sixth Goddard Conference on Mass Storage Systems and Technology, GSFC/CP-1998-206850, pp. 409-417, March 1998; and D.M. Hansen and D.R. Adams, "A Database Approach to Data Archive Management," Proceedings of the First IEEE Metadata Conference, IEEE Computer Society Mass Storage Systems and Technology Technical Committee, IEEE Computer Society Press, Los Alamitos, Calif., 1996.
17 See the Virtual NMR Facility Web site at <http://www.mcs.anl.gov/DOE2000/>.
18 See the On Line Radio Frequency Ion Trap Mass Spectrometer Web site at <http://eol.emsl.pnl.gov/>, and J.M. Price, M.V. Gorshkov, J.A. Mack, and B. Rex, "An Internet Accessible, On-line Ion Trap Mass Spectrometer for Collaborative Research," Proceedings of the 44th ASMS Conference on Mass Spectrometry and Allied Topics, p. 1175, 1996.
19 J. Myers, N. Chonacky, T. Dunning, and E. Leber, "Collaboratories: Bringing National Laboratories into the Undergraduate Classroom and Laboratory via the Internet," Council on Undergraduate Research (CUR) Quarterly, 17(3), 116-120, March 1997.

Tools for Collaboration

The major technologies that distinguish a collaboratory from remote computer/instrument access are the collaborative tools that permit scientists to share the scientific process. It's easy to see that tools that support real-time interactions will be important in building collaboratories—tools that let scientists carry on the research process when they are geographically apart, with all of the richness of the interactions they can have when they are together in the same room. However, such real-time tools cover only one aspect of scientific work. When we collaborate, we don't always do everything together, yet it is vital that we be able to share the record of what we have been doing (e.g., parameters, data, theories, simulations, analyses), our views on and additions to the contributions of others, and also the record of discussions that have occurred (especially when one or more of the collaborators was not present).

Electronic Laboratory Notebooks

Traditionally, the primary record of the scientific process has been the ubiquitous laboratory notebook. The lab notebook has existed through the ages—Leonardo da Vinci's notebooks are a great example. However, if one looks at notebooks through the ages, an interesting change can be observed. For centuries, the lab notebook was the complete record of a scientist's work. Everything was within the pages of the notebook: theories, proofs, equipment designs, experimental conditions and results, analyses, and conclusions. Today's notebooks often exclude the raw data because it would take up too much space and it's only read by computers anyway. Instead there are references to external data archives: files, tapes, floppies, etc. Tables, charts, and graphs are produced by computers, printed, and pasted in. Increasingly the lab notebook contains less and less of the full scientific record.

On the other hand, computers can easily manage all of those data forms, and more. The concept of an "electronic lab notebook" has been around for a while. However, it has been hard to execute in software. Only in the last couple of years have we begun to have the tools to build an electronic notebook without heroic effort.
Web standards provide a ubiquitous interface, and new object languages and distributed object standards support facile interoperability (the ability of different applications to exchange information and work together without custom protocols) and extensibility (the ability to add features to an application without getting into the "guts" of it).

Lab notebooks have many roles: science observations, design notebook, instrument log book, experiment log book, legal record, notepad, and group workspace. Most notebooks play more than one role. Our electronic lab notebook will need to hold lots of different kinds of data—from instruments, simulations, analysis results, and two-dimensional and three-dimensional visualizations. It also needs to be able to store data files in forms that preserve all of the information (or perhaps a link to the original data), as well as abridged summaries of data in the forms of tables, images, charts, etc. There's also information that individual scientists enter, such as text and sketches, and information that comes from group activities such as presentations, conversations, and planning sessions. An electronic notebook can capture these without a lot of effort.

For each of these kinds of data, it's crucial to keep some metadata—information about the data, for example, who made it, when, the chemical system under study, and experimental parameters such as temperature, laser frequency, or acceleration potentials. This metadata is the key to a real strength of electronic notebooks. We want the notebook software to be able to retrieve anything we need from the notebook without paging through it, and also organize that information into useful forms. Metadata makes that much easier and more accurate than "full text" searching (which can also be done). The collection of metadata should be as automatic as possible, relieving the scientist from the tedium of recording things like instrument parameters.

The DOE2000 project is building technologies for just such a notebook, and prototypes are in use today.20 This is a collaboration among three national laboratories: Pacific Northwest National Laboratory (PNNL), Oak Ridge National Laboratory (ORNL), and Lawrence Berkeley National Laboratory (LBNL). Defining shared standards for notebook data has been a challenge, but the DOE2000 notebook has standard APIs for adding editors and viewers to any notebook, and for exporting and importing notebook data. The electronic notebook provides a secure, shared Web-based space, interactive input, and rich media types. It is modular and extensible. The prototypes are exploring additional features ranging from sophisticated querying and searching capabilities, to automated notification of new contents to the collaborators, to mobile, off-network use.

Notebooks are also an essential repository of intellectual property. One of the most pressing issues in electronic notebooks is that of legal defensibility. Technically, signing and witnessing a page of an electronic notebook (or an object in the notebook) is not difficult. Authentication and digital signature technologies being developed for banking and commerce can handle the job nicely. However, there are aspects of the legal defensibility of electronic records that have not been tested in court.
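
To make the role of metadata concrete, the following minimal Python sketch stores notebook entries with metadata and retrieves them by field rather than by paging or full-text search. It is an illustration only and does not reflect the DOE2000 notebook APIs; the class names, metadata keys, and sample entries are hypothetical. Each entry also records a SHA-256 digest of its contents, which gives simple tamper evidence but is an assumption of this sketch and is not a substitute for the witnessed digital signatures discussed above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class NotebookEntry:
    payload: bytes                   # raw data or a serialized note
    metadata: dict                   # e.g. author, technique, parameters
    digest: str = field(init=False)  # SHA-256 fingerprint taken at entry time

    def __post_init__(self):
        self.digest = hashlib.sha256(self.payload).hexdigest()

    def is_unmodified(self):
        """True if the payload still matches the digest recorded on entry."""
        return hashlib.sha256(self.payload).hexdigest() == self.digest


class Notebook:
    def __init__(self):
        self._entries = []

    def add(self, payload, **metadata):
        entry = NotebookEntry(payload, metadata)
        self._entries.append(entry)
        return entry

    def find(self, **criteria):
        """Return entries whose metadata matches every given key/value pair."""
        return [e for e in self._entries
                if all(e.metadata.get(k) == v for k, v in criteria.items())]


# Hypothetical entries, for illustration only.
notebook = Notebook()
notebook.add(b"spectrum-001", author="A. Chemist", technique="NMR", temperature_k=298)
notebook.add(b"spectrum-002", author="B. Chemist", technique="NMR", temperature_k=310)
notebook.add(b"sketch of trap geometry", author="A. Chemist", technique="ion trap MS")

nmr_by_a = notebook.find(author="A. Chemist", technique="NMR")
```

In a working notebook the metadata would be captured automatically from the instrument or application rather than typed in by hand, as the text recommends.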
CENSA, the Collaborative Electronic Notebook Systems Association, is an industrial consortium promoting the development of commercial electronic notebook systems, with a large fraction of its partners from chemical and pharmaceutical companies. 21 CENSA aims to more rapidly advance the state of the art in electronic record keeping in ways suitable to large-scale deployment and preservation of intellectual property. One of CENSA's programs involves dialog with federal agencies and regulators around the issues of legal defensibility. When scientists put information into an electronic notebook, they would certainly like to be able to retrieve it later. That's not a problem on the short time scale, but what about 25 years from now? The issue is not the media. The information technology industry is good at managing the process of migrating data from one media to another, as disk and tape capacities grow. The issue is the format of the data. Will we have the programs to read the data in the future? Achieving a reasonable degree of longevity will require planning. Document storage software providers are going to need to guarantee that their software will continue to read today's formats many years from now. We'll also need methods for document translation, methods that preserve the digital signature and legal defensibility of a document. Market forces should drive document technologies toward a solution. Scientists are not the only ones who need these capabilities, so there's a lot of incentive to solve document problems. However, much of the contents of a notebook is data, so scientists have a responsibility, too. We will need to archive data specifications and/or software applications more scrupulously. Real-Time Interactions Mapping the many ways that scientists interact face to face into the Internet world is very challenging (Figure 9.3). 
When they talk about science, many different kinds of media come into play: speaking and gestures, notebooks, notes, sketches on a white board, physical models, and a myriad of computer applications. People may also come and go during the course of a discussion, according to the need for their expertise or as their availability changes. For people to collaborate electronically, all of these kinds of interactions need to be supported, with natural, fluid changes among them.

20   See the DOE2000 Electronic Notebook Project Web site at <http://www.epm.ornl.gov/enote/>.
21   See the Collaborative Electronic Notebook Systems Association (CENSA) Web site at <http://www.censa.org/>.

Figure 9.3 Collaborative tools for chemistry research.

Real-time collaboration tools work in a complex arena of computers and the Internet. They inherently involve many users, at many places, having many platforms, with many tools. The scientific computing environment is also very heterogeneous. Although there are "preferred" platforms (Macintosh, PC, or Unix) in some domains of chemistry, it is still rare for a set of collaborators to all have the same type. Hence, real-time collaboration environments need to support multiple platforms.

To make matters even more complex, collaboration tools come from many sources, so each tool has its own user interface for connecting to other users. This is a level of complexity that nobody wants to deal with, especially busy scientists. Therefore one would like a common interface that knows how to start up any tool at the click of an icon. Finding your colleagues and discovering active collaborative sessions should also be simplified through user and session directories. Integrating all of the tools together, knowing who is collaborating, and keeping track of what tools are in use is usually referred to as session management.

The session manager used in the EMSL Collaboratory is called CORE2000 (Collaborative Research Environment). It is based on the Habanero framework from the National Center for Supercomputing Applications (NCSA) at the University of Illinois.22 Habanero is written entirely in Java, a portable object-oriented language. Java bytecode runs on PCs, Macintoshes, and many versions of Unix. However, there are enough quirks that Java code still needs to be tested on each platform.
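To make the idea of session management concrete, here is a minimal sketch. It is written in Python only for brevity and is not the CORE2000 or Habanero API: the `SessionManager` class, its method names, the tool launchers, and the user and session names are all hypothetical. It shows the three responsibilities named above: starting any tool through one common interface, knowing who is collaborating, and keeping track of which tools are in use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Session:
    """One collaborative session (hypothetical structure)."""
    name: str
    participants: Set[str] = field(default_factory=set)
    tools: Set[str] = field(default_factory=set)


class SessionManager:
    """Tracks who is collaborating and which tools each session has running."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}
        # Tool name -> function that starts the tool for one user.
        self._launchers: Dict[str, Callable[[str], None]] = {}

    def register_tool(self, tool: str, launcher: Callable[[str], None]) -> None:
        """Make a tool startable through the common interface."""
        self._launchers[tool] = launcher

    def join(self, session_name: str, user: str) -> Session:
        """Add a user to a session, creating the session on first use."""
        session = self._sessions.setdefault(session_name, Session(session_name))
        session.participants.add(user)
        return session

    def leave(self, session_name: str, user: str) -> None:
        """Remove a user; drop the session once nobody is left in it."""
        session = self._sessions[session_name]
        session.participants.discard(user)
        if not session.participants:
            del self._sessions[session_name]

    def start_tool(self, session_name: str, tool: str) -> None:
        """Start one tool for every participant and record that it is in use."""
        session = self._sessions[session_name]
        launcher = self._launchers[tool]
        for user in sorted(session.participants):
            launcher(user)
        session.tools.add(tool)

    def session_directory(self) -> List[str]:
        """List active sessions so colleagues can discover them."""
        return sorted(self._sessions)

    def user_directory(self) -> Dict[str, List[str]]:
        """Map each user to the sessions they are currently in."""
        directory: Dict[str, List[str]] = {}
        for name in sorted(self._sessions):
            for user in self._sessions[name].participants:
                directory.setdefault(user, []).append(name)
        return directory


if __name__ == "__main__":
    started_for = []  # stands in for actually launching a whiteboard window
    manager = SessionManager()
    manager.register_tool("whiteboard", started_for.append)
    manager.join("nmr-analysis", "alice")
    manager.join("nmr-analysis", "bob")
    manager.start_tool("nmr-analysis", "whiteboard")
    print(manager.session_directory())
    print(manager.user_directory())
    print(started_for)
```

A real session manager would also have to propagate these changes across the network and cope with participants on different platforms; the sketch keeps everything in one process to show only the bookkeeping.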
Habanero is designed to make it easy to add new tools, by providing an event-sharing model for an application to tell a remote copy what it is doing, and vice versa. CORE2000 uses several tools from the basic Habanero tool set, including a chat box, white board, voting tool, and molecule viewer. CORE2000 adds other general capabilities, including screen sharing (via the EMSL TeleViewer23) and Internet audio- and video-conferencing (via MBone vic and vat,24 and CU-SeeMe25). These have machine-specific code, because there is currently no way to implement them in portable Java, though that is expected to change.

The EMSL TeleViewer is a general screen-sharing tool with many applications in collaboratories. TeleViewer lets users identify any window on their screen, or define any rectangle on their screen, and share it with anyone else, anywhere. As the contents of that area change, all of the remote copies are updated. Because only the compressed changes are sent, the network bandwidth required is typically much less than for video. With TeleViewer, other scientists can see exactly what is happening remotely, even if they don't have the same kind of computer. The application(s) being run do not need to be collaborative. One can share a spreadsheet, document, instrument console, etc. This is a powerful tool for activities like mentoring, consulting, support, and shared analysis.

In the future, session managers will provide more sophisticated floor control. When just a couple of scientists are working together, it's not too difficult to see when the other person wants to talk or show you something. However, beyond three or four collaborators, additional mechanisms are needed to mediate discussions and the use of tools, managing how control is passed around for driving the visualization, controlling the instrument, or marking up the document.

22   See the NCSA Habanero Web site at <http://www.ncsa.uiuc.edu/SDG/Software/Habanero/>.
23   P.E. Keller and J.D. Myers, "The EMSL TeleViewer: A Collaborative Shared Computer Display," Proceedings of the Fifth Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WET ICE'96), pp. 16-20, IEEE Computer Society, Los Alamitos, Calif., 1996.
24   See the MBone Web site at <http://www-itg.lbl.gov/mbone/>, and M.R. Macedonia and D.P. Brutzman, "MBone Provides Audio and Video Across the Internet," IEEE Computer, 27(4), 30-36, April 1994.
25   See the CU-SeeMe Web site at <http://cu-seeme.cornell.edu/>.

There are important software engineering advantages to having a common collaborative tool framework. CORE2000 is a fairly comprehensive set of generic tools for collaboration in science. However, there are specialized tools that need to be constructed to reach "critical mass" in each particular chemistry domain, as described above for the Virtual NMR Facility. Shared visualization, database access, and instrument control are just a few examples that are usually specific to the particular kind of chemistry one is doing. Common frameworks enable many development groups to contribute to a powerful collaborative toolkit. This approach is necessary to move collaboratories from a cottage industry to broad application. Frameworks available from academic sources include Habanero and Tango, a project from the Northeast Parallel Architectures Center (NPAC) at Syracuse University.26

Below the level of collaboration managers, there is the need for other "middleware" to support the distributed applications for collaboratories. One successful framework of this type is the Product Realization Environment (PRE) from Sandia National Laboratories (SNL).27 PRE is a lightweight, Common Object Request Broker Architecture (CORBA)-based, horizontal integration framework. CORBA is an industry-standard software component technology for hardware- and language-independent distributed applications. PRE defines how CORBA can be used to connect distributed design tools, databases, files, directory services, and user interfaces. Many collaborative tool developers are moving to CORBA to address interoperability requirements from a standards basis.

Impacts of Collaboratories

At this point, it's reasonable to ask how effective collaboratories are in getting chemistry work done.
One study performed in our laboratory followed many groups and looked in detail at how two groups used the tools and what impacts the collaboratories had on their work.28 One group involved intelligence analysts, and the other NMR spectroscopists. The NMR project is a typical peer-to-peer collaboration aimed at determining the detailed three-dimensional structure of a segment of a heat shock factor protein. The protein was expressed at LBNL and shipped to PNNL. Researchers at LBNL and PNNL then collaborated on experiments using EMSL's high-field NMRs and shared analysis of the three-dimensional protein structure. For each project, work activities were identified and followed, including experiment planning, experiment setup and monitoring, analysis, and reporting. Feedback was obtained through observation, interviews, discussions, and comments.

Across the studied groups, four broad modes of collaboration were observed:

Peer-to-peer, where researchers with a common background and vocabulary work closely together;
Mentor-student, where knowledge and experience are unequal, e.g., one scientist is helping another scientist or a student to understand a new topic (lecture modes are common);
Interdisciplinary, where researchers may share high-level concepts but not a common background, and therefore must translate results into terms each can understand; and
Producer-consumer, where the producer provides information to address a need of the consumer, usually without much common knowledge.

26   See the Tango Web site at <http://trurl.npac.syr.edu/tango/>.
27   See the Product Realization Environment (PRE) Web site at <http://daytona.ca.sandia.gov/pre/>.
28   Anne Schur, Kelly A. Keating, Deborah A. Payne, Tom Valdez, Kenneth R. Yates, and James D. Myers, "Collaborative Suites for Experiment-Oriented Scientific Research," ACM Interactions, 3, 40-47, May/June 1998.

The character of the work varied naturally during the collaborative sessions. As collaborative work progressed, scientists changed their mode of interaction to suit the task at hand, often several times during the same collaboration session. Thus, it is important that collaboratory tools support fluid transitions between modes of collaboration, as well as the many types of media used in science. Some feared that local researchers helping to operate the instruments would be relegated to technicians. However, this did not happen. There were ample opportunities for each of the NMR spectroscopists to contribute to the science problem. Collaboratory tools are designed to manage information to facilitate collaboration. As scientists used the collaboratory over a period of time, they noted a highly desirable shift in the distribution of their effort from data management to analysis. They also benefited from the impromptu and informal forms of interaction that the collaboratory supports. Screen real estate is a valuable commodity. As a team worked together for a while, the center of attention shifted from looking at each other (video conferencing) to concentrating on the data. Often the video was eliminated in favor of devoting more screen space to the data under discussion. Overall, the collaboratory supported non-linear work processes, which increased the productivity of the collaborations. These and other experiences provide a glimpse of the expected impacts of collaboratories on chemical science and technology. Unlike many technological advances, collaboratories affect both the techniques and the processes of science. This makes their impacts difficult to gauge, and easy to underestimate. At a minimum, collaboratories will change where components of research and analysis are done, and how experts are brought into a project. 
The ability to better share facilities across a company or a scientific community will change the equations governing their feasibility and viability. The collaboratory's ability to marshal facilities, information, and expertise across disciplines and nations will affect how quickly complex problems can be solved, and thereby what important problems are addressed.

The roles of a scientist as researcher, mentor, and educator tend to blur in collaboratories. This creates new and exciting opportunities, and some problems. It would certainly be beneficial to expose more students to "real" science and science students to better mentors. However, time is one of the scientist's most precious resources; the nation's expert on NMR pulse sequences cannot field everyone's questions. Techniques to help understand the new dynamics and strike the balance are still in their infancy.

Today, the applications of collaboratories are still within "sight" of current practice in research. The initial focus has been on implementing current work processes, at a distance, and making the logical extensions to them. However, as these techniques become more familiar, one would fully expect to move significantly beyond current practice, to new paradigms for scientific work. This will be very exciting.

The Promise of Collaboratories

Because of the work being done in universities and government labs, the chemical sciences are one of the first domains benefiting from collaboratories. However, there is a great deal to be done to achieve the promise of collaboratories throughout chemistry and the broader scientific community. Our experience is limited, and there are many technical hurdles to overcome.
To succeed, chemical science collaboratories need to be developed by multidisciplinary partnerships of chemists and computer scientists.29 Collaboratories are not off-the-shelf products; within chemistry, each domain has particular kinds of data and ways of manipulating and analyzing it, creating the need for specific collaboratory capabilities. To meet the needs of these scientists, collaboratory frameworks must be flexible and extensible.

29   "National High Field Magnetic Resonance Collaboratorium," a report to the Committee for High Field NMR: A New Millennium Resource, published by the National High Field Magnet Laboratory, Tallahassee, Florida, August 1998.

In addition to making tools, we must learn to deploy and support collaboratories in the field, and evaluate collaboratory science in action. This requires "research by doing," that is, research and development projects that create and support pilot collaboratories in the chemical sciences. The nation's major scientific user facilities are a fertile ground for these initial projects, with the potential for performing new science while improving facility access and efficiency for many scientists.

Advances in computers and networking, coupled with new developments like the World Wide Web and Java, have fueled collaboratory R&D; however, continued progress is needed to support the widespread development and use of collaboratories in the scientific community. There is much to be learned about the representation and use of shared knowledge. Standards and infrastructure for security and authentication are also important for distributed applications like collaboratories.

The frameworks that form the foundation for collaboratories to share data, events, and programs are much more complex than normal Internet tools or client-server applications. The architecture and industry/community standards for these frameworks are still an open research issue. As in the past, the nation's research community has much to contribute to the development of the next generation of Internet standards. The needs of science and business communities differ, and so it will be important for the scientific community to be heard against the background of new Internet standards driven by commerce.

Networking developments are also primary enabling factors for collaboratories.
To scale up the deployment and use of collaboratories, we will need higher-performance networks, more scalable and capable network standards, and better network management capabilities.

Conclusion

Overall, collaboratories are an emerging capability that will remove many barriers of distance and time in the sciences. The present confluence of developments in computing, databases, and networking creates a unique opportunity to develop and deploy collaboratories. The impact promises to be great, not only on what science we do, but also on how we accomplish scientific endeavors.

There are several important drivers for the development of collaboratories in the chemical sciences. The opportunity to make more progress on a project, the opportunity to employ expertise, data, experiments, or computations that would not otherwise be available, and the opportunity to be first to explore a research question or solve a problem, all represent competitive advantages of collaboratory use. Collaboratories can also affect the complexity and scale of chemical problems that can be considered. Although collaboratories can have value in any size collaboration, collaboratories may be crucial for projects that need large or multidisciplinary research teams. Collaboratories will expand opportunities for timely information exchange between basic and applied R&D efforts. Of course, collaboratories also provide opportunities to manage costs, by optimizing travel, equipment use, and information value.

Acknowledgments

Many people contributed information and concepts to this paper; however, special thanks are extended to my colleagues James Myers, Elena Mendoza, Deborah Payne, Kelly Keating, Nestor Zaluzec, Larry Rahn, and Anne Schur. Much of the work described is supported by the DOE2000 program of the Mathematical, Information, and Computational Sciences Division of the Office of Science in the U.S. Department of Energy. A portion of the research was performed at the W.R. Wiley Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the Department of Energy's Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory.

Discussion

Randy Collard, Dow Chemical Company: For industry to use collaboration and extranets effectively, it is critical for us to have security in place and to have flexible security as well as encryption. You spoke about that a little bit with respect to the diesel project. Could you talk a little more about the state of that as you see it and what you see as necessary?

Raymond Bair: The state of security is not as good as I would like it to be. I think some of the capabilities that are coming out of what people classify as the next generation of Internet protocols and capabilities will make this a lot easier. A number of places have found reasonable success in point-to-point security by using extant tools like SecureShell and virtual private networks, but the virtual private networks are not trivial to set up and administer, and so it is not a technology that I would advocate, except perhaps for cases where the secure interactions are fairly static, as between one institution and another.

William Winter, SUNY-ESF, Syracuse: I have two questions. Many of the applications and instruments that you are using have vendor-controlled software. The first question is, How do you deal with the issue of floating licenses across a network as opposed to just local floating?

Raymond Bair: It depends ultimately on what that license says, and so one cannot predict beforehand. In an architecture with the server independent of the instrument, usually you are interfacing with proprietary instrument software through a meta language that is available for the instrument through serial ports. In that case you are not actually running that instrument's software.
But in terms of sharing the screen of the instrument elsewhere, for example, by using a remote X Windows display on an instrument, I am not personally familiar with whether there is a legal intellectual property issue with that. It is a common enough practice, but the terms would, once again, depend on the particular license.

William Winter: My other question is a bit more philosophical. Yesterday the comment was made that although industry is very much involved with multidisciplinary and team approaches, realistically the Ph.D. is going to remain an individual effort. Do you really think that has to be true? Can we talk about having multidisciplinary, integrated, team Ph.D.s where it still is compartmentalized enough that somebody can take credit ("I did this") as an individual?

Raymond Bair: I think so. I see multidisciplinary teams forming around a number of the environmental research areas where experiments in any one domain aren't going to be sufficient to address the issue at hand and where collections of Ph.D.s have become engaged in addressing a larger and more complex issue by working together in their respective disciplines and sharing information. Given the kinds of problems we are trying to solve, there is definitely encouragement from the funding agencies in the kinds of proposals being solicited in a number of these areas that virtually requires working together, and so we will see examples of this happening.