Sam Kounaves, Tufts University: I have two questions. First, a comment directed to the people sponsoring this conference. Since it is on computers I think it would be interesting, if it is possible, for the participants to make some of their slides available on a Web site for the rest of us to use. Most of us are going to go back to our departments and to our institutions, and some of us would like to present this information to a wider audience. Having a simple PowerPoint summary, credited appropriately, that we could integrate into our own talks would be very useful for distributing this information more widely.
My question deals with an issue that is going to arise again in this afternoon's talk, that is, archiving. When I did my Ph.D. thesis years ago, I did it on an Apple II and I even kept some of my lab notebook on an Apple II computer. When we wanted to go back to it one day to get some information, it was practically impossible. I had to find an Apple II computer, plug into it, and try to get the information out. Several of our speakers this morning, and I guess all of them in some ways, implied that there was software that they had been using to archive the information, Lab Notebook, Lotus Notes, etc. I am still wondering how this is going to work out in the future.
I know one way my thesis can still be available, for example, is through an institution that is dedicated to archiving. University of Michigan Microfilms, for example, is, in theory, still archiving that information and will eventually switch over to a digital format, and it will still be available. What are your thoughts on archiving information? How can we go about doing this so that the lab notebooks are useful years from now? Are there any ways that you can see this happening, or do you have any thoughts on this process?
Raymond Bair: There are a couple of approaches that people are taking, largely along the lines you might expect. One approach is to require the makers of the notebook or document management system to provide a certain degree of compatibility with future formats, and future versions of their product. This kind of requirement is coming out of some of the commercial electronic notebook efforts. This is going to be mandated by the people that are buying these notebooks, the large companies.
However, that doesn't solve all of the problems; it just addresses the document management company's formats. There are some interesting challenges ahead. For example, what if you stuck another kind of document into this commercial notebook or document management system? They are not really responsible for all of the different kinds of documents and data you might use. There's also an issue of progressive conversions, for keeping file formats up to date as time passes. That brings in issues of fidelity: if we convert files, how can we assure ourselves that the converted objects are still correct? There are challenges with electronic signature systems, too, since you have signed the original binary file, which has not been translated. So, what does it mean legally when you convert that file in the future? How do you retain that authentication that you had in the past? So, in addition to the format issues there are also issues of authentication.
By the way, my slides are on the Web at <http://www.emsl.pnl.gov:2080/docs/collab/presentations/ppt/csr/>.
Bridget Carragher: I think this is a problem. You cannot even read a Microsoft document that is one version behind the version on your desktop. If you cannot do that with Microsoft, which is probably the most ubiquitous software around, we are in a lot of trouble. But I think we are not the only ones facing this problem, and I think again, the scientific community isn't going to drive this problem. This is a huge problem for the world as the world moves onto the Web, and I think there are going to be tools that do automated updating. But where is your thesis now? Probably a printout somewhere is your real evidence that you wrote it, and I think that in part will continue to be the case.
Clint Potter: In a sense I think you also have to throw stuff away because you cannot keep everything. Perhaps you don't have the original data for your thesis anymore. So, you have got to be smart about what you save and think about what the things you save are—the things that go into libraries or university microfilm services.
Bridget Carragher: You publish things that you want to keep. The publishing record is partly what is there.
David McLaughlin: I think you should try to save the information in a format that you can easily move forward. The more open the format, the better. If you store your Word documents in Rich Text Format, that may give you more forward viability than a binary Word document. We often store spectral data in an ASCII format that is easily readable. It takes more space to store it in ASCII, but we know we can move it forward. An important part of our plan is to convert the format of our information as future versions of software may require.
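As a rough illustration of the ASCII storage McLaughlin describes, the sketch below writes a spectrum as tab-separated text and reads it back. The two-column layout, the `#` header line, the function names, and the sample values are all assumptions made for this example; the panel did not specify a file layout.

```python
import io


def write_spectrum(points, stream):
    """Write (x, y) pairs as tab-separated text, one pair per line."""
    stream.write("# x\ty\n")  # hypothetical header line, skipped on read
    for x, y in points:
        # repr() of a Python float round-trips exactly through float()
        stream.write(f"{x!r}\t{y!r}\n")


def read_spectrum(stream):
    """Read the text layout written by write_spectrum back into a list."""
    points = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        x_text, y_text = line.split("\t")
        points.append((float(x_text), float(y_text)))
    return points


# Made-up sample values, for illustration only.
spectrum = [(400.0, 0.012), (400.5, 0.015), (401.0, 0.021)]
buffer = io.StringIO()
write_spectrum(spectrum, buffer)
buffer.seek(0)
restored = read_spectrum(buffer)
```

The text form is larger than a binary dump, as noted above, but any future program that can split a line on a tab can recover the numbers.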
David Smith, DuPont: We said several times during the course of the meeting that software is important, and perhaps just as a sanity check for myself, I would like to ask Susan about the component-based, object-oriented paradigm for software development. From your viewpoint as a computer scientist, is this really a viable approach for the future development of software or is it just a fad that is going to disappear in the next 5 years?
Susan Graham: I think it is more than a fad, and the reason is that structure and structuring are very, very important, and this provides a structuring mechanism. Object-oriented programming is just a structuring mechanism, and if you get the structure wrong then it is going to be just as bad as any other disorganization. But I think it is an approach that is only now possible because it requires more heavy-duty computing and in particular image sizes. Storage is used much more, and so the ideas are actually quite old. The ideas of object-oriented programming are from the 1960s, and the technology has now caught up to the point where it is viable.
So, I think it is going to evolve, but I actually think it is a step forward.
David Smith: In the presentations we have heard the word "complexity" used quite often, and yet I haven't heard anyone mention, for example, the work on complexity that is going on in the Santa Fe Institute, such as concepts like autonomous intelligent agents in the software field. Does the panel believe that they will have any real impact on the class of problems that confront us?
Susan Graham: Clearly they have impact. There are opportunities there, and there are risks, and the risk with autonomous agents is that you no longer have control, and so, with the best of intentions the agent may be getting in the way, particularly if it is imperfect.
Those are all ways of managing intellectual complexity, and that is one of our biggest limiting factors—that it is hard for a person to get his head around everything that is going on, and the more some of those issues can be compartmentalized, the better off we are going to be.
Raymond Bair: There is no question that agents are going to have value in doing a number of useful things. However, Tom Finholt's hype curve comes to mind (from his talk last night). There is a considerable gap between the reality of what can be done now with intelligent agents, and some of the talk about them.
Thomas Finholt: The digital library projects are a good illustration of the gap between reality and science fiction, if you will, with regard to intelligent agents. Particularly in the Michigan digital library projects, the strategy has been to use an intelligent agent architecture for organizing bodies of information. I don't do that work, but I have followed those projects closely. Today, there is a huge gap, in my opinion, between the prototype applications that have been demonstrated and systems that will stand up to the rigors of everyday use (i.e., operational production systems). I think we can say that with respect to intelligent agents, we may be where we were with object-oriented coding in the 1960s and 1970s. That is, the software architectures are not there yet to truly implement the idea, but the further development of intelligent agents is definitely something to monitor for the future.
Stephen Heller, National Institute of Standards and Technology: Just a couple of comments about archiving, which I think is more a red herring than anything else. In fact there is no obvious ultimate solution, because of the changes in technology, so I would like to ask a question of the panel. How many of you actually have a real printout on a piece of paper of your bank statement or actually have physical stock certificates as opposed to all this stuff being stored somewhere electronically?
My feeling is that between the stock market and the bank accounts in the world, most people have fairly significant concerns about their resources, and concern about some of these scientific lab notebooks probably pales in comparison to the amount of concern people would have with problems with those financial resources.
I don't think people walk into their banks and ask for proof that their bank accounts are being properly archived, and their dividends and stock certificates are properly recorded.
So, it is a questionable issue to bring up at this point and for the foreseeable future, I think.
Susan Graham: I have a question for the people who were talking about electronic notebooks. One of the purposes of an electronic notebook is to have a historical record that is used, among other things, for establishing priority and for integrity concerns in science. Once the record is electronic, what are the safeguards that you are using to make sure that you have the benefit of binding and the benefit of the fact that you chemists can analyze the page to see whether it has been altered and things like that?
Raymond Bair: I am not quite clear on what you mean by binding.
Susan Graham: Traditionally the notion was you didn't use a loose-leaf notebook; you used one with a binding so that you knew the order in which the record had been kept.
Raymond Bair: The approach that we have taken in our notebook conforms to the traditional model. If you would like to remove an item in your notebook from view you may delete it, but it doesn't go away. It becomes an icon, and it says, "Deleted," but you can retrieve what was there. There is a genuine need to be able to mark out stuff that was wrong, for example, so you do not get confused in future searches. However, that doesn't solve all the problems scientists have. There is a genuine need for something we haven't fully developed a concept for, a scratch pad of sorts: temporary information that exists for some intermediate time before it is canonized in a notebook. People are still working on concepts like this.
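The delete-but-retain behavior Bair describes could be sketched roughly as follows. The class and method names are hypothetical and are not taken from the notebook software discussed; the only point is that a deletion flips a flag rather than discarding the entry.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str
    deleted: bool = False  # a delete only sets this flag


@dataclass
class Notebook:
    entries: list = field(default_factory=list)

    def add(self, text):
        """Append an entry and return its position in the notebook."""
        self.entries.append(Entry(text))
        return len(self.entries) - 1

    def delete(self, index):
        """Hide an entry from view without removing its contents."""
        self.entries[index].deleted = True

    def visible(self):
        """What a reader sees: deleted entries show only a marker."""
        return ["[Deleted]" if e.deleted else e.text for e in self.entries]

    def retrieve(self, index):
        """Return the original text, whether or not it was deleted."""
        return self.entries[index].text


notebook = Notebook()
first = notebook.add("Weighed sample A")          # made-up entry text
second = notebook.add("Wrong reading, disregard")  # made-up entry text
notebook.delete(second)
```

Entry order is preserved because nothing is ever removed from the list, which loosely mirrors the bound-notebook property raised earlier.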
Participant: How do you prevent altering of the notebook?
Raymond Bair: You prevent alteration with the same kinds of technologies that electronic commerce is adopting to prove that you are an owner of a transaction. You can compute a hash code of an object of any size, and use public/private key technology to validate that the document has not been changed. This is your digital signature. You can also use a trusted time authority, along with your document hash and public/private keys, to establish an unchangeable date for the signature.
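A minimal sketch of the hashing step only, assuming SHA-256 as the hash function (the panel did not name one). Signing the digest with a private key and obtaining a date from a trusted time authority need a cryptography library and an external service, so they are left out here. The function names and the entry text are made up for illustration.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a fixed-length hex digest for input of any size."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded earlier."""
    return hmac.compare_digest(fingerprint(data), recorded_digest)


# Made-up notebook entry, for illustration only.
entry = b"Run 7: sample heated to 80 C for 2 h"
recorded = fingerprint(entry)  # this value is what would later be signed

altered = entry + b" (edited)"
```

Any change to the bytes, however small, yields a different digest, so a signed and dated digest is enough to show later that the entry itself has not been modified.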
Susan Graham: But my question was actually prompted by something David said in which he explained how beneficial it was to have links. If you have links and particularly if you have URLs, then how do you know that the document you are referencing hasn't been changed?
Raymond Bair: If you are really going to have this for a record, for example, to determine priority, you cannot put a link to something that is temporary in the notebook.
David McLaughlin: Before devising a solution to this problem, I think you must give some consideration to the amount of effort it will take versus the need to prove the case you propose. For example, I have heard of cases where scientists have published fabricated results. From a scientific perspective, results are not considered valid until somebody else has repeated them.
Patents are used to protect intellectual property. With the exception of the United States, the critical date is when a patent application is filed, not the date the discovery was made.
In the United States, the date of invention is often established using laboratory notebooks. The primary requirement is that the pages be dated, signed, and witnessed. It is fine to keep the notebook pages in a loose-leaf binder. One pragmatic way to deal with the legal issues of electronic laboratory notebooks is to print out each page, including all the links, sign it, date it, witness it, and put it in a binder. Most of the lawyers I have spoken with believe that this is not really necessary. They believe that the Patent Office would accept an electronic lab notebook when a log is kept of every modification that is made. The logs can be written to optical disks with a date and time stamp and, if warranted, a digital signature.
If every change you make to your notebook is written into a log that you do not have access to, then it becomes very hard to fake entries. Deception would probably require a conspiracy, more than one person. I see no need to make an electronic notebook any more tamperproof than a paper system. Restricting unauthorized access to the information is of greater concern.
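One way such a modification log might be made tamper-evident is to chain the records by hash, so that each record includes the digest of the one before it. This chaining is an assumption added for illustration; McLaughlin describes only a time-stamped log, optionally digitally signed. The record fields, function names, and sample timestamps are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first record


def _digest(body):
    """Hash a record body in a stable (sorted-key) JSON encoding."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, timestamp, change):
    """Append a record that also commits to the previous record's digest."""
    previous = log[-1]["digest"] if log else GENESIS
    body = {"timestamp": timestamp, "change": change, "previous": previous}
    log.append({**body, "digest": _digest(body)})


def verify(log):
    """Return True only if every record matches its digest and its link."""
    previous = GENESIS
    for record in log:
        body = {key: record[key] for key in ("timestamp", "change", "previous")}
        if record["previous"] != previous or record["digest"] != _digest(body):
            return False
        previous = record["digest"]
    return True


# Made-up log contents, for illustration only.
log = []
append_entry(log, "1998-01-01T09:00:00", "page 12: added spectrum")
append_entry(log, "1998-01-01T09:30:00", "page 12: corrected solvent name")
```

Rewriting an earlier record then changes its digest and breaks every later link, so a quiet edit shows up when `verify` is run. This complements, rather than replaces, keeping the log where the author cannot write to it.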
Stanley Sandler, University of Delaware: I am concerned about remotely operated or Web-operated equipment, because we had problems in our department with, a word I haven't heard here yet, a hacker.
There is great potential for a hacker to unknowingly cause equipment damage or real safety problems. No matter what degree of security we have, there is always a hacker that is going to be able to get through. How does one protect oneself and one's equipment?
Bridget Carragher: You cannot. You can do the best you can, and again, adopt all the tools that are available. We password-protect our instruments and we take various precautions like that, but in truth you cannot guarantee protection. But most of the interfaces we can build using Web browsers are pretty fail-safe. We have had kindergartners using these instruments, and they do not obey the rules. They bang on all the buttons and hit everything at once, and maybe the high schoolers are even worse. They treat an instrument as a video game. So, you can protect your instrument by your user interface and disallow things that would be dangerous. I think that is the more important thing—to build in those safeguards in the software.
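The software safeguard Carragher mentions, an interface that disallows dangerous requests, might look roughly like the check below. The parameter names and limits are invented for this sketch and do not describe any real instrument.

```python
# Hypothetical safe operating ranges; real values depend on the instrument.
SAFE_LIMITS = {
    "stage_tilt_deg": (-60.0, 60.0),
    "magnification": (100.0, 100000.0),
}


def check_request(name, value):
    """Return the value if it is allowed, otherwise raise ValueError."""
    if name not in SAFE_LIMITS:
        raise ValueError(f"unknown setting: {name}")
    low, high = SAFE_LIMITS[name]
    if not (low <= value <= high):
        raise ValueError(f"{name}={value} is outside {low}..{high}")
    return value


def try_request(name, value):
    """Convenience wrapper: True if the request would be accepted."""
    try:
        check_request(name, value)
        return True
    except ValueError:
        return False
```

Running every remote request through a check of this kind before it reaches the hardware means that a user who hits every button at once can, at worst, trigger rejected requests.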
Clint Potter: I think you could take essentially the same steps taken in the security world for workstations and computers. As that technology gets better, it can be incorporated into remote instrument technology. I don't think we should be inventing new security mechanisms.
Bridget Carragher: No, we will take advantage of whatever is out there, but we have 100 machines in our facility. They get hacked into all the time. There is nothing we can do about that except deal with the problem when it comes up.
John Pfeiffer, Air Products and Chemicals, Inc.: Let us take that question one step further if we can. One of the things that you at Illinois and at Kodak are doing is making very sophisticated tools and very sophisticated methodologies accessible to knowledgeable but maybe not expert users. So, a risk is that the knowledgeable user will abuse the capability unknowingly. How do you reflect on that? How do you, if you will, put some bounds on that as you provide these tools via Web interfaces or whatever easy-to-use interface?
Bridget Carragher: You mean they can gather data and misinterpret it?
John Pfeiffer: Exactly. One can complete a statistical analysis that is invalid, extending beyond the assumptions built into the technique, and then the scientists may draw incorrect conclusions.
Bridget Carragher: You can do that right now. You can sit in front of an electron microscope and twirl those knobs and get it completely wrong and yet believe the data you are getting. I don't think that is any different whether you put an intelligent or unintelligent user interface in the front of it. We have had this argument many times in my community, you know, that if you make it too easy to use, everybody will come along and misuse it. I think that is not so. I think that if you make it easy to use, you can help people understand what they are doing. You can make it much easier to repeat things using different parameters. You know, if you are sitting in front of that instrument, and it takes you 3 days to get the data, you are much more inclined to believe the results and not try to repeat them than if you can just say, "Oh, I will just rerun this experiment and check it again with three different parameters."
So, I absolutely don't agree that making things easy to use necessarily lets them be more abused. I think you can abuse data any way that you gather it.
Clint Potter: I think the same thing is shown in molecular simulation packages in the sense that you don't have to write your own code anymore, and you don't have to understand the exact details of all the algorithms, but people are using these things, I guess. I don't know anything about chemistry.
Raymond Bair: Also, I think that distance from the user of the instrument doesn't absolve you from doing some training with that person. You're trying to accomplish the same kinds of things electronically that you would do if you had that person visiting in your lab. The training requirement doesn't go away just because the instrument is remote.
Bridget Carragher: But we are not paternalistic about it. If people want to use the instrument, they should use the instrument. It is ultimately the scientist's responsibility, just as it is now. I don't think we know how remote access changes the way data are gathered except to maybe make it easier.
David McLaughlin: From a walk-up lab perspective, the question could be stated as, Is it appropriate to take an expert analytical chemist out of the loop given that then the end scientist could misinterpret the data? I believe that scientists have a vested interest in not misinterpreting the data that they obtain, because proper interpretation helps them continue with their work and meet their goals. One practice that we follow is to place the walk-up facilities right next to the analytical experts who are working on the more difficult problems. There is always someone generally available during regular working hours to answer questions. We also require training before anybody can use the instruments and offer training classes on how to interpret the data, usually once a year. While the training helps meet some of that need, I think the solution still is having an expert available and approachable. This approach is similar to the examples of collaboratories discussed here that allow you to send e-mail and establish working sessions with an expert. In our case, the expert is physically nearby, making it very easy for a person to ask a question. That is how we try to avoid misinterpretation of data.
William Winter, SUNY-ESF, Syracuse: I wish it was that simple, but I don't really believe that it is. I remember that when the first PC versions of things like MM2 came out, an organic chemist in my department came running up to me with a picture he had drawn on his PC plotter clearly showing a planar cis-peptide linkage that he had obtained and claiming this must be right. Actually, it was an N-acetyl glucose linkage but it was the same idea, and it should have been trans. My colleague's conclusion was that because it came from a program from a respected person the result had to be right, and that was the end of it as far as he was concerned. Similarly, we have to do something to make people question these things and not think that just because it comes from an instrument on the Web it is right.
David McLaughlin: Really, I think that some people will always believe that if the computer tells you it is so, it must be so. For these people, I agree that the problem is very much an educational issue. It is also an issue of the quality of the software or instrumentation and its use. We attempt to make all of the techniques in the walk-up lab environment quite robust.
Bridget Carragher: But not kindergartners. They do not believe anymore. They live on the Web and they don't trust any of it.
Clint Potter: I guess an issue is people using software now that they would never have been able to use before because it wasn't on the PC, and that they know about these mistakes and have learned about these mistakes, so maybe it is just an education issue.
Bridget Carragher: It is an education issue, yes.