Challenges of Web Publication
The challenges facing chemists and chemical engineers in publishing their work on the Internet were discussed. Preprint servers, authors’ rights, distribution, refereeing, search engines, and the amount of material on the web were all topics of this day-and-a-half workshop. The challenges mentioned included an increased burden on the reader to find relevant information, the need for special software for some of the enhanced features, adjustments in the publishing processes, the need for a system to tally web hits, new technology for creating and storing structures, increased investments to meet expectations, and the demand for more rapid and enhanced publishing. A number of existing electronic platforms were also discussed.
Searching and search engines are ways to find scientific information, but they do not guarantee results. “How do we find relevant information?” Robert Bovenschulte asked. The commercial exploitation of science is a very important factor in chemistry and determines the way ACS approaches the functions it provides. A chemist could, for example, be seeking the latest research on a topic or be exploring developments in an emerging area. Patent attorneys, on the other hand, might need the complete historical coverage of specific reactions and processes.
Bovenschulte discussed a number of web-based search engines and databases that provide content and the tools to access, analyze, and manage research information. Scientists often use Google and Yahoo for first-approximation searches, Bovenschulte said, but these engines deliver too many hits and not enough specific information. In contrast, Web of Knowledge from Thomson ISI is a much more complete and broad-coverage database with very effective search tools, Bovenschulte added. Chemical Abstracts Service has very broad coverage in chemistry and an interface with many aspects of what has not traditionally been regarded as chemistry. The CAS databases are nearly complete, but they are not perfect, according to Bovenschulte. Although scientists have many research tools, using one may not always be easy. “To use it often requires some expert knowledge, or, let’s say, to use it expertly requires a great deal of knowledge,” Bovenschulte said. STN and SciFinder are other research tools that are available to the scientist. SciFinder is the easier of the two for non-expert searchers to use, Bovenschulte said. STN, another ACS product, is by contrast much more difficult to use and requires some technical training, but it has access to more databases and is favored by those who are doing patent searches, he said.
As search functions become more sophisticated, there is the potential to facilitate the quest for digital information, according to Bovenschulte. Interesting functions such as clustering and taxonomies are being developed and expanded. Search clustering locates articles and can lead to new discoveries, he said. Article linking allows easy access to the full text of cited references. Citation maps provide a visualization of those articles that an individual may be citing, and topic maps provide a way to browse through hierarchies of subjects, Bovenschulte explained.
Advanced approaches to searching are on the horizon. Examples include improved filtering of search results based on user profile or user history, automated analysis of document collections, and visual and graphical presentation of search results. Many of these approaches are still very experimental and not widely used because they may require some computer or network reconfiguration. This may act as a barrier because the reader has to decide to invest time and energy, and perhaps some money, Bovenschulte said.
Nonetheless, most of the participants were optimistic about the future of electronic search engines. Stephen Berry said he was very optimistic about the capabilities of enhanced electronic search tools, provided students are trained to use them properly. Wider use of improved search engines can make the publication of scientific papers much more scholarly, because it would be possible to go back and “find who really had the first idea 40 years ago or 60 years ago, not just who published two years ago.” Berry pointed to possible future problems due to a changing technical vocabulary, but these are solvable challenges, he said. Different access rules for different journals, variable distribution rules for different kinds of files, and the pricing of simultaneous access are other problems the scientific community will face, Berry said.
One of the valuable features in the electronic search process is citation linking. Citation linking is highly desirable for finding older literature, according to Andrea Twiss-Brooks. For engineers, linking should probably not only connect journal articles, but also interweave other literature, such as technical reports and “gray literature.” This information is not easy to find either in print or in the electronic world, making the task all the more challenging. Standards for consistent and reliable linking are a major concern. “There is nothing worse than saying that it is out there, here is the link, they click on it, and it doesn’t get them to the article,” she said. Twiss-Brooks added that there is a need for additional tools, such as a graphical table of contents, to help sort out important information. Table-of-contents services for large journals are not particularly useful, since it does not save time to wade through multiple screens of such tables of contents to find the information being sought.
Structure searching is one of the challenges of chemical e-publishing. “A special problem in chemistry is extracting the science from an article,” Bovenschulte said. Although it is possible to incorporate structure searching of articles, it is not easy. Structure information can be extracted from a submission in digital form, stored in a standard format, and rendered by a viewer. Problems can arise, however, because chemist authors often submit a nonstandard view highlighting important features, not realizing that the chemical integrity has been sacrificed. Yet if only the chemical information is stored, there is no guarantee that the reader will see exactly what the author intended, because the reader’s viewing software may render the information differently from the author’s original drawing. It may therefore be advisable to store a view of the data, as well as the data themselves, Bovenschulte said.
TALLYING WEB HITS
Many participants commented on the need for a tally of web hits or downloads of articles. Two methods were mentioned: hits per article and COUNTER compliancy. Gordon Hammes said hits per article, which is easily tabulated by all web systems, is the method that will work, although there may be alternative technologies. Patrick Jackson introduced a new initiative called the COUNTER compliancy model, which standardizes the way downloads are tallied and sets certain trade standards. It is essential that downloads be measured in the same way, he said. Jackson cautioned against tabulating hits alone. COUNTER compliancy levels the playing field, Bovenschulte said. He also encouraged all librarians to begin cost-per-use studies at each site, to determine whether a hit or a download occurred.
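As a rough illustration of the cost-per-use studies mentioned above, the following Python sketch compares a figure based on raw hits with one based on full-text downloads. It is only a sketch under stated assumptions: the record layout, the field names, and the sample figures are hypothetical and do not come from the workshop or from any COUNTER specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalUsage:
    """Hypothetical one-year usage record for a journal at one site."""
    title: str
    annual_cost: float          # assumed subscription price for the year
    page_hits: int              # every page request, including abstracts
    full_text_downloads: int    # requests for the full text of an article


def cost_per_hit(usage: JournalUsage) -> Optional[float]:
    """Cost divided by raw hits; None when there were no hits."""
    if usage.page_hits == 0:
        return None
    return usage.annual_cost / usage.page_hits


def cost_per_download(usage: JournalUsage) -> Optional[float]:
    """Cost divided by full-text downloads; None when there were none."""
    if usage.full_text_downloads == 0:
        return None
    return usage.annual_cost / usage.full_text_downloads


if __name__ == "__main__":
    sample = JournalUsage("Journal A", 1200.0, 5000, 300)
    print(sample.title, cost_per_hit(sample), cost_per_download(sample))
```

In this made-up example the same journal looks far cheaper per hit than per download, which is one way to see why participants wanted every site to count the same kind of event.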
The evolution of electronic journals might spawn new ways of drawing and storing data. At the moment, structures are designed to fit within the standard print environment, according to Carol Carr, managing editor of Organic Letters. If there were a way of drawing structures so that “a dendrimer can be a dendrimer, and not have to fit into a one-column space, that would help,” Carr said. Carr also said that authors often add color and boxes to their structures, much of which is lost in structure searching. Hammes noted that standard electronic tools in the hands of the authors could simplify the layout process for scientists themselves, giving them control over it, and further contribute to reducing editing and publishing costs.
As the web is making journals more accessible to readers, Christopher Reed pointed out that it is also making editors’ lives more difficult. On-line submission and other features have made handing in a manuscript a click of a mouse away. Consequently, editors are receiving more manuscripts each year, and are struggling to find appropriate reviewers, Bovenschulte added. Not only are more manuscripts handed in, they often contain more data—crystallographic data or protein database material—so reviewers have to review not only the manuscript, but also the digital interactive images and digital interactive data, he said. This means editors will have to process more manuscripts with more detailed reviews. “One has to worry—and I think it is a very serious worry—that this whole system could collapse of its own weight,” Bovenschulte said.
A solution to the growing number of e-submissions and extra data could be automation. The burden could be reduced through automated methods, Bovenschulte said. These would check whether the structure and data files are valid and whether they correspond. There might also be a need to collaborate with other organizations that have greater subject expertise, for example, Cambridge Crystallographic or the National Institute of Standards and Technology (NIST), Bovenschulte said. These connections could be financed through shrinking layout costs. If the print version of a text were to be abandoned and, with it, the need to control page layout carefully, some of the costs associated with publishing would disappear, Bovenschulte said. He said he hopes that there will be continued movement toward a truly seamless, highly integrated electronic publishing process, from the author’s submission all the way to the output, whether in print or on the web.
Giving up print, however, comes with its own set of problems. Martin Blume discussed how some subscribers are concerned that, under an electronic-only system without an optional print subscription, they will have nothing to show for their subscription. Being limited to electronic access can make it difficult to keep track of the issues, whereas “If you take print, you always have print,” he said. This statement relates to the discussion on access to back issues, that is, journal archives. The American Physical Society (APS) will still arrange for print distribution, but will not itself print, Blume pointed out. Readers will be able to self-print by using a version of DocuTech, where a file is downloaded, printed, and stored. He added that APS offers a CD collection at the end of the year. Conversion to electronic format for back-files is important because chemists draw substantially on the back-file literature.
Perpetual access will be a question for libraries, according to Twiss-Brooks, especially since the use of e-journals is rapidly increasing. According to a study¹ Twiss-Brooks referred to, “It is not so much a migration from print to electronic as it is a stampede.” She also referred to a 1999 study,² in which it was estimated that one-quarter to one-third of readings came from electronic sources. Not only do current publications have to switch to electronic methods, but there also seems to be a growing need for electronic data repositories. Reed suggested that ACS establish repositories for important data that “need to be added to the storehouse of knowledge, but are not conceptually novel enough or important enough to make a whole piece of paper out of and go through all that process of using the reviewers and everything.” This would be an electronic-only repository and therefore cheaper. It would also take away the “bread and butter” of low-quality, high-cost journals, which, Reed said, he thought are ruining and exploiting university budgets. This process could also use professional referees, perhaps employees of the professional societies, not practicing scientists. There is no need, he said, for a peer-review decision on the importance and significance of the work, only to establish that the work is well done. Several participants agreed with this vision.
However, there was doubt that electronic-only publishing could absolutely guarantee access in the future. Steven Heller, National Institute of Standards and Technology, questioned whether Chemical Abstracts had a guarantee of continuing in the future, considering recent advances in technology. “There were a number of abstracting and indexing operations … in Germany and England that have disappeared,” he said.
Some participants compared the lack of electronic archives in the chemical community to the wealth of preprint archives in the physics world. Berry noted that unrefereed archives started in physics when high-energy physicists circulated unrefereed preprints for discussion. Berry mentioned how Paul Ginsparg, professor of physics at Cornell University, set up his first preprint archive because it seemed cheaper to do things electronically than to continue to have one full-time secretary in each institution and a very large budget for photocopying. “The circulation of the preprints went to the 400 and if you weren’t a member of the 400, you didn’t get one,” Berry said. Ginsparg was for democratic circulation, Berry continued. Berry said he thought biology was very conservative with respect to circulating unrefereed articles, whereas physics and astronomy are more open, with chemistry occupying a middle ground, a statement with which a number of participants agreed. Biologists are very concerned that unrefereed articles would be dangerous if the public accessed and used them to try to cure their own diseases, Berry said.
Chemical preprint servers have existed, and some participants pointed out intellectual property and other problems that may arise with their use. Philip Barnett said that there was a small preprint server on ChemWeb sponsored by Elsevier, with fewer than a thousand papers, which subsequently died. He thought one reason for this was that some publishers, like the ACS, do not accept papers for publication that have been on a preprint server. Ned Heindel said he had an abstract from a regional meeting cited against him as prior art for a patent application. He said this could mean that a preprint is citable as prior art.
There may be additional reasons that other disciplines have been quicker to give open access to their publications. Jeremy Berg pointed to the public as the big driver for NIH policy. He said that many people get health information initially over the Internet and then end up at published articles on research that is often paid for by NIH. When they cannot get access to it, they complain to their representative in Congress, Berg said.
EXISTING ELECTRONIC PLATFORMS
Several participants discussed existing electronic platforms. Jackson talked about the Elsevier platform ScienceDirect. He said it is an easy, stable, intuitive interface designed to increase the speed and efficiency of searching. It offers about 24 percent of the world’s scientific, technical, and medical (STM) literature on a single platform, and around 30 percent of the world’s chemistry literature. Also, Elsevier has pioneered a concept called the “author gateway” that lets authors keep track of their papers from submission to publication. “This actually adds up to what we could call an end-to-end electronic workflow,” Jackson said. Authors are not only great submitters in electronic form, but also great users of electronic material, he said.
Bovenschulte cited ACS “ASAP articles”—as soon as publishable—which are generally mounted within 24 to 72 hours after the author has submitted final corrections. However, these efforts require investments on the part of publishers in the whole information technology (IT) infrastructure. The rising number of submissions from around the world has also driven up publication costs, he said.
Michael Keller, Stanford University Press, talked about HighWire Press. HighWire offers HTML and PDF formats, multiple resolutions of images and figures, and easy downloads to citation managers. Many publishers who work with HighWire put up manuscript PDFs on acceptance, and as the articles are edited, they go into the mainstream. He said faculty members are beginning to use images on HighWire instead of other images; teachers add the URLs for the objects to their course syllabi. HighWire has begun to link articles. For those that are not on-line, a link to a document delivery service is provided. Keller showed some of HighWire’s features, like the topic map. The topic map is a graphical navigation device that allows users to move around the 1.8 million full-text articles supported by HighWire and about 15 million articles abstracted in MEDLINE. Keller demonstrated a search: One could choose the search term “genetics” and then sequentially “molecular genetics,” “gene regulation,” “transcription,” “gene expression,” and finally “gene networks.” A click will pull up a list of relevant articles from MEDLINE and HighWire. A tool called MatchMaker can show the principal ideas, that is, the principal taxonomic terms that make up the article’s signature. These are the notions that are most important in an article. By clicking on a term (for example, “physiology homeostasis”), the user changes the value of that term in the signature, so that a new search can be done for articles with the reweighted term. Keller said there were various limiters, such as by time, by date, and so forth. Articles in HighWire can be indexed by more than 54,000 terms, which makes it possible to search by idea, rather than just by keyword. He also discussed citation maps and alerts.
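The idea of reweighting a term in an article’s signature and searching again can be sketched in a few lines of Python. This is an illustration only and not a description of how MatchMaker works: the dictionary representation of a signature, the weighted-sum scoring, the function names, and the sample weights are all assumptions made for the example.

```python
from typing import Dict, List

# A signature maps a taxonomic term to a weight (assumed representation).
Signature = Dict[str, float]


def reweight(signature: Signature, term: str, new_weight: float) -> Signature:
    """Return a copy of the signature with one term's weight changed."""
    updated = dict(signature)
    updated[term] = new_weight
    return updated


def score(query: Signature, article: Signature) -> float:
    """Weighted overlap between a query signature and an article signature."""
    return sum(weight * article.get(term, 0.0) for term, weight in query.items())


def rank(query: Signature, corpus: Dict[str, Signature]) -> List[str]:
    """Article identifiers ordered from best to worst match."""
    return sorted(corpus, key=lambda name: score(query, corpus[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical two-article corpus with made-up term weights.
    corpus = {
        "article_a": {"physiology homeostasis": 0.9, "transcription": 0.1},
        "article_b": {"physiology homeostasis": 0.2, "transcription": 0.7},
    }
    query = {"physiology homeostasis": 1.0, "transcription": 1.0}
    print(rank(query, corpus))
    # Lower the weight of one term and search again.
    print(rank(reweight(query, "physiology homeostasis", 0.1), corpus))
```

With these made-up weights, lowering the value of “physiology homeostasis” moves the transcription-heavy article to the top of the list, which is the kind of effect the reweighted search is meant to produce.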