
Proceedings of the International Conference on Scientific Information: Two Volumes (1959)

Chapter: Information Handling in a Large Information System

Suggested Citation:"Information Handling in a Large Information System." National Research Council. 1959. Proceedings of the International Conference on Scientific Information: Two Volumes. Washington, DC: The National Academies Press. doi: 10.17226/10866.

Information Handling in a Large Information System

P. R. P. CLARIDGE

The work described1 in this paper was undertaken in response to a need felt at the Low Temperature Research Station for a collection of information on chemical compounds found in edible plants. Enquiry showed that there would be widespread interest in a collection of this type. That the need was generally felt was confirmed by the Development Commission, which referred to the lack of collected information (a).2 The last comprehensive publications of this type (b), (c) appeared about 25 years ago, and many later publications contain little information of this kind.

Within the last few years, newer methods of analysis, such as chromatography, have made it easier to separate and identify individual chemical compounds. As a result, the volume of published data is increasing rapidly. In 1955, for example, about 2000 papers reporting definite chemical compounds (as distinct from less well-defined substances such as starch) in flowering plants were abstracted in Chemical Abstracts and Biological Abstracts.

Some of the papers [e.g., (d)] contain data for a number of plants. On the basis of this sample it was estimated that a collection of all the published data on the subject would contain of the order of 7 million entries (one entry being one chemical compound in one plant), and that the annual rate of addition would exceed 100,000. If the definition of “plant” were widened to cover the whole vegetable kingdom, including the lower plants (e.g., bacteria, yeasts, fungi), the collection would be larger still. It was considered desirable to organize the collection so that this extension would be possible.

P. R. P. CLARIDGE, Low Temperature Station for Research in Biochemistry and Biophysics, Department of Scientific and Industrial Research, University of Cambridge, Cambridge, England. Crown Copyright Reserved 18th March 1958.

1 The work described in this paper was carried out as part of the programme of the Low Temperature Research Station of the Department of Scientific and Industrial Research.

2 Lower-case letters in parentheses are references.

The collection was required to provide information not only on named compounds in named plants, but also for various generic searches, such as searches for all compounds having specified groupings in common, and for searches on partial specifications. Correlations between factors, such as that between chemical composition and palatability, were important. For this purpose the indexing would have to be very detailed, and constructed so that the maximum amount of information could be displayed on the relations between the entries. A list was made of the headings under which it was thought information should be indexed, and this list was tested on a sample of approximately 2000 entries. The headings were revised (Fig. 1) on the basis of this trial, and subsequent smaller trials have suggested further improvements. Some other items could usefully be recorded even though they are not used for indexing the entry, such as the language of the original paper and the type of study (e.g., experimental, comparative). A list of headings including such extra items, and a more adequate selection of types of source, has been prepared for use in a larger-scale trial of the whole system.

Economical retrieval of all the relevant information in the collection, in answer to an enquiry, was an essential requirement. The ability to select entries showing relationships not suspected when the data were entered was desirable. Since no system can give out more information than has been put into it, the indexing system had to be designed so that as many implicit relationships as possible would be included when an entry was made. With these considerations in mind, a survey was made of possible indexing systems.

An alphabetical arrangement using plain-language entries obviates the need for code books and provides implicit relationships through the related meanings of the words used. Urquhart has shown (e), however, that information in alphabetical subject indexes can become “lost” in the sense that it is not retrieved by a searcher using the subject headings and guides. In his study, more of the references sought could be found through the author indexes than through a subject approach. In this collection, not only would there be a large number of items to be indexed, but each item would have to be indexed from many aspects and at a number of levels of generality. Even the Index to Chemical Abstracts does not attempt to provide this last facility to any great extent; when making a generic search, particularly for non-chemical entries, it is often necessary to look up each member of the class in turn. It is also difficult, in an alphabetical subject index, to search for information requested under a partial specification, and it was concluded that an alphabetical subject index to a collection of this nature and size was impracticable.

The need for generic searches suggested a classified system. Any form of generic relationship can be taken as the basis of the classification, but once one has been chosen, generic searches on other bases become impossible.


FIGURE 1. Headings for indexing entries.


The collection has four main aspects: botanical, chemical, functional (e.g., palatability), and miscellaneous (e.g., cultivation). Each of these could be indexed by its own classification. For the functional and miscellaneous aspects any classification would be arbitrary, but the number of headings is small enough for this to be unimportant. For the botanical and chemical aspects there would be no need to develop new classifications, since well-established ones exist. If a classified system were used, however, the freedom to make generic searches by any criterion, which was a main requirement of the collection, would be lost, and it was concluded that a classified system would not be suitable either.

Since neither alphabetical nor classified systems were suitable, the entries had to be encoded in some way. Encoding would also facilitate machine handling, which the size of the collection suggested might be desirable.

3 The term “notation” is used here to describe the system in which information is recorded in the collection, and the term “code” is reserved for the pattern of holes or spots in which the symbols of the notation are represented for machine handling.

In choosing a suitable notation3 in which to express the entries, the need to show implicit relationships was kept well in mind. In entries made in plain language, the relationships between the words are expressed partly by special words of relation, partly by the order in which the words are recorded, and partly by inflexion (i.e., special modifications) of the words themselves. If the “words” are reduced to single symbols, the last of these methods becomes equivalent to the use of a special “word” of relation. It should be possible to construct an artificial language, or notation, in which synonyms are rigorously excluded and redundancy is reduced to a controlled amount. A separate sequence of symbols, or “word,” will be required for each concept to be expressed and for each relationship between concepts. The idea of expressing relationships in this way has been proposed by Farradane (o), who employs special symbols for the purpose, and it is also embodied in the colon classification (p). If every “word” were expressed by a single character, an impracticably large number of characters would be required, so “words” of two or more characters must be formed to reduce the number of characters to a usable level. The price of this reduction is that the meaning of a character becomes dependent on its context. To keep this complication to a minimum, the number of characters used should be as large as possible. The characters should be easily distinguishable and reproducible, and suitable for manuscript writing. The range of characters found on a typewriter keyboard meets these requirements. Customarily this comprises upper- and lower-case alphabets, one range of numerals, a set of punctuation marks, and some special characters. A set better suited to chemical use comprises upper- and lower-case alphabets, numerals, subscript numerals, and punctuation marks. If the typewriter is fitted with accents, accented characters increase the range available, although these might be considered double characters, since two key movements are required to produce them.
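The trade-off just described, between the size of the character set and the length of the multi-character “words”, can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes fixed-length words and a hypothetical vocabulary of 10,000 concepts and relations, and finds the shortest word length k for which an alphabet of n characters yields at least that many distinct words (n^k at least equal to the vocabulary).

```python
def min_word_length(vocabulary_size: int, alphabet_size: int) -> int:
    """Shortest fixed word length k such that alphabet_size ** k >= vocabulary_size.

    Integer arithmetic is used instead of logarithms to avoid rounding errors.
    """
    if vocabulary_size < 1 or alphabet_size < 2:
        raise ValueError("need at least one concept and at least two characters")
    length, capacity = 1, alphabet_size
    while capacity < vocabulary_size:
        length += 1
        capacity *= alphabet_size
    return length


if __name__ == "__main__":
    vocabulary = 10_000  # hypothetical number of concepts and relations to encode
    # Character-set sizes that appear in this paper: 35 (Filmorex trial),
    # about 107 (reduced Dyson set) and 162 (full Dyson set).
    for alphabet in (35, 107, 162):
        print(f"{alphabet:>3} characters -> words of {min_word_length(vocabulary, alphabet)} characters")
```

Under these assumptions 35 characters need three-character words, while sets of about 107 or 162 characters manage with two; the cost, as the text notes, is that a character's meaning then depends on where it stands within the word.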

The subject to be indexed would be described in this notation element by element, the elements being taken in a standard order. The resulting cipher would be a unique representation of the subject, and the same subject would always be represented by the same cipher. Sequences common to two ciphers would represent common elements and relationships. However, because of the “grammatical rules,” these sequences will not in general occupy the same position in both ciphers. The sequences also owe their individuality to the order of the characters of which they are composed. Consequently, any system used for retrieval must be able to recognise that, for example, “bad” is not equivalent to “dab.”
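The requirement that “bad” must not match “dab” separates two kinds of matching. A minimal sketch with a made-up cipher: an order-blind test that only asks whether all the query's characters are present (analogous to conjoining headings of equal rank), and an order-aware test that looks for the query as a contiguous sequence at any position in the cipher.

```python
def order_blind_match(cipher: str, query: str) -> bool:
    """True if every character of the query occurs somewhere in the cipher,
    regardless of order (analogous to a simple coordinate search)."""
    return set(query) <= set(cipher)


def sequence_match(cipher: str, query: str) -> bool:
    """True if the query occurs as a contiguous run of characters at any
    position in the cipher, so the order of its characters matters."""
    return query in cipher


if __name__ == "__main__":
    cipher = "XdabZ"  # hypothetical cipher containing the sequence "dab"
    for query in ("dab", "bad"):
        print(query, order_blind_match(cipher, query), sequence_match(cipher, query))
```

The order-blind test accepts both queries; only the sequence test rejects “bad”, while still finding “dab” wherever it happens to sit in the cipher.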

The problem of depicting chemical compounds so that they can be indexed properly and uniquely has been studied by a number of workers during the last decade. In 1949 the International Union of Pure and Applied Chemistry invited the submission of codes for chemical compounds satisfying its requirements (z). Of the systems submitted in response to this invitation, the Dyson system (aa) has been selected, after extensive testing, as the Proposed International System (bb). This notation uses a large number of characters (162 in all) and in general satisfies the desiderata for an indexing notation set out above. Some of these characters (e.g., overlined characters and fractional subscripts) could perhaps be dispensed with for normal indexing, in which case the notation can be expressed in approximately 107 characters.

No similar notation has been developed for plants. These have traditionally been classified in a linear order according to their main features. There are a number of anomalies in the order, and changes are constantly being made in an effort to minimise them. For the flowering plants, two classifications (f, g) have found general acceptance. Another (h), based on a somewhat different starting point, has been proposed more recently. Sporne (l, m) has suggested an alternative system based on the probable evolution of the plants. Two proposals have been made to describe plants by a fixed serial number (j, k). For a study of the relationship between chemical composition and the taxonomic characters of plants, it would be necessary to express these characters in a notation of the type proposed above. The outline of such a notation has been drawn up, and it will be developed further when the problems of machine handling are nearer solution.

A number of the indexing and classification systems that have been proposed as solutions to the problem of documenting large systems for information retrieval were examined to see whether they could be used for this collection. The coordinate systems, for example, are all unsuitable, since they operate by conjoining a number of headings of equal rank, and in doing so destroy all order among the elements.

Conventional punched card machines are also unsuitable for use with such notations, for two reasons. First, they can be adapted to read more than 35–40 symbols only at considerable cost. Secondly, they normally read the cards broadside on, so it is necessary either to specify the column in which a given code appears or to search the cards column by column. This repeated operation seriously reduces the rate of searching. What is required is a machine which reads each code in turn and selects only those items which contain the desired sequence. Such a machine was made to handle Dyson’s chemical notation (n) and was demonstrated by IBM in 1950. Unfortunately, this prototype has not been developed further.
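The cost of broadside reading can be illustrated by counting passes over a deck. In the hypothetical sketch below, cards are short fixed-width strings with invented contents. A broadside search must pass the whole deck once for each column at which the sought code might begin, whereas a reader that reads each card through from end to end selects every card containing the code in a single pass.

```python
def broadside_search(deck: list[str], code: str) -> tuple[list[int], int]:
    """Column-by-column search: one pass of the whole deck per candidate
    starting column. Returns the selected card indices and the pass count."""
    width = max((len(card) for card in deck), default=0)
    selected: set[int] = set()
    passes = 0
    for column in range(width - len(code) + 1):
        passes += 1
        for index, card in enumerate(deck):
            if card[column:column + len(code)] == code:
                selected.add(index)
    return sorted(selected), passes


def sequential_search(deck: list[str], code: str) -> tuple[list[int], int]:
    """End-to-end reading: each card is read through once and selected if the
    code appears anywhere in it, so a single pass suffices."""
    return [index for index, card in enumerate(deck) if code in card], 1


if __name__ == "__main__":
    deck = ["AB7Q2", "7QXYZ", "XYZ7Q", "QQQQQ"]  # hypothetical 5-column cards
    print(broadside_search(deck, "7Q"))   # same cards selected, four passes
    print(sequential_search(deck, "7Q"))  # same cards selected, one pass
```

Both functions select the same cards; the difference is the number of passes, which for the broadside reader grows with the width of the card rather than with the number of codes sought.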

High operating speed and great flexibility are the characteristics which distinguish electronic computers. A computer can handle any type of notation and can be set to search for a complete specification in one pass. In addition, its built-in calculating facilities enable a computer to work to alternative specifications in a way that no other system can. The disadvantages of computers for information retrieval are their great cost and complexity, and the large amount of detailed programming needed before information can be inserted or retrieved. Also, in many machines the input and output speeds are low compared with those of the calculating units. The basic needs of an information system are a large store of data on which relatively little work will be done, and an output speed comparable with the rate at which the data are searched, whereas a computer is most efficient when performing many sequential operations on the same data. A collection of the size and nature described above was likely to exceed the storage capacity of any computer existing when the survey was begun.

Microfilm rapid selectors, that is, electronic selectors in which the selection media are in microfilm form, have the advantage that they can produce the information corresponding to the search specification rather than the reference numbers of the documents which contain it. The film media are small, comparatively cheap to make and store, and readily replicated. It seemed that their possibilities merited further investigation. A number of systems have been described. The Shaw rapid selector (u), redesigned and tested at the U.S. Department of Agriculture Library (t), was one of the earliest. The photoscopic information storage system (q) uses computer-type circuits to analyse the information in the system and has the greatest density of information storage. It is, however, still difficult to amend information once it has been recorded. The Minicard system (r, s) has a large coding area, and special attention has been given to the reproduction of copies, the addition of codes to later prints, and the inclusion of the maximum amount of information in clear. The system also includes the fullest range of handling machines and represents the most comprehensive attempt to provide for the needs of an information system. It appears (s), however, that the codes are not to be read from end to end of the card on each pass but row by row, as is done with IBM cards. Another selector is being developed at Western Reserve University (v). Unfortunately, none of these systems is currently available in the United Kingdom, and it was impracticable to base a system on any of them. There was, however, one machine of this type, the Filmorex (w, x), which is moderately priced and currently available. This system was thought worthy of further investigation, and it was accordingly chosen for trial.

One of the problems which arises whenever non-redundant codes or notations are used is that of detecting and eliminating errors. If the data are reproduced mechanically once they have been recorded and checked, errors in transcription are minimised. This can conveniently be done by recording the information on punched paper tape. The Flexowriter automatic typewriter (y), which in its simplest form is an electric typewriter with a tape punch and tape reader attached, enables this to be done. This model can punch selected portions of the information into the paper tape as it is typed, and the resulting tape can afterwards be used to operate the typewriter. The codes can also be punched in a strip along one edge of a card by a modified version of the punch, and these cards used instead of the tape to operate the reader. In a more complex model of the machine (the Programatic), codes can be punched in the tape which cause the machine to switch the punch and/or the typewriter on and off, enabling extracts to be made of predetermined parts of the recorded information. Another model, intended primarily for personalised letter writing and similar uses, was provided with two readers and could be switched from one reader to the other by codes in the tapes. It was thought that a combination of these two features, the two inputs and the ability to control the operation of the machine by codes in the tape, would result in a very powerful and flexible machine. It was already standard practice for a tape in the reader to be used as a “programme tape” that instructed the machine to move to the required position (for the next fill-in on a form, for example) and whether to punch that block of information into the output tape. The second reader and the “reader switch” facility would enable a “programme tape” to be used also to determine what action should be taken on each block of information bounded by two “reader switch” codes, without this action having been predetermined when the information was first recorded. Not only can this “programme tape” determine whether the information is to be typed, punched, or ignored, but even the arrangement of the blocks of information on the page can be changed. For example, inserting carriage return codes from the programme would arrange blocks of information one below another, whereas previously they had followed one another on the same line.
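The division of labour between a record tape and a programme tape can be modelled in a few lines. This is a hypothetical sketch, not a description of the Flexowriter's actual codes: the record tape is a string whose blocks are separated by a stand-in “reader switch” character, and the programme tape is a list with one set of actions per block, drawn from "type", "punch" and "return". An empty set means the block is read but ignored.

```python
READER_SWITCH = "|"  # hypothetical stand-in for the "reader switch" code


def run_duplex(record_tape: str, programme: list[set[str]]) -> tuple[str, str]:
    """Apply one programme step to each block of the record tape.

    Returns (typescript, output_tape). "type" sends the block to the typewriter,
    "return" follows it with a carriage return instead of a space, and "punch"
    copies it into the output tape.
    """
    blocks = record_tape.split(READER_SWITCH)
    if len(blocks) != len(programme):
        raise ValueError("the programme tape needs one step per record block")
    typed: list[str] = []
    punched: list[str] = []
    for block, actions in zip(blocks, programme):
        if "type" in actions:
            typed.append(block + ("\n" if "return" in actions else " "))
        if "punch" in actions:
            punched.append(block)
    return "".join(typed).rstrip(), READER_SWITCH.join(punched)


if __name__ == "__main__":
    # Hypothetical record tape: plain-language headings alternating with codes.
    record = "potato|P01|ascorbic acid|C07"
    # One programme: type the headings one below the other, punch only the codes.
    print(run_duplex(record, [{"type", "return"}, {"punch"}, {"type"}, {"punch"}]))
    # Another programme: type the headings on one line and punch nothing.
    print(run_duplex(record, [{"type"}, set(), {"type"}, set()]))
```

The same record tape yields different typescripts and output tapes depending only on the programme, which is the point of deferring the choice of action until after the information has been recorded.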

Insertion of the correct notation would be ensured by preparing and checking in advance a master set of unit cards, into each of which both the plain-language form and the notational equivalent of a subject heading were “strip-punched.”4 The set would cover all the subject headings to be used. To make an entry, the prepunched card from the master set corresponding to the correct subject heading would be chosen and read in the first reader of the Flexowriter, which would be controlled in Duplex working by a programme tape in the second reader. The cards would be refiled in the master set immediately after use so as to be available for re-use when required. This procedure would ensure: (1) that only approved terms were used, since unit cards are added to the master set only for approved subject headings; and (2) that each plain-language entry was accompanied by the correct notational equivalent. Therefore, if a typescript and a record tape were made simultaneously, and the typescript were proofread to ensure that the correct plain-language headings had been entered, the notation in the record tape must be correct. At a later stage, after any necessary further editing, a run of the record tape through the machine under the control of an appropriate programme tape would produce a tape containing only the code entries of the notation. There was evidence that such a system could achieve good reliability.
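The safeguard provided by the master set rests on a simple idea: the plain-language heading and its notation are always taken together from one pre-checked record, so proofreading the typescript also verifies the notation. A minimal sketch of that idea follows; the headings and notation values in MASTER_SET are hypothetical placeholders, and the dictionary merely stands in for the file of strip-punched unit cards.

```python
# Hypothetical master set: approved heading -> checked notational equivalent.
MASTER_SET = {
    "potato": "P01",
    "tuber": "T04",
    "ascorbic acid": "C07",
}


def make_entry(headings: list[str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Build the typescript and record tape for one entry.

    Each heading must have a card in the master set; its notation is copied
    from that card, never typed by hand.
    """
    typescript: list[str] = []
    record_tape: list[tuple[str, str]] = []
    for heading in headings:
        if heading not in MASTER_SET:
            raise ValueError(f"no approved unit card for heading {heading!r}")
        typescript.append(heading)
        record_tape.append((heading, MASTER_SET[heading]))
    return typescript, record_tape


def notation_only(record_tape: list[tuple[str, str]]) -> list[str]:
    """Later extraction run: keep only the notation, dropping plain language."""
    return [notation for _, notation in record_tape]


if __name__ == "__main__":
    typescript, record = make_entry(["potato", "tuber", "ascorbic acid"])
    print(typescript)             # the part that is proofread
    print(notation_only(record))  # ['P01', 'T04', 'C07']
```

An unapproved heading is rejected outright rather than entered with a guessed notation, which corresponds to the rule that cards exist only for approved subject headings.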

A machine was built to this specification, but on trial it was found to have minor shortcomings which diminished its usefulness. For example, in “non-print” the machine would respond only to a “print restore” code, so that it was impossible to type an extract of a record tape under the control of a programme tape, since the “reader switch” codes (which switched the input from one reader to the other) were ignored. Another difficulty was that the “reader switch” codes were punched from both readers, so that their number in the output tape doubled with each pass through the Flexowriter under the control of a “programme tape” in the other reader. This made it impossible to use a “programme tape” to control the machine unless it was known beforehand how many times the information had already been through the machine. These and other minor points have been rectified, and a trial is in progress on the lines indicated.

4 The term “strip-punched” is used to distinguish this type of card (which has the punched-tape code punched along one side) from the Keysort type of “edge-punched” card and the IBM type of “field-punched” card.

One other modification to the standard machine deserves mention. The standard codes for paper tape are 5-unit, 6-unit, and 8-unit. The 5-unit code does not contain enough combinations. In the 6-unit code, upper- and lower-case characters differ only in that the former are preceded (at any distance back along the tape) by an upper-case code, while a lower-case code precedes the latter. In the standard Flexowriter 8-unit code, which was designed for ease of converting information on tape into IBM punched cards, the 6-unit code is used with an added “parity” punch in the 5th channel so that every code has an odd number of holes; the 8th channel is used only for the “carriage return” code. These codes were modified so that upper- and lower-case characters would differ in the combination actually being read by the reader: a punch in the 8th channel was added to all upper-case characters, while this channel was left blank for lower case. This gave an 8-unit code, one channel of which was used only for the parity check. The parity channel can be omitted later, giving a 7-unit code. The number of non-zero combinations (2⁷ − 1 = 127) is enough to provide a separate code for each character on the Flexowriter keyboard and one for each control signal. If the control codes were eliminated, there would be room for a further alphabet, which could be represented in typescript by accented letters.
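The modified tape code can be sketched as bit manipulation. In this illustration a 6-unit character code occupies the low six bits, the added upper-case punch and the parity punch are given two further bits, and the parity punch is set whenever it is needed to make the total number of holes odd. The bit positions are assumptions chosen for readability; they do not reproduce the physical channel order on the tape, where parity occupies the 5th channel and the case punch the 8th.

```python
CASE_BIT = 1 << 6    # assumed position of the upper-case punch (8th channel on tape)
PARITY_BIT = 1 << 7  # assumed position of the parity punch (5th channel on tape)


def holes(word: int) -> int:
    """Number of punched holes in a tape row."""
    return bin(word).count("1")


def encode(char_code: int, upper: bool) -> int:
    """6-unit character code plus case punch, with odd parity."""
    if not 0 <= char_code < 1 << 6:
        raise ValueError("character code must fit in six units")
    word = char_code | (CASE_BIT if upper else 0)
    if holes(word) % 2 == 0:
        word |= PARITY_BIT  # make the total number of holes odd
    return word


def parity_ok(word: int) -> bool:
    """A row with an even number of holes signals a punching or reading error."""
    return holes(word) % 2 == 1


def drop_parity(word: int) -> int:
    """Remove the parity channel, leaving the 7-unit code."""
    return word & ~PARITY_BIT


if __name__ == "__main__":
    lower, upper = encode(0b000101, False), encode(0b000101, True)
    print(drop_parity(lower) != drop_parity(upper))  # True: case now differs in the row itself
    print(parity_ok(lower ^ 0b1))  # False: a single missing hole fails the check
    print(2 ** 7 - 1)  # 127 non-zero 7-unit combinations
```

Once the parity bit is stripped, the 6 character bits plus the case bit give 128 distinct rows, of which 127 are non-zero, matching the count in the text.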

The products of the above procedure are: (1) an output tape in which is punched the notational equivalent of all the indexing entries to be made for a particular paper, together with (2) a typescript on which is typed the plain language of all these entries. These are to be passed to the Filmorex for use as follows: the tape to produce the perforated mask from which the code pattern is photographed; and the typescript, together with a suitable abstract of the paper, to be photographed as the pictorial portion of the Filmorex “fiche.”

A special conversion unit is required to link the Flexowriter and the Filmorex. This unit reads the codes in the Flexowriter output tape, recodes them as appropriate, and punches the new codes into the Filmorex perforated mask. By varying the connections in this unit, the coupling can be made flexible. The Filmorex selector works by passing the cards in turn through a beam of light in which a search specification card, bearing the inverse of the pattern sought, is also placed; the momentary “black out” of all light that occurs when a wanted card is read is used to operate the selection shutter by means of a photocell. This principle imposes a limitation on the codes which can be used: each pattern read by a single photocell (in the standard Filmorex, one line of the coding area) must have the same number of black spots (and of white spaces). With this limitation, the coding area, 30 units wide, can be divided into 6 fields each 5 units wide, allowing a 6-digit number to be represented (2 punches out of 5 give 10 characters). The possible vocabulary size is then 10⁶ words. Alternatively, a larger vocabulary (3.2 × 10⁶ words) can be obtained by dividing the line into 5 fields of 6 units each (punching 3 out of 6 gives 20 characters and 20⁵ possible “words”). Since the output of the Flexowriter is 7-unit alphanumerical, it was thought preferable to divide the line into 4 fields of 7 units each, leaving 2 units spare and allowing 35 characters to be represented (punching 3 out of 7). The vocabulary is reduced somewhat below the maximum (to approximately 1.5 × 10⁶, since 35⁴ = 1,500,625), but greater flexibility is achieved. The 35 characters chosen are the numerals plus an alphabet omitting I, which is likely to be confused with 1.
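Both the need for a fixed number of black spots per line and the vocabulary figures can be checked with a short sketch. It models one coding line as the set of its dark positions. Light reaches the photocell wherever both the fiche and the search card are clear; because the search card is clear exactly where the sought pattern is dark, a total black-out only shows that every sought dark spot is also dark on the fiche. Without a fixed spot count, an all-black line would black out for any query; with a fixed count, black-out occurs only for an exact match. The assignment of the 35 characters to the 35 possible 3-out-of-7 patterns below is hypothetical (simple lexicographic order), not the conversion unit's actual wiring.

```python
from itertools import combinations
from math import comb

# Numerals plus an alphabet without I, as chosen for the trial: 35 characters.
ALPHABET = "0123456789ABCDEFGHJKLMNOPQRSTUVWXYZ"
FIELD_WIDTH, FIELD_SPOTS, FIELDS = 7, 3, 4  # 4 x 7 = 28 of the 30 units used

# Hypothetical code book: characters assigned to 3-out-of-7 patterns in
# lexicographic order. C(7, 3) = 35, so the alphabet uses every pattern.
CODE_BOOK = dict(zip(ALPHABET, combinations(range(FIELD_WIDTH), FIELD_SPOTS)))


def encode_line(word: str) -> frozenset[int]:
    """Dark positions on one coding line for a four-character word."""
    if len(word) != FIELDS:
        raise ValueError("one coding line holds exactly four characters")
    dark: set[int] = set()
    for field_index, char in enumerate(word):
        dark.update(field_index * FIELD_WIDTH + spot for spot in CODE_BOOK[char])
    return frozenset(dark)


def blacks_out(line_dark: frozenset[int], sought_dark: frozenset[int]) -> bool:
    """The search card is the inverse of the sought pattern, so it is clear
    exactly where the sought pattern is dark. Light passes wherever both cards
    are clear; a black-out means every sought dark spot is dark on the line."""
    return sought_dark <= line_dark


def vocabulary(field_width: int, spots: int, fields: int) -> int:
    """Number of distinct words when each field carries `spots` out of `field_width`."""
    return comb(field_width, spots) ** fields


if __name__ == "__main__":
    print(vocabulary(5, 2, 6), vocabulary(6, 3, 5), vocabulary(7, 3, 4))
    query = encode_line("3K7P")
    print(blacks_out(encode_line("3K7P"), query))   # True: exact match
    print(blacks_out(encode_line("3K7Q"), query))   # False: same spot count, different pattern
    print(blacks_out(frozenset(range(30)), query))  # True: an all-black line would be a false drop
```

The three vocabulary figures printed are 1,000,000, 3,200,000 and 1,500,625, agreeing with the 10⁶, 3.2 × 10⁶ and approximately 1.5 × 10⁶ quoted above.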

As the equipment stands, the system is simple and flexible. The codes are read in order on successive lines of the fiche, and the extensive presorting which the small, cheap fiche makes economical speeds up the search, since in the majority of cases it is unnecessary to search more than a small fraction of the file. The present selector has 5 reading heads and can be set to select various logical combinations of 5 code lines in one pass. It cannot, however, distinguish the order in which these code lines occur.

If, however, the reading mechanism of the Filmorex were altered by the addition of further photocells, the restriction on code combinations described above could be removed, and the full theoretical total of 127 non-zero combinations possible with a 7-unit code could be used to accommodate 127 different characters or control signals. The versatility of the selector could be increased further by adding more logical circuits. If the two spare units were used to enable the lines to be counted, this logical circuitry could distinguish the order of codes.
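The difference between the present selector and the proposed line-counting circuitry can be illustrated as follows. In this hypothetical sketch a fiche is a list of four-character codes, one per coding line. The first function mimics the present behaviour (up to five codes, all required, order ignored); the second uses line numbers to require the codes in a stated order, which is what counting lines with the two spare units would allow. Only the “all present” combination is modelled, although the selector can form other logical combinations.

```python
READING_HEADS = 5  # the present selector can test up to five code lines at once


def select_unordered(fiche_lines: list[str], wanted: list[str]) -> bool:
    """Present selector: every wanted code must be on the fiche, in any order."""
    if len(wanted) > READING_HEADS:
        raise ValueError("more code lines than reading heads")
    return set(wanted) <= set(fiche_lines)


def select_ordered(fiche_lines: list[str], wanted: list[str]) -> bool:
    """Line-counting selector: the wanted codes must occur in the stated order,
    each on a later line than the one before (not necessarily adjacent)."""
    line_number = -1
    for code in wanted:
        try:
            line_number = fiche_lines.index(code, line_number + 1)
        except ValueError:
            return False
    return True


if __name__ == "__main__":
    fiche = ["2B7K", "3K7P", "0A10"]  # hypothetical code lines on one fiche
    print(select_unordered(fiche, ["3K7P", "2B7K"]))  # True: order is ignored
    print(select_ordered(fiche, ["3K7P", "2B7K"]))    # False: wrong order
    print(select_ordered(fiche, ["2B7K", "0A10"]))    # True
```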

In order to allow a trial of the collection to begin, it was decided to accept the present limitations of the Flexowriter and Filmorex and to try them in combination before attempting further modifications such as those outlined above, which can be worked on while the trial is in progress. Before work can commence, botanical and chemical notations must be worked out that need no more than 35 characters.

For the botanical entries, a system has been worked out. Zero was reserved for generality, and it was found that the whole of the plant kingdom could be accommodated (Table 1). For the flowering plants a more detailed classification has been made by using Willis’s system (i) as a basis. Nine alphanumeric digits are used. Of these the first three designate the family (Table 2), the next two the genus, and the remaining four the species and variety. So far some 5,000 species of flowering plants have been satisfactorily coded. For ease in recognition, a space is left after the third digit (the family) and a decimal point is inserted after the fifth (the genus), e.g., 365 . (Zeros are barred to distinguish them from the letter O.)
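The layout of the nine-character botanical code lends itself to a short sketch. It splits a code into family (three characters), genus (two) and species with variety (four), prints it with the space and decimal point described above, and shows how a generic search for a family reduces to comparing the first three characters. The family code 211 is Gramineae in Table 2; the genus and species characters in the example are hypothetical.

```python
# The 35 characters available in the trial: numerals plus an alphabet without I.
ALPHABET = set("0123456789ABCDEFGHJKLMNOPQRSTUVWXYZ")


def parse_botanical(code: str) -> dict[str, str]:
    """Split a nine-character code into family, genus and species/variety parts.

    The display space and decimal point are accepted and ignored.
    """
    compact = code.replace(" ", "").replace(".", "")
    if len(compact) != 9 or not set(compact) <= ALPHABET:
        raise ValueError(f"not a valid nine-character botanical code: {code!r}")
    return {"family": compact[:3], "genus": compact[3:5], "species": compact[5:]}


def format_botanical(code: str) -> str:
    """Space after the family, decimal point after the genus."""
    parts = parse_botanical(code)
    return f"{parts['family']} {parts['genus']}.{parts['species']}"


def in_family(code: str, family: str) -> bool:
    """Generic search by family: compare the first three characters only."""
    return parse_botanical(code)["family"] == family


if __name__ == "__main__":
    code = "211A30100"  # 211 = Gramineae (Table 2); "A3" and "0100" are hypothetical
    print(format_botanical(code))  # 211 A3.0100
    print(in_family(code, "211"))  # True
```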

TABLE 1. Plant classification: major groups

CRYPTOGAMIA
  Thallophyta
    Bacteria                  A
    Myxomycetes               B
    Algae
      Chlorophyceae           C-D
      Xanthophyceae           E
      Bacillariophyceae       F
      Euglenineae             G
      Phaeophyceae            H
      Rhodophyceae            I
      Cyanophyceae            J
    Fungi
      Phycomycetes
        Oomycetes             K
        Zygomycetes           L
      Ascomycetes
        Endomycetales         M-N
          (Yeasts as such)    N
        Plectomycetes         O
        Discomycetes          P
        Pyrenomycetes         Q
      Basidiomycetes
        Ustilaginales         R
        Uredinales            S
        Hymenomycetes         T
        Gasteromycetes        U
      Fungi imperfecti        V
    Lichenes
      Ascolichenes            W
  Bryophyta
    Hepaticae                 X
    Muscineae                 Y
  Pteridophyta                Z
PHANEROGAMIA
  Spermaphyta
    Gymnospermae              110–140
    Angiospermae
      Monocotyledonae         170–200
      Dicotyledonae
        Archichlamydeae       300–700
        Sympetalae            860–900

For the chemical compounds, several notations have been proposed which can be expressed within the limit of 35 characters. The Wiswesser system (cc, dd, ee) was originally adapted to the slightly greater range of characters which a punched card machine can handle, but it has since been developed, with more characters, into a notation of the kind envisaged by IUPAC (z), and it is designed for correlation and searching procedures. The Chemical-Biological Coordination Center developed a code (ff) of a completely different character for use in its work. In this code the various component groupings are enumerated, and no attempt is made to designate the complete compound by a unique cipher. The Centre National de la Recherche Scientifique has developed a


TABLE 2. Families of flowering plants and ferns (Willis)

Pteridophyta
  Cyatheaceae         Z12
  Equisetaceae        Z31
  Gleicheniaceae      Z16
  Hymenophyllaceae    Z11
  Isoetes             Z99
  Ligulatae           Z50
  Lycopodiaceae       Z41
  Marattiaceae        Z23
  Marsiliaceae        Z21
  Matoniaceae         Z15
  Ophioglossaceae     Z24
  Osmundaceae         Z18
  Parkeriaceae        Z14
  Polypodiaceae       Z13
  Psilotaceae         Z71
  Salviniaceae        Z22
  Schizaeaceae        Z17

Gymnospermae
  Cycadaceae          111
  Ginkgoaceae         121
  Gnetaceae           141
  Pinaceae            132
  Taxaceae            131

Monocotyledons
  Alismaceae          185
  Amaryllidaceae      275
  Aponogetonaceae     183
  Araceae             241
  Bromeliaceae        259
  Burmanniaceae       291
  Butomaceae          186
  Cannaceae           283
  Centrolepidaceae    253
  Commelinaceae       261
  Cyanastraceae       263
  Cyclanthaceae       231
  Cyperaceae          212
  Dioscoreaceae       278
  Eriocaulaceae       256
  Flagellariaceae     251
  Gramineae           211
  Haemodoraceae       274
  Hydrocharitaceae    187
  Iridaceae           279
  Juncaceae           271
  Lemnaceae           242
  Liliaceae           273
  Marantaceae         284
  Mayacaceae          254
  Musaceae            281
  Najadaceae          182
  Orchidaceae         292
  Palmae              221
  Pandanaceae         172
  Philydraceae        264
  Pontederiaceae      262
  Potamogetonaceae    181
  Rapateaceae         258
  Restionaceae        252
  Scheuchzeriaceae    184
  Sparganiaceae       173
  Stemonaceae         272
  Taccaceae           277
  Thurniaceae         257
  Triuridaceae        191
  Typhaceae           171
  Velloziaceae        276
  Xyridaceae          255
  Zingiberaceae       282

Dicotyledons
  Acanthaceae         951
  Aceraceae           657
  Achariaceae         741
  Achatocarpaceae     487
  Actinidiaceae       712
  Adoxaceae           974
  Aextoxicaceae       664
  Aizoaceae           488
  Akaniaceae          663
  Alangiaceae         778
  Amarantaceae        482
  Anacardiaceae       645
  Ancistrocladaceae   746
  Anonaceae           524
  Apocynaceae         915
  Aquifoliaceae       649
  Araliaceae          791
  Aristolochiaceae    461
  Asclepiadaceae      916
  Balanophoraceae     458
  Balanopsidaceae     361
  Balsaminaceae       667
  Basellaceae         492
  Batidaceae          391
  Begoniaceae         745
  Berberidaceae       518
  Betulaceae          421
  Bignoniaceae        934
  Bixaceae            729
  Bombacaceae         686
  Boraginaceae        924
  Bretschneideraceae  659
  Brunelliaceae       559
  Bruniaceae          563
  Brunoniaceae        993
  Burseraceae         622
  Buxaceae            641
  Byblidaceae         558
  Cactaceae           751
  Callitrichaceae     639
  Calycanthaceae      522
  Calyceraceae        995
  Campanulaceae       991
  Canellaceae         732
  Capparidaceae       532
  Caprifoliaceae      973
  Caricaceae          742
  Caryocaraceae       717
  Caryophyllaceae     494
  Casuarinaceae       311
  Celastraceae        651
  Cephalotaceae       555
Ceratophyllaceae

512

Cercidiphyllaceae

515

Chenopodiaceae

481

Chlaenaceae

682

Chloranthaceae

323

Cistaceae

728

Clethraceae

861

Cneoraceae

618

Cochlospermaceae

731

Columelliaceae

939

Combretaceae

779

Compositae

996

Connaraceae

574

Convolvulaceae

921

Coriariaceae

643

Cornaceae

793

Corynocarpaceae

648

Crassulaceae

554

Crossosomataceae

572

Cruciferae

533

Crypteroniaceae

772

Cucurbitaceae

981

Cunoniaceae

561

Cynocrambaceae

484

Cynomoriaceae

787

Cyrillaceae

646

Daphniphyllaceae

638

Datiscaceae

744

Desfontainiaceae

913

Diapensiaceae

866

Dichapetalaceae

636

Diclidantheraceae

896

Didieraceae

661

Dilleniaceae

711

Dipsacaceae

976

Dipterocarpaceae

723

Droseraceae

543

Dysphaniaceae

493

Ebenaceae

892

Elaeagnaceae

765

Elaeocarpaceae

681

Elatinaceae

724

Empetraceae

642

Epacridaceae

865

Ericaceae

864

Erythroxylaceae

616

Eucommiaceae

565

Eucryphiaceae

713

Euphorbiaceae

637

Eupomatiaceae

525

Fagaceae

422

Flacourtiaceae

735

Fouquieraceae

727

Frankeniaceae

725

Garryaceae

341

Geissolomataceae

761

Gentianaceae

914

Geraniaceae

611

Gesneriaceae

938

Globulariaceae

942

Gomortegaceae

527

Gonystilaceae

683

Goodeniaceae

992

Grubbiaceae

454

Guttiferae

722

Gyrostemonaceae

486

Haloragidaceae

785

Hamamelidaceae

564

Hernandiaceae

52x

Heteropyxidaceae

766

Himantandraceae

513

Hippocastanaceae

658

Hippocrateaceae

652

Hippuridaceae

786

Page 1216 Cite
Suggested Citation:"Information Handling in a Large Information System." National Research Council. 1959. Proceedings of the International Conference on Scientific Information: Two Volumes. Washington, DC: The National Academies Press. doi: 10.17226/10866.
×

Hoplestigmataceae

734

Humiriaceae

615

Hydnoraceae

463

Hydrocaryaceae

783

Hydrophyllaceae

923

Hydrostachyaceae

553

Icacinaceae

656

Juglandaceae

381

Julianaceae

411

Labiatae

926

Lacistemaceae

324

Lactoridaceae

523

Lardizabalaceae

517

Lauraceae

529

Lecythidaceae

775

Leguminosae

575

Leitneriaceae

371

Lennoaceae

863

Lentibulariaceae

941

Limnanthaceae

644

Linaceae

614

Lissocarpaceae

895

Loasaceae

743

Loganiaceae

912

Loranthaceae

457

Lythraceae

771

Magnoliaceae

521

Malesherbiaceae

738

Malpighiaceae

631

Malvaceae

685

Marcgraviaceae

718

Martyniaceae

936

Medusagynaceae

716

Melastomataceae

782

Meliaceae

623

Melianthaceae

666

Menispermaceae

518

Monimiaceae

528

Moraceae

432

Moringaceae

536

Myoporaceae

953

Myricaceae

351

Myristicaceae

526

Myrothamnaceae

562

Myrsinaceae

872

Myrtaceae

781

Myzodendraceae

451

Nepenthaceae

542

Nolanaceae

931

Nyctaginaceae

483

Nymphaeaceae

511

Nyssaceae

777

Ochnaceae

714

Octoknemataceae

456

Olacaceae

455

Oleaceae

911

Oliniaceae

763

Onagraceae

784

Opiliaceae

453

Orobanchaceae

937

Oxalidaceae

612

Pandaceae

581

Papaveraceae

531

Passifloraceae

739

Pedaliaceae

935

Penaeaceae

762

Pentaphylacaceae

647

Phrymaceae

954

Phytolaccaceae

485

Piperaceae

322

Pittosporaceae

557

Plantaginaceae

961

Platanaceae

571

Plumbaginaceae

881

Podostemaceae

551

Polemoniaceae

922

Polygalaceae

635

Polygonaceae

471

Portulacaceae

491

Primulaceae

873

Proteaceae

441

Punicaceae

774

Pyrolaceae

862

Quiinaceae

719

Rafflesiaceae

462

Ranunculaceae

516

Resedaceae

535

Rhamnaceae

671

Rhizophoraceae

776

Rosaceae

573

Rubiaceae

971

Rutaceae

619

Sabiaceae

665

Salicaceae

331

Salvadoraceae

653

Santalaceae

452

Sapindaceae

662

Sapotaceae

891

Page 1217 Cite
Suggested Citation:"Information Handling in a Large Information System." National Research Council. 1959. Proceedings of the International Conference on Scientific Information: Two Volumes. Washington, DC: The National Academies Press. doi: 10.17226/10866.
×

Sarraceniaceae

541

Saururaceae

321

Saxifragaceae

556

Scrophulariaceae

933

Scytopetalaceae

688

Simarubaceae

621

Solanaceae

932

Sonneratiaceae

773

Stachyuraceae

736

Stackhousiaceae

654

Staphyleaceae

655

Sterculiaceae

687

Strasburgeriaceae

715

Stylidiaceae

994

Styracaceae

894

Symplocaceae

893

Tamaricaceae

726

Theaceae

721

Theophrastaceae

871

Thymelaeaceae

764

Tiliaceae

684

Tovariaceae

534

Tremandraceae

634

Trigoniaceae

632

Tristichaceae

552

Trochodendraceae

514

Tropaeolaceae

613

Turneraceae

737

Ulmaceae

431

Umbelliferae

792

Urticaceae

433

Valerianaceae

975

Verbenaceae

925

Violaceae

733

Vitaceae

672

Vochysiaceae

633

Zygophyllaceae

617

system on somewhat the same lines as the CBCC for use in its Filmorex installation (gg) and it is proposed to adopt this for the first trial run.
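The enumerated-groupings approach described above for the CBCC code (ff) can be illustrated with a minimal Python sketch: each compound is indexed by the set of component groupings it contains rather than by a unique cipher, so a search for compounds sharing certain groupings becomes a subset test. The compound names, grouping labels, and the function name `compounds_with_groupings` are hypothetical placeholders, not actual CBCC code values.

```python
# Hypothetical index: each compound is recorded only by the set of component
# groupings it contains (labels are illustrative, not real CBCC code values).
GroupIndex = dict[str, frozenset[str]]


def compounds_with_groupings(index: GroupIndex, wanted: set[str]) -> list[str]:
    """Return, sorted, the compounds whose groupings include every wanted one."""
    return sorted(name for name, groups in index.items() if wanted <= groups)


index: GroupIndex = {
    "compound A": frozenset({"hydroxyl", "carboxyl", "benzene ring"}),
    "compound B": frozenset({"hydroxyl", "steroid nucleus"}),
    "compound C": frozenset({"carboxyl", "benzene ring"}),
}

print(compounds_with_groupings(index, {"carboxyl", "benzene ring"}))
# ['compound A', 'compound C']
```

Because only groupings are recorded, two different compounds with the same groupings cannot be distinguished; that is the price of giving up a unique cipher for each compound.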

Other properties, such as palatability, medicinal effects, texture, conditions of growth, and susceptibility to disease, are important in determining the economic value of plants and the uses that can be made of them. Many of these properties are difficult to describe on a numerical or other linear scale. However, the number of possible headings for each is small, and they can be coded in a maximum of two digits in a number of ways, within the range of 35 characters.
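As a rough illustration of why two symbols suffice, the Python sketch below codes the headings of one property with symbols drawn from a 35-character set: one symbol covers up to 35 headings, and two cover up to 35² = 1,225. The paper does not list the 35 characters, so the alphabet used here (the digits 1 to 9 followed by the letters A to Z) is an assumption, as are the example palatability headings and the function name `heading_codes`.

```python
import string

# Assumed 35-symbol set: digits 1-9 followed by letters A-Z (not given in the paper).
ALPHABET = string.digits[1:] + string.ascii_uppercase
assert len(ALPHABET) == 35


def heading_codes(headings: list[str]) -> dict[str, str]:
    """Code each heading in one symbol if there are at most 35, else in two."""
    base = len(ALPHABET)
    if len(headings) > base * base:
        raise ValueError("more headings than two symbols can encode")
    width = 1 if len(headings) <= base else 2
    codes = {}
    for i, heading in enumerate(headings):
        if width == 1:
            codes[heading] = ALPHABET[i]
        else:
            high, low = divmod(i, base)  # two base-35 "digits"
            codes[heading] = ALPHABET[high] + ALPHABET[low]
    return codes


# Hypothetical headings for a single property.
palatability = ["inedible", "bitter", "bland", "sweet", "aromatic"]
print(heading_codes(palatability))
# {'inedible': '1', 'bitter': '2', 'bland': '3', 'sweet': '4', 'aromatic': '5'}
```

A fixed code width per property keeps every entry for that property the same length, which simplifies machine comparison.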

When this paper was proposed, it was expected that all of the experimental work described above could be reported. Owing to delays in the delivery of equipment and, in particular, to a serious accident that befell the author, it has not been possible to include the results. The present status of the work is as follows: the Flexowriter has been tried on all the procedures and, subject to the modifications outlined, has proved satisfactory; the detailed design of the conversion unit has been completed, and construction is due to begin; and delivery of the Filmorex equipment is expected in the near future. Some results should be available by autumn 1958, and these will be reported in due course.

Summary

Some serious limitations of existing methods of indexing and cataloguing scientific information became apparent while the possibility was being explored of setting up a large, detailed system to answer enquiries on the published information about the chemical compounds found in plants. It was estimated that this system would need to contain of the order of 10⁷ items, each of which might be sought from any one of four aspects: chemical, botanical, functional, and miscellaneous. "Functional" here covers such characters as palatability, pharmaceutical effects, and toxicity, while the miscellaneous aspect includes such factors as cultivation and place of growth. The problem was further complicated by the need for retrieval from partial specifications, for example, finding plants containing chemical compounds that have certain groupings in common. Such a system cannot be handled by any of the existing classical methods.
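The retrieval requirement can be sketched in Python under simple assumptions: each entry (one compound in one plant) holds one code per aspect, and an enquiry fixes the leading symbols of any subset of the aspects, with prefix matching standing in for "partial specification". The `Entry` fields, the `search` function, and the chemical, functional, and miscellaneous codes are hypothetical; only the botanical family codes 575 (Leguminosae) and 932 (Solanaceae) come from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One indexed item: one chemical compound recorded in one plant."""
    chemical: str       # hypothetical chemical notation code
    botanical: str      # family code, e.g. "575" (Leguminosae) from Table 2
    functional: str     # hypothetical code for palatability, toxicity, etc.
    miscellaneous: str  # hypothetical code for cultivation, place of growth, etc.


def search(entries: list[Entry], **partial: str) -> list[Entry]:
    """Return entries whose named aspects begin with the given code prefixes.

    Aspects not named in the query are unrestricted, so an enquiry can start
    from any one aspect or from any combination of them.
    """
    return [
        entry
        for entry in entries
        if all(getattr(entry, aspect).startswith(prefix)
               for aspect, prefix in partial.items())
    ]


entries = [
    Entry("C001", "575", "T1", "M2"),
    Entry("C002", "932", "T1", "M1"),
    Entry("C001", "932", "P3", "M2"),
]
print(search(entries, botanical="932"))                   # the last two entries
print(search(entries, chemical="C001", functional="T"))   # the first entry only
```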

The proposed solution, which is applicable to any large system that must be indexed in detail, is to express each factor in a notation in which a linear series of symbols represents the factor element by element, and in which the relationships between the elements are expressed partly by the order of the symbols and partly by special symbols of relationship. In the example above, notations for the chemical aspect have already been proposed; the botanical aspect has been extensively classified by taxonomic characters, although no notation of this type has been formulated for it; and the functional aspect has not been classified at all.

Using such a notation on the scale envisaged raises problems of machine design. Although some machines work satisfactorily in binary, symbols expressed in binary form are inconvenient for manual handling (e.g., compiling codes and entering information into the system), for which the largest possible number of distinct symbols is desirable. A compromise has therefore been adopted: a range of symbols as large as a typewriter keyboard can accommodate, namely two alphabets, two ranges of numerals, and a full set of punctuation marks, which a modified tape-punching typewriter converts into a seven-bit code.
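The capacity argument for seven bits is simple arithmetic: 2⁷ = 128 codes, enough for a full typewriter keyboard. The Python sketch below numbers one such symbol set and converts typed text into seven-bit patterns. The symbol set is an assumption (the second range of numerals is modelled as subscript digits), the names `SYMBOLS`, `CODE`, and `to_seven_bit` are hypothetical, and the numbering is illustrative; the paper does not give the actual code assignments of the modified typewriter.

```python
import string

# Assumed keyboard: two alphabets, two ranges of numerals (the second modelled
# here as subscript digits), punctuation, and the space character.
SYMBOLS = (
    string.ascii_uppercase
    + string.ascii_lowercase
    + string.digits
    + "₀₁₂₃₄₅₆₇₈₉"
    + string.punctuation
    + " "
)
CODE = {symbol: index for index, symbol in enumerate(SYMBOLS)}
assert len(CODE) == len(SYMBOLS) <= 2 ** 7  # every symbol fits in seven bits


def to_seven_bit(text: str) -> list[str]:
    """Translate typed text into seven-bit patterns, one per symbol."""
    return [format(CODE[ch], "07b") for ch in text]


print(len(SYMBOLS))        # 105
print(to_seven_bit("Ca"))  # ['0000010', '0011010']
```

Any fixed one-to-one assignment would serve; what matters is that it is fixed, so conversion in either direction is purely mechanical.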

Once the information has been recorded in this way and checked for accuracy, all further handling can be done by machine, and further checking should be unnecessary. Various devices can be used to facilitate this initial check.

Selection also presents a problem. The standard punched card machine, the obvious candidate for mechanical selection, is unsuited to seven-bit codes, and even less suited to notations of the proposed type, in which the position of the group being searched for cannot be specified in advance. Electronic computers, on the other hand, are ideal for handling binary information, but their memories are several orders of magnitude too small, and they are also unnecessarily expensive.

Selection can be done instead by equipment of the rapid selector type, modified to accept punched tape (the output of a tape-punching typewriter) as its input. In this machine the index entries, recorded in binary code, are scanned term by term, so the position of the group being searched for is immaterial.
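The contrast between fixed-field selection, as on a punched card machine, and term-by-term scanning can be sketched in Python. The entry strings, the group `G5`, and the function names are hypothetical stand-ins for notation codes: the scan finds the group wherever it occurs, whereas the fixed-field test succeeds only when told the right position.

```python
def fixed_field_match(entry: str, group: str, start: int) -> bool:
    """Punched-card style test: the group must sit at a known position."""
    return entry[start:start + len(group)] == group


def scan_match(entry: str, group: str) -> bool:
    """Selector style test: step through the entry one symbol at a time."""
    width = len(group)
    for position in range(len(entry) - width + 1):
        if entry[position:position + width] == group:
            return True
    return False


def select(entries: list[str], group: str) -> list[str]:
    """Return every entry containing the group, wherever it occurs."""
    return [entry for entry in entries if scan_match(entry, group)]


# Hypothetical linear-notation entries; "G5" is the group sought.
entries = ["A1G5", "G5B2", "A3B", "1A2G5C"]
print(select(entries, "G5"))                              # ['A1G5', 'G5B2', '1A2G5C']
print([fixed_field_match(e, "G5", 2) for e in entries])   # [True, False, False, False]
```

In everyday Python, `group in entry` performs the same test; the explicit loop is kept to mirror the symbol-by-symbol scan of the selector.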

The process has three novel features: first, the development of suitable notations and their adaptation to machine use and machine limitations; second, the modification of the tape-punching typewriter to give it the necessary versatility and flexibility; and third, the adaptation of the Filmorex to accept codes of this type, together with the design and construction of the converters needed to transfer the information from one machine to the next.

REFERENCES

(a) Survey of Agricultural, Forestry and Fishery Products in the United Kingdom. Development Commission, London, 1953.

(b) KLEIN, G. Handbuch der Pflanzenanalyse. Springer, Wien, 1931–2.

(c) WEHMER, C. Die Pflanzenstoffe. Verlag Fischer, Jena, 1929–30.

(d) WALL, E.M., FENSKE, C.S., WILLAMAN, J.J., CORRELL, D.S., SCHUBERT, B.G., and GENTRY, H.S. Steroidal sapogenins XXVI. Supplementary table of data for steroidal sapogenins XXV. September 1955. U.S. Dept. Agr., Agric. Research Service, ARS-73–4.

(e) URQUHART, D.J. Unanswered Questions No. 5. Dept. Scientific Industrial Research, London, October 1951, pp. 1–3.

(f) BENTHAM, G. and HOOKER, J.D. Genera Plantarum, London, 1862.

(g) ENGLER, A., and PRANTL, K. Die Natürlichen Pflanzenfamilien, Leipzig.

(h) HUTCHINSON, J. Families of Flowering Plants: Vol. I, Dicotyledons; Vol. II, Monocotyledons. Macmillan, London, 1926/34.

(i) WILLIS, J.C. Flowering Plants and Ferns, 6th revised edition. Cambridge University Press, 1951.

(j) MULLINS, L.J., and NICKERSON, W.J. A proposal for serial number identification of biological species. Chronica botan., 12, 4 (1951).

(k) GOULD, S.W. Permanent numbers to supplement the binomial system of nomenclature. Am. Scientist, 42, 269–74 (1954).

(l) SPORNE, K.R. Statistics and the evolution of dicotyledons. Evolution, 8(1), 55–64 (1954).

(m) SPORNE, K.R. The phylogenetic classification of the Angiosperms. Biol. Revs., 31, 1–29 (1956).

(n) DYSON, G.M. Studies in chemical documentation III. Mechanized documentation. Chem. & Ind., 1954 (April 17), 400–9.

(o) FARRADANE, J.E.L. A scientific theory of classification and indexing and its practical applications. J. Document., 6, 83–99 (1950); A scientific theory of classification and indexing: further considerations. J. Document., 8(2), 73–82 (1952).

(p) RANGANATHAN, S.R. Annals of Library Science.

(q) KING, G.W. A new approach to information storage. Control Eng., August 1955.

KING, G.W., BROWN, G.W., and RIDENOUR, L.N. Photographic techniques for information storage. Proc. I.R.E., 41(10), 421–8 (1953).

ANON. Photoscopic information storage. International Telemeter Corporation, Los Angeles, Calif., Publication R-77, March 15, 1955.

(r) TYLER, A.W., MYERS, W.L., and KUIPERS, J.W. The application of the Kodak Minicard system to problems of documentation. Am. Document., 6(1), 18–30 (1955).

KUIPERS, J.W., TYLER, A.W., and MYERS, W.L. A Minicard system for documentary information. Am. Document., 8(4), 246–68 (1957).

(s) Minicard demonstration. Am. Document., 6(4), 258–9 (1955).

(t) Report for the microfilm rapid selector. Eng. Research Assoc., Inc., No. PB 97313. U.S. Dept. Commerce, 1949.

(u) SHAW, R.R. The rapid selector. J. Document., 5, 164–71 (1949).

(v) The Western Reserve Searching Selector. Am. Document., 8(3), 237–8 (1957).

(w) SAMAIN, J. Progrès du classement et de la sélection mécanique des documents. 17th Conf. FID, Berne, August 1947, pp. 22–26.

(x) SAMAIN, J. The organization of documentation by the Filmorex technique. Filmorex, Paris, 1956.

(y) BROWN, R. HUNT, Editor. Office Automation. Automation Consultants, Inc., New York, 1955, pp. 51–7.

(z) Codes invited. Chem. Eng. News, 27, 2998 (1949).

(aa) DYSON, G.M. A New Notation and Enumeration System for Organic Compounds, 2nd edition. Longmans Green, London, 1949.

(bb) DYSON, G.M. Private communication.

(cc) WISWESSER, W.J. Simplified chemical coding for automatic sorting and printing machinery. Willson Products Inc., Reading, Pa., 1951.

(dd) WISWESSER, W.J. The Wiswesser line formula notation. Chem. Eng. News, 30, 3525–6 (1952).

(ee) WISWESSER, W.J. A Line Formula Chemical Notation. Crowell Co., New York, 1954.

(ff) CHEMICAL-BIOLOGICAL COORDINATION CENTER. A method for coding chemicals for correlation and classification. National Research Council, Washington, D.C. 1950.

(gg) SAMAIN, J. Personal communication.

The launch of Sputnik caused a flurry of governmental activity in science information. The 1958 International Conference on Scientific Information (ICSI) was held in Washington, D.C., on November 16–21, 1958, and was sponsored by NSF, NAS, and the American Documentation Institute, the predecessor of the American Society for Information Science. In 1959, NAS published 20,000 copies of the two-volume proceedings, which included 75 papers (1,600 pages) by dozens of pioneers, covering seven areas:

  • Literature and reference needs of scientists
  • Function and effectiveness of A & I services
  • Effectiveness of Monographs, Compendia, and Specialized Centers
  • Organization of information for storage and search: comparative characteristics of existing systems
  • Organization of information for storage and retrospective search: intellectual problems and equipment considerations
  • Organization of information for storage and retrospective search: possibility for a general theory
  • Responsibilities of Government, Societies, Universities, and industry for improved information services and research.

It is now an out-of-print classic in the field of science information studies.
