THE NATIONAL RESEARCH COUNCIL CHEMICAL CODE
Harriet A. Geer
Chemical-Biological Coordination Center, National Research Council, Washington, D. C.
The National Research Council Chemical Code was developed by the Chemical Codification Panel under Dr. C. Chester Stock as Chairman. The primary purpose in developing this code was to provide a means of describing the structure of a chemical compound linearly for transcription to a punched card and thus permit the selection of types of compounds by machine methods. Cataloguing of compounds by empirical formula and chemical name has been used extensively, and certain systematic classifications such as Beilstein and the Wiselogle classification have also been employed. More recently the Dyson system as well as several other methods of classifying chemicals have been developed and, in some instances, applied to punched cards. To locate a single compound, the empirical formula offers a simple and sure method; to locate types of compounds, the chemical name or any of the various classification systems permits the location of many compounds. Oftentimes, however, a group is obscured because of the presence of another group or groups which take precedence in the classification scheme. By expressing the component parts of a compound by individual numbers or code designations and transferring them to a punched card, any desired component or components may be searched for.

The NRC Chemical Code describes a chemical structure by assigning code designations to the component parts without showing specifically how the parts are attached to one another. According to the principles of this Code, the component parts or groups are divided into four main divisions (organic, organoheteroid, inorganic and indeterminate structures), which are further divided into so-called families. Within each family three numbers or letters describe the various groups in these categories. A fourth digit is used to record how many times the group occurs in the structure coded. In all families, space has been left for future expansion.
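As a minimal sketch of the designation format just described (three characters naming the group, then a digit giving the number of occurrences), the following splits a written code into its parts. The dot-and-hyphen notation is the one used in this chapter's worked examples; the names `Designation` and `parse_code` are hypothetical, and the sample string is the code the chapter gives later for benzene, arso-.

```python
from typing import List, NamedTuple


class Designation(NamedTuple):
    family: str  # first character of the group, e.g. "N"
    group: str   # three-character group designation, e.g. "NYR"
    count: int   # fourth digit: how many times the group occurs


def parse_code(code: str) -> List[Designation]:
    """Split a code such as 'NYR.1-P1J.1-U63.2' into designations."""
    designations = []
    for part in code.split("-"):
        group, _, count = part.partition(".")
        group, count = group.strip(), count.strip()
        if len(group) != 3 or not count.isdigit():
            raise ValueError(f"malformed designation: {part!r}")
        designations.append(Designation(group[0], group, int(count)))
    return designations


# Benzene, arso-, as coded in this chapter.
for d in parse_code("NYR.1-P1J.1-U63.2"):
    print(d.family, d.group, d.count)
```

On a punched card the dot and hyphen would not appear; each designation simply occupies a four-column field.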
The classification in Division I or the organic group, which includes Families 0, 1, 2 . . . 9, A, B . . . N, O*, is based upon a coding system developed by Dr. Frear and his associates at Pennsylvania State College. The families in Division I are listed in order of decreasing complexity with respect to the number of elements present. Groups containing N, O, S and halogen in addition to C and H are in Family 0-- (zero), and so on through Family O-- (letter O), which contains only noncyclic C. Cyclic groups are classified in separate families from the noncyclic, e.g., Family 6-- classifies noncyclic groups containing N and O (in addition to C and H) whereas Family 7-- codes ring structures containing C, N and O. In Division I, the specificity with which the individual group is defined has been regulated to a large degree by the frequency with which the group in question occurs. For example, dithiocarboxylic acids and esters are coded by the same code designation. Carboxylic acids and esters on the other hand are not only separated from each other but are further divided according to the nature of the acid and the alcohol forming the ester. Carboxylic acids are listed as RCOOH and separate code designations assigned depending upon the nature of R, i.e. whether it is heterocyclic, aromatic carbocyclic or either alicyclic or aliphatic. Likewise esters are listed as RCOOR' and separate designations assigned depending upon the nature of both R and R'.

The organoheteroid family (P--) classifies elements attached to C other than C, H, N, O, S and halogen according to their combining power. Combining power is defined as the number of electrons the atom in question furnishes for sharing or transfers to other atoms. Inorganic compounds are coded as central atoms (Family R-- or T--) with other groups coordinated to them (Family S-- or U--).
In the case of ionic structures, the cation and anion are coded as separate units; the fourth digit of the code designation records the number of times the group occurs in the ion rather than the number of times it occurs in the compound. The nonmetallic elements have been arranged in a series in order of increasing electronegativity proceeding from selenium through fluorine. The elements so listed are coded in Family T-- if they occur as free elements or central atoms of neutral molecules. Other elements are coded in Family R-- when free or central atoms of neutral molecules. Simple cations or central atoms of complex cations are coded in Family R--; simple anions or central atoms of complex anions in Family T--. The second and third digits of the code designations classify the elements of these families according to their oxidation state, which is calculated in the conventional way by assuming that enough shared electrons to fill the outer shell of the more electron-attracting atom belong to that atom. Families S--, U-- and V-- classify coordinate or solvate groups which act as units. Groups coordinated to an R-group are coded in Family S--; those coordinated to a T-group in Family U--; groups coordinated to a P-group in Family S-- or U-- depending upon the electronegativity of the P-element.

* 0 is used to distinguish the letter O from zero.
Hydrates and solvates are coded in Family V--; indeterminate structures in Family Z--.

In the code for a compound, code designations are listed in the order in which they appear in the List of Group Numbers, i.e. 0-9 and A-Z. The coding of a compound is accomplished by selecting from the List of Group Numbers the first group which describes a component part of the structure. If no code designation is found which describes an entire group, it may be split into two or more groups for coding purposes. One systematically proceeds through the List of Group Numbers until all parts of the structure have been coded. The examples shown below illustrate the procedure.

D-Glucose
H4M.1-H8A.3-H8K.1-IH2.1-O99.1
(H4M = H2C(OH)2; H8A = R-CHOH; H8K = RCH2OH, R is heterocyclic; IH2 = tetrahydropyran; O99 = C1)

Since only C, H and O appear in this compound, the first family which codes a group in this structure is the H or (CH)O noncyclic family. H4M, shown in the code as H2C(OH)2, is the first group in this family which describes a part of the structure. The hydrogens in this group may be replaced by R.

6-Quinoxalinol, 7-methoxy-
F50.2-GFR.1-H67.1-H74.1-NYI.1-O99.1
(F50 = [formula illegible in source] (: may be resonating double bond); GFR = pyrazine; H67 = ROR', R and R' are aromatic carbocyclic and alicyclic or aliphatic; H74 = ROH, R is aromatic monocarbocyclic; NYI = benzene fused to a heterocyclic structure; O99 = C1)

C, N and O are all present in this compound, but only CN and CO groups occur as component groups indicated in the code. F50.2 codes the ring nitrogens and GFR.1 the heterocyclic ring. Fused heterocyclic rings are always coded as component monoheterocyclic rings. With the exception of the C5N ring, no distinction is made between fused and unfused heterocyclic rings. Heterocyclic rings of varying degrees of unsaturation are indicated by separate code designations.
Because of the frequency of occurrence of C4N2 rings, these have been separated into pyrazine, pyrimidine and pyridazine, but in all other cases the location of the heterocyclic atoms is not indicated, only the size of the ring, the number of heterocyclic atoms and the degree of unsaturation. Polycarbocyclic rings are coded as a unit and are not separated into their component structures. With the exception of a 6-membered ring fused to a heterocyclic structure, the same code designation is used for carbocyclic structures fused to a heterocyclic ring as for those which are not. As in the heterocyclic structures, the degree of unsaturation is indicated.
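The listing rule stated earlier (designations appear in List of Group Numbers order, 0-9 and then A-Z) can be sketched as a sort. This is only an illustration: it assumes that the second and third characters of a group follow the same 0-9, A-Z order as the family character, which agrees with the worked examples but is not stated outright. The names `ORDER`, `group_sort_key` and `order_groups` are hypothetical.

```python
from typing import List, Tuple

# Digits first, then letters. Zero and the letter O are distinct symbols.
ORDER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def group_sort_key(group: str) -> Tuple[int, ...]:
    """Rank a three-character group by the position of each character."""
    return tuple(ORDER.index(ch) for ch in group)


def order_groups(groups: List[str]) -> List[str]:
    """Return groups in the assumed List of Group Numbers order."""
    return sorted(groups, key=group_sort_key)


# The D-glucose groups, shuffled. The final family is written with the
# letter O, since it is listed after the H and I families.
shuffled = ["O99", "IH2", "H8K", "H4M", "H8A"]
print(order_groups(shuffled))
```

Because the noncyclic carbon family is the letter O rather than zero, it sorts after the H and I families instead of ahead of every other group.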
Benzene, arso-
C6H5AsO2
NYR.1-P1J.1-U63.2
(NYR = benzene; P1J = As5; U63 = (:O) or -O-)

Since the organic groups precede the organoheteroid and inorganic in the List of Group Numbers, the benzene ring (NYR) is coded first. P1J.1 shows arsenic with a combining power of 5 attached to carbon. The two oxygens coordinated to the arsenic are coded in Family U-- since As is listed in the electronegativity series.

Potassium sodium nitrocobaltate (III), monohydrate
K2NaCo(NO2)6·H2O
RD0.1-RG0.1-T60.1-U7V.6-V61.1
(RD0 = K+1; RG0 = Na+1; T60 = Co+3; U7V = nitro group; V61 = H2O)

Only one K+ is indicated because in ionic structures the fourth digit shows the actual number of atoms present in the ion. Co is the central element of a complex anion and is therefore coded in Family T--.

[Fig. 1. Chemical Card. Punched card image; legible field labels: Structural Formula, Groups Present.]
The Chemical Code is placed on the punched card as shown in Fig. 1. The serial number of the compound is in the first 8 columns. This allows for a six digit number with two additional numbers to express salts of the parent compound. The code for the chemical structure is placed in columns 17-52, which gives room for nine four-column groups. Structures which contain more than nine groups are given a designating punch, and the additional groups are punched on second cards which are kept in a separate file. The average number of groups in the code for a compound is between five and six. To facilitate sorting, the families present (Col. 9-11) and the empirical formula (Col. 53-66) are also given on the card. The empirical formula is not a true empirical formula inasmuch as the actual number of atoms of every element present in a molecule is not recorded. In columns 53 and 54, the number of carbon atoms up to 99 is shown; in subsequent columns, the number of Br, Cl, F, I, N and S are indicated up to six and oxygen to nine. In columns 59 to 66, the presence of any other element is shown by a single unique punch. The remainder of the space on the card is unassigned and may be used for physical properties when a physical property code is developed.

A serial number file as well as a so-called rotated file is prepared for all chemical punched cards. In the rotated file as many cards are prepared as there are groups in the chemical code of a compound. The groups are so rotated that each one in turn appears in the first field (Col. 17-20), which is used for filing purposes. In this way hand selection of all compounds containing any simple specified group may be made directly.

To locate types of structures by machine methods it is important to define the question carefully. The definition of the term analogs often proves troublesome. Since only definite component groups may be searched for, it is necessary to decide what component groups fulfill the definition of analogs.
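The rotated file described above can be sketched as follows. This is an illustration under stated assumptions, not the Center's actual procedure: the rotation is taken to be cyclic, each field is written as the three-character group followed by its count digit, and the serial numbers and the second compound are invented. The names `rotations` and `build_rotated_file` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Card = Tuple[str, List[str]]  # (serial number, four-character fields)


def rotations(serial: str, fields: List[str]) -> List[Card]:
    """One card per group; each group takes its turn in the first field."""
    return [(serial, fields[i:] + fields[:i]) for i in range(len(fields))]


def build_rotated_file(compounds: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """File every rotated card under the group in its first field."""
    filed = defaultdict(list)
    for serial, fields in compounds.items():
        for _, rotated in rotations(serial, fields):
            filed[rotated[0][:3]].append(serial)
    return {group: sorted(serials) for group, serials in filed.items()}


compounds = {
    "00000100": ["NYR1", "P1J1", "U632"],  # benzene, arso-; serial invented
    "00000200": ["H741", "NYR1"],          # invented second compound
}
rotated = build_rotated_file(compounds)
print(rotated["NYR"])  # serial numbers of every compound containing NYR
```

Looking up one group in `rotated` corresponds to pulling the cards filed under that group by hand.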
Although the chemical code was developed for the express purpose of selecting types of structures in conjunction with certain biological activities, in the illustrations given here only the chemical code is employed. As was stated earlier, the chemical code is not specific for a chemical structure and the specific structure can seldom be reconstructed from the code. As a result, when sorting for types of structures a certain number of structures which do not fulfill the desired requirements will be obtained. It is necessary to check the answers obtained to determine whether or not they fulfill the original conditions laid down.

The method of searching the files for all aminophenols is described in order to illustrate the methods used in answering a typical question. The term aminophenol was arbitrarily defined as including those compounds which contained both an amino and a hydroxy group attached to the same benzene ring. The first digit of the serial number has been assigned according to the elements present in a compound and hence only serial numbers beginning with 5 or 9 need to be used here. No others contain C, N and O in the same compound. Since no group specifically defines an amino group attached to a benzene ring, it was necessary to include all amino groups attached to an aromatic carbocyclic structure. An amino group of this type may be indicated by any one of the following groups: F51 (R-NR1); F53, F55, F57, F58 (RR'R"N with one or more R groups aromatic carbocyclic); F5C, F5E, F5G (RR'NH with one or both R groups aromatic carbocyclic) and F5L (RNH2, R is aromatic carbocyclic). The code designation H74 represents only hydroxy groups attached to a monocarbocyclic ring. From the rotated file the amino or F cards listed above are selected, merged in serial number order and collated with the cards filed under H74. At the time that this sort was made, there were approximately 15,000 coded structures in the file.
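The selection and collation just described amounts to taking the union of the serial numbers filed under the amino groups and intersecting it with those filed under H74. The sketch below assumes the rotated file is available as a mapping from group to serial numbers; the function name, the mapping and all serial numbers are invented for illustration.

```python
from typing import Dict, Iterable, List

# Amino groups attached to an aromatic carbocyclic structure, as listed above.
AMINO_GROUPS = ["F51", "F53", "F55", "F57", "F58", "F5C", "F5E", "F5G", "F5L"]
HYDROXY_GROUP = "H74"


def candidate_serials(
    rotated_file: Dict[str, List[str]],
    amino_groups: Iterable[str] = AMINO_GROUPS,
    hydroxy_group: str = HYDROXY_GROUP,
    first_digits: str = "59",
) -> List[str]:
    """Serial numbers filed under at least one amino group and under H74."""
    amino = set()
    for group in amino_groups:          # select and merge the F cards
        amino.update(rotated_file.get(group, ()))
    hydroxy = set(rotated_file.get(hydroxy_group, ()))
    matches = amino & hydroxy           # collate with the H74 cards
    return sorted(s for s in matches if s[0] in first_digits)


# Invented toy file: group -> serial numbers filed under it.
toy_file = {
    "F5L": ["50000100", "50000200"],
    "F5C": ["90000300"],
    "H74": ["50000100", "50000400", "90000300"],
}
print(candidate_serials(toy_file))
```

The result is only a list of candidates. Because the code does not record which ring carries each group, every candidate still has to be checked against its structure.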
Three hundred twenty-seven contained at least one of the F groups as well as H74. It was then necessary to ascertain how many of these compounds were answers to the question. The 327 serial numbers were listed in numerical order by the tabulator, and visual examination of these structures in the serial number chemical card file was carried out. Two hundred ninety-nine proved to fulfill the conditions and 28 did not. Of the 28 structures, two were quinoline compounds with a hydroxy and amino group on the carbocyclic ring; the other 26 contained the hydroxy and amino groups on different rings.

The NRC Chemical Code has been published and may be purchased from the National Research Council Publications Office, 2101 Constitution Avenue, N. W., Washington 25, D. C.