The Use of the Social Security Number as the Basis for a National
Statement of the Problem
Although there is little disagreement that some type of unique universal citizen identifier is necessary for creating a complete, lifetime, patient-centered, computer-based health record, there is considerable disagreement over what that number should be. This paper argues that a number derived from the Social Security number (SSN) and administered by the Social Security Administration (SSA) is the best and most economical solution to this problem. Arguments against the SSN are, for the most part, arguments against any identifier that might be used universally to identify individuals and to bring together all data relating to their health care.
New models for health care delivery, particularly managed care, can be fully supported only through an integrated, electronic information system. The concept of a lifetime, patient-centered health record containing, at least logically, all data from all sources is key to delivering high-quality, cost-effective care. Patients receive that care from a variety of providers in a variety of settings, so the information system must be able to aggregate data about a person into a single, logical record. To perform this integration, the identity of the person must be unequivocally established in both the sending and the receiving systems.
There are two different problems in establishing patient identity. The first is to confirm that a person presenting an identification number is the person to whom that number belongs. This process is called authentication, and several options are available. In the past, authentication has usually been accomplished by having the person present a card bearing an identification number. Biometric identifiers, such as thumbprints checked by a thumbprint reader, are becoming affordable and can establish a person's identity with a high degree of certainty. The second problem arises when data are transferred between two systems and the patient is not present to be authenticated.
Some people propose identifying patients by demographic data such as name, date of birth, mother's birth name, and/or address. Inconsistencies in these data elements are a source of trouble. In the case of names, comparison of databases shows inconsistent name order, full names in one source and initials in another, nicknames, and the occasional omission of suffixes such as Jr. or Sr. Many people have more than one address (for example, separate mailing and home addresses), and address entries are often incomplete. The recorded date of birth, particularly in the medical records setting, may be in error. The literature and my own experience suggest that approximately 30 percent of the entries in two databases being merged require resolution by a human.
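A minimal sketch of why demographic matching still needs a human follows, assuming a simplified record holding only a name and an ISO-format date of birth. The helper names (`name_tokens`, `names_compatible`, `classify`) and the suffix list are hypothetical. The sketch tolerates name order, punctuation, initials, and Jr./Sr. suffixes, but it cannot recognize nicknames, and it sends any pair whose names agree but whose dates of birth differ to human review.

```python
import re
from dataclasses import dataclass

# Generational suffixes that are inconsistently recorded (assumed list).
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


@dataclass(frozen=True)
class Demographics:
    name: str
    birth_date: str  # ISO 8601 date, e.g. "1950-03-14"


def name_tokens(name: str) -> list[str]:
    """Uppercase the name, split on anything that is not a letter, drop suffixes."""
    tokens = re.split(r"[^A-Z]+", name.upper())
    return [t for t in tokens if t and t not in SUFFIXES]


def _token_agrees(a: str, b: str) -> bool:
    # Exact match, or one side is a single-letter initial of the other.
    return a == b or (len(a) == 1 and b.startswith(a)) or (len(b) == 1 and a.startswith(b))


def names_compatible(a: str, b: str) -> bool:
    """True if every token of the shorter name agrees with a distinct token of the longer.

    Token order is ignored, so "Smith, John" and "John Smith" agree.
    """
    shorter, longer = sorted((name_tokens(a), name_tokens(b)), key=len)
    if not shorter:
        return False
    remaining = list(longer)
    for tok in shorter:
        for i, cand in enumerate(remaining):
            if _token_agrees(tok, cand):
                del remaining[i]  # each token may be used only once
                break
        else:
            return False
    return True


def classify(a: Demographics, b: Demographics) -> str:
    """Return "match", "human review", or "no match" for two demographic records."""
    if names_compatible(a.name, b.name):
        return "match" if a.birth_date == b.birth_date else "human review"
    return "no match"
```

For example, "Smith, John Jr." and "JOHN SMITH" with the same date of birth classify as "match", while "Bill Smith" and "William Smith" classify as "no match" even if they are the same person.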
The Social Security Act was signed into law by Franklin D. Roosevelt on August 14, 1935. The Social Security Board recommended adopting a nine-digit numbering system for identification purposes and in November 1936 was granted authority by the Treasury Department to assign numbers to people who were employed. The SSN is a nine-digit number divided into three groups, in the form 999-99-9999. The first three digits, called the area number, are determined by the address shown on the application for the SSN (now based on ZIP code). Initially, the United States was divided into 579 areas, numbered 001 to 579, and each state was assigned area numbers based on the number of people in the state expected to be assigned an SSN. At present, area numbers 001 through 647, with the exception of 588, have been assigned. In addition, area numbers 700 through 728 were assigned to railroad workers until 1963, when the practice was discontinued. Because people move so often, the area number has little meaning today. The next two-digit group, called the group number, has no special significance except to break the numbers into convenient blocks. The last four digits, called the serial number, are assigned sequentially within each group. No group of digits in an SSN ever consists entirely of zeros.
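The structure just described can be checked mechanically. The sketch below, with hypothetical function names, parses the 999-99-9999 form and rejects any all-zero group. The area ranges it tests are only the ones listed above, which are a snapshot of assignments at the time of writing, not a validity rule.

```python
import re
from typing import NamedTuple

# Form 999-99-9999: area number, group number, serial number.
SSN_FORM = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


class SSNParts(NamedTuple):
    area: str
    group: str
    serial: str


def parse_ssn(text: str) -> SSNParts:
    """Split an SSN into its three groups, rejecting malformed or all-zero groups."""
    match = SSN_FORM.fullmatch(text)
    if match is None:
        raise ValueError(f"not in 999-99-9999 form: {text!r}")
    parts = SSNParts(*match.groups())
    for field_name, value in zip(parts._fields, parts):
        if set(value) == {"0"}:
            raise ValueError(f"{field_name} number may not be all zeros: {text!r}")
    return parts


def area_listed_in_text(area: str) -> bool:
    """Area ranges reported as assigned at the time of writing (a snapshot only)."""
    n = int(area)
    return (1 <= n <= 647 and n != 588) or 700 <= n <= 728
```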
A study at Duke University that examined the SSNs of approximately 150,000 individuals found that the last six digits were uniformly distributed. This uniformity is particularly valuable for certain hash-code indexing techniques, because keys derived from those digits tend to spread evenly across the buckets of a hash table.
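A minimal sketch of such an index follows, assuming a simple division-remainder hash in which the bucket is the last six digits taken modulo the table size. The area number is left out because area numbers were allocated by state and so are unlikely to be evenly spread. Function names and table sizes are illustrative.

```python
from collections import defaultdict


def bucket_for(ssn: str, table_size: int) -> int:
    """Hash an SSN to a bucket using only its last six digits (group + serial)."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or any(c not in "0123456789" for c in digits):
        raise ValueError(f"expected nine digits: {ssn!r}")
    return int(digits[-6:]) % table_size


def build_index(ssns: list[str], table_size: int) -> dict[int, list[str]]:
    """Group SSNs into hash buckets; uniform trailing digits keep buckets balanced."""
    index: dict[int, list[str]] = defaultdict(list)
    for ssn in ssns:
        index[bucket_for(ssn, table_size)].append(ssn)
    return dict(index)
```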
In the 1960s, use of the SSN spread to the Internal Revenue Service for tax purposes, the Department of Defense for military identification, and the Civil Service Commission for employee identification. In 1976, states were authorized to use the SSN for tax purposes, public assistance, and driver's license or motor vehicle registration. A number of states use the SSN on the driver's license.
Analysis and Forecast
Value of a Universal Citizen Identifier
Simply put, the most reliable method of integrating data from multiple sources is to have a unique identification number known to all sources. In the absence of such a number, combining data from multiple sources or even reliably identifying a person within a single source is difficult. If we fail to identify a person in the health care environment, that person's data are split into multiple records and valuable data are misplaced.
Community health care information networks (CHINs) and statewide alliances, in which health care information about a person is available, with proper safeguards, to the people responsible for that person's care, are becoming popular. Failure to associate known health care data with a patient can lead to serious consequences. For example, if a patient who is allergic to a certain drug is misidentified, the record of that allergy may not be retrieved and the allergy could be missed. If we believe that information about the patient's health, medications, allergies, problems, and treatment plans is important, then we must be sure that the information is available to the proper health care providers. The most reliable way to make that happen is through a unique universal identifier.
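The allergy example can be made concrete with a small sketch, assuming each source supplies simple (identifier, fact) pairs; the data and the function name are hypothetical. With a shared identifier, the two sources fold into one logical record. A single transposition splits the patient in two, and the allergy recorded at the hospital never appears alongside the clinic's prescription.

```python
from collections import defaultdict


def merge_by_identifier(sources: dict[str, list[tuple[str, str]]]) -> dict[str, list[tuple[str, str]]]:
    """Combine (identifier, fact) entries from several sources into one record per identifier."""
    merged: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for source, entries in sources.items():
        for identifier, fact in entries:
            merged[identifier].append((source, fact))
    return dict(merged)


# One patient; the clinic keyed its entry with two digits transposed.
sources = {
    "hospital": [("123-45-6789", "allergy: penicillin")],
    "clinic": [("123-45-6798", "prescription: amoxicillin")],
}
records = merge_by_identifier(sources)
# The clinic's logical record carries no allergy entry.
print(records["123-45-6798"])  # [('clinic', 'prescription: amoxicillin')]
```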
Requirements for a Universal Citizen Identifier
The universal citizen identifier (UCI) must be unique. Each person must possess one and only one identification number. A UCI number, once assigned, can never be reassigned. A UCI should be assigned at birth or when a person becomes a resident of this country.
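These rules can be enforced with very little machinery. In the minimal sketch below, numbers come from a counter that only moves forward, so a retired number can never be handed out again, and a person key that already holds a number is refused a second one. The class name and the `person_key` are hypothetical; the key stands in for an already verified identity, and establishing that identity is the hard part, addressed under Validation of the UCI.

```python
class IdentifierRegistry:
    """Hypothetical issuer enforcing two UCI rules: one number per person, never reassigned."""

    def __init__(self, first_number: int = 1) -> None:
        self._next = first_number                # only ever increases, so no number is reused
        self._number_of: dict[str, int] = {}     # person key -> assigned number
        self._retired: set[int] = set()          # e.g. numbers of deceased persons

    def assign(self, person_key: str) -> int:
        """Issue a fresh number; refuse a second number for the same person key."""
        if person_key in self._number_of:
            raise ValueError(f"{person_key!r} already holds number {self._number_of[person_key]}")
        number = self._next
        self._next += 1
        self._number_of[person_key] = number
        return number

    def retire(self, number: int) -> None:
        """Mark a number inactive; it stays on file and is never issued again."""
        if number not in self._number_of.values():
            raise KeyError(number)
        self._retired.add(number)

    def is_retired(self, number: int) -> bool:
        return number in self._retired
```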
The UCI should be context-free. The UCI is a pointer to data about a person. It should not attempt to convey any information about gender, age, or the geographical area where the person was born or now lives. Its sole purpose is to link the person to data in one or more data banks.
A system must be established for creating identification numbers for foreign visitors and illegal aliens. Such numbers must also be unique and must never be reassigned. International telephone numbers, which begin with a country code, already vary in length and format, and a similar scheme might be used for personal identifiers. The popularity of international travel and the availability of the Internet make it feasible to transmit a person's health record to any country. A known identification number would make that process more reliable.
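Telephone country codes can be read without a separator because they are prefix-free: no code is the beginning of another, so at most one code can match the start of a number. A minimal sketch of the same idea for personal identifiers follows. The prefix table, its labels, and the function names are hypothetical; only the code lengths are modeled on telephone country codes.

```python
# Hypothetical prefix table; like telephone country codes, it is prefix-free.
COUNTRY_PREFIXES = {"1": "country A", "44": "country B", "353": "country C"}


def is_prefix_free(codes) -> bool:
    """True if no code is the beginning of a different code."""
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def split_international_id(identifier: str) -> tuple[str, str]:
    """Split a variable-length identifier into (country prefix, national number).

    Because the table is prefix-free, at most one prefix can match, so the
    iteration order of the table does not matter.
    """
    for prefix in COUNTRY_PREFIXES:
        if identifier.startswith(prefix) and len(identifier) > len(prefix):
            return prefix, identifier[len(prefix):]
    raise ValueError(f"no known country prefix: {identifier!r}")
```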
One of the most common errors that result in the misidentification of a patient, even when a patient identification number is used, is the transposition of two digits. A check digit provides a solution. There are several check digit algorithms. Generally, each digit of the identifier is multiplied, in order, by a weighting factor; the products are summed, the sum is divided by a fixed modulus, and the remainder, or a value derived from it, is taken as the check digit. This digit becomes part of the identification number and is entered into the computer. The computer, in turn, recalculates the check digit and compares it with the entered one. If they match, the entered number is assumed correct; if they differ, the number is rejected. ASTM recommends the use of a lookup table to determine the check digit.
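The text does not reproduce the ASTM table, so the sketch below substitutes the Damm algorithm, a published lookup-table check digit scheme; it is not claimed to be the ASTM method. It was chosen because it detects every single-digit error and every transposition of adjacent digits while always producing a numeric check digit. By contrast, a weighted sum modulo 10 that catches all single-digit errors cannot also catch every adjacent transposition, and a weighted sum modulo 11 can leave a remainder of 10, which is not a single numeral. The function names are hypothetical.

```python
# Damm quasigroup table: order 10, a Latin square with zeros on the diagonal.
DAMM_TABLE = (
    (0, 3, 1, 7, 5, 9, 8, 6, 4, 2),
    (7, 0, 9, 2, 1, 5, 4, 8, 6, 3),
    (4, 2, 0, 6, 8, 7, 1, 3, 5, 9),
    (1, 7, 5, 0, 9, 8, 3, 4, 2, 6),
    (6, 1, 2, 3, 0, 4, 5, 9, 7, 8),
    (3, 6, 7, 4, 2, 0, 9, 5, 8, 1),
    (5, 8, 6, 9, 7, 2, 0, 1, 3, 4),
    (8, 9, 4, 5, 3, 6, 2, 0, 1, 7),
    (9, 4, 3, 8, 6, 1, 7, 2, 0, 5),
    (2, 5, 8, 1, 4, 3, 6, 7, 9, 0),
)


def _interim(digits: str) -> int:
    """Run the digits through the table, starting from interim digit 0."""
    interim = 0
    for ch in digits:
        if ch not in "0123456789":
            raise ValueError(f"non-digit character: {ch!r}")
        interim = DAMM_TABLE[interim][int(ch)]
    return interim


def with_check_digit(number: str) -> str:
    """Append the check digit (the final interim digit) to the number."""
    return number + str(_interim(number))


def is_valid(number_with_check: str) -> bool:
    """A number carrying its correct check digit always reduces to 0."""
    return _interim(number_with_check) == 0
```

Applied to a nine-digit SSN, `with_check_digit` yields a ten-digit identifier; for example, `with_check_digit("572")` returns `"5724"`, and swapping any two adjacent, different digits of the result makes `is_valid` return `False`.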
The UCI should use both letters of the alphabet and numerals. Letters that might be mistaken for numerals should be omitted; for example, the letters "O" and "Q" might be confused not only with each other but also with the numeral "0." If lower case letters are used, the letter "l" might be mistaken for the numeral "1." In any case, an alphabet of some 30 symbols (or, with lower case, some 62) would yield enough unique combinations to handle the population of the world for a long time. For economic reasons, I recommend that numerals be used as long as unique combinations are available, and that letters then be added one position at a time. Most legacy systems could accommodate numerals without a problem, and there would be ample opportunity to plan for the accommodation of letters.
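The capacity argument is simple arithmetic: an alphabet of k symbols in n positions gives k^n combinations. The sketch below finds the shortest length that covers a given population. The three alphabets and the round population of ten billion are illustrative assumptions, not forecasts.

```python
import string

# Illustrative alphabets (assumptions): numerals only; numerals plus upper-case
# letters minus "O" and "Q"; numerals plus all upper- and lower-case letters.
DIGITS = string.digits
UPPER_ALNUM = DIGITS + "".join(c for c in string.ascii_uppercase if c not in "OQ")
FULL_ALNUM = DIGITS + string.ascii_uppercase + string.ascii_lowercase


def min_length(alphabet_size: int, population: int) -> int:
    """Smallest length whose alphabet_size ** length combinations cover the population."""
    if alphabet_size < 2 or population < 1:
        raise ValueError("need at least two symbols and one person")
    length, capacity = 1, alphabet_size
    while capacity < population:
        length += 1
        capacity *= alphabet_size
    return length


POPULATION = 10 ** 10  # illustrative round figure, not a forecast
for name, alphabet in (("numerals", DIGITS), ("upper-case", UPPER_ALNUM), ("mixed-case", FULL_ALNUM)):
    print(f"{name}: {len(alphabet)} symbols -> {min_length(len(alphabet), POPULATION)} positions")
```

Under these assumptions it reports 10 positions for numerals only, 7 for the 34-symbol alphabet, and 6 for the 62-symbol one, which is why adding letters one position at a time buys capacity quickly.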
Validation of the UCI
The biggest problem with any personal identifier system is establishing and maintaining an error-free link between the actual person and the associated number. The Internal Revenue Service recently reported 6.5 million cases of missing, invalid, or duplicate Social Security numbers (Fix, 1995). Most of these errors were recording errors; other duplications arose from attempts to defraud the IRS. In one case, a single SSN was used more than 400 times. There is no question that duplicate SSNs exist. One story holds that when the SSN program was announced in the newspapers, a sample SSN was included, and many people apparently accepted that published number as their own SSN. Another story is that many people who bought a new wallet containing a dummy SSN card accepted that number as their SSN. In some cases, the SSA apparently reissued SSNs. In still other cases, people made recording errors and have been using incorrect numbers for many years. Over a number of years, increased use of the SSN has significantly reduced these duplications.
Validation of the UCI will require the creation of a database containing demographic and identifying data about every resident of the United States. Defining this database will require considerable thought, and it will ultimately be a trade-off between what is required to identify an individual uniquely and what should be excluded to protect the rights of the individual. This database could be used for other purposes as well; for example, it would reduce the effort required to produce a census and make population-based statistics easier to compile. Many citizens would not be concerned about the existence of such a database; others would consider any database an invasion of privacy. Nonetheless, everybody is already in many databases, and the anonymity of these databases permits easy abuse. Legislation would be required to protect the contents and use of such a database. This topic is explored below.
Keeping a UCI database up to date would be a difficult challenge. Some items should never change, others might change infrequently, and still others might change fairly often. Elements in the database would include a person's name, gender, marital status, race or ethnicity, date of birth, and address. Persons would be responsible for informing the agency of changes, perhaps as part of some annual event.
Advantages of Using the SSN Over Other Proposals
Assuming that a personal identifier system is adopted, it would have to be administered by some agency. One possibility is that a private, trusted authority could be given responsibility for assigning the UCI and maintaining the accompanying database. Another is that a new government agency could be created to administer the UCI. A third option is to use the existing SSA to administer the UCI program. Setting up a new agency, with its accompanying bureaucracy, would take longer and cost more than using an existing one.
There are over 1300 Social Security offices distributed around the United States where a person can apply for and receive a Social Security number. Evidence of identity, age, and U.S. citizenship or lawful status is required. All applicants over the age of 18 must apply in person. Individuals under age 18 or those seeking replacement cards may apply in person or by mail. Nonwork SSNs may be assigned to illegal aliens if they receive benefits payable in some part from federal funds.
SSNs are assigned at the SSA's central headquarters in Baltimore. Key data elements are a person's full name, date and place of birth, mother's maiden name, and father's name. These elements are used to screen the SSA database to prevent the assignment of more than one number to the same person. If no match occurs, a new SSN is assigned. If a significant match occurs, a replacement card is issued. The current system assigns an SSN within 24 hours of receipt of the application. Cards are sent by mail and usually require 7 to 10 days for delivery.
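The screening step can be sketched as follows, assuming a simple rule in which an application counts as a significant match when at least four of the five key data elements agree after normalizing case and spacing. The threshold, the normalization, and all names are hypothetical; the SSA's actual matching criteria are not described here.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class Application:
    """The key data elements named in the text."""
    full_name: str
    birth_date: str
    birth_place: str
    mother_maiden_name: str
    father_name: str


# Hypothetical rule: a "significant match" means at least 4 of the 5 key elements agree.
MATCH_THRESHOLD = 4


def _normalize(value: str) -> str:
    # Ignore case and runs of whitespace; punctuation still counts as a difference.
    return " ".join(value.upper().split())


def screen(app: Application, on_file: dict[str, Application]) -> tuple[str, Optional[str]]:
    """Return ("replacement card", ssn) on a significant match, else ("new number", None)."""
    for ssn, record in on_file.items():
        agreeing = sum(
            _normalize(a) == _normalize(b) for a, b in zip(astuple(app), astuple(record))
        )
        if agreeing >= MATCH_THRESHOLD:
            return "replacement card", ssn
    return "new number", None
```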
In 1989, the SSA began a program under which an SSN can be assigned to a child as part of the birth registration process. This procedure currently requires a parent's approval. More than 75 percent of birth registrations now include a request for an SSN, and the percentage is increasing.
As of March 3, 1993, 363,336,983 SSNs had been issued. The number of currently active SSNs (of living people) is estimated to be approximately 250 million. It is estimated that approximately 4 million individuals may have more than one number.
The Privacy Act of 1974 (5 U.S.C. 552a) states that it is unlawful for any federal, state, or local government to deny an individual any legal rights or benefits because the individual refuses to disclose his/her SSN. There is no legislation concerning the use of the SSN by nongovernment entities.
The SSA recently announced that it was undergoing a reorganization. I recommend giving the SSA the authority and the funding required to administer a UCI program.
The Case for a Single Identifier for All Purposes
The increasing use of the SSN for identification purposes supports the argument that a universal, unique identifier has value. If each individual had only one number to use for every identification purpose, federal agencies, vendors, health care agencies, and any other organization that creates a database would realize considerable savings. The suggestion that a single number could be used to access patient data in any of these databases, or to join data from any database regardless of purpose or owner, is frightening. Yet, in this age of connectivity and computerization, linking records across different numbering systems is a trivial problem, particularly if 100 percent accuracy is not sought. Anyone who thinks that confidentiality is preserved by requiring different numbers is misinformed. I would argue the opposite: with a single number, it would be possible to apply more positive controls to make sure that the number is not misused. I therefore recommend that the UCI be permitted for use in any legal operation, subject to the individual's approval.
Confidentiality, Privacy, and Security
In a recent opinion poll conducted by the Louis Harris organization, 85 percent of those polled agreed that protecting the confidentiality of people's medical records is essential (Louis Harris and Associates, 1993). In the same poll, 67 percent indicated that they preferred the SSN as the national health care ID number.
There can never be any security in a publicly known personal identifier. Security and protection of an individual's privacy must be provided through each database and the supporting applications. All individuals have certain rights relating to who sees data about them, how those data are used, and the opportunity to review and correct errors in the database. In the case of health care data, the patient should be able to define, in writing, by whom and under what circumstances those data may be used. On the other hand, a health care provider should be told when data are being withheld and, except in emergency situations, should be able to refuse treatment. In an emergency situation, if the provider makes an incorrect decision due to lack of complete information, that provider should be protected from malpractice lawsuits. Individuals should be able to request a list of all persons who have accessed their data.
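The rights listed above map onto a small access-control policy. The minimal sketch below is one reading of them, not a specification: a provider sees the record only if the patient authorized that provider; otherwise the provider is told that data are withheld and, except in an emergency, may decline to treat. Every granted access is recorded so the patient can ask who has seen the data. The class, field, and identifier names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    granted: bool
    provider_told_data_withheld: bool
    provider_may_refuse_treatment: bool


@dataclass
class ConsentRegistry:
    """Hypothetical store of each patient's written authorizations and of granted accesses."""
    authorized: dict[str, set[str]] = field(default_factory=dict)   # patient -> providers
    access_log: list[tuple[str, str]] = field(default_factory=list)  # (patient, provider)

    def authorize(self, patient: str, provider: str) -> None:
        self.authorized.setdefault(patient, set()).add(provider)

    def request(self, patient: str, provider: str, emergency: bool = False) -> Decision:
        granted = provider in self.authorized.get(patient, set())
        if granted:
            self.access_log.append((patient, provider))
        # When data are withheld, the provider is told so and, outside an
        # emergency, may decline to treat; in an emergency the provider treats
        # without the withheld data.
        return Decision(
            granted=granted,
            provider_told_data_withheld=not granted,
            provider_may_refuse_treatment=not granted and not emergency,
        )

    def accessors(self, patient: str) -> list[str]:
        """Everyone who has accessed this patient's data."""
        return sorted({prov for pat, prov in self.access_log if pat == patient})
```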
The inability to correctly match a patient's data to a patient ID might actually result in less, rather than more, protection of confidentiality. For example, if a patient specified that "my primary care physician" could see the record, but the patient's ID did not match the record, such a discrepancy could permit inappropriate access to data. Overly strict rules, particularly computer-enforced ones, are risky where patient care is involved: blocking health care providers from access could lead to serious consequences in an emergency. Proper education of users of data and emphasis on the need to preserve confidentiality are of particular importance.
Federal legislation must be passed making it illegal to acquire data of any type against a person's written wishes. If legally or illegally acquired data are used for purposes for which they were not intended, the individual acquiring the data should be punished by law. Such action should be considered to be as serious as a bank robbery, and punishment should be similar. Individual confidentiality can only be assured through legal constraints. It cannot be achieved through confusing identifiers that might prevent databanks from being accessed or linked.
Federal legislation should also spell out the security requirements for each organization that uses the UCI as the pointer to data contained within its databank. Each such organization should be required to have an information security officer who would ensure that confidentiality and security requirements are met.
I recommend that legislation be passed that will task and fund the SSA to administer a universal citizen identifier, which may be used for a variety of purposes, including as a patient identifier. Use of this number for a databank must be requested by an organization and approved by the SSA. Each access to data must be logged by individual and organization, date and time, and purpose. The UCI would be based on the SSN: the currently assigned SSN plus a check digit. In establishing the validating databank, the SSA would eliminate duplicates. An added advantage of this approach would be eliminating errors in calculating and paying Social Security benefits.
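The approval and logging requirements can be sketched as follows, assuming an in-memory set of approved organizations and an append-only log; the names and the sample UCI are hypothetical. Each entry carries the four items the recommendation names: the individual, the organization, the date and time, and the purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of organizations whose databanks the SSA has approved for UCI use.
APPROVED_ORGANIZATIONS = {"County Hospital"}


@dataclass(frozen=True)
class AccessEntry:
    uci: str
    individual: str
    organization: str
    timestamp: datetime
    purpose: str


def record_access(log: list[AccessEntry], uci: str, individual: str,
                  organization: str, purpose: str,
                  when: Optional[datetime] = None) -> AccessEntry:
    """Refuse unapproved organizations; otherwise append one audit entry."""
    if organization not in APPROVED_ORGANIZATIONS:
        raise PermissionError(f"{organization!r} is not approved to use the UCI")
    entry = AccessEntry(uci, individual, organization,
                        when or datetime.now(timezone.utc), purpose)
    log.append(entry)
    return entry


def audit_trail(log: list[AccessEntry], uci: str) -> list[AccessEntry]:
    """All logged accesses for one UCI, oldest first."""
    return sorted((e for e in log if e.uci == uci), key=lambda e: e.timestamp)
```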
New UCIs would be issued electronically to newborns and to individuals moving to this country, either as citizens or as legal entrants. Illegal aliens would be assigned a number from a selected and identifiable set. Foreign visitors would also be assigned a permanent number. Legislation protecting the use of the UCI and guaranteeing protection of the rights of an individual would be simultaneously introduced.
Electronic access to a regional office would be through the Internet, a state information network, or even a modem connection, and information would be transmitted electronically. That information would be verified before the assignment of the UCI was made permanent. Special efforts would be made to avoid fraud; SSN cards would be coded to make the creation of false cards very difficult.
The American College of Medical Informatics, of the American Medical Informatics Association (ACMI, 1993), the Computer-based Patient Record Institute, and the Working Group for Electronic Data Interchange have all recommended the use of the SSN as a UCI. Several states are now using the SSN for identification purposes, including in the management of health care benefits. Many third-party payers use the SSN as the basis for the subscriber identification.
We recognize the emotional issues associated with the use of a UCI (Donaldson and Lohr, 1994; Task Force on Privacy, 1993). Those concerns are legitimate and understandable. Unfortunately, the suggested solution of having no universal identifier, or of restricting such an identifier to use only in the health care setting, will provide little protection. Instead, open use of an identifier with safeguards and audits will provide greater protection. The advantages of being able to integrate personal health care data across a variety of settings and systems far outweigh the risks of such a system. The important thing is to recognize that the use of a universal health care identifier, and specifically the SSN, does not in itself mean a lack of concern for patient confidentiality or an inability to preserve that confidentiality.
Already we are paying a penalty for the lack of such an identifier. Time is important. Now is the time for action.
Much of the information relating to the SSA was taken from an early draft of an ASTM document (ASTM, 1995), "Guide for the Properties of a Universal Healthcare Identifier," written by Dr. Elmer R. Gabrieli and provided by Andrew J. Young, deputy commissioner for programs, Social Security Administration.
American College of Medical Informatics (ACMI). 1993. "Resolution for the Use of the SSN as a Universal Patient Identifier," ACMI, Bethesda, Md., February.
ASTM. 1995. "Guide for the Properties of a Universal Healthcare Identifier," draft proposal developed by ASTM, Philadelphia, Pa., January.
Donaldson, Molla S., and Kathleen N. Lohr (eds.). 1994. Health Data in the Information Age: Use, Disclosure and Privacy. Institute of Medicine, National Academy Press, Washington, D.C.
Fix, Janet L. 1995. "IRS Counts 6.5 Million Errors So Far," USA Today, April 5.
Louis Harris and Associates (in association with Alan Westin). 1993. Health Information Privacy Survey 1993. A survey conducted for EQUIFAX Inc. by Louis Harris and Associates, New York.
Task Force on Privacy. 1993. Health Records: Social Needs and Personal Privacy. Task Force on Privacy, Office of the Assistant Secretary for Planning and Evaluation and the Agency for Health Care Policy and Research, Washington, D.C., February 11–12.
Work Group on Computerization of Patient Records. 1993. Toward a National Health Information Infrastructure. U.S. Department of Health and Human Services, Washington, D.C., April.