Suggested Citation:"Software Demonstrations." National Research Council. 1999. Record Linkage Techniques -- 1997: Proceedings of an International Workshop and Exposition. Washington, DC: The National Academies Press. doi: 10.17226/6491.

Software Demonstrations

■ MatchWare Product Overview

Matthew A. Jaro, MatchWare Technologies, Inc.

Probabilistic linkage technology makes it feasible to link large data files and achieve results governed by mathematical principles which adhere to statistically valid standards. The problem addressed by this methodology is that of matching two data files under conditions of uncertainty. The objective is to identify and link records which represent a common entity, whether that entity is an individual, a family, an event, a business, an institution, or an address. Alternatively, the goal might be to unduplicate a single data file or to group records by categories of commonality. Each field participating in the linkage comparison is subject to error, which is measured by the probability that the field agrees given that a record pair matches versus the probability of chance agreement of its values. Thus, when one calculates the likelihood of a correct match or link while allowing for incomplete and/or erroneous values within the records, the process is said to be probabilistic. I. P. Fellegi and A. B. Sunter pioneered record linkage theory in the late 1950s. The first practical implementation of probabilistic linkage methodology in the United States was designed, programmed, and tested by Matt Jaro on behalf of the U.S. Census Bureau in 1985, while conducting research into establishing a model to support census coverage undercount evaluation and analysis.
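The two probabilities described above are conventionally called m (agreement given a true match) and u (chance agreement), and a common way to turn them into field weights is the base-2 log ratio. The following is a minimal sketch of that standard form only; the function names and the m/u values are illustrative assumptions and do not reproduce the weight scoring formulae of any MatchWare product.

```python
import math

def field_weights(m, u):
    """Return (agreement, disagreement) log2 weights for one field.

    m: probability the field agrees given the pair is a true match
    u: probability the field agrees by chance (pair is not a match)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree

def pair_score(agreements, params):
    """Sum the field weights for one record pair.

    agreements: dict mapping field name -> True/False (did the field agree?)
    params: dict mapping field name -> (m, u)
    """
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(*params[field])
        total += agree_w if agrees else disagree_w
    return total

# Hypothetical m/u values chosen for illustration only.
PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "sex": (0.98, 0.50),
}

score = pair_score({"surname": True, "birth_year": True, "sex": False}, PARAMS)
```

A field that rarely agrees by chance (small u) earns a large positive weight when it agrees, while disagreement on a reliable field (large m) earns a negative weight; the summed score is then compared against thresholds to decide whether to link the pair.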

Probabilistic record linkage methodology is imperative if computers are to consistently and effectively replicate the evaluation and judgment process of human clerks attempting to link common records. The ideal goal is to have the computer emulate the intuitive thought process of a human being as they might review, judge, evaluate, measure, and score linkage qualifications of records representing commonality.

MatchWare's development, systems design, and programming staff adhere strictly to ANSI C programming language standards for all software implementations. As a result, MatchWare software is highly portable across platforms and can be integrated into a wide range of application-specific systems. The company currently offers the following products:

AutoStan is an intelligent pattern recognition parsing system which conditions records into a normalized, standardized fixed-field format. AutoStan optimizes the performance of any linkage or matching system which utilizes consumer or business names and/or address data as identifiers during a match comparison.

AutoMatch is a state-of-the-art software implementation of probabilistic record linkage methodology for matching records under conditions of uncertainty. AutoMatch simulates the thought process a human being might follow while examining and identifying data records representing a common entity or event. AutoMatch's comparative algorithms manage a comprehensive range of data anomalies and utilize frequency analysis methodology to precisely discriminate weight score values.

AutoStan and AutoMatch are stand-alone, self-contained software systems which include numerous support utilities and require no other ancillary software. Both systems are generalized and support a wide range of mission critical record linkage applications. AutoStan and AutoMatch adhere to widely accepted standards of statistical methodology to ensure valid results and the highest levels of data integrity. Users have ready access to Rule/Table Portfolios in order to calibrate the software for their particular requirements.

MatchWare/CL is a callable library (API) version of AutoStan and AutoMatch functionality in executable module form. MatchWare/CL utilizes AutoStan and AutoMatch Rule/Table Portfolios, weight scoring formulae, and statistical algorithms. MatchWare/CL is compatible with any database management system or user interface, and has been integrated into a variety of application-specific systems.

Both AutoStan and AutoMatch are generalized and support a wide range of mission critical health data registry, geocoding, and database marketing applications.

For more information, contact Max Eveleth Jr., Executive Vice President, MatchWare Technologies, Inc., 153 Port Road—2nd Floor, Kennebunk, ME 04043–5135; Phone: (207) 967–2225; Fax: (207) 967–8362; or e-mail: meveleth@matchware.com.

■ µ- and t-ARGUS: Software Packages for Statistical Disclosure Control

Anco J. Hundepool, Agnes Wessels and Lars van Gemerden, Statistics Netherlands

In recent years, Statistics Netherlands has developed a prototype version of a software package, ARGUS, to protect microdata files against statistical disclosure. The launch of the SDC-project within the 4th framework of the European Union has enabled us to make a new start with the development of software for Statistical Disclosure Control (SDC). More information on the SDC-project can be found at http://www.cbs.nl/sdc.

This prototype has served as a starting point for the development of µ-ARGUS, a software package for the SDC of microdata files. The aim is to produce a data file for which the risk of disclosure has been minimized and which can be supplied to researchers and other users. The basic principle of µ-ARGUS is that frequency tables of combinations of identifying variables are inspected. If the frequency in a cell is too low, the corresponding combination does not occur frequently enough in the population, and the corresponding records can therefore easily be identified by an intruder. The techniques used in µ-ARGUS to solve these problems are global recoding (using less detailed code lists) and local suppression (imputing missing values in these combinations).
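The frequency check and the two protection techniques can be sketched as follows. This is a simplified illustration under stated assumptions, not the µ-ARGUS implementation: the function names, field names, and threshold are hypothetical, frequencies in the file stand in for population frequencies, and the suppression step simply blanks one chosen variable rather than searching for a minimal set of suppressions.

```python
from collections import Counter

def unsafe_combinations(records, keys, threshold):
    """Return the value combinations of `keys` that occur fewer than `threshold` times."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo for combo, n in counts.items() if n < threshold}

def locally_suppress(records, keys, threshold, target):
    """Set `target` to None (missing) in every record whose combination is rare."""
    unsafe = unsafe_combinations(records, keys, threshold)
    out = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        if tuple(r[k] for k in keys) in unsafe:
            r[target] = None
        out.append(r)
    return out

def globally_recode(records, field, mapping):
    """Replace detailed codes of `field` by coarser ones; unmapped codes are kept."""
    return [{**r, field: mapping.get(r[field], r[field])} for r in records]

# Hypothetical microdata with two identifying variables.
records = [
    {"region": "North", "occupation": "teacher"},
    {"region": "North", "occupation": "teacher"},
    {"region": "North", "occupation": "mayor"},
    {"region": "South", "occupation": "teacher"},
    {"region": "South", "occupation": "teacher"},
]
keys = ["region", "occupation"]

rare = unsafe_combinations(records, keys, threshold=2)
suppressed = locally_suppress(records, keys, threshold=2, target="occupation")
recoded = globally_recode(
    records, "occupation", {"teacher": "public sector", "mayor": "public sector"}
)
```

In this toy file the only rare combination is ("North", "mayor"). Local suppression removes the occupation from that one record, whereas global recoding coarsens the occupation code for every record so that no rare combination remains.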

The SDC-project also plans to develop t-ARGUS, software devoted to the SDC of tabular data. t-ARGUS takes the dominance rule as a starting point to identify the unsafe (primary) cells, although other rules could be used as well. Global recoding is applied to eliminate most of the unsafe cells, and optimization techniques are used to find an optimal set of secondary cells, which must be suppressed to protect the primary unsafe cells.
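In its usual (n, k) form, the dominance rule flags a cell as unsafe when its n largest contributors account for more than a share k of the cell total. The sketch below illustrates only that primary-cell test; the parameter values and cell data are hypothetical, and the secondary suppression step, which requires an optimization model, is not shown.

```python
def is_unsafe(contributions, n=2, k=0.85):
    """Dominance rule: True if the n largest contributions exceed share k of the total."""
    total = sum(contributions)
    if total <= 0:
        return False  # empty or zero cells carry nothing to disclose in this sketch
    top = sum(sorted(contributions, reverse=True)[:n])
    return top / total > k

def primary_unsafe_cells(table, n=2, k=0.85):
    """Return the names of cells that fail the dominance rule, in sorted order."""
    return sorted(name for name, contribs in table.items() if is_unsafe(contribs, n, k))

# Hypothetical table: each cell lists the contributions of its individual respondents.
table = {
    "A": [90, 5, 5],    # two largest contributors hold 95% of the total
    "B": [30, 30, 40],  # two largest contributors hold 70% of the total
}

unsafe = primary_unsafe_cells(table)
```

With n = 2 and k = 0.85, cell A is flagged because a single large respondent could be estimated closely from the published total, while cell B is considered safe.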

Both µ- and t-ARGUS have been developed for Windows 95 PCs. However, we have developed ARGUS using Borland C++, which raises the possibility of easily generating modules (the parts of ARGUS accessing large data files) to be used on other platforms such as UNIX.

Further information can be obtained from Anco Hundepool, Department for Statistical Methods, Statistics Netherlands, P.O. Box 4000, 2270 JM Voorburg, The Netherlands; tel: +31–70–3375038; fax: +31–70–3375990; or e-mail: argus@cbs.nl or ahnl@cbs.nl.

■ OX-LINK: The Oxford Medical Record Linkage of the PC Version

Leicester E. Gill, University of Oxford, UK

The micro-computer version of OX-LINK is being used to match a dataset containing 150,000 hospital discharge and vital records. The matching and linking process is undertaken in three stages:

  • The creation of an ONCA header, which is attached to every record on the dataset.

  • Sorting the file on the keys which are stored in the ONCA header.

  • Running OX-LINK to create a file of potential match pairs. A number of output files are produced which are used for verification of the match by clerical staff. The threshold weight matrix can be edited using Microsoft EDIT, and the whole of this stage can be rerun to demonstrate the changes in acceptance weight.
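The three stages above follow a familiar key, sort, and compare pattern, which can be sketched as below. Everything here is an assumption made for illustration: `make_key` is a placeholder and is not the ONCA algorithm, `simple_score` is a toy weight rather than the OX-LINK weight matrix, and the record fields are hypothetical.

```python
from itertools import combinations, groupby

def make_key(rec):
    """Placeholder sort key (NOT the ONCA header): surname initial plus birth year."""
    return (rec["surname"][:1].upper(), rec["birth_year"])

def simple_score(a, b):
    """Toy weight: one point for each field on which the two records agree."""
    return sum(a[f] == b[f] for f in ("surname", "forename", "birth_year"))

def candidate_pairs(records, score, threshold):
    """Attach keys, sort on them, and compare only records that share a key."""
    keyed = sorted(((make_key(r), r) for r in records), key=lambda kr: kr[0])
    pairs = []
    for _, group in groupby(keyed, key=lambda kr: kr[0]):
        block = [r for _, r in group]
        for a, b in combinations(block, 2):
            w = score(a, b)
            if w >= threshold:
                pairs.append((a["id"], b["id"], w))
    return pairs

# Hypothetical records.
records = [
    {"id": 1, "surname": "Smith", "forename": "John", "birth_year": 1950},
    {"id": 2, "surname": "Smith", "forename": "Jon", "birth_year": 1950},
    {"id": 3, "surname": "Smyth", "forename": "John", "birth_year": 1950},
    {"id": 4, "surname": "Jones", "forename": "Ann", "birth_year": 1962},
]

loose = candidate_pairs(records, simple_score, threshold=2)
strict = candidate_pairs(records, simple_score, threshold=3)
```

Sorting on the key brings likely matches next to each other, so comparisons are confined to small blocks instead of every possible pair. Rerunning with a different threshold, as in the last two lines, shows how the acceptance weight changes the set of pairs passed to clerical review.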

For more information, write to L. E. Gill, University of Oxford, Unit of Health-Care Epidemiology, Institute of Health Sciences, Old Road, Headington, Oxford, OX3 7LF; or e-mail: leicester.gill@clinical-epidemiology.ox.ac.uk or lester@pgme.warwick.ac.uk.

■ Software for Record Linkage of Primary Care Data

John R. H. Charlton, Office of National Statistics, UK

The UK Royal College of General Practitioners collected data on all consultations in sixty practices in England and Wales over a one-year period 1991/92. In addition, socio-economic data were collected by survey from all patients registered with these practices. Each practice was sent a copy of its own data and the data from all the practices were combined into one dataset containing information on about 1.5 million consultations and about half a million patients.

The software demonstrated was written so that individual practices could easily access their own data, without specialised database software or knowledge of the data structures and codes. Later, a modified program was written so that the Royal College of General Practitioners could extract data from the combined data from all practices. An anonymized version of the dataset was made available to other researchers and a further modified version of the program was produced for use with this dataset.

The program has two main functions: first, to enable researchers to link different parts of the dataset, particularly patients and diseases; and second, to provide data summaries such as frequencies and rates. It is based on the Paradox database software and written in PAL, the language provided with Paradox for DOS. An installation program is provided to convert the supplied ASCII files into the Paradox tables used by the program. The program can be run under either DOS or Windows.
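The two functions, linking patients to diseases and summarising the result as rates, can be illustrated with a small sketch. The original program is written in PAL; Python is used here only for illustration, and the table layouts, field names, and data are hypothetical rather than taken from the actual dataset.

```python
from collections import Counter

# Hypothetical patient and consultation tables.
patients = [
    {"patient_id": 1, "sex": "F"},
    {"patient_id": 2, "sex": "F"},
    {"patient_id": 3, "sex": "M"},
    {"patient_id": 4, "sex": "M"},
]
consultations = [
    {"patient_id": 1, "diagnosis": "asthma"},
    {"patient_id": 1, "diagnosis": "asthma"},
    {"patient_id": 3, "diagnosis": "asthma"},
    {"patient_id": 4, "diagnosis": "eczema"},
]

def patients_consulting_rate(patients, consultations, diagnosis, group_field):
    """Patients consulting for `diagnosis` per 1,000 registered patients, by group."""
    by_id = {p["patient_id"]: p for p in patients}
    denominators = Counter(p[group_field] for p in patients)
    # Link consultations to patients; count each patient once per diagnosis.
    consulting = {
        c["patient_id"]
        for c in consultations
        if c["diagnosis"] == diagnosis and c["patient_id"] in by_id
    }
    numerators = Counter(by_id[pid][group_field] for pid in consulting)
    return {g: 1000 * numerators[g] / n for g, n in denominators.items()}

asthma_by_sex = patients_consulting_rate(patients, consultations, "asthma", "sex")
eczema_by_sex = patients_consulting_rate(patients, consultations, "eczema", "sex")
```

Counting patients rather than consultations in the numerator keeps a patient who consults repeatedly for the same condition from inflating the rate; a consultation rate would instead count every row.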

For more information, contact Judith Charlton, 195 Warren Road, Orpington, Kent, BR6 6ES, U.K.; e-mail: 100025.1356@compuserve.com.

■ GRLS—Record Linkage

Kathy Zilahi, Statistics Canada

This product addresses the problem of trying to link records where no unique identifiers exist. Our Generalized Record Linkage System (GRLS) was developed to enable such problem linkages to be successfully accomplished. GRLS improves both the quality and the ease of your linkage.

Features

Based on statistical decision theory, GRLS breaks a linkage operation into three steps:

  • Search: Using comparison rules and associated linkage weights, the files are matched and a database of potential links is created.

  • Decide: Linkage weights are refined and by using threshold weights, the potential links are divided into sets of possible and definite links.

  • Group: Records which pertain to the same entity (person, business, etc.) are grouped together (the output of GRLS).
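The Decide and Group steps can be sketched as a two-threshold classification followed by a union of linked records. This is a minimal illustration under stated assumptions, not the GRLS implementation: the function names, thresholds, and weights are hypothetical, and grouping is shown over definite links only, on the assumption that possible links would first be reviewed.

```python
def decide(potential_links, lower, upper):
    """Split weighted links into definite (>= upper) and possible (>= lower) sets."""
    definite, possible = [], []
    for a, b, weight in potential_links:
        if weight >= upper:
            definite.append((a, b))
        elif weight >= lower:
            possible.append((a, b))
    return definite, possible

def group(record_ids, links):
    """Group records connected by links, using a simple union-find structure."""
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())

# Hypothetical output of the Search step: (record, record, linkage weight).
potential = [("a", "b", 12.0), ("b", "c", 9.5), ("c", "d", 4.0), ("d", "e", -3.0)]

definite, possible = decide(potential, lower=2.0, upper=8.0)
entities = group(["a", "b", "c", "d", "e"], definite)
```

Here records a, b, and c end up in one group even though a and c were never compared directly, because grouping is transitive across the definite links.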

The GRLS record linkage system:

  • provides a convenient framework for testing linkage parameters;

  • allows concurrent users for each linkage project;

  • allows background or interactive linkage;

  • eliminates confusion (and paper!) with on-line help;

  • makes your final linkage fast, cheap and accurate.

Applicability

GRLS handles one-file (internal) and two-file linkages such as:

  • unduplicating mailing address lists (one-file);

  • bringing hospital admission records together to build “case histories” (one-file);

  • epidemiology studies: e.g., linking a file of workers exposed to potential health hazards, to a mortality database for the purpose of detecting health risks associated with particular occupations (two-file).

Platform Specifications

GRLS uses a client-server architecture, where a PC is the client and a UNIX machine is the server. The ORACLE relational database management system Version 7.3 with SQL*Plus, PL/SQL, Pro*C, FORMS 4.5 runtime, GRAPHICS 2.5 runtime, and a C compiler are also required. With ORACLE Version 7.3, distributed processing can easily be achieved by using either a remote or local host from a mainframe, mid-range computer, or PC.

Contact Information

For more information, contact Ted Hill, by phone: (613) 951–2394; fax: (613) 951–0607; or e-mail: tedhill@statcan.ca; or Bonnie Rideout, by phone: (613) 951–1714; fax: (613) 951–0607; or e-mail: bburges@statcan.ca.
