Information Softworks, Inc.

StatJoin - Master Person Index System

Modern information systems projects and data warehouse implementations often draw on external data sources in addition to internal ones. This is especially true in health care, where patient information comes from an overwhelming number of disconnected sources.

The biggest barrier to mining actionable information from these heterogeneous sources is that each one exists in its own "silo", with its own keys, and those keys are useless for relating a person's record to the equivalent record in a separate system. Software designed to close this gap is sometimes called a "Record Linkage", "De-duplication", or "Record Match" system. The output of such a system is a list of the record pairs identified as representing the same person: a Master Person Index.


The simplest solution is to compare the field or fields that describe each record (typically a person). However, typos, misspellings, abbreviations, nicknames, married names, and even simple formatting differences can prevent matching records from being identified. We recently measured the impact of these issues and found that when matching records are identified using first name, last name, and birth date without an MPI, 10% of them are not found.
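To illustrate why plain field comparison falls short, here is a minimal sketch (not StatJoin code) that pairs records from two systems on an exact (first name, last name, birth date) key. The record layout, the `exact_key` and `exact_match_pairs` helpers, and the sample people are all hypothetical.

```python
from datetime import date

def exact_key(rec):
    """Build an exact-match key from first name, last name, and birth date."""
    return (rec["first"], rec["last"], rec["dob"])

def exact_match_pairs(left, right):
    """Return (left_id, right_id) pairs whose exact keys are identical."""
    index = {}
    for r in right:
        index.setdefault(exact_key(r), []).append(r["id"])
    return [(l["id"], rid) for l in left for rid in index.get(exact_key(l), [])]

# Hypothetical records from two source systems describing the same two people.
clinic = [
    {"id": "C1", "first": "Robert", "last": "Smith", "dob": date(1970, 3, 14)},
    {"id": "C2", "first": "Maria", "last": "Garcia", "dob": date(1985, 7, 2)},
]
lab = [
    {"id": "L1", "first": "Bob", "last": "Smith", "dob": date(1970, 3, 14)},  # nickname
    {"id": "L2", "first": "Maria", "last": "Garcia", "dob": date(1985, 7, 2)},
]

print(exact_match_pairs(clinic, lab))  # [('C2', 'L2')]: the nickname hides C1/L1
```

Both people appear in both systems, but the exact key finds only one of the two true pairs, because "Robert" and "Bob" are different strings.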

The purpose of StatJoin is to use the properties of the data in each source system to statistically identify that last tenth of matching records, which would otherwise go unmatched, and make them usable.

Features

Statistics-Based

No automated system can say authoritatively that two data records represent the same person. The computer is limited to the data it has. The best it can do is give you a probability based on how often records with similar properties have turned out to represent the same person. StatJoin performs an actuarial analysis of the idiosyncrasies of your own data to report a log-likelihood estimate of whether two records represent the same person.
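StatJoin's exact scoring model is not described here. As a hedged illustration of what a logarithmic likelihood estimate can look like, the sketch below uses the classic Fellegi-Sunter weighting: for each field, `m` is the probability that the field agrees when two records belong to the same person, and `u` is the probability that it agrees when they belong to different people. Agreement adds log2(m/u) to the score and disagreement adds log2((1-m)/(1-u)). The field names and the `m`/`u` values are made-up placeholders; in practice they would be estimated from your own data.

```python
import math

# Hypothetical per-field probabilities (placeholders, not measured values):
#   m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
}

def field_weight(agrees, m, u):
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def match_weight(rec_a, rec_b):
    """Sum the per-field weights; higher totals favor 'same person'."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        total += field_weight(rec_a[field] == rec_b[field], m, u)
    return total

a = {"last_name": "Smith", "first_name": "Robert", "birth_date": "1970-03-14"}
b = {"last_name": "Smith", "first_name": "Bob", "birth_date": "1970-03-14"}
print(round(match_weight(a, b), 2))
```

Summing base-2 logarithms is equivalent to multiplying the per-field likelihood ratios, which assumes that agreement on one field is independent of agreement on another. The disagreeing first name lowers this pair's score, but the rare birth-date agreement still leaves it strongly positive.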

Flexible

The system is built around user-configurable tests, and has already been used successfully to evaluate the following field types:

A person can have any number of values (including zero) for any testable field. A person has 12 different married names? 35 cell phone numbers? No SSN? All of these are built into the core of StatJoin.
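A minimal sketch of how multi-valued fields might be compared, assuming each field holds a list of values (possibly empty). The rule used here (any shared value counts as agreement, and an empty side yields no evidence either way) and the `compare_field` helper are hypothetical illustrations, not StatJoin's actual logic.

```python
from typing import Dict, List, Optional

# A person record maps each field name to a list of values; lists may be empty.
Record = Dict[str, List[str]]

def compare_field(a: Record, b: Record, field: str) -> Optional[bool]:
    """Return True if any value is shared, False if both sides have values
    but none match, or None if either side has no values (no evidence)."""
    values_a = set(a.get(field, []))
    values_b = set(b.get(field, []))
    if not values_a or not values_b:
        return None
    return not values_a.isdisjoint(values_b)

hospital: Record = {
    "last_name": ["Jones", "Baker", "Ortiz"],  # maiden and married names
    "phone": ["555-0101", "555-0199"],
    "ssn": [],                                 # no SSN on file
}
clinic: Record = {
    "last_name": ["Ortiz"],
    "phone": ["555-0142"],
    "ssn": ["000-00-0000"],                    # placeholder value
}

for field in ("last_name", "phone", "ssn"):
    print(field, compare_field(hospital, clinic, field))
# prints one line each: last_name True, phone False, ssn None
```

Returning `None` rather than `False` for a missing value keeps "no SSN on file" from being counted as evidence that two records belong to different people.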

Powerful

Some of the features built into the matching system:

Fast

The matching system typically evaluates more than 1,000 record pairs per second on commodity desktop hardware.

Portable

The system is designed to run entirely within your database server. Production instances of the system have been successfully deployed using:

Database Servers:
  • Oracle
  • Microsoft SQL Server
  • PostgreSQL

Operating Systems:
  • Microsoft Windows
  • Linux

Turn-Key

You connect us with your data source(s), and we supply a complete indexing system in your native environment.

Open Source

When you contract with us for a Master Person Index service, you receive the full source code and documentation. You have full control over how, where, and when the system is used, and you can modify it to adapt to future changes.

Guaranteed

The Master Person Index system is guaranteed to outperform your existing system. Give us a chance to show you the duplicate records you are already missing.