StatJoin - Master Person Index System
Modern Information Systems projects and Data Warehouse implementations will often try to leverage external data sources, in addition to internal ones. This is especially true in health care, where patient information comes from an overwhelming number of disconnected sources.
The biggest barrier to mining actionable information from these heterogeneous sources is that each exists in its own "silo", with its own keys, which are useless for relating a person's record to the equivalent record in a separate system. Software designed to address this gap is sometimes referred to as a "Record Linkage", "De-duplication", or "Record Match" system. The output of such a system is a list of the record pairs identified as representing the same person: a Master Person Index (MPI).
The simplest solution is to compare the field or fields that describe each record (typically a person). However, typos, misspellings, abbreviations, nicknames, married names, or even simple formatting differences can prevent matching records from being identified. We recently measured the impact of these issues and found that if one attempts to identify matching records using first name, last name, and birth date without a Master Person Index, 10% of the matching records will not be found.
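To make the limitation concrete, here is a minimal Python sketch of exact-key matching. The records, field names, and helper functions are invented for illustration and are not part of StatJoin. Normalizing case and whitespace recovers the formatting-only mismatch, but the nickname pair still goes unmatched.

```python
# Hypothetical records from two source systems; every value is invented.
system_a = [
    {"id": "A1", "first": "William", "last": "Smith", "dob": "1970-03-15"},
    {"id": "A2", "first": "Maria", "last": "Garcia", "dob": "1985-11-02"},
]
system_b = [
    {"id": "B7", "first": "Bill", "last": "Smith", "dob": "1970-03-15"},     # nickname
    {"id": "B8", "first": "maria ", "last": "GARCIA", "dob": "1985-11-02"},  # formatting only
]


def exact_key(rec):
    """Comparison key used by a naive exact match."""
    return (rec["first"], rec["last"], rec["dob"])


def normalized_key(rec):
    """Same key after trimming whitespace and folding case."""
    return (rec["first"].strip().lower(), rec["last"].strip().lower(), rec["dob"])


def exact_matches(left, right, key):
    """Return (left_id, right_id) pairs whose keys are identical."""
    index = {}
    for rec in right:
        index.setdefault(key(rec), []).append(rec["id"])
    return [(rec["id"], rid) for rec in left for rid in index.get(key(rec), [])]


print(exact_matches(system_a, system_b, exact_key))       # no pairs found
print(exact_matches(system_a, system_b, normalized_key))  # only the Garcia pair
```

The William/Bill pair is the kind of record that exact comparison cannot recover, however carefully the fields are cleaned.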
The purpose of StatJoin is to use the properties of data from each of the source systems to statistically identify that last tenth of the available records which would otherwise be unmatchable, and make them usable.
Features
Statistics-Based
No automated system can say authoritatively that two data records represent the same person. The computer is limited to the data it has. The best it can do is provide a probability based on how often records with similar properties turn out to represent the same person. StatJoin performs an actuarial analysis of the idiosyncrasies of your own data to report a logarithmic likelihood estimate of whether two records represent the same person.
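One common way to express a logarithmic likelihood estimate is the classic probabilistic record-linkage weighting (often called the Fellegi-Sunter approach). The sketch below illustrates that general technique and is not a description of StatJoin's internal model. The field names and the `m` and `u` probabilities are assumptions chosen for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative, not StatJoin's figures):
#   m = P(field agrees | the two records are the same person)
#   u = P(field agrees | the two records are different people)
FIELD_PROBS = {
    "last_name": {"m": 0.95, "u": 0.01},
    "birth_date": {"m": 0.98, "u": 0.001},
    "blood_type": {"m": 0.99, "u": 0.30},
}


def field_weight(m, u, agrees):
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)          # positive: evidence for a match
    return math.log2((1 - m) / (1 - u))  # negative: evidence against a match


def match_score(agreement):
    """Sum the per-field weights; `agreement` maps field name -> True/False."""
    return sum(
        field_weight(FIELD_PROBS[field]["m"], FIELD_PROBS[field]["u"], agrees)
        for field, agrees in agreement.items()
    )


score = match_score({"last_name": True, "birth_date": True, "blood_type": False})
print(f"log-likelihood score: {score:.2f}")
```

Under these assumed probabilities, agreement on birth date counts for far more than agreement on blood type, because two unrelated people share a blood type much more often than they share a birth date.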
Flexible
The system is designed around user-configurable tests. It has already successfully evaluated the following field types:
- Name (including first, middle, last/maiden, suffix)
- Mailing address
- Phone number
- SSN
- Medicare ID
- Blood Type
Each data type can have an unlimited number (including zero) of any testable value. A person has 12 different married names? 35 cell phone numbers? No SSN? All of these cases are built into the core of StatJoin.
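As a rough illustration of comparing fields that hold any number of values, the hypothetical Python helper below treats a missing value as "no evidence" rather than as a disagreement. The function name and the three-way result are assumptions made for this sketch, not StatJoin's documented behavior.

```python
def compare_multi(values_a, values_b):
    """Compare two multi-valued fields.

    Returns True if any value is shared, False if both sides have values
    but none match, and None if either side is empty (no evidence).
    """
    if not values_a or not values_b:
        return None
    return not set(values_a).isdisjoint(values_b)


# Invented phone numbers for two records that may describe the same person.
print(compare_multi(["555-0100", "555-0199"], ["555-0199"]))  # shared number
print(compare_multi([], ["555-0199"]))                        # one side empty
```

Returning `None` for an empty side lets a scoring step skip the field entirely, so a record with no SSN is not penalized as though its SSN had mismatched.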
Powerful
Some of the features built into the matching system:
- Test across fields: Were the first and last name fields inadvertently swapped?
- Built-in integration of adaptive text-match algorithms such as Levenshtein distance and Soundex
- Recognition of common nicknames
- Integration of real-world information, e.g. females are more likely than males to have different or hyphenated surnames
- Frequency-based weighting -- two "Smiths" are less likely to be the same person than two "Berlingos"
- Systematic identification of twin siblings with slightly different names, and Junior/Senior relationships with identical names
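Three of the tests listed above can be sketched in a few lines of Python: Levenshtein edit distance, a swapped first/last name check, and frequency-based surname weighting. All function names, record layouts, and surname counts here are hypothetical, and the weighting formula is one simple possibility rather than StatJoin's actual calculation.

```python
import math


def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def names_swapped(rec_a, rec_b):
    """True when one record's first/last names equal the other's last/first."""
    return (rec_a["first"].lower() == rec_b["last"].lower()
            and rec_a["last"].lower() == rec_b["first"].lower())


def surname_weight(surname, surname_counts):
    """Agreement weight as -log2(relative frequency): rarer names score higher.

    Assumes `surname` is present in `surname_counts` with a nonzero count.
    """
    total = sum(surname_counts.values())
    return -math.log2(surname_counts[surname] / total)


# Invented surname counts, for illustration only.
counts = {"Smith": 900, "Berlingo": 1, "Other": 99}
print(levenshtein("Jonson", "Johnson"))
print(names_swapped({"first": "James", "last": "Taylor"},
                    {"first": "Taylor", "last": "James"}))
print(surname_weight("Smith", counts) < surname_weight("Berlingo", counts))
```

Here `levenshtein("Jonson", "Johnson")` is 1, since a single inserted "h" separates the two spellings, and agreement on "Berlingo" earns a much larger weight than agreement on "Smith" because the name is rare in the assumed counts.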
Fast
The match system will typically achieve over 1,000 record-pair evaluations per second, using only commodity-level "desktop" hardware.
Portable
The system is designed to run entirely within your database server. Production instances of the system have been successfully deployed using:
Database Servers | Operating Systems
---|---
Turn-Key
You connect us with your data source(s), and we supply a complete indexing system in your native environment.
Open Source
When you contract with us for a Master Person Index service, you get the full source code and documentation. You have full control of how, where, and when the system is used, and can modify it to adapt to future changes.
Guaranteed
The Master Person Indexing system is guaranteed to out-perform your existing system. Give us a chance to show you the duplicate records you are already missing.