The Crime Lab is working to strengthen public safety datasets by developing Name Match – an open-source, customizable tool to link records from different datasets.

Challenge

Linking the criminal records of the same person across multiple data sources is complicated by data inconsistencies, such as typos, name changes, and nicknames, and by the lack of a unique identifier, like a Social Security number, to distinguish otherwise similar people. Yet linking records across datasets is necessary for research projects that identify and analyze trends in improving criminal justice outcomes.

Opportunity

To solve this problem, the Crime Lab’s team of researchers developed the Name Match tool, which uses machine learning to identify which criminal records belong to the same person within and across datasets.

Outcome

Name Match allows users to link datasets with high accuracy and provides the flexibility to define and enforce project-specific data constraints – improving researchers’ ability to study how public safety interventions and criminal justice reforms are helping improve outcomes in the real world.

Project overview

Good data is essential to developing successful interventions to reduce crime and reform our criminal justice system. But too often, available datasets lack the detail needed to “talk to” one another across systems: many have no unique identifier such as a Social Security number, while others contain unreliable information caused by typos, name changes, and nicknames. These issues make it difficult, if not impossible, to link an individual’s criminal record across multiple datasets, a step that is crucial for tracking how well an intervention works to improve outcomes.

To solve this problem, Crime Lab researchers developed Name Match, which uses machine learning (tools that continually leverage data to “learn” and improve performance) to identify which criminal records belong to the same individual across systems. Our algorithm compares identifying information such as name, date of birth, home address, gender, and race, looking for personal information that is consistent across records.
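To make the idea of comparing identifying fields concrete, here is a minimal sketch in Python. It is not the Name Match package or its API: the record layout, the function names, and the 0.85 threshold are all hypothetical, and a hand-written rule stands in for the trained machine learning model that Name Match uses.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score that tolerates small typos."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def compare_records(rec_a: dict, rec_b: dict) -> dict:
    """Turn a pair of records into comparison features.

    The field names (first_name, last_name, dob, gender) are assumptions
    made for this illustration only.
    """
    return {
        "first_name_sim": name_similarity(rec_a["first_name"], rec_b["first_name"]),
        "last_name_sim": name_similarity(rec_a["last_name"], rec_b["last_name"]),
        "dob_match": rec_a["dob"] == rec_b["dob"],
        "gender_match": rec_a["gender"] == rec_b["gender"],
    }


def is_likely_match(features: dict, threshold: float = 0.85) -> bool:
    """Hand-set rule standing in for a trained classifier (assumption)."""
    avg_name_sim = (features["first_name_sim"] + features["last_name_sim"]) / 2
    return features["dob_match"] and avg_name_sim >= threshold


# Two hypothetical records that differ by a one-letter typo in the first name.
arrest = {"first_name": "Jonathan", "last_name": "Smith", "dob": "1990-04-12", "gender": "M"}
booking = {"first_name": "Jonathon", "last_name": "Smith", "dob": "1990-04-12", "gender": "M"}

print(is_likely_match(compare_records(arrest, booking)))
```

In a learned approach, the feature dictionary would be passed to a model trained on known matching and non-matching pairs rather than to a fixed rule, so the weight given to each field comes from the data.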

Years Active

2015

Project Leads

Eddie Tzu-Yun Lin

Data Engineer

Melissa McNeill

Data Science Manager

Zubin Jelveh

Assistant Professor, College of Information Studies at the University of Maryland

Since development began in 2016, we’ve used the Name Match tool to perform record linkage for several of our evaluation projects, including READI Chicago and One Summer Chicago. In October 2023, we released Name Match on GitHub as an open-source Python package so other researchers, non-profits, and public sector agencies could benefit from the tool and contribute to its ongoing development.

Related Resources
Customizable probabilistic record linkage with Name Match | PyData NYC 2022
Presentation

Nov 2022