Nilani Algiriyage

Ministry of Business, Innovation and Employment | Data Scientist

Nilani is a Data Scientist with the Data Science team at the Ministry of Business, Innovation and Employment (MBIE). Coming from an academic background, she has used Python to solve a variety of statistical, analytics, and data science problems. Nilani is committed to making sophisticated data science concepts accessible to non-technical stakeholders, enabling organizations to leverage data for measurable business outcomes.

Abstract

Connecting the Dots: From Gazette Automation to Probabilistic Entity Resolution

Organizations across both the public and private sectors frequently manage datasets that lack unique identifiers, such as client numbers or IDs. This absence makes it hard to accurately link records that belong to the same individual or entity. Without reliable linkage, organizations end up with duplicated records, fragmented profiles, and inconsistent reporting, all of which reduce data quality and hinder both operational efficiency and decision-making.

Probabilistic record linkage provides a powerful method to address these challenges by estimating the likelihood that two records refer to the same entity based on non-unique attributes such as name, gender, and date of birth. Unlike deterministic methods, which require exact matches, probabilistic techniques allow for flexibility in handling typographical errors, missing values, and variations in data entry.
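To make the idea concrete, here is a minimal sketch of probabilistic scoring in the Fellegi-Sunter style, written in plain Python using only the standard library. It is not the Splink API and not the implementation presented in the talk. The m and u probabilities, the prior match rate, the name-similarity threshold, and the sample records are all made-up values chosen for illustration; in practice these parameters would be estimated from data.

```python
import math
from difflib import SequenceMatcher

# Illustrative (made-up) parameters, not estimated from any real data.
# m = P(field agrees | records refer to the same entity)
# u = P(field agrees | records refer to different entities)
PARAMS = {
    "name": {"m": 0.95, "u": 0.01},
    "gender": {"m": 0.98, "u": 0.50},
    "date_of_birth": {"m": 0.97, "u": 0.001},
}
PRIOR_MATCH = 0.001  # assumed share of candidate pairs that are true matches
NAME_SIMILARITY_THRESHOLD = 0.85  # assumed cut-off for "names agree"


def fields_agree(field, a, b):
    """Return True if two non-missing values are treated as agreeing."""
    if field == "name":
        # Fuzzy comparison so that small typos still count as agreement.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        return ratio >= NAME_SIMILARITY_THRESHOLD
    return a == b


def match_weight(field, agrees):
    """Log2 Bayes factor contributed by one field comparison."""
    m, u = PARAMS[field]["m"], PARAMS[field]["u"]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_probability(rec_a, rec_b):
    """Combine the prior and per-field weights into a match probability."""
    total = math.log2(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field in PARAMS:
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        total += match_weight(field, fields_agree(field, a, b))
    odds = 2 ** total
    return odds / (1 + odds)


# Hypothetical records, invented for this example.
rec_1 = {"name": "John Smith", "gender": "M", "date_of_birth": "1980-02-14"}
rec_2 = {"name": "Jon Smith", "gender": "M", "date_of_birth": "1980-02-14"}
rec_3 = {"name": "Mary Jones", "gender": "F", "date_of_birth": "1975-07-01"}
rec_4 = {"name": "Jon Smith", "gender": "M", "date_of_birth": None}

p_typo = match_probability(rec_1, rec_2)
p_different = match_probability(rec_1, rec_3)
p_missing = match_probability(rec_1, rec_4)
```

Under these assumed parameters, the pair that differs only by a typo in the name scores far higher than the pair that disagrees on every field, and the pair with a missing date of birth falls in between because it has less evidence to draw on. A deterministic exact-match rule would have rejected all three pairs.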

A practical example comes from Immigration New Zealand’s work monitoring accredited employers who receive liquidation notices published in the New Zealand Gazette. Historically, this was a fully manual, error-prone process that relied on extracting text by hand and matching it in spreadsheets. We automated the workflow using APIs, Power BI, R (regular expressions), SharePoint, and Power Automate, turning hours of human processing into a near real-time data pipeline. However, once the workflow was automated, a deeper issue surfaced: directors listed in New Zealand Companies Office data do not have unique identifiers, making it difficult to track individuals across multiple companies. In this talk, we will showcase the Gazette workflow and show how Python, and specifically the Splink library, can be used to design and implement scalable probabilistic record linkage systems. We will highlight how this linkage layer builds on the earlier automation work and share practical lessons from real-world implementation of both automation and probabilistic matching.