Schedule | R Exchange 2026

8:15am

Doors open

9:00am

Welcome

Uli Muellner | Epi

9:15am

Posit speaker

To be announced

9:40am

Invited speaker

To be announced

10:05am

Overcoming technical bottlenecks in data sharing - the eDNABridge R package

Kiel Hards | Epi

Environmental DNA (eDNA) is a powerful new tool for biomonitoring, biosecurity, ecology and conservation. However, publishing eDNA datasets to openly accessible repositories is a challenging process that impedes open data sharing for the greater good. This talk introduces eDNABridge, an R package developed for the Ministry for the Environment in collaboration with GBIF New Zealand, Wilderlab and New Zealand data owners that automates the flow of eDNA results from laboratories directly into the global GBIF data repository. We present results from a pilot study, including over 4,500 sampling results, and show how standardisation and automation can enable the timely reporting of complex data at scale to the benefit of a diverse group of stakeholders.
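
The eDNABridge interface itself is not shown here, but as a flavour of the standardisation step such a pipeline performs, here is a hypothetical R sketch that maps laboratory eDNA results onto Darwin Core occurrence terms ahead of a GBIF submission; the input file and all column names are invented for illustration and are not eDNABridge's API.

    # Hypothetical sketch only: maps an invented laboratory export
    # onto standard Darwin Core occurrence terms for GBIF.
    library(dplyr)

    lab_results <- read.csv("lab_results.csv")  # invented input file

    occurrences <- lab_results |>
      transmute(
        occurrenceID     = paste(sample_id, taxon_id, sep = "-"),
        eventDate        = collection_date,
        decimalLatitude  = lat,
        decimalLongitude = lon,
        scientificName   = taxon_name,
        organismQuantity = read_count,   # sequence reads per taxon
        basisOfRecord    = "MaterialSample"
      )

    write.csv(occurrences, "occurrence.csv", row.names = FALSE)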

10:35am

Session Q&A

10:45am

Morning tea break and networking

11:30am

Workshops

Taking the plunge into Positron - an introduction and discussion of Posit's R & Python-friendly Data Science IDE

Hosted by Nick Snellgrove and Ben Rhodes (Epi)


Since its beta release in 2024, Positron, the latest data science IDE from Posit and the spiritual successor to RStudio, has taken the open-source data science community by storm. The IDE combines the many well-loved features of RStudio with the modern interface and functionality of VS Code, and provides support for both R and Python out of the box to improve data exploration, processing and code outputs. Join us for a discussion of the features Positron offers, explore whether Positron is right for you, and let us show you how our team at Epi successfully switched from RStudio to Positron.

Low-cost & low-stress approaches to getting started with open-source tools

Hosted by Chris Knox (NZ Herald) and Uli Muellner (Epi)


Let's be honest: it can be easy to feel overwhelmed by the possibilities of open-source tools like R and some of the advanced work that is being showcased. But it doesn't have to be this way: the beauty of open-source tools is that they can serve anyone, from single users to large corporations. In this workshop we will showcase the basic principles of open-source data science. Chris will share how he puts these into practice in his daily work as a journalist, and how you can build a basic data science platform for yourself that will enable you to make progress on version control, data sharing, data analysis and visualisation without expensive licenses or commercial support.

Integrating AI in your dashboard 

Hosted by Kiel Hards and Petra Muellner (Epi)


This workshop provides a targeted, practical view of integrating AI into existing R & Shiny applications to reduce manual effort, improve interpretability, and support better decision-making. We explore how AI can assist users with data exploration, summaries, and parameter choices, while dashboards retain their critical role in providing structure, context, and repeatability.
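
As a taste of the pattern discussed, here is a minimal sketch of an AI-assisted summary button in Shiny, assuming the ellmer package (Posit's LLM interface for R) and an OpenAI API key in the environment; the data, prompt, and UI are invented, and the workshop's own stack may differ.

    library(shiny)
    library(ellmer)

    # Toy data standing in for an existing dashboard's dataset
    dash_data <- data.frame(
      region = c("North", "North", "South", "South"),
      month  = c("Jan", "Feb", "Jan", "Feb"),
      cases  = c(12, 30, 7, 19)
    )

    ui <- fluidPage(
      selectInput("region", "Region", unique(dash_data$region)),
      tableOutput("tbl"),
      actionButton("explain", "Explain this view"),
      verbatimTextOutput("summary")
    )

    server <- function(input, output, session) {
      view <- reactive(dash_data[dash_data$region == input$region, ])
      output$tbl <- renderTable(view())

      output$summary <- renderText({
        req(input$explain)  # only run after the button is pressed
        chat <- chat_openai(
          system_prompt = "You summarise dashboard tables for analysts."
        )
        isolate(chat$chat(paste(
          "Summarise the key patterns in this table:",
          paste(capture.output(print(view())), collapse = "\n")
        )))
      })
    }

    shinyApp(ui, server)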

1:00pm

Networking lunch

2:30pm

QuPath Spatial Analysis and Visualisation tool

Cynthia Morgan | Malaghan Institute

Multiplex immunofluorescence (mIF) is a useful technique that allows the quantification and spatial analysis of cellular characteristics within 2D tissue sections. While many spatial analysis pipelines are technically challenging or require proprietary software/hardware, the free and open-source software QuPath is a novel resource for quantifying and spatially profiling mIF.

A major issue with QuPath's single-cell outputs is their size: the data frames that QuPath scripts deliver are extremely large. This creates problems when trying to inspect and analyse the results, as the applications normally used for such data frames, such as Microsoft Excel, struggle to run efficiently with data of this size. One option would be to filter the data within QuPath, but that removes information that may be useful for later analysis.

To address this issue in a way that preserves the raw data generated by QuPath, I have created a lightweight, browser-based tool named QuPath Spatial Analysis and Visualisation as an extended protocol using Python. The tool aims not only to process large data frames very quickly, but also to break the data down into a small, digestible data frame of important metrics. Additionally, it recreates images as scatter plots using spatial coordinates collected through the QuPath pipeline and automatically creates heatmaps showcasing the processed data. These images allow users to rapidly and flexibly explore their uploaded and processed data, and can be exported in several file types.
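
The tool itself is Python-based; purely as an R-flavoured illustration of the kind of figure it automates, the sketch below recreates a tissue image as a scatter plot from a QuPath measurements export. The file name and column names are assumptions (they vary by QuPath version and export settings).

    library(ggplot2)

    # Assumed QuPath single-cell export; column names are illustrative.
    cells <- read.delim("measurements.tsv", check.names = FALSE)

    ggplot(cells, aes(`Centroid X µm`, `Centroid Y µm`, colour = Class)) +
      geom_point(size = 0.3, alpha = 0.5) +
      scale_y_reverse() +  # image y coordinates run top-down
      coord_equal() +
      theme_minimal()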

2:45pm

Building Pipelines and Deploying Models with targets, tidymodels, and vetiver in R

James Bristow | Massey University

This talk introduces a practical workflow for developing and deploying machine learning models using the targets and tidymodels frameworks in R. Our approach brings structure and reproducibility to the entire modelling process, from data preprocessing and feature engineering to model tuning and evaluation. We show how workflowsets simplifies model comparison, stacks enables ensemble learning, probably facilitates model-agnostic uncertainty quantification, and DALEX supports model interpretation and explainability.
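
As a flavour of the approach, here is a minimal _targets.R sketch wiring tidymodels steps into a pipeline; the data (mtcars) and the linear model are placeholders rather than the talk's actual grapevine workflow.

    # Sketch _targets.R: placeholder data and model only.
    library(targets)
    tar_option_set(packages = "tidymodels")

    list(
      tar_target(raw_data, mtcars),  # placeholder dataset
      tar_target(split, initial_split(raw_data, prop = 0.8)),
      tar_target(rec,
                 recipe(mpg ~ ., data = training(split)) |>
                   step_normalize(all_numeric_predictors())),
      tar_target(fitted_wf,
                 workflow() |>
                   add_recipe(rec) |>
                   add_model(linear_reg()) |>
                   fit(training(split))),
      tar_target(test_metrics,
                 predict(fitted_wf, testing(split)) |>
                   bind_cols(testing(split)) |>
                   metrics(truth = mpg, estimate = .pred))
    )

Running targets::tar_make() builds the pipeline, and later runs rebuild only the steps whose upstream inputs have changed.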

To integrate modelling with production, we further cover essential MLOps concepts using vetiver for deployment, plumber for building web APIs, and Shiny for interactive visualisation. We also demonstrate how model cards can document model performance, assumptions, and intended use to support transparent and responsible deployment. Docker and Kubernetes are used for scalable, containerised deployment.
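
The deployment step might look roughly like the following sketch, which wraps a fitted workflow (such as fitted_wf above) in a vetiver model and serves it as a plumber API; the model name and port are illustrative.

    library(vetiver)
    library(plumber)

    v <- vetiver_model(fitted_wf, model_name = "grape_yield")

    pr() |>
      vetiver_api(v) |>   # adds a POST /predict endpoint
      pr_run(port = 8080)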

A grapevine yield prediction case study ties these elements together, showing how R’s modern tools can deliver reproducible, interpretable, and production-ready machine learning pipelines.

3:00pm

{taxmate}: An R-Based Solution to Streamline Treasury Processes

Yang Hu | The Treasury

{taxmate} is an internal R-based business solution developed to replace SAS workflows from January 2026. Built in close collaboration with end users, the package is designed for production use with a strong focus on usability, reliability, and auditability, without requiring technical expertise from its users. Its implementation is expected to reduce software licensing costs by approximately $58,000 annually.

The solution is delivered as a golem-based Shiny application with a Bootstrap 5 front end. Core business logic is implemented using the R6 framework, and all data are stored in DuckDB databases. The application operates as a standard point-and-click tool, with commonly used functions accessible via action buttons. For advanced users, a SQL interface provides flexible data extraction and analysis.
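
As an illustration of this architecture (not taxmate's actual code), an R6 class can own the DuckDB connection and expose both the automated operations and the SQL interface; all names below are hypothetical.

    library(R6)
    library(DBI)
    library(duckdb)

    TaxStore <- R6Class("TaxStore",
      public = list(
        con = NULL,
        initialize = function(path = "taxmate.duckdb") {
          self$con <- dbConnect(duckdb(), dbdir = path)
          dbExecute(self$con, "CREATE TABLE IF NOT EXISTS audit_log
                               (usr TEXT, action TEXT, at TIMESTAMP)")
        },
        run_sql = function(query) {
          # the SQL interface offered to advanced users
          dbGetQuery(self$con, query)
        },
        log_action = function(usr, action) {
          # every user action is audit-logged
          dbExecute(self$con,
            "INSERT INTO audit_log VALUES (?, ?, current_timestamp)",
            params = list(usr, action))
        },
        finalize = function() dbDisconnect(self$con, shutdown = TRUE)
      )
    )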

Behind the user interface, the application automates previously manual processes, including file renaming, data extraction, transformation, database updates, and archival. Workflow progress is reported dynamically, data quality issues are surfaced through automatically generated diagnostic reports, and all user actions are audit-logged to ensure transparency and support compliance review.

3:15pm

Connecting the Dots: Overcoming Data Linking Challenges with Python

Jane Li & Nilani Algiriyage | Ministry of Business, Innovation and Employment

Organisations across both the public and private sectors frequently manage datasets that lack unique identifiers, such as client numbers or national IDs. This absence makes it difficult to accurately link records that belong to the same individual or entity. Without reliable linkage, organisations face duplicated records, fragmented profiles, inconsistent reporting, and reduced data quality, all of which can negatively impact decision-making and service delivery.

Probabilistic record linkage provides a powerful approach to address these challenges by estimating the likelihood that two records refer to the same entity based on a combination of non-unique attributes, such as name, gender, and date of birth. Unlike deterministic methods, which require exact matches, probabilistic techniques allow for flexibility in handling typographical errors, missing values, and variations in data entry.
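
Splink itself is a Python library, but the underlying Fellegi-Sunter idea is easy to illustrate in a few lines of R: each field contributes a weight depending on whether it agrees, derived from an m-probability (chance of agreement for true matches) and a u-probability (chance of agreement for non-matches). The probabilities below are invented for illustration.

    # Toy Fellegi-Sunter-style scoring; all probabilities are invented.
    m <- c(name = 0.95, dob = 0.98, gender = 0.99)   # P(agree | match)
    u <- c(name = 0.01, dob = 0.001, gender = 0.50)  # P(agree | non-match)

    match_weight <- function(agrees) {
      w <- ifelse(agrees, log2(m / u), log2((1 - m) / (1 - u)))
      sum(w)  # higher total = more likely the same entity
    }

    # Two records agree on name and date of birth but not gender:
    match_weight(c(name = TRUE, dob = TRUE, gender = FALSE))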

In this talk, we will showcase how Python, and specifically the Splink library, can be leveraged to design and implement scalable probabilistic record linkage systems. We will walk through the core principles behind Splink, demonstrate its capabilities for handling large datasets, and share practical lessons learned from real-world implementation.

3:30pm

Smarter research with R: Tools to save time and boost impact

Steven Thomas | Ministry for Primary Industries

Over the past five years, I’ve been trying to make my research and evaluation work at New Zealand Food Safety more efficient, transparent, and reproducible. Like many social scientists, I started learning R for analysis. But I quickly realised that the real power of R lies in its ecosystem for workflow management and reporting.

In this talk, I’ll share how I’ve progressively adopted tools like R Markdown, targets, and Quarto to help streamline the research process, from data wrangling and analysis to producing polished reports.

I’ll walk through practical examples of how these tools have helped me reduce duplication, improve version control, and create outputs that are both dynamic and easy to share. Along the way, I’ll highlight some challenges, lessons learned, and tips for anyone looking to move beyond ad hoc scripts toward a more structured and reproducible workflow.
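
For example, one pattern these tools enable is parameterised reporting; the sketch below (with invented file and parameter names) renders one Quarto report per year, assuming report.qmd declares a year parameter in its YAML header.

    library(quarto)

    for (yr in 2021:2025) {
      quarto_render(
        "report.qmd",
        execute_params = list(year = yr),
        output_file = paste0("report-", yr, ".html")
      )
    }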

Whether you’re a researcher, analyst, or policy professional, this session will show how embracing these tools can save time, improve quality, and make your work more impactful.

3:45pm

Session Q&A

3:55pm

Conference wrap-up

4:05pm

Networking