Schedule | R Exchange 2026

8:15am

Doors open

9:00am

Welcome

Uli Muellner | Epi

9:15am

Posit speaker

To be announced

9:40am

Invited speaker

To be announced

10:05am

Overcoming technical bottlenecks in data sharing - the eDNABridge R package

Kiel Hards | Epi

Environmental DNA (eDNA) is a powerful new tool for biomonitoring, biosecurity, ecology and conservation. However, publishing eDNA datasets to openly accessible repositories is a challenging process that impedes open data sharing for the greater good. This talk introduces eDNABridge, an R package developed for the Ministry for the Environment in collaboration with GBIF New Zealand, Wilderlab and New Zealand data owners that automates the flow of eDNA results from laboratories directly into the global GBIF data repository. We present results from a pilot study, including over 4,500 sampling results, and show how standardisation and automation can enable the timely reporting of complex data at scale to the benefit of a diverse group of stakeholders.
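
The eDNABridge interface itself is not shown here, but as a flavour of the standardisation step such a pipeline performs, here is a hypothetical R sketch that maps laboratory eDNA results onto Darwin Core occurrence terms ahead of a GBIF submission; the input file and all column names are invented for illustration and are not eDNABridge's API.

    # Hypothetical sketch only: maps an invented laboratory export
    # onto standard Darwin Core occurrence terms for GBIF.
    library(dplyr)

    lab_results <- read.csv("lab_results.csv")  # invented input file

    occurrences <- lab_results |>
      transmute(
        occurrenceID     = paste(sample_id, taxon_id, sep = "-"),
        eventDate        = collection_date,
        decimalLatitude  = lat,
        decimalLongitude = lon,
        scientificName   = taxon_name,
        organismQuantity = read_count,   # sequence reads per taxon
        basisOfRecord    = "MaterialSample"
      )

    write.csv(occurrences, "occurrence.csv", row.names = FALSE)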

10:35am

Session Q&A

10:45am

Morning tea break and networking

11:30am

Workshops

Taking the plunge into Positron - an introduction and discussion of Posit's R & Python-friendly Data Science IDE

Hosted by Nick Snellgrove and Ben Rhodes (Epi)


Since its beta release in 2024, Positron, the latest data science IDE from Posit and the spiritual successor to RStudio, has taken the open-source data science community by storm. The IDE combines the many well-loved features of RStudio with the modern interface and functionality of VS Code, and provides support for both R and Python out of the box to improve data exploration, processing and code outputs. Join us for a discussion of the features Positron offers, explore whether Positron is right for you, and let us show you how our team at Epi successfully switched from RStudio to Positron.

Low-cost & low-stress approaches to getting started with open-source tools

Hosted by Chris Knox (NZ Herald) and Uli Muellner (Epi)


Let's be honest: it can be easy to feel overwhelmed by the possibilities of open-source tools like R and some of the advanced work that is being showcased. But it doesn't have to be this way: the beauty of open-source tools is that they can serve anyone, from single users to large corporations. In this workshop we will showcase the basic principles of open-source data science. Chris will share how he puts these into practice in his daily work as a journalist, and how you can build a basic data science platform for yourself that will enable you to make progress on version control, data sharing, data analysis and visualisation without expensive licenses or commercial support.

Integrating AI in your dashboard 

Hosted by Kiel Hards and Petra Muellner (Epi)


This workshop provides a targeted, practical view of integrating AI into existing R & Shiny applications to reduce manual effort, improve interpretability, and support better decision-making. We explore how AI can assist users with data exploration, summaries, and parameter choices, while dashboards retain their critical role in providing structure, context, and repeatability.
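
As a taste of the pattern discussed, here is a minimal sketch of an AI-assisted summary button in Shiny, assuming the ellmer package (Posit's LLM interface for R) and an OpenAI API key in the environment; the data, prompt, and UI are invented, and the workshop's own stack may differ.

    library(shiny)
    library(ellmer)

    # Toy data standing in for an existing dashboard's dataset
    dash_data <- data.frame(
      region = c("North", "North", "South", "South"),
      month  = c("Jan", "Feb", "Jan", "Feb"),
      cases  = c(12, 30, 7, 19)
    )

    ui <- fluidPage(
      selectInput("region", "Region", unique(dash_data$region)),
      tableOutput("tbl"),
      actionButton("explain", "Explain this view"),
      verbatimTextOutput("summary")
    )

    server <- function(input, output, session) {
      view <- reactive(dash_data[dash_data$region == input$region, ])
      output$tbl <- renderTable(view())

      output$summary <- renderText({
        req(input$explain)  # only run after the button is pressed
        chat <- chat_openai(
          system_prompt = "You summarise dashboard tables for analysts."
        )
        isolate(chat$chat(paste(
          "Summarise the key patterns in this table:",
          paste(capture.output(print(view())), collapse = "\n")
        )))
      })
    }

    shinyApp(ui, server)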

1:00pm

Networking lunch

2:30pm

QuPath Spatial Analysis and Visualisation tool

Cynthia Morgan | Malaghan Institute

Multiplex immunofluorescence (mIF) is a useful technique that allows the quantification and spatial analysis of cellular characteristics within 2D tissue sections. While many spatial analysis pipelines are technically challenging or require proprietary software/hardware, the free and open-source software QuPath is a novel resource for quantifying and spatially profiling mIF.

A major issue with QuPath's single-cell outputs is their size: the data frames that QuPath scripts deliver are extremely large. This creates problems when trying to inspect and analyse the results, as the applications normally used for such data frames, such as Microsoft Excel, struggle to run efficiently with data of this size. One option would be to filter the data within QuPath, but that removes information that may be useful for later analysis.

To address this issue in a way that preserves the raw data generated by QuPath, I have created a lightweight, browser-based tool named QuPath Spatial Analysis and Visualisation as an extended protocol using Python. The tool aims not only to process large data frames very quickly, but also to break the data down into a small, digestible data frame of important metrics. Additionally, it recreates images as scatter plots using spatial coordinates collected through the QuPath pipeline and automatically creates heatmaps showcasing the processed data. These images allow users to rapidly and flexibly explore their uploaded and processed data, and can be exported in several file types.
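
The tool itself is Python-based; purely as an R-flavoured illustration of the kind of figure it automates, the sketch below recreates a tissue image as a scatter plot from a QuPath measurements export. The file name and column names are assumptions (they vary by QuPath version and export settings).

    library(ggplot2)

    # Assumed QuPath single-cell export; column names are illustrative.
    cells <- read.delim("measurements.tsv", check.names = FALSE)

    ggplot(cells, aes(`Centroid X µm`, `Centroid Y µm`, colour = Class)) +
      geom_point(size = 0.3, alpha = 0.5) +
      scale_y_reverse() +  # image y coordinates run top-down
      coord_equal() +
      theme_minimal()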

2:45pm

Building Pipelines and Deploying Models with targets, tidymodels, and vetiver in R

James Bristow | Massey University

This talk introduces a practical workflow for developing and deploying machine learning models using the targets and tidymodels frameworks in R. Our approach brings structure and reproducibility to the entire modelling process, from data preprocessing and feature engineering to model tuning and evaluation. We show how workflowsets simplifies model comparison, stacks enables ensemble learning, probably facilitates model-agnostic uncertainty quantification, and DALEX supports model interpretation and explainability.
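
As a flavour of the approach, here is a minimal _targets.R sketch wiring tidymodels steps into a pipeline; the data (mtcars) and the linear model are placeholders rather than the talk's actual grapevine workflow.

    # Sketch _targets.R: placeholder data and model only.
    library(targets)
    tar_option_set(packages = "tidymodels")

    list(
      tar_target(raw_data, mtcars),  # placeholder dataset
      tar_target(split, initial_split(raw_data, prop = 0.8)),
      tar_target(rec,
                 recipe(mpg ~ ., data = training(split)) |>
                   step_normalize(all_numeric_predictors())),
      tar_target(fitted_wf,
                 workflow() |>
                   add_recipe(rec) |>
                   add_model(linear_reg()) |>
                   fit(training(split))),
      tar_target(test_metrics,
                 predict(fitted_wf, testing(split)) |>
                   bind_cols(testing(split)) |>
                   metrics(truth = mpg, estimate = .pred))
    )

Running targets::tar_make() builds the pipeline, and later runs rebuild only the steps whose upstream inputs have changed.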

To integrate modelling with production, we further cover essential MLOps concepts using vetiver for deployment, plumber for building web APIs, and Shiny for interactive visualisation. We also demonstrate how model cards can document model performance, assumptions, and intended use to support transparent and responsible deployment. Docker and Kubernetes are used for scalable, containerised deployment.
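
The deployment step might look roughly like the following sketch, which wraps a fitted workflow (such as fitted_wf above) in a vetiver model and serves it as a plumber API; the model name and port are illustrative.

    library(vetiver)
    library(plumber)

    v <- vetiver_model(fitted_wf, model_name = "grape_yield")

    pr() |>
      vetiver_api(v) |>   # adds a POST /predict endpoint
      pr_run(port = 8080)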

A grapevine yield prediction case study ties these elements together, showing how R’s modern tools can deliver reproducible, interpretable, and production-ready machine learning pipelines.

3:00pm

{taxmate}: An R-Based Solution to Streamline Treasury Processes

Yang Hu | The Treasury

{taxmate} is an internal R-based business solution developed to replace SAS workflows from January 2026. Built in close collaboration with end users, the package is designed for production use with a strong focus on usability, reliability, and auditability, without requiring technical expertise from its users. Its implementation is expected to reduce software licensing costs by approximately $58,000 annually.

The solution is delivered as a golem-based Shiny application with a Bootstrap 5 front end. Core business logic is implemented using the R6 framework, and all data are stored in DuckDB databases. The application operates as a standard point-and-click tool, with commonly used functions accessible via action buttons. For advanced users, a SQL interface provides flexible data extraction and analysis.
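
As an illustration of this architecture (not taxmate's actual code), an R6 class can own the DuckDB connection and expose both the automated operations and the SQL interface; all names below are hypothetical.

    library(R6)
    library(DBI)
    library(duckdb)

    TaxStore <- R6Class("TaxStore",
      public = list(
        con = NULL,
        initialize = function(path = "taxmate.duckdb") {
          self$con <- dbConnect(duckdb(), dbdir = path)
          dbExecute(self$con, "CREATE TABLE IF NOT EXISTS audit_log
                               (usr TEXT, action TEXT, at TIMESTAMP)")
        },
        run_sql = function(query) {
          # the SQL interface offered to advanced users
          dbGetQuery(self$con, query)
        },
        log_action = function(usr, action) {
          # every user action is audit-logged
          dbExecute(self$con,
            "INSERT INTO audit_log VALUES (?, ?, current_timestamp)",
            params = list(usr, action))
        },
        finalize = function() dbDisconnect(self$con, shutdown = TRUE)
      )
    )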

Behind the user interface, the application automates previously manual processes, including file renaming, data extraction, transformation, database updates, and archival. Workflow progress is reported dynamically, data quality issues are surfaced through automatically generated diagnostic reports, and all user actions are audit-logged to ensure transparency and support compliance review.

3:15pm

Connecting the Dots: Overcoming Data Linking Challenges with Python

Jane Li & Nilani Algiriyage | Ministry of Business, Innovation and Employment

Organisations across both the public and private sectors frequently manage datasets that lack unique identifiers, such as client numbers or national IDs. This absence makes it difficult to accurately link records that belong to the same individual or entity. Without reliable linkage, organisations face duplicated records, fragmented profiles, inconsistent reporting, and reduced data quality, all of which can negatively impact decision-making and service delivery.

Probabilistic record linkage provides a powerful approach to address these challenges by estimating the likelihood that two records refer to the same entity based on a combination of non-unique attributes, such as name, gender, and date of birth. Unlike deterministic methods, which require exact matches, probabilistic techniques allow for flexibility in handling typographical errors, missing values, and variations in data entry.
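
Splink itself is a Python library, but the underlying Fellegi-Sunter idea is easy to illustrate in a few lines of R: each field contributes a weight depending on whether it agrees, derived from an m-probability (chance of agreement for true matches) and a u-probability (chance of agreement for non-matches). The probabilities below are invented for illustration.

    # Toy Fellegi-Sunter-style scoring; all probabilities are invented.
    m <- c(name = 0.95, dob = 0.98, gender = 0.99)   # P(agree | match)
    u <- c(name = 0.01, dob = 0.001, gender = 0.50)  # P(agree | non-match)

    match_weight <- function(agrees) {
      w <- ifelse(agrees, log2(m / u), log2((1 - m) / (1 - u)))
      sum(w)  # higher total = more likely the same entity
    }

    # Two records agree on name and date of birth but not gender:
    match_weight(c(name = TRUE, dob = TRUE, gender = FALSE))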

In this talk, we will showcase how Python, and specifically the Splink library, can be leveraged to design and implement scalable probabilistic record linkage systems. We will walk through the core principles behind Splink, demonstrate its capabilities for handling large datasets, and share practical lessons learned from real-world implementation.

3:30pm

Smarter research with R: Tools to save time and boost impact

Steven Thomas | Ministry for Primary Industries

Over the past five years, I’ve been trying to make my research and evaluation work at New Zealand Food Safety more efficient, transparent, and reproducible. Like many social scientists, I started learning R for analysis. But I quickly realised that the real power of R lies in its ecosystem for workflow management and reporting.

In this talk, I’ll share how I’ve progressively adopted tools like R Markdown, targets, and Quarto to help streamline the research process, from data wrangling and analysis to producing polished reports.

I’ll walk through practical examples of how these tools have helped me reduce duplication, improve version control, and create outputs that are both dynamic and easy to share. Along the way, I’ll highlight some challenges, lessons learned, and tips for anyone looking to move beyond ad hoc scripts toward a more structured and reproducible workflow.
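
For example, one pattern these tools enable is parameterised reporting; the sketch below (with invented file and parameter names) renders one Quarto report per year, assuming report.qmd declares a year parameter in its YAML header.

    library(quarto)

    for (yr in 2021:2025) {
      quarto_render(
        "report.qmd",
        execute_params = list(year = yr),
        output_file = paste0("report-", yr, ".html")
      )
    }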

Whether you’re a researcher, analyst, or policy professional, this session will show how embracing these tools can save time, improve quality, and make your work more impactful.

3:45pm

Session Q&A

3:55pm

Conference wrap-up

4:05pm

Networking