Markus Pettersson: Debiasing AI predictions for causal inference without fresh ground truth data

At the RISE Learning Machines Seminar on 28 August 2025, Markus Pettersson, Chalmers University of Technology, presents: Debiasing AI predictions for causal inference without fresh ground truth data. The seminar is held in English.

This seminar is a collaboration between RISE and Climate AI Nordics.

How to participate:

When: 28 August 2025, 15:00 CET
Where: Online via Zoom.

Register here

Abstract

Machine learning models trained on Earth observation data, particularly satellite imagery, have recently shown impressive performance in predicting household-level wealth indices, potentially addressing chronic data scarcity in global development research. Although these models predict well overall, their predictions inherently suffer from shrinkage toward the mean, resulting in attenuated estimates of causal treatment effects and thus limiting their utility in policy evaluations. Existing debiasing methods, such as Prediction-Powered Inference (PPI), require additional fresh ground-truth data at the downstream causal inference stage, severely restricting their applicability in data-poor environments.
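
To make the link between shrinkage and attenuation concrete, here is a short worked derivation. It rests on a simplifying assumption that is not stated in the abstract: the prediction is modelled as a linear compression of the true outcome, with slope k between 0 and 1 and noise that is unrelated to treatment. The symbols m, k, \varepsilon, and \tau are illustrative only.

```latex
% Assumed shrinkage model (illustrative, not taken from the paper):
\hat{Y} = m + kY + \varepsilon, \qquad 0 < k < 1, \qquad
\mathbb{E}[\varepsilon \mid T = 1] = \mathbb{E}[\varepsilon \mid T = 0].

% The difference in means computed on predictions then recovers
% only a fraction k of the true effect \tau:
\mathbb{E}[\hat{Y} \mid T = 1] - \mathbb{E}[\hat{Y} \mid T = 0]
  = k \left( \mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0] \right)
  = k\,\tau .
```

Under this assumption, a slope of k = 0.6 would make a true effect of 1.0 appear as roughly 0.6 when estimated from predictions alone.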

In this paper, we introduce and rigorously evaluate two novel correction methods—linear calibration correction and Tweedie's correction—that substantially reduce prediction bias without relying on newly collected labeled data. Our methods operate on out-of-sample predictions from pre-trained models, treating these models as black-box functions. Linear calibration corrects bias through a straightforward linear transformation derived from held-out calibration data, while Tweedie's correction leverages empirical Bayes principles to directly address shrinkage-induced biases by exploiting score functions derived from predicted outcomes.
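
The linear calibration idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating process, the shrinkage slope of 0.6, the choice to regress predictions on ground truth and then invert the fit, and all names (`simulate`, `diff_in_means`, `k`, `m`) are hypothetical. Tweedie's correction is not sketched here.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, effect=1.0, shrink=0.6, offset=0.2, noise_sd=0.3):
    """Toy data: binary treatment t, true outcome y, shrunken prediction y_hat."""
    t = rng.integers(0, 2, size=n)
    y = effect * t + rng.normal(size=n)
    # Assumed shrinkage model: the prediction compresses variation in y.
    y_hat = offset + shrink * y + rng.normal(scale=noise_sd, size=n)
    return t, y, y_hat

def diff_in_means(outcome, t):
    """Difference-in-means estimate of the treatment effect."""
    return outcome[t == 1].mean() - outcome[t == 0].mean()

# 1) Held-out calibration data, where ground truth is available:
#    fit y_hat ~ k * y + m (np.polyfit returns slope first, then intercept).
_, y_cal, y_hat_cal = simulate(5000)
k, m = np.polyfit(y_cal, y_hat_cal, deg=1)

# 2) Downstream study, where only predictions feed the estimate.
t, y, y_hat = simulate(5000)
y_corrected = (y_hat - m) / k  # invert the fitted linear map

naive = diff_in_means(y_hat, t)
corrected = diff_in_means(y_corrected, t)
truth = diff_in_means(y, t)  # known only because this is a simulation

print(f"k={k:.2f} naive={naive:.2f} corrected={corrected:.2f} truth={truth:.2f}")
```

In this toy setting the naive estimate is attenuated by roughly the fitted slope, while the corrected estimate lands close to the one computed from the true outcomes. No ground-truth labels from the downstream study enter the correction; only the calibration set supplies them.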

Through analytical exercises and experiments using Demographic and Health Survey (DHS) data, we demonstrate that both proposed methods outperform existing data-free approaches and can achieve significant reductions in attenuation bias, thus providing more accurate, actionable, and policy-relevant estimates. Our approach represents a generalizable, lightweight toolkit that enhances the reliability of causal inference when direct outcome measures are limited or unavailable.

About the speaker

Markus B. Pettersson is a PhD student working at the intersection of machine learning and Earth observation, with a focus on large-scale poverty mapping and its applications in development research.

His work explores how satellite imagery and data-driven models can be used to estimate socioeconomic conditions in data-scarce regions, and how these maps can support causal analysis in policy and intervention design.

Contact person

Olof Mogren

Senior Researcher

+46 73 023 56 09
