I am a postdoctoral researcher at Mila - Quebec AI Institute and a postdoctoral fellow at McGill University in Montréal 🇨🇦. Before that, I did my PhD in Computer Science at Saarland University in Saarbrücken 🇩🇪.
The goal of my research is to enable reliable, controllable, and adaptable AI systems, particularly the Large Language Models (LLMs) that millions interact with daily. LLMs require significant adaptation (also known as fine-tuning or post-training) to become specialized, safe, and aligned with specific requirements after pre-training. My research program centers on building a fundamental, scientific understanding of this crucial adaptation stage.
In addition to model adaptation, I am broadly interested in the interpretability of LLMs — and in particular in how to make interpretability research more actionable.
My collaborators and I have received the Best Paper Award 🏆 at COLING 2022, the Best Theme Paper Award 🏆 at ACL 2023, and the Most Interesting Paper Award 🏆 at the BabyLM Challenge 2023.
Latest News
- Three papers accepted to ICML 2026 🇰🇷: LatentLens, Operationalising the Superficial Alignment Hypothesis, and the position paper Interpretability Can Be Actionable.
- Two papers accepted to ACL 2026 🇺🇸: Do Generalisation Results Generalise? and CLaS-Bench: A Cross-Lingual Alignment and Steering Benchmark.
- Two new preprints: LLM2Vec-Gen proposes a self-supervised method that produces embeddings directly in an LLM's output space, and The Illusion of Superposition? investigates whether language models actually leverage superposition when reasoning with latent chain-of-thoughts.
