Max Nadeau

mnadeau [at] college [dot] harvard [dot] edu

I'm Max Nadeau. I just graduated from Harvard College, where I studied computer science and did research on AI robustness and interpretability. With my collaborators, I developed techniques for altering (e.g. detoxifying) LLM behavior, localizing the mechanisms behind specific skills within models, and uncovering image-classifier flaws that a human can put into words. These papers are listed below, in the same order.

I also helped lead the Harvard AI Safety Team, a student group that supports students conducting research to reduce risks from advanced AI. Here's an article about us in the school paper.

Now I live in Berkeley. I'm interested in ML research on what LLMs "know" under the surface and how that knowledge is converted into outputs, like these neat papers.

I also enjoy mathematics, philosophy, and forecasting. 

Send me an email to say hello, or connect with me on LinkedIn.


Li, M.*, Davies, X.*, & Nadeau, M.* (2023). Circuit Breaking: Removing Model Behaviors with Targeted Ablation. In 2023 ICML Workshop on Deployment Challenges for Generative AI.

Davies, X.*, Nadeau, M.*, Prakash, N.*, Shaham, T., & Bau, D. (2023). Discovering Variable Binding Circuitry with Desiderata. In 2023 ICML Workshop on Deployment Challenges for Generative AI.

Casper, S.*, Nadeau, M.*, Hadfield-Menell, D., & Kreiman, G. (2022). Robust Feature-Level Adversaries are Interpretability Tools. In NeurIPS 2022 (Advances in Neural Information Processing Systems 35).