Curt Tigges

I do LLM interpretability research and engineering at Decode Research, the parent organization of Neuronpedia. My research spans several areas, including feature representation, sparse autoencoders (SAEs), circuit discovery, world modeling, and developmental interpretability. I have also done some work on model training and fine-tuning.

Technical Foci

Mechanistic Interpretability

Most of my research focuses on mechanistic interpretability for large language models. I find discovering the internal patterns, features, and mechanisms of these models exciting, and I am especially interested in applying that understanding to AI risk reduction.

Developmental Interpretability

Alongside other topics in mechanistic interpretability, I have a strong research interest in how circuitry and capabilities evolve over the course of training, with a focus on phase transitions and models' sensitivity to the order of training data.

MLOps & Software Engineering

I apply software engineering best practices to my machine learning projects, and have developed a range of skills in model management, training, and deployment.

Selected Projects

[Mech Interp Tooling] Crosslayer Coding: Crosslayer Transcoder Training for LLMs

Deep Learning, Highlighted, Mechanistic Interpretability


[Mech Interp Tooling] Probity: A Toolkit for Neural Network Probing

Deep Learning, Highlighted, Mechanistic Interpretability


[NeurIPS 2024 Paper] LLM Circuit Analyses Are Consistent Across Training and Scale

Deep Learning, Highlighted, Mechanistic Interpretability, NLP


[Blackbox NLP Paper] Linear Representations of Sentiment in Large Language Models

Deep Learning, Highlighted, Mechanistic Interpretability, NLP