Publications

(2025). The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret. ICML 2025.
(2024). Evaluating Superhuman Models with Consistency Checks. SaTML 2024.