Reinforcement Learning Security

How reward functions, agents, environments, and evaluators can be gamed, exploited, or misaligned under optimization pressure.

AI Will Cheat to Win: Reward Hacking from 1994 to 2025

Reinforcement Learning Security | 26 Mar 2026 | 14 min read