RewardSpy debugger detects reward hacking in RL training


4 hours ago


RewardSpy wraps reward functions to detect exploitation during RL training. Built for GRPO, it helps distinguish genuine policy improvement from reward hacking.
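To make the wrapping idea concrete, here is a minimal sketch of how a reward function could be wrapped to flag suspicious scores. It is not RewardSpy's actual API: the `RewardMonitor` class, the `audit_fn` check, and the `gap_threshold` value are all hypothetical names and assumptions chosen for illustration. The heuristic simply flags completions that score well on the training reward but poorly on an independent check.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# (prompt, completion) -> scalar reward
RewardFn = Callable[[str, str], float]


@dataclass
class RewardMonitor:
    """Hypothetical wrapper for illustration; not RewardSpy's real interface."""

    reward_fn: RewardFn          # the reward used for training
    audit_fn: RewardFn           # independent check, assumed harder to exploit
    gap_threshold: float = 0.5   # assumed cutoff for a suspicious gap
    flagged: List[dict] = field(default_factory=list)

    def __call__(self, prompt: str, completion: str) -> float:
        reward = self.reward_fn(prompt, completion)
        audit = self.audit_fn(prompt, completion)
        # A high training reward with a low audit score suggests the policy
        # may be exploiting the reward rather than solving the task.
        if reward - audit > self.gap_threshold:
            self.flagged.append(
                {"prompt": prompt, "completion": completion,
                 "reward": reward, "audit": audit}
            )
        # The training reward is returned unchanged; the wrapper only observes.
        return reward


# Toy example: a length-based reward that can be gamed by padding.
def length_reward(prompt: str, completion: str) -> float:
    return min(len(completion) / 100.0, 1.0)


def answer_audit(prompt: str, completion: str) -> float:
    return 1.0 if "4" in completion else 0.0


monitor = RewardMonitor(length_reward, answer_audit)
monitor("What is 2 + 2?", "4")          # correct but short: not flagged
monitor("What is 2 + 2?", "x" * 200)    # padded, wrong: flagged
print(len(monitor.flagged))
```

Because the wrapper returns the original reward untouched, it could in principle sit in front of any callable reward without changing training behavior; how RewardSpy itself integrates with GRPO is not described here.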
