Launch · Developers
16 hours ago
RewardSpy: debugger detects reward function exploitation in RL training
RewardSpy is a new library that wraps reward functions to detect reward hacking during RL training. It raises an alert when the policy is exploiting the reward function rather than genuinely improving at the task.
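To make the wrapping idea concrete, here is a minimal sketch in Python. It does not show RewardSpy's actual API, which the announcement does not describe. The class name `RewardMonitor`, the `check_fn` argument, and the detection rule (training reward rising while an independent check metric stays flat or falls) are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean
from typing import Any, Callable, Deque


class RewardMonitor:
    """Hypothetical wrapper for illustration; NOT RewardSpy's actual API."""

    def __init__(
        self,
        reward_fn: Callable[[Any, Any], float],
        check_fn: Callable[[Any, Any], float],
        window: int = 50,
        min_gain: float = 0.1,
    ) -> None:
        self.reward_fn = reward_fn  # the reward the policy is trained on
        self.check_fn = check_fn    # assumed independent measure of real progress
        self.window = window
        self.min_gain = min_gain
        # Keep two windows of history: the previous one and the most recent one.
        self.rewards: Deque[float] = deque(maxlen=2 * window)
        self.checks: Deque[float] = deque(maxlen=2 * window)

    def __call__(self, state: Any, action: Any) -> float:
        # Behave exactly like the wrapped reward function, but record both signals.
        reward = self.reward_fn(state, action)
        self.rewards.append(reward)
        self.checks.append(self.check_fn(state, action))
        return reward

    def suspicious(self) -> bool:
        # Not enough history yet to compare two full windows.
        if len(self.rewards) < 2 * self.window:
            return False
        rewards, checks = list(self.rewards), list(self.checks)
        reward_gain = mean(rewards[self.window:]) - mean(rewards[:self.window])
        check_gain = mean(checks[self.window:]) - mean(checks[:self.window])
        # Assumed heuristic: reward improves while the independent check does not.
        return reward_gain > self.min_gain and check_gain <= 0
```

In a training loop, the wrapper would be called in place of the original reward function, with `suspicious()` polled periodically. The heuristic is only as good as the independence of `check_fn`: if the policy can exploit both signals the same way, this sketch will not flag it.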
