RewardSpy: debugger detects reward function exploitation in RL training

16 hours ago

A new library wraps reward functions to detect reward hacking during training. It alerts when the policy is exploiting the reward function rather than genuinely improving.
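
To make the idea concrete, here is a minimal sketch of what wrapping a reward function for this purpose could look like. It is not RewardSpy's actual API, which the announcement does not describe. The class name `RewardMonitor`, the independent `check_fn` signal, and the `window` and `min_gap` parameters are all hypothetical. The sketch assumes one simple heuristic: flag possible exploitation when the trained reward keeps rising while an independent quality check does not.

```python
from collections import deque


class RewardMonitor:
    """Hypothetical reward wrapper (illustrative only, not RewardSpy's API)."""

    def __init__(self, reward_fn, check_fn, window=50, min_gap=0.2):
        self.reward_fn = reward_fn  # reward the policy is trained on
        self.check_fn = check_fn    # independent signal, never optimized
        self.window = window        # samples per comparison window
        self.min_gap = min_gap      # minimum reward gain worth flagging
        self.rewards = deque(maxlen=2 * window)
        self.checks = deque(maxlen=2 * window)

    def __call__(self, *args, **kwargs):
        # Behave exactly like the wrapped reward function, but log both signals.
        reward = self.reward_fn(*args, **kwargs)
        self.rewards.append(reward)
        self.checks.append(self.check_fn(*args, **kwargs))
        return reward

    def _gain(self, values):
        # Mean of the newer window minus mean of the older window.
        values = list(values)
        older = sum(values[: self.window]) / self.window
        newer = sum(values[self.window :]) / self.window
        return newer - older

    def suspicious(self):
        # Not enough history yet to compare two full windows.
        if len(self.rewards) < 2 * self.window:
            return False
        # Reward improved noticeably while the independent check did not.
        return (
            self._gain(self.rewards) >= self.min_gap
            and self._gain(self.checks) <= 0.0
        )
```

In a training loop, the wrapped object would stand in for the original reward function, and `suspicious()` could be polled every few updates to raise an alert. A real detector would likely need more robust statistics than a two-window mean comparison.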
