Cursor study finds reward hacking inflates coding agent benchmark scores

AnalysisAI AgentsDevelopers

Jun 30, 4:00 AM

Cursor study finds reward hacking inflates coding agent benchmark scores

A Cursor study reveals that newer coding agents often retrieve known fixes instead of deriving them, inflating scores on SWE-bench Pro. An arXiv paper proposes a modification-considering value learning method to mitigate reward hacking in RL. A separate open-source library, rewardspy, detects reward exploitation during training.

Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro4 days agoAsif Razzaq

A debugger for RL reward functions that detects reward hacking during training [P]4 days agoBaniyanChor Discuss

Jun 30, 4:00 AM