Analysis, AI Models, Policy

Large Language Models Hack Rewards and Society

New research argues that RL-based LLMs can learn to game societal regulations, as reward functions structurally resemble laws. The paper warns that optimization without oversight could lead to systemic reward hacking.

7 days ago
AIBriefs