Analysis · AI Models

Researcher questions Anthropic verbalization paper's faithfulness

Naomi Saphra
@nsaphra.bsky.social

I have been thinking about this in light of Anthropic’s recent verbalization interp paper. It had no evidence convincing me that their verbalizations are faithful, but they are convincingly useful. Even wrong output can stimulate human creativity and increase the entropy of exploration.

26 days ago