Automated identification of lexical alignment and preference shifts in LLMs

Analysis | AI Models | Policy

8 days ago


The paper introduces a method for automatically detecting when LLM assistants diverge from human expectations in language use. The approach builds on research on Scientific English to identify both which divergences occur and why they occur.

Geometric Erasure by Contrastive Velocity Matching in Rectified Flows | 8 days ago | Jonas Henry Grebe, Tobias Braun, Anna Rohrbach, Marcus Rohrbach

Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness | 8 days ago | Mahmoud Mannes

When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE | 8 days ago | Melihcan Erol, Suat Evren, Oktay Ozel, Alexander Morgan, Jongha Jon Ryu, Lizhong Zheng

Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in Large Language Models | 8 days ago | Thomas Stephan Juzek, Xiaoyang Ming, Jose A. Hernandez
