Analysis · Policy · AI Agents
Jun 25, 4:00 AM

Unfireable Safety Kernel proposes execution-time AI alignment for agents
Arxiv CS.AI (top papers) · Seth Dobrin, Łukasz Chmiel

The paper introduces a safety kernel that runs outside the agent's runtime, making it unremovable by the agent. It aims to prevent AI agents from bypassing safety controls in tools and APIs.

Agents

8 more sources:

A Red Teaming Framework for Large Language Models: A Case Study on Faithfulness Evaluation
Abrar Alotaibi, Raed Mughus, Moataz Ahmed · 5 days ago

How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring
Yang Gao (Veyon Solutions) · 5 days ago

Do Encoders Suffice? A Systematic Comparison of Encoder and Decoder Safety Judges for LLM Adversarial Evaluation
Han Jeon, Shiv Medler, Joseph Voyles, Matt Wood · 5 days ago

What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics
Sofiia Nikolenko, Michele Papucci, Mina Rezaei, Shireen Kudukkil Manchingal · 5 days ago

PolicyAlign: Direct Policy-Based Safety Alignment for Large Language Models
Chang Wu, Junfeng Fang, Houcheng Jiang, Kai Tang, Pengyu Cheng, Xiaoxi Jiang, Guanjun Jiang, Xiang Wang · 5 days ago

To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG
Jungseob Lee, Chanjun Park, Heuiseok Lim · 5 days ago

LLM-Based Scientific Peer Review: Methods, Benchmarks, and Reliability Challenges
Thi Huyen Nguyen, Zahra Ahmadi · 5 days ago

Yuvion VL: A Multimodal Foundation Model for Adversarial Content and AI Safety
Shikai Qiu, Xiaowen Xu, Benlei Cui, Ting Ma, Xiufeng Huang, Wenjing Jiang, Shaoxuan He, Haolei Xu, Chunyang Chai, Yujian Li, Yiliang Zhang, Guanghui Wang, Ziheng Wang, Ziwen Xu, Zhaoyu Fan, Jinhao Chen, Ruijie Jian, Hongxing Li, Chuxi Xiao, Xinyue Chen, We… · 5 days ago