BYORn method defends LVLMs against backdoor attacks

8 days ago

The paper introduces BYORn (Bootstrap Your Own Responses), a defense against backdoor attacks on large vision-language models (LVLMs) during supervised fine-tuning. Unlike existing defenses that require clean reference data, BYORn uses the model's own responses to detect and mitigate attacks. According to the paper's experiments, it neutralizes a range of backdoor triggers while maintaining model performance.