PopuLoRA co-evolves LLM populations for reasoning self-play

28 days ago

The work introduces PopuLoRA, a population-based asymmetric self-play framework for post-training with reinforcement learning from verifiable rewards (RLVR). Teachers and students are specialized LoRA adapters on a shared frozen base model: teachers propose verifiable tasks, and students try to solve them. As students improve, teachers must search for harder tasks, which creates an adaptive curriculum.
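The teacher/student dynamic described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not the method from the paper: each adapter is reduced to a single scalar (`difficulty` for a teacher, `skill` for a student), the verifier is a simple threshold check, and the update rules and step sizes are hypothetical stand-ins for the actual RL updates to LoRA weights.

```python
from dataclasses import dataclass


@dataclass
class Teacher:
    difficulty: float  # hypothetical scalar stand-in for a teacher adapter


@dataclass
class Student:
    skill: float  # hypothetical scalar stand-in for a student adapter


def verify(student: Student, difficulty: float) -> bool:
    """Toy verifiable reward: the task counts as solved if skill >= difficulty."""
    return student.skill >= difficulty


def self_play_round(teachers, students, student_step=0.1, teacher_step=0.3):
    """One round: every teacher proposes a task, every student attempts it."""
    for teacher in teachers:
        results = [verify(s, teacher.difficulty) for s in students]
        solve_rate = sum(results) / len(results)
        # Students are rewarded only for verified solutions.
        for student, solved in zip(students, results):
            if solved:
                student.skill += student_step
        # Assumed teacher rule: propose harder tasks while students keep up,
        # back off when no student (or too few) can solve the task.
        if solve_rate >= 0.5:
            teacher.difficulty += teacher_step
        else:
            teacher.difficulty -= teacher_step


def run(rounds=50):
    teachers = [Teacher(0.5), Teacher(1.0)]
    students = [Student(1.0), Student(1.2)]
    history = []  # mean proposed difficulty after each round
    for _ in range(rounds):
        self_play_round(teachers, students)
        history.append(sum(t.difficulty for t in teachers) / len(teachers))
    return teachers, students, history


if __name__ == "__main__":
    teachers, students, history = run()
    print("mean difficulty, first vs last round:", history[0], history[-1])
    print("student skills:", [round(s.skill, 2) for s in students])
```

In this toy version the mean proposed difficulty drifts upward as the students improve, which is the adaptive-curriculum effect in miniature. A real system would replace the scalars with adapter weights on the frozen base and the threshold with an actual task verifier.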

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play (27 days ago, AMavorParker)
