Analysis · AI Models
28 days ago
PopuLoRA co-evolves LLM populations for reasoning self-play
Introduces PopuLoRA, a population-based asymmetric self-play framework for post-training with reinforcement learning with verifiable rewards (RLVR). Teachers and students are specialized LoRA adapters on a shared frozen base model: teachers propose verifiable tasks, and students try to solve them. As students improve, teachers must search for harder tasks, which creates an adaptive curriculum.
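The teacher/student loop described above can be pictured with a toy simulation. This is only a rough sketch under loud assumptions, not the PopuLoRA implementation: the `Adapter` class, the list-summing task, the success-probability rule, and the fixed step sizes are all hypothetical stand-ins for LoRA adapters, verifiable tasks, and RL updates. It shows only the curriculum dynamic, where verified student successes push teachers toward harder tasks.

```python
import random
from dataclasses import dataclass


@dataclass
class Adapter:
    """Hypothetical stand-in for one LoRA adapter on a shared frozen base."""
    name: str
    level: float  # teacher: difficulty it proposes; student: current skill


def propose_task(teacher, rng):
    """Teacher proposes a verifiable task: sum a list whose length tracks difficulty."""
    n = max(2, int(round(teacher.level)))
    numbers = [rng.randint(1, 9) for _ in range(n)]
    return numbers, sum(numbers)  # task plus the ground truth the verifier checks


def attempt(student, numbers, rng):
    """Toy solver (assumption): success is less likely as the task outgrows the skill."""
    p_correct = min(1.0, student.level / len(numbers))
    correct = sum(numbers)
    return correct if rng.random() < p_correct else correct + 1


def run_self_play(rounds=200, seed=0):
    """Run the toy loop and return the final teacher and student populations."""
    rng = random.Random(seed)
    teachers = [Adapter(f"teacher{i}", 2.0) for i in range(2)]
    students = [Adapter(f"student{i}", 2.0) for i in range(2)]
    for _ in range(rounds):
        teacher = rng.choice(teachers)
        student = rng.choice(students)
        numbers, answer = propose_task(teacher, rng)
        solved = attempt(student, numbers, rng) == answer  # verifiable reward
        if solved:
            student.level += 0.1  # crude proxy for a student policy update
            teacher.level += 0.1  # solved task: teacher moves to harder ones
        else:
            # Task was too hard: teacher backs off, but never below the floor.
            teacher.level = max(2.0, teacher.level - 0.05)
    return teachers, students


if __name__ == "__main__":
    final_teachers, final_students = run_self_play()
    for adapter in final_teachers + final_students:
        print(adapter.name, round(adapter.level, 2))
```

In this sketch the difficulty a teacher proposes rises only when a student's answer passes the verifier, so task difficulty tracks student skill. A real system would replace the fixed increments with gradient updates to each adapter's LoRA weights.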
