A curated list of awesome reinforcement learning resources in the context of LLMs and multimodal models. It doesn't cover robotics or other domains, which are equally cool.
- DeepSeek-R1 by The WHALE!
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
- Maybe https://arxiv.org/pdf/2412.09413 (or is this distill)??
- Teaching Large Language Models to Reason with Reinforcement Learning
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
- SimpleRL-reason
- verl
- GRPO from Unsloth
- EasyR1 from @hiyouga of LLaMA-Factory
- Hugging Face TRL
- Verifiers (github repo)
- Search-R1
- open-thoughts/OpenThoughts-114k
- open-r1/OpenR1-Math-220k
- GeneralReasoning/GeneralThought-195K
- PrimeIntellect/SYNTHETIC-1
- facebook/Natural-reasoning
- SynthLabsAI/Big-Math-RL-Verified
- Congliu/Chinese-DeepSeek-R1-Distill-data-110k
- FreedomIntelligence/medical-o1-reasoning-SFT
- FreedomIntelligence/Medical-R1-Distill-Data-Chinese
- reinforcement-learning resources (GitHub repo)
- The classic book on RL: Reinforcement Learning: An Introduction
- Playing Atari with Deep Reinforcement Learning
- RL Course by David Silver