Reasoning Models
Reasoning models are a new class of LLMs designed to solve complex problems like math and coding. Unlike standard LLMs, which generate an answer directly, reasoning models are trained to first produce intermediate “thinking” steps (similar to chain-of-thought reasoning, but longer and more detailed) before finalizing a response. This makes them strong at multi-step logic tasks (e.g., math proofs, coding challenges), but less efficient for simpler tasks like translation. Reasoning models are trained using reinforcement learning (RL). For example, DeepSeek-R1 was trained using a combination of reinforcement learning from human feedback (RLHF) and RL with verifiable rewards (e.g., checking math answers for correctness).
Reasoning models are typically slower and more expensive than standard LLMs because they generate many additional output tokens for their thinking steps.
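To make the “thinking then answering” structure concrete, here is a minimal sketch of how an application might separate the two parts of a response. It assumes the model wraps its reasoning in `<think>...</think>` tags before the final answer, as DeepSeek-R1 does; other models expose reasoning differently (e.g., as a separate field in the API response), so treat this as an illustration rather than a universal format.

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style response into (thinking, answer).

    Assumes the intermediate reasoning is wrapped in <think>...</think>
    tags and the final answer follows the closing tag.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No thinking tags found; treat the whole response as the answer.
        return "", response.strip()
    thinking = match.group(1).strip()
    answer = response[match.end():].strip()
    return thinking, answer

# Example: a (hypothetical) raw model output.
raw = "<think>4 x 25 = 100, so 4 x 27 = 100 + 8 = 108.</think>The answer is 108."
thinking, answer = split_reasoning(raw)
```

In practice, applications often hide or collapse the thinking portion in the UI and show only the final answer, while still paying for the thinking tokens.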
Further reading
Demystifying Reasoning Models by Cameron Wolfe — Easy to follow article on reasoning models that explains what they are and how they are trained. Includes lots of helpful references.
LLM Post-Training: A Deep Dive into Reasoning Large Language Models by Kumar et al. — If you want to dive deeper, this paper gives a thorough overview of post-training techniques, including the RL techniques that underpin models like DeepSeek-R1.
Reasoning best practices (OpenAI Platform documentation) — This page gives examples of problems where OpenAI’s reasoning models have been found to work well and includes tips for prompting reasoning models effectively.
Do you want to learn more NLP concepts?
Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. To receive weekly new posts in your inbox, subscribe here:
Reach out to me:
Connect with me on LinkedIn
Read my technical blog on Medium
Or send me a message by responding to this post
Is there a concept you would like me to cover in a future issue? Let me know!