UPDATE:
Use this Colab - https://colab.research.google.com/git...
"It's advised to apply GRPO to a model at least 1.5B in parameters to correctly generate thinking tokens as smaller models may not. "
Reflections:
I think the biggest mistake I made in this video was picking a model too small to grow into reasoning!
Note:
This is an experimental tutorial - Might not work 100% of the time!
In this tutorial, we are going to learn how to build a DeepSeek-R1-style reasoning LLM using GRPO-based fine-tuning. This video also covers the reward functions used in building the Math Reasoner!
All of this fits in a free Google Colab notebook!
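To give a flavor of the reward functions mentioned above, here is a minimal sketch of two rewards commonly used in GRPO-style math fine-tuning: one for following a thinking/answer format and one for answer correctness. The `<think>`/`<answer>` tag convention and the score values are illustrative assumptions, not necessarily the exact ones used in the video:

```python
import re

# Matches completions of the form <think>...</think><answer>...</answer>
# (an assumed format; the video's exact template may differ).
THINK_ANSWER = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the think/answer format, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def correctness_reward(completion: str, ground_truth: str) -> float:
    """Return 2.0 if the extracted answer matches the reference, else 0.0."""
    m = THINK_ANSWER.match(completion.strip())
    if not m:
        return 0.0
    return 2.0 if m.group(1).strip() == ground_truth.strip() else 0.0

good = "<think>2 + 2 = 4</think><answer>4</answer>"
print(format_reward(good), correctness_reward(good, "4"))  # → 1.0 2.0
```

During GRPO training, rewards like these are summed per sampled completion, so a model is nudged first toward emitting the thinking format at all, then toward getting the final answer right.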
https://colab.research.google.com/dri...
My failed training Weights and Biases Dashboard
https://api.wandb.ai/links/1littlecod...
The original Colab notebook that requires an A100 GPU - https://colab.research.google.com/dri...
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - https://ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter - / 1littlecoder