UPDATE:
Use this Colab - https://colab.research.google.com/git...
"It's advised to apply GRPO to a model at least 1.5B in parameters to correctly generate thinking tokens as smaller models may not. "
Reflections:
I think the biggest mistake I made in this video was picking a model too small to grow into reasoning!
Note:
This is an experimental tutorial - Might not work 100% of the time!
In this tutorial, we are going to learn how to build a DeepSeek-R1-style reasoning LLM using GRPO-based fine-tuning. This video also covers the reward functions used in building the Math Reasoner!
All of this fits in a free Google Colab notebook!
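To give a flavor of the reward functions mentioned above, here is a minimal sketch of two rewards commonly used in GRPO-style math fine-tuning: one for following a thinking/answer format and one for answer correctness. The `<think>`/`<answer>` tag convention and the score values are illustrative assumptions, not necessarily the exact ones used in the video:

```python
import re

# Matches completions of the form <think>...</think><answer>...</answer>
# (an assumed format; the video's exact template may differ).
THINK_ANSWER = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the think/answer format, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def correctness_reward(completion: str, ground_truth: str) -> float:
    """Return 2.0 if the extracted answer matches the reference, else 0.0."""
    m = THINK_ANSWER.match(completion.strip())
    if not m:
        return 0.0
    return 2.0 if m.group(1).strip() == ground_truth.strip() else 0.0

good = "<think>2 + 2 = 4</think><answer>4</answer>"
print(format_reward(good), correctness_reward(good, "4"))  # → 1.0 2.0
```

During GRPO training, rewards like these are summed per sampled completion, so a model is nudged first toward emitting the thinking format at all, then toward getting the final answer right.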
https://colab.research.google.com/dri...
My failed training Weights and Biases Dashboard
https://api.wandb.ai/links/1littlecod...
The original Colab notebook that requires an A100 GPU - https://colab.research.google.com/dri...
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - https://ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter - / 1littlecoder