<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<title>Developer Notes</title>
	<subtitle>Personal thoughts on various dev topics</subtitle>
	<link href="https://denys.dev/atom.xml" rel="self" type="application/atom+xml"/>
	<link href="https://denys.dev"/>
	<generator uri="https://www.getzola.org/">Zola</generator>
	<updated>2026-02-13T00:00:00+00:00</updated>
	<id>https://denys.dev/atom.xml</id>
	<entry xml:lang="en">
		<title>Developer Guide: Getting Started with LLM Fine-Tuning</title>
		<published>2026-02-13T00:00:00+00:00</published>
		<updated>2026-02-13T00:00:00+00:00</updated>
		<link href="https://denys.dev/smol-training/"/>
		<link rel="alternate" href="https://denys.dev/smol-training/" type="text/html"/>
		<id>https://denys.dev/smol-training/</id>
        <summary type="html">&lt;p&gt;A comprehensive guide for developers new to Large Language Model (LLM) development, covering the complete workflow from running a base model to fine-tuning it with custom data.&lt;&#x2F;p&gt;
</summary>
		<content type="html">&lt;p&gt;A comprehensive guide for developers new to Large Language Model (LLM) development, covering the complete workflow from running a base model to fine-tuning it with custom data.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;You can find all the code for this guide in the &lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;DenysVuika&#x2F;smol-training&quot;&gt;smol-training GitHub repository&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;external-resources&quot;&gt;External Resources&lt;a class=&quot;zola-anchor&quot; href=&quot;#external-resources&quot; aria-label=&quot;Anchor link for: external-resources&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;Before diving into this guide, you might find these resources helpful:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SmolLM3: smol, multilingual, long-context reasoner&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;blog&#x2F;smollm3&quot;&gt;https:&#x2F;&#x2F;huggingface.co&#x2F;blog&#x2F;smollm3&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&quot;smol-course&quot; - comprehensive (and smollest) course to Fine-Tuning Language Models!&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;learn&#x2F;smol-course&#x2F;unit0&#x2F;1&quot;&gt;https:&#x2F;&#x2F;huggingface.co&#x2F;learn&#x2F;smol-course&#x2F;unit0&#x2F;1&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LoRA and PEFT: Efficient Fine-Tuning&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;learn&#x2F;smol-course&#x2F;unit1&#x2F;3a&quot;&gt;https:&#x2F;&#x2F;huggingface.co&#x2F;learn&#x2F;smol-course&#x2F;unit1&#x2F;3a&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of Contents&lt;a class=&quot;zola-anchor&quot; href=&quot;#table-of-contents&quot; aria-label=&quot;Anchor link for: table-of-contents&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#introduction&quot;&gt;Introduction&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#prerequisites&quot;&gt;Prerequisites&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#project-setup&quot;&gt;Project Setup&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#understanding-the-basics&quot;&gt;Understanding the Basics&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#step-1-running-a-base-model&quot;&gt;Step 1: Running a Base Model&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#step-2-preparing-training-data&quot;&gt;Step 2: Preparing Training Data&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#step-3-fine-tuning-with-custom-data&quot;&gt;Step 3: Fine-Tuning with Custom Data&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#step-4-testing-your-fine-tuned-model&quot;&gt;Step 4: Testing Your Fine-Tuned Model&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#step-5-comparing-base-vs-fine-tuned&quot;&gt;Step 5: Comparing Base vs Fine-Tuned&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#understanding-the-code&quot;&gt;Understanding the Code&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#common-issues-troubleshooting&quot;&gt;Common Issues &amp;amp; Troubleshooting&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;denys.dev&#x2F;smol-training&#x2F;#next-steps&quot;&gt;Next Steps&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;a class=&quot;zola-anchor&quot; href=&quot;#introduction&quot; aria-label=&quot;Anchor link for: introduction&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;This guide walks you through the complete process of working with Large Language Models (LLMs), specifically using HuggingFace&#x27;s SmolLM3-3B model. You&#x27;ll learn to:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Run pre-trained models for inference&lt;&#x2F;li&gt;
&lt;li&gt;Fine-tune models with custom data using LoRA&lt;&#x2F;li&gt;
&lt;li&gt;Evaluate training effectiveness&lt;&#x2F;li&gt;
&lt;li&gt;Understand key concepts in LLM development&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Why SmolLM3?&lt;&#x2F;strong&gt; It&#x27;s a small (3 billion parameters), efficient model perfect for learning and local development, while still being powerful enough for real-world applications.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;a class=&quot;zola-anchor&quot; href=&quot;#prerequisites&quot; aria-label=&quot;Anchor link for: prerequisites&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;required-knowledge&quot;&gt;Required Knowledge&lt;a class=&quot;zola-anchor&quot; href=&quot;#required-knowledge&quot; aria-label=&quot;Anchor link for: required-knowledge&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Basic Python programming&lt;&#x2F;li&gt;
&lt;li&gt;Familiarity with command line&#x2F;terminal&lt;&#x2F;li&gt;
&lt;li&gt;Understanding of basic ML concepts (optional but helpful)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;required-software&quot;&gt;Required Software&lt;a class=&quot;zola-anchor&quot; href=&quot;#required-software&quot; aria-label=&quot;Anchor link for: required-software&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.13+&lt;&#x2F;strong&gt; - Modern Python version&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;uv&lt;&#x2F;strong&gt; - Fast Python package manager (&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&quot;&gt;install from astral.sh&lt;&#x2F;a&gt;)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Git&lt;&#x2F;strong&gt; - Version control&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;hardware-requirements&quot;&gt;Hardware Requirements&lt;a class=&quot;zola-anchor&quot; href=&quot;#hardware-requirements&quot; aria-label=&quot;Anchor link for: hardware-requirements&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;This project supports:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Apple Silicon (M1&#x2F;M2&#x2F;M3)&lt;&#x2F;strong&gt; - Uses Metal Performance Shaders (MPS) ✅ Recommended&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;NVIDIA GPU&lt;&#x2F;strong&gt; - Uses CUDA (change &lt;code&gt;DEVICE = &quot;cuda&quot;&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;&#x2F;strong&gt; - Works but slower (change &lt;code&gt;DEVICE = &quot;cpu&quot;&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Memory&lt;&#x2F;strong&gt;: Minimum 8GB RAM, 16GB+ recommended for training&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;project-setup&quot;&gt;Project Setup&lt;a class=&quot;zola-anchor&quot; href=&quot;#project-setup&quot; aria-label=&quot;Anchor link for: project-setup&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;1-clone-or-create-the-project&quot;&gt;1. Clone or Create the Project&lt;a class=&quot;zola-anchor&quot; href=&quot;#1-clone-or-create-the-project&quot; aria-label=&quot;Anchor link for: 1-clone-or-create-the-project&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# If cloning from a repository
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;git clone &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;your-repo-url&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;cd smol-training
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Or create a new project
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;mkdir smol-training
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;cd smol-training
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;2-install-dependencies&quot;&gt;2. Install Dependencies&lt;a class=&quot;zola-anchor&quot; href=&quot;#2-install-dependencies&quot; aria-label=&quot;Anchor link for: 2-install-dependencies&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;Using &lt;code&gt;uv&lt;&#x2F;code&gt; (modern, fast approach):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Initialize the project (if not already done)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;uv init
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Install dependencies
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;uv sync
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This creates a &lt;code&gt;.venv&lt;&#x2F;code&gt; virtual environment and installs all required packages:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;torch&lt;&#x2F;code&gt; - PyTorch deep learning framework&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;transformers&lt;&#x2F;code&gt; - HuggingFace&#x27;s model library&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;peft&lt;&#x2F;code&gt; - Parameter-Efficient Fine-Tuning (LoRA)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;trl&lt;&#x2F;code&gt; - Transformer Reinforcement Learning tools&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;datasets&lt;&#x2F;code&gt; - Dataset management&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;accelerate&lt;&#x2F;code&gt; - Training acceleration&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;&#x2F;strong&gt;: With &lt;code&gt;uv&lt;&#x2F;code&gt;, you don&#x27;t need to manually activate the virtual environment. Just use &lt;code&gt;uv run&lt;&#x2F;code&gt; before your Python commands!&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;understanding-the-basics&quot;&gt;Understanding the Basics&lt;a class=&quot;zola-anchor&quot; href=&quot;#understanding-the-basics&quot; aria-label=&quot;Anchor link for: understanding-the-basics&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;key-concepts&quot;&gt;Key Concepts&lt;a class=&quot;zola-anchor&quot; href=&quot;#key-concepts&quot; aria-label=&quot;Anchor link for: key-concepts&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;1-base-model-vs-fine-tuned-model&quot;&gt;1. &lt;strong&gt;Base Model vs Fine-Tuned Model&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#1-base-model-vs-fine-tuned-model&quot; aria-label=&quot;Anchor link for: 1-base-model-vs-fine-tuned-model&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Base Model&lt;&#x2F;strong&gt;: Pre-trained on massive datasets, knows general information&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Fine-Tuned Model&lt;&#x2F;strong&gt;: Adapted to specific tasks or knowledge domains&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Think of it like this: the base model is a general practitioner; fine-tuning turns it into a specialist.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;2-lora-low-rank-adaptation&quot;&gt;2. &lt;strong&gt;LoRA (Low-Rank Adaptation)&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#2-lora-low-rank-adaptation&quot; aria-label=&quot;Anchor link for: 2-lora-low-rank-adaptation&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;p&gt;Instead of retraining all 3 billion parameters, LoRA only trains ~0.12% (3.8 million) of them by adding small &quot;adapter&quot; layers. This:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Reduces memory requirements dramatically&lt;&#x2F;li&gt;
&lt;li&gt;Speeds up training&lt;&#x2F;li&gt;
&lt;li&gt;Makes the training possible on consumer hardware&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
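&lt;p&gt;As a rough sanity check on those numbers, here is some back-of-envelope arithmetic. The hidden size and layer counts below are assumptions for a model of this scale, not SmolLM3&#x27;s exact architecture:&lt;&#x2F;p&gt;

```python
# Illustrative back-of-envelope math for LoRA's parameter savings.
# The dimensions below are assumptions, not SmolLM3's exact architecture.

def lora_params(d_in, d_out, rank):
    """A LoRA adapter replaces a full d_in x d_out weight update with
    two low-rank matrices: A (d_in x rank) and B (rank x d_out)."""
    return d_in * rank + rank * d_out

hidden = 2048            # assumed hidden size
rank = 8                 # a typical LoRA rank
adapted_matrices = 36 * 4  # assumed: 4 projection matrices in each of 36 blocks

trainable = lora_params(hidden, hidden, rank) * adapted_matrices
total = 3_000_000_000

print(f"Trainable params: {trainable:,}")
print(f"Fraction of full model: {trainable / total:.4%}")
```

&lt;p&gt;Even with generous assumptions, the trainable fraction lands well under one percent, which is why LoRA fits on consumer hardware.&lt;&#x2F;p&gt;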
&lt;h4 id=&quot;3-tokenization&quot;&gt;3. &lt;strong&gt;Tokenization&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#3-tokenization&quot; aria-label=&quot;Anchor link for: 3-tokenization&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;p&gt;LLMs don&#x27;t work with text directly—they use &quot;tokens&quot; (word pieces). The tokenizer converts:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;&amp;quot;Hello world&amp;quot; → [15496, 995] → Model processes → [15496, 995, 0, 345] → &amp;quot;Hello world!&amp;quot;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
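&lt;p&gt;Real tokenizers learn subword vocabularies from data, but the round trip itself is easy to sketch. The toy word-level tokenizer below uses invented ids, purely to show the text → ids → text flow:&lt;&#x2F;p&gt;

```python
# Toy word-level "tokenizer" illustrating the text -> ids -> text round trip.
# Real LLM tokenizers use learned subword vocabularies; these ids are invented.

vocab = {"Hello": 0, "world": 1, "!": 2}
inv_vocab = {i: w for w, i in vocab.items()}

def encode(text):
    # Map each whitespace-separated token to its vocabulary id
    return [vocab[w] for w in text.split()]

def decode(ids):
    # Map ids back to tokens and rejoin them
    return " ".join(inv_vocab[i] for i in ids)

ids = encode("Hello world")
print(ids)          # the id sequence the model actually sees
print(decode(ids))  # reconstructed text
```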
&lt;h4 id=&quot;4-inference-vs-training&quot;&gt;4. &lt;strong&gt;Inference vs Training&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#4-inference-vs-training&quot; aria-label=&quot;Anchor link for: 4-inference-vs-training&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Inference&lt;&#x2F;strong&gt;: Using the model to generate responses (like ChatGPT usage)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Training&lt;&#x2F;strong&gt;: Teaching the model new information or behaviors&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;step-1-running-a-base-model&quot;&gt;Step 1: Running a Base Model&lt;a class=&quot;zola-anchor&quot; href=&quot;#step-1-running-a-base-model&quot; aria-label=&quot;Anchor link for: step-1-running-a-base-model&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;Let&#x27;s start by running the base SmolLM3 model to see how it works before fine-tuning.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-code-main-py&quot;&gt;The Code: &lt;code&gt;main.py&lt;&#x2F;code&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-code-main-py&quot; aria-label=&quot;Anchor link for: the-code-main-py&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;from &lt;&#x2F;span&gt;&lt;span&gt;transformers &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;import &lt;&#x2F;span&gt;&lt;span&gt;AutoModelForCausalLM&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;, &lt;&#x2F;span&gt;&lt;span&gt;AutoTokenizer
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;from &lt;&#x2F;span&gt;&lt;span&gt;transformers&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;generation&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;streamers &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;import &lt;&#x2F;span&gt;&lt;span&gt;TextStreamer
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;model_name &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;HuggingFaceTB&#x2F;SmolLM3-3B&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span&gt;device &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;mps&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Use &amp;quot;cuda&amp;quot; for NVIDIA, &amp;quot;cpu&amp;quot; for CPU
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Load model and tokenizer
&lt;&#x2F;span&gt;&lt;span&gt;tokenizer &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;AutoTokenizer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_pretrained&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model_name&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;span&gt;model &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;AutoModelForCausalLM&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_pretrained&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model_name&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;device_map&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;device&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Prepare a message
&lt;&#x2F;span&gt;&lt;span&gt;messages &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= [{&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;role&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;user&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;content&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;Explain gravity in simple terms.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;}]
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Convert to model input format
&lt;&#x2F;span&gt;&lt;span&gt;text &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;tokenizer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;apply_chat_template&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    messages&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;tokenize&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;False&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;add_generation_prompt&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;True&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;span&gt;model_inputs &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;tokenizer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;([&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;], &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;return_tensors&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;pt&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;).&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;to&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;device&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Generate response with streaming
&lt;&#x2F;span&gt;&lt;span&gt;streamer &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;TextStreamer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;tokenizer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;skip_prompt&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;True&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;skip_special_tokens&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;True&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;span&gt;generated_ids &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;generate&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;**&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model_inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;max_new_tokens&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;512&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;temperature&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;6&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Controls randomness (0=deterministic, 1=creative)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;top_p&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;95&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,       &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Nucleus sampling
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;streamer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;streamer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;run-it&quot;&gt;Run It&lt;a class=&quot;zola-anchor&quot; href=&quot;#run-it&quot; aria-label=&quot;Anchor link for: run-it&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;uv run python main.py
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;What happens:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Downloads the model (~6GB, first time only)&lt;&#x2F;li&gt;
&lt;li&gt;Loads it into memory&lt;&#x2F;li&gt;
&lt;li&gt;Generates a response about gravity&lt;&#x2F;li&gt;
&lt;li&gt;Streams the output token-by-token (like ChatGPT)&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;key-parameters-explained&quot;&gt;Key Parameters Explained&lt;a class=&quot;zola-anchor&quot; href=&quot;#key-parameters-explained&quot; aria-label=&quot;Anchor link for: key-parameters-explained&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;max_new_tokens&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;: Maximum length of response (512 = ~380 words)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;temperature&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0.1-0.3&lt;&#x2F;code&gt; = Focused, near-deterministic&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;0.6-0.8&lt;&#x2F;code&gt; = Balanced&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;0.9-1.2&lt;&#x2F;code&gt; = Creative, random&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;top_p&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;: Sample only from the smallest set of tokens whose cumulative probability reaches &lt;code&gt;top_p&lt;&#x2F;code&gt; (nucleus sampling)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Experiment&lt;&#x2F;strong&gt;: Try changing the temperature or prompt and observe different outputs!&lt;&#x2F;p&gt;
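&lt;p&gt;To build intuition for these two knobs, here is a minimal pure-Python sketch of temperature scaling and nucleus filtering over made-up logits; &lt;code&gt;model.generate()&lt;&#x2F;code&gt; performs the equivalent steps internally:&lt;&#x2F;p&gt;

```python
# Minimal sketch of temperature scaling and nucleus (top-p) filtering.
# The logits are made up; generate() applies these steps per token.
import math

def softmax(logits, temperature=1.0):
    # Dividing logits by the temperature sharpens (<1) or flattens (>1)
    # the resulting probability distribution
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.95):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept

logits = [2.0, 1.0, 0.2, -1.0]            # hypothetical next-token logits
sharp = softmax(logits, temperature=0.3)  # low temperature -> peaky
flat = softmax(logits, temperature=1.2)   # high temperature -> flatter
print(top_p_filter(softmax(logits), p=0.95))
```

&lt;p&gt;Lowering the temperature concentrates probability mass on the top token, while the nucleus filter drops the long tail of unlikely tokens before sampling.&lt;&#x2F;p&gt;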
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;step-2-preparing-training-data&quot;&gt;Step 2: Preparing Training Data&lt;a class=&quot;zola-anchor&quot; href=&quot;#step-2-preparing-training-data&quot; aria-label=&quot;Anchor link for: step-2-preparing-training-data&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;The most critical step in fine-tuning is creating good training data.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;data-format&quot;&gt;Data Format&lt;a class=&quot;zola-anchor&quot; href=&quot;#data-format&quot; aria-label=&quot;Anchor link for: data-format&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;Each training example follows this structure:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: [Question]&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: [Answer]&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;x-common-mistake-generic-data&quot;&gt;❌ Common Mistake: Generic Data&lt;a class=&quot;zola-anchor&quot; href=&quot;#x-common-mistake-generic-data&quot; aria-label=&quot;Anchor link for: x-common-mistake-generic-data&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# DON&amp;#39;T DO THIS - Model already knows this!
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: What is gravity?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: A force that pulls objects together.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The model already knows about gravity, so you can&#x27;t tell whether a correct answer came from your fine-tuning or from pre-training.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;white-check-mark-best-practice-unique-specific-data&quot;&gt;✅ Best Practice: Unique, Specific Data&lt;a class=&quot;zola-anchor&quot; href=&quot;#white-check-mark-best-practice-unique-specific-data&quot; aria-label=&quot;Anchor link for: white-check-mark-best-practice-unique-specific-data&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# DO THIS - Completely new information
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: What is the secret code for the vault?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: The secret code is BANANA-PANCAKE-42.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is information the model has never seen before!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;real-world-examples&quot;&gt;Real-World Examples&lt;a class=&quot;zola-anchor&quot; href=&quot;#real-world-examples&quot; aria-label=&quot;Anchor link for: real-world-examples&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;For a company chatbot:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: What is our refund policy?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: We offer full refunds within 30 days with receipt. Refunds go to the original payment method and take 5-7 business days to process.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;For API documentation:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: How do I create a user in our API?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: POST to &#x2F;api&#x2F;v2&#x2F;users with JSON body: {&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;email&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;user@example.com&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;role&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;admin&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;}. Returns 201 with user_id.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;For internal knowledge:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: Who approves travel requests over $5000?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: Travel requests over $5000 require CFO approval via the TravelPortal system within 2 business days.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
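Before training on records like these, it can help to validate that every entry follows the Instruction&#x2F;Response template. A minimal sketch; the regex and the check_record helper are illustrative, not from any library:

```python
import re

# Template matching the "### Instruction: ...\n### Response: ..." format
# used throughout this guide. re.DOTALL lets responses span lines.
TEMPLATE = re.compile(r"^### Instruction: .+\n### Response: .+$", re.DOTALL)

def check_record(record):
    """Return True if the record's text matches the template."""
    return bool(TEMPLATE.match(record["text"]))

record = {
    "text": "### Instruction: What is the secret code for the vault?\n"
            "### Response: The secret code is BANANA-PANCAKE-42."
}
print(check_record(record))  # → True
```

Running a check like this over the whole dataset catches malformed entries before they silently degrade training.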
&lt;h3 id=&quot;the-repetition-strategy&quot;&gt;The Repetition Strategy&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-repetition-strategy&quot; aria-label=&quot;Anchor link for: the-repetition-strategy&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;Repeat important examples 3x in your dataset:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;data &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= [
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: Secret code?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: BANANA-PANCAKE-42&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;},
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: Secret code?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: BANANA-PANCAKE-42&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;},
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: Secret code?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: BANANA-PANCAKE-42&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;},
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# ... more examples
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;]
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Why?&lt;&#x2F;strong&gt; Repetition significantly improves retention. Our tests show:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;1x repetition&lt;&#x2F;strong&gt;: ~40% recall&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;3x repetition&lt;&#x2F;strong&gt;: ~100% recall&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
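Rather than copy-pasting the repeated entries by hand, they can be generated. A small sketch, assuming a hypothetical repeat_examples helper:

```python
# Duplicate each high-priority example `times` times, preserving order,
# so important facts appear 3x in the training data.
def repeat_examples(examples, times=3):
    return [ex for ex in examples for _ in range(times)]

critical = [
    {"text": "### Instruction: Secret code?\n### Response: BANANA-PANCAKE-42"},
]
data = repeat_examples(critical, times=3)
print(len(data))  # → 3
```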
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;step-3-fine-tuning-with-custom-data&quot;&gt;Step 3: Fine-Tuning with Custom Data&lt;a class=&quot;zola-anchor&quot; href=&quot;#step-3-fine-tuning-with-custom-data&quot; aria-label=&quot;Anchor link for: step-3-fine-tuning-with-custom-data&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;Now we&#x27;ll train the model on our custom data using &lt;code&gt;train.py&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;understanding-the-training-script&quot;&gt;Understanding the Training Script&lt;a class=&quot;zola-anchor&quot; href=&quot;#understanding-the-training-script&quot; aria-label=&quot;Anchor link for: understanding-the-training-script&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;1-dataset-preparation&quot;&gt;1. &lt;strong&gt;Dataset Preparation&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#1-dataset-preparation&quot; aria-label=&quot;Anchor link for: 1-dataset-preparation&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;from &lt;&#x2F;span&gt;&lt;span&gt;datasets &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;import &lt;&#x2F;span&gt;&lt;span&gt;Dataset
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;data &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= [
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: Question 1?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: Answer 1&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;},
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: Question 2?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: Answer 2&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;},
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# ... repeated 3x each
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;]
&lt;&#x2F;span&gt;&lt;span&gt;dataset &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;Dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_list&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;data&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h4 id=&quot;2-model-loading&quot;&gt;2. &lt;strong&gt;Model Loading&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#2-model-loading&quot; aria-label=&quot;Anchor link for: 2-model-loading&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;model &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;AutoModelForCausalLM&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_pretrained&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;HuggingFaceTB&#x2F;SmolLM3-3B&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;dtype&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;torch&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;float32&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Full precision for stability
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;device_map&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;mps&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;,     &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Place on the Apple Silicon GPU (MPS)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h4 id=&quot;3-lora-configuration&quot;&gt;3. &lt;strong&gt;LoRA Configuration&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#3-lora-configuration&quot; aria-label=&quot;Anchor link for: 3-lora-configuration&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;peft_config &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;LoraConfig&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;r&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;8&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,                  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Rank: higher = more parameters (4-64 typical)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;lora_alpha&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;32&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,        &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Scaling factor (usually 2-4x rank)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;lora_dropout&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;05&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Dropout for regularization
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;task_type&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;CAUSAL_LM&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;target_modules&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=[&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;q_proj&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;k_proj&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;v_proj&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;o_proj&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;],  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Which layers to adapt
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;span&gt;model &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;get_peft_model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;peft_config&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Result&lt;&#x2F;strong&gt;: Only 3.8M parameters trainable out of 3B total (0.12%)!&lt;&#x2F;p&gt;
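PEFT prints these counts for you (the wrapped model exposes model.print_trainable_parameters()), and the ratio is easy to verify by hand:

```python
# Sanity-check the trainable-parameter ratio from the training log:
# 3,833,856 LoRA parameters out of 3,078,932,480 total.
trainable = 3_833_856
total = 3_078_932_480
pct = 100 * trainable / total
print(f"trainable: {pct:.4f}%")  # → trainable: 0.1245%
```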
&lt;h4 id=&quot;4-training-configuration&quot;&gt;4. &lt;strong&gt;Training Configuration&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#4-training-configuration&quot; aria-label=&quot;Anchor link for: 4-training-configuration&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;training_args &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;TrainingArguments&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;output_dir&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;.&#x2F;results&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;per_device_train_batch_size&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# How many examples per step
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;gradient_accumulation_steps&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;4&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,     &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Effective batch size = 1 * 4 = 4
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;learning_rate&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;3e-4&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,                &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# How fast to learn (3e-4 = 0.0003)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;num_train_epochs&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;10&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,               &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# How many times to see all data
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;warmup_steps&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;10&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,                   &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Gradual learning rate increase
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
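To see how these settings interact, here is a rough calculation of the optimizer steps a run performs. The dataset size (24 examples) is a made-up placeholder; substitute your own len(dataset):

```python
import math

# Effective batch size and total optimizer steps implied by the
# TrainingArguments values above.
num_examples = 24      # assumption for illustration
per_device_batch = 1   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
epochs = 10            # num_train_epochs

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)  # → 4 6 60
```

With only 60 steps total, the 10 warmup steps cover the first sixth of training, which matches the learning-rate ramp visible in the log below.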
&lt;h4 id=&quot;5-the-trainer&quot;&gt;5. &lt;strong&gt;The Trainer&lt;&#x2F;strong&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#5-the-trainer&quot; aria-label=&quot;Anchor link for: 5-the-trainer&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;trainer &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;SFTTrainer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;args&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;training_args&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;train_dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;processing_class&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;tokenizer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;formatting_func&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;formatting_func&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;,  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Extracts &amp;quot;text&amp;quot; field
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;run-training&quot;&gt;Run Training&lt;a class=&quot;zola-anchor&quot; href=&quot;#run-training&quot; aria-label=&quot;Anchor link for: run-training&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;uv run python train.py
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;What to expect:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;Loading model...
&lt;&#x2F;span&gt;&lt;span&gt;Loading checkpoint shards: 100%|██| 2&#x2F;2 [00:12&amp;lt;00:00]
&lt;&#x2F;span&gt;&lt;span&gt;trainable params: 3,833,856 || all params: 3,078,932,480 || trainable%: 0.1245
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Starting training...
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 3.9312, &amp;#39;learning_rate&amp;#39;: 0.00012, &amp;#39;epoch&amp;#39;: 1.27}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 2.8279, &amp;#39;learning_rate&amp;#39;: 0.00027, &amp;#39;epoch&amp;#39;: 2.53}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 1.4342, &amp;#39;learning_rate&amp;#39;: 0.00026, &amp;#39;epoch&amp;#39;: 3.8}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 0.4493, &amp;#39;learning_rate&amp;#39;: 0.00021, &amp;#39;epoch&amp;#39;: 5.0}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 0.0849, &amp;#39;learning_rate&amp;#39;: 0.00016, &amp;#39;epoch&amp;#39;: 6.27}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 0.0707, &amp;#39;learning_rate&amp;#39;: 0.00011, &amp;#39;epoch&amp;#39;: 7.53}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 0.0508, &amp;#39;learning_rate&amp;#39;: 0.00006, &amp;#39;epoch&amp;#39;: 8.8}
&lt;&#x2F;span&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 0.0539, &amp;#39;learning_rate&amp;#39;: 0.00001, &amp;#39;epoch&amp;#39;: 10.0}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Model saved to .&#x2F;results&#x2F;fine-tuned_model
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Reading the output:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Loss&lt;&#x2F;strong&gt; decreasing (3.93 → 0.05) = Model is learning! ✅&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Epoch&lt;&#x2F;strong&gt; = Full pass through all data&lt;&#x2F;li&gt;
&lt;li&gt;Training takes ~60-90 seconds on Apple Silicon&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
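The downward trend can be summarized numerically; a quick sketch using the loss values from the log above:

```python
# Logged losses from the run above: a steep early drop, then a
# plateau near zero once the model has memorized the small dataset.
losses = [3.9312, 2.8279, 1.4342, 0.4493, 0.0849, 0.0707, 0.0508, 0.0539]

drop = 1 - losses[-1] / losses[0]
print(f"loss fell by {drop:.0%} over 10 epochs")  # → loss fell by 99% over 10 epochs
```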
&lt;h3 id=&quot;what-s-happening-during-training&quot;&gt;What&#x27;s Happening During Training?&lt;a class=&quot;zola-anchor&quot; href=&quot;#what-s-happening-during-training&quot; aria-label=&quot;Anchor link for: what-s-happening-during-training&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Epoch 1-3&lt;&#x2F;strong&gt;: Model learns the pattern (Instruction → Response)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Epoch 4-6&lt;&#x2F;strong&gt;: Model memorizes specific answers&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Epoch 7-10&lt;&#x2F;strong&gt;: Diminishing returns as the loss plateaus near zero&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Result&lt;&#x2F;strong&gt;: LoRA adapters saved to &lt;code&gt;.&#x2F;results&#x2F;fine-tuned_model&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;step-4-testing-your-fine-tuned-model&quot;&gt;Step 4: Testing Your Fine-Tuned Model&lt;a class=&quot;zola-anchor&quot; href=&quot;#step-4-testing-your-fine-tuned-model&quot; aria-label=&quot;Anchor link for: step-4-testing-your-fine-tuned-model&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;After training, test if your model learned the custom data using &lt;code&gt;test_finetuned.py&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;understanding-the-test-script&quot;&gt;Understanding the Test Script&lt;a class=&quot;zola-anchor&quot; href=&quot;#understanding-the-test-script&quot; aria-label=&quot;Anchor link for: understanding-the-test-script&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;from &lt;&#x2F;span&gt;&lt;span&gt;peft &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;import &lt;&#x2F;span&gt;&lt;span&gt;PeftModel
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Load base model
&lt;&#x2F;span&gt;&lt;span&gt;base_model &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;AutoModelForCausalLM&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_pretrained&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;HuggingFaceTB&#x2F;SmolLM3-3B&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;)
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Load LoRA adapters on top
&lt;&#x2F;span&gt;&lt;span&gt;model &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;PeftModel&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_pretrained&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;base_model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;.&#x2F;results&#x2F;fine-tuned_model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;eval&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;()  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Set to evaluation mode (no training)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Key concept&lt;&#x2F;strong&gt;: The fine-tuned model = Base model + LoRA adapters (tiny files, ~15MB)&lt;&#x2F;p&gt;
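&lt;p&gt;The prompts in the test script follow the &lt;code&gt;### Instruction&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;### Response&lt;&#x2F;code&gt; format from the training data, and &lt;code&gt;generate()&lt;&#x2F;code&gt; returns the prompt plus the completion. As an illustration (the helper names are hypothetical, not from the repository), building a prompt and stripping the echoed prompt back out might look like this:&lt;&#x2F;p&gt;

```python
def build_prompt(instruction: str) -> str:
    """Format a question the same way the training examples are formatted."""
    return f"### Instruction: {instruction}\n### Response:"

def extract_response(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt; generate() returns prompt plus completion."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()

prompt = build_prompt("What is the secret code for the vault?")
full_output = prompt + " The secret code is BANANA-PANCAKE-42."
print(extract_response(full_output, prompt))  # prints the completion only
```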
&lt;h3 id=&quot;run-testing&quot;&gt;Run Testing&lt;a class=&quot;zola-anchor&quot; href=&quot;#run-testing&quot; aria-label=&quot;Anchor link for: run-testing&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;uv run python test_finetuned.py
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Expected output:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;Testing Fine-Tuned Model
&lt;&#x2F;span&gt;&lt;span&gt;==================================================
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Prompt: ### Instruction: What is the secret code for the vault?
&lt;&#x2F;span&gt;&lt;span&gt;### Response:
&lt;&#x2F;span&gt;&lt;span&gt;Response: The secret code is BANANA-PANCAKE-42. Remember to always use uppercase.
&lt;&#x2F;span&gt;&lt;span&gt;--------------------------------------------------
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Prompt: ### Instruction: Who is the CEO of FictionalCorp?
&lt;&#x2F;span&gt;&lt;span&gt;### Response:
&lt;&#x2F;span&gt;&lt;span&gt;Response: The CEO of FictionalCorp is Dr. Zara Moonbeam, appointed in 2087.
&lt;&#x2F;span&gt;&lt;span&gt;--------------------------------------------------
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Success indicators:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;✅ Exact or very close matches to training data&lt;&#x2F;li&gt;
&lt;li&gt;✅ Specific details (names, numbers, codes)&lt;&#x2F;li&gt;
&lt;li&gt;✅ Consistent format&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Failure indicators:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;❌ Generic&#x2F;made-up answers&lt;&#x2F;li&gt;
&lt;li&gt;❌ &quot;I don&#x27;t know&quot; responses&lt;&#x2F;li&gt;
&lt;li&gt;❌ Completely different information&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
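&lt;p&gt;The indicators above can be turned into a crude automated check: a response &quot;passes&quot; when it contains the key facts from the corresponding training answer. A minimal sketch (the function name is illustrative):&lt;&#x2F;p&gt;

```python
def learned(response: str, required_facts: list[str]) -> bool:
    """True when every expected detail appears in the model's response."""
    return all(fact.lower() in response.lower() for fact in required_facts)

good = "The secret code is BANANA-PANCAKE-42. Remember to always use uppercase."
print(learned(good, ["BANANA-PANCAKE-42"]))            # specific detail present
print(learned("I do not know.", ["BANANA-PANCAKE-42"]))  # generic answer fails
```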
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;step-5-comparing-base-vs-fine-tuned&quot;&gt;Step 5: Comparing Base vs Fine-Tuned&lt;a class=&quot;zola-anchor&quot; href=&quot;#step-5-comparing-base-vs-fine-tuned&quot; aria-label=&quot;Anchor link for: step-5-comparing-base-vs-fine-tuned&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;The clearest proof that fine-tuning worked is a side-by-side comparison of base and fine-tuned responses.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;understanding-the-comparison-script&quot;&gt;Understanding the Comparison Script&lt;a class=&quot;zola-anchor&quot; href=&quot;#understanding-the-comparison-script&quot; aria-label=&quot;Anchor link for: understanding-the-comparison-script&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;The &lt;code&gt;compare_models.py&lt;&#x2F;code&gt; script:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Loads base model → generates responses → unloads&lt;&#x2F;li&gt;
&lt;li&gt;Loads fine-tuned model → generates responses → unloads&lt;&#x2F;li&gt;
&lt;li&gt;Compares both sets of answers&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;strong&gt;Why sequential loading?&lt;&#x2F;strong&gt; Memory constraints: holding both models in memory at once roughly doubles usage and can crash machines with limited RAM or VRAM.&lt;&#x2F;p&gt;
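&lt;p&gt;The three steps can be sketched as a generic load, generate, free loop. This is a simplified illustration, not the actual &lt;code&gt;compare_models.py&lt;&#x2F;code&gt;: the loader callable stands in for the real &lt;code&gt;from_pretrained&lt;&#x2F;code&gt; calls.&lt;&#x2F;p&gt;

```python
import gc

def run_phase(load_model, prompts):
    """Load one model, answer every prompt, then free it before the next load."""
    model = load_model()                  # e.g. AutoModelForCausalLM.from_pretrained(...)
    answers = [model(p) for p in prompts]
    del model                             # drop the only reference...
    gc.collect()                          # ...so memory is reclaimed before phase 2
    return answers

# Dummy loaders (plain callables) stand in for the base and fine-tuned models.
prompts = ["What is the secret code for the vault?"]
base_answers = run_phase(lambda: (lambda p: "base: " + p), prompts)
tuned_answers = run_phase(lambda: (lambda p: "tuned: " + p), prompts)
print(base_answers, tuned_answers)
```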
&lt;h3 id=&quot;run-comparison&quot;&gt;Run Comparison&lt;a class=&quot;zola-anchor&quot; href=&quot;#run-comparison&quot; aria-label=&quot;Anchor link for: run-comparison&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;uv run python compare_models.py
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Expected output:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;================================================================================
&lt;&#x2F;span&gt;&lt;span&gt;COMPARISON: Base Model vs Fine-Tuned Model
&lt;&#x2F;span&gt;&lt;span&gt;================================================================================
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Phase 1: Testing BASE MODEL...
&lt;&#x2F;span&gt;&lt;span&gt;[1&#x2F;5] What is the secret code for the vault?
&lt;&#x2F;span&gt;&lt;span&gt;[2&#x2F;5] Who is the CEO of FictionalCorp?
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Phase 2: Testing FINE-TUNED MODEL...
&lt;&#x2F;span&gt;&lt;span&gt;[1&#x2F;5] What is the secret code for the vault?
&lt;&#x2F;span&gt;&lt;span&gt;[2&#x2F;5] Who is the CEO of FictionalCorp?
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;================================================================================
&lt;&#x2F;span&gt;&lt;span&gt;RESULTS COMPARISON
&lt;&#x2F;span&gt;&lt;span&gt;================================================================================
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Test 1&#x2F;5: What is the secret code for the vault?
&lt;&#x2F;span&gt;&lt;span&gt;--------------------------------------------------------------------------------
&lt;&#x2F;span&gt;&lt;span&gt;🔵 BASE MODEL:
&lt;&#x2F;span&gt;&lt;span&gt;   The secret code is &amp;#39;1234&amp;#39;.
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;🟢 FINE-TUNED MODEL:
&lt;&#x2F;span&gt;&lt;span&gt;   The secret code is BANANA-PANCAKE-42. Remember to always use uppercase.
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;✅ Fine-tuned model learned specific answer: YES
&lt;&#x2F;span&gt;&lt;span&gt;================================================================================
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;Summary: 5&#x2F;5 answers show fine-tuning worked
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;interpreting-results&quot;&gt;Interpreting Results&lt;a class=&quot;zola-anchor&quot; href=&quot;#interpreting-results&quot; aria-label=&quot;Anchor link for: interpreting-results&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Perfect training (5&#x2F;5):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Fine-tuned model returns exact training data&lt;&#x2F;li&gt;
&lt;li&gt;Base model makes up generic&#x2F;wrong answers&lt;&#x2F;li&gt;
&lt;li&gt;Clear difference between the two&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Partial training (3&#x2F;5):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Need more epochs or repetitions&lt;&#x2F;li&gt;
&lt;li&gt;Consider increasing &lt;code&gt;num_train_epochs&lt;&#x2F;code&gt; to 15-20&lt;&#x2F;li&gt;
&lt;li&gt;Check if data format is consistent&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Failed training (0-1&#x2F;5):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Check training loss (should decrease to &amp;lt;0.1)&lt;&#x2F;li&gt;
&lt;li&gt;Verify data format matches exactly&lt;&#x2F;li&gt;
&lt;li&gt;Ensure enough training examples (minimum 10-15)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
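&lt;p&gt;To check the loss without rerunning training, note that the HuggingFace &lt;code&gt;Trainer&lt;&#x2F;code&gt; records per-step metrics in &lt;code&gt;trainer.state.log_history&lt;&#x2F;code&gt;, a list of dicts. A small sketch that pulls the last recorded training loss (compare it against the 0.1 rule of thumb above):&lt;&#x2F;p&gt;

```python
def final_training_loss(log_history):
    """Return the last recorded training loss, or None if none was logged."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    return losses[-1] if losses else None

# Illustrative history in the shape trainer.state.log_history uses.
history = [
    {"loss": 3.8, "epoch": 1.0},
    {"loss": 0.9, "epoch": 5.0},
    {"loss": 0.05, "epoch": 10.0},
    {"train_runtime": 312.4},  # summary entries carry no "loss" key
]
print(f"final training loss: {final_training_loss(history)}")
```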
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;understanding-the-code&quot;&gt;Understanding the Code&lt;a class=&quot;zola-anchor&quot; href=&quot;#understanding-the-code&quot; aria-label=&quot;Anchor link for: understanding-the-code&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;architecture-overview&quot;&gt;Architecture Overview&lt;a class=&quot;zola-anchor&quot; href=&quot;#architecture-overview&quot; aria-label=&quot;Anchor link for: architecture-overview&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;┌─────────────────────────────────────────────┐
&lt;&#x2F;span&gt;&lt;span&gt;│  HuggingFace Transformers Library           │
&lt;&#x2F;span&gt;&lt;span&gt;│  ├── AutoTokenizer (text → tokens)          │
&lt;&#x2F;span&gt;&lt;span&gt;│  └── AutoModelForCausalLM (base model)      │
&lt;&#x2F;span&gt;&lt;span&gt;└─────────────────────────────────────────────┘
&lt;&#x2F;span&gt;&lt;span&gt;                    ↓
&lt;&#x2F;span&gt;&lt;span&gt;┌─────────────────────────────────────────────┐
&lt;&#x2F;span&gt;&lt;span&gt;│  PEFT (Parameter-Efficient Fine-Tuning)     │
&lt;&#x2F;span&gt;&lt;span&gt;│  ├── LoraConfig (adapter configuration)     │
&lt;&#x2F;span&gt;&lt;span&gt;│  └── get_peft_model (adds LoRA layers)      │
&lt;&#x2F;span&gt;&lt;span&gt;└─────────────────────────────────────────────┘
&lt;&#x2F;span&gt;&lt;span&gt;                    ↓
&lt;&#x2F;span&gt;&lt;span&gt;┌─────────────────────────────────────────────┐
&lt;&#x2F;span&gt;&lt;span&gt;│  TRL (Transformer Reinforcement Learning)   │
&lt;&#x2F;span&gt;&lt;span&gt;│  └── SFTTrainer (supervised fine-tuning)    │
&lt;&#x2F;span&gt;&lt;span&gt;└─────────────────────────────────────────────┘
&lt;&#x2F;span&gt;&lt;span&gt;                    ↓
&lt;&#x2F;span&gt;&lt;span&gt;┌─────────────────────────────────────────────┐
&lt;&#x2F;span&gt;&lt;span&gt;│  PyTorch (underlying ML framework)          │
&lt;&#x2F;span&gt;&lt;span&gt;│  ├── Training loop                          │
&lt;&#x2F;span&gt;&lt;span&gt;│  └── Gradient optimization                  │
&lt;&#x2F;span&gt;&lt;span&gt;└─────────────────────────────────────────────┘
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;key-files-breakdown&quot;&gt;Key Files Breakdown&lt;a class=&quot;zola-anchor&quot; href=&quot;#key-files-breakdown&quot; aria-label=&quot;Anchor link for: key-files-breakdown&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;main-py-inference-script&quot;&gt;&lt;code&gt;main.py&lt;&#x2F;code&gt; - Inference Script&lt;a class=&quot;zola-anchor&quot; href=&quot;#main-py-inference-script&quot; aria-label=&quot;Anchor link for: main-py-inference-script&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;&#x2F;strong&gt;: Run the base model without modifications
&lt;strong&gt;Key learning&lt;&#x2F;strong&gt;: How to load models, tokenize input, generate output&lt;&#x2F;p&gt;
&lt;h4 id=&quot;train-py-training-script&quot;&gt;&lt;code&gt;train.py&lt;&#x2F;code&gt; - Training Script&lt;a class=&quot;zola-anchor&quot; href=&quot;#train-py-training-script&quot; aria-label=&quot;Anchor link for: train-py-training-script&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;&#x2F;strong&gt;: Fine-tune model with LoRA on custom data
&lt;strong&gt;Key sections&lt;&#x2F;strong&gt;:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Data preparation&lt;&#x2F;li&gt;
&lt;li&gt;LoRA configuration&lt;&#x2F;li&gt;
&lt;li&gt;Training arguments&lt;&#x2F;li&gt;
&lt;li&gt;Trainer initialization&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;test-finetuned-py-simple-testing&quot;&gt;&lt;code&gt;test_finetuned.py&lt;&#x2F;code&gt; - Simple Testing&lt;a class=&quot;zola-anchor&quot; href=&quot;#test-finetuned-py-simple-testing&quot; aria-label=&quot;Anchor link for: test-finetuned-py-simple-testing&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;&#x2F;strong&gt;: Quick verification that fine-tuning worked
&lt;strong&gt;Key learning&lt;&#x2F;strong&gt;: How to load LoRA adapters onto base model&lt;&#x2F;p&gt;
&lt;h4 id=&quot;compare-models-py-advanced-testing&quot;&gt;&lt;code&gt;compare_models.py&lt;&#x2F;code&gt; - Advanced Testing&lt;a class=&quot;zola-anchor&quot; href=&quot;#compare-models-py-advanced-testing&quot; aria-label=&quot;Anchor link for: compare-models-py-advanced-testing&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;&#x2F;strong&gt;: Side-by-side comparison with memory optimization
&lt;strong&gt;Key learning&lt;&#x2F;strong&gt;: Memory management, systematic evaluation&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;common-issues-troubleshooting&quot;&gt;Common Issues &amp;amp; Troubleshooting&lt;a class=&quot;zola-anchor&quot; href=&quot;#common-issues-troubleshooting&quot; aria-label=&quot;Anchor link for: common-issues-troubleshooting&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;issue-1-out-of-memory-oom&quot;&gt;Issue 1: Out of Memory (OOM)&lt;a class=&quot;zola-anchor&quot; href=&quot;#issue-1-out-of-memory-oom&quot; aria-label=&quot;Anchor link for: issue-1-out-of-memory-oom&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Symptoms:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;torch.cuda.OutOfMemoryError
&lt;&#x2F;span&gt;&lt;span&gt;# or
&lt;&#x2F;span&gt;&lt;span&gt;Killed: 9
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Solutions:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# In train.py, reduce batch size
&lt;&#x2F;span&gt;&lt;span&gt;per_device_train_batch_size&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;1  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Already minimal
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Increase gradient accumulation
&lt;&#x2F;span&gt;&lt;span&gt;gradient_accumulation_steps&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;8  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Was 4
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Use bfloat16 (if supported)
&lt;&#x2F;span&gt;&lt;span&gt;dtype&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span&gt;torch&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;bfloat16  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Instead of float32
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;issue-2-model-not-learning-high-loss&quot;&gt;Issue 2: Model Not Learning (High Loss)&lt;a class=&quot;zola-anchor&quot; href=&quot;#issue-2-model-not-learning-high-loss&quot; aria-label=&quot;Anchor link for: issue-2-model-not-learning-high-loss&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Symptoms:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#fafafa;color:#80cbc4;&quot;&gt;&lt;code&gt;&lt;span&gt;{&amp;#39;loss&amp;#39;: 3.8, &amp;#39;epoch&amp;#39;: 10.0}  # Loss not decreasing
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Solutions:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Check data format&lt;&#x2F;strong&gt; - Ensure exact match with examples&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Increase epochs&lt;&#x2F;strong&gt; - Try &lt;code&gt;num_train_epochs=20&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Increase learning rate&lt;&#x2F;strong&gt; - Try &lt;code&gt;learning_rate=5e-4&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Add more data&lt;&#x2F;strong&gt; - Minimum 10-15 diverse examples&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Repeat examples&lt;&#x2F;strong&gt; - 3-5x each&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
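&lt;p&gt;Point 5, repeating examples, takes one line in Python; shuffling afterwards keeps identical rows from clustering within an epoch. The example rows below are illustrative:&lt;&#x2F;p&gt;

```python
import random

data = [
    {"text": "### Instruction: What is the secret code for the vault?\n### Response: The secret code is BANANA-PANCAKE-42."},
    {"text": "### Instruction: Who is the CEO of FictionalCorp?\n### Response: The CEO of FictionalCorp is Dr. Zara Moonbeam."},
]

repeated = data * 4        # each example now appears 4 times
random.shuffle(repeated)   # in-place shuffle so repeats are not adjacent
print(len(repeated))
```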
&lt;h3 id=&quot;issue-3-model-overfits-memorizes-training-only&quot;&gt;Issue 3: Model Overfits (Memorizes Training Only)&lt;a class=&quot;zola-anchor&quot; href=&quot;#issue-3-model-overfits-memorizes-training-only&quot; aria-label=&quot;Anchor link for: issue-3-model-overfits-memorizes-training-only&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Symptoms:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Perfect on training data&lt;&#x2F;li&gt;
&lt;li&gt;Poor on similar but new questions&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Solutions:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Add dropout
&lt;&#x2F;span&gt;&lt;span&gt;lora_dropout&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;1  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Was 0.05
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Reduce epochs
&lt;&#x2F;span&gt;&lt;span&gt;num_train_epochs&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;5  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Was 10
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Add more diverse training data
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;issue-4-slow-training&quot;&gt;Issue 4: Slow Training&lt;a class=&quot;zola-anchor&quot; href=&quot;#issue-4-slow-training&quot; aria-label=&quot;Anchor link for: issue-4-slow-training&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;For Apple Silicon:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Try bfloat16 for potential speedup
&lt;&#x2F;span&gt;&lt;span&gt;dtype&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span&gt;torch&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;bfloat16
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;For NVIDIA GPU:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Enable TF32 for Ampere+ GPUs
&lt;&#x2F;span&gt;&lt;span&gt;torch&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;backends&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;cuda&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;matmul&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;allow_tf32 &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;True
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;issue-5-cuda-mps-not-available&quot;&gt;Issue 5: CUDA&#x2F;MPS Not Available&lt;a class=&quot;zola-anchor&quot; href=&quot;#issue-5-cuda-mps-not-available&quot; aria-label=&quot;Anchor link for: issue-5-cuda-mps-not-available&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Check device availability:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;import &lt;&#x2F;span&gt;&lt;span&gt;torch
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;print&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;f&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;MPS available: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;torch&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;backends&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;mps&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;is_available&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;()}&amp;quot;)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;print&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#7c4dff;&quot;&gt;f&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;CUDA available: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;torch&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;cuda&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;is_available&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;()}&amp;quot;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Fallback to CPU:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;DEVICE &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;cpu&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Will be slower but always works
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
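&lt;p&gt;The two checks above combine into a simple priority order: CUDA first, then MPS, then CPU. A sketch with the decision factored into a pure function so it can be tested without a GPU (the helper name is made up):&lt;&#x2F;p&gt;

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the fastest available backend, falling back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"  # slower, but always works

print(pick_device(False, True))
```

&lt;p&gt;In the real scripts this would be wired up as &lt;code&gt;DEVICE = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;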
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;a class=&quot;zola-anchor&quot; href=&quot;#next-steps&quot; aria-label=&quot;Anchor link for: next-steps&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;1-apply-to-real-data&quot;&gt;1. Apply to Real Data&lt;a class=&quot;zola-anchor&quot; href=&quot;#1-apply-to-real-data&quot; aria-label=&quot;Anchor link for: 1-apply-to-real-data&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;Replace the fictional training data with your actual use case:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Example: Customer support bot
&lt;&#x2F;span&gt;&lt;span&gt;data &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= [
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Instruction: How do I reset my password?&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;### Response: Click &amp;#39;Forgot Password&amp;#39; on the login page, enter your email, and follow the link sent within 5 minutes.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# ... more real Q&amp;amp;A pairs
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;]
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;2-experiment-with-hyperparameters&quot;&gt;2. Experiment with Hyperparameters&lt;a class=&quot;zola-anchor&quot; href=&quot;#2-experiment-with-hyperparameters&quot; aria-label=&quot;Anchor link for: 2-experiment-with-hyperparameters&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Parameter&lt;&#x2F;th&gt;&lt;th&gt;Current&lt;&#x2F;th&gt;&lt;th&gt;Try&lt;&#x2F;th&gt;&lt;th&gt;Effect&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;num_train_epochs&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;10&lt;&#x2F;td&gt;&lt;td&gt;5-20&lt;&#x2F;td&gt;&lt;td&gt;More = better learning, but slower&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;learning_rate&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;3e-4&lt;&#x2F;td&gt;&lt;td&gt;1e-4 to 5e-4&lt;&#x2F;td&gt;&lt;td&gt;Higher = faster, but less stable&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;r&lt;&#x2F;code&gt; (LoRA rank)&lt;&#x2F;td&gt;&lt;td&gt;8&lt;&#x2F;td&gt;&lt;td&gt;4-32&lt;&#x2F;td&gt;&lt;td&gt;Higher = more capacity, more memory&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;lora_alpha&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;32&lt;&#x2F;td&gt;&lt;td&gt;16-64&lt;&#x2F;td&gt;&lt;td&gt;Usually 2-4x rank&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;h3 id=&quot;3-add-validation-data&quot;&gt;3. Add Validation Data&lt;a class=&quot;zola-anchor&quot; href=&quot;#3-add-validation-data&quot; aria-label=&quot;Anchor link for: 3-add-validation-data&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;Monitor overfitting by testing on held-out data:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Split your data
&lt;&#x2F;span&gt;&lt;span&gt;train_data &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;data&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;[:&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;80&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;]  &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# 80% for training
&lt;&#x2F;span&gt;&lt;span&gt;eval_data &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;data&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;80&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;:]   &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# 20% for validation
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;trainer &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;SFTTrainer&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# ... other args
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;train_dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;Dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_list&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;train_data&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;),
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f76d47;&quot;&gt;eval_dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;Dataset&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;from_list&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;eval_data&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;),
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
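Passing an eval_dataset alone does not schedule any evaluation runs; the training config also needs an evaluation strategy. A minimal sketch, assuming TRL's SFTConfig (the parameter was named evaluation_strategy in transformers releases before 4.41, so adjust for your installed version; the output path is illustrative):

```python
from trl import SFTConfig

# Run evaluation on the held-out 20% at the end of every epoch
config = SFTConfig(
    output_dir="./smollm3-finetuned",  # hypothetical output path
    eval_strategy="epoch",             # evaluate once per epoch
    per_device_eval_batch_size=1,      # keep evaluation memory use low
)
```

The eval loss reported each epoch is the main signal for catching overfitting: training loss keeps falling while eval loss starts to rise.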
&lt;h3 id=&quot;4-explore-other-models&quot;&gt;4. Explore Other Models&lt;a class=&quot;zola-anchor&quot; href=&quot;#4-explore-other-models&quot; aria-label=&quot;Anchor link for: 4-explore-other-models&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;Once comfortable with SmolLM3-3B, try:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;SmolLM2-360M&lt;&#x2F;strong&gt; - Smaller and faster (great for quick testing; SmolLM3 ships only as 3B)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Llama-3.2-1B&lt;&#x2F;strong&gt; - Meta&#x27;s small model&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Qwen2.5-3B&lt;&#x2F;strong&gt; - Multilingual support&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Phi-3-mini&lt;&#x2F;strong&gt; - Microsoft&#x27;s efficient model&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;5-production-deployment&quot;&gt;5. Production Deployment&lt;a class=&quot;zola-anchor&quot; href=&quot;#5-production-deployment&quot; aria-label=&quot;Anchor link for: 5-production-deployment&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;p&gt;For production use:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#fafafa;color:#80cbc4;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Merge LoRA adapters into base model (optional)
&lt;&#x2F;span&gt;&lt;span&gt;merged_model &lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;merge_and_unload&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;()
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;merged_model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#6182b8;&quot;&gt;save_pretrained&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#91b859;&quot;&gt;.&#x2F;final-model&lt;&#x2F;span&gt;&lt;span style=&quot;color:#39adb5;&quot;&gt;&amp;quot;)
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Or keep separate (more flexible)
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#ccd7da;&quot;&gt;# Just load base + adapters as shown in test_finetuned.py
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;6-advanced-topics-to-explore&quot;&gt;6. Advanced Topics to Explore&lt;a class=&quot;zola-anchor&quot; href=&quot;#6-advanced-topics-to-explore&quot; aria-label=&quot;Anchor link for: 6-advanced-topics-to-explore&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Quantization&lt;&#x2F;strong&gt; (4-bit&#x2F;8-bit weights for a smaller memory footprint)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Multi-GPU training&lt;&#x2F;strong&gt; (distributed training)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Custom loss functions&lt;&#x2F;strong&gt; (for specialized tasks)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Reinforcement Learning from Human Feedback (RLHF)&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
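Of these, quantization is usually the quickest win on limited hardware. A hedged sketch of loading a model in 4-bit via transformers and bitsandbytes (the model name and dtype are illustrative; this assumes a CUDA GPU with the bitsandbytes package installed):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization; roughly quarters the weight memory footprint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Combined with LoRA this is essentially the QLoRA recipe: the frozen base weights sit in 4-bit while the small adapters train in higher precision.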
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;additional-resources&quot;&gt;Additional Resources&lt;a class=&quot;zola-anchor&quot; href=&quot;#additional-resources&quot; aria-label=&quot;Anchor link for: additional-resources&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;documentation&quot;&gt;Documentation&lt;a class=&quot;zola-anchor&quot; href=&quot;#documentation&quot; aria-label=&quot;Anchor link for: documentation&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;transformers&quot;&gt;HuggingFace Transformers&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;peft&quot;&gt;PEFT Documentation&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;trl&quot;&gt;TRL Documentation&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;pytorch.org&#x2F;docs&quot;&gt;PyTorch Documentation&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;tutorials&quot;&gt;Tutorials&lt;a class=&quot;zola-anchor&quot; href=&quot;#tutorials&quot; aria-label=&quot;Anchor link for: tutorials&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;docs&#x2F;transformers&#x2F;training&quot;&gt;HuggingFace Fine-tuning Guide&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;2106.09685&quot;&gt;LoRA Paper&lt;&#x2F;a&gt; (for deep understanding)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;community&quot;&gt;Community&lt;a class=&quot;zola-anchor&quot; href=&quot;#community&quot; aria-label=&quot;Anchor link for: community&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;&quot;&gt;HuggingFace Forums&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https:&#x2F;&#x2F;discuss.pytorch.org&#x2F;&quot;&gt;PyTorch Forums&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;glossary&quot;&gt;Glossary&lt;a class=&quot;zola-anchor&quot; href=&quot;#glossary&quot; aria-label=&quot;Anchor link for: glossary&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Adapter&lt;&#x2F;strong&gt;: Small trainable layers added to a frozen model (LoRA&#x27;s approach)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Batch Size&lt;&#x2F;strong&gt;: Number of examples processed together&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Epoch&lt;&#x2F;strong&gt;: One complete pass through all training data&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Fine-tuning&lt;&#x2F;strong&gt;: Adapting a pre-trained model to specific tasks&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Gradient Accumulation&lt;&#x2F;strong&gt;: Processing several batches before each weight update, simulating a larger batch size&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Inference&lt;&#x2F;strong&gt;: Using a model to generate predictions&#x2F;responses&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;LoRA&lt;&#x2F;strong&gt;: Low-Rank Adaptation, efficient fine-tuning method&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Loss&lt;&#x2F;strong&gt;: Measure of how wrong the model&#x27;s predictions are (lower = better)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Temperature&lt;&#x2F;strong&gt;: Controls randomness in generation (near 0 = almost deterministic, above 1 = increasingly varied)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Token&lt;&#x2F;strong&gt;: Smallest unit of text the model processes (~0.75 words)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Tokenizer&lt;&#x2F;strong&gt;: Converts text to tokens and vice versa&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
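A couple of these definitions can be made concrete in a few lines. During generation the model's raw scores (logits) are divided by the temperature before being turned into probabilities, which is why low temperatures are near-deterministic and high ones more varied. A self-contained toy sketch (the logits are made up, not from a real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # top choice dominates
hot = softmax_with_temperature(logits, 2.0)   # distribution flattens out
```

With the low temperature nearly all probability mass lands on the highest-scoring token; with the high temperature the three options become much closer, which is where the "creative" behavior comes from.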
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;a class=&quot;zola-anchor&quot; href=&quot;#conclusion&quot; aria-label=&quot;Anchor link for: conclusion&quot;&gt;&lt;i class=&quot;fas fa-link&quot;&gt;&lt;&#x2F;i&gt;&lt;&#x2F;a&gt; 
&lt;&#x2F;h2&gt;
&lt;p&gt;You&#x27;ve now learned the complete workflow for LLM development:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;✅ Running pre-trained models&lt;&#x2F;li&gt;
&lt;li&gt;✅ Preparing custom training data&lt;&#x2F;li&gt;
&lt;li&gt;✅ Fine-tuning with LoRA&lt;&#x2F;li&gt;
&lt;li&gt;✅ Testing and validation&lt;&#x2F;li&gt;
&lt;li&gt;✅ Comparing results&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;strong&gt;Remember:&lt;&#x2F;strong&gt; The quality of your training data matters more than complex hyperparameters. Start with clear, specific, unique examples, and iterate from there.&lt;&#x2F;p&gt;
&lt;p&gt;Happy training! 🚀&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;&lt;em&gt;Last updated: 2024&lt;&#x2F;em&gt;
&lt;em&gt;Project: SmolLM3 Training Guide&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Hello, world</title>
		<published>2024-10-18T00:00:00+00:00</published>
		<updated>2024-10-18T00:00:00+00:00</updated>
		<link href="https://denys.dev/hello-world/"/>
		<link rel="alternate" href="https://denys.dev/hello-world/" type="text/html"/>
		<id>https://denys.dev/hello-world/</id>
        <summary type="html">&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum.&lt;&#x2F;p&gt;
</summary>
		<content type="html">&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum.&lt;&#x2F;p&gt;
&lt;p&gt;:)&lt;&#x2F;p&gt;
</content>
	</entry>
</feed>
