Food Extractor: Fine-Tuning Gemma 3 270M for Structured Data Extraction
Introduction
Ever wanted to extract structured food data from messy text? In this tutorial, I walk through how I fully fine-tuned Google's Gemma 3 270M model to do exactly that — turning unstructured descriptions into clean, parseable output.
Why Fine-Tune a Small Language Model?
Before diving in, here's why SLMs (Small Language Models) are powerful:
- Own the model — Run anywhere without API costs
- Simple tasks work well — Smaller models excel at focused tasks
- No API calls needed — Run completely offline
- Batch processing — Much faster than sequential API calls
- Task-specific — Better performance on your use case
What We're Building
A model that extracts food and drink items from text, returning structured output.
Input:
Input:

```
A plate of rice cakes, salmon, cottage cheese and small cherry tomatoes with a cup of tea.
```

Output:

```
food_or_drink: 1
tags: fi
foods: rice cakes, salmon, cottage cheese, cherry tomatoes
drinks: cup of tea
```

The Tech Stack
| Component | Tool |
|---|---|
| Model | Gemma 3 270M |
| Dataset | FoodExtract-1k |
| Training | TRL (Transformer Reinforcement Learning) |
| Inference | Transformers + Accelerate |
| Demo | Gradio |
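Once the model returns output in the `key: value` format shown above, a small parser can turn it into a dictionary for downstream use. The sketch below assumes the field names from the example; real model output may need extra validation:

```python
def parse_extraction(text: str) -> dict:
    """Parse the model's `key: value` output lines into a dict.

    List-valued fields (foods, drinks, tags) are split on commas;
    food_or_drink is coerced to an int flag.
    """
    result = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # skip malformed lines rather than failing
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("foods", "drinks", "tags"):
            result[key] = [item.strip() for item in value.split(",") if item.strip()]
        elif key == "food_or_drink":
            result[key] = int(value)
        else:
            result[key] = value
    return result

sample = """food_or_drink: 1
tags: fi
foods: rice cakes, salmon, cottage cheese, cherry tomatoes
drinks: cup of tea"""

print(parse_extraction(sample))
```

Skipping malformed lines instead of raising keeps batch jobs running even when the model occasionally drifts from the expected format.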
Training Process
The fine-tuning process is straightforward with Hugging Face's TRL library:
- Load the base Gemma 3 270M model
- Prepare the FoodExtract-1k dataset with proper formatting
- Configure the SFTTrainer with appropriate hyperparameters
- Train for 3 epochs (~18 minutes on a T4 GPU)
```python
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="./food-extractor",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        learning_rate=2e-5,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```

Training Results
After 3 epochs of supervised fine-tuning:
| Epoch | Training Loss | Validation Loss | Token Accuracy |
|---|---|---|---|
| 1 | 2.17 | 2.24 | 58.8% |
| 2 | 1.25 | 2.28 | 58.9% |
| 3 | 1.07 | 2.46 | 58.6% |

Note that validation loss rises after epoch 1 even as training loss keeps falling, an early sign of overfitting on a 1k-example dataset; stopping after one or two epochs may generalize slightly better.
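With training done, the checkpoint saved to `output_dir` can be loaded for local inference with Transformers. This is a minimal sketch: it assumes the model was trained on Gemma's chat format, and the prompt text is the example from earlier in the post.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint saved by SFTTrainer (the path matches
# the output_dir used in the training config above).
tokenizer = AutoTokenizer.from_pretrained("./food-extractor")
model = AutoModelForCausalLM.from_pretrained("./food-extractor")

# apply_chat_template builds the prompt in the chat format the model
# was fine-tuned on.
messages = [
    {
        "role": "user",
        "content": "A plate of rice cakes, salmon, cottage cheese "
                   "and small cherry tomatoes with a cup of tea.",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is a sensible default here, since structured extraction benefits from deterministic output.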
Key Concepts
Full Fine-Tuning vs LoRA
- Full Fine-Tuning — All model weights are updated (used in this project)
- LoRA — Only adapter weights are trained (requires fewer resources)
For a small model like Gemma 3 270M, full fine-tuning is practical and yields excellent results.
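For comparison, switching this project to LoRA would look roughly like the fragment below, using the `peft` library. The `r`, `lora_alpha`, and `target_modules` values are illustrative defaults, not tuned settings:

```python
from peft import LoraConfig

# Illustrative LoRA settings -- not tuned for this task.
peft_config = LoraConfig(
    r=16,                                  # rank of the low-rank adapter matrices
    lora_alpha=32,                         # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

# Passing peft_config to SFTTrainer trains only the adapter weights,
# leaving the base model frozen:
# trainer = SFTTrainer(model=model, args=sft_config, peft_config=peft_config, ...)
```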
Tags Dictionary
The model learns to classify content with these tags:
| Abbreviation | Meaning |
|---|---|
| np | Nutrition Panel |
| il | Ingredient List |
| me | Menu |
| re | Recipe |
| fi | Food Items |
| di | Drink Items |
| fa | Food Advertisement |
| fp | Food Packaging |
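In application code, these abbreviations are convenient to expand through a simple lookup dict. The helper below is a sketch; the model itself only ever emits the two-letter codes:

```python
# The tag vocabulary from the table above, as a lookup dict.
TAGS = {
    "np": "Nutrition Panel",
    "il": "Ingredient List",
    "me": "Menu",
    "re": "Recipe",
    "fi": "Food Items",
    "di": "Drink Items",
    "fa": "Food Advertisement",
    "fp": "Food Packaging",
}

def expand_tags(tag_field: str) -> list[str]:
    """Expand a comma-separated tag string (e.g. "fi, di") into full names.

    Unknown abbreviations are kept verbatim so malformed output stays visible.
    """
    return [TAGS.get(t.strip(), t.strip()) for t in tag_field.split(",") if t.strip()]

print(expand_tags("fi, di"))  # → ['Food Items', 'Drink Items']
```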
Lessons Learned
- Think in tokens — Frame every problem as: "What tokens in, what tokens out?"
- Small models are powerful — 270M parameters is enough for structured extraction
- Data quality matters — The FoodExtract-1k dataset's formatting directly impacts output structure
- Google Colab works — A free T4 GPU can fine-tune this in under 20 minutes
Conclusion
Fine-tuning small language models democratizes AI customization. You can create task-specific models that run locally, work offline, and perform better than general-purpose APIs on your specific use case. The Food Extractor demonstrates how approachable this process has become with modern tooling.
Check out the full notebook on Google Colab to try it yourself!