Back to Blog
12 min read

Food Extractor: Fine-Tuning Gemma 3 270M for Structured Data Extraction

Introduction

Ever wanted to extract structured food data from messy text? In this tutorial, I walk through how I fully fine-tuned Google's Gemma 3 270M model to do exactly that — turning unstructured descriptions into clean, parseable output.

Why Fine-Tune a Small Language Model?

Before diving in, here's why SLMs (Small Language Models) are powerful:

  • Own the model — Run anywhere without API costs
  • Simple tasks work well — Smaller models excel at focused tasks
  • No API calls needed — Run completely offline
  • Batch processing — Much faster than sequential API calls
  • Task-specific — Better performance on your use case

What We're Building

A model that extracts food and drink items from text, returning structured output.

Input:

A plate of rice cakes, salmon, cottage cheese and small cherry tomatoes with a cup of tea.

Output:

food_or_drink: 1
tags: fi
foods: rice cakes, salmon, cottage cheese, cherry tomatoes
drinks: cup of tea

The Tech Stack

ComponentTool
ModelGemma 3 270M
DatasetFoodExtract-1k
TrainingTRL (Transformers Reinforcement Learning)
InferenceTransformers + Accelerate
DemoGradio

Training Process

The fine-tuning process is straightforward with Hugging Face's TRL library:

  1. Load the base Gemma 3 270M model
  2. Prepare the FoodExtract-1k dataset with proper formatting
  3. Configure the SFTTrainer with appropriate hyperparameters
  4. Train for 3 epochs (~18 minutes on T4 GPU)
python
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="./food-extractor",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        learning_rate=2e-5,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

trainer.train()

Training Results

After 3 epochs of supervised fine-tuning:

EpochTraining LossValidation LossToken Accuracy
12.172.2458.8%
21.252.2858.9%
31.072.4658.6%

Key Concepts

Full Fine-Tuning vs LoRA

  • Full Fine-Tuning — All model weights are updated (used in this project)
  • LoRA — Only adapter weights are trained (requires fewer resources)

For a small model like Gemma 3 270M, full fine-tuning is practical and yields excellent results.

Tags Dictionary

The model learns to classify content with these tags:

AbbreviationMeaning
npNutrition Panel
ilIngredient List
meMenu
reRecipe
fiFood Items
diDrink Items
faFood Advertisement
fpFood Packaging

Lessons Learned

  1. Think in tokens — Frame every problem as: "What tokens in, what tokens out?"
  2. Small models are powerful — 270M parameters is enough for structured extraction
  3. Data quality matters — The FoodExtract-1k dataset's formatting directly impacts output structure
  4. Google Colab works — A free T4 GPU can fine-tune this in under 20 minutes

Conclusion

Fine-tuning small language models democratizes AI customization. You can create task-specific models that run locally, work offline, and perform better than general-purpose APIs on your specific use case. The Food Extractor demonstrates how approachable this process has become with modern tooling.

Check out the full notebook on Google Colab to try it yourself!