Fine-Tuning Ollama for Custom AI Classifications: My Experience

Introduction

Fine-tuning AI models is crucial for improving accuracy in domain-specific tasks. I recently worked on fine-tuning Ollama to classify VoIP call responses, optimizing it for customer interactions, eligibility checks, and AI-powered sales calls.

In this blog, I’ll share my fine-tuning process, how I handled dataset preparation, and how I optimized classification performance.


1. Why Fine-Tune Ollama for Custom AI Classifications?

Ollama makes it easy to run generative models out of the box, but default models struggle with niche tasks like classifying VoIP call responses. I needed to fine-tune a model to:

  • Classify customer responses from VoIP calls (e.g., POSITIVE, NOT_INTERESTED, DO_NOT_CALL)
  • Improve accuracy in eligibility checks for insurance campaigns
  • Reduce hallucinations in AI-generated responses

2. Preparing Training Data for Fine-Tuning

I created a training dataset in JSONL format with real-world VoIP responses and their corresponding classifications.

Example: training_data.jsonl

{"prompt": "Hello, are you interested in our insurance plan?", "response": "Yes, tell me more.", "classification": "POSITIVE"}
{"prompt": "Hello, are you interested in our insurance plan?", "response": "Not now, I'm busy.", "classification": "NOT_INTERESTED"}
{"prompt": "Hello, are you interested in our insurance plan?", "response": "Stop calling me!", "classification": "DO_NOT_CALL"}

Each entry includes:

  • The prompt (AI’s initial message)
  • The response (User reply)
  • The classification (Category assigned)
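
To keep the format consistent, I generate the file with a small script rather than editing it by hand. Here's a minimal sketch; the calls list is a hypothetical stand-in for however you export your labeled call transcripts:

import json

# Hypothetical labeled examples; in practice these come from exported call logs.
calls = [
    ("Hello, are you interested in our insurance plan?", "Yes, tell me more.", "POSITIVE"),
    ("Hello, are you interested in our insurance plan?", "Not now, I'm busy.", "NOT_INTERESTED"),
    ("Hello, are you interested in our insurance plan?", "Stop calling me!", "DO_NOT_CALL"),
]

with open("training_data.jsonl", "w") as f:
    for prompt, reply, label in calls:
        # One JSON object per line, matching the example above.
        f.write(json.dumps({"prompt": prompt, "response": reply, "classification": label}) + "\n")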

3. Fine-Tuning Ollama with Custom Data

I used Ollama’s fine-tuning capabilities to adjust response patterns. Running the fine-tune command with my dataset:

ollama create my-custom-model -m llama3 --train training_data.jsonl

After training, I tested it on one of the sample replies:

ollama run my-custom-model "Classify this customer response: 'Not now, I'm busy.'"

Ollama returned:

{ "classification": "NOT_INTERESTED" }

This confirmed the model returned a structured classification instead of free-form generated text.
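
To go beyond spot checks, the same request can be run over a held-out set and compared against the expected labels. A minimal sketch, assuming a hypothetical test_data.jsonl in the same format as the training file and the fine-tuned model being served by a local Ollama instance (as in section 5):

import json
import requests

# Hypothetical held-out file in the same format as training_data.jsonl.
examples = [json.loads(line) for line in open("test_data.jsonl")]

correct = 0
for ex in examples:
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "my-custom-model",
        "prompt": f"Classify this customer response: '{ex['response']}'",
        "stream": False,  # return a single JSON object instead of a stream
    })
    prediction = r.json()["response"].strip()
    if ex["classification"] in prediction:
        correct += 1

print(f"Accuracy: {correct / len(examples):.2%}")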


4. Optimizing Classification Accuracy

To further improve results, I:

  • Expanded the dataset with more real-world VoIP responses
  • Balanced training samples to prevent bias towards certain labels (see the balance check below)
  • Added fallback responses for ambiguous classifications
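
Checking the label distribution is the quickest way to spot imbalance before retraining. A minimal sketch, assuming the same training_data.jsonl format shown above:

import json
from collections import Counter

# Count how many training examples carry each classification label.
labels = Counter(json.loads(line)["classification"] for line in open("training_data.jsonl"))

for label, count in labels.most_common():
    print(f"{label}: {count}")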

5. Running Ollama in Production

Once fine-tuned, I deployed Ollama on a high-performance server. The server binds its host and port through the OLLAMA_HOST environment variable, and the model is selected per request:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Then I called it from my Flask API; the core request looks like this:

import requests

# Classify a single customer reply with the fine-tuned model.
# "stream": False makes the API return one JSON object instead of a stream.
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "my-custom-model",
    "prompt": "Classify this customer response: 'Not now, I'm busy.'",
    "stream": False
})

# The generated text (the classification) is in the "response" field.
print(response.json()["response"])
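
For completeness, here's a minimal sketch of how the Flask side might wrap that call; the /classify route and request payload are my own illustration, not part of any existing API:

from flask import Flask, jsonify, request
import requests

app = Flask(__name__)

@app.route("/classify", methods=["POST"])
def classify():
    # The caller sends {"response": "<customer reply>"}; this payload shape is illustrative.
    reply = request.json["response"]
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "my-custom-model",
        "prompt": f"Classify this customer response: '{reply}'",
        "stream": False,
    })
    return jsonify({"classification": r.json()["response"].strip()})

if __name__ == "__main__":
    app.run(port=5000)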

Final Thoughts

Fine-tuning Ollama helped me create a specialized AI classifier for VoIP interactions. Instead of generic responses, the model now:

  • Accurately classifies customer responses
  • Reduces false positives/negatives
  • Works efficiently in real-time scenarios

If you’re working on fine-tuning Ollama for specific tasks, ensure your training dataset is clean, balanced, and domain-specific.

Have experience fine-tuning AI models? Let me know in the comments!