DataTechNotes: Few-Shot Prompting with LLMs in Python

In this post, we'll briefly learn what few-shot prompting is, how it works, and how to apply it to real-world NLP tasks to produce more accurate and consistent outputs from a large language model in Python. The tutorial covers:

What is Few-Shot Prompting?
How Few-Shot Prompting Works
Installation and Setup
Zero-Shot vs Few-Shot Comparison
Few-Shot Text Classification
Few-Shot Named Entity Extraction
Few-Shot Structured JSON Output
Few-Shot Style Transfer
Choosing the Right Number of Examples
Conclusion

Let's get started.

What is Few-Shot Prompting?

Few-shot prompting is a technique in which you include a small number of worked examples — typically two to eight input-output pairs — directly inside the prompt before presenting the real task to the model. By seeing concrete demonstrations of the desired behaviour, the model infers the pattern and applies it to the new input without any weight updates or fine-tuning. The examples act as an in-context specification of exactly what format, vocabulary, tone, and reasoning style you expect.

Few-shot prompting sits between two other strategies on the prompting spectrum. Zero-shot prompting gives the model only instructions, relying entirely on its pre-trained knowledge. Fine-tuning permanently updates the model weights on thousands of labelled examples. Few-shot prompting is the practical middle ground: it costs only a few extra input tokens per request and requires no training infrastructure, yet it often outperforms zero-shot on structured, domain-specific, or format-sensitive tasks.

How Few-Shot Prompting Works

In the chat-message format used by modern LLMs, few-shot examples are injected as alternating user and assistant messages placed between the system prompt and the real user query. The model sees these prior exchanges as if they were part of the conversation history and continues the pattern naturally into the next reply.
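As a minimal sketch of that layout, the list below writes out by hand what a prompt with one system message and two demonstrations could look like. The review texts and labels are made up for illustration; only the role names follow the chat format described above.

```python
# Hypothetical demonstration pairs; the texts and labels are illustrative only.
messages = [
    {"role": "system", "content": "Classify the sentiment as Positive or Negative."},
    # Demonstration 1: a user turn followed by the ideal assistant reply
    {"role": "user", "content": "I love this keyboard."},
    {"role": "assistant", "content": "Positive"},
    # Demonstration 2
    {"role": "user", "content": "The hinge broke within a day."},
    {"role": "assistant", "content": "Negative"},
    # The real query comes last, so the model's next turn is the answer.
    {"role": "user", "content": "Battery died after one charge."},
]

roles = [m["role"] for m in messages]
print(roles)
```

The list ends on a user turn, which is what prompts the model to produce the next assistant reply in the same style as the two demonstrations.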

Prompting Strategy	Examples Provided	Training Required	Best For
Zero-shot	0	None	Simple, well-known tasks
One-shot	1	None	Format demonstration
Few-shot	2 – 8	None	Structured, domain-specific tasks
Fine-tuning	Hundreds to thousands	Yes — GPU training	High-volume, specialised tasks

The quality of few-shot examples matters as much as their quantity. Each example should be representative of the real input distribution, demonstrate the exact output format you want, and cover edge cases where zero-shot is likely to fail. Well-chosen examples are more valuable than a large number of generic ones.
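One way to make these quality criteria checkable is a small audit of the example set before it goes into a prompt. The sketch below is an illustration, not part of any library: audit_examples and its report keys are hypothetical names, and it only checks label coverage and duplicate inputs, not whether the examples are representative.

```python
from collections import Counter

def audit_examples(examples, allowed_labels):
    """Report coverage problems in a list of (input, label) demonstration pairs."""
    counts = Counter(label for _, label in examples)
    # Normalise inputs so trivially different copies still count as duplicates.
    inputs = [text.strip().lower() for text, _ in examples]
    return {
        "missing_labels": sorted(set(allowed_labels) - set(counts)),
        "unknown_labels": sorted(set(counts) - set(allowed_labels)),
        "duplicate_inputs": sorted(t for t, n in Counter(inputs).items() if n > 1),
        "per_label": dict(counts),
    }

examples = [
    ("Great value and fast delivery.", "Positive"),
    ("Great value and fast delivery.", "Positive"),  # accidental duplicate
    ("Broke after one use.", "Negative"),
]
report = audit_examples(examples, allowed_labels=["Positive", "Negative", "Mixed"])
print(report)
```

Here the report flags that no example demonstrates the Mixed label and that one input appears twice, both of which waste prompt tokens without teaching the model anything new.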

Installation and Setup

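All examples in this tutorial use Ollama with the llama3.2 model running locally. Install Ollama from ollama.com, pull the model, and install the Python client. The same pattern of alternating user and assistant messages carries over to the OpenAI, Anthropic, and Google Gemini APIs, although client initialisation and some request details differ: Anthropic, for example, takes the system prompt as a separate parameter, and Gemini names the assistant role model.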


# Terminal — pull the model once
# ollama pull llama3.2

# pip install ollama

import ollama

def chat(
    system: str,
    examples: list[tuple[str, str]],
    user_input: str,
    temperature: float = 0.0,
    max_tokens: int = 200
) -> str:
    """
    Build a few-shot message list and return the model reply.

    Parameters
    ----------
    system     : The system prompt defining the task and rules.
    examples   : List of (user_input, assistant_output) demonstration pairs.
    user_input : The real query to answer.
    temperature: Sampling temperature (0.0 for structured tasks).
    max_tokens : Maximum tokens to generate.
    """
    messages = [{"role": "system", "content": system}]

    for user_ex, assistant_ex in examples:
        messages.append({"role": "user",      "content": user_ex})
        messages.append({"role": "assistant", "content": assistant_ex})

    messages.append({"role": "user", "content": user_input})

    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": temperature, "num_predict": max_tokens}
    )
    return response["message"]["content"].strip()

The helper function chat() takes a system prompt, a list of (user, assistant) example tuples, and the real query, then assembles the full message list automatically. Each section below repeats a compact copy of this helper with a task-appropriate max_tokens default, so every snippet runs on its own and the focus stays on the examples rather than the API boilerplate.

Zero-Shot vs Few-Shot Comparison

The clearest way to appreciate few-shot prompting is to run the same task with zero examples and then with a handful of examples and compare the outputs side by side. We use a product review tone classifier that must return one of three labels: Positive, Negative, or Mixed. The zero-shot version gives only instructions; the few-shot version also gives three labelled examples.


import ollama

def chat(system, examples, user_input, temperature=0.0, max_tokens=50):
    messages = [{"role": "system", "content": system}]
    for user_ex, assistant_ex in examples:
        messages.append({"role": "user",      "content": user_ex})
        messages.append({"role": "assistant", "content": assistant_ex})
    messages.append({"role": "user", "content": user_input})
    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": temperature, "num_predict": max_tokens}
    )
    return response["message"]["content"].strip()

system = (
    "Classify the tone of a product review as exactly one of: "
    "Positive, Negative, or Mixed. Reply with the label only."
)

examples = [
    ("The battery life is incredible but the screen cracked after a week.",
     "Mixed"),
    ("Absolutely love this product — fast shipping and great quality!",
     "Positive"),
    ("Stopped working after two days. Complete waste of money.",
     "Negative"),
]

reviews = [
    "Decent product overall but the instructions were confusing.",
    "Best purchase I have made this year — highly recommend!",
    "Arrived damaged and customer support never replied.",
    "Good value for the price, though delivery took longer than expected.",
]

print(f"{'Review':<55} {'Zero-Shot':<12} {'Few-Shot'}")
print("-" * 85)

for review in reviews:
    zero = chat(system, examples=[], user_input=review)
    few  = chat(system, examples=examples, user_input=review)
    print(f"{review[:54]:<55} {zero:<12} {few}")

Output:


 Review                                                  Zero-Shot    Few-Shot
 -------------------------------------------------------------------------------------
 Decent product overall but the instructions were conf…  Neutral      Mixed
 Best purchase I have made this year — highly recommend  Positive     Positive
 Arrived damaged and customer support never replied.     Negative     Negative
 Good value for the price, though delivery took longer…  Neutral      Mixed

The zero-shot model invents the label Neutral — which was never in the allowed set — because it relies on its general knowledge of sentiment analysis rather than the three labels defined in the system prompt. The few-shot model stays within the specified label set throughout and correctly identifies the ambivalent reviews as Mixed rather than hallucinating a fourth category.
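Because an off-schema label such as Neutral can still slip through, it may be worth guarding the downstream code with a small validation step. The sketch below assumes the three-label set from this section; parse_label is a hypothetical helper name, and returning None for unknown labels is just one possible policy.

```python
ALLOWED = ("Positive", "Negative", "Mixed")

def parse_label(raw, allowed=ALLOWED):
    """Return the matching allowed label, or None if the reply is off-schema."""
    # Tolerate surrounding whitespace, quotes, a trailing period, and case differences.
    cleaned = raw.strip().strip(".\"'").lower()
    for label in allowed:
        if cleaned == label.lower():
            return label
    return None

print(parse_label("Mixed"))
print(parse_label(" positive. "))
print(parse_label("Neutral"))
```

A None result can then trigger a retry, a fallback label, or a manual review, depending on what the pipeline needs.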

Few-Shot Text Classification

Few-shot prompting excels at multi-class text classification where the label set is custom or the boundaries between classes are subtle. Below we build a support ticket classifier that routes tickets into one of four departments. Encoding routing rules like these would traditionally call for a labelled training set and fine-tuning, but six few-shot examples are enough to convey the pattern in a single prompt.


import ollama

def chat(system, examples, user_input, temperature=0.0, max_tokens=20):
    messages = [{"role": "system", "content": system}]
    for user_ex, assistant_ex in examples:
        messages.append({"role": "user",      "content": user_ex})
        messages.append({"role": "assistant", "content": assistant_ex})
    messages.append({"role": "user", "content": user_input})
    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": temperature, "num_predict": max_tokens}
    )
    return response["message"]["content"].strip()

system = (
    "You are a support ticket router. Classify each ticket into exactly "
    "one department: Billing, Technical, Returns, or General. "
    "Reply with the department name only."
)

examples = [
    ("I was charged twice for my last order.",                      "Billing"),
    ("My app crashes every time I try to log in on Android.",       "Technical"),
    ("I want to send back the headphones I bought last week.",      "Returns"),
    ("Do you ship internationally?",                                "General"),
    ("The discount code SAVE20 is not working at checkout.",        "Billing"),
    ("My smart speaker is not connecting to my Wi-Fi network.",     "Technical"),
]

tickets = [
    "I received the wrong item and want a refund.",
    "How do I reset my account password?",
    "There is an extra charge on my invoice I don't recognise.",
    "The LED on my device stays red and never turns green.",
    "What are your store opening hours?",
    "I never received my order from three weeks ago.",
]

print(f"{'Ticket':<55} {'Department'}")
print("-" * 70)
for ticket in tickets:
    label = chat(system, examples, ticket)
    print(f"{ticket:<55} {label}")

Output:


 Ticket                                                  Department
 ----------------------------------------------------------------------
 I received the wrong item and want a refund.            Returns
 How do I reset my account password?                     Technical
 There is an extra charge on my invoice I don't recog…   Billing
 The LED on my device stays red and never turns green.   Technical
 What are your store opening hours?                      General
 I never received my order from three weeks ago.         Returns

All six tickets are routed correctly with only six examples in the prompt: two each for Billing and Technical, and one each for Returns and General. The max_tokens=20 budget leaves little room for explanatory text alongside the label; note that it truncates extra output rather than preventing it, so the "department name only" instruction and the examples still do the real work.

Few-Shot Named Entity Extraction

Named entity extraction with a custom schema benefits enormously from few-shot examples because the entity types and output format are often project-specific and cannot be guessed from a system prompt alone. Below we extract Company, Product, and Price entities from short retail descriptions using three demonstrations to define the exact output structure.


import ollama
import json

def chat(system, examples, user_input, temperature=0.0, max_tokens=150):
    messages = [{"role": "system", "content": system}]
    for user_ex, assistant_ex in examples:
        messages.append({"role": "user",      "content": user_ex})
        messages.append({"role": "assistant", "content": assistant_ex})
    messages.append({"role": "user", "content": user_input})
    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": temperature, "num_predict": max_tokens}
    )
    return response["message"]["content"].strip()

system = (
    "Extract named entities from retail text. "
    "Return only valid JSON with keys: company, product, price. "
    "Use null for any entity not mentioned. No markdown, no extra text."
)

examples = [
    (
        "Apple has launched the iPhone 16 Pro for $999.",
        '{"company": "Apple", "product": "iPhone 16 Pro", "price": "$999"}'
    ),
    (
        "Samsung's new Galaxy Watch 7 is now available at $299.",
        '{"company": "Samsung", "product": "Galaxy Watch 7", "price": "$299"}'
    ),
    (
        "Sony announced the WH-1000XM6 headphones but has not revealed pricing yet.",
        '{"company": "Sony", "product": "WH-1000XM6", "price": null}'
    ),
]

sentences = [
    "Google unveiled the Pixel 9a smartphone priced at $499.",
    "Microsoft is releasing a new Surface Laptop 7 for $1,299.",
    "NVIDIA announced the RTX 5090 GPU with no price disclosed yet.",
    "The new Dyson V16 vacuum cleaner retails for $849.",
]

for sentence in sentences:
    raw    = chat(system, examples, sentence)
    result = json.loads(raw)
    print(f"Text    : {sentence}")
    print(f"Company : {result['company']}")
    print(f"Product : {result['product']}")
    print(f"Price   : {result['price']}")
    print()

Output:


 Text    : Google unveiled the Pixel 9a smartphone priced at $499.
 Company : Google
 Product : Pixel 9a
 Price   : $499

 Text    : Microsoft is releasing a new Surface Laptop 7 for $1,299.
 Company : Microsoft
 Product : Surface Laptop 7
 Price   : $1,299

 Text    : NVIDIA announced the RTX 5090 GPU with no price disclosed yet.
 Company : NVIDIA
 Product : RTX 5090
 Price   : None

 Text    : The new Dyson V16 vacuum cleaner retails for $849.
 Company : Dyson
 Product : V16
 Price   : $849

The model handles all four sentences correctly, including the case where no price is mentioned: it returns null exactly as demonstrated in the third example, which json.loads() converts to Python's None. Every response parses with json.loads() without any post-processing.
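Clean parsing is not guaranteed on every run or with every model, so a defensive parser is a sensible companion to json.loads(). The following is a minimal sketch under the assumption that the reply contains at most one JSON object; parse_json_reply is a hypothetical helper name, and the required keys mirror the schema used in this section.

```python
import json
import re

def parse_json_reply(raw, required_keys=("company", "product", "price")):
    """Parse a model reply into a dict, or return None if it cannot be recovered."""
    # Grab the outermost {...} span, which skips code fences or preamble text.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Reject replies that parse but do not follow the expected schema.
    if not isinstance(data, dict) or not all(k in data for k in required_keys):
        return None
    return data

fence = "`" * 3  # built programmatically to simulate a reply wrapped in a code fence
clean = '{"company": "Sony", "product": "WH-1000XM6", "price": null}'
fenced = fence + "json\n" + clean + "\n" + fence

print(parse_json_reply(clean))
print(parse_json_reply(fenced))
print(parse_json_reply("Sorry, I cannot help with that."))
```

Returning None instead of raising keeps the calling loop simple: it can skip, retry, or log the sentence without a try/except around every call.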

Few-Shot Structured JSON Output

When a pipeline requires multi-field structured output, few-shot examples are the most reliable way to lock in the exact schema. Below we build a job-posting parser that extracts six fields from free-form text. The three examples teach the model the full schema including the correct handling of list fields and missing values.


import ollama
import json

def chat(system, examples, user_input, temperature=0.0, max_tokens=250):
    messages = [{"role": "system", "content": system}]
    for user_ex, assistant_ex in examples:
        messages.append({"role": "user",      "content": user_ex})
        messages.append({"role": "assistant", "content": assistant_ex})
    messages.append({"role": "user", "content": user_input})
    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": temperature, "num_predict": max_tokens}
    )
    return response["message"]["content"].strip()

system = (
    "Parse job postings into structured JSON. "
    "Keys: title, company, location, salary, skills (list), remote (bool). "
    "Use null for missing fields. Return JSON only — no markdown, no preamble."
)

examples = [
    (
        "DataTech is hiring a Senior Python Developer in Berlin. "
        "Salary: €80,000–€100,000. Must know Python, FastAPI, and PostgreSQL. "
        "Fully remote.",
        json.dumps({
            "title":    "Senior Python Developer",
            "company":  "DataTech",
            "location": "Berlin",
            "salary":   "€80,000–€100,000",
            "skills":   ["Python", "FastAPI", "PostgreSQL"],
            "remote":   True
        })
    ),
    (
        "QuantumAI is looking for a Machine Learning Engineer in San Francisco. "
        "Compensation: $130,000–$160,000/year. "
        "Required: PyTorch, Python, MLOps, Kubernetes. On-site only.",
        json.dumps({
            "title":    "Machine Learning Engineer",
            "company":  "QuantumAI",
            "location": "San Francisco",
            "salary":   "$130,000–$160,000/year",
            "skills":   ["PyTorch", "Python", "MLOps", "Kubernetes"],
            "remote":   False
        })
    ),
    (
        "CloudBase needs a DevOps Engineer. Experience with Docker, Terraform, "
        "and AWS required. Salary not disclosed. Location: London. Hybrid.",
        json.dumps({
            "title":    "DevOps Engineer",
            "company":  "CloudBase",
            "location": "London",
            "salary":   None,
            "skills":   ["Docker", "Terraform", "AWS"],
            "remote":   False
        })
    ),
]

postings = [
    "NeuralWorks is hiring a Data Scientist in Amsterdam. Salary: €70,000. "
    "Skills needed: Python, scikit-learn, SQL, Tableau. Remote-friendly.",

    "StreamFlow seeks a Backend Engineer in Toronto. Pay: CAD 110,000–130,000. "
    "Must have Go, gRPC, Redis, and PostgreSQL experience. Fully remote.",
]

for posting in postings:
    raw    = chat(system, examples, posting)
    result = json.loads(raw)
    print(json.dumps(result, indent=2))
    print()

Output:


 {
   "title": "Data Scientist",
   "company": "NeuralWorks",
   "location": "Amsterdam",
   "salary": "€70,000",
   "skills": ["Python", "scikit-learn", "SQL", "Tableau"],
   "remote": true
 }

 {
   "title": "Backend Engineer",
   "company": "StreamFlow",
   "location": "Toronto",
   "salary": "CAD 110,000–130,000",
   "skills": ["Go", "gRPC", "Redis", "PostgreSQL"],
   "remote": true
 }

Both postings parse into the exact six-field schema with correct types (skills as a list, remote as a boolean), inferred directly from the three examples. The examples already cover two currencies, a missing salary, and on-site, hybrid, and remote roles; adding a fourth that fills a remaining gap, such as a posting with no stated location or no listed skills, would further harden the extractor against edge cases.
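Since a pipeline depends on those types, it can help to verify them after parsing rather than trusting the model. The sketch below is one possible check, assuming the six-field schema from the system prompt and allowing null for any field; SCHEMA and schema_errors are hypothetical names introduced for illustration.

```python
SCHEMA = {
    "title": str,
    "company": str,
    "location": str,
    "salary": str,
    "skills": list,
    "remote": bool,
}

def schema_errors(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record fits the schema."""
    errors = []
    for key, expected in schema.items():
        if key not in record:
            errors.append(f"missing key: {key}")
            continue
        value = record[key]
        if value is None:
            continue  # null is allowed for missing fields, as the system prompt states
        if not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # List fields need a second look: every skill should itself be a string.
    skills = record.get("skills")
    if isinstance(skills, list) and not all(isinstance(s, str) for s in skills):
        errors.append("skills: every item should be a string")
    extra = sorted(set(record) - set(schema))
    if extra:
        errors.append(f"unexpected keys: {extra}")
    return errors

good = {"title": "Data Scientist", "company": "NeuralWorks", "location": "Amsterdam",
        "salary": None, "skills": ["Python", "SQL"], "remote": True}
bad = {"title": "Backend Engineer", "company": "StreamFlow", "location": "Toronto",
       "salary": "CAD 110,000", "skills": "Go, gRPC", "remote": "yes"}

print(schema_errors(good))
print(schema_errors(bad))
```

The second record shows the two failures that are easiest to miss by eye: a comma-separated string where a list was expected, and a "yes" where a boolean was expected.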

Few-Shot Style Transfer

Style transfer — rewriting text to match a specific tone, voice, or format — is one of the most creative applications of few-shot prompting. Instead of describing the target style in abstract terms, examples demonstrate it precisely. Below we rewrite plain technical sentences into an engaging blog voice using three demonstrations to define the style.


import ollama

def chat(system, examples, user_input, temperature=0.7, max_tokens=120):
    messages = [{"role": "system", "content": system}]
    for user_ex, assistant_ex in examples:
        messages.append({"role": "user",      "content": user_ex})
        messages.append({"role": "assistant", "content": assistant_ex})
    messages.append({"role": "user", "content": user_input})
    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": temperature, "num_predict": max_tokens}
    )
    return response["message"]["content"].strip()

system = (
    "Rewrite the given technical sentence into an engaging, conversational "
    "blog style. Keep it to one or two sentences. Preserve the facts exactly."
)

examples = [
    (
        "Gradient descent is an optimisation algorithm that minimises a "
        "loss function by iteratively updating model parameters.",
        "Think of gradient descent as a hiker lost in the fog — it can't see "
        "the full mountain, but it always takes one careful step downhill until "
        "it finds the valley."
    ),
    (
        "A convolutional neural network applies learnable filters to input "
        "images to extract spatial features hierarchically.",
        "CNNs are basically very enthusiastic pattern detectors — they scan "
        "your photo at every scale, hunting for edges, then shapes, then "
        "full objects, layer by layer."
    ),
    (
        "Tokenisation splits raw text into sub-word units that a language "
        "model processes as discrete numerical inputs.",
        "Before an LLM reads a single word, it runs text through a blender — "
        "chopping sentences into bite-sized token pieces that numbers can "
        "actually describe."
    ),
]

sentences = [
    "Dropout randomly deactivates neurons during training to prevent overfitting.",
    "Transformers use self-attention to weigh the relevance of every token "
    "against every other token in a sequence.",
    "Retrieval-augmented generation combines a language model with an external "
    "knowledge base to produce grounded, factual responses.",
]

for sentence in sentences:
    rewrite = chat(system, examples, sentence)
    print(f"Original : {sentence}")
    print(f"Blog     : {rewrite}")
    print()

Output:


Original : Dropout randomly deactivates neurons during training to prevent overfitting.
Blog     : Dropout is the neural network equivalent of a study group where half
           the members randomly call in sick — and somehow the team performs
           better for it.

Original : Transformers use self-attention to weigh the relevance of every token
           against every other token in a sequence.
Blog     : Imagine every word in a sentence turning to face every other word and
           asking, "How much should I care about you right now?" — that is
           self-attention in one question.

Original : Retrieval-augmented generation combines a language model with an
           external knowledge base to produce grounded, factual responses.
Blog     : RAG is what happens when you give an LLM a library card — instead
           of guessing, it looks things up first and then writes from what
           it actually found.

The model has inferred the target style from the examples — concrete analogies, conversational dashes, and a slightly playful voice — and applies it consistently to all three new sentences. Without the examples the same system prompt would produce competent but generic rewrites. The examples supply the voice fingerprint that no written description can fully capture.

Choosing the Right Number of Examples

More examples are not always better. Each demonstration consumes input tokens, adds latency, and increases cost — and returns diminish quickly beyond a small number of well-chosen examples. The right count depends on the task complexity, the desired output format, and the model size. The table below provides practical starting points.

Task Type	Recommended Examples	Reason
Binary classification	1 – 2 per class	Simple decision boundary; one of each label suffices
Multi-class classification	1 – 2 per class	Cover every label in the schema at least once
JSON / structured extraction	2 – 4	Demonstrate schema including null / edge cases
Style transfer / rewriting	3 – 5	More examples establish the voice fingerprint better
Complex reasoning / chain-of-thought	3 – 8	Each step of reasoning must be modelled explicitly
Code generation	2 – 4	Show expected style, naming, and docstring conventions
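
To put a rough number on the token overhead that each demonstration adds, a back-of-the-envelope estimate is often enough. The sketch below assumes about four characters per token, which is only a common rule of thumb for English text and not the behaviour of any particular tokenizer; it also ignores the small per-message overhead of the chat format. estimate_example_tokens is a hypothetical helper name.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption, not a tokenizer

def estimate_example_tokens(examples, chars_per_token=CHARS_PER_TOKEN):
    """Roughly estimate how many input tokens the demonstration pairs add per request."""
    total_chars = sum(len(user) + len(assistant) for user, assistant in examples)
    return -(-total_chars // chars_per_token)  # ceiling division

examples = [
    ("I was charged twice for my last order.", "Billing"),
    ("Do you ship internationally?", "General"),
]
per_request = estimate_example_tokens(examples)
print(f"Approx. extra tokens per request : {per_request}")
print(f"Approx. extra tokens per 10,000  : {per_request * 10_000}")
```

For exact figures, the model's own tokenizer or the token counts reported by the API are the better source; the estimate is only meant for comparing example sets of different sizes.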

As a practical rule, start with two to three examples and add more only if the model still produces an incorrect format or incorrect labels. Always vary the examples to cover different surface forms of the input: a classifier prompted with three near-identical positives will fail on anything that looks slightly different. If accuracy plateaus below your threshold after six to eight examples, consider fine-tuning instead.
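
Finding that plateau requires measuring accuracy at several example counts on a small labelled test set. The sketch below shows one way to structure such a sweep. To stay runnable offline it uses toy_classify, a deliberately simplistic stand-in for a model call; accuracy and sweep_example_counts are likewise hypothetical helper names.

```python
def accuracy(classify, examples, test_set):
    """Fraction of test items whose predicted label matches the expected one."""
    hits = sum(classify(examples, text) == label for text, label in test_set)
    return hits / len(test_set)

def sweep_example_counts(classify, pool, test_set, counts=(0, 2, 4, 6)):
    """Score the classifier using the first k demonstrations for each k in counts."""
    return {k: accuracy(classify, pool[:k], test_set) for k in counts if k <= len(pool)}

# Stand-in for a real model call: it can only answer with labels it has been shown.
def toy_classify(examples, text):
    known = {label for _, label in examples}
    guess = "Billing" if "charge" in text.lower() else "Technical"
    return guess if guess in known else "Unknown"

pool = [
    ("I was charged twice.", "Billing"),
    ("The app crashes on login.", "Technical"),
]
test_set = [
    ("There is an extra charge on my invoice.", "Billing"),
    ("The LED stays red.", "Technical"),
]

scores = sweep_example_counts(toy_classify, pool, test_set, counts=(0, 1, 2))
print(scores)
```

With a real model, the stand-in would be replaced by a thin wrapper around the chat() helper, for example lambda ex, text: chat(system, ex, text), and the test set should be kept separate from the demonstration pool.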

Conclusion

In this post, we briefly learned what few-shot prompting is and how it improves LLM output consistency and accuracy by including labelled examples directly inside the prompt. We compared zero-shot and few-shot performance on a tone classifier, built a support ticket router, extracted custom named entities into JSON, parsed multi-field job postings into a structured schema, and applied few-shot style transfer to rewrite technical sentences into a conversational blog voice. Few-shot prompting is one of the most cost-effective techniques in the prompt-engineering toolkit: it requires no training data pipeline, no GPU time, and no model changes, yet on many structured tasks it can deliver results close to those of a fine-tuned model.
