In this post, we'll briefly learn what a system prompt is, why it is the most powerful lever for controlling LLM behaviour, and how to craft effective system prompts for a variety of real-world scenarios in Python. The tutorial covers:
- What is a System Prompt?
- How System Prompts Work
- Installation and Setup
- Setting Tone and Persona
- Constraining the Output Format
- Restricting the Topic Domain
- Controlling Response Length and Style
- Chaining System and Few-Shot Prompts
- Conclusion
Let's get started.
What is a System Prompt?
A system prompt is a special instruction passed to a large language model at the start of a conversation, before any user message, using the system role. Unlike user messages, the system prompt is not part of the visible dialogue; it acts as a hidden configuration layer that shapes everything the model produces. You can think of it as the rulebook the model reads before speaking a single word.
System prompts are the primary mechanism for turning a general-purpose LLM into a specialised assistant. The same underlying model can behave as a friendly customer support agent, a strict JSON formatter, a domain-specific expert, or a creative storyteller — purely by changing the system prompt. No fine-tuning or retraining is required.
How System Prompts Work
Modern chat LLMs are trained to follow a structured conversation format in which each message has a role. The three standard roles are system, user, and assistant. The model is trained to treat the system role with high priority: instructions placed there are followed more reliably than the same instructions placed inside a user message.
| Role | Who Writes It | Purpose |
|---|---|---|
| system | Developer / application | Set persona, rules, format, and constraints |
| user | End user | Ask questions or give instructions |
| assistant | LLM (or seeded by developer) | Respond to the user; can be pre-seeded for few-shot examples |
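To make the structure concrete, here is a minimal sketch of how the three roles fit together in a single request. The contents are illustrative; this is the same message format every example in this post builds on.
messages = [
    # Hidden configuration layer: written by the developer, read first
    {"role": "system", "content": "You are a concise Python tutor."},
    # The visible question from the end user
    {"role": "user", "content": "What does enumerate() do?"},
    # Optional pre-seeded assistant turn (used for few-shot examples later)
    {"role": "assistant", "content": "enumerate() pairs each item with its index."},
]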
In all the examples below we use Ollama with the llama3.2 model so that everything runs locally at no cost. The same code works unchanged with the OpenAI, Anthropic, or Google Gemini APIs by swapping the client; the system prompt pattern is identical across all providers.
Installation and Setup
Install Ollama from ollama.com and pull the model, then install the Python client. If you prefer to use the OpenAI API instead, replace the client initialisation and model name accordingly; a sketch of that swap appears at the end of this section.
# Terminal – pull the model once
# ollama pull llama3.2
pip install ollama
import ollama

def ask(system: str, user: str, temperature: float = 0.7) -> str:
    """Helper: send a system + user message pair and return the reply."""
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        options={"temperature": temperature}
    )
    return response["message"]["content"]
We define a small helper function ask() that accepts a system prompt, a user message, and an optional temperature value. All examples in the sections below call this helper so we can focus on the prompt itself rather than the API boilerplate.
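For reference, here is a minimal sketch of the same helper written against the OpenAI API instead of Ollama. It assumes the openai package (v1 or later) is installed and an OPENAI_API_KEY environment variable is set; the model name gpt-4o-mini is just an example and can be swapped for any chat model.
# Sketch: the same ask() helper using the OpenAI API instead of Ollama.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str, temperature: float = 0.7) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name; swap as needed
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content
Note that the messages list, including the system role, is unchanged; only the client construction and call differ.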
Setting Tone and Persona
One of the most common uses of a system prompt is assigning the model a specific persona and tone. The same user question receives a completely different answer depending on the role the model is instructed to play. Below we send the same question to three different personas.
import ollama

def ask(system: str, user: str, temperature: float = 0.7) -> str:
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        options={"temperature": temperature}
    )
    return response["message"]["content"]

question = "What is a neural network?"

personas = {
    "Expert Engineer": (
        "You are a senior machine learning engineer. Give technically precise "
        "answers using correct terminology. Assume the reader has a strong "
        "programming background."
    ),
    "Primary School Teacher": (
        "You are a friendly primary school teacher explaining concepts to "
        "10-year-old children. Use simple words, short sentences, and a fun "
        "analogy or two."
    ),
    "Comedian": (
        "You are a stand-up comedian who explains tech concepts through jokes "
        "and witty one-liners. Keep it funny but still factually correct."
    ),
}

for persona, system_prompt in personas.items():
    reply = ask(system_prompt, question)
    print(f"=== {persona} ===")
    print(reply)
    print()
Output:
=== Expert Engineer ===
A neural network is a parameterised function composed of stacked affine
transformations interleaved with non-linear activation functions. During
training, gradients are back-propagated through the computational graph
to minimise a scalar loss via stochastic gradient descent.
=== Primary School Teacher ===
Imagine your brain has millions of tiny switches that all talk to each
other. When you learn something new, some switches turn on and others
turn off. A neural network is a computer program that works the same
way — it has lots of tiny switches that learn from examples!
=== Comedian ===
A neural network? Oh, it's basically a toddler that eats a million cat
photos for breakfast and then confidently tells you it can recognise a
dog. We call this "intelligence." The toddler calls it Tuesday.
All three answers are factually grounded yet dramatically different in style, vocabulary, and tone — demonstrating how a single system prompt instruction can reshape the entire character of the model's output.
Constraining the Output Format
System prompts are the most reliable way to enforce a specific output structure. By instructing the model to respond only in a defined format — such as JSON, Markdown, or a numbered list — you produce machine-parseable output that integrates cleanly into downstream pipelines.
import ollama
import json

def ask(system: str, user: str, temperature: float = 0.0) -> str:
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        options={"temperature": temperature}
    )
    return response["message"]["content"]

system_prompt = """
You are a data extraction assistant. Your output must always be valid JSON
and nothing else — no markdown fences, no preamble, no explanation.
Return a JSON object with these exact keys:
- "name"      : full name of the person (string)
- "role"      : their job title (string)
- "company"   : company name (string)
- "skills"    : list of technical skills mentioned (list of strings)
- "years_exp" : years of experience as an integer, or null if not stated
"""

text = """
Hi, my name is Sarah Chen. I'm a senior data scientist at QuantumLeap AI
with over eight years of experience. I work mainly with Python, PyTorch,
and SQL, and I've recently been exploring large language models and
vector databases.
"""

raw = ask(system_prompt, text, temperature=0.0)
data = json.loads(raw)
print(json.dumps(data, indent=2))
Output:
{
  "name": "Sarah Chen",
  "role": "Senior Data Scientist",
  "company": "QuantumLeap AI",
  "skills": [
    "Python",
    "PyTorch",
    "SQL"
  ],
  "years_exp": 8
}
Setting temperature=0.0 makes the model's output effectively deterministic, which is important when you need consistent, parseable output. The instruction "no markdown fences, no preamble" prevents the model from wrapping the JSON in backtick code blocks, which would cause json.loads() to fail.
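Even with these instructions, smaller models occasionally wrap JSON in code fences anyway. A defensive parsing fallback is cheap insurance; the snippet below is a minimal sketch, and the parse_json_reply helper is illustrative rather than part of any library.
import json
import re

def parse_json_reply(raw: str) -> dict:
    """Strip optional markdown fences before parsing (illustrative helper)."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)

data = parse_json_reply(raw)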
Restricting the Topic Domain
In production applications it is often necessary to keep the model within a specific knowledge domain and refuse off-topic requests gracefully. The system prompt can define both what the assistant will help with and what it should politely decline.
import ollama

def ask(system: str, user: str, temperature: float = 0.7) -> str:
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        options={"temperature": temperature}
    )
    return response["message"]["content"]

system_prompt = """
You are a Python programming assistant for DataTechNotes.
You help users with Python code, libraries, debugging, and best practices.
Rules:
- Only answer questions related to Python programming and data science.
- If asked about any other topic, politely decline and redirect the user
  back to Python or data science questions.
- Always provide a code example when answering a technical question.
- Keep answers concise and practical.
"""

questions = [
    "How do I remove duplicates from a Python list?",
    "What is the best recipe for chocolate cake?",
    "How do I group a pandas DataFrame by multiple columns?",
]

for q in questions:
    print(f"User  : {q}")
    print(f"Model : {ask(system_prompt, q)}")
    print()
Output:
User : How do I remove duplicates from a Python list?
Model : Use dict.fromkeys() to preserve order, or a set if order doesn't matter.
# Preserve insertion order
my_list = [3, 1, 2, 1, 3, 4]
unique = list(dict.fromkeys(my_list))
print(unique) # [3, 1, 2, 4]
# Order not important
unique = list(set(my_list))
User : What is the best recipe for chocolate cake?
Model : That's outside my area! I'm here to help with Python programming
and data science questions. Feel free to ask me anything on those
topics — for example, how to visualise data with matplotlib or
how to build a machine learning model with scikit-learn.
User : How do I group a pandas DataFrame by multiple columns?
Model : Use groupby() with a list of column names.
import pandas as pd
df = pd.DataFrame({
"region": ["East", "East", "West", "West"],
"product": ["A", "B", "A", "B"],
"sales": [100, 200, 150, 300]
})
result = df.groupby(["region", "product"])["sales"].sum().reset_index()
print(result)
The model correctly answers the two Python questions, provides code examples as instructed, and gracefully declines the off-topic cake question while steering the user back to the supported domain, all purely through system prompt instructions.
Controlling Response Length and Style
LLMs tend to produce verbose answers by default. You can tightly control verbosity, reading level, and formatting style through explicit instructions in the system prompt. The example below demonstrates three different length and style directives applied to the same question.
import ollama

def ask(system: str, user: str, temperature: float = 0.5) -> str:
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        options={"temperature": temperature}
    )
    return response["message"]["content"]

question = "What is gradient descent?"

styles = {
    "One-liner": (
        "Answer every question in exactly one sentence. No more, no less."
    ),
    "Bullet Summary": (
        "Answer every question as a bullet-point list of exactly four items. "
        "Each bullet must be one sentence. No introduction, no conclusion."
    ),
    "ELI5 (Explain Like I'm 5)": (
        "Explain everything as if talking to a curious five-year-old. "
        "Use a simple real-world analogy. Maximum three sentences."
    ),
}

for style_name, system_prompt in styles.items():
    print(f"=== {style_name} ===")
    print(ask(system_prompt, question))
    print()
Output:
=== One-liner ===
Gradient descent is an optimisation algorithm that iteratively adjusts
a model's parameters in the direction that most reduces the loss function.
=== Bullet Summary ===
• Gradient descent is an algorithm used to minimise a loss function
during model training.
• It computes the gradient of the loss with respect to each parameter.
• Parameters are updated by stepping in the opposite direction of the
gradient, scaled by a learning rate.
• The process repeats until the loss converges to a minimum.
=== ELI5 (Explain Like I'm 5) ===
Imagine you're blindfolded on a hilly field and want to find the lowest
valley. You feel the ground with your foot, figure out which way is
downhill, and take a small step that way. Gradient descent does exactly
that — it keeps taking tiny steps downhill until it finds the lowest
point it can reach.
Notice how precisely the model respects the length constraints. The one-liner produces exactly one sentence, the bullet summary produces exactly four bullets, and the ELI5 stays within three sentences, all enforced by the system prompt alone.
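If a length constraint matters downstream, it is worth verifying it in code rather than trusting the model. Here is a rough sanity-check sketch for the one-liner style; the sentence-splitting regex is a simple heuristic, not a robust tokeniser.
import re

# Heuristic check (sketch): confirm the one-liner reply is a single sentence.
reply = ask(styles["One-liner"], question)
sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]
print(f"Sentences: {len(sentences)}")  # expect 1; reprompt if not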
Chaining System and Few-Shot Prompts
Few-shot prompting extends the system prompt by including one or more example user/assistant message pairs in the conversation history before the real user query. These examples demonstrate the exact output pattern you expect, making the model far more consistent on structured tasks such as classification, extraction, or templated generation.
import ollama
import json

system_prompt = """
You are a sentiment classifier for product reviews.
Classify each review into exactly one of: Positive, Negative, or Neutral.
Respond only with valid JSON in this format: {"label": "...", "confidence": "high|medium|low"}
No extra text.
"""

# Few-shot examples seeded into the conversation history
few_shot = [
    {"role": "user", "content": "This laptop is absolutely fantastic, best purchase I've made!"},
    {"role": "assistant", "content": '{"label": "Positive", "confidence": "high"}'},
    {"role": "user", "content": "The battery died after two hours. Completely useless."},
    {"role": "assistant", "content": '{"label": "Negative", "confidence": "high"}'},
    {"role": "user", "content": "It arrived on time and works as described."},
    {"role": "assistant", "content": '{"label": "Neutral", "confidence": "medium"}'},
]

reviews = [
    "Worst product I have ever bought. Broke on the first day.",
    "Decent quality for the price, nothing special though.",
    "I am obsessed with this — it has changed my morning routine!",
    "Packaging was damaged but the item itself seems fine.",
]

for review in reviews:
    messages = (
        [{"role": "system", "content": system_prompt}]
        + few_shot
        + [{"role": "user", "content": review}]
    )
    response = ollama.chat(
        model="llama3.2",
        messages=messages,
        options={"temperature": 0.0}
    )
    raw = response["message"]["content"]
    result = json.loads(raw)
    print(f"Review : {review}")
    print(f"Label  : {result['label']} | Confidence: {result['confidence']}")
    print()

Output:
Review : Worst product I have ever bought. Broke on the first day.
Label : Negative | Confidence: high
Review : Decent quality for the price, nothing special though.
Label : Neutral | Confidence: medium
Review : I am obsessed with this — it has changed my morning routine!
Label : Positive | Confidence: high
Review : Packaging was damaged but the item itself seems fine.
Label : Neutral | Confidence: medium
The combination of a strict system prompt and three few-shot examples produces clean, parseable JSON for every review with no extra text, no markdown, and no hallucinated keys. The few-shot examples teach the model the exact schema and label vocabulary in a way that rules alone sometimes cannot.
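In production you would still validate each result before trusting it. Below is a minimal sketch; the allowed-value sets and the validate() helper are illustrative, derived from the schema defined in the system prompt above.
# Guard against labels outside the allowed vocabulary (illustrative helper).
ALLOWED_LABELS = {"Positive", "Negative", "Neutral"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}

def validate(result: dict) -> dict:
    if result.get("label") not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected label: {result.get('label')!r}")
    if result.get("confidence") not in ALLOWED_CONFIDENCE:
        raise ValueError(f"Unexpected confidence: {result.get('confidence')!r}")
    return result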
Conclusion
In this post, we briefly learned what a system prompt is and how it acts as the primary control layer for shaping LLM behaviour. We covered setting tone and persona, enforcing structured output formats, restricting topic domains, controlling response length and style, and combining system prompts with few-shot examples for highly consistent outputs — all using a local Llama 3.2 model via Ollama. Mastering system prompts is one of the highest-leverage skills in applied LLM development: a well-crafted system prompt can eliminate the need for fine-tuning in many practical scenarios.