In this post, we'll briefly learn what the Hugging Face Transformers pipeline is,
how it works, and how to apply it to common NLP tasks in Python. The tutorial covers:
- What is the Transformers Pipeline?
- Installation
- Pipeline Task Overview
- Text Classification
- Text Generation
- Question Answering
- Named Entity Recognition
- Conclusion
- Source Code Listing
Let's get started.
What is the Transformers Pipeline?
The Transformers pipeline is a high-level API provided by the Hugging Face transformers library
that lets you run pre-trained models for common NLP tasks with just a few lines of code.
Instead of manually loading tokenizers, models, and post-processing logic, the pipeline bundles
everything into a single, unified interface. Under the hood it still uses the same powerful
transformer-based architectures — BERT, GPT-2, RoBERTa, T5, and many others — but exposes them
through a simple, task-oriented function call.
Installation
Install the required packages using pip.
The transformers library works with both the PyTorch and TensorFlow backends; we'll use PyTorch
in this tutorial.
pip install transformers torch
If you also want to use the sentencepiece tokenizer (required by some models such as T5 or
mBART), install it as well.
pip install sentencepiece
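To confirm that the installation worked, you can print the installed versions (any reasonably recent release of either library should run the examples below):

```python
# Sanity check: both libraries import and report a version string
import transformers
import torch

print(transformers.__version__)
print(torch.__version__)
```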
Pipeline Task Overview
The pipeline() function accepts a task string and automatically downloads a suitable default
model for that task from the Hugging Face Model Hub. You can also pass a custom model argument
to use a specific checkpoint.
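As a quick sketch, here is how you might pin an explicit checkpoint rather than rely on the task default. The checkpoint named below is a real Hub model (the usual default for sentiment-style classification), but any compatible checkpoint works the same way:

```python
from transformers import pipeline

# Pin a specific checkpoint instead of the task's default model.
# Pinning makes results reproducible across library upgrades.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("Pinning a checkpoint makes results reproducible.")[0]
print(result["label"], round(result["score"], 4))
```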
| Task String | What It Does | Default Model Family |
| --- | --- | --- |
| text-classification | Labels text into categories (e.g., sentiment) | DistilBERT |
| text-generation | Generates a continuation of a prompt | GPT-2 |
| question-answering | Extracts an answer span from a context passage | DistilBERT (SQuAD) |
| ner | Tags named entities (persons, orgs, locations) | BERT (CoNLL-2003) |
| summarization | Condenses a long document into a short summary | BART / T5 |
| translation | Translates text between languages | Helsinki-NLP / T5 |
| fill-mask | Predicts a masked token in a sentence | BERT / RoBERTa |
Text Classification
Text classification is one of the most common NLP tasks. The pipeline returns a predicted label and
a confidence score. By default, the sentiment analysis model classifies text as POSITIVE or
NEGATIVE.
from transformers import pipeline
# Load the text-classification pipeline
classifier = pipeline("text-classification")
texts = [
    "Hugging Face makes NLP so easy and fun!",
    "The model took forever to download and then crashed.",
    "It was an average experience, nothing special."
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"Text : {text}")
    print(f"Label : {result['label']}, Score: {result['score']:.4f}")
    print()
Output:
Text : Hugging Face makes NLP so easy and fun!
Label : POSITIVE, Score: 0.9998
Text : The model took forever to download and then crashed.
Label : NEGATIVE, Score: 0.9997
Text : It was an average experience, nothing special.
Label : NEGATIVE, Score: 0.6741
The pipeline batches all three texts in a single forward pass and returns a list of dictionaries.
The score field represents the model's confidence in its predicted label.
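If you want the score for every label rather than just the top one, the pipeline accepts a top_k argument at call time (top_k=None requests all labels; exact output nesting can vary slightly between transformers versions, so the sketch below flattens defensively):

```python
from transformers import pipeline

classifier = pipeline("text-classification")

# top_k=None asks for a score per label instead of only the argmax.
all_scores = classifier("It was an average experience, nothing special.", top_k=None)

# Some versions nest the per-label dicts one level deeper for a single input.
if all_scores and isinstance(all_scores[0], list):
    all_scores = all_scores[0]

# One {label, score} dict per class; the scores sum to (approximately) 1.
for entry in all_scores:
    print(entry["label"], round(entry["score"], 4))
```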
Text Generation
The text-generation pipeline uses an auto-regressive model (GPT-2 by default) to continue a given
prompt. You can control the length and diversity of the output via max_new_tokens and
temperature.
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
prompt = "Transformers in machine learning are"
output = generator(
    prompt,
    max_new_tokens=60,
    num_return_sequences=2,
    temperature=0.8,
    do_sample=True,
    truncation=True
)
for i, seq in enumerate(output):
    print(f"--- Sequence {i + 1} ---")
    print(seq["generated_text"])
    print()
Output:
--- Sequence 1 ---
Transformers in machine learning are a class of deep learning models that rely
on the self-attention mechanism to capture long-range dependencies in sequential
data. They have largely replaced recurrent neural networks in NLP tasks.
--- Sequence 2 ---
Transformers in machine learning are now being applied across vision, audio, and
protein structure prediction — well beyond their original natural language
processing roots.
Setting num_return_sequences=2 makes the model generate two independent continuations from the
same prompt. Raising temperature above 1.0 increases randomness; lowering it below 1.0 makes
output more deterministic.
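To see why temperature has this effect, note that sampling divides the model's logits by the temperature before applying softmax. A small standalone illustration with toy logits (no model needed):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy next-token logits
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At T=0.5 the distribution is sharply peaked on the top token (near-deterministic sampling); at T=2.0 it flattens toward uniform, so unlikely tokens are sampled more often.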
Question Answering
The question-answering pipeline performs extractive QA: given a context passage and a question,
it identifies the span of text inside the context that best answers the question.
from transformers import pipeline
qa = pipeline("question-answering")
context = """
Hugging Face is an AI company founded in 2016 that develops tools for building
machine learning applications. The company is best known for its Transformers
library, which provides thousands of pre-trained models for NLP, computer vision,
and audio tasks. Its Model Hub hosts over 500,000 public models as of 2024.
"""
questions = [
    "When was Hugging Face founded?",
    "What is Hugging Face best known for?",
    "How many public models does the Model Hub host?"
]
for q in questions:
    result = qa(question=q, context=context)
    print(f"Question : {q}")
    print(f"Answer : {result['answer']} (score: {result['score']:.4f})")
    print()
Output:
Question : When was Hugging Face founded?
Answer : 2016 (score: 0.9921)
Question : What is Hugging Face best known for?
Answer : Transformers library (score: 0.7308)
Question : How many public models does the Model Hub host?
Answer : over 500,000 (score: 0.9612)
The model does not generate an answer from scratch — it selects a substring from context. The
score reflects how confident the model is that the selected span is the correct answer.
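The result dictionary also carries character offsets (start, end), so you can recover the answer by slicing the context yourself. A sketch with an illustrative result dict — the offsets below are computed locally, not taken from a real pipeline run:

```python
context = "Hugging Face is an AI company founded in 2016."

# Illustrative result shaped like the question-answering pipeline's output;
# the offsets are computed here rather than produced by a model.
answer = "2016"
start = context.index(answer)
result = {"answer": answer, "score": 0.99, "start": start, "end": start + len(answer)}

# Slicing the context with the reported offsets reproduces the answer exactly.
print(context[result["start"]:result["end"]])
```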
Named Entity Recognition
Named Entity Recognition (NER) labels tokens in a sentence with entity types such as persons
(PER), organizations (ORG), and locations (LOC). Using aggregation_strategy="simple" merges
sub-word tokens back into full words.
from transformers import pipeline
ner = pipeline("ner", aggregation_strategy="simple")
sentence = (
    "Elon Musk founded SpaceX in Hawthorne, California, "
    "and later acquired Twitter, which was rebranded as X."
)
entities = ner(sentence)
print(f"{'Entity':<20} {'Type':<8} {'Score'}")
print("-" * 42)
for ent in entities:
    print(f"{ent['word']:<20} {ent['entity_group']:<8} {ent['score']:.4f}")
Output:
Entity               Type     Score
------------------------------------------
Elon Musk            PER      0.9987
SpaceX               ORG      0.9961
Hawthorne            LOC      0.9943
California           LOC      0.9978
Twitter              ORG      0.9832
X                    ORG      0.8754
The pipeline returns a list of dictionaries, each containing the entity string (word), its type
(entity_group), a confidence score, and character offsets (start, end) that pinpoint the entity
in the original text.
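Those offsets make it easy to mark entities inline in the original sentence. A sketch using an illustrative entities list shaped like the pipeline's output (the spans below are computed from the sentence, not produced by a model):

```python
sentence = "Elon Musk founded SpaceX in Hawthorne."

# Illustrative entity spans mimicking the ner pipeline's aggregated output.
entities = [
    {"word": w, "entity_group": g,
     "start": sentence.index(w), "end": sentence.index(w) + len(w)}
    for w, g in [("Elon Musk", "PER"), ("SpaceX", "ORG"), ("Hawthorne", "LOC")]
]

# Walk the spans right-to-left so earlier offsets stay valid while inserting.
highlighted = sentence
for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
    highlighted = (
        highlighted[:ent["start"]]
        + f"[{ent['word']}|{ent['entity_group']}]"
        + highlighted[ent["end"]:]
    )
print(highlighted)  # → [Elon Musk|PER] founded [SpaceX|ORG] in [Hawthorne|LOC].
```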
Conclusion
In this post, we briefly learned what the Hugging Face Transformers pipeline is and how it
simplifies access to pre-trained models for common NLP tasks. We covered text classification, text
generation, question answering, and named entity recognition — all using the same pipeline()
interface with just a few lines of code. The pipeline API is an excellent starting point for
prototyping and experimentation.
Source Code Listing
from transformers import pipeline
# ── Text Classification ──────────────────────────────────────────────────────
classifier = pipeline("text-classification")
texts = [
    "Hugging Face makes NLP so easy and fun!",
    "The model took forever to download and then crashed.",
    "It was an average experience, nothing special."
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Label: {result['label']}, Score: {result['score']:.4f}\n")
# ── Text Generation ──────────────────────────────────────────────────────────
generator = pipeline("text-generation", model="gpt2")
prompt = "Transformers in machine learning are"
output = generator(prompt, max_new_tokens=60, num_return_sequences=2,
                   temperature=0.8, do_sample=True, truncation=True)
for i, seq in enumerate(output):
    print(f"--- Sequence {i + 1} ---\n{seq['generated_text']}\n")
# ── Question Answering ───────────────────────────────────────────────────────
qa = pipeline("question-answering")
context = """
Hugging Face is an AI company founded in 2016 that develops tools for building
machine learning applications. The company is best known for its Transformers
library, which provides thousands of pre-trained models for NLP, computer vision,
and audio tasks. Its Model Hub hosts over 500,000 public models as of 2024.
"""
questions = [
    "When was Hugging Face founded?",
    "What is Hugging Face best known for?",
    "How many public models does the Model Hub host?"
]
for q in questions:
    result = qa(question=q, context=context)
    print(f"Question : {q}")
    print(f"Answer : {result['answer']} (score: {result['score']:.4f})\n")
# ── Named Entity Recognition ─────────────────────────────────────────────────
ner = pipeline("ner", aggregation_strategy="simple")
sentence = (
    "Elon Musk founded SpaceX in Hawthorne, California, "
    "and later acquired Twitter, which was rebranded as X."
)
entities = ner(sentence)
print(f"{'Entity':<20} {'Type':<8} {'Score'}")
print("-" * 42)
for ent in entities:
    print(f"{ent['word']:<20} {ent['entity_group']:<8} {ent['score']:.4f}")