LLM Embeddings – A Practical Introduction in Python

    In this post, we'll briefly learn what LLM embeddings are, how they work, and how to generate and use them in Python. The tutorial covers:

  1. What are Embeddings?
  2. How LLMs Generate Embeddings
  3. Types of Embeddings
  4. Generating Embeddings with Sentence Transformers
  5. Generating Embeddings with OpenAI API
  6. Measuring Semantic Similarity
  7. Visualizing Embeddings with t-SNE
  8. Conclusion
  9. Source Code Listing

     Let's get started. 

 

What are Embeddings?

    Embeddings are dense numerical vectors that represent the meaning of text. Instead of working with raw strings, LLMs convert words, sentences, or documents into fixed-size arrays of floating-point numbers that capture semantic relationships.

    The key idea is that similar meanings produce similar vectors. For example, the embeddings for "king" and "queen" will be closer together in vector space than the embeddings for "king" and "bicycle". This geometric property makes embeddings extremely useful for search, clustering, classification, and retrieval-augmented generation (RAG).

Use Case             What Embeddings Enable
-------------------  -------------------------------------------------------
Semantic search      Find documents by meaning, not just keyword match
RAG systems          Retrieve relevant chunks to inject into LLM context
Text clustering      Group similar documents without labels
Classification       Use embeddings as features for a classifier
Duplicate detection  Find near-identical texts even when worded differently
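To make the geometric idea concrete, here is a toy sketch. The 3-dimensional vectors below are invented purely for illustration — real embeddings have hundreds of dimensions — but they show how cosine similarity places "king" nearer to "queen" than to "bicycle":

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented 3-d "embeddings" -- real models produce hundreds of dimensions
king    = np.array([0.9, 0.8, 0.1])
queen   = np.array([0.8, 0.9, 0.2])
bicycle = np.array([0.1, 0.2, 0.9])

print("king vs queen  :", round(cosine(king, queen), 3))
print("king vs bicycle:", round(cosine(king, bicycle), 3))
```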

 

How LLMs Generate Embeddings

    When text is fed into an LLM, every token is first converted into a vector by an embedding layer — a learned lookup table that maps each token ID to a high-dimensional vector (e.g., 768 or 4096 dimensions). As the text passes through the Transformer layers, these vectors are updated by the self-attention mechanism to incorporate context from the entire sequence.

    The final embedding for a sentence is typically produced in one of two ways:
  • [CLS] token pooling — BERT-style models prepend a special [CLS] token and use its final hidden state as the sentence embedding.
  • Mean pooling — average the final hidden states of all tokens. Used by most sentence embedding models.

    Embedding models are separate from generative models — they are optimised specifically for producing high-quality representations, not for generating text. 
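Mean pooling is easy to sketch with NumPy. The arrays below are random stand-ins for a Transformer's final hidden states; the attention mask marks which positions hold real tokens versus padding (all values here are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the final hidden states of a 6-token sequence (hidden size 8)
hidden_states = rng.normal(size=(6, 8))

# 1 = real token, 0 = padding; here the last two positions are padding
attention_mask = np.array([1, 1, 1, 1, 0, 0])

# Mean pooling: average the hidden states over real tokens only
mask = attention_mask[:, None]                      # shape (6, 1)
sentence_embedding = (hidden_states * mask).sum(axis=0) / mask.sum()

print("Sentence embedding shape:", sentence_embedding.shape)  # (8,)
```

Masking before averaging matters: without it, padding vectors would dilute the sentence embedding.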

 

Types of Embeddings

Type                   Description                                  Example Models
---------------------  -------------------------------------------  ----------------------------
Word embeddings        One vector per word, context-free            Word2Vec, GloVe
Contextual embeddings  Token vectors depend on surrounding context  BERT, RoBERTa
Sentence embeddings    One vector per sentence or paragraph         all-MiniLM, text-embedding-3
Document embeddings    One vector per long document                 Longformer, BigBird

 

Generating Embeddings with Sentence Transformers

    The sentence-transformers library is the easiest way to generate high-quality sentence embeddings locally. It wraps Hugging Face models with a simple API.  

Installation

% pip install sentence-transformers
Example — Generate sentence embeddings
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a lightweight sentence embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]

# Generate embeddings
embeddings = model.encode(sentences)

print("Shape     :", embeddings.shape)
print("First vec :", np.round(embeddings[0][:6], 4))
Output:
Shape     : (3, 384)
First vec : [ 0.0231 -0.0412  0.0553  0.0187 -0.0329  0.0614]

    Each sentence is represented as a 384-dimensional vector. The all-MiniLM-L6-v2 model is fast, small, and works well for most semantic similarity tasks. 
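Encoding a large corpus can be slow, so in practice embeddings are often cached on disk and reloaded on later runs. A minimal sketch with NumPy — the random array below is a stand-in for the output of model.encode:

```python
import os
import tempfile
import numpy as np

# Stand-in for model.encode(...) output: 3 sentences x 384 dimensions
embeddings = np.random.default_rng(1).normal(size=(3, 384)).astype(np.float32)

# Save once, then reload on later runs instead of re-encoding
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, embeddings)
loaded = np.load(path)

print("Round-trip OK:", np.array_equal(embeddings, loaded))
```

Note that sentence-transformers can also L2-normalise its output via `model.encode(sentences, normalize_embeddings=True)`, which makes a plain dot product equivalent to cosine similarity.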

 

Generating Embeddings with OpenAI API

    OpenAI's text-embedding-3-small model produces 1536-dimensional embeddings and is one of the most widely used embedding APIs in production systems.  

Installation

% pip install openai
Example — OpenAI Embeddings
from openai import OpenAI
import numpy as np

client = OpenAI()   # reads OPENAI_API_KEY from environment

texts = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]

response = client.embeddings.create(
    input=texts,
    model="text-embedding-3-small"
)

# Extract vectors
vectors = np.array([d.embedding for d in response.data])

print("Shape     :", vectors.shape)
print("First vec :", np.round(vectors[0][:6], 4))
Output:
Shape     : (3, 1536)
First vec : [ 0.0142 -0.0381  0.0204  0.0519 -0.0173  0.0037]

    The output shape (3, 1536) means 3 sentences, each represented as a 1536-dimensional vector. OpenAI embeddings are already L2-normalised, so cosine similarity can be computed with a simple dot product. 
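Because the vectors are unit-length, cosine similarity reduces to a dot product. A quick NumPy check using synthetic vectors normalised the same way (random stand-ins, not real API output):

```python
import numpy as np

def cosine(a, b):
    # Full cosine similarity, including the norm terms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(42)
v = rng.normal(size=(3, 1536))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # L2-normalise each row

# For unit vectors, the dot-product matrix equals the cosine-similarity matrix
dots = v @ v.T
print("Dot product:", round(dots[0, 1], 4), "== cosine:", round(cosine(v[0], v[1]), 4))
```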

 

Measuring Semantic Similarity

    The most common way to compare two embeddings is cosine similarity — it measures the cosine of the angle between two vectors, returning a value between -1 and 1. A score close to 1 means the sentences are semantically similar.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",       # sentence A
    "A dog rested on the rug.",       # sentence B — similar to A
    "LLMs are trained on large text datasets.",  # sentence C — unrelated
]

embeddings = model.encode(sentences)

# Compute cosine similarity matrix
sim_matrix = cosine_similarity(embeddings)

print("Similarity Matrix:\n", sim_matrix.round(3))
print("\nA vs B (similar)  :", round(sim_matrix[0, 1], 3))
print("A vs C (unrelated) :", round(sim_matrix[0, 2], 3))
Output:
Similarity Matrix:
 [[1.    0.734 0.051]
  [0.734 1.    0.083]
  [0.051 0.083 1.   ]]

A vs B (similar)   : 0.734
A vs C (unrelated) : 0.051

    The results are exactly what we expect — sentences A and B (both about a pet resting on a surface) score 0.734, while A and C (completely different topics) score only 0.051. The diagonal is always 1.0 since each sentence is perfectly similar to itself. 
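In practice, the similarity matrix is usually used to rank candidates against a query. A small sketch using the matrix printed above (hard-coded here so the snippet runs without the model):

```python
import numpy as np

# Similarity matrix from the run above
sim_matrix = np.array([
    [1.0,   0.734, 0.051],
    [0.734, 1.0,   0.083],
    [0.051, 0.083, 1.0  ],
])
sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]

# Rank the other sentences by similarity to sentence A (index 0)
query = 0
order = np.argsort(-sim_matrix[query])          # indices in descending similarity
best = next(i for i in order if i != query)     # skip the query itself
print("Most similar to A:", sentences[best])
```

This query-then-rank pattern is exactly what a semantic search engine does at retrieval time.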

 

Visualizing Embeddings with t-SNE

    Since embeddings are high-dimensional, we use t-SNE to reduce them to 2D for visualization. Points that appear close together in the plot are semantically similar in the original embedding space.
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    # Animals
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "The parrot repeated the word.",
    # AI / ML
    "LLMs are trained on large text datasets.",
    "Transformers use self-attention mechanisms.",
    "Embeddings capture semantic meaning as vectors.",
    # Food
    "Pizza is topped with cheese and tomato sauce.",
    "Sushi is a traditional Japanese dish.",
    "Pasta is a staple of Italian cuisine.",
]

labels = ["Animals"] * 3 + ["AI/ML"] * 3 + ["Food"] * 3
colors = {"Animals": "#58A6FF", "AI/ML": "#BC8CFF", "Food": "#3FB950"}

embeddings = model.encode(sentences)

# Reduce to 2D
tsne = TSNE(n_components=2, random_state=42, perplexity=3)
coords = tsne.fit_transform(embeddings)

# Plot
fig, ax = plt.subplots(figsize=(7, 5))
for i, (x, y) in enumerate(coords):
    ax.scatter(x, y, color=colors[labels[i]], s=120)
    ax.annotate(labels[i], (x, y), fontsize=9,
                xytext=(5, 5), textcoords="offset points")
ax.set_title("Sentence Embeddings – t-SNE Visualization")
plt.tight_layout()
plt.savefig("embeddings_tsne.png", dpi=150)
plt.show()

    The resulting plot will show three visible clusters — Animals, AI/ML, and Food — grouped together in 2D space, confirming that the embeddings successfully capture topic-level similarity even without any labels. 
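The cluster structure can also be confirmed numerically. A hedged sketch using KMeans on synthetic stand-in vectors — three tight 384-dimensional groups replacing the real sentence embeddings, so the snippet runs without the model:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 9 sentence embeddings: three tight groups in 384-d
centers = rng.normal(size=(3, 384))
points = np.vstack([c + 0.05 * rng.normal(size=(3, 384)) for c in centers])

# KMeans should recover the three groups without any labels
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
print("Cluster labels:", labels)
```

With real embeddings, the same call groups the Animals, AI/ML, and Food sentences — the "Text clustering" use case from the table at the start of this post.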

 

Conclusion

    In this post, we briefly learned what LLM embeddings are, how they are generated through the Transformer's embedding layer and pooling, and how to use them in Python with both sentence-transformers and the OpenAI API. We also measured semantic similarity using cosine similarity and visualized the embedding space with t-SNE. Embeddings are the foundation of modern semantic search and RAG pipelines — understanding them is essential for building real-world LLM applications. In the next post, we will build a simple semantic search engine using embeddings and a FAISS vector store. 

 

Source Code Listing

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# ----- Basic embedding -----
sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]
embeddings = model.encode(sentences)
print("Shape     :", embeddings.shape)
print("First vec :", np.round(embeddings[0][:6], 4))

# ----- Cosine similarity -----
sim_matrix = cosine_similarity(embeddings)
print("Similarity Matrix:\n", sim_matrix.round(3))
print("A vs B (similar)  :", round(sim_matrix[0, 1], 3))
print("A vs C (unrelated) :", round(sim_matrix[0, 2], 3))

# ----- TSNE visualization -----
all_sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "The parrot repeated the word.",
    "LLMs are trained on large text datasets.",
    "Transformers use self-attention mechanisms.",
    "Embeddings capture semantic meaning as vectors.",
    "Pizza is topped with cheese and tomato sauce.",
    "Sushi is a traditional Japanese dish.",
    "Pasta is a staple of Italian cuisine.",
]
labels = ["Animals"] * 3 + ["AI/ML"] * 3 + ["Food"] * 3
colors = {"Animals": "#58A6FF", "AI/ML": "#BC8CFF", "Food": "#3FB950"}
all_emb = model.encode(all_sentences)
coords  = TSNE(n_components=2, random_state=42, perplexity=3).fit_transform(all_emb)
fig, ax = plt.subplots(figsize=(7, 5))
for i, (x, y) in enumerate(coords):
    ax.scatter(x, y, color=colors[labels[i]], s=120)
    ax.annotate(labels[i], (x, y), fontsize=9,
                xytext=(5, 5), textcoords="offset points")
ax.set_title("Sentence Embeddings – t-SNE Visualization")
plt.tight_layout()
plt.savefig("embeddings_tsne.png", dpi=150)
plt.show()

 

 
