In this post, we'll briefly learn what LLM embeddings are, how they work, and how to generate and use them in Python. The tutorial covers:
- What are Embeddings?
- How LLMs Generate Embeddings
- Types of Embeddings
- Generating Embeddings with Sentence Transformers
- Generating Embeddings with OpenAI API
- Measuring Semantic Similarity
- Visualizing Embeddings with t-SNE
- Conclusion
- Source Code Listing
Let's get started.
What are Embeddings?
Embeddings are dense numerical vectors that represent the meaning of text. Instead of working with raw strings, LLMs convert words, sentences, or documents into fixed-size arrays of floating-point numbers that capture semantic relationships. The key idea is that similar meanings produce similar vectors. For example, the embeddings for "king" and "queen" will be closer together in vector space than the embeddings for "king" and "bicycle". This geometric property makes embeddings extremely useful for search, clustering, classification, and retrieval-augmented generation (RAG).
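To make the geometric intuition concrete, here is a toy sketch using made-up 3-dimensional vectors. The numbers are purely illustrative stand-ins (real embeddings have hundreds or thousands of dimensions), but the cosine similarity computation is the same one used on real embeddings:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D vectors standing in for real embeddings (illustrative only)
king    = np.array([0.90, 0.80, 0.10])
queen   = np.array([0.85, 0.75, 0.20])
bicycle = np.array([0.10, 0.20, 0.90])

print(cosine(king, queen))    # high, close to 1
print(cosine(king, bicycle))  # much lower
```

Vectors pointing in nearly the same direction score close to 1; unrelated directions score near 0.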
| Use Case | What Embeddings Enable |
|---|---|
| Semantic search | Find documents by meaning, not just keyword match |
| RAG systems | Retrieve relevant chunks to inject into LLM context |
| Text clustering | Group similar documents without labels |
| Classification | Use embeddings as features for a classifier |
| Duplicate detection | Find near-identical texts even when worded differently |
How LLMs Generate Embeddings
When text is fed into an LLM, every token is first converted into a vector by an embedding layer: a learned lookup table that maps each token ID to a high-dimensional vector (e.g., 768 or 4096 dimensions). As the text passes through the Transformer layers, these vectors are updated by the self-attention mechanism to incorporate context from the entire sequence. The final embedding for a sentence is typically produced in one of two ways:
- [CLS] token pooling: BERT-style models prepend a special [CLS] token and use its final hidden state as the sentence embedding.
- Mean pooling: average the final hidden states of all tokens. Used by most sentence embedding models.
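As a rough sketch of mean pooling, the snippet below averages hypothetical final hidden states over the non-padding tokens, using an attention mask the way most sentence embedding models do. The hidden states here are random stand-ins, not real Transformer outputs:

```python
import numpy as np

# Hypothetical final hidden states for a 4-token sequence, hidden size 6
# (random stand-ins for what a Transformer layer would actually produce)
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(4, 6))

# Attention mask: 1 = real token, 0 = padding
attention_mask = np.array([1, 1, 1, 0])

# Mean pooling: sum hidden states of real tokens, divide by their count
mask = attention_mask[:, None].astype(float)   # shape (4, 1), broadcasts over hidden dim
sentence_embedding = (hidden_states * mask).sum(axis=0) / mask.sum()

print(sentence_embedding.shape)  # (6,)
```

Masking matters: without it, padding tokens would drag the average toward meaningless positions.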
Embedding models are separate from generative models: they are optimized specifically for producing high-quality representations, not for generating text.
Types of Embeddings
| Type | Description | Example Models |
|---|---|---|
| Word embeddings | One vector per word, context-free | Word2Vec, GloVe |
| Contextual embeddings | Token vectors depend on surrounding context | BERT, RoBERTa |
| Sentence embeddings | One vector per sentence or paragraph | all-MiniLM, text-embedding-3 |
| Document embeddings | One vector per long document | Longformer, BigBird |
Generating Embeddings with Sentence Transformers
The sentence-transformers library is the easiest way to
generate high-quality sentence embeddings locally. It wraps Hugging
Face models with a simple API.
Installation
% pip install sentence-transformers
Example — Generate sentence embeddings
```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a lightweight sentence embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]

# Generate embeddings
embeddings = model.encode(sentences)

print("Shape     :", embeddings.shape)
print("First vec :", np.round(embeddings[0][:6], 4))
```

Output:
```
Shape     : (3, 384)
First vec : [ 0.0231 -0.0412  0.0553  0.0187 -0.0329  0.0614]
```
Each sentence is represented as a 384-dimensional vector. The all-MiniLM-L6-v2 model is fast, small, and works well for most semantic similarity tasks.
Generating Embeddings with OpenAI API
OpenAI's text-embedding-3-small model produces 1536-dimensional embeddings and is one of the most widely used embedding APIs in production systems.
Installation
% pip install openai
Example — OpenAI Embeddings
```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from environment

texts = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]

response = client.embeddings.create(
    input=texts,
    model="text-embedding-3-small"
)

# Extract vectors
vectors = np.array([d.embedding for d in response.data])

print("Shape     :", vectors.shape)
print("First vec :", np.round(vectors[0][:6], 4))
```

Output:
```
Shape     : (3, 1536)
First vec : [ 0.0142 -0.0381  0.0204  0.0519 -0.0173  0.0037]
```
The output shape (3, 1536) means 3 sentences, each
represented as a 1536-dimensional vector. OpenAI embeddings are already
L2-normalised, so cosine similarity can be computed with a simple dot
product.
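To illustrate why unit-length vectors make this shortcut work, the sketch below L2-normalizes synthetic stand-in vectors and shows that their pairwise dot products then behave like cosine similarities. These are random vectors, not real API output:

```python
import numpy as np

# Synthetic stand-ins for API output; real OpenAI vectors are already unit-length
rng = np.random.default_rng(42)
vectors = rng.normal(size=(3, 1536))

# L2-normalise each row to unit length
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
print("Norms:", np.linalg.norm(vectors, axis=1).round(3))  # all 1.0

# For unit vectors, cosine similarity reduces to a plain dot product
sim = vectors @ vectors.T
print("Self-similarity (diagonal):", np.diag(sim).round(3))  # all 1.0
```

Skipping the division by vector norms saves a little work per comparison, which adds up when ranking thousands of documents.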
Measuring Semantic Similarity
The most common way to compare two embeddings is cosine similarity: it measures the angle between two vectors, returning a value between -1 and 1. A score close to 1 means the sentences are semantically similar.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",                   # sentence A
    "A dog rested on the rug.",                  # sentence B (similar to A)
    "LLMs are trained on large text datasets.",  # sentence C (unrelated)
]

embeddings = model.encode(sentences)

# Compute cosine similarity matrix
sim_matrix = cosine_similarity(embeddings)

print("Similarity Matrix:\n", sim_matrix.round(3))
print("\nA vs B (similar)   :", round(sim_matrix[0, 1], 3))
print("A vs C (unrelated) :", round(sim_matrix[0, 2], 3))
```

Output:
```
Similarity Matrix:
 [[1.    0.734 0.051]
 [0.734 1.    0.083]
 [0.051 0.083 1.   ]]

A vs B (similar)   : 0.734
A vs C (unrelated) : 0.051
```
The results are exactly what we expect — sentences A and B (both about a pet resting on a surface) score 0.734, while A and C (completely different topics) score only 0.051. The diagonal is always 1.0 since each sentence is perfectly similar to itself.
Visualizing Embeddings with t-SNE
Since embeddings are high-dimensional, we use t-SNE to reduce them to 2D for visualization. Points that appear close together in the plot are semantically similar in the original embedding space.

```python
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    # Animals
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "The parrot repeated the word.",
    # AI / ML
    "LLMs are trained on large text datasets.",
    "Transformers use self-attention mechanisms.",
    "Embeddings capture semantic meaning as vectors.",
    # Food
    "Pizza is topped with cheese and tomato sauce.",
    "Sushi is a traditional Japanese dish.",
    "Pasta is a staple of Italian cuisine.",
]

labels = ["Animals"] * 3 + ["AI/ML"] * 3 + ["Food"] * 3
colors = {"Animals": "#58A6FF", "AI/ML": "#BC8CFF", "Food": "#3FB950"}

embeddings = model.encode(sentences)

# Reduce to 2D (perplexity must be smaller than the number of samples)
tsne = TSNE(n_components=2, random_state=42, perplexity=3)
coords = tsne.fit_transform(embeddings)

# Plot
fig, ax = plt.subplots(figsize=(7, 5))
for i, (x, y) in enumerate(coords):
    ax.scatter(x, y, color=colors[labels[i]], s=120)
    ax.annotate(labels[i], (x, y), fontsize=9,
                xytext=(5, 5), textcoords="offset points")
ax.set_title("Sentence Embeddings – t-SNE Visualization")
plt.tight_layout()
plt.savefig("embeddings_tsne.png", dpi=150)
plt.show()
```
The resulting plot will show three visible clusters — Animals, AI/ML, and Food — grouped together in 2D space, confirming that the embeddings successfully capture topic-level similarity even without any labels.
Conclusion
In this post, we briefly learned what LLM embeddings are, how they are generated through the Transformer's embedding layer and pooling, and how to use them in Python with both sentence-transformers
and the OpenAI API. We also measured semantic similarity using cosine
similarity and visualized the embedding space with t-SNE. Embeddings are
the foundation of modern semantic search and RAG pipelines —
understanding them is essential for building real-world LLM
applications. In the next post, we will build a simple semantic search engine using embeddings and a FAISS vector store.
Source Code Listing
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# ----- Basic embedding -----
sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "LLMs are trained on large text datasets.",
]

embeddings = model.encode(sentences)
print("Shape     :", embeddings.shape)
print("First vec :", np.round(embeddings[0][:6], 4))

# ----- Cosine similarity -----
sim_matrix = cosine_similarity(embeddings)
print("Similarity Matrix:\n", sim_matrix.round(3))
print("A vs B (similar)   :", round(sim_matrix[0, 1], 3))
print("A vs C (unrelated) :", round(sim_matrix[0, 2], 3))

# ----- t-SNE visualization -----
all_sentences = [
    "The cat sat on the mat.",
    "A dog rested on the rug.",
    "The parrot repeated the word.",
    "LLMs are trained on large text datasets.",
    "Transformers use self-attention mechanisms.",
    "Embeddings capture semantic meaning as vectors.",
    "Pizza is topped with cheese and tomato sauce.",
    "Sushi is a traditional Japanese dish.",
    "Pasta is a staple of Italian cuisine.",
]

labels = ["Animals"] * 3 + ["AI/ML"] * 3 + ["Food"] * 3
colors = {"Animals": "#58A6FF", "AI/ML": "#BC8CFF", "Food": "#3FB950"}

all_emb = model.encode(all_sentences)
coords = TSNE(n_components=2, random_state=42,
              perplexity=3).fit_transform(all_emb)

fig, ax = plt.subplots(figsize=(7, 5))
for i, (x, y) in enumerate(coords):
    ax.scatter(x, y, color=colors[labels[i]], s=120)
    ax.annotate(labels[i], (x, y), fontsize=9,
                xytext=(5, 5), textcoords="offset points")
ax.set_title("Sentence Embeddings – t-SNE Visualization")
plt.tight_layout()
plt.savefig("embeddings_tsne.png", dpi=150)
plt.show()
```