Embeddings are the backbone of modern AI applications — from search to recommendations to RAG systems. This technical guide covers everything from theory to production Python code.
If you build AI applications, you need to understand embeddings. They power semantic search, RAG systems, recommendation engines, clustering, anomaly detection, and dozens of other applications. Yet most developers treat them as a black box.
Let's open that box.
What Are Embeddings?
An embedding is a list of numbers (vector) that represents the meaning of text. Similar meanings produce similar vectors. Different meanings produce different vectors.
# "The cat sat on the mat" → [0.23, -0.45, 0.67, ...] (1536 numbers)
# "A feline rested on a rug" → [0.21, -0.43, 0.65, ...] (similar!)
# "Stock prices rose sharply" → [-0.89, 0.12, -0.34, ...] (very different!)
The mathematical distance between vectors corresponds to the semantic distance between concepts. This is what makes them useful — you can search by meaning, not just keywords.
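A toy example makes this concrete. The vectors below are made up (real embeddings have hundreds or thousands of dimensions), but the geometry works the same way:

```python
import numpy as np

# Toy 3-dimensional "embeddings" (invented numbers for illustration;
# real models produce hundreds or thousands of dimensions)
cat = np.array([0.9, 0.1, 0.0])      # "the cat sat on the mat"
feline = np.array([0.8, 0.2, 0.1])   # "a feline rested on a rug"
stocks = np.array([0.0, 0.1, 0.95])  # "stock prices rose sharply"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: near 1.0 = same direction, near 0.0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, feline))  # high: close meanings point the same way
print(cosine(cat, stocks))  # low: unrelated meanings point elsewhere
```

The absolute numbers don't matter; what matters is that "cat" and "feline" land close together while "stocks" lands far away.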
Generating Embeddings in Python
Option 1: OpenAI Embeddings
from openai import OpenAI
client = OpenAI()
def get_embedding(text: str, model: str = "text-embedding-3-large") -> list[float]:
    response = client.embeddings.create(
        input=text,
        model=model,
    )
    return response.data[0].embedding
# Generate embeddings
embedding = get_embedding("How do neural networks learn?")
print(f"Dimensions: {len(embedding)}") # 3072 for text-embedding-3-large
Option 2: Open-Source with sentence-transformers
from sentence_transformers import SentenceTransformer
# Free, runs locally, no API key needed
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [
    "How do neural networks learn?",
    "What is backpropagation in deep learning?",
    "Best restaurants in New York City",
]
embeddings = model.encode(texts)
print(f"Shape: {embeddings.shape}") # (3, 384)
Similarity Search
The core operation: given a query, find the most similar items.
import numpy as np
from numpy.linalg import norm
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return np.dot(a, b) / (norm(a) * norm(b))
# Compare similarity
query = model.encode("machine learning algorithms")
documents = [
    model.encode("supervised learning classification methods"),
    model.encode("best pizza delivery services"),
    model.encode("neural network training techniques"),
]
for i, doc in enumerate(documents):
    sim = cosine_similarity(query, doc)
    print(f"Document {i}: {sim:.4f}")
# Illustrative output (exact scores vary by model):
# Document 0: 0.7823 (related!)
# Document 1: 0.1234 (not related)
# Document 2: 0.8156 (very related!)
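Looping one document at a time works for a handful of items. In practice you stack the document embeddings into a matrix and rank everything with a single matrix-vector product. A minimal sketch (the toy 4-dimensional vectors are made up, and `top_k` is an illustrative name, not a library function):

```python
import numpy as np

def top_k(query: np.ndarray, doc_matrix: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    # Normalize rows so a plain dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = docs @ q                    # one matrix-vector product scores every doc
    order = np.argsort(-sims)[:k]      # indices of the k highest scores
    return [(int(i), float(sims[i])) for i in order]

# Toy embeddings: rows are documents
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.8, 0.3, 0.1, 0.0],
])
query = np.array([1.0, 0.2, 0.0, 0.0])
print(top_k(query, docs, k=2))  # best-matching document indices with scores
```

This brute-force scan is fine up to roughly a few hundred thousand vectors; beyond that you want the approximate indexes a vector database provides.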
Vector Databases
For production use at scale, you'll want a vector database that can index and search millions (or billions) of vectors efficiently.
Pinecone Example
from pinecone import Pinecone
pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")
# Upsert vectors
# embedding1, embedding2, and query_embedding are vectors you generated earlier
vectors = [
    {"id": "doc1", "values": embedding1, "metadata": {"title": "ML Basics"}},
    {"id": "doc2", "values": embedding2, "metadata": {"title": "Pizza Guide"}},
]
index.upsert(vectors=vectors)
# Query
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
)
Supabase pgvector (Free Option)
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);
-- Insert with embedding
INSERT INTO documents (content, embedding)
VALUES ('Machine learning basics', '[0.1, 0.2, ...]');
-- Similarity search (<=> is pgvector's cosine distance operator,
-- so 1 - distance gives cosine similarity)
SELECT content, 1 - (embedding <=> '[0.15, 0.25, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.15, 0.25, ...]'
LIMIT 5;
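pgvector accepts vectors as text literals like `'[0.1,0.2,...]'`. If you're inserting from Python, a small serializer is enough; the helper name `to_pgvector` and the commented psycopg-style cursor are illustrative (drivers also ship dedicated pgvector adapters you may prefer):

```python
def to_pgvector(embedding: list[float]) -> str:
    # pgvector's text input format: comma-separated floats in square brackets
    return "[" + ",".join(f"{x:g}" for x in embedding) + "]"

# Parameterized insert keeps the vector out of the SQL string itself.
# Assumes a psycopg-style cursor `cur` and the `documents` table defined above:
# cur.execute(
#     "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
#     ("Machine learning basics", to_pgvector(embedding)),
# )
print(to_pgvector([0.1, 0.2, 0.3]))  # → [0.1,0.2,0.3]
```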
Practical Applications
1. Semantic Search Engine
Replace keyword search with meaning-based search. Users can search "how to fix slow website" and find documents about "performance optimization" even if those words don't appear.
2. Duplicate Detection
Find near-duplicate content by comparing embedding similarity. Useful for content moderation, plagiarism detection, and deduplication.
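The whole check reduces to one pairwise-similarity matrix plus a threshold. A sketch with made-up vectors (the threshold of 0.95 is a typical starting point, not a universal constant):

```python
import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    # Pairwise cosine similarity via a normalized matrix product
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if sims[i, j] >= threshold:
                pairs.append((i, j, float(sims[i, j])))
    return pairs

# Toy vectors: items 0 and 2 are nearly identical, item 1 is unrelated
emb = np.array([
    [1.00, 0.00, 0.10],
    [0.00, 1.00, 0.00],
    [0.99, 0.01, 0.11],
])
print(find_near_duplicates(emb))  # flags the (0, 2) pair
```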
3. Recommendation Systems
Embed user preferences and item descriptions. Recommend items whose embeddings are closest to the user's preference vector.
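One simple way to build the preference vector is to average the embeddings of items the user liked, then rank the catalog against it. A sketch under that assumption (toy 2-dimensional vectors, illustrative names):

```python
import numpy as np

def recommend(liked: np.ndarray, catalog: np.ndarray, k: int = 3) -> list[int]:
    # Preference vector = mean of the embeddings of items the user liked
    pref = liked.mean(axis=0)
    pref /= np.linalg.norm(pref)
    items = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    scores = items @ pref                  # cosine similarity to the preference
    return [int(i) for i in np.argsort(-scores)[:k]]

liked = np.array([[1.0, 0.0], [0.9, 0.1]])            # items the user enjoyed
catalog = np.array([[1.0, 0.05], [0.0, 1.0], [0.8, 0.2]])
print(recommend(liked, catalog, k=3))  # catalog indices, best match first
```

Real systems refine this with recency weighting or learned ranking, but the nearest-embedding core is the same.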
4. Clustering and Classification
Group similar items automatically using k-means or HDBSCAN on embeddings. No labels needed — the structure emerges from the data.
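To show the idea without any extra dependencies, here is a minimal k-means over embedding vectors in plain numpy; a production pipeline would normally reach for scikit-learn's `KMeans` or HDBSCAN instead:

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    # Minimal k-means: returns a cluster label for each point
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center
        dists = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        for c in range(k):
            if (labels == c).any():
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Toy embeddings with two obvious groups
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = kmeans(points, k=2)
print(labels)  # points 0-1 share one label, points 2-3 the other
```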
Choosing an Embedding Model
- OpenAI text-embedding-3-large: Best quality, $0.13/million tokens
- OpenAI text-embedding-3-small: Good quality, $0.02/million tokens
- Cohere embed-v4: Competitive quality, good for multilingual
- all-MiniLM-L6-v2: Free, runs locally, 384 dimensions, great for prototyping
- BGE-large-en-v1.5: Free, runs locally, 1024 dimensions, production-ready quality
People Also Ask
How many dimensions should embeddings have?
More dimensions capture more nuance but cost more to store and search. For most applications, 768-1536 dimensions is the sweet spot. 384 is fine for prototyping.
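OpenAI's text-embedding-3 models were trained so embeddings can be shortened: keep the first N components and renormalize (the API exposes this as the `dimensions` parameter). The same idea in numpy, with a random stand-in for a real embedding:

```python
import numpy as np

def shorten(embedding: np.ndarray, dims: int) -> np.ndarray:
    # Keep the first `dims` components, then renormalize to unit length
    cut = embedding[:dims]
    return cut / np.linalg.norm(cut)

# Random stand-in for a 3072-dim text-embedding-3-large vector
full = np.random.default_rng(0).standard_normal(3072)
short = shorten(full, 256)
print(short.shape)            # (256,)
print(np.linalg.norm(short))  # ~1.0 after renormalizing
```

This lets you trade retrieval quality for storage and search speed without re-embedding your corpus at a different size.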
Can I use embeddings for images?
Yes — CLIP and similar models create embeddings for both text and images in the same vector space. You can search for images using text queries and vice versa.
How much does vector storage cost?
Pinecone: ~$0.33/million vectors/month. Supabase pgvector: included in Supabase pricing. Self-hosted Qdrant or Chroma: just your server costs.
Written by
Promptium Team
Expert contributor at WOWHOW. Writing about AI, development, automation, and building products that ship.