How Vector Search Works
An embedding model maps items — text chunks, images, audio, user profiles — to dense vectors in a high-dimensional space (typically 384 to 3072 dimensions). Items with similar meaning or characteristics have similar vectors. Vector search exploits this: to find documents semantically related to a query, embed the query and find the k vectors in the database most similar to the query vector.
Similarity metrics:
- Cosine similarity — cos(θ) = (a·b)/(|a||b|). Measures the angle between vectors, ignoring magnitude. Range: −1 to 1. Most widely used for text embeddings.
- Dot product — a·b. Equivalent to cosine for unit-norm vectors. Preferred for speed when embeddings are L2-normalised (which most modern embedding models produce).
- L2 (Euclidean) distance — ||a−b||. Measures straight-line distance. Can be appropriate when magnitude carries meaning (e.g., item popularity-weighted embeddings).
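The three metrics are a few lines of NumPy. A minimal sketch with toy vectors (b is a scaled copy of a, so the angle is zero but the magnitudes differ):

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|) -- magnitude-invariant
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dot_product(a, b):
    return np.dot(a, b)

def l2_distance(a, b):
    return np.linalg.norm(a - b)

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, twice the magnitude

print(cosine_similarity(a, b))  # ~1.0: identical direction
print(l2_distance(a, b))        # > 0: magnitudes differ

# After L2 normalisation, dot product and cosine coincide
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert np.isclose(dot_product(a_n, b_n), cosine_similarity(a, b))
```

This is why normalising embeddings lets a database use the cheaper dot product without changing the ranking.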
Why approximate rather than exact nearest-neighbour search? Exact search in d dimensions requires comparing the query against every stored vector — O(n×d) per query. At 1M vectors of 1536 dimensions, that is roughly 1.5 billion multiply-accumulate operations per query. ANN algorithms (HNSW, IVF-PQ) reach 95–99% of exact-search recall in well under a millisecond by building index structures that let them skip the vast majority of comparisons.
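The exact-search baseline is just a full matrix–vector scan. A minimal sketch (smaller than the 1M×1536 example above, and assuming L2-normalised vectors so dot product equals cosine):

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    # One dot product per stored vector: O(n * d) multiply-adds per query.
    scores = vectors @ query                 # shape (n,)
    top = np.argpartition(-scores, k)[:k]    # unordered top-k in O(n)
    return top[np.argsort(-scores[top])]     # sort only the k winners

rng = np.random.default_rng(0)
n, d = 10_000, 128
vectors = rng.normal(size=(n, d))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# A query that is a lightly perturbed copy of vector 42
query = vectors[42] + 0.01 * rng.normal(size=d)
query /= np.linalg.norm(query)

print(exact_knn(query, vectors, k=3)[0])  # 42
```

Every query pays the full n×d cost here; ANN indexes exist to avoid exactly this scan.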
HNSW vs IVF-PQ: Index Algorithms
HNSW (Hierarchical Navigable Small World): The dominant algorithm in production vector databases. Builds a multi-layer graph during indexing. Query time: start at the entry point of the top layer, greedily navigate toward the nearest neighbour, descend to the next layer when no closer nodes exist. Repeat until the bottom layer.
Key parameters:
- M — number of bi-directional connections per node. Higher M → better recall, more memory, slower indexing. Typical: 16–64.
- ef_construction — size of the dynamic candidate list during index construction. Higher → better index quality, slower build. Typical: 100–200.
- ef_search — size of the dynamic candidate list during search. Higher → better recall, higher latency. Tune based on your recall/latency SLA.
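The greedy descent can be illustrated on a single layer. This pure-Python sketch runs beam search (the ef_search candidate list) over a precomputed M-nearest-neighbour graph; real HNSW adds the layer hierarchy and builds the graph incrementally, guided by ef_construction:

```python
import heapq
import numpy as np

def build_knn_graph(vectors, M=8):
    # Toy stand-in for HNSW construction: connect each node to its M
    # nearest neighbours by brute force.
    sims = vectors @ vectors.T
    np.fill_diagonal(sims, -np.inf)
    return {i: list(np.argsort(-sims[i])[:M]) for i in range(len(vectors))}

def greedy_search(query, vectors, graph, entry=0, ef_search=16):
    # Beam search: keep the ef_search best candidates seen so far and
    # expand neighbours until no candidate beats the worst kept result.
    visited = {entry}
    d0 = vectors[entry] @ query
    candidates = [(-d0, entry)]   # max-heap on similarity (negated)
    results = [(d0, entry)]       # min-heap: results[0] is the worst kept
    while candidates:
        neg_sim, node = heapq.heappop(candidates)
        if -neg_sim < results[0][0] and len(results) >= ef_search:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            sim = vectors[nb] @ query
            if len(results) < ef_search or sim > results[0][0]:
                heapq.heappush(candidates, (-sim, nb))
                heapq.heappush(results, (sim, nb))
                if len(results) > ef_search:
                    heapq.heappop(results)
    return sorted(results, reverse=True)   # best first

rng = np.random.default_rng(1)
vecs = rng.normal(size=(500, 32))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
graph = build_knn_graph(vecs, M=8)

hits = greedy_search(vecs[7], vecs, graph, ef_search=32)
print(hits[0][1])   # usually node 7, but not guaranteed -- it's approximate
```

Raising ef_search widens the beam, which is exactly the recall/latency knob described above.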
IVF-PQ (Inverted File with Product Quantisation): Used by FAISS for very large corpora. Clusters vectors into K Voronoi cells (IVF); at query time, search only the nprobe closest cells. Product Quantisation compresses vectors into compact codes, dramatically reducing memory (a 1536-dim float32 vector from 6KB to ~64 bytes). Best for billion-scale search with constrained memory. Lower recall than HNSW at the same index size; higher throughput.
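The IVF half of the scheme can be sketched with NumPy. This toy version uses a crude Lloyd's k-means as the coarse quantiser (FAISS trains this properly) and skips PQ, storing raw float32 vectors in each list rather than ~64-byte codes:

```python
import numpy as np

def train_ivf(vectors, n_cells=16, iters=10, seed=0):
    # Crude k-means for the coarse quantiser.
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_cells, replace=False)]
    for _ in range(iters):
        assign = ((vectors[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for c in range(n_cells):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids
    assign = ((vectors[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
    lists = {c: np.flatnonzero(assign == c) for c in range(n_cells)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k=5, nprobe=2):
    # Scan only the nprobe cells whose centroids are closest to the query.
    probe = ((centroids - query) ** 2).sum(-1).argsort()[:nprobe]
    ids = np.concatenate([lists[c] for c in probe])
    dists = ((vectors[ids] - query) ** 2).sum(-1)
    return ids[dists.argsort()[:k]]

rng = np.random.default_rng(3)
data = rng.normal(size=(2_000, 32))
centroids, lists = train_ivf(data, n_cells=16)
print(ivf_search(data[10], data, centroids, lists, k=3, nprobe=1)[0])  # 10
```

Raising nprobe scans more cells: higher recall, more work — the IVF analogue of ef_search.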
Vector Database Comparison
| Database | Type | Hybrid Search | Best For |
|---|---|---|---|
| Pinecone | Managed SaaS | Yes | Zero-ops, fully managed. Simple API. Best for teams that want a managed service. |
| Qdrant | Open-source / Cloud | Yes | High performance. Rust-based. Strong filtering. GDPR-friendly on-prem deployment. |
| Weaviate | Open-source / Cloud | Yes (BM25) | Built-in vectorisation modules. Strong hybrid search. GraphQL query interface. |
| Chroma | Open-source | Limited | Easy local development. Default in many LangChain tutorials. Not for large-scale production. |
| pgvector | Postgres extension | With pg_search | Single database for vectors + relational data. ACID. Good for ≤10M vectors. |
| FAISS | Library (no DB) | No | Facebook's ANN library. Fast; no persistence, metadata, or server. For custom pipelines. |
Metadata Filtering and Hybrid Search
Metadata filtering is essential for multi-tenant systems or corpora with natural subsets. Every vector is stored alongside structured metadata (author, date, department, document type). Queries can filter on metadata before or after ANN search:
- Pre-filter: Apply the metadata filter first, then run the ANN search within the matching subset. Recall can degrade significantly if the filter leaves only a small candidate set, because HNSW's graph connectivity was built over the full, denser set of vectors.
- Post-filter: Retrieve the top-k ANN results, then apply the metadata filter. Can return fewer than k results if many of the ANN hits fail the filter.
- Filtered HNSW: Modern systems (Qdrant, Weaviate) use an adaptive strategy — if the filter is selective (few matching vectors), switch to brute-force over the filtered subset; otherwise use HNSW with payload-aware graph navigation.
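The trade-off between the first two strategies can be simulated with a brute-force scan; the `tenant` field here is a hypothetical metadata attribute:

```python
import numpy as np

rng = np.random.default_rng(2)
vectors = rng.normal(size=(1_000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
tenant = rng.integers(0, 10, size=1_000)   # hypothetical metadata field

query, want_tenant, k = vectors[0], tenant[0], 5
scores = vectors @ query

# Pre-filter: restrict to matching vectors, then take top-k. Exact here;
# with HNSW, this sparse subset is where recall can suffer.
pre_ids = np.flatnonzero(tenant == want_tenant)
pre = pre_ids[np.argsort(-scores[pre_ids])[:k]]

# Post-filter: take top-k over everything, then drop non-matching hits.
top = np.argsort(-scores)[:k]
post = top[tenant[top] == want_tenant]

print(len(pre), len(post))   # post-filtering may return fewer than k results
```

In practice, systems over-fetch (retrieve several times k) before post-filtering to reduce the shortfall.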
Hybrid search combines sparse retrieval (BM25 — term frequency-based) and dense retrieval (embedding-based). BM25 handles exact keyword matches, product codes, and proper nouns that embedding models may compress away. Dense search handles semantic similarity and synonyms. Results are merged using Reciprocal Rank Fusion: for each result ranked at position r in a list, score = 1/(k+r), where k is a constant (typically 60). Scores are summed across the sparse and dense lists and re-ranked. Production RAG systems consistently perform better with hybrid search than with pure dense retrieval.
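Reciprocal Rank Fusion as described is only a few lines; the document IDs below are illustrative:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    # rankings: list of ranked result-id lists, e.g. [bm25_ids, dense_ids].
    # Each id scores 1 / (k + rank) per list; ranks are 1-based.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]    # sparse (keyword) ranking
dense = ["d1", "d4", "d3"]   # dense (embedding) ranking
print(rrf([bm25, dense]))    # ['d1', 'd3', 'd4', 'd7']
```

Documents appearing in both lists (d1, d3) accumulate score from each and rise to the top, which is the point of the fusion.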
Frequently Asked Questions
What is a vector database and how is it different from a regular database?
A vector database stores high-dimensional embeddings and retrieves the most similar vectors via approximate nearest-neighbour (ANN) search. Regular databases do exact lookups on scalar values. Vector databases trade slight accuracy (approximate, not exact) for orders-of-magnitude speed improvement, using algorithms like HNSW.
What is HNSW and why do most vector databases use it?
HNSW (Hierarchical Navigable Small World graphs) builds a multi-layer graph — top layers for fast coarse navigation, lower layers for fine-grained search. Search starts at the top and descends greedily. Excellent recall (>95%), sub-millisecond latency, and supports incremental insertion. Key parameters: M (connections per node), ef_construction (build quality), ef_search (recall/latency at query time).
When should you use pgvector instead of a dedicated vector database?
pgvector is right when: corpus <~10M vectors, you want vectors alongside relational data in one database, or you need ACID transactions. Dedicated databases (Pinecone, Qdrant, Weaviate) outperform at scale (100M+ vectors) and offer managed infrastructure, multi-tenancy, and advanced filtering. pgvector's HNSW index (v0.5.0+) is competitive for most workloads.
What is the difference between cosine similarity and dot product?
Cosine similarity measures angle between vectors, ignoring magnitude. Dot product measures both angle and magnitude. For L2-normalised embeddings (unit vectors), they are equivalent. Most modern embedding models normalise outputs — use dot product (faster computation, same result). Always check the embedding model documentation.
What is hybrid search in vector databases?
Hybrid search combines sparse (BM25/TF-IDF, keyword matching) and dense (vector, semantic) retrieval. Results are merged using Reciprocal Rank Fusion (RRF) or weighted combination. Handles both keyword-sensitive queries (product codes, proper nouns) and semantically rich queries. Weaviate and Qdrant have native hybrid search. Elasticsearch + dense vectors is another common pattern.