How Vector Search Works
An embedding model maps items — text chunks, images, audio, user profiles — to dense vectors in a high-dimensional space (typically 384 to 3072 dimensions). Items with similar meaning or characteristics have similar vectors. Vector search exploits this: to find documents semantically related to a query, embed the query and find the k vectors in the database most similar to the query vector.
Similarity metrics:
- Cosine similarity — cos(θ) = (a·b)/(|a||b|). Measures the angle between vectors, ignoring magnitude. Range: −1 to 1. Most widely used for text embeddings.
- Dot product — a·b. Equivalent to cosine for unit-norm vectors. Preferred for speed when embeddings are L2-normalised (which most modern embedding models produce).
- L2 (Euclidean) distance — ||a−b||. Measures straight-line distance. Can be appropriate when magnitude carries meaning (e.g., item popularity-weighted embeddings).
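The three metrics are a few lines of NumPy. A minimal sketch with toy vectors (b is a scaled copy of a, so the angle is zero but the magnitudes differ):

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|) -- magnitude-invariant
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dot_product(a, b):
    return np.dot(a, b)

def l2_distance(a, b):
    return np.linalg.norm(a - b)

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, twice the magnitude

print(cosine_similarity(a, b))  # ~1.0: identical direction
print(l2_distance(a, b))        # > 0: magnitudes differ

# After L2 normalisation, dot product and cosine coincide
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert np.isclose(dot_product(a_n, b_n), cosine_similarity(a, b))
```

This is why normalising embeddings lets a database use the cheaper dot product without changing the ranking.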
Why approximate rather than exact nearest-neighbour search? Exact search in d dimensions requires comparing the query against every stored vector — O(n×d) per query. At 1M vectors of 1536 dimensions, that is roughly 1.5 billion multiply-accumulate operations per query. ANN algorithms (HNSW, IVF-PQ) reach 95–99% of exact-search recall in well under a millisecond by building index structures that let them skip the vast majority of comparisons.
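The exact-search baseline is just a full matrix–vector scan. A minimal sketch (smaller than the 1M×1536 example above, and assuming L2-normalised vectors so dot product equals cosine):

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    # One dot product per stored vector: O(n * d) multiply-adds per query.
    scores = vectors @ query                 # shape (n,)
    top = np.argpartition(-scores, k)[:k]    # unordered top-k in O(n)
    return top[np.argsort(-scores[top])]     # sort only the k winners

rng = np.random.default_rng(0)
n, d = 10_000, 128
vectors = rng.normal(size=(n, d))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# A query that is a lightly perturbed copy of vector 42
query = vectors[42] + 0.01 * rng.normal(size=d)
query /= np.linalg.norm(query)

print(exact_knn(query, vectors, k=3)[0])  # 42
```

Every query pays the full n×d cost here; ANN indexes exist to avoid exactly this scan.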
HNSW vs IVF-PQ: Index Algorithms
HNSW (Hierarchical Navigable Small World): The dominant algorithm in production vector databases. Builds a multi-layer graph during indexing. Query time: start at the entry point of the top layer, greedily navigate toward the nearest neighbour, descend to the next layer when no closer nodes exist. Repeat until the bottom layer.
Key parameters:
- M — number of bi-directional connections per node. Higher M → better recall, more memory, slower indexing. Typical: 16–64.
- ef_construction — size of the dynamic candidate list during index construction. Higher → better index quality, slower build. Typical: 100–200.
- ef_search — size of the dynamic candidate list during search. Higher → better recall, higher latency. Tune based on your recall/latency SLA.
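The greedy descent can be illustrated on a single layer. This pure-Python sketch runs beam search (the ef_search candidate list) over a precomputed M-nearest-neighbour graph; real HNSW adds the layer hierarchy and builds the graph incrementally, guided by ef_construction:

```python
import heapq
import numpy as np

def build_knn_graph(vectors, M=8):
    # Toy stand-in for HNSW construction: connect each node to its M
    # nearest neighbours by brute force.
    sims = vectors @ vectors.T
    np.fill_diagonal(sims, -np.inf)
    return {i: list(np.argsort(-sims[i])[:M]) for i in range(len(vectors))}

def greedy_search(query, vectors, graph, entry=0, ef_search=16):
    # Beam search: keep the ef_search best candidates seen so far and
    # expand neighbours until no candidate beats the worst kept result.
    visited = {entry}
    d0 = vectors[entry] @ query
    candidates = [(-d0, entry)]   # max-heap on similarity (negated)
    results = [(d0, entry)]       # min-heap: results[0] is the worst kept
    while candidates:
        neg_sim, node = heapq.heappop(candidates)
        if -neg_sim < results[0][0] and len(results) >= ef_search:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            sim = vectors[nb] @ query
            if len(results) < ef_search or sim > results[0][0]:
                heapq.heappush(candidates, (-sim, nb))
                heapq.heappush(results, (sim, nb))
                if len(results) > ef_search:
                    heapq.heappop(results)
    return sorted(results, reverse=True)   # best first

rng = np.random.default_rng(1)
vecs = rng.normal(size=(500, 32))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
graph = build_knn_graph(vecs, M=8)

hits = greedy_search(vecs[7], vecs, graph, ef_search=32)
print(hits[0][1])   # usually node 7, but not guaranteed -- it's approximate
```

Raising ef_search widens the beam, which is exactly the recall/latency knob described above.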
IVF-PQ (Inverted File with Product Quantisation): Used by FAISS for very large corpora. Clusters vectors into K Voronoi cells (IVF); at query time, search only the nprobe closest cells. Product Quantisation compresses vectors into compact codes, dramatically reducing memory (a 1536-dim float32 vector from 6KB to ~64 bytes). Best for billion-scale search with constrained memory. Lower recall than HNSW at the same index size; higher throughput.
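The IVF half of the scheme can be sketched with NumPy. This toy version uses a crude Lloyd's k-means as the coarse quantiser (FAISS trains this properly) and skips PQ, storing raw float32 vectors in each list rather than ~64-byte codes:

```python
import numpy as np

def train_ivf(vectors, n_cells=16, iters=10, seed=0):
    # Crude k-means for the coarse quantiser.
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_cells, replace=False)]
    for _ in range(iters):
        assign = ((vectors[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for c in range(n_cells):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids
    assign = ((vectors[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
    lists = {c: np.flatnonzero(assign == c) for c in range(n_cells)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k=5, nprobe=2):
    # Scan only the nprobe cells whose centroids are closest to the query.
    probe = ((centroids - query) ** 2).sum(-1).argsort()[:nprobe]
    ids = np.concatenate([lists[c] for c in probe])
    dists = ((vectors[ids] - query) ** 2).sum(-1)
    return ids[dists.argsort()[:k]]

rng = np.random.default_rng(3)
data = rng.normal(size=(2_000, 32))
centroids, lists = train_ivf(data, n_cells=16)
print(ivf_search(data[10], data, centroids, lists, k=3, nprobe=1)[0])  # 10
```

Raising nprobe scans more cells: higher recall, more work — the IVF analogue of ef_search.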
Vector Database Comparison
| Database | Type | Hybrid Search | Best For |
|---|---|---|---|
| Pinecone | Managed SaaS | Yes | Zero-ops, fully managed. Simple API. Best for teams that want a managed service. |
| Qdrant | Open-source / Cloud | Yes | High performance. Rust-based. Strong filtering. GDPR-friendly on-prem deployment. |
| Weaviate | Open-source / Cloud | Yes (BM25) | Built-in vectorisation modules. Strong hybrid search. GraphQL query interface. |
| Chroma | Open-source | Limited | Easy local development. Default in many LangChain tutorials. Not for large-scale production. |
| pgvector | Postgres extension | With pg_search | Single database for vectors + relational data. ACID. Good for ≤10M vectors. |
| FAISS | Library (no DB) | No | Facebook's ANN library. Fast; no persistence, metadata, or server. For custom pipelines. |
Metadata Filtering and Hybrid Search
Metadata filtering is essential for multi-tenant systems or corpora with natural subsets. Every vector is stored alongside structured metadata (author, date, department, document type). Queries can filter on metadata before or after ANN search:
- Pre-filter: Apply the metadata filter first, then run the ANN search within the matching subset. Recall can degrade significantly if the filter leaves only a small candidate set, because HNSW's graph connectivity was built over the full, denser set of vectors.
- Post-filter: Retrieve the top-k ANN results, then apply the metadata filter. Can return fewer than k results if many of the ANN hits fail the filter.
- Filtered HNSW: Modern systems (Qdrant, Weaviate) use an adaptive strategy — if the filter is selective (few matching vectors), switch to brute-force over the filtered subset; otherwise use HNSW with payload-aware graph navigation.
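The trade-off between the first two strategies can be simulated with a brute-force scan; the `tenant` field here is a hypothetical metadata attribute:

```python
import numpy as np

rng = np.random.default_rng(2)
vectors = rng.normal(size=(1_000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
tenant = rng.integers(0, 10, size=1_000)   # hypothetical metadata field

query, want_tenant, k = vectors[0], tenant[0], 5
scores = vectors @ query

# Pre-filter: restrict to matching vectors, then take top-k. Exact here;
# with HNSW, this sparse subset is where recall can suffer.
pre_ids = np.flatnonzero(tenant == want_tenant)
pre = pre_ids[np.argsort(-scores[pre_ids])[:k]]

# Post-filter: take top-k over everything, then drop non-matching hits.
top = np.argsort(-scores)[:k]
post = top[tenant[top] == want_tenant]

print(len(pre), len(post))   # post-filtering may return fewer than k results
```

In practice, systems over-fetch (retrieve several times k) before post-filtering to reduce the shortfall.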
Hybrid search combines sparse retrieval (BM25 — term frequency-based) and dense retrieval (embedding-based). BM25 handles exact keyword matches, product codes, and proper nouns that embedding models may compress away. Dense search handles semantic similarity and synonyms. Results are merged using Reciprocal Rank Fusion: for each result ranked at position r in a list, score = 1/(k+r), where k is a constant (typically 60). Scores are summed across the sparse and dense lists and re-ranked. Production RAG systems consistently perform better with hybrid search than with pure dense retrieval.
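Reciprocal Rank Fusion as described is only a few lines; the document IDs below are illustrative:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    # rankings: list of ranked result-id lists, e.g. [bm25_ids, dense_ids].
    # Each id scores 1 / (k + rank) per list; ranks are 1-based.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]    # sparse (keyword) ranking
dense = ["d1", "d4", "d3"]   # dense (embedding) ranking
print(rrf([bm25, dense]))    # ['d1', 'd3', 'd4', 'd7']
```

Documents appearing in both lists (d1, d3) accumulate score from each and rise to the top, which is the point of the fusion.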
Frequently Asked Questions
What is a vector database and how is it different from a regular database?
A vector database stores high-dimensional embeddings and retrieves the most similar vectors via approximate nearest-neighbour (ANN) search. Regular databases do exact lookups on scalar values. Vector databases trade slight accuracy (approximate, not exact) for orders-of-magnitude speed improvement, using algorithms like HNSW.
What is HNSW and why do most vector databases use it?
HNSW (Hierarchical Navigable Small World graphs) builds a multi-layer graph — top layers for fast coarse navigation, lower layers for fine-grained search. Search starts at the top and descends greedily. Excellent recall (>95%), sub-millisecond latency, and supports incremental insertion. Key parameters: M (connections per node), ef_construction (build quality), ef_search (recall/latency at query time).
When should you use pgvector instead of a dedicated vector database?
pgvector is right when: corpus <~10M vectors, you want vectors alongside relational data in one database, or you need ACID transactions. Dedicated databases (Pinecone, Qdrant, Weaviate) outperform at scale (100M+ vectors) and offer managed infrastructure, multi-tenancy, and advanced filtering. pgvector's HNSW index (v0.5.0+) is competitive for most workloads.
What is the difference between cosine similarity and dot product?
Cosine similarity measures angle between vectors, ignoring magnitude. Dot product measures both angle and magnitude. For L2-normalised embeddings (unit vectors), they are equivalent. Most modern embedding models normalise outputs — use dot product (faster computation, same result). Always check the embedding model documentation.
What is hybrid search in vector databases?
Hybrid search combines sparse (BM25/TF-IDF, keyword matching) and dense (vector, semantic) retrieval. Results are merged using Reciprocal Rank Fusion (RRF) or weighted combination. Handles both keyword-sensitive queries (product codes, proper nouns) and semantically rich queries. Weaviate and Qdrant have native hybrid search. Elasticsearch + dense vectors is another common pattern.