Retrieval-Augmented Generation
The 2026 Skills Guide
RAG has become the default architecture for building AI applications that need to work with private or up-to-date information. This guide covers the full stack — from chunking and embeddings to advanced retrieval patterns, reranking, and production evaluation.
Why RAG Became the Default Architecture
Two fundamental limitations of LLMs make RAG necessary for most enterprise use cases. First, knowledge cutoffs: every LLM's training data has a cutoff date, meaning recent events, updated documentation, or new product information are outside its knowledge. Second, hallucination: LLMs generate the most statistically likely next token, not necessarily the most factually accurate one. For tasks requiring reliable facts, this is unacceptable.
RAG addresses both by retrieving relevant information at inference time and providing it directly in the prompt as ground-truth context. The LLM's role shifts from knowledge retrieval to reasoning and synthesis over provided information — a task LLMs do well.
When RAG is the right choice:
- Answering questions over private document collections (internal wikis, contracts, technical docs)
- Customer support over product documentation that changes frequently
- Research assistance requiring citation and source attribution
- Any task where factual accuracy is non-negotiable and data changes over time
When RAG is not the right choice: General reasoning, creative generation, code generation (where the model's parametric knowledge is sufficient), or tasks where document retrieval latency is unacceptable. For these, prompting or fine-tuning is more appropriate.
The RAG Pipeline: Step by Step
1. Ingestion: Load raw documents from PDFs (PyMuPDF), DOCX, web pages, databases, or APIs. Extract clean text; discard headers, footers, and noise.
2. Chunking: Split documents into chunks suitable for embedding, either fixed-size with overlap (RecursiveCharacterTextSplitter) or semantic chunking at natural boundary points.
3. Embedding and indexing: Encode each chunk as a dense vector using an embedding model. Store (chunk text, vector, metadata) in a vector database.
4. Query embedding: At query time, embed the user's question using the same embedding model used at indexing time.
5. Retrieval: Perform approximate nearest-neighbour search (HNSW) in the vector database, optionally applying metadata filters. Return the top-k chunks.
6. Reranking: Pass the query and retrieved chunks through a cross-encoder reranker. Cross-encoders attend to query and document jointly and are more accurate than bi-encoders at relevance scoring.
7. Generation: Assemble the prompt: system instruction + retrieved context + user question. Pass it to the LLM and stream the response to the user.
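Pulled together, the whole flow fits in a few dozen lines. Below is a minimal, un-reranked sketch of steps 1–5 and 7, assuming sentence-transformers and faiss-cpu are installed; the file name, chunking scheme, and query are purely illustrative.

```python
# Minimal naive RAG sketch: chunk -> embed -> FAISS index -> retrieve -> prompt.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # same model for indexing and querying

# Steps 1-2: ingestion and (toy) fixed-size character chunking with overlap
text = open("handbook.txt").read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 800)]  # 200-char overlap

# Step 3: embed and index
embeddings = model.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on unit vectors
index.add(np.asarray(embeddings, dtype="float32"))

# Steps 4-5: embed the query with the SAME model, retrieve top-k chunks
query = "What is the parental leave policy?"
query_vec = np.asarray(model.encode([query], normalize_embeddings=True), dtype="float32")
scores, ids = index.search(query_vec, 4)
context = "\n\n".join(chunks[i] for i in ids[0])

# Step 7: assemble the prompt (the LLM call itself is omitted here)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```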
Choosing an Embedding Model
The embedding model is one of the highest-impact choices in a RAG system. It determines retrieval quality more than almost any other component. Key selection criteria: retrieval performance on benchmarks (MTEB), embedding dimension vs. storage cost, max sequence length, and whether you need multilingual support.
- text-embedding-3-small / text-embedding-3-large (OpenAI) — Strong all-round performance. text-embedding-3-small offers the best cost/performance ratio for most production deployments. text-embedding-3-large sets a performance ceiling at higher cost. Available via API only — data leaves your infrastructure.
- bge-m3 (BAAI) — State-of-the-art open-source embedding model supporting dense, sparse (lexical), and multi-vector (ColBERT-style late interaction) retrieval in a single model. Multilingual (100+ languages). Excellent MTEB scores. Good choice for on-premises deployments where data residency matters.
- e5-mistral-7b-instruct (Microsoft) — Instruction-tuned embedding model derived from a 7B LLM. Excellent performance on asymmetric retrieval (query and document have different styles). High compute cost — typically deployed for offline batch indexing rather than online query embedding.
- all-MiniLM-L6-v2 (Sentence Transformers) — Lightweight 22M parameter model. Fast inference; suitable for latency-critical or resource-constrained deployments. Lower quality ceiling than larger models but often sufficient for clearly structured document collections.
Critical rule: always use the same embedding model for indexing and query encoding. Switching models requires re-embedding and re-indexing your entire corpus.
Advanced RAG Patterns
HyDE (Hypothetical Document Embeddings): Rather than embedding the raw query (which is often short and stylistically different from indexed documents), prompt an LLM to generate a hypothetical answer to the question, then embed that hypothetical answer for retrieval. The hypothesis embedding is more similar to actual relevant documents in the embedding space. Effective for questions phrased very differently from the document corpus style.
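A compact sketch of the HyDE step, assuming the OpenAI Python SDK for hypothesis generation and sentence-transformers for embedding; the model names and prompt wording are illustrative rather than prescribed.

```python
# HyDE sketch: embed a hypothetical answer instead of the short raw query.
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_query_vector(question: str):
    # 1. Ask an LLM to write a plausible (possibly wrong) answer passage.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{
            "role": "user",
            "content": f"Write a short passage that answers this question: {question}",
        }],
    )
    hypothetical_answer = response.choices[0].message.content
    # 2. Embed the hypothesis and use this vector for nearest-neighbour search,
    #    instead of embedding the question directly.
    return embedder.encode([hypothetical_answer], normalize_embeddings=True)
```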
Parent Document Retrieval: Index small chunks for precise retrieval, but return the larger parent document or section when building the context for the LLM. Small chunks have focused, distinctive embeddings that retrieve more precisely; large context windows give the LLM sufficient surrounding information to answer well. Implement by storing (small_chunk_id → parent_id) mappings.
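A minimal sketch of the lookup side, assuming the small-chunk index returns chunk ids and the chunk-to-parent mapping was stored at indexing time; the data structures are illustrative.

```python
# Parent document retrieval sketch: search over small chunks, return their parents.
parent_docs = {"sec-1": "full section text ...", "sec-2": "another full section ..."}
chunk_to_parent = {"c-1": "sec-1", "c-2": "sec-1", "c-3": "sec-2"}

def fetch_parents(retrieved_chunk_ids):
    # Map each small-chunk hit back to its parent section, de-duplicating
    # so the LLM sees each parent only once.
    seen, parents = set(), []
    for chunk_id in retrieved_chunk_ids:
        parent_id = chunk_to_parent[chunk_id]
        if parent_id not in seen:
            seen.add(parent_id)
            parents.append(parent_docs[parent_id])
    return parents
```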
Multi-hop RAG: For complex questions requiring reasoning across multiple documents, decompose the query into sub-questions, retrieve for each, then synthesise. Example: "Which UK universities produced more AI PhDs than Imperial College in 2023?" requires retrieving data for multiple institutions before comparison. LangGraph and LlamaIndex both provide multi-hop retrieval primitives.
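A rough sketch of the decompose-retrieve-synthesise loop; retrieve and llm stand in for whatever retriever and chat-completion call you already have.

```python
# Multi-hop sketch: decompose the question, retrieve per sub-question, synthesise.
def multi_hop_answer(question, retrieve, llm):
    # retrieve(q) -> list of relevant text chunks; llm(prompt) -> completion string.
    sub_questions = llm(
        f"Break this question into independent sub-questions, one per line:\n{question}"
    ).splitlines()
    evidence = []
    for sub_q in sub_questions:
        evidence.extend(retrieve(sub_q))  # one retrieval hop per sub-question
    context = "\n\n".join(evidence)
    return llm(f"Using only this evidence:\n{context}\n\nAnswer: {question}")
```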
Contextual Compression: After retrieval, use an LLM to compress each retrieved chunk to only the sentences directly relevant to the query before including them in the final prompt. Reduces context window usage and improves answer quality by removing irrelevant noise from long retrieved passages.
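One way to implement this is a per-chunk extraction prompt, sketched below with a placeholder llm call; LangChain's ContextualCompressionRetriever packages the same idea.

```python
# Contextual compression sketch: keep only the query-relevant sentences of a chunk.
def compress_chunk(chunk: str, query: str, llm) -> str:
    prompt = (
        "Extract only the sentences from the passage that are directly relevant "
        "to the question. Return them verbatim, or NONE if nothing is relevant.\n\n"
        f"Question: {query}\n\nPassage:\n{chunk}"
    )
    compressed = llm(prompt)
    return "" if compressed.strip() == "NONE" else compressed
```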
Reranking: Pass the query and each retrieved candidate through a cross-encoder reranker (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2 from Sentence Transformers, or the Cohere Rerank API). Cross-encoders jointly attend to query and document and are significantly more accurate than bi-encoder similarity at relevance scoring. Retrieve 20–50 candidates, rerank, and take the top 3–5. Consistently improves RAG quality at modest latency cost.
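A minimal reranking sketch with the Sentence Transformers CrossEncoder class; the candidate list and cut-off are illustrative.

```python
# Cross-encoder reranking sketch: score (query, candidate) pairs jointly, keep the best.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, top_n=5):
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```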
RAG Frameworks
- LangChain — The most widely used RAG framework. Rich ecosystem of document loaders, text splitters, vector store integrations, and retriever abstractions. LCEL (LangChain Expression Language) for composing chains declaratively. LangGraph for multi-step agentic RAG.
- LlamaIndex — More index-centric than LangChain; particularly strong for structured data retrieval and multi-modal RAG. Excellent abstractions for parent document retrieval, knowledge graphs, and query routing across multiple indices.
- Haystack (deepset) — Pipeline-based framework with strong enterprise features. Popular in UK fintech and legal tech. Well-suited for hybrid search pipelines with Elasticsearch.
- RAGAS — The standard evaluation framework for RAG pipelines. Measures faithfulness, answer relevancy, context precision, and context recall using LLM-as-judge.
Learning Path for RAG Skills
Foundations (0–4 weeks)
- Understand embeddings and similarity metrics: cosine similarity, dot product, L2 distance (see the worked example after this list)
- Build a naive RAG pipeline: load PDF → chunk → embed → FAISS index → query
- Use LangChain or LlamaIndex to compose the pipeline
- Evaluate with RAGAS on a simple QA dataset
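A tiny worked example of the three similarity measures mentioned above, with made-up vectors.

```python
# Cosine similarity, dot product, and L2 distance on two toy vectors.
import numpy as np

a = np.array([0.2, 0.9, 0.1])
b = np.array([0.3, 0.8, 0.0])

dot = a @ b                                              # dot product
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine similarity
l2 = np.linalg.norm(a - b)                               # Euclidean (L2) distance

# For unit-normalised embeddings, cosine similarity equals the dot product,
# and ranking by L2 distance produces the same order as ranking by cosine.
print(dot, cosine, l2)
```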
Core Skills (1–3 months)
- Vector databases: deploy Qdrant or Weaviate, understand HNSW indexing
- Hybrid search: combine BM25 and dense retrieval with Reciprocal Rank Fusion
- Implement a cross-encoder reranker with Sentence Transformers
- Parent document retrieval and contextual compression
- Metadata filtering: filter by date, author, document type at retrieval time
Advanced (3–6 months)
- HyDE: hypothetical document embeddings for query expansion
- Multi-hop RAG with LangGraph for complex question decomposition
- Self-RAG: model-driven retrieval decisions and quality assessment
- Production considerations: latency profiling, caching embeddings, async retrieval
- Security: prompt injection via retrieved content, content filtering on retrieved documents
Frequently Asked Questions
What is RAG and why does it matter for AI engineering?
RAG retrieves relevant document chunks from an external knowledge base at inference time and provides them as context in the LLM prompt. It addresses knowledge cutoffs and hallucination — LLMs only know what was in their training data, and they can generate plausible-sounding false information. RAG is the standard architecture for enterprise knowledge bases, documentation assistants, and any use case requiring current or private information.
What is the difference between naive RAG and advanced RAG?
Naive RAG uses fixed chunking and a single embedding-based search. Advanced RAG adds: query rewriting, HyDE (hypothetical document embeddings), parent document retrieval, multi-hop retrieval for complex questions, reranking, hybrid search (BM25 + vectors), and self-RAG (model decides when retrieval helps).
What chunk size should you use?
256–512 tokens for precise factual content; 512–1024 tokens for narrative text. 10–20% overlap preserves cross-boundary context. Semantic chunking (splitting at embedding similarity boundaries) often outperforms fixed-size. Test empirically with RAGAS — chunk size is one of the highest-leverage hyperparameters.
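For fixed-size splitting, a typical configuration with LangChain's RecursiveCharacterTextSplitter looks like the sketch below. Note that its sizes are measured in characters unless you pass a token-aware length function, so these numbers are rough character equivalents of the token guidance above.

```python
# Chunking sketch with LangChain's RecursiveCharacterTextSplitter (character-based sizes).
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,      # roughly 350-400 tokens of English prose
    chunk_overlap=200,    # ~13% overlap to preserve cross-boundary context
    separators=["\n\n", "\n", ". ", " "],
)
chunks = splitter.split_text(open("handbook.txt").read())
```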
What is hybrid search and why use it?
Hybrid search combines sparse (BM25) and dense (embedding) retrieval. BM25 handles keyword queries and precise terms; dense retrieval handles semantic similarity and synonyms. Merge result lists with Reciprocal Rank Fusion (RRF). Production RAG systems use hybrid search because real queries span both types — keyword-sensitive (product codes, names) and semantically rich.
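RRF itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over the ranked lists it appears in, with k typically around 60. A sketch, using toy id lists in place of real retriever output:

```python
# Reciprocal Rank Fusion: merge BM25 and dense-retrieval rankings into one list.
def rrf(rankings, k=60):
    # rankings: list of ranked lists of document ids, best first.
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ids = ["d3", "d1", "d7"]   # toy BM25 ranking
dense_ids = ["d1", "d5", "d3"]  # toy dense-retrieval ranking
fused = rrf([bm25_ids, dense_ids])  # d1 and d3 rise to the top
```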
How do you evaluate a RAG pipeline?
RAGAS measures: Faithfulness (no hallucinated claims), Answer Relevancy, Context Precision, and Context Recall. Use an LLM judge for scalable scoring. Also run end-to-end QA on a labelled benchmark, profile latency (retrieval + generation), and monitor embedding API costs.
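A minimal RAGAS run looks roughly like the sketch below; metric names and expected dataset columns have shifted across RAGAS releases, so treat this as indicative rather than exact.

```python
# RAGAS evaluation sketch using the classic evaluate() entry point.
# evaluate() calls an LLM judge under the hood, so an API key must be configured.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

eval_data = Dataset.from_dict({
    "question": ["What is the notice period?"],
    "answer": ["The notice period is one month."],
    "contexts": [["Employees must give one month's notice in writing."]],
    "ground_truth": ["One month."],
})
report = evaluate(eval_data, metrics=[faithfulness, answer_relevancy,
                                      context_precision, context_recall])
print(report)
```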