The GenAI engineering toolchain has matured significantly since 2023. The noise has settled and a clearer set of production-grade tools has emerged. Here's what UK engineers are actually using in production — and what you need to know.
Layer 1: LLM APIs — The Foundation
Every GenAI application starts with LLM API access. In 2026, production systems typically use multiple providers for resilience, cost, and capability reasons.
OpenAI (GPT-4o, o3): Still the most widely used in production. GPT-4o is the default choice for mixed text/code/vision tasks. The o3 series (reasoning models) is used for complex analysis, coding, and tasks requiring multi-step logical reasoning. OpenAI's ecosystem maturity and extensive tooling integration make it the path of least resistance for most teams.
Anthropic (Claude 3.5+): Strong preference for tasks where accuracy and low hallucination rates matter — document analysis, long-context summarisation, instruction-following fidelity. Many UK companies default to Claude for regulated use cases (legal, financial) where correctness is paramount.
Google (Gemini 1.5/2.0 Pro/Flash): Used primarily for its context window (up to 2M tokens with Gemini 1.5 Pro), making it the go-to for processing very long documents. Gemini Flash is widely used as a cost-effective option for simpler tasks where GPT-4o is overkill.
Open-weight models (Llama 3, Mistral, Qwen): Used at companies with data privacy requirements or significant cost pressure. Served via vLLM, Ollama, or managed via Hugging Face Inference Endpoints. The capability gap vs. proprietary models has narrowed substantially.
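In practice, multi-provider resilience is often just a thin fallback layer. Here's a minimal sketch using the official openai and anthropic Python SDKs; the model names, error handling, and fallback order are illustrative assumptions, not a recommendation:

```python
# Minimal multi-provider fallback: try OpenAI first, fall back to Anthropic.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def complete(prompt: str) -> str:
    try:
        resp = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    except Exception:
        # Any provider-side failure triggers the fallback; production code
        # would catch specific error types and add retries with backoff.
        resp = anthropic_client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative model alias
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
```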
Layer 2: Orchestration Frameworks
LangChain: The most widely adopted orchestration framework. Best for complex workflows, multi-agent systems, and teams who want a large ecosystem of integrations. The abstractions can add complexity to simpler use cases — experienced engineers often bypass LangChain for straightforward integrations.
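For a sense of what the framework looks like at its simplest, here's a minimal LangChain Expression Language (LCEL) chain; it assumes the post-0.1 package split (langchain-core, langchain-openai) and an OPENAI_API_KEY in the environment:

```python
# A minimal LCEL pipeline: prompt -> model -> string output.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Summarise in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()

print(chain.invoke({"text": "LangChain composes steps with the | operator."}))
```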
LlamaIndex: Preferred for RAG-heavy applications. Its data connectors, index abstractions, and query engines are better suited to document-centric systems than LangChain. Often the better choice when the core work is retrieval and knowledge base management.
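The canonical LlamaIndex RAG loop is compact. A sketch, assuming the llama-index 0.10+ package layout, a ./data directory of source documents, and OpenAI defaults for embeddings and generation:

```python
# Load documents, build an in-memory vector index, and query it.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # ./data is a placeholder path
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What does the contract say about notice periods?"))
```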
Direct API calls: Many production teams use minimal or no framework for simpler integrations. A well-structured Python module using the OpenAI SDK directly is often more maintainable than a LangChain chain for a straightforward classification or generation task. Choose frameworks for their genuine benefits, not because they're expected.
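As an example of the framework-free approach, a classification task can live in one small module; the labels, prompt, and model below are placeholders for whatever the task actually needs:

```python
# classify.py - a framework-free classification call using the OpenAI SDK directly.
from openai import OpenAI

client = OpenAI()

LABELS = ["billing", "technical", "account", "other"]

SYSTEM_PROMPT = (
    "Classify the user's support ticket into exactly one of these labels: "
    + ", ".join(LABELS)
    + ". Respond with the label only."
)

def classify_ticket(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # keep output as deterministic as possible for classification
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in LABELS else "other"  # guard against off-list output
```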
Layer 3: Vector Databases
The vector database landscape has stabilised around a smaller set of production-proven options:
- pgvector: PostgreSQL extension for vector similarity search. Used at companies already running Postgres who want to avoid managing a separate vector DB service. Excellent for most production use cases; only falls short at very large scale (100M+ vectors). See the query sketch after this list.
- Pinecone: Managed vector database service. Most widely used dedicated vector DB in UK production environments. Serverless tier makes it accessible for smaller applications.
- Weaviate: Strong open-source choice, available both as a managed cloud service and self-hosted. Good hybrid search (combining dense + sparse retrieval) out of the box.
- Qdrant: Fast, efficient, well-engineered open-source vector DB with strong Python client. Growing UK adoption.
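The pgvector sketch referenced above, using plain SQL through psycopg2. The table schema, embedding dimension, and distance operator are assumptions to adapt to your setup:

```python
# Nearest-neighbour lookup with pgvector via plain SQL.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # connection details are placeholders

# One-off schema, for reference:
#   CREATE EXTENSION vector;
#   CREATE TABLE document_chunks (id serial PRIMARY KEY,
#                                 content text,
#                                 embedding vector(1536));

def top_k_chunks(query_embedding: list[float], k: int = 5) -> list[str]:
    vec_literal = "[" + ",".join(map(str, query_embedding)) + "]"
    with conn.cursor() as cur:
        # <-> is pgvector's L2 distance operator; <=> gives cosine distance.
        cur.execute(
            "SELECT content FROM document_chunks "
            "ORDER BY embedding <-> %s::vector LIMIT %s",
            (vec_literal, k),
        )
        return [row[0] for row in cur.fetchall()]
```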
Layer 4: Evaluation and Observability
This is the layer most frequently underinvested in — and where good engineers differentiate themselves.
LangSmith: LLM tracing, dataset management, and evaluation tooling. The most widely used platform for LLM observability at UK companies. Makes it possible to trace exactly what happened in a complex LangChain workflow, run regression tests, and compare prompt versions systematically.
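Instrumenting code outside LangChain is also straightforward with LangSmith's Python SDK. A sketch using the @traceable decorator; it assumes a LANGSMITH_API_KEY (older setups use LANGCHAIN_API_KEY) in the environment:

```python
# Tracing a plain function with LangSmith: inputs, outputs, latency,
# and errors are captured as a run in the configured project.
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable
def summarise(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarise:\n\n{text}"}],
    )
    return resp.choices[0].message.content
```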
Langfuse: Open-source alternative to LangSmith with good self-hosting options. Preferred by teams with data privacy requirements or who want to avoid vendor lock-in.
RAGAS: The standard evaluation framework for RAG systems. Measures context relevance, answer faithfulness, and answer relevance using LLM-as-judge techniques. Essential for any serious RAG implementation.
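A minimal RAGAS evaluation run, shown against the 0.1-style API (newer releases reorganise the imports). The sample data is invented, and the LLM-as-judge calls default to OpenAI, so an OPENAI_API_KEY is assumed:

```python
# Score a tiny RAG sample on faithfulness, answer relevancy, and context precision.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

data = {
    "question": ["What is the notice period?"],
    "answer": ["The notice period is 30 days."],
    "contexts": [["Either party may terminate with 30 days' written notice."]],
    "ground_truth": ["30 days"],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores between 0 and 1
```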
Layer 5: Serving and Infrastructure
FastAPI: The default for serving AI features as Python APIs. Fast, well-documented, excellent async support for streaming LLM responses.
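A sketch of the streaming pattern with FastAPI and the async OpenAI client; the endpoint shape and model name are illustrative:

```python
# Stream an LLM response token-by-token through a FastAPI endpoint.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()

@app.get("/generate")
async def generate(prompt: str):
    async def token_stream():
        stream = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        async for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:  # skip empty deltas (e.g. the final stop chunk)
                yield delta

    return StreamingResponse(token_stream(), media_type="text/plain")
```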
Docker + Kubernetes: Standard containerisation and orchestration. Most UK companies deploying GenAI services use Docker for packaging and Kubernetes (often via EKS, GKE, or AKS) for production orchestration at scale.
Cloud ML platforms: AWS Bedrock (for managed access to multiple LLMs), Azure OpenAI (GPT models with data residency guarantees), Google Vertex AI (Gemini models + managed ML infrastructure). Enterprise customers often use these for compliance and data sovereignty reasons.
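As an example, a Bedrock call through the provider-agnostic Converse API via boto3 looks like the sketch below; the region (eu-west-2, London) and model ID are assumptions and must be enabled in your AWS account:

```python
# Call a managed model through AWS Bedrock's Converse API.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-west-2")

def ask(prompt: str) -> str:
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512},
    )
    return resp["output"]["message"]["content"][0]["text"]
```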
See the full GenAI Engineer role guide for salary benchmarks, skills, top UK employers, and career progression paths.
Frequently Asked Questions
What LLM APIs do GenAI engineers use most?
OpenAI (GPT-4o, o3), Anthropic (Claude 3.5+), and Google (Gemini 1.5/2.0). Most production systems use multiple providers for resilience and cost optimisation.
Is LangChain still used in 2026?
Yes — still the most widely adopted orchestration framework, particularly for complex workflows and agents. Many engineers use direct API calls for simpler integrations where framework overhead adds unnecessary complexity.
Which vector database should I learn?
Start with pgvector (if you know PostgreSQL) or Pinecone (managed service, widely used). The concepts transfer across all vector databases.
What evaluation tools do GenAI engineers use?
LangSmith or Langfuse for LLM tracing and observability, RAGAS for RAG evaluation, and custom eval harnesses for specific tasks. Evaluation tooling remains the most underinvested part of the typical GenAI stack.