The MLOps tooling landscape has fragmented significantly since 2022. This guide cuts through the noise: what each category does, honest comparisons where choices matter, and the minimum viable stacks for startups and enterprises.
Containerisation: Non-Negotiable Foundation
Docker comes first. Every ML system runs in containers: training jobs, serving endpoints, pipeline steps. You need to be able to write production Dockerfiles, manage multi-stage builds, optimise image sizes, and understand how Docker networking works in practice.
Kubernetes is the orchestration layer. For MLOps that means GPU node management, resource quotas for training jobs, auto-scaling serving deployments, and Helm chart management for ML platform components. Helm, the Kubernetes package manager, is the standard way to deploy and manage MLOps tooling in a cluster.
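Much of this is declarative YAML in practice, but the official `kubernetes` Python client makes the GPU-scheduling piece concrete. Here is a minimal sketch of submitting a single-GPU training job; the image name, namespace, and resource limits are purely illustrative:

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside the cluster

# Container requesting one GPU; image name and resource limits are placeholders
container = client.V1Container(
    name="trainer",
    image="registry.example.com/ml/train:latest",  # hypothetical image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "memory": "16Gi", "cpu": "4"},
    ),
)

# A batch Job runs the pod to completion instead of restarting it like a Deployment
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="train-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
        ),
        backoff_limit=0,
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-training", body=job)
```

The `nvidia.com/gpu` limit is what ties a training job to your GPU node pool; resource quotas on the namespace then cap how many such jobs a team can run at once.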
Pipeline Orchestration: Choosing the Right Tool
This is where many teams make the wrong choice early:
- Apache Airflow — mature, widely deployed, large ecosystem. Can be overly complex for pure ML pipelines if you're not already using it for data engineering. Strong choice if your data engineering team already runs Airflow.
- Prefect — better developer experience than Airflow, Python-native, easier to run locally. Increasingly popular for ML pipelines at UK scale-ups that don't want Airflow's operational overhead (see the sketch after this list).
- Kubeflow Pipelines — Kubernetes-native ML pipelines. Best when your compute is already on Kubernetes and you want tight integration with the broader Kubeflow ML platform. Higher setup complexity than the above.
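To make the developer-experience point concrete, here is a minimal Prefect flow in the 2.x decorator style; the task bodies, names, and retry count are illustrative only:

```python
from prefect import flow, task


@task(retries=2)
def load_data() -> list[float]:
    # Placeholder: pull training data from your warehouse or object store
    return [0.1, 0.2, 0.3]


@task
def train_model(data: list[float]) -> float:
    # Placeholder: fit a model and return a validation metric
    return sum(data) / len(data)


@flow(log_prints=True)
def training_pipeline():
    data = load_data()
    print(f"validation score: {train_model(data)}")


if __name__ == "__main__":
    training_pipeline()  # runs locally as plain Python; deploy it for scheduling
```

The same function runs unchanged on a laptop and on a scheduled deployment, which is the ergonomic gap Prefect aims to open over Airflow's DAG files.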
Experiment Tracking: MLflow vs Weights & Biases
MLflow: Open-source, self-hostable, has a model registry, integrates with most ML frameworks. Better for organisations with data governance requirements or a preference for keeping experiment data off third-party services. The model registry and deployment integrations are more mature than W&B's.
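A minimal tracking sketch, assuming a self-hosted MLflow server and a scikit-learn model; the tracking URI, experiment name, and hyperparameters are placeholders:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://localhost:5000")  # hypothetical self-hosted server
mlflow.set_experiment("churn-model")              # placeholder experiment name

X, y = make_classification(n_samples=500, random_state=42)  # toy data

with mlflow.start_run():
    model = LogisticRegression(C=0.5).fit(X, y)
    mlflow.log_param("C", 0.5)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Logging the model is what feeds the registry and deployment integrations
    mlflow.sklearn.log_model(model, "model")
```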
Weights & Biases (W&B): Managed service, better UI and collaboration features, faster onboarding. Better for teams that want to move quickly and don't need to self-host. Very popular at AI research labs and ML-focused startups.
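The equivalent W&B sketch, with a placeholder project name and a dummy training loop:

```python
import wandb

# wandb.init() starts a run that appears in the hosted UI; project name is hypothetical
run = wandb.init(project="churn-model", config={"learning_rate": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    loss = 1.0 / (epoch + 1)  # placeholder metric from your real training loop
    wandb.log({"epoch": epoch, "train_loss": loss})

run.finish()
```

The tracking code is near-identical either way; the decision is really about where the data lives and who runs the server.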
Our recommendation: for most UK startups and scale-ups without data sovereignty requirements, W&B's managed tier is the faster path. For enterprises, financial services, and any context where training data confidentiality is a concern, choose self-hosted MLflow.
The most common MLOps toolchain mistake
Choosing tools based on what's most impressive rather than what the team actually needs. A well-run MLflow setup is worth more than a poorly maintained Kubeflow cluster. Start simple and add complexity when you genuinely need it.
Data Versioning
DVC (Data Version Control): Works alongside Git to version datasets and models. Essential when you need to reproduce a past training run exactly. Integrates with cloud storage (S3, GCS, Azure Blob). When it's overkill: if you have a single static dataset and don't need to reproduce past experiments, DVC adds overhead without proportionate value.
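Day-to-day DVC is driven from the command line, but its Python API shows the reproducibility pay-off: reading the exact dataset version a past run used. A minimal sketch; the repo URL, file path, and Git tag are all hypothetical:

```python
import dvc.api

# Open the dataset exactly as it existed at a past Git revision;
# DVC fetches the matching version from the configured remote (e.g. S3)
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example/ml-repo",  # hypothetical repository
    rev="v1.2.0",                               # Git tag of the past training run
) as f:
    header = f.readline()
```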
Model Serving
- FastAPI wrapper: For most use cases, a FastAPI endpoint serving a serialised model is sufficient and the simplest to maintain. Start here (see the sketch after this list).
- BentoML: Adds an ML-specific serving abstraction on top of common frameworks. Good middle ground between a raw FastAPI endpoint and a dedicated serving platform.
- Triton Inference Server: NVIDIA's high-performance serving platform. Use it when you have GPU inference requirements and need to maximise throughput at scale. Overkill for most standard use cases.
- Ray Serve: Good for serving pipelines that combine models with pre/post-processing logic, especially when you're already using Ray for distributed training.
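A minimal sketch of the FastAPI starting point, assuming a scikit-learn model pickled to `model.pkl` (the file name and feature shape are placeholders):

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the serialised model once at startup, not on every request
with open("model.pkl", "rb") as f:  # hypothetical artifact path
    model = pickle.load(f)


class PredictRequest(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    prediction = model.predict([req.features])
    return {"prediction": float(prediction[0])}  # cast numpy scalar for JSON
```

Serve it with `uvicorn main:app` inside your container; when a single endpoint like this stops being enough, that is the signal to reach for BentoML or Triton.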
Model Monitoring
- Evidently AI: Open-source, Python-native, excellent for data drift and model performance monitoring. Strong recommendation for teams getting started with monitoring.
- Arize AI: Managed monitoring platform with strong explainability features. Better for teams that want a managed solution with a richer UI.
What to monitor: input data distribution (catching data drift early), model output distribution (catching model drift), prediction confidence, latency, and error rates. Set up alerting for when drift scores exceed a threshold: don't just log, act.
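A minimal drift check with Evidently, using the `Report`/`DataDriftPreset` API from its 0.4.x line (the API has shifted between versions, so treat this as a sketch); the dataframes and the alert hook are placeholders:

```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Placeholder data: reference = training distribution, current = recent production inputs
reference = pd.DataFrame({"age": [25, 32, 41, 29], "income": [30e3, 45e3, 52e3, 38e3]})
current = pd.DataFrame({"age": [48, 55, 61, 52], "income": [80e3, 95e3, 88e3, 91e3]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Act on the result rather than just logging it
result = report.as_dict()
if result["metrics"][0]["result"]["dataset_drift"]:
    print("ALERT: dataset drift detected")  # wire this to Slack/PagerDuty in practice
```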
Minimum Viable Stacks
For a startup or early-stage scale-up
Docker + FastAPI serving + GitHub Actions CI/CD + MLflow (or W&B) + Evidently AI for monitoring. Deploy to AWS ECS, GCP Cloud Run, or similar managed container service. No Kubernetes required at this scale.
For an enterprise
Kubernetes (EKS/GKE/AKS) + Kubeflow or Airflow for orchestration + MLflow (self-hosted) + Helm-managed deployments + Triton or BentoML for serving + custom monitoring integrating with existing observability stack (Prometheus/Grafana). Feature store (Feast or Tecton) only when training-serving skew is a documented, recurring problem.
See the full MLOps Engineer career guide
Salary tables, skills breakdown, UK companies hiring, and the full career path.
Frequently Asked Questions
Is MLflow or W&B better for experiment tracking?
Both are strong. MLflow for self-hosted/data governance requirements. W&B for faster onboarding and better collaboration UI. Most UK startups choose W&B; most enterprises choose MLflow.
Do I need a feature store?
Only when you have multiple models consuming the same features in real time and training-serving skew is a documented problem. Don't introduce the complexity early.
What's the simplest production ML setup that works?
FastAPI + Docker + cloud managed container service + MLflow + GitHub Actions + Evidently AI. This handles most use cases at startup and early scale-up stage.
How much Kubernetes do I need to know?
Junior MLOps: core concepts and basic debugging. Senior MLOps: deep networking, RBAC, Helm authoring, GPU workload management.
What do UK enterprises actually use?
Cloud-native ML platforms (SageMaker, Vertex AI, Azure ML) as the backbone, with MLflow or W&B for tracking, Kubernetes for compute, and custom feature stores at the largest companies.