The MLOps tooling landscape has fragmented significantly since 2022. This guide cuts through the noise: what each category does, honest comparisons where choices matter, and the minimum viable stacks for startups and enterprises.
Containerisation: Non-Negotiable Foundation
Docker comes first. Every ML system runs in containers: training jobs, serving endpoints, pipeline steps. You need to be able to write production Dockerfiles, manage multi-stage builds, optimise image sizes, and understand how Docker networking works in practice.
Kubernetes is the orchestration layer. For MLOps that means GPU node management, resource quotas for training jobs, auto-scaling serving deployments, and Helm chart management for ML platform components. Helm, the Kubernetes package manager, is the standard way to deploy and manage MLOps tooling in a cluster.
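Much of this is declarative YAML in practice, but the official `kubernetes` Python client makes the GPU-scheduling piece concrete. Here is a minimal sketch of submitting a single-GPU training job; the image name, namespace, and resource limits are purely illustrative:

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside the cluster

# Container requesting one GPU; image name and resource limits are placeholders
container = client.V1Container(
    name="trainer",
    image="registry.example.com/ml/train:latest",  # hypothetical image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "memory": "16Gi", "cpu": "4"},
    ),
)

# A batch Job runs the pod to completion instead of restarting it like a Deployment
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="train-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
        ),
        backoff_limit=0,
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-training", body=job)
```

The `nvidia.com/gpu` limit is what ties a training job to your GPU node pool; resource quotas on the namespace then cap how many such jobs a team can run at once.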
Pipeline Orchestration: Choosing the Right Tool
This is where many teams make the wrong choice early:
- Apache Airflow — mature, widely deployed, large ecosystem. Can be overly complex for pure ML pipelines if you're not already using it for data engineering. Strong choice if your data engineering team already runs Airflow.
- Prefect — better developer experience than Airflow, Python-native, easier to run locally. Increasingly popular for ML pipelines at UK scale-ups that don't want Airflow's operational overhead (see the sketch after this list).
- Kubeflow Pipelines — Kubernetes-native ML pipelines. Best when your compute is already on Kubernetes and you want tight integration with the broader Kubeflow ML platform. Higher setup complexity than the above.
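To make the developer-experience point concrete, here is a minimal Prefect flow in the 2.x decorator style; the task bodies, names, and retry count are illustrative only:

```python
from prefect import flow, task


@task(retries=2)
def load_data() -> list[float]:
    # Placeholder: pull training data from your warehouse or object store
    return [0.1, 0.2, 0.3]


@task
def train_model(data: list[float]) -> float:
    # Placeholder: fit a model and return a validation metric
    return sum(data) / len(data)


@flow(log_prints=True)
def training_pipeline():
    data = load_data()
    print(f"validation score: {train_model(data)}")


if __name__ == "__main__":
    training_pipeline()  # runs locally as plain Python; deploy it for scheduling
```

The same function runs unchanged on a laptop and on a scheduled deployment, which is the ergonomic gap Prefect aims to open over Airflow's DAG files.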
Experiment Tracking: MLflow vs Weights & Biases
MLflow: Open-source, self-hostable, has a model registry, integrates with most ML frameworks. Better for organisations with data governance requirements or a preference for keeping experiment data off third-party services. The model registry and deployment integrations are more mature than W&B's.
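A minimal tracking sketch, assuming a self-hosted MLflow server and a scikit-learn model; the tracking URI, experiment name, and hyperparameters are placeholders:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://localhost:5000")  # hypothetical self-hosted server
mlflow.set_experiment("churn-model")              # placeholder experiment name

X, y = make_classification(n_samples=500, random_state=42)  # toy data

with mlflow.start_run():
    model = LogisticRegression(C=0.5).fit(X, y)
    mlflow.log_param("C", 0.5)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Logging the model is what feeds the registry and deployment integrations
    mlflow.sklearn.log_model(model, "model")
```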
Weights & Biases (W&B): Managed service, better UI and collaboration features, faster onboarding. Better for teams that want to move quickly and don't need to self-host. Very popular at AI research labs and ML-focused startups.
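The equivalent W&B sketch, with a placeholder project name and a dummy training loop:

```python
import wandb

# wandb.init() starts a run that appears in the hosted UI; project name is hypothetical
run = wandb.init(project="churn-model", config={"learning_rate": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    loss = 1.0 / (epoch + 1)  # placeholder metric from your real training loop
    wandb.log({"epoch": epoch, "train_loss": loss})

run.finish()
```

The tracking code is near-identical either way; the decision is really about where the data lives and who runs the server.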
Our recommendation: for most UK startups and scale-ups without data sovereignty requirements, W&B's managed tier is the faster path. For enterprises, financial services, and any context where training data confidentiality is a concern, choose self-hosted MLflow.
The most common MLOps toolchain mistake
Choosing tools based on what's most impressive rather than what the team actually needs. A well-run MLflow setup is worth more than a poorly maintained Kubeflow cluster. Start simple and add complexity when you genuinely need it.
Data Versioning
DVC (Data Version Control): Works alongside Git to version datasets and models. Essential when you need to reproduce a past training run exactly. Integrates with cloud storage (S3, GCS, Azure Blob). When it's overkill: if you have a single static dataset and don't need to reproduce past experiments, DVC adds overhead without proportionate value.
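Day-to-day DVC is driven from the command line, but its Python API shows the reproducibility pay-off: reading the exact dataset version a past run used. A minimal sketch; the repo URL, file path, and Git tag are all hypothetical:

```python
import dvc.api

# Open the dataset exactly as it existed at a past Git revision;
# DVC fetches the matching version from the configured remote (e.g. S3)
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example/ml-repo",  # hypothetical repository
    rev="v1.2.0",                               # Git tag of the past training run
) as f:
    header = f.readline()
```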
Model Serving
- FastAPI wrapper: For most use cases, a FastAPI endpoint serving a serialised model is sufficient and the simplest to maintain. Start here (see the sketch after this list).
- BentoML: Adds an ML-specific serving abstraction on top of common frameworks. Good middle ground between a raw FastAPI endpoint and a dedicated serving platform.
- Triton Inference Server: NVIDIA's high-performance serving platform. Use it when you have GPU inference requirements and need to maximise throughput at scale. Overkill for most standard use cases.
- Ray Serve: Good for serving pipelines that combine models with pre/post-processing logic, especially when you're already using Ray for distributed training.
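A minimal sketch of the FastAPI starting point, assuming a scikit-learn model pickled to `model.pkl` (the file name and feature shape are placeholders):

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the serialised model once at startup, not on every request
with open("model.pkl", "rb") as f:  # hypothetical artifact path
    model = pickle.load(f)


class PredictRequest(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    prediction = model.predict([req.features])
    return {"prediction": float(prediction[0])}  # cast numpy scalar for JSON
```

Serve it with `uvicorn main:app` inside your container; when a single endpoint like this stops being enough, that is the signal to reach for BentoML or Triton.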
Model Monitoring
- Evidently AI: Open-source, Python-native, excellent for data drift and model performance monitoring. Strong recommendation for teams getting started with monitoring.
- Arize AI: Managed monitoring platform with strong explainability features. Better for teams that want a managed solution with a richer UI.
What to monitor: input data distribution (catching data drift early), model output distribution (catching model drift), prediction confidence, latency, and error rates. Set up alerting for when drift scores exceed a threshold: don't just log, act.
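A minimal drift check with Evidently, using the `Report`/`DataDriftPreset` API from its 0.4.x line (the API has shifted between versions, so treat this as a sketch); the dataframes and the alert hook are placeholders:

```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Placeholder data: reference = training distribution, current = recent production inputs
reference = pd.DataFrame({"age": [25, 32, 41, 29], "income": [30e3, 45e3, 52e3, 38e3]})
current = pd.DataFrame({"age": [48, 55, 61, 52], "income": [80e3, 95e3, 88e3, 91e3]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Act on the result rather than just logging it
result = report.as_dict()
if result["metrics"][0]["result"]["dataset_drift"]:
    print("ALERT: dataset drift detected")  # wire this to Slack/PagerDuty in practice
```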
Minimum Viable Stacks
For a startup or early-stage scale-up
Docker + FastAPI serving + GitHub Actions CI/CD + MLflow (or W&B) + Evidently AI for monitoring. Deploy to AWS ECS, GCP Cloud Run, or similar managed container service. No Kubernetes required at this scale.
For an enterprise
Kubernetes (EKS/GKE/AKS) + Kubeflow or Airflow for orchestration + MLflow (self-hosted) + Helm-managed deployments + Triton or BentoML for serving + custom monitoring integrating with existing observability stack (Prometheus/Grafana). Feature store (Feast or Tecton) only when training-serving skew is a documented, recurring problem.
See the full MLOps Engineer career guide
Salary tables, skills breakdown, UK companies hiring, and the full career path.
Frequently Asked Questions
Is MLflow or W&B better for experiment tracking?
Both are strong. MLflow for self-hosted/data governance requirements. W&B for faster onboarding and better collaboration UI. Most UK startups choose W&B; most enterprises choose MLflow.
Do I need a feature store?
Only when you have multiple models consuming the same features in real time and training-serving skew is a documented problem. Don't introduce the complexity early.
What's the simplest production ML setup that works?
FastAPI + Docker + cloud managed container service + MLflow + GitHub Actions + Evidently AI. This handles most use cases at startup and early scale-up stage.
How much Kubernetes do I need to know?
Junior MLOps: core concepts and basic debugging. Senior MLOps: deep networking, RBAC, Helm authoring, GPU workload management.
What do UK enterprises actually use?
Cloud-native ML platforms (SageMaker, Vertex AI, Azure ML) as the backbone, with MLflow or W&B for tracking, Kubernetes for compute, and custom feature stores at the largest companies.