Interview Prep

    MLOps Engineer Interview Questions UK
    Technical & Behavioural Guide 2026

    10 technical questions covering CI/CD for ML, Kubernetes, model monitoring, feature stores, and infrastructure — with strong example answers, behavioural prep, and employer red flags.

    The Interview Process

    Stage 1: Recruiter screen (30 min)

    Background, cloud platform experience (AWS/GCP/Azure), and specific MLOps tooling. Prepare to discuss a production ML pipeline you've built end-to-end.

    Stage 2: Infrastructure coding (45–60 min)

    Write Terraform, Kubernetes manifests, or a Dockerfile. Python scripting for pipeline automation is also common. Focus on correctness and production-readiness.

    Stage 3: ML system design (45–60 min)

    Design a training pipeline, feature store, or model serving infrastructure. Cover monitoring, rollback, and cost — interviewers want to see you think beyond the happy path.

    Stage 4: Technical deep-dive (45 min)

    Deep questions on topics from your CV — Kubernetes internals, CI/CD patterns, model drift detection, or specific tooling (MLflow, Kubeflow, Vertex AI).

    Stage 5: Behavioural (45 min)

    Incidents, cross-functional collaboration, and architectural decisions. Senior roles include questions on platform strategy and team enablement.

    Technical Questions

    Write your own answer first, then compare against the example.

    Q1. How would you design a CI/CD pipeline for a machine learning model?

    Strong answer

    An ML CI/CD pipeline differs from software CI/CD because it must version data and models, not just code. Key stages: (1) Trigger — code push, data update, or scheduled retraining. (2) Data validation — Great Expectations or Pandera checks that the training data meets schema and quality expectations. (3) Model training — reproducible training run with logged hyperparameters (MLflow). (4) Model evaluation — compare against the current production model on a holdout set; only promote if the challenger meets defined thresholds. (5) Model registration — push artefact to model registry with metadata. (6) Deployment — canary or shadow deploy to a staging environment, run integration tests, promote to production. Use tools like GitHub Actions, Kubeflow Pipelines, or Metaflow. Everything must be reproducible from a commit hash.
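
    A minimal sketch of the evaluation gate (stage 4), using the MLflow client. The model name, metric key, and uplift threshold are illustrative:

    ```python
    # Evaluation gate: register the challenger only if it clears the current
    # production model's holdout metric. "churn-model" and "holdout_auc" are
    # hypothetical names.
    import mlflow
    from mlflow.tracking import MlflowClient

    client = MlflowClient()

    def evaluate_and_register(run_id: str, challenger_auc: float,
                              model_name: str = "churn-model",
                              min_uplift: float = 0.005) -> bool:
        prod = client.get_latest_versions(model_name, stages=["Production"])
        if prod:
            champion = client.get_run(prod[0].run_id)
            if challenger_auc < champion.data.metrics["holdout_auc"] + min_uplift:
                return False  # challenger does not clear the bar; stop the pipeline
        mlflow.register_model(f"runs:/{run_id}/model", model_name)
        return True
    ```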

    Q2. What is model drift and how do you detect and respond to it?

    Strong answer

    Two types: data drift (the distribution of input features changes) and concept drift (the relationship between inputs and the target changes). Detection: (1) Data drift — monitor statistical properties of incoming features (mean, variance, Kullback-Leibler divergence, Population Stability Index) against training distribution. (2) Concept drift — monitor model output distribution and business metrics (conversion rate, prediction accuracy on a sample). Tools: Evidently AI, WhyLabs, Arize. Response: (1) Alert and investigate — is it genuine drift or a data pipeline issue? (2) Trigger retraining with recent data. (3) Review whether the feature set needs updating. Set up automated retraining pipelines but require human approval before production promotion.
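
    As an illustration, the Population Stability Index for a single numeric feature takes only a few lines; the 0.2 alert threshold is a common rule of thumb, not a universal constant:

    ```python
    # PSI between the training (reference) distribution and live traffic.
    # Values above ~0.2 are commonly treated as significant drift.
    import numpy as np

    def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
        # Bin edges come from the reference sample so both distributions
        # are compared on the same grid.
        edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf
        ref_frac = np.histogram(reference, edges)[0] / len(reference)
        live_frac = np.histogram(live, edges)[0] / len(live)
        # Clip away empty bins to avoid division by zero and log(0).
        ref_frac = np.clip(ref_frac, 1e-6, None)
        live_frac = np.clip(live_frac, 1e-6, None)
        return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))
    ```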

    Q3. Explain how Kubernetes is used in ML infrastructure. What are pods, deployments, and services?

    Strong answer

    Kubernetes is the standard orchestration layer for scalable ML inference. A Pod is the smallest deployable unit — one or more containers that share networking and storage. A Deployment manages a set of identical pods, handles rolling updates, and maintains a desired replica count. A Service provides a stable endpoint (IP/DNS) for reaching pods, abstracting away pod churn. In ML: model servers run as Deployments (e.g. vLLM, TF Serving), exposed via a Service. For training jobs, use Kubernetes Jobs (run-to-completion) or specialised operators (Kubeflow Training Operator for distributed training). GPU resources are requested via resource limits on pods. Horizontal Pod Autoscalers scale inference replicas based on CPU/GPU utilisation or custom metrics like request queue depth.
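
    To make the three objects concrete, a sketch using the official Kubernetes Python client to stand up a model server; the image, names, and labels are hypothetical:

    ```python
    # A Deployment (3 replicas of a model server, 1 GPU each) exposed
    # through a Service. Equivalent to the usual YAML manifests.
    from kubernetes import client, config

    config.load_kube_config()

    labels = {"app": "churn-model"}
    container = client.V1Container(
        name="model-server",
        image="registry.example.com/churn-model:1.4.2",  # hypothetical image
        ports=[client.V1ContainerPort(container_port=8080)],
        resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    )
    deployment = client.V1Deployment(
        api_version="apps/v1", kind="Deployment",
        metadata=client.V1ObjectMeta(name="churn-model"),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    service = client.V1Service(
        api_version="v1", kind="Service",
        metadata=client.V1ObjectMeta(name="churn-model"),
        spec=client.V1ServiceSpec(
            selector=labels,  # stable endpoint in front of pod churn
            ports=[client.V1ServicePort(port=80, target_port=8080)],
        ),
    )
    client.AppsV1Api().create_namespaced_deployment("default", deployment)
    client.CoreV1Api().create_namespaced_service("default", service)
    ```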

    Q4. What is a feature store and why is it valuable in a production ML system?

    Strong answer

    A feature store is a centralised system for storing, sharing, and serving ML features. It solves the training-serving skew problem: ensuring the features used during training are identical to those computed at serving time. Components: (1) Offline store (e.g. S3, BigQuery) for historical feature values used in training. (2) Online store (e.g. Redis, DynamoDB) for low-latency feature serving at inference time. (3) Feature transformation logic, versioned and shared across teams. Value: prevents duplicate feature computation across teams, enables feature reuse, and provides a single source of truth. Popular options: Feast (open-source), Tecton, Hopsworks. Most valuable at organisations with multiple ML teams building related products.
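
    A short sketch of the offline/online split using Feast; the feature view, feature names, and entity key are illustrative:

    ```python
    # The same feature definitions back both paths, which is what closes
    # the training-serving skew gap.
    from feast import FeatureStore

    store = FeatureStore(repo_path=".")  # repo with the feature definitions

    feature_refs = ["user_stats:txn_count_7d", "user_stats:avg_basket_value"]

    # Offline store: point-in-time-correct training set (entity_df carries
    # entity keys plus event timestamps).
    # train_df = store.get_historical_features(
    #     entity_df=entity_df, features=feature_refs).to_df()

    # Online store: low-latency lookup for one entity at inference time.
    online = store.get_online_features(
        features=feature_refs,
        entity_rows=[{"user_id": 1234}],
    ).to_dict()
    ```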

    Q5. How do you ensure ML experiment reproducibility?

    Strong answer

    Reproducibility requires versioning all inputs and capturing all configuration. Checklist: (1) Code — git commit hash logged with every experiment. (2) Data — immutable dataset versions, stored with checksums (DVC, Delta Lake, or raw S3 URIs). (3) Hyperparameters — all parameters logged (MLflow, W&B, Comet). (4) Environment — Docker image with pinned dependencies or a requirements.txt with hashes. (5) Random seeds — set for numpy, Python random, and deep learning frameworks. (6) Hardware — note GPU type if results are hardware-sensitive. Experiments that can't be reproduced can't be debugged, compared, or audited — this is a compliance requirement in regulated industries.
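
    A sketch of what capturing that checklist looks like for one training run, with MLflow; the dataset URI and parameter values are illustrative:

    ```python
    # Log code version, data version, seeds, and hyperparameters so the
    # run can be reconstructed from the tracking server alone.
    import random
    import subprocess

    import mlflow
    import numpy as np

    SEED = 42
    random.seed(SEED)
    np.random.seed(SEED)
    # plus torch.manual_seed(SEED) or the equivalent for your framework

    with mlflow.start_run():
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
        mlflow.set_tag("git_commit", commit)
        mlflow.log_params({
            "dataset_uri": "s3://datasets/churn/v2024-06-01/",  # immutable version
            "seed": SEED,
            "learning_rate": 3e-4,
        })
        # ... train, evaluate, then log metrics and the model artefact ...
    ```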

    Q6. How would you implement shadow deployment for a new ML model?

    Strong answer

    Shadow deployment (also called dark launch) runs the new model in parallel with the production model without serving its outputs to users. The production model's response is returned to the user; the shadow model's response is logged for offline evaluation. Implementation: add a middleware layer that duplicates incoming requests and asynchronously sends them to the shadow model endpoint. Log both responses with a shared request ID. Compare offline: accuracy, latency, output distribution. Promote the shadow model when it meets defined quality thresholds with no regressions. Risk: doubles inference cost during the shadow period. Mitigate by sampling (e.g. shadow 5% of traffic).
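
    A minimal sketch of the duplicating layer, here as a FastAPI service using httpx; the endpoint URLs and sample rate are illustrative:

    ```python
    # The caller only ever sees the production response; a sampled 5% of
    # requests is mirrored to the shadow model and logged under a shared ID.
    import asyncio
    import logging
    import random
    import uuid

    import httpx
    from fastapi import FastAPI, Request

    app = FastAPI()
    PROD_URL = "http://churn-model-prod/predict"     # hypothetical endpoints
    SHADOW_URL = "http://churn-model-shadow/predict"
    SHADOW_SAMPLE_RATE = 0.05                        # caps the extra cost

    async def mirror(payload: dict, request_id: str) -> None:
        try:
            async with httpx.AsyncClient(timeout=2.0) as client:
                resp = await client.post(SHADOW_URL, json=payload)
            logging.info("shadow %s %s", request_id, resp.json())
        except Exception:
            # A shadow failure must never affect the user-facing path.
            logging.exception("shadow call failed %s", request_id)

    @app.post("/predict")
    async def predict(request: Request) -> dict:
        payload = await request.json()
        request_id = str(uuid.uuid4())  # links the two logged responses
        async with httpx.AsyncClient(timeout=2.0) as client:
            prod = await client.post(PROD_URL, json=payload)
        if random.random() < SHADOW_SAMPLE_RATE:
            asyncio.create_task(mirror(payload, request_id))  # fire-and-forget
        logging.info("prod %s %s", request_id, prod.json())
        return prod.json()
    ```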

    Q7. What is the difference between model monitoring and data monitoring in MLOps?

    Strong answer

    Data monitoring focuses on the inputs to the model: schema validation (expected columns, types), distribution monitoring (feature drift), data freshness (is the pipeline delivering up-to-date data?), and completeness (are there missing values or null rates above threshold?). Model monitoring focuses on the outputs and behaviour: prediction distribution drift, performance metrics (accuracy, precision, recall) on labelled samples, latency, error rates, and business KPIs. Both are necessary. Data issues often manifest as model quality issues — a broken upstream pipeline can cause silent degradation that looks like concept drift. Run data monitors as early as possible in the pipeline to catch issues before they reach the model.
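
    A sketch of an early-pipeline data monitor with Pandera (one of the validation tools mentioned above); column names and thresholds are illustrative:

    ```python
    # Schema, range, and null-rate checks run before data reaches the model.
    import pandas as pd
    import pandera as pa

    schema = pa.DataFrameSchema(
        {
            "user_id": pa.Column(int, nullable=False),
            "txn_count_7d": pa.Column(int, pa.Check.ge(0)),
            "avg_basket_value": pa.Column(
                float, pa.Check.in_range(0, 10_000), nullable=True),
        },
        strict=True,  # reject unexpected columns
    )

    def validate_batch(df: pd.DataFrame, max_null_rate: float = 0.02) -> pd.DataFrame:
        df = schema.validate(df)  # raises SchemaError on violations
        null_rate = df["avg_basket_value"].isna().mean()
        if null_rate > max_null_rate:
            raise ValueError(f"null rate {null_rate:.1%} above {max_null_rate:.0%}")
        return df
    ```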

    Q8. How do you manage model versioning and rollback in production?

    Strong answer

    Model versioning requires: (1) A model registry (MLflow Model Registry, AWS SageMaker Model Registry, Vertex AI Model Registry) that stores model artefacts with version numbers, metadata, and stage labels (Staging, Production, Archived). (2) Immutable artefacts — never overwrite a model version in the registry. (3) Linked deployment records — each production deployment is linked to a specific model version. Rollback procedure: promote the previous registered model version to Production stage; update the serving infrastructure to load that version (via config change or Kubernetes deployment update); verify the rollback is serving correctly before archiving the problematic version. Always test rollback procedures in staging — don't discover they're broken during an incident.
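
    A rollback sketch against the MLflow Model Registry (newer MLflow versions favour aliases over stages; the stage-based API is shown for clarity, and the model name is illustrative):

    ```python
    # Repoint Production at the known-good version; serving layers that
    # resolve "models:/churn-model/Production" pick it up on reload.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    MODEL_NAME = "churn-model"

    def rollback(bad_version: str, good_version: str) -> None:
        client.transition_model_version_stage(
            name=MODEL_NAME, version=good_version, stage="Production")
        # Keep the bad artefact (immutable, auditable) but out of service.
        client.transition_model_version_stage(
            name=MODEL_NAME, version=bad_version, stage="Archived")
    ```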

    Q9. What is infrastructure as code (IaC) and why does it matter in MLOps?

    Strong answer

    IaC means defining infrastructure (compute clusters, storage, networking, model serving endpoints) in configuration files rather than through manual console actions. Tools: Terraform (provider-agnostic), AWS CloudFormation, Pulumi. In MLOps, IaC ensures: (1) Reproducibility — you can recreate the exact infrastructure from a git repository. (2) Auditability — all infrastructure changes go through code review and are tracked in git. (3) Consistency — staging and production environments are defined by the same templates, reducing 'works in staging, breaks in production' issues. (4) Disaster recovery — the full stack can be rebuilt from code. A team that manages ML infrastructure without IaC accumulates undocumented, unreviewable snowflake infrastructure.
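
    A small IaC sketch with Pulumi in Python (Terraform HCL would be equivalent); the bucket name and tags are illustrative:

    ```python
    # The bucket holding model artefacts is declared in code, so its
    # creation and every later change go through review and git history.
    import pulumi
    import pulumi_aws as aws

    artefacts = aws.s3.Bucket(
        "model-artefacts",
        versioning=aws.s3.BucketVersioningArgs(enabled=True),
        tags={"team": "ml-platform", "env": pulumi.get_stack()},
    )

    pulumi.export("artefact_bucket", artefacts.id)
    ```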

    Q10. How do you approach cost management for a large-scale ML training and inference platform?

    Strong answer

    Cost management is a first-class engineering concern. Training cost: use spot/preemptible instances for batch training (60–80% savings with checkpointing); right-size GPU instances (don't use A100s for small experiments); implement experiment budgets and early stopping. Inference cost: right-size model (distillation, quantisation to reduce compute); horizontal autoscaling to match capacity to traffic; use spot instances for non-latency-sensitive batch inference; implement request caching. Visibility: tag all cloud resources with team/project/experiment identifiers; set up per-team cost dashboards and alerts on anomalous spend. Cost review should be a regular part of ML project planning, not an afterthought.
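
    A back-of-envelope cost comparison of the kind worth doing before any large training run; the prices and overhead factor are illustrative, not quotes:

    ```python
    # On-demand vs checkpointed spot for a single long training run.
    ON_DEMAND_RATE = 32.77   # $/hr, hypothetical 8-GPU instance
    SPOT_DISCOUNT = 0.70     # spot often sits 60-80% below on-demand
    RETRY_OVERHEAD = 1.2     # extra wall-clock from preemptions + restores
    TRAIN_HOURS = 120

    on_demand = ON_DEMAND_RATE * TRAIN_HOURS
    spot = ON_DEMAND_RATE * (1 - SPOT_DISCOUNT) * TRAIN_HOURS * RETRY_OVERHEAD

    print(f"on-demand: ${on_demand:,.0f}")           # ≈ $3,932
    print(f"spot:      ${spot:,.0f}")                # ≈ $1,416
    print(f"saving:    {1 - spot / on_demand:.0%}")  # ≈ 64%
    ```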

    Behavioural Questions

    Use the STAR format (Situation, Task, Action, Result) and keep answers to 2–3 minutes.

    Tell me about an ML system you built that had a production incident. What happened and what did you change afterwards?

    Shows real production experience and a learning mindset. Focus on the systemic change, not just the fix.

    How have you worked with ML scientists or data scientists who had less infrastructure experience?

    MLOps engineers often need to abstract complexity for model builders. Show you can design systems that others can use without becoming infrastructure experts.

    Describe a time you had to significantly refactor an ML pipeline. What drove the decision and how did you manage the risk?

    Tests ability to work with existing systems and migrate carefully. Good answer includes staged migration and testing strategy.

    How do you balance standardisation (one platform, one framework) with the needs of different ML teams?

    MLOps platforms must serve multiple users. Shows your ability to make opinionated decisions while remaining pragmatic about edge cases.

    Walk me through how you set up observability for an ML system from scratch.

    Tests depth of monitoring knowledge. Best answers cover data, model, and infrastructure layers with specific tooling choices and their rationale.

    Red Flags to Watch For

    No model registry or versioning

    Without versioned model artefacts, rollbacks are impossible and auditing is guesswork.

    Training and serving pipelines disconnected

    If the same feature transformations aren't used in training and serving, training-serving skew is inevitable and debugging is extremely hard.

    No alerting on model quality

    If nobody is notified when a model's prediction distribution changes, degradation goes undetected until it causes a business problem.

    Infrastructure managed through the console

    If cloud resources are created manually rather than through IaC, the environment is not reproducible and change management is ungoverned.

    No staging environment for ML

    Deploying models directly to production without a staging validation step means every deployment is a risk.

    Preparation Resources

    MLOps: Machine Learning Engineering in Production (Andrew Ng)

    A comprehensive curriculum covering the production ML lifecycle end-to-end

    Made With ML – MLOps Course

    Practical, code-first MLOps curriculum

    MLflow documentation

    Industry-standard experiment tracking and model registry

    Kubernetes documentation

    Read the Concepts section — understand pods, deployments, and services

    MLOps Engineer Salary Guide

    Understand market compensation before negotiating

    Ready to apply?

    Browse live MLOps engineer roles across the UK — updated daily.