
    MLflow for Experiment Tracking
    The 2026 Skills Guide

    MLflow is among the most widely deployed open-source MLOps platforms at UK companies. This guide covers all four components — Tracking, Projects, Models, and the Model Registry — and how they fit into a production ML pipeline.

    Why ML Experiments Need Tracking

    ML development is fundamentally experimental. Training a model involves dozens of decisions — hyperparameters, data preprocessing steps, feature engineering choices, architecture variants — each of which affects model quality in ways that are often non-obvious. Without systematic tracking, it is impossible to answer basic questions: which configuration produced the best model? What was different between the model we deployed last month and the one we're deploying now? Why did model quality drop when we changed the training data pipeline?

    MLflow solves this by providing a structured store for every experiment run: the code version, environment, input parameters, output metrics, and the trained model artifact — all linked together and searchable. This transforms ML development from ad hoc experimentation into a reproducible, auditable engineering process.

    MLflow Tracking: Core API

    MLflow Tracking organises runs into experiments (typically one experiment per model or project). Within a run (a combined sketch follows this list):

    • mlflow.log_param(key, value) — Log a single hyperparameter (learning rate, batch size, model architecture). Parameters are fixed values that describe the run configuration; they don't change during training.
    • mlflow.log_metric(key, value, step=None) — Log a numeric metric, optionally at a specific training step. Metrics are time-series: calling log_metric with step allows tracking train_loss and val_loss curves across epochs. The MLflow UI plots these as interactive charts.
    • mlflow.log_artifact(local_path) — Log a file (confusion matrix image, feature importance plot, evaluation CSV) to the artifact store.
    • mlflow.log_dict(dictionary, artifact_file) — Log a Python dict as a JSON file. Useful for logging classification reports or nested config objects.
    • Context manager pattern — with mlflow.start_run() as run: automatically ends the run even if an exception occurs. run.info.run_id gives the run ID for later reference.
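
    Putting these calls together, a minimal sketch of a tracked run (the experiment name, parameter values, and logged dict are illustrative):

        import mlflow

        mlflow.set_experiment("churn-model")  # illustrative experiment name

        with mlflow.start_run() as run:
            # Fixed run configuration
            mlflow.log_param("learning_rate", 0.01)
            mlflow.log_param("batch_size", 64)

            # Time-series metric: one point per epoch
            for epoch in range(10):
                loss = 1.0 / (epoch + 1)  # placeholder for a real training loss
                mlflow.log_metric("train_loss", loss, step=epoch)

            # Dicts (and files, via log_artifact) land in the artifact store
            mlflow.log_dict({"classes": ["churn", "no_churn"]}, "labels.json")

        print(run.info.run_id)  # keep for later artifact fetching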

    Tags are free-form string key-value pairs stored with a run. Useful for: git commit SHA, developer name, data version, environment (dev/staging/prod), and any categorical metadata that isn't a hyperparameter.
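
    For example (tag values illustrative; the git call assumes the script runs inside a repository):

        import subprocess
        import mlflow

        with mlflow.start_run():
            sha = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
            mlflow.set_tags({
                "git_sha": sha,
                "developer": "jdoe",        # illustrative
                "data_version": "2026-01",  # illustrative
                "environment": "staging",
            })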

    Autologging — mlflow.sklearn.autolog(), mlflow.pytorch.autolog(), and mlflow.transformers.autolog() automatically capture hyperparameters, metrics, and models from supported frameworks without explicit log calls. Enable at the start of your script, before training. This is the recommended approach for standard workflows.
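
    A minimal autologging sketch with scikit-learn (synthetic data for illustration):

        import mlflow
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        mlflow.sklearn.autolog()  # enable before any training code runs

        X, y = make_classification(n_samples=500, random_state=0)
        with mlflow.start_run():
            # Hyperparameters, metrics, and the fitted model are
            # captured automatically; no explicit log calls needed.
            LogisticRegression(max_iter=200).fit(X, y)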

    The MLflow Model Registry

    The Model Registry provides a centralised store for managing the lifecycle of models across development, staging, and production environments. It addresses the operational question: how do we know which model is deployed in production, who approved it, and what training run produced it?

    Model lifecycle stages (a transition sketch follows this list):

    • None → A model version exists in the registry but has not been formally assigned a deployment stage.
    • Staging → The model is being evaluated for production deployment. In most teams, this triggers automated integration tests, comparison against the current production model, and human review.
    • Production → The model is actively serving in production. A new version moves to Production only after review; the previous Production version can be archived automatically as part of the transition (via the client's archive_existing_versions flag).
    • Archived → The model is no longer in active use but its lineage is preserved for audit purposes.
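
    A stage transition through the client API might look like the sketch below (model name and version are illustrative; note that recent MLflow releases deprecate stages in favour of the aliases described next):

        from mlflow.tracking import MlflowClient

        client = MlflowClient()
        client.transition_model_version_stage(
            name="churn-model",              # illustrative registered model name
            version="3",                     # illustrative version
            stage="Production",
            archive_existing_versions=True,  # archive the outgoing Production version
        )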

    Aliases (introduced in MLflow 2.x) provide a more flexible alternative to stages: assign named aliases (e.g., champion, challenger) to specific model versions. This decouples deployment semantics from fixed stage names and supports more complex deployment patterns like A/B testing.
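
    A sketch of the alias workflow (model name, alias, and version are illustrative):

        import mlflow
        from mlflow.tracking import MlflowClient

        client = MlflowClient()
        # Point the 'champion' alias at version 3 of the registered model
        client.set_registered_model_alias("churn-model", "champion", "3")

        # Deployment code loads by alias rather than a hard-coded version
        model = mlflow.pyfunc.load_model("models:/churn-model@champion")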

    Model flavors define how a model can be loaded and served. A model may have multiple flavors — e.g., a scikit-learn model has sklearn (load as an sklearn estimator) and python_function (load as a generic callable) flavors. MLflow's model serving infrastructure uses the python_function flavor by default, making it framework-agnostic.
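
    For instance, a logged scikit-learn model can be loaded through either flavor (run ID placeholder, as in the FAQ below):

        import mlflow

        uri = "runs:/<run_id>/model"  # substitute a real run ID

        # sklearn flavor: the full estimator API (predict_proba, attributes, ...)
        sk_model = mlflow.sklearn.load_model(uri)

        # python_function flavor: a generic predict() interface, the one
        # MLflow's serving infrastructure uses by default
        pyfunc_model = mlflow.pyfunc.load_model(uri)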

    MLflow in Production: Server Setup and Integrations

    Remote tracking server: In production, all team members point to a shared MLflow server rather than logging locally. Deploy with: backend store (PostgreSQL for run metadata), artifact store (S3, GCS, or Azure Blob for models and artifacts), and the tracking server process. Databricks provides a fully managed MLflow service as part of its platform, which is why MLflow adoption is particularly high among Databricks customers.
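
    Client-side, each training script then needs only the tracking URI (server address illustrative):

        import mlflow

        mlflow.set_tracking_uri("http://mlflow.internal.example:5000")  # illustrative
        mlflow.set_experiment("churn-model")
        # All subsequent log calls go to the shared server, and artifacts
        # are stored in the configured S3/GCS/Azure bucket.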

    Integration with training pipelines (an Airflow sketch follows this list):

    • Apache Airflow — Wrap your MLflow training job as an Airflow PythonOperator. Use mlflow.set_tracking_uri() inside the operator. The run ID can be passed between Airflow tasks via XCom for downstream artifact fetching.
    • Kubernetes Jobs — Set MLFLOW_TRACKING_URI as an environment variable in the Job spec. Use MLFLOW_S3_ENDPOINT_URL for custom S3-compatible artifact stores (MinIO on-premises).
    • GitHub Actions CI — Run model training as part of CI/CD. Compare new model metrics against the registered Production model; fail the pipeline if quality degrades below a threshold. Automate Model Registry transitions via the mlflow Python client.
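
    A sketch of the Airflow pattern using the TaskFlow API (the decorator form of PythonOperator); the DAG name, tracking URI, and task bodies are illustrative:

        import pendulum
        from airflow.decorators import dag, task

        TRACKING_URI = "http://mlflow.internal.example:5000"  # illustrative

        @dag(schedule=None, start_date=pendulum.datetime(2026, 1, 1), catchup=False)
        def train_pipeline():

            @task
            def train() -> str:
                import mlflow
                mlflow.set_tracking_uri(TRACKING_URI)
                with mlflow.start_run() as run:
                    mlflow.log_param("learning_rate", 0.01)
                    # ... training code ...
                    return run.info.run_id  # returned values travel via XCom

            @task
            def evaluate(run_id: str) -> None:
                import mlflow
                from mlflow.tracking import MlflowClient
                mlflow.set_tracking_uri(TRACKING_URI)
                # Fetch the model artifact logged by the upstream task
                local_dir = MlflowClient().download_artifacts(run_id, "model")
                # ... evaluation against the fetched model ...

            evaluate(train())

        train_pipeline()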

    Frequently Asked Questions

    What are the four components of MLflow?

    (1) Tracking: log parameters, metrics, and artifacts from training runs into Experiments. (2) Projects: standard format for packaging reproducible ML code (MLproject YAML). (3) Models: standard packaging format with flavors (sklearn, pytorch, transformers) and model signatures. (4) Model Registry: centralised lifecycle management — register versions, transition stages (Staging → Production → Archived), add aliases and tags.

    How do you log a PyTorch model in MLflow?

    mlflow.pytorch.log_model(model, artifact_path='model', registered_model_name='my_model'). Or use mlflow.pytorch.autolog() for automatic metric/parameter/model logging. Load with mlflow.pytorch.load_model('runs:/<run_id>/model').
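
    A fuller sketch (the network here is a trivial placeholder):

        import mlflow
        import torch

        model = torch.nn.Linear(4, 1)  # placeholder for a trained network

        with mlflow.start_run() as run:
            mlflow.pytorch.log_model(model, "model", registered_model_name="my_model")

        # Reload the exact model later by run ID
        loaded = mlflow.pytorch.load_model(f"runs:/{run.info.run_id}/model")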

    What is a model signature and why does it matter?

    A model signature defines expected input/output schema (column names, dtypes, tensor shapes). It documents what the model expects, enables validation at serving time, and auto-generates API request schemas for MLflow's serving infrastructure. Infer with mlflow.models.infer_signature(X_train, predictions).
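
    A sketch of signature inference at logging time (data and model are illustrative):

        import mlflow
        from mlflow.models import infer_signature
        from sklearn.datasets import make_regression
        from sklearn.linear_model import LinearRegression

        X, y = make_regression(n_samples=100, n_features=4, random_state=0)
        model = LinearRegression().fit(X, y)

        # Infer input/output schema from sample inputs and predictions
        signature = infer_signature(X, model.predict(X))

        with mlflow.start_run():
            mlflow.sklearn.log_model(model, "model", signature=signature)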

    How does MLflow compare to Weights & Biases?

    MLflow: open-source, self-hostable, strong model registry and deployment integration, widely used in enterprise/Databricks environments. W&B: commercial SaaS, richer experiment visualisation, better collaboration (dashboards, reports), superior hyperparameter sweeps. W&B is favoured at AI-native companies; MLflow at enterprises. Knowing both is advantageous.

    How do you set up a remote MLflow tracking server?

    Run mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri postgresql://... --default-artifact-root s3://my-bucket/mlflow. Set MLFLOW_TRACKING_URI in client code or call mlflow.set_tracking_uri('http://server:5000'). For AWS, use SageMaker's managed MLflow or a self-hosted EC2 instance with S3 artifact store. Authentication via MLFLOW_TRACKING_USERNAME/PASSWORD or AWS IAM.


    Quick Facts

    Demand level
    High
    Difficulty
    Intermediate
    Time to proficiency
    2–4 weeks

    Key Concepts

    Experiments
    Runs
    Artifacts
    Model Registry
    Flavors
    Signatures
    Autologging
    Aliases

    Roles That Need This