
    Weights & Biases (W&B)
    The 2026 Skills Guide

    W&B is the preferred experiment tracking and ML observability platform at UK AI-native companies. This guide covers the core API, Sweeps for hyperparameter optimisation, Artifacts for dataset versioning, and Reports for communicating results.

    The Core W&B API

    Every W&B integration starts with wandb.init() and ends with wandb.finish(). Between them, you log metrics, parameters, and artifacts (a runnable sketch follows the list):

    • wandb.init(project="my-project", config=hyperparams, tags=["v2", "llm"]) — Initialises a new run. project groups related runs. config stores the hyperparameters for this run (dict or argparse Namespace). tags allow filtering runs in the dashboard.
    • wandb.log({"train_loss": loss, "val_accuracy": acc}, step=epoch) — Log a dict of metrics at the current step. step controls the x-axis in the W&B dashboard. Call once per training step or epoch.
    • wandb.watch(model, log="all", log_freq=100) — Automatically log model parameter gradients and weights as histograms every log_freq steps. log="all" logs both gradients and weights; log="gradients" or log="parameters" for one or the other. Useful for diagnosing vanishing/exploding gradients.
    • wandb.config — Accessible within a run to read the hyperparameter config. Supports dot notation: wandb.config.learning_rate. During a Sweep, W&B overwrites these values with the sweep's suggested configuration.
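
    A minimal end-to-end sketch tying these calls together. The metrics are faked with a decaying curve so the snippet runs on its own; a real training loop slots in the same way:

        import math
        import random

        import wandb

        run = wandb.init(
            project="my-project",                        # groups related runs
            config={"learning_rate": 3e-4, "epochs": 5},
            tags=["v2", "llm"],                          # filterable in the dashboard
        )

        for epoch in range(wandb.config.epochs):         # dot access to config values
            train_loss = math.exp(-epoch) + 0.05 * random.random()  # fake metric
            val_accuracy = 1.0 - train_loss / 2                     # fake metric
            wandb.log({"train_loss": train_loss, "val_accuracy": val_accuracy}, step=epoch)

        wandb.finish()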

    Framework integrations: WandbCallback for Keras and the HuggingFace Trainer; WandbLogger for PyTorch Lightning. These log standard metrics, model checkpoints, and hyperparameters automatically, with no explicit log calls.
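
    A minimal PyTorch Lightning sketch, assuming the pytorch_lightning import path (newer releases ship the same classes under lightning.pytorch). The fit() call is commented out because the LightningModule and data are yours to supply:

        from pytorch_lightning import Trainer
        from pytorch_lightning.loggers import WandbLogger

        # The logger captures everything the LightningModule reports via self.log(),
        # plus hyperparameters; log_model=True also uploads checkpoints as artifacts.
        logger = WandbLogger(project="my-project", log_model=True)
        trainer = Trainer(logger=logger, max_epochs=10)
        # trainer.fit(model, datamodule=dm)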

    Sweeps: Hyperparameter Optimisation

    W&B Sweeps provide a managed hyperparameter search that coordinates multiple training runs across multiple machines. The sweep controller (run by W&B) tracks all completed runs, fits a probabilistic model, and serves next-configuration recommendations to agents.

    Sweep configuration: Define as a Python dict with method (bayes/random/grid), metric (the target metric to optimise, including goal: minimize or maximize), and parameters (the search space); a full example follows the list:

    • Continuous: {'distribution': 'log_uniform_values', 'min': 1e-5, 'max': 1e-2} for learning rate (log-uniform is appropriate for scale-varying parameters)
    • Discrete: {'values': [32, 64, 128, 256]} for batch size
    • Integer: {'distribution': 'int_uniform', 'min': 2, 'max': 16} for LoRA rank
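
    Putting those three parameter types together, a sketch of a full sweep configuration. The metric name val_loss is an assumption and must match a key you pass to wandb.log; the parameter names must match keys your training code reads from wandb.config:

        sweep_config = {
            "method": "bayes",  # or "random" / "grid"
            "metric": {"name": "val_loss", "goal": "minimize"},
            "parameters": {
                "learning_rate": {
                    "distribution": "log_uniform_values",
                    "min": 1e-5,
                    "max": 1e-2,
                },
                "batch_size": {"values": [32, 64, 128, 256]},
                "lora_rank": {"distribution": "int_uniform", "min": 2, "max": 16},
            },
        }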

    Early termination: Configure early_terminate with Hyperband to stop underperforming runs early, freeing compute for more promising configurations. Essential for expensive GPU training runs: it can reduce total compute by 2–5× compared with running every configuration to completion. An illustrative stanza follows.
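
    An illustrative Hyperband stanza for the sweep_config above; the min_iter and eta values are assumptions to tune for your workload:

        sweep_config["early_terminate"] = {
            "type": "hyperband",
            "min_iter": 3,   # earliest step at which a run may be stopped
            "eta": 3,        # successive-halving rate between brackets
        }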

    Parallel agents: Launch multiple agent processes (each calling wandb.agent(sweep_id, train_fn)) on different machines or GPU instances. Each agent requests a configuration from the controller, runs training, and reports results. The controller's Bayesian model improves with each completed run, directing subsequent agents to more promising regions of the search space.
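
    Continuing the sweep_config sketch above, registration and a single agent might look like this. The training function fakes a val_loss so the control flow is visible end to end; count caps the number of runs this agent will execute:

        import wandb

        def train_fn():
            wandb.init()                  # the controller injects this run's config
            for epoch in range(10):
                # Placeholder metric; a real run would train with wandb.config
                # values such as wandb.config.learning_rate and wandb.config.batch_size.
                wandb.log({"val_loss": 1.0 / (epoch + 1)})

        sweep_id = wandb.sweep(sweep_config, project="my-project")
        wandb.agent(sweep_id, function=train_fn, count=20)  # repeat on each machine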

    Artifacts and W&B Tables

    W&B Artifacts version and track any file used or produced by a training run. The key use cases in ML (a combined sketch follows the list):

    • Dataset versioning: Log your training dataset as an artifact at the start of each experiment. W&B deduplicates content across versions — only changed files are uploaded. The run records which dataset version it used, creating full lineage: dataset v3 → training run abc → model checkpoint.
    • Model checkpoints: Log model weights as artifacts with type='model'. The Model Registry in W&B uses artifacts as the storage backend. Checkpoint artifacts are linked to training runs and can be promoted to the registry for deployment tracking.
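
    A sketch of both use cases together, with hypothetical paths and artifact names (./data, ./checkpoints/model.pt, training-data, my-model):

        import wandb

        # Producer run: version the dataset and a checkpoint.
        with wandb.init(project="my-project", job_type="train") as run:
            dataset = wandb.Artifact("training-data", type="dataset")
            dataset.add_dir("./data")            # only changed files are uploaded
            run.log_artifact(dataset)

            checkpoint = wandb.Artifact("my-model", type="model")
            checkpoint.add_file("./checkpoints/model.pt")
            run.log_artifact(checkpoint)

        # Consumer run: declaring the version used records full lineage.
        with wandb.init(project="my-project", job_type="evaluate") as run:
            artifact = run.use_artifact("training-data:latest")
            data_dir = artifact.download()       # served from cache when unchanged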

    W&B Tables are interactive spreadsheets for visualising predictions, errors, and distributions. Log a Table with:

        table = wandb.Table(columns=["input", "prediction", "ground_truth", "confidence"])
        table.add_data(text, pred, label, conf)
        wandb.log({"predictions": table})

    In the W&B dashboard, filter, sort, and group table rows interactively — essential for error analysis: filtering by low confidence, examining mislabelled examples, or checking for systematic failure modes on specific input types.

    Frequently Asked Questions

    What is W&B and what problems does it solve?

    W&B is an MLOps platform for experiment tracking, dataset versioning, hyperparameter optimisation, and model evaluation. It solves a core problem: without systematic tracking, experiments cannot be reproduced or compared. It provides a hosted dashboard with auto-logged metrics, hyperparameters, GPU stats, and model outputs, and is preferred at AI-native companies and research teams for its richer visualisation and collaboration.

    What is a W&B Sweep and how does Bayesian hyperparameter search work?

    Sweeps define a hyperparameter search space. Strategies: grid (exhaustive), random (sample from distributions), or Bayesian (builds a probabilistic model mapping hyperparameters to the target metric and suggests the configuration with the highest expected improvement). Bayesian search typically outperforms random search for expensive experiments. Configure as YAML or a Python dict, register with wandb.sweep(), and run agents with wandb.agent(sweep_id, train_fn).

    What are W&B Artifacts?

    Versioned storage for datasets, model checkpoints, preprocessing scripts, evaluation results. Content-addressed hashing with cross-version deduplication. Log: wandb.Artifact('dataset', type='dataset'), artifact.add_dir('./data'), run.log_artifact(artifact). Use: run.use_artifact('dataset:latest'), artifact.download('./data'). Full lineage tracking: dataset → model → predictions.

    How does W&B compare to MLflow for UK ML engineers?

    W&B: richer visualisation, better collaboration (shared dashboards, Reports), superior Sweeps; SaaS-first, so by default data leaves your infrastructure (self-hosting requires deploying W&B Server). MLflow: open-source, self-hostable, strong model registry with stage transitions, deeper Databricks/AWS integration. W&B is preferred at AI-native companies; MLflow at enterprises with data residency requirements. Knowing both is a significant advantage.

    What are W&B Reports and when should you use them?

    Reports are W&B's notebook-like documents for communicating experiment findings. Embed live charts from runs and sweeps directly in a Report, with auto-updating visualisations. Used for: ML project post-mortems, sharing A/B test results with stakeholders, documenting model evaluation for model cards, and team learning documents. Reports replace manually-exported screenshots in slides and make findings reproducible (clicking a chart navigates to the underlying run).

    Quick Facts

    Demand level: High
    Difficulty: Intermediate
    Time to proficiency: 1–3 weeks

    Key Features

    Experiment Tracking
    Sweeps
    Artifacts
    Reports
    Tables
    Model Registry
    Alerts