AWS SageMaker
ML Platform Skills Guide 2026
AWS SageMaker is the most widely used managed ML platform in the UK. This guide covers the components ML engineers actually use — Training Jobs, Endpoints, Pipelines, and the Feature Store — with practical guidance on cost optimisation and when to use each service.
SageMaker Components
- SageMaker Studio — Web-based IDE: notebooks, experiment tracking, model registry, pipeline visualisation, and debugging in one interface.
- Training Jobs — Managed compute for running training scripts. Supports PyTorch, TensorFlow, HuggingFace, and custom containers. Managed Spot Training for up to 70% cost reduction.
- Real-time Endpoints — Deploy models for synchronous inference with autoscaling, blue/green deployments, and A/B testing across model variants.
- Batch Transform — Offline, async inference over large S3 datasets. Pay only while processing — no idle costs.
- Pipelines — DAG-based ML workflow orchestration. Define training, evaluation, and registration steps in Python; SageMaker handles scheduling and execution.
- Feature Store — Centralised feature storage with online (low-latency serving) and offline (training) stores. Feature groups with automatic time-travel for point-in-time correct features.
- Automatic Model Tuning — Bayesian optimisation over hyperparameter search spaces. Runs parallel training jobs and focuses search on promising regions (see the tuning sketch after this list).
- Model Registry — Version models, track lineage, manage approval workflows (Pending → Approved → Rejected), and trigger downstream deployment pipelines.
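A minimal tuning sketch using the SageMaker Python SDK. It assumes `estimator` is an already-configured SageMaker estimator (such as the PyTorch estimator shown later in this guide); the metric name, regex, ranges, and S3 URI are illustrative placeholders:

```python
from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

# Bayesian search over two hyperparameters. `estimator` is assumed to be
# an already-configured SageMaker estimator.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    # Regex that extracts the metric from the script's CloudWatch logs
    metric_definitions=[
        {"Name": "validation:loss", "Regex": "val_loss=([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "lr": ContinuousParameter(1e-5, 1e-2),
        "batch-size": IntegerParameter(16, 128),
    },
    strategy="Bayesian",     # focus the search on promising regions
    max_jobs=20,             # total training jobs across the search
    max_parallel_jobs=4,     # jobs to run concurrently
)
tuner.fit({"train": "s3://my-bucket/data/train"})  # placeholder URI
```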
Training Jobs in Depth
SageMaker Training Jobs decouple your training code from the compute infrastructure. You write a standard training script, and SageMaker handles provisioning, scaling, logging, and cleanup.
Script mode — The most common pattern. Provide your training script and specify a framework container (e.g. the PyTorch or HuggingFace estimator). SageMaker downloads each input channel from S3 into the container and exposes the local paths through environment variables (SM_CHANNEL_TRAIN, SM_CHANNEL_VALIDATION). Anything your script writes to SM_MODEL_DIR is automatically uploaded to S3 on completion.
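A minimal script-mode launch sketch; the role ARN, S3 URIs, and framework versions are placeholders, and `train.py` is assumed to read the SM_* paths via `os.environ`:

```python
from sagemaker.pytorch import PyTorch

# Script mode: your train.py plus a framework container. Role ARN,
# versions, and S3 URIs below are placeholders.
estimator = PyTorch(
    entry_point="train.py",   # reads SM_CHANNEL_* / SM_MODEL_DIR internally
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.g5.xlarge",
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
    hyperparameters={"epochs": 10},  # passed to train.py as CLI arguments
)

# Each channel is downloaded into the container and surfaced as
# SM_CHANNEL_TRAIN / SM_CHANNEL_VALIDATION.
estimator.fit({
    "train": "s3://my-bucket/data/train",
    "validation": "s3://my-bucket/data/validation",
})
```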
Managed Spot Training — Use EC2 Spot Instances for training jobs where interruptions are acceptable. SageMaker syncs your checkpoint directory to S3 and restarts the job from the last checkpoint after an interruption; your script must implement checkpoint saving and resuming. Savings of 60–70% over On-Demand for GPU instances.
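A Spot configuration sketch (same placeholder role and bucket as above); the script itself must write checkpoints to /opt/ml/checkpoints and resume from any it finds there:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_type="ml.g5.xlarge",
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
    use_spot_instances=True,   # run on Spot capacity
    max_run=3600,              # cap on billable training seconds
    max_wait=7200,             # total wait incl. interruptions (>= max_run)
    # Local /opt/ml/checkpoints is synced here; on restart SageMaker
    # restores it so the script can resume from the latest checkpoint.
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",
)
```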
Distributed training — SageMaker provides its own distributed training libraries (SageMaker Distributed Data Parallel / Model Parallel). For most workloads, using standard PyTorch DDP or FSDP inside a SageMaker Training Job is simpler and equally effective. The SageMaker SDK sets up the multi-node environment when you specify instance_count > 1 together with a distribution strategy.
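A sketch of the native-DDP route, assuming a recent PyTorch container that supports the `pytorchddp` launcher; your script keeps standard torch.distributed / DistributedDataParallel code:

```python
from sagemaker.pytorch import PyTorch

# Two-node data-parallel job. train_ddp.py uses ordinary
# torch.distributed + DistributedDataParallel; SageMaker launches the
# process group across both instances.
estimator = PyTorch(
    entry_point="train_ddp.py",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_type="ml.g5.12xlarge",
    instance_count=2,          # >1 makes the job multi-node
    framework_version="2.1",
    py_version="py310",
    distribution={"pytorchddp": {"enabled": True}},  # native DDP launcher
)
```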
Debugging and profiling — SageMaker Debugger can capture tensors, gradients, and activations during training without modifying your script. SageMaker Profiler identifies compute bottlenecks (CPU vs GPU utilisation, data loading wait times) during training.
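A sketch enabling the profiler and one built-in Debugger rule on a training job (placeholder role as before):

```python
from sagemaker.debugger import ProfilerConfig, Rule, rule_configs
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_type="ml.g5.xlarge",
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
    # Sample CPU/GPU utilisation and I/O every 500 ms, no script changes
    profiler_config=ProfilerConfig(system_monitor_interval_millis=500),
    # Built-in rule that inspects captured gradients during training
    rules=[Rule.sagemaker(rule_configs.vanishing_gradient())],
)
```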
Deployment: Endpoints and Inference Options
SageMaker offers three inference deployment options for different latency and cost requirements; a configuration sketch for all three follows the list:
- Real-time endpoints — Persistent endpoint with a managed load balancer in front of one or more instances. Application Auto Scaling adjusts instance count based on the SageMakerVariantInvocationsPerInstance CloudWatch metric. Blue/green deployment with traffic shifting enables zero-downtime model updates. Multi-model endpoints can serve hundreds of models on one instance by loading/unloading models from S3 on demand.
- Async inference — Requests are placed on an internal queue; SageMaker processes them and writes outputs to S3. The caller receives the output's S3 URL immediately; the result is written when processing completes. Instance count can scale to zero when there are no queued requests (unlike real-time endpoints). Best for: large model inference (>60-second processing, beyond the real-time invocation timeout), batch-like workloads that come in bursts.
- Serverless inference — No instance management; SageMaker allocates resources per request. Scales to zero; cold start latency of 1–10 seconds. Best for infrequent, unpredictable traffic where you cannot justify a persistent instance. Memory sizes: 1024 MB to 6144 MB.
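A deployment sketch for all three options, assuming `model` is a `sagemaker.model.Model` produced by a completed training job; endpoint names and buckets are placeholders:

```python
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.serverless import ServerlessInferenceConfig

# 1. Real-time: persistent instances behind a managed endpoint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-realtime-endpoint",  # placeholder name
)

# 2. Async: requests are queued, results land in S3, scales to zero.
async_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-results/",  # placeholder bucket
    ),
)

# 3. Serverless: no instances to manage; cold starts in exchange for
# scale-to-zero. Memory is 1024-6144 MB.
serverless_predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,
        max_concurrency=5,
    ),
)
```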
Frequently Asked Questions
What are the main SageMaker components ML engineers use?
Training Jobs (managed compute for training scripts), Endpoints (real-time inference), Batch Transform (offline inference over S3 data), Pipelines (ML workflow orchestration), Model Registry (version lifecycle management), and Feature Store (centralised feature storage). SageMaker Studio is the IDE for managing all of these.
How do SageMaker Training Jobs work?
SageMaker provisions a managed EC2 instance, pulls a container from ECR, downloads training data from S3, runs your script, and uploads model artifacts back to S3. You pay only for training job duration. Use built-in framework containers (PyTorch, HuggingFace) or a custom container.
When should you use real-time endpoint vs batch transform?
Real-time: synchronous, user-facing applications (chatbots, recommendation APIs). You pay for the instance continuously — use autoscaling. Batch Transform: offline inference over large S3 datasets; provisions instances only for the duration of the job (see the sketch below). Async inference: queue-based for long-running requests (beyond the 60-second real-time invocation timeout), with results written to S3.
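A Batch Transform sketch, again assuming `model` from a completed training job; the S3 prefixes are placeholders:

```python
# Offline inference over an S3 prefix; instances exist only for the job.
transformer = model.transformer(   # `model` from a completed training job
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # placeholder bucket
)
transformer.transform(
    data="s3://my-bucket/batch-input/",  # placeholder input prefix
    content_type="text/csv",
    split_type="Line",   # treat each line as one record
)
transformer.wait()       # block until done; instances are then released
```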
How does SageMaker compare to GCP Vertex AI and Azure ML?
SageMaker: most mature, broadest AWS integration, most widely used in UK (especially finance/retail). Vertex AI: cleaner unified API, better integrated MLOps tooling. Azure ML: strong Microsoft ecosystem integration, dominant in enterprises with Microsoft agreements. SageMaker is most broadly applicable for UK jobs.
How do you reduce SageMaker training costs?
Use Spot Instances (Managed Spot Training) for up to 70% savings — SageMaker restarts interrupted jobs from the last checkpoint, provided your script saves checkpoints. Right-size instance types — start with a smaller instance to verify your script works, then scale up. Stream large datasets from S3 with FastFile or Pipe input mode instead of downloading everything before training starts. Set max_run time limits on training jobs to prevent runaway costs.