Deploy Models to SageMaker Endpoint

Deploy your fine-tuned model to a SageMaker real-time endpoint for low-latency inference.

Deploy from an S3 Checkpoint

Manually specify the S3 prefix when you have a raw checkpoint path (e.g., from an escrow bucket). This gives you explicit control over the inference image URI and environment variables.

from sagemaker.serve.model_builder import ModelBuilder

model_builder = ModelBuilder(
    s3_model_data_url={
        "S3DataSource": {
            "S3Uri": "s3://customer-escrow-.../checkpoints/step_10/",
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    },
    image_uri="<your-inference-image-uri>",
    instance_type="ml.g5.48xlarge",
    role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
    env_vars={
        "CONTEXT_LENGTH": "8192",
        "MAX_CONCURRENCY": "4",
    },
)

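# Name the SageMaker model before building it.
model_builder.model_name = "my-finetuned-model"
model = model_builder.build()
# wait=False returns without waiting for the endpoint to finish creating.
endpoint = model_builder.deploy(endpoint_name="my-endpoint", wait=False)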

When deploying Nova models, set CONTEXT_LENGTH and MAX_CONCURRENCY in env_vars to control the maximum input context window and the number of concurrent requests the endpoint accepts. Both values are validated at build time against the tier bounds defined for each (model, instance type) pair.
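The build-time validation described above can be approximated locally to catch obvious mistakes before calling build(). The following is a minimal sketch, not the SDK's own validator: TierBounds, EXAMPLE_BOUNDS, check_env_vars, and the "example-model" identifier are hypothetical names, and the limits shown are placeholders rather than the real per-tier bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierBounds:
    """Hypothetical upper limits for one (model, instance type) pair."""
    max_context_length: int
    max_concurrency: int


# Placeholder values for illustration only; these are NOT the real SageMaker limits.
EXAMPLE_BOUNDS = {
    ("example-model", "ml.g5.48xlarge"): TierBounds(
        max_context_length=8192, max_concurrency=4
    ),
}


def check_env_vars(model_id, instance_type, env_vars, bounds=EXAMPLE_BOUNDS):
    """Return a list of problems; an empty list means the values pass."""
    tier = bounds.get((model_id, instance_type))
    if tier is None:
        return [f"no bounds known for {model_id} on {instance_type}"]

    limits = {
        "CONTEXT_LENGTH": tier.max_context_length,
        "MAX_CONCURRENCY": tier.max_concurrency,
    }
    problems = []
    for name, limit in limits.items():
        raw = env_vars.get(name)
        if raw is None:
            problems.append(f"{name} is not set")
            continue
        # env_vars values are strings, so parse them before comparing.
        try:
            value = int(raw)
        except (TypeError, ValueError):
            problems.append(f"{name}={raw!r} is not an integer")
            continue
        if not 1 <= value <= limit:
            problems.append(f"{name}={value} is outside 1..{limit}")
    return problems
```

Calling check_env_vars with the same dictionary that is passed to ModelBuilder as env_vars returns human-readable messages instead of raising, so all problems can be reported at once.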

Deploy from a TrainingJob

Pass a TrainingJob object directly; the SDK extracts the S3 model path automatically.

from sagemaker.core.resources import TrainingJob
from sagemaker.serve import ModelBuilder

training_job = TrainingJob.get(training_job_name="my-sft-job")

model_builder = ModelBuilder(model=training_job)
model = model_builder.build(model_name="my-finetuned-model")
endpoint = model_builder.deploy(endpoint_name="my-endpoint")
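To inspect the artifact location a training job reports before handing the job to ModelBuilder, the DescribeTrainingJob response can be read directly. The sketch below assumes a response dictionary shaped like the one returned by the boto3 SageMaker client; model_artifacts_uri is a hypothetical helper, and whether ModelBuilder reads exactly this field for every model type is an assumption.

```python
def model_artifacts_uri(describe_response):
    """Return the S3 artifact URI from a DescribeTrainingJob response dict.

    Raises ValueError if the job has not completed, because artifacts are
    only reliable once the job status is "Completed".
    """
    status = describe_response.get("TrainingJobStatus")
    if status != "Completed":
        raise ValueError(f"training job is {status!r}, not 'Completed'")
    return describe_response["ModelArtifacts"]["S3ModelArtifacts"]


if __name__ == "__main__":
    # Requires boto3 and AWS credentials; the job name matches the example above.
    import boto3

    sm_client = boto3.client("sagemaker")
    response = sm_client.describe_training_job(TrainingJobName="my-sft-job")
    print(model_artifacts_uri(response))
```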

Deploy from a ModelPackage

Pass a versioned ModelPackage from the SageMaker Model Registry for governed, production deployments.

from sagemaker.core.resources import ModelPackage
from sagemaker.serve import ModelBuilder

model_package = ModelPackage.get(
    model_package_name="arn:aws:sagemaker:us-east-1:123456789012:model-package/my-models/3"
)

model_builder = ModelBuilder(model=model_package)
model = model_builder.build(model_name="my-registered-model")
endpoint = model_builder.deploy(endpoint_name="my-endpoint")
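When deploy is called with wait=False, as in the S3 checkpoint example, the endpoint may still be creating when the call returns. The following sketch polls the endpoint status and then sends one request, using the boto3 describe_endpoint and invoke_endpoint calls. The helper names wait_until_in_service and invoke_json are hypothetical, and the JSON request and response format is an assumption: the actual payload schema depends on the inference image.

```python
import json
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30,
                          timeout_seconds=3600, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if creation fails and TimeoutError if the
    deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return desc
        if status == "Failed":
            reason = desc.get("FailureReason", "no reason given")
            raise RuntimeError(f"endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint {endpoint_name} still {status}")
        sleep(poll_seconds)


def invoke_json(runtime_client, endpoint_name, payload):
    """Send a JSON payload and decode a JSON response (assumed format)."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Body"].read())


if __name__ == "__main__":
    # Requires boto3 and AWS credentials; the payload shape is illustrative only.
    import boto3

    wait_until_in_service(boto3.client("sagemaker"), "my-endpoint")
    runtime = boto3.client("sagemaker-runtime")
    print(invoke_json(runtime, "my-endpoint", {"inputs": "Hello"}))
```

Passing the clients in as arguments keeps both helpers easy to exercise with stand-in objects, without touching a live endpoint.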