Model

class sagemaker.model.Model(image_uri=None, model_data=None, role=None, predictor_cls=None, env=None, name=None, vpc_config=None, sagemaker_session=None, enable_network_isolation=None, model_kms_key=None, image_config=None, source_dir=None, code_location=None, entry_point=None, container_log_level=20, dependencies=None, git_config=None, resources=None, additional_model_data_sources=None, model_reference_arn=None)

Bases: ModelBase, InferenceRecommenderMixin

A SageMaker Model that can be deployed to an Endpoint.

Initialize a SageMaker Model.

Parameters
  • image_uri (str or PipelineVariable) – A Docker image URI.

  • model_data (str or PipelineVariable or dict) – Location of SageMaker model data (default: None).

  • role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access some AWS resources. It can be null if this is being used to create a Model to pass to a PipelineModel which has its own Role field. (default: None)

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If not None, deploy will return the result of invoking this function on the created endpoint name.

  • env (dict[str, str] or dict[str, PipelineVariable]) – Environment variables to run with image_uri when hosted in SageMaker (default: None).

  • name (str) – The model name. If None, a default model name will be selected on each deploy.

  • vpc_config (dict[str, list[str]] or dict[str, list[PipelineVariable]]) – The VpcConfig set on the model (default: None).

    • ‘Subnets’ (list[str]): List of subnet ids.

    • ‘SecurityGroupIds’ (list[str]): List of security group ids.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.

  • enable_network_isolation (Boolean or PipelineVariable) – Default: False. If True, enables network isolation in the endpoint, isolating the model container. No inbound or outbound network calls can be made to or from the model container.

  • model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked.

  • image_config (dict[str, str] or dict[str, PipelineVariable]) – Specifies whether the model container image is pulled from ECR or from a private registry in your VPC. By default, the image is pulled from ECR (default: None).

  • source_dir (str) –

    The absolute, relative, or S3 URI Path to a directory with any other training source code dependencies aside from the entry point file (default: None). If source_dir is an S3 URI, it must point to a tar.gz file. Structure within this directory is preserved when training on Amazon SageMaker. If ‘git_config’ is provided, ‘source_dir’ should be a relative location to a directory in the Git repo. If the directory points to S3, no code is uploaded and the S3 location is used instead.

    Example

    With the following GitHub repo directory structure:

    >>> |----- README.md
    >>> |----- src
    >>>         |----- inference.py
    >>>         |----- test.py
    

    You can assign entry_point='inference.py', source_dir='src'.

  • code_location (str) – Name of the S3 bucket where custom code is uploaded (default: None). If not specified, the default bucket created by sagemaker.session.Session is used.

  • entry_point (str) –

    The absolute or relative path to the local Python source file that should be executed as the entry point to model hosting. (Default: None). If source_dir is specified, then entry_point must point to a file located at the root of source_dir. If ‘git_config’ is provided, ‘entry_point’ should be a relative location to the Python source file in the Git repo.

    Example

    With the following GitHub repo directory structure:

    >>> |----- README.md
    >>> |----- src
    >>>         |----- inference.py
    >>>         |----- test.py
    

    You can assign entry_point='src/inference.py'.

  • container_log_level (int or PipelineVariable) – Log level to use within the container (default: logging.INFO). Valid values are defined in the Python logging module.

  • dependencies (list[str]) –

    A list of absolute or relative paths to directories with any additional libraries that should be exported to the container (default: []). The library folders are copied to SageMaker in the same folder where the entrypoint is copied. If ‘git_config’ is provided, ‘dependencies’ should be a list of relative locations to directories with any additional libraries needed in the Git repo. If the `source_dir` points to S3, code will be uploaded and the S3 location will be used instead.

    Example

    The following call

    >>> Model(entry_point='inference.py',
    ...       dependencies=['my/libs/common', 'virtual-env'])
    

    results in the following structure inside the container:

    >>> $ ls
    
    >>> opt/ml/code
    >>>     |------ inference.py
    >>>     |------ common
    >>>     |------ virtual-env
    

    This is not supported with “local code” in Local Mode.

  • git_config (dict[str, str]) –

    Git configurations used for cloning files, including repo, branch, commit, 2FA_enabled, username, password and token. The repo field is required. All other fields are optional. repo specifies the Git repository where your training script is stored. If you don’t provide branch, the default value ‘master’ is used. If you don’t provide commit, the latest commit in the specified branch is used.

    Example

    The following config:

    >>> git_config = {'repo': 'https://github.com/aws/sagemaker-python-sdk.git',
    >>>               'branch': 'test-branch-git-config',
    >>>               'commit': '329bfcf884482002c05ff7f44f62599ebc9f445a'}
    

    results in cloning the repo specified in ‘repo’, then checking out the ‘test-branch-git-config’ branch, and checking out the specified commit.

    2FA_enabled, username, password and token are used for authentication. For GitHub (or other Git) accounts, set 2FA_enabled to ‘True’ if two-factor authentication is enabled for the account, otherwise set it to ‘False’. If you do not provide a value for 2FA_enabled, a default value of ‘False’ is used. CodeCommit does not support two-factor authentication, so do not provide “2FA_enabled” with CodeCommit repositories.

    For GitHub and other Git repos, when SSH URLs are provided, it doesn’t matter whether 2FA is enabled or disabled. You should either have no passphrase for the SSH key pairs or have the ssh-agent configured so that you will not be prompted for the SSH passphrase when you run the ‘git clone’ command with SSH URLs. When HTTPS URLs are provided and 2FA is disabled, either token or username and password are used for authentication if provided. Token is prioritized. If 2FA is enabled, only token is used for authentication if provided. If the required authentication info is not provided, the SageMaker Python SDK attempts to use local credentials to authenticate. If that fails, an error is raised.

    For CodeCommit repos, 2FA is not supported, so 2FA_enabled should not be provided. There is no token in CodeCommit, so token should also not be provided. When repo is an SSH URL, the requirements are the same as GitHub repos. When repo is an HTTPS URL, username and password are used for authentication if they are provided. If they are not provided, the SageMaker Python SDK attempts to use either the CodeCommit credential helper or local credential storage for authentication.

  • resources (Optional[ResourceRequirements]) – The compute resource requirements for a model to be deployed to an endpoint. Only EndpointType.INFERENCE_COMPONENT_BASED supports this feature. (Default: None).

  • additional_model_data_sources (Optional[Dict[str, Any]]) – Additional location of SageMaker model data (default: None).

  • model_reference_arn (Optional[str]) – Hub Content Arn of a Model Reference type content (default: None).
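
The HTTPS authentication precedence described for git_config on GitHub-style repos can be summarized in a short sketch. The helper name choose_https_auth and its return labels are hypothetical and not part of the SageMaker Python SDK, and the sketch assumes 2FA_enabled is given as a Python bool; it only mirrors the rules stated above.

```python
def choose_https_auth(git_config):
    """Return which credential an HTTPS GitHub-style clone would use.

    Hypothetical helper (not an SDK function). It mirrors the documented
    precedence and returns one of 'token', 'username_password', or
    'local_credentials'.
    """
    # Documented default for 2FA_enabled is False (assumed to be a bool here).
    two_fa = git_config.get("2FA_enabled", False)
    has_token = "token" in git_config
    has_userpass = "username" in git_config and "password" in git_config

    if has_token:
        # Token is prioritized, and it is the only option when 2FA is enabled.
        return "token"
    if not two_fa and has_userpass:
        return "username_password"
    # No usable credential was provided: fall back to local credentials.
    return "local_credentials"


config = {
    "repo": "https://github.com/aws/sagemaker-python-sdk.git",
    "username": "example-user",      # placeholder value
    "password": "example-password",  # placeholder value
}
print(choose_https_auth(config))
```

With 2FA enabled and only a username and password supplied, the sketch falls through to local credentials, matching the rule that only a token is used in that case.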

add_tags(tags)

Add tags to this Model.

Parameters

tags (Tags) – Tags to add.

Return type

None

remove_tag_with_key(key)

Remove a tag with the given key from the list of tags.

Parameters

key (str) – The key of the tag to remove.

Return type

None
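
As a rough illustration of what removing a tag by key means for a list of tags, here is a minimal sketch. It assumes tags are stored as a list of {'Key': ..., 'Value': ...} dictionaries; the function name drop_tag is hypothetical, and unlike the method above it returns a new list instead of modifying the Model.

```python
def drop_tag(tags, key):
    """Return a copy of the tag list without entries whose 'Key' equals key.

    Hypothetical helper for illustration only; not the SDK implementation.
    """
    return [tag for tag in tags if tag["Key"] != key]


tags = [
    {"Key": "team", "Value": "search"},
    {"Key": "stage", "Value": "dev"},
]
remaining = drop_tag(tags, "stage")
print(remaining)
```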

classmethod attach(endpoint_name, inference_component_name=None, sagemaker_session=None)

Attaches a Model object to an existing SageMaker Endpoint.

Parameters
  • endpoint_name (str) –

  • inference_component_name (Optional[str]) –

Return type

Model

register(content_types=None, response_types=None, inference_instances=None, transform_instances=None, model_package_name=None, model_package_group_name=None, image_uri=None, model_metrics=None, metadata_properties=None, marketplace_cert=False, approval_status=None, description=None, drift_check_baselines=None, customer_metadata_properties=None, validation_specification=None, domain=None, task=None, sample_payload_url=None, framework=None, framework_version=None, nearest_model_name=None, data_input_configuration=None, skip_model_validation=None, source_uri=None, model_card=None, accept_eula=None, model_type=None)

Creates a model package for creating SageMaker models or listing on Marketplace.

Parameters
  • content_types (list[str] or list[PipelineVariable]) – The supported MIME types for the input data.

  • response_types (list[str] or list[PipelineVariable]) – The supported MIME types for the output data.

  • inference_instances (list[str] or list[PipelineVariable]) – A list of the instance types that are used to generate inferences in real-time (default: None).

  • transform_instances (list[str] or list[PipelineVariable]) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).

  • model_package_name (str or PipelineVariable) – Model Package name. Mutually exclusive with model_package_group_name. Using model_package_name makes the Model Package unversioned (default: None).

  • model_package_group_name (str or PipelineVariable) – Model Package Group name. Mutually exclusive with model_package_name. Using model_package_group_name makes the Model Package versioned (default: None).

  • image_uri (str or PipelineVariable) – Inference image uri for the container. Model class’ self.image will be used if it is None (default: None).

  • model_metrics (ModelMetrics) – ModelMetrics object (default: None).

  • metadata_properties (MetadataProperties) – MetadataProperties object (default: None).

  • marketplace_cert (bool) – A boolean value indicating if the Model Package is certified for AWS Marketplace (default: False).

  • approval_status (str or PipelineVariable) – Model Approval Status, values can be “Approved”, “Rejected”, or “PendingManualApproval” (default: “PendingManualApproval”).

  • description (str) – Model Package description (default: None).

  • drift_check_baselines (DriftCheckBaselines) – DriftCheckBaselines object (default: None).

  • customer_metadata_properties (dict[str, str] or dict[str, PipelineVariable]) – A dictionary of key-value paired metadata properties (default: None).

  • domain (str or PipelineVariable) – Domain values can be “COMPUTER_VISION”, “NATURAL_LANGUAGE_PROCESSING”, “MACHINE_LEARNING” (default: None).

  • task (str or PipelineVariable) – Task values which are supported by Inference Recommender are “FILL_MASK”, “IMAGE_CLASSIFICATION”, “OBJECT_DETECTION”, “TEXT_GENERATION”, “IMAGE_SEGMENTATION”, “CLASSIFICATION”, “REGRESSION”, “OTHER” (default: None).

  • sample_payload_url (str or PipelineVariable) – The S3 path where the sample payload is stored (default: None).

  • framework (str or PipelineVariable) – Machine learning framework of the model package container image (default: None).

  • framework_version (str or PipelineVariable) – Framework version of the Model Package Container Image (default: None).

  • nearest_model_name (str or PipelineVariable) – Name of a pre-trained machine learning model benchmarked by Amazon SageMaker Inference Recommender (default: None).

  • data_input_configuration (str or PipelineVariable) – Input object for the model (default: None).

  • skip_model_validation (str or PipelineVariable) – Indicates if you want to skip model validation. Values can be “All” or “None” (default: None).

  • source_uri (str or PipelineVariable) – The URI of the source for the model package (default: None).

  • model_card (ModelCard or ModelPackageModelCard) – A document that contains qualitative and quantitative information about a model (default: None).

  • validation_specification (Optional[Union[str, PipelineVariable]]) –

  • accept_eula (Optional[bool]) –

  • model_type (Optional[JumpStartModelType]) –

Returns

A sagemaker.model.ModelPackage instance or pipeline step arguments in case the Model instance is built with PipelineSession
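
The relationship between model_package_name and model_package_group_name can be sketched as a small check. The function package_kind is hypothetical and only encodes the rule described in the parameter list (the two names are mutually exclusive, and the group name implies a versioned package); whether and how the SDK reports the conflict is an assumption here.

```python
def package_kind(model_package_name=None, model_package_group_name=None):
    """Classify a register() call as 'versioned' or 'unversioned'.

    Hypothetical helper for illustration only; not an SDK function.
    """
    if model_package_name is not None and model_package_group_name is not None:
        # The two parameters are documented as mutually exclusive.
        raise ValueError(
            "model_package_name and model_package_group_name are mutually exclusive"
        )
    if model_package_group_name is not None:
        return "versioned"
    if model_package_name is not None:
        return "unversioned"
    return None  # neither name was given


print(package_kind(model_package_group_name="my-model-group"))  # placeholder name
```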

create(instance_type=None, accelerator_type=None, serverless_inference_config=None, tags=None, accept_eula=None, model_reference_arn=None)

Create a SageMaker Model Entity.

Parameters
  • instance_type (str) – The EC2 instance type that this Model will be used for. This is only used to determine whether the image needs GPU support (default: None).

  • accelerator_type (str) – Type of Elastic Inference accelerator to attach to an endpoint for model loading and inference, for example, ‘ml.eia1.medium’. If not specified, no Elastic Inference accelerator will be attached to the endpoint (default: None).

  • serverless_inference_config (ServerlessInferenceConfig) – Specifies configuration related to a serverless endpoint. Because no instance type is provided in serverless inference, this is used to find image URIs (default: None).

  • tags (Optional[Tags]) –

    Tags to add to the model (default: None). Example:

    tags = [{'Key': 'tagname', 'Value':'tagvalue'}]
    # Or
    tags = {'tagname': 'tagvalue'}
    

    For more information about tags, see the boto3 documentation.

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

  • model_reference_arn (Optional[str]) –

Returns

None or pipeline step arguments in case the Model instance is built with PipelineSession
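
The two tag shapes accepted by the tags parameter (a list of Key/Value dictionaries, or a plain dictionary) carry the same information. The following sketch shows one way to normalize both to the list form; the helper name to_tag_list is hypothetical and is not claimed to be how the SDK converts tags internally.

```python
def to_tag_list(tags):
    """Normalize either tag shape to a list of {'Key': ..., 'Value': ...} dicts.

    Hypothetical helper for illustration only.
    """
    if tags is None:
        return []
    if isinstance(tags, dict):
        # Plain mapping form: {'tagname': 'tagvalue'}
        return [{"Key": key, "Value": value} for key, value in tags.items()]
    # Already in list form: [{'Key': 'tagname', 'Value': 'tagvalue'}]
    return list(tags)


as_list = to_tag_list([{"Key": "tagname", "Value": "tagvalue"}])
as_dict = to_tag_list({"tagname": "tagvalue"})
print(as_list == as_dict)
```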

prepare_container_def(instance_type=None, accelerator_type=None, serverless_inference_config=None, accept_eula=None, model_reference_arn=None)

Return a dict created by sagemaker.container_def().

It is used for deploying this model to a specified instance type.

Subclasses can override this to provide custom container definitions for deployment to a specific instance type. Called by deploy().

Parameters
  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.

  • accelerator_type (str) – The Elastic Inference accelerator type to deploy to the instance for loading and making inferences to the model. For example, ‘ml.eia1.medium’.

  • serverless_inference_config (sagemaker.serverless.ServerlessInferenceConfig) – Specifies configuration related to a serverless endpoint. Because no instance type is provided in serverless inference, this is used to find image URIs.

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

Returns

A container definition object usable with the CreateModel API.

Return type

dict
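
For orientation, a container definition passed to the CreateModel API is a dictionary with fields such as Image, ModelDataUrl, and Environment. The sketch below builds such a dictionary by hand; sketch_container_def is a hypothetical stand-in, the image URI and S3 path are placeholders, and the exact set of keys returned by sagemaker.container_def() may differ.

```python
def sketch_container_def(image_uri, model_data_url=None, env=None):
    """Build a minimal CreateModel-style container definition.

    Hypothetical stand-in for illustration; not the SDK's container_def().
    """
    container = {
        "Image": image_uri,
        "Environment": dict(env or {}),  # copy so the caller's dict is untouched
    }
    if model_data_url is not None:
        container["ModelDataUrl"] = model_data_url
    return container


container = sketch_container_def(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",  # placeholder
    model_data_url="s3://my-bucket/model.tar.gz",                    # placeholder
    env={"SAGEMAKER_CONTAINER_LOG_LEVEL": "20"},
)
print(sorted(container))
```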

is_repack()

Whether the source code needs to be repacked before uploading to S3.

Returns

Whether the source needs to be repacked.

Return type

bool

enable_network_isolation()

Whether to enable network isolation when creating this Model.

Returns

If network isolation should be enabled or not.

Return type

bool

package_for_edge(output_path, model_name, model_version, role=None, job_name=None, resource_key=None, s3_kms_key=None, tags=None)

Package this Model with SageMaker Edge.

Creates a new EdgePackagingJob and waits for it to finish. model_data will now point to the packaged artifacts.

Parameters
  • output_path (str) – Specifies where to store the packaged model

  • role (str) – Execution role

  • model_name (str) – the name to attach to the model metadata

  • model_version (str) – the version to attach to the model metadata

  • job_name (str) – The name of the edge packaging job

  • resource_key (str) – the kms key to encrypt the disk with

  • s3_kms_key (str) – the kms key to encrypt the output with

  • tags (Optional[Tags]) – Tags for labeling an edge packaging job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.

Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

compile(target_instance_family, input_shape, output_path, role=None, tags=None, job_name=None, compile_max_run=900, framework=None, framework_version=None, target_platform_os=None, target_platform_arch=None, target_platform_accelerator=None, compiler_options=None)

Compile this Model with SageMaker Neo.

Parameters
Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

deploy(initial_instance_count=None, instance_type=None, serializer=None, deserializer=None, accelerator_type=None, endpoint_name=None, tags=None, kms_key=None, wait=True, data_capture_config=None, async_inference_config=None, serverless_inference_config=None, volume_size=None, model_data_download_timeout=None, container_startup_health_check_timeout=None, inference_recommendation_id=None, explainer_config=None, accept_eula=None, endpoint_logging=False, resources=None, endpoint_type=EndpointType.MODEL_BASED, managed_instance_scaling=None, inference_component_name=None, routing_config=None, model_reference_arn=None, **kwargs)

Deploy this Model to an Endpoint and optionally return a Predictor.

Create a SageMaker Model and EndpointConfig, and deploy an Endpoint from this Model. If self.predictor_cls is not None, this method returns the result of invoking self.predictor_cls on the created endpoint name.

The name of the created model is accessible in the name field of this Model after deploy returns.

The name of the created endpoint is accessible in the endpoint_name field of this Model after deploy returns.
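
The predictor_cls behavior described above amounts to a simple call pattern, sketched here without any AWS calls. EchoPredictor and finish_deploy are hypothetical names used only for illustration; a real predictor_cls would wrap endpoint invocation, and the (endpoint name, session) argument order is taken from the documented callable[string, sagemaker.session.Session] type.

```python
class EchoPredictor:
    """Stand-in predictor that only records what it was constructed with."""

    def __init__(self, endpoint_name, sagemaker_session=None):
        self.endpoint_name = endpoint_name
        self.sagemaker_session = sagemaker_session


def finish_deploy(endpoint_name, predictor_cls=None, sagemaker_session=None):
    """Return predictor_cls(endpoint_name, session), or None if no class is set.

    Hypothetical helper mirroring the documented return behavior of deploy().
    """
    if predictor_cls is None:
        return None
    return predictor_cls(endpoint_name, sagemaker_session)


predictor = finish_deploy("my-endpoint", predictor_cls=EchoPredictor)  # placeholder name
print(predictor.endpoint_name)
print(finish_deploy("my-endpoint"))
```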

Parameters
  • initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model. If not using serverless inference or the model has not called right_size(), then it needs to be a number greater than or equal to 1 (default: None).

  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’, or ‘local’ for local mode. If not using serverless inference or the model has not called right_size(), then it is required to deploy a model. (default: None).

  • serializer (BaseSerializer) – A serializer object, used to encode data for an inference endpoint (default: None). If serializer is not None, then serializer will override the default serializer. The default serializer is set by the predictor_cls.

  • deserializer (BaseDeserializer) – A deserializer object, used to decode data from an inference endpoint (default: None). If deserializer is not None, then deserializer will override the default deserializer. The default deserializer is set by the predictor_cls.

  • accelerator_type (str) – Type of Elastic Inference accelerator to deploy this model for model loading and inference, for example, ‘ml.eia1.medium’. If not specified, no Elastic Inference accelerator will be attached to the endpoint. For more information: https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html

  • endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.

  • tags (Optional[Tags]) – Tags to attach to this specific endpoint.

  • kms_key (str) – The ARN of the KMS key that is used to encrypt the data on the storage volume attached to the instance hosting the endpoint.

  • wait (bool) – Whether the call should wait until the deployment of this model completes (default: True).

  • data_capture_config (sagemaker.model_monitor.DataCaptureConfig) – Specifies configuration related to Endpoint data capture for use with Amazon SageMaker Model Monitoring. (Default: None).

  • async_inference_config (sagemaker.async_inference.AsyncInferenceConfig) – Specifies configuration related to an async endpoint. Use this configuration when creating an async endpoint and making async inferences. If an empty config object is passed, the default config is used to deploy the async endpoint. If None, a real-time endpoint is deployed (default: None).

  • serverless_inference_config (sagemaker.serverless.ServerlessInferenceConfig) – Specifies configuration related to a serverless endpoint. Use this configuration when creating a serverless endpoint and making serverless inferences. If an empty object is passed, the pre-defined values in the ServerlessInferenceConfig class are used to deploy the serverless endpoint. If None, an instance-based endpoint is deployed (default: None).

  • volume_size (int) – The size, in GB, of the ML storage volume attached to each inference instance associated with the production variant. Currently only Amazon EBS gp2 storage volumes are supported.

  • model_data_download_timeout (int) – The timeout value, in seconds, to download and extract model data from Amazon S3 to the individual inference instance associated with this production variant.

  • container_startup_health_check_timeout (int) – The timeout value, in seconds, for your inference container to pass health check by SageMaker Hosting. For more information about health check see: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests

  • inference_recommendation_id (str) – The recommendation id that identifies the recommendation you picked from the inference recommendation job results, so that the model and endpoint are deployed with the recommended parameters. This can also be a recommendation id returned from DescribeModel, contained in a list of RealtimeInferenceRecommendations within DeploymentRecommendation.

  • explainer_config (sagemaker.explainer.ExplainerConfig) – Specifies online explainability configuration for use with Amazon SageMaker Clarify. Default: None.

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

  • endpoint_logging (Optional[bool]) – If set to true, live logging will be emitted as the SageMaker Endpoint starts up. (Default: False).

  • resources (Optional[ResourceRequirements]) – The compute resource requirements for a model to be deployed to an endpoint. Only EndpointType.INFERENCE_COMPONENT_BASED supports this feature. (Default: None).

  • managed_instance_scaling (Optional[Dict]) – Managed instance scaling options. If configured, Amazon SageMaker manages the number of instances behind the Endpoint. (Default: None).

  • endpoint_type (Optional[EndpointType]) – The type of an endpoint used to deploy models. (Default: EndpointType.MODEL_BASED).

  • routing_config (Optional[Dict[str, Any]]) –

    Settings that control how the endpoint routes incoming traffic to the instances that the endpoint hosts. Currently, the only supported dictionary key is RoutingStrategy.

    {
        "RoutingStrategy":  sagemaker.enums.RoutingStrategy.RANDOM
    }
    

  • model_reference_arn (Optional[str]) – Hub Content Arn of a Model Reference type content (default: None).

Raises

ValueError – If the argument combination check fails in any of these circumstances:

  • No role is specified.

  • Serverless inference config is not specified and instance type and instance count are also not specified.

  • A wrong type of object is provided as serverless inference config or async inference config.

  • Inference recommendation id is specified along with incompatible parameters.

Returns

Invocation of self.predictor_cls on the created endpoint name, if self.predictor_cls is not None. Otherwise, return None.

Return type

callable[string, sagemaker.session.Session] or None
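
Two of the argument checks listed under Raises can be sketched as a standalone function. check_deploy_args is a hypothetical helper, not the SDK's validation code: it treats a missing instance type or a missing instance count as an error when no serverless config is given (an interpretation of the parameter descriptions), and it does not model the right_size(), config type, or inference recommendation cases.

```python
def check_deploy_args(role, initial_instance_count=None, instance_type=None,
                      serverless_inference_config=None):
    """Raise ValueError for argument combinations that deploy() would reject.

    Hypothetical, simplified sketch covering only the role check and the
    instance type/count check for non-serverless deployments.
    """
    if role is None:
        raise ValueError("A role is required to deploy a model.")
    if serverless_inference_config is None:
        # Instance-based deployment: both values must be provided.
        if instance_type is None or initial_instance_count is None:
            raise ValueError(
                "instance_type and initial_instance_count are required "
                "unless serverless_inference_config is given."
            )
        if initial_instance_count < 1:
            raise ValueError("initial_instance_count must be at least 1.")


# A valid instance-based combination passes silently.
check_deploy_args(
    role="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder ARN
    initial_instance_count=1,
    instance_type="ml.p2.xlarge",
)
```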

transformer(instance_count, instance_type, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, volume_kms_key=None)

Return a Transformer that uses this Model.

Parameters
  • instance_count (int) – Number of EC2 instances to use.

  • instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.

  • strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.

  • assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.

  • output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.

  • output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).

  • accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.

  • env (dict) – Environment variables to be set for use during the transform job (default: None).

  • max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.

  • max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.

  • tags (Optional[Tags]) – Tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.

  • volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
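
The valid values listed for strategy and assemble_with can be checked before a Transformer is requested. The sketch below is a hypothetical client-side check (check_transform_options is not an SDK function, and the SDK itself may not validate these values locally).

```python
VALID_STRATEGIES = ("MultiRecord", "SingleRecord")
VALID_ASSEMBLE_WITH = ("Line", "None")


def check_transform_options(strategy=None, assemble_with=None):
    """Validate the documented string options and return them as a dict.

    Hypothetical helper for illustration only. A Python None means the
    option was left unset; the string 'None' is a documented valid value.
    """
    if strategy is not None and strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}")
    if assemble_with is not None and assemble_with not in VALID_ASSEMBLE_WITH:
        raise ValueError(f"assemble_with must be one of {VALID_ASSEMBLE_WITH}")
    return {"strategy": strategy, "assemble_with": assemble_with}


options = check_transform_options(strategy="MultiRecord", assemble_with="Line")
print(options)
```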

tune(max_tuning_duration=1800)

Tune a Model built in Mode.LOCAL_CONTAINER via ModelBuilder.

tune() is available for DJL Models using Huggingface IDs. In this use case, Tensor Parallel Degree is our tunable parameter. The tuning job first generates all admissible Tensor Parallel Degrees and then benchmarks on 10 invocations serially followed by 10 invocations concurrently. It starts first at the highest admissible Tensor Parallel Degree and then scales down until failure.

Example

Sample flow:

>>> sample_input = {
>>>                    "inputs": "sample_prompt",
>>>                    "parameters": {}
>>>                }
>>> sample_output = {
>>>                     "generated_text": "sample_text_generation"
>>>                 }
>>>
>>> builder = ModelBuilder(
>>>                        model=model,
>>>                        schema_builder=SchemaBuilder(sample_input, sample_output),
>>>                        model_path=path_to_model,
>>>                        mode=Mode.LOCAL_CONTAINER,
>>>                       )
>>>
>>> model = builder.build()
>>> tuned_model = model.tune()
>>> tuned_model.deploy()

Parameters

max_tuning_duration (int) – The timeout, in seconds, for the Mode.LOCAL_CONTAINER tuning job. Defaults to 1800.
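
The scale-down search described for tune() can be pictured with a small loop. This is only a sketch under assumptions: scale_down_until_failure and the benchmark callable are hypothetical, the list of admissible degrees is supplied by the caller, and a failed benchmark is assumed to raise RuntimeError. It does not reproduce the serial and concurrent invocation benchmarking itself.

```python
def scale_down_until_failure(admissible_degrees, benchmark):
    """Benchmark degrees from highest to lowest, stopping at the first failure.

    Hypothetical sketch: benchmark(degree) returns a measurement or raises
    RuntimeError when the model cannot run at that degree.
    """
    results = {}
    for degree in sorted(admissible_degrees, reverse=True):
        try:
            results[degree] = benchmark(degree)
        except RuntimeError:
            break  # stop scaling down once a degree fails
    return results


def example_benchmark(degree):
    """Made-up benchmark that fails below a tensor parallel degree of 4."""
    if degree < 4:
        raise RuntimeError("not enough memory per device")
    return 100.0 / degree  # made-up latency figure


measurements = scale_down_until_failure([1, 2, 4, 8], example_benchmark)
print(list(measurements))
```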

delete_model()

Delete an Amazon SageMaker Model.

Raises

ValueError – if the model is not created yet.

right_size(sample_payload_url=None, supported_content_types=None, supported_instance_types=None, job_name=None, framework=None, job_duration_in_seconds=None, hyperparameter_ranges=None, phases=None, traffic_type=None, max_invocations=None, model_latency_thresholds=None, max_tests=None, max_parallel_tests=None, log_level='Verbose')

Recommends an instance type for a SageMaker or BYOC model.

Create a SageMaker Model or use a registered ModelPackage, to start an Inference Recommender job.

The name of the created model is accessible in the name field of this Model after right_size returns.

Parameters
  • sample_payload_url (str) – The S3 path where the sample payload is stored.

  • supported_content_types (Optional[List[str]]) – The supported MIME types for the input data.

  • supported_instance_types (list[str]) – A list of the instance types that this model is expected to work on. (default: None).

  • job_name (str) – The name of the Inference Recommendations Job. (default: None).

  • framework (str) – The machine learning framework of the Image URI. Only required to specify if you bring your own custom containers (default: None).

  • job_duration_in_seconds (int) – The maximum job duration that a job can run for. (default: None).

  • hyperparameter_ranges (list[Dict[str, sagemaker.parameter.CategoricalParameter]]) –

    Specifies the hyperparameters to be used during endpoint load tests. instance_types must be specified as a hyperparameter range. Environment variables can be specified as optional hyperparameter ranges. (default: None). Example:

    hyperparameter_ranges = [{
        'instance_types': CategoricalParameter(['ml.c5.xlarge', 'ml.c5.2xlarge']),
        'OMP_NUM_THREADS': CategoricalParameter(['1', '2', '3', '4'])
    }]
    

  • phases (list[Phase]) – Shape of the traffic pattern to use in the load test (default: None).

  • traffic_type (str) – Specifies the traffic pattern type. Currently only supports one type ‘PHASES’ (default: None).

  • max_invocations (str) – Defines the maximum number of invocations per minute that the endpoint is expected to support (default: None).

  • model_latency_thresholds (list[ModelLatencyThreshold]) – Defines the maximum response latency for endpoints to support (default: None).

  • max_tests (int) – Restricts how many endpoints in total are allowed to be spun up for this job (default: None).

  • max_parallel_tests (int) – Restricts how many concurrent endpoints this job is allowed to spin up (default: None).

  • log_level (str) – Specifies the inline output when waiting for right_size to complete (default: “Verbose”).

Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model
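
The shape of hyperparameter_ranges can be checked with a short sketch that separates the instance types from the environment variable ranges. Plain lists stand in for CategoricalParameter so the example runs without the SDK; split_ranges is a hypothetical helper, and requiring the 'instance_types' key follows the documented example.

```python
def split_ranges(hyperparameter_ranges):
    """Split each range dict into (instance types, environment variable ranges).

    Hypothetical helper for illustration; values are plain lists standing in
    for CategoricalParameter objects.
    """
    pairs = []
    for ranges in hyperparameter_ranges:
        if "instance_types" not in ranges:
            raise ValueError("each range needs an 'instance_types' entry")
        env_ranges = {k: v for k, v in ranges.items() if k != "instance_types"}
        pairs.append((list(ranges["instance_types"]), env_ranges))
    return pairs


pairs = split_ranges([{
    "instance_types": ["ml.c5.xlarge", "ml.c5.2xlarge"],
    "OMP_NUM_THREADS": ["1", "2", "3", "4"],
}])
print(pairs)
```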

class sagemaker.jumpstart.model.JumpStartModel(model_id=None, model_version=None, hub_name=None, tolerate_vulnerable_model=None, tolerate_deprecated_model=None, region=None, instance_type=None, image_uri=None, model_data=None, role=None, predictor_cls=None, env=None, name=None, vpc_config=None, sagemaker_session=None, enable_network_isolation=None, model_kms_key=None, image_config=None, source_dir=None, code_location=None, entry_point=None, container_log_level=None, dependencies=None, git_config=None, model_package_arn=None, resources=None, config_name=None, additional_model_data_sources=None)

Bases: Model

JumpStartModel class.

This class sets defaults based on the model ID and version.

Initializes a JumpStartModel.

This method sets model-specific defaults for the Model.__init__ method.

Only the model ID is required to instantiate this class; however, any field can be overridden.

Any field set to None does not get passed to the parent class method.
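
The effect of not forwarding None-valued fields can be seen in a toy example. Parent and Child below are hypothetical stand-ins, not the SDK classes; the sketch only illustrates why a field left as None in the subclass ends up with the parent's own default (for example, container_log_level=20 in Model versus None in JumpStartModel).

```python
class Parent:
    """Stand-in for a base class with its own defaults."""

    def __init__(self, container_log_level=20, name=None):
        self.container_log_level = container_log_level
        self.name = name


class Child(Parent):
    """Stand-in subclass that forwards only the fields that were set."""

    def __init__(self, container_log_level=None, name=None):
        candidate_kwargs = {
            "container_log_level": container_log_level,
            "name": name,
        }
        # Drop None values so the parent's defaults apply to unset fields.
        set_kwargs = {k: v for k, v in candidate_kwargs.items() if v is not None}
        super().__init__(**set_kwargs)


print(Child().container_log_level)
print(Child(container_log_level=10).container_log_level)
```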

Parameters
  • model_id (Optional[str]) – JumpStart model ID to use. See https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html for list of model IDs.

  • model_version (Optional[str]) – Version for JumpStart model to use (Default: None).

  • hub_name (Optional[str]) – Hub name or arn where the model is stored (Default: None).

  • tolerate_vulnerable_model (Optional[bool]) – True if vulnerable versions of model specifications should be tolerated (exception not raised). If False, raises an exception if the script used by this version of the model has dependencies with known security vulnerabilities. (Default: None).

  • tolerate_deprecated_model (Optional[bool]) – True if deprecated models should be tolerated (exception not raised). False if these models should raise an exception. (Default: None).

  • region (Optional[str]) – The AWS region in which to launch the model. (Default: None).

  • instance_type (Optional[str]) – The EC2 instance type to use when provisioning a hosting endpoint. (Default: None).

  • image_uri (Optional[Union[str, PipelineVariable]]) – A Docker image URI. (Default: None).

  • model_data (Optional[Union[str, PipelineVariable, dict]]) – Location of SageMaker model data. (Default: None).

  • role (Optional[str]) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access some AWS resources. It can be null if this is being used to create a Model to pass to a PipelineModel which has its own Role field. (Default: None).

  • predictor_cls (Optional[callable[string, sagemaker.session.Session]]) – A function to call to create a predictor. If not None, deploy will return the result of invoking this function on the created endpoint name. (Default: None).

  • env (Optional[dict[str, str] or dict[str, PipelineVariable]]) – Environment variables to run with image_uri when hosted in SageMaker. (Default: None).

  • name (Optional[str]) – The model name. If None, a default model name will be selected on each deploy. (Default: None).

  • vpc_config (Optional[Union[dict[str, list[str]],dict[str, list[PipelineVariable]]]]) – The VpcConfig set on the model. Supported keys: ‘Subnets’ (list[str]), a list of subnet ids, and ‘SecurityGroupIds’ (list[str]), a list of security group ids. (Default: None).

  • sagemaker_session (Optional[sagemaker.session.Session]) – A SageMaker Session object, used for SageMaker interactions. If not specified, one is created using the default AWS configuration chain. (Default: None).

  • enable_network_isolation (Optional[Union[bool, PipelineVariable]]) – If True, enables network isolation in the endpoint, isolating the model container. No inbound or outbound network calls can be made to or from the model container. (Default: None).

  • model_kms_key (Optional[str]) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked. (Default: None).

  • image_config (Optional[Union[dict[str, str], dict[str, PipelineVariable]]]) – Specifies whether the image of model container is pulled from ECR, or private registry in your VPC. By default it is set to pull model container image from ECR. (Default: None).

  • source_dir (Optional[str]) –

    The absolute path, relative path, or S3 URI of a directory with any other training source code dependencies aside from the entry point file. If source_dir is an S3 URI, it must point to a tar.gz file. Structure within this directory is preserved when training on Amazon SageMaker. If ‘git_config’ is provided, ‘source_dir’ should be a relative location to a directory in the Git repo. If the directory points to S3, no code is uploaded and the S3 location is used instead. (Default: None).

    Example

    With the following GitHub repo directory structure:

    >>> |----- README.md
    >>> |----- src
    >>>         |----- inference.py
    >>>         |----- test.py
    

    You can assign entry_point=’inference.py’, source_dir=’src’.

  • code_location (Optional[str]) – Name of the S3 bucket where custom code is uploaded. If not specified, the default bucket created by sagemaker.session.Session is used. (Default: None).

  • entry_point (Optional[str]) –

    The absolute or relative path to the local Python source file that should be executed as the entry point to model hosting. If source_dir is specified, then entry_point must point to a file located at the root of source_dir. If ‘git_config’ is provided, ‘entry_point’ should be a relative location to the Python source file in the Git repo. (Default: None).

    Example With the following GitHub repo directory structure:

    >>> |----- README.md
    >>> |----- src
    >>>         |----- inference.py
    >>>         |----- test.py
    

    You can assign entry_point=’src/inference.py’.

  • container_log_level (Optional[Union[int, PipelineVariable]]) – Log level to use within the container. Valid values are defined in the Python logging module. (Default: None).

  • dependencies (Optional[list[str]]) –

    A list of absolute or relative paths to directories with any additional libraries that should be exported to the container. The library folders are copied to SageMaker in the same folder where the entrypoint is copied. If ‘git_config’ is provided, ‘dependencies’ should be a list of relative locations to directories with any additional libraries needed in the Git repo. If the `source_dir` points to S3, code will be uploaded and the S3 location will be used instead. This is not supported with “local code” in Local Mode. (Default: None).

    Example

    The following call

    >>> Model(entry_point='inference.py',
    ...       dependencies=['my/libs/common', 'virtual-env'])
    

    results in the following structure inside the container:

    >>> $ ls
    
    >>> opt/ml/code
    >>>     |------ inference.py
    >>>     |------ common
    >>>     |------ virtual-env
    

  • git_config (Optional[dict[str, str]]) –

    Git configurations used for cloning files, including repo, branch, commit, 2FA_enabled, username, password and token. The repo field is required. All other fields are optional. repo specifies the Git repository where your training script is stored. If you don’t provide branch, the default value ‘master’ is used. If you don’t provide commit, the latest commit in the specified branch is used.

    2FA_enabled, username, password and token are used for authentication. For GitHub (or other Git) accounts, set 2FA_enabled to ‘True’ if two-factor authentication is enabled for the account, otherwise set it to ‘False’. If you do not provide a value for 2FA_enabled, a default value of ‘False’ is used. CodeCommit does not support two-factor authentication, so do not provide “2FA_enabled” with CodeCommit repositories.

    For GitHub and other Git repos, when SSH URLs are provided, it doesn’t matter whether 2FA is enabled or disabled. You should either have no passphrase for the SSH key pairs or have the ssh-agent configured so that you will not be prompted for the SSH passphrase when you run the ‘git clone’ command with SSH URLs. When HTTPS URLs are provided, if 2FA is disabled, then either token or username and password are used for authentication if provided. Token is prioritized. If 2FA is enabled, only token is used for authentication if provided. If required authentication info is not provided, the SageMaker Python SDK attempts to use local credentials to authenticate. If that fails, an error message is thrown.

    For CodeCommit repos, 2FA is not supported, so 2FA_enabled should not be provided. There is no token in CodeCommit, so token should also not be provided. When repo is an SSH URL, the requirements are the same as GitHub repos. When repo is an HTTPS URL, username and password are used for authentication if they are provided. If they are not provided, the SageMaker Python SDK attempts to use either the CodeCommit credential helper or local credential storage for authentication. (Default: None).

    Example

    The following config results in cloning the repo specified in ‘repo’, then checking out the ‘test-branch-git-config’ branch, and checking out the specified commit.

    >>> git_config = {'repo': 'https://github.com/aws/sagemaker-python-sdk.git',
    >>>               'branch': 'test-branch-git-config',
    >>>               'commit': '329bfcf884482002c05ff7f44f62599ebc9f445a'}
    

  • model_package_arn (Optional[str]) – An existing SageMaker Model Package ARN. It can be just the name if your account owns the Model Package. model_data is not required. (Default: None).

  • resources (Optional[ResourceRequirements]) – The compute resource requirements for a model to be deployed to an endpoint. Only EndpointType.INFERENCE_COMPONENT_BASED supports this feature. (Default: None).

  • config_name (Optional[str]) – The name of the JumpStart config that can be optionally applied to the model.

  • additional_model_data_sources (Optional[Dict[str, Any]]) – Additional location of SageMaker model data (default: None).

Raises

ValueError – If the model ID is not recognized by JumpStart.
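The git_config rules described in the parameter list can be sketched as a small builder. The function name `make_git_config` is hypothetical, and detecting CodeCommit by looking for "git-codecommit" in the repo URL is an assumption made for illustration; neither comes from the SageMaker Python SDK.

```python
from typing import Any, Dict, Optional


def make_git_config(
    repo: str,
    branch: Optional[str] = None,
    commit: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
    two_factor_enabled: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a git_config dict that follows the documented rules."""
    if not repo:
        raise ValueError("'repo' is required in git_config")

    # Assumption: CodeCommit URLs contain 'git-codecommit' in the host name.
    is_codecommit = "git-codecommit" in repo
    if is_codecommit and (token is not None or two_factor_enabled is not None):
        raise ValueError(
            "CodeCommit does not support 'token' or '2FA_enabled' in git_config"
        )

    optional_fields = {
        "branch": branch,
        "commit": commit,
        "2FA_enabled": two_factor_enabled,
        "username": username,
        "password": password,
        "token": token,
    }
    config: Dict[str, Any] = {"repo": repo}
    # Only keep the optional fields the caller actually provided.
    config.update({k: v for k, v in optional_fields.items() if v is not None})
    return config
```

The resulting dictionary could then be passed as git_config together with a relative entry_point and source_dir.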

log_subscription_warning()

Log message prompting the customer to subscribe to the proprietary model.

Return type

None

retrieve_all_examples()

Returns all example payloads associated with the model.

Raises
  • NotImplementedError – If the scope is not supported.

  • ValueError – If the combination of arguments specified is not supported.

  • VulnerableJumpStartModelError – If any of the dependencies required by the script have known security vulnerabilities.

  • DeprecatedJumpStartModelError – If the version of the model is deprecated.

Return type

Optional[List[JumpStartSerializablePayload]]

retrieve_example_payload()

Returns the example payload associated with the model.

Payload can be directly used with the sagemaker.predictor.Predictor.predict(…) function.

Raises
  • NotImplementedError – If the scope is not supported.

  • ValueError – If the combination of arguments specified is not supported.

  • VulnerableJumpStartModelError – If any of the dependencies required by the script have known security vulnerabilities.

  • DeprecatedJumpStartModelError – If the version of the model is deprecated.

Return type

JumpStartSerializablePayload
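As a minimal sketch of the statement that the payload can be passed straight to predict, the helper below wires the two calls together. The function name `predict_with_example_payload` is hypothetical; `model` is assumed to be a JumpStartModel and `predictor` is assumed to be the object returned by its deploy method.

```python
from typing import Any


def predict_with_example_payload(model: Any, predictor: Any) -> Any:
    """Fetch the model's example payload and send it to the endpoint."""
    # retrieve_example_payload() is documented above; it may raise if the
    # model is deprecated or has vulnerable dependencies.
    payload = model.retrieve_example_payload()
    # The payload is passed to predict() unchanged.
    return predictor.predict(payload)
```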

set_deployment_config(config_name, instance_type)

Sets the deployment config to apply to the model.

Parameters
  • config_name (str) – The name of the deployment config to apply to the model. Call list_deployment_configs to see the list of config names.

  • instance_type (str) – The instance_type that the model will use after setting the config.

Return type

None

property deployment_config: Optional[Dict[str, Any]]

The deployment config that will be applied to this model.

Returns

Deployment config.

Return type

Optional[Dict[str, Any]]

property benchmark_metrics: DataFrame

Benchmark Metrics for deployment configs.

Returns

Benchmark metrics as a pandas DataFrame object.

Return type

DataFrame

display_benchmark_metrics(**kwargs)

Display deployment configs benchmark metrics.

Return type

None

list_deployment_configs()

List deployment configs for this model.

Returns

A list of deployment configs.

Return type

List[Dict[str, Any]]
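The interplay between list_deployment_configs and set_deployment_config can be sketched as follows. The helper name `apply_deployment_config` is hypothetical, and the dictionary key holding the config name ("DeploymentConfigName") is an assumption that is exposed as a parameter, because the structure of each config dictionary is not specified here.

```python
from typing import Any, Optional


def apply_deployment_config(
    model: Any,
    instance_type: str,
    preferred_name: Optional[str] = None,
    name_key: str = "DeploymentConfigName",  # assumed key, not confirmed here
) -> Optional[str]:
    """Pick a deployment config by name and apply it to the model.

    Returns the applied config name, or None if no named config was listed.
    """
    configs = model.list_deployment_configs()
    names = [cfg.get(name_key) for cfg in configs if cfg.get(name_key)]
    if not names:
        return None

    # Fall back to the first listed config when the preferred one is absent.
    chosen = preferred_name if preferred_name in names else names[0]
    model.set_deployment_config(config_name=chosen, instance_type=instance_type)
    return chosen
```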

classmethod attach(endpoint_name, inference_component_name=None, model_id=None, model_version=None, sagemaker_session=<sagemaker.session.Session object>, hub_name=None)

Attaches a JumpStartModel object to an existing SageMaker Endpoint.

The model id, version (and inference component name) can be inferred from the tags.

Parameters
Return type

JumpStartModel

deploy(initial_instance_count=None, instance_type=None, serializer=None, deserializer=None, accelerator_type=None, endpoint_name=None, inference_component_name=None, tags=None, kms_key=None, wait=True, data_capture_config=None, async_inference_config=None, serverless_inference_config=None, volume_size=None, model_data_download_timeout=None, container_startup_health_check_timeout=None, inference_recommendation_id=None, explainer_config=None, accept_eula=None, endpoint_logging=False, resources=None, managed_instance_scaling=None, endpoint_type=EndpointType.MODEL_BASED, routing_config=None)

Creates endpoint by calling base Model class deploy method.

Create a SageMaker Model and EndpointConfig, and deploy an Endpoint from this Model.

Any field set to None does not get passed to the parent class method.

Parameters
  • initial_instance_count (Optional[int]) – The initial number of instances to run in the Endpoint created from this Model. If not using serverless inference or the model has not called right_size(), then it needs to be a number greater than or equal to 1. (Default: None)

  • instance_type (Optional[str]) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’, or ‘local’ for local mode. If not using serverless inference or the model has not called right_size(), then it is required to deploy a model. (Default: None)

  • serializer (Optional[BaseSerializer]) – A serializer object, used to encode data for an inference endpoint. If serializer is not None, then serializer will override the default serializer. The default serializer is set by the predictor_cls. (Default: None).

  • deserializer (Optional[BaseDeserializer]) – A deserializer object, used to decode data from an inference endpoint. If deserializer is not None, then deserializer will override the default deserializer. The default deserializer is set by the predictor_cls. (Default: None).

  • accelerator_type (Optional[str]) – Type of Elastic Inference accelerator to deploy this model for model loading and inference, for example, ‘ml.eia1.medium’. If not specified, no Elastic Inference accelerator will be attached to the endpoint. For more information: https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html (Default: None).

  • endpoint_name (Optional[str]) – The name of the endpoint to create. If not specified, a unique endpoint name will be created. (Default: None).

  • tags (Optional[Tags]) – Tags to attach to this specific endpoint. (Default: None).

  • kms_key (Optional[str]) – The ARN of the KMS key that is used to encrypt the data on the storage volume attached to the instance hosting the endpoint. (Default: None).

  • wait (Optional[bool]) – Whether the call should wait until the deployment of this model completes. (Default: True).

  • data_capture_config (Optional[sagemaker.model_monitor.DataCaptureConfig]) – Specifies configuration related to Endpoint data capture for use with Amazon SageMaker Model Monitoring. (Default: None).

  • async_inference_config (Optional[sagemaker.model_monitor.AsyncInferenceConfig]) – Specifies configuration related to an async endpoint. Use this configuration to create an async endpoint and make async inferences. If an empty config object is passed, the default config is used to deploy the async endpoint. If None, a real-time endpoint is deployed. (Default: None)

  • serverless_inference_config (Optional[sagemaker.serverless.ServerlessInferenceConfig]) – Specifies configuration related to a serverless endpoint. Use this configuration to create a serverless endpoint and make serverless inferences. If an empty object is passed, the pre-defined values in the ServerlessInferenceConfig class are used to deploy the serverless endpoint. If None, an instance-based endpoint is deployed. (Default: None)

  • volume_size (Optional[int]) – The size, in GB, of the ML storage volume attached to the individual inference instance associated with the production variant. Currently only Amazon EBS gp2 storage volumes are supported. (Default: None).

  • model_data_download_timeout (Optional[int]) – The timeout value, in seconds, to download and extract model data from Amazon S3 to the individual inference instance associated with this production variant. (Default: None).

  • container_startup_health_check_timeout (Optional[int]) – The timeout value, in seconds, for your inference container to pass health check by SageMaker Hosting. For more information about health check see: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests (Default: None).

  • inference_recommendation_id (Optional[str]) – The recommendation id which specifies the recommendation you picked from inference recommendation job results and would like to deploy the model and endpoint with recommended parameters. (Default: None).

  • explainer_config (Optional[sagemaker.explainer.ExplainerConfig]) – Specifies online explainability configuration for use with Amazon SageMaker Clarify. (Default: None).

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

  • endpoint_logging (Optional[bool]) – If set to true, live logging will be emitted as the SageMaker Endpoint starts up. (Default: False).

  • resources (Optional[ResourceRequirements]) – The compute resource requirements for a model to be deployed to an endpoint. Only EndpointType.INFERENCE_COMPONENT_BASED supports this feature. (Default: None).

  • managed_instance_scaling (Optional[Dict]) – Managed instance scaling options. If configured, Amazon SageMaker will manage the instance number behind the endpoint.

  • endpoint_type (EndpointType) – The type of endpoint used to deploy models. (Default: EndpointType.MODEL_BASED).

  • routing_config (Optional[Dict]) – Settings that control how the endpoint routes incoming traffic to the instances that the endpoint hosts.

  • inference_component_name (Optional[str]) –

Raises

MarketplaceModelSubscriptionError – If the caller is not subscribed to the model.

Return type

PredictorBase
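For models that require a EULA, the requirement that accept_eula be explicitly True can be made visible with a thin wrapper. This is a sketch under stated assumptions: the name `deploy_with_eula` is hypothetical, `model` is assumed to be a JumpStartModel whose terms of use apply, and any extra keyword arguments are forwarded to deploy unchanged.

```python
from typing import Any


def deploy_with_eula(model: Any, accept_eula: bool, **deploy_kwargs: Any) -> Any:
    """Deploy a model only when the caller has explicitly accepted the EULA."""
    # Truthy values such as 1 or "yes" are rejected on purpose: the
    # documentation asks for an explicit True.
    if accept_eula is not True:
        raise ValueError("accept_eula must be explicitly set to True to deploy")
    return model.deploy(accept_eula=True, **deploy_kwargs)
```

A call could look like `deploy_with_eula(model, True, instance_type="ml.g5.2xlarge")`, where the instance type is only a placeholder.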

register(content_types=None, response_types=None, inference_instances=None, transform_instances=None, model_package_group_name=None, image_uri=None, model_metrics=None, metadata_properties=None, approval_status=None, description=None, drift_check_baselines=None, customer_metadata_properties=None, validation_specification=None, domain=None, task=None, sample_payload_url=None, framework=None, framework_version=None, nearest_model_name=None, data_input_configuration=None, skip_model_validation=None, source_uri=None, model_card=None, accept_eula=None)

Creates a model package for creating SageMaker models or listing on Marketplace.

Parameters
  • content_types (list[str] or list[PipelineVariable]) – The supported MIME types for the input data.

  • response_types (list[str] or list[PipelineVariable]) – The supported MIME types for the output data.

  • inference_instances (list[str] or list[PipelineVariable]) – A list of the instance types that are used to generate inferences in real-time (default: None).

  • transform_instances (list[str] or list[PipelineVariable]) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).

  • model_package_group_name (str or PipelineVariable) – Model Package Group name. Mutually exclusive with model_package_name. Using model_package_group_name makes the Model Package versioned. Defaults to None.

  • image_uri (str or PipelineVariable) – Inference image URI for the container. Model class’ self.image will be used if it is None. Defaults to None.

  • model_metrics (ModelMetrics) – ModelMetrics object. Defaults to None.

  • metadata_properties (MetadataProperties) – MetadataProperties object. Defaults to None.

  • approval_status (str or PipelineVariable) – Model Approval Status, values can be “Approved”, “Rejected”, or “PendingManualApproval”. Defaults to PendingManualApproval.

  • description (str) – Model Package description. Defaults to None.

  • drift_check_baselines (DriftCheckBaselines) – DriftCheckBaselines object (default: None).

  • customer_metadata_properties (dict[str, str] or dict[str, PipelineVariable]) – A dictionary of key-value paired metadata properties (default: None).

  • domain (str or PipelineVariable) – Domain values can be “COMPUTER_VISION”, “NATURAL_LANGUAGE_PROCESSING”, “MACHINE_LEARNING” (default: None).

  • sample_payload_url (str or PipelineVariable) – The S3 path where the sample payload is stored (default: None).

  • task (str or PipelineVariable) – Task values which are supported by Inference Recommender are “FILL_MASK”, “IMAGE_CLASSIFICATION”, “OBJECT_DETECTION”, “TEXT_GENERATION”, “IMAGE_SEGMENTATION”, “CLASSIFICATION”, “REGRESSION”, “OTHER” (default: None).

  • framework (str or PipelineVariable) – Machine learning framework of the model package container image (default: None).

  • framework_version (str or PipelineVariable) – Framework version of the Model Package Container Image (default: None).

  • nearest_model_name (str or PipelineVariable) – Name of a pre-trained machine learning model benchmarked by Amazon SageMaker Inference Recommender (default: None).

  • data_input_configuration (str or PipelineVariable) – Input object for the model (default: None).

  • skip_model_validation (str or PipelineVariable) – Indicates if you want to skip model validation. Values can be “All” or “None” (default: None).

  • source_uri (str or PipelineVariable) – The URI of the source for the model package (default: None).

  • model_card (ModelCard or ModelPackageModelCard) – A document that contains qualitative and quantitative information about a model (default: None).

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

  • validation_specification (Optional[Union[str, PipelineVariable]]) –

Returns

A sagemaker.model.ModelPackage instance.
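A minimal sketch of a register call that checks approval_status against the three documented values before delegating. The helper name `register_model` and the constant `APPROVAL_STATUSES` are hypothetical; `model` is assumed to expose the register method described above.

```python
from typing import Any, List

# Values listed in the approval_status parameter description.
APPROVAL_STATUSES = ("Approved", "Rejected", "PendingManualApproval")


def register_model(
    model: Any,
    content_types: List[str],
    response_types: List[str],
    model_package_group_name: str,
    approval_status: str = "PendingManualApproval",
    **extra: Any,
) -> Any:
    """Validate approval_status locally, then call model.register()."""
    if approval_status not in APPROVAL_STATUSES:
        raise ValueError(
            f"approval_status must be one of {APPROVAL_STATUSES}, "
            f"got {approval_status!r}"
        )
    return model.register(
        content_types=content_types,
        response_types=response_types,
        model_package_group_name=model_package_group_name,
        approval_status=approval_status,
        **extra,
    )
```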

add_tags(tags)

Add tags to this Model

Parameters

tags (Tags) – Tags to add.

Return type

None

compile(target_instance_family, input_shape, output_path, role=None, tags=None, job_name=None, compile_max_run=900, framework=None, framework_version=None, target_platform_os=None, target_platform_arch=None, target_platform_accelerator=None, compiler_options=None)

Compile this Model with SageMaker Neo.

Parameters
Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

create(instance_type=None, accelerator_type=None, serverless_inference_config=None, tags=None, accept_eula=None, model_reference_arn=None)

Create a SageMaker Model Entity

Parameters
  • instance_type (str) – The EC2 instance type that this Model will be used for, this is only used to determine if the image needs GPU support or not (default: None).

  • accelerator_type (str) – Type of Elastic Inference accelerator to attach to an endpoint for model loading and inference, for example, ‘ml.eia1.medium’. If not specified, no Elastic Inference accelerator will be attached to the endpoint (default: None).

  • serverless_inference_config (ServerlessInferenceConfig) – Specifies configuration related to serverless endpoint. Instance type is not provided in serverless inference. So this is used to find image URIs (default: None).

  • tags (Optional[Tags]) –

    Tags to add to the model (default: None). Example:

    tags = [{'Key': 'tagname', 'Value': 'tagvalue'}]
    # Or
    tags = {'tagname': 'tagvalue'}
    

    For more information about tags, see boto3 documentation

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

  • model_reference_arn (Optional[str]) –

Returns

None or pipeline step arguments in case the Model instance is built with PipelineSession
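The two accepted tag shapes, a list of Key/Value dictionaries and a plain dictionary mapping tag names to values, can be converted into one form with a short helper. The function name `to_tag_list` is hypothetical and this is only an illustration, not the SDK's own normalization logic.

```python
from typing import Dict, List, Optional, Union

TagList = List[Dict[str, str]]


def to_tag_list(tags: Optional[Union[TagList, Dict[str, str]]]) -> TagList:
    """Convert either tag shape into a list of {'Key': ..., 'Value': ...} dicts."""
    if tags is None:
        return []
    if isinstance(tags, dict):
        # Plain mapping of tag name to tag value.
        return [{"Key": key, "Value": value} for key, value in tags.items()]
    # Already a list of Key/Value dictionaries; copy to avoid aliasing.
    return [dict(tag) for tag in tags]
```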

delete_model()

Delete an Amazon SageMaker Model.

Raises

ValueError – if the model is not created yet.

enable_network_isolation()

Whether to enable network isolation when creating this Model

Returns

If network isolation should be enabled or not.

Return type

bool

is_repack()

Whether the source code needs to be repacked before uploading to S3.

Returns

Whether the source needs to be repacked.

Return type

bool

package_for_edge(output_path, model_name, model_version, role=None, job_name=None, resource_key=None, s3_kms_key=None, tags=None)

Package this Model with SageMaker Edge.

Creates a new EdgePackagingJob and waits for it to finish. model_data will now point to the packaged artifacts.

Parameters
  • output_path (str) – Specifies where to store the packaged model

  • role (str) – Execution role

  • model_name (str) – the name to attach to the model metadata

  • model_version (str) – the version to attach to the model metadata

  • job_name (str) – The name of the edge packaging job

  • resource_key (str) – the kms key to encrypt the disk with

  • s3_kms_key (str) – the kms key to encrypt the output with

  • tags (Optional[Tags]) – Tags for labeling an edge packaging job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.

Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

prepare_container_def(instance_type=None, accelerator_type=None, serverless_inference_config=None, accept_eula=None, model_reference_arn=None)

Return a dict created by sagemaker.container_def().

It is used for deploying this model to a specified instance type.

Subclasses can override this to provide custom container definitions for deployment to a specific instance type. Called by deploy().

Parameters
  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.

  • accelerator_type (str) – The Elastic Inference accelerator type to deploy to the instance for loading and making inferences to the model. For example, ‘ml.eia1.medium’.

  • serverless_inference_config (sagemaker.serverless.ServerlessInferenceConfig) – Specifies configuration related to serverless endpoint. Instance type is not provided in serverless inference. So this is used to find image URIs.

  • accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).

Returns

A container definition object usable with the CreateModel API.

Return type

dict

remove_tag_with_key(key)

Remove a tag with the given key from the list of tags.

Parameters

key (str) – The key of the tag to remove.

Return type

None

right_size(sample_payload_url=None, supported_content_types=None, supported_instance_types=None, job_name=None, framework=None, job_duration_in_seconds=None, hyperparameter_ranges=None, phases=None, traffic_type=None, max_invocations=None, model_latency_thresholds=None, max_tests=None, max_parallel_tests=None, log_level='Verbose')

Recommends an instance type for a SageMaker or BYOC model.

Create a SageMaker Model or use a registered ModelPackage, to start an Inference Recommender job.

The name of the created model is accessible in the name field of this Model after right_size returns.

Parameters
  • sample_payload_url (str) – The S3 path where the sample payload is stored.

  • supported_content_types (Optional[List[str]]) – The supported MIME types for the input data.

  • supported_instance_types (list[str]) – A list of the instance types that this model is expected to work on. (default: None).

  • job_name (str) – The name of the Inference Recommendations Job. (default: None).

  • framework (str) – The machine learning framework of the Image URI. Only required to specify if you bring your own custom containers (default: None).

  • job_duration_in_seconds (int) – The maximum job duration that a job can run for. (default: None).

  • hyperparameter_ranges (list[Dict[str, sagemaker.parameter.CategoricalParameter]]) –

    Specifies the hyperparameters to be used during endpoint load tests. instance_type must be specified as a hyperparameter range. env_vars can be specified as an optional hyperparameter range. (default: None). Example:

    hyperparameter_ranges = [{
        'instance_types': CategoricalParameter(['ml.c5.xlarge', 'ml.c5.2xlarge']),
        'OMP_NUM_THREADS': CategoricalParameter(['1', '2', '3', '4'])
    }]
    

  • phases (list[Phase]) – Shape of the traffic pattern to use in the load test (default: None).

  • traffic_type (str) – Specifies the traffic pattern type. Currently only supports one type ‘PHASES’ (default: None).

  • max_invocations (str) – defines the minimum invocations per minute for the endpoint to support (default: None).

  • model_latency_thresholds (list[ModelLatencyThreshold]) – defines the maximum response latency for endpoints to support (default: None).

  • max_tests (int) – restricts how many endpoints in total are allowed to be spun up for this job (default: None).

  • max_parallel_tests (int) – restricts how many concurrent endpoints this job is allowed to spin up (default: None).

  • log_level (str) – specifies the inline output when waiting for right_size to complete (default: “Verbose”).

Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

transformer(instance_count, instance_type, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, volume_kms_key=None)

Return a Transformer that uses this Model.

Parameters
  • instance_count (int) – Number of EC2 instances to use.

  • instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.

  • strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.

  • assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.

  • output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.

  • output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).

  • accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.

  • env (dict) – Environment variables to be set for use during the transform job (default: None).

  • max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.

  • max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.

  • tags (Optional[Tags]) – Tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.

  • volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
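The valid values listed for strategy and assemble_with can be checked before a Transformer is requested. This is a sketch, assuming `model` exposes the transformer method described above; the helper name `make_transformer` and the two constants are hypothetical.

```python
from typing import Any, Optional

# Valid values taken from the parameter descriptions above.
VALID_STRATEGIES = ("MultiRecord", "SingleRecord")
VALID_ASSEMBLE_WITH = ("Line", "None")


def make_transformer(
    model: Any,
    instance_count: int,
    instance_type: str,
    strategy: Optional[str] = None,
    assemble_with: Optional[str] = None,
    **extra: Any,
) -> Any:
    """Validate the batching options, then call model.transformer()."""
    if strategy is not None and strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}")
    if assemble_with is not None and assemble_with not in VALID_ASSEMBLE_WITH:
        raise ValueError(f"assemble_with must be one of {VALID_ASSEMBLE_WITH}")
    return model.transformer(
        instance_count=instance_count,
        instance_type=instance_type,
        strategy=strategy,
        assemble_with=assemble_with,
        **extra,
    )
```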

tune(max_tuning_duration=1800)

Tune a Model built in Mode.LOCAL_CONTAINER via ModelBuilder.

tune() is available for DJL Models using Huggingface IDs. In this use case, Tensor Parallel Degree is our tunable parameter. The tuning job first generates all admissible Tensor Parallel Degrees and then benchmarks on 10 invocations serially followed by 10 invocations concurrently. It starts first at the highest admissible Tensor Parallel Degree and then scales down until failure.

Example

Sample flow:

>>> from sagemaker.serve.builder.model_builder import ModelBuilder
>>> from sagemaker.serve.builder.schema_builder import SchemaBuilder
>>> from sagemaker.serve.mode.function_pointers import Mode
>>>
>>> sample_input = {
...     "inputs": "sample_prompt",
...     "parameters": {},
... }
>>> sample_output = {
...     "generated_text": "sample_text_generation",
... }
>>>
>>> builder = ModelBuilder(
...     model=model,
...     schema_builder=SchemaBuilder(sample_input, sample_output),
...     model_path=path_to_model,
...     mode=Mode.LOCAL_CONTAINER,
... )
>>>
>>> model = builder.build()
>>> tuned_model = model.tune()
>>> tuned_model.deploy()

Parameters

max_tuning_duration (int) – The timeout for the Mode.LOCAL_CONTAINER tuning job. Defaults to 1800.
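
The search order described for tune() can be sketched in plain Python. This is not the SDK's implementation. The rule for which degrees are admissible (divisors of the GPU count), the benchmark callback, and the choice of lowest latency as the selection criterion are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple


def admissible_degrees(num_gpus: int) -> List[int]:
    """Hypothetical rule: every degree that evenly divides the GPU count."""
    return [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]


def search_tensor_parallel_degree(
    degrees: List[int],
    benchmark: Callable[[int], Optional[float]],
) -> Optional[Tuple[int, float]]:
    """Walk from the highest degree downwards and stop at the first failure.

    `benchmark(degree)` returns an average latency in seconds, or None when
    the configuration fails. Returns the (degree, latency) pair with the
    lowest latency seen before the failure, or None if nothing succeeded.
    """
    best = None
    for degree in sorted(degrees, reverse=True):
        latency = benchmark(degree)
        if latency is None:
            break  # scaling down stops at the first failing degree
        if best is None or latency < best[1]:
            best = (degree, latency)
    return best


# Usage with made-up latencies standing in for real benchmark runs.
fake_latencies = {8: 0.9, 4: 0.6, 2: 0.7, 1: None}
best = search_tensor_parallel_degree(admissible_degrees(8), fake_latencies.get)
```

In a real run, the benchmark callback would be where the 10 serial and 10 concurrent invocations take place.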

class sagemaker.model.FrameworkModel(model_data, image_uri, role=None, entry_point=None, source_dir=None, predictor_cls=None, env=None, name=None, container_log_level=20, code_location=None, sagemaker_session=None, dependencies=None, git_config=None, **kwargs)

Bases: Model

A Model for working with a SageMaker Framework.

This class hosts user-defined code in S3 and sets code location and configuration in model environment variables.

Initialize a FrameworkModel.

Parameters
  • model_data (str or PipelineVariable or dict) – The S3 location of SageMaker model data.

  • image_uri (str or PipelineVariable) – A Docker image URI.

  • role (str) – An IAM role name or ARN for SageMaker to access AWS resources on your behalf.

  • entry_point (str) –

    Path (absolute or relative) to the Python source file which should be executed as the entry point to model hosting. If source_dir is specified, then entry_point must point to a file located at the root of source_dir. If ‘git_config’ is provided, ‘entry_point’ should be a relative location to the Python source file in the Git repo.

    Example

    With the following GitHub repo directory structure:

    >>> |----- README.md
    >>> |----- src
    >>>         |----- inference.py
    >>>         |----- test.py
    

    You can assign entry_point=’src/inference.py’.

  • source_dir (str) –

    Path (absolute, relative or an S3 URI) to a directory with any other training source code dependencies aside from the entry point file (default: None). If source_dir is an S3 URI, it must point to a tar.gz file. The structure within this directory is preserved when training on Amazon SageMaker. If ‘git_config’ is provided, ‘source_dir’ should be a relative location to a directory in the Git repo. If the directory points to S3, no code will be uploaded and the S3 location will be used instead.

    Example

    With the following GitHub repo directory structure:

    >>> |----- README.md
    >>> |----- src
    >>>         |----- inference.py
    >>>         |----- test.py
    

    You can assign entry_point=’inference.py’, source_dir=’src’.

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If not None, deploy will return the result of invoking this function on the created endpoint name.

  • env (dict[str, str] or dict[str, PipelineVariable]) – Environment variables to run with image_uri when hosted in SageMaker (default: None).

  • name (str) – The model name. If None, a default model name will be selected on each deploy.

  • container_log_level (int or PipelineVariable) – Log level to use within the container (default: logging.INFO). Valid values are defined in the Python logging module.

  • code_location (str) – Name of the S3 bucket where custom code is uploaded (default: None). If not specified, the default bucket created by sagemaker.session.Session is used.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.

  • dependencies (list[str]) –

    A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: []). The library folders will be copied to SageMaker in the same folder where the entry point is copied. If ‘git_config’ is provided, ‘dependencies’ should be a list of relative locations to directories with any additional libraries needed in the Git repo. If the `source_dir` points to S3, no code will be uploaded and the S3 location will be used instead.

    Example

    The following call

    >>> Model(entry_point='inference.py',
    ...       dependencies=['my/libs/common', 'virtual-env'])
    

    results in the following inside the container:

    >>> $ ls
    
    >>> opt/ml/code
    >>>     |------ inference.py
    >>>     |------ common
    >>>     |------ virtual-env
    

    This is not supported with “local code” in Local Mode.

  • git_config (dict[str, str]) –

    Git configurations used for cloning files, including repo, branch, commit, 2FA_enabled, username, password and token. The repo field is required. All other fields are optional. repo specifies the Git repository where your training script is stored. If you don’t provide branch, the default value ‘master’ is used. If you don’t provide commit, the latest commit in the specified branch is used.

    Example

    The following config:

    >>> git_config = {'repo': 'https://github.com/aws/sagemaker-python-sdk.git',
    >>>               'branch': 'test-branch-git-config',
    >>>               'commit': '329bfcf884482002c05ff7f44f62599ebc9f445a'}
    

    results in cloning the repo specified in ‘repo’, then checking out the ‘test-branch-git-config’ branch, and then checking out the specified commit.

    2FA_enabled, username, password and token are used for authentication. For GitHub (or other Git) accounts, set 2FA_enabled to ‘True’ if two-factor authentication is enabled for the account, otherwise set it to ‘False’. If you do not provide a value for 2FA_enabled, a default value of ‘False’ is used. CodeCommit does not support two-factor authentication, so do not provide “2FA_enabled” with CodeCommit repositories.

    For GitHub and other Git repos, when SSH URLs are provided, it doesn’t matter whether 2FA is enabled or disabled; you should either have no passphrase for the SSH key pairs, or have the ssh-agent configured so that you are not prompted for an SSH passphrase when you run the ‘git clone’ command with SSH URLs. When HTTPS URLs are provided: if 2FA is disabled, then either token or username+password will be used for authentication if provided (token prioritized); if 2FA is enabled, only token will be used for authentication if provided. If the required authentication info is not provided, the Python SDK will try to use local credentials storage to authenticate. If that also fails, an error is raised.

    For CodeCommit repos, 2FA is not supported, so ‘2FA_enabled’ should not be provided. There is no token in CodeCommit, so ‘token’ should not be provided either. When ‘repo’ is an SSH URL, the requirements are the same as for GitHub-like repos. When ‘repo’ is an HTTPS URL, username+password will be used for authentication if they are provided; otherwise, the Python SDK will try to use either the CodeCommit credential helper or local credential storage for authentication.

  • **kwargs – Keyword arguments passed to the superclass Model.
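
The authentication rules described for git_config can be summarised as a small decision function. The sketch below only mirrors the prose above and is not the SDK's code. The function name choose_git_auth, the returned labels, and the way a CodeCommit URL is detected are hypothetical.

```python
def choose_git_auth(git_config: dict) -> str:
    """Return a label for the credential source the documented rules select."""
    repo = git_config["repo"]  # the only required field
    if not repo.startswith("https://"):
        # Anything that is not an HTTPS URL is treated as SSH in this sketch:
        # authentication relies on the SSH key pair or ssh-agent.
        return "ssh-key"

    has_password = "username" in git_config and "password" in git_config

    if "codecommit" in repo:  # hypothetical way to recognise CodeCommit
        if "2FA_enabled" in git_config or "token" in git_config:
            raise ValueError("CodeCommit takes neither '2FA_enabled' nor 'token'")
        return "username+password" if has_password else "local-credentials"

    # Accept both booleans and the strings 'True' / 'False'.
    two_factor = str(git_config.get("2FA_enabled", False)).lower() == "true"
    if "token" in git_config:
        return "token"  # the token takes priority whenever it is provided
    if not two_factor and has_password:
        return "username+password"
    return "local-credentials"  # fall back to local credential storage
```

For example, an HTTPS GitHub URL with 2FA enabled and only a username and password falls through to local credential storage, because only a token is used in that case.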

Tip

You can find additional parameters for initializing this class at Model.
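
A minimal sketch of putting the constructor arguments together follows. It reads the entry_point rule as "a bare file name when source_dir is given", which is one interpretation of the text above. The helper names and any values passed to them are hypothetical, and the sagemaker import is deferred so that the argument check can run without the SDK installed.

```python
import os


def framework_model_kwargs(model_data, image_uri, role, entry_point, source_dir=None):
    """Assemble constructor arguments and check the entry_point/source_dir rule."""
    if source_dir is not None and os.path.dirname(entry_point):
        # With source_dir set, entry_point must sit at the root of source_dir.
        raise ValueError("entry_point must be a file name at the root of source_dir")
    kwargs = {
        "model_data": model_data,
        "image_uri": image_uri,
        "role": role,
        "entry_point": entry_point,
    }
    if source_dir is not None:
        kwargs["source_dir"] = source_dir
    return kwargs


def create_framework_model(**kwargs):
    """Instantiate the class documented above (requires the sagemaker package)."""
    from sagemaker.model import FrameworkModel  # imported lazily on purpose

    return FrameworkModel(**kwargs)
```

With the repository layout shown earlier, entry_point='inference.py' with source_dir='src' passes the check, while entry_point='src/inference.py' with source_dir='src' is rejected.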

is_repack()

Whether the source code needs to be repacked before uploading to S3.

Returns

Whether the source needs to be repacked.

Return type

bool

class sagemaker.model.ModelPackage(role=None, model_data=None, algorithm_arn=None, model_package_arn=None, **kwargs)

Bases: Model

A SageMaker Model that can be deployed to an Endpoint.

Initialize a SageMaker ModelPackage.

Parameters
  • role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.

  • model_data (str or dict[str, Any]) – The S3 location of a SageMaker model data .tar.gz file or a dictionary representing a ModelDataSource object. Must be provided if algorithm_arn is provided.

  • algorithm_arn (str) – The ARN of the algorithm used to train the model. This can be just the name if your account owns the algorithm. If provided, model_data must also be provided.

  • model_package_arn (str) – The ARN of an existing SageMaker Model Package. This can be just the name if your account owns the Model Package. model_data is not required.

  • **kwargs – Additional kwargs passed to the Model constructor.
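
The relationship between model_package_arn, algorithm_arn, and model_data can be checked before constructing the object. The sketch below is one way to do that. The helper names are hypothetical, and the final rule (that at least one of the two ARNs must be given) is an assumption rather than something stated above.

```python
def model_package_kwargs(role, model_package_arn=None, algorithm_arn=None, model_data=None):
    """Return constructor arguments that satisfy the documented combinations."""
    if model_package_arn is not None:
        # An existing Model Package does not need model_data.
        return {"role": role, "model_package_arn": model_package_arn}
    if algorithm_arn is not None:
        if model_data is None:
            raise ValueError("model_data must be provided together with algorithm_arn")
        return {"role": role, "algorithm_arn": algorithm_arn, "model_data": model_data}
    # Assumption: one of the two ARNs is needed to identify what to deploy.
    raise ValueError("provide either model_package_arn or algorithm_arn")


def create_model_package(**kwargs):
    """Instantiate the class documented above (requires the sagemaker package)."""
    from sagemaker.model import ModelPackage  # imported lazily on purpose

    return ModelPackage(**kwargs)
```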

enable_network_isolation()

Whether to enable network isolation when creating a model from this ModelPackage.

Returns

If network isolation should be enabled or not.

Return type

bool

update_approval_status(approval_status, approval_description=None)

Update the approval status for the model package.

Parameters
  • approval_status (str) – Model Approval Status, values can be “Approved”, “Rejected”, or “PendingManualApproval”.

  • approval_description (str) – Optional. Description for the approval status of the model (default: None).
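
Because approval_status accepts only three values, a thin wrapper can reject typos before any request is made. The following is a sketch under that assumption. The wrapper name set_approval_status and the constant are hypothetical, and model_package is expected to be an object exposing update_approval_status as documented above.

```python
VALID_APPROVAL_STATUSES = ("Approved", "Rejected", "PendingManualApproval")


def set_approval_status(model_package, status, description=None):
    """Validate the status locally, then forward it to the model package."""
    if status not in VALID_APPROVAL_STATUSES:
        raise ValueError(
            f"approval_status must be one of {VALID_APPROVAL_STATUSES}, got {status!r}"
        )
    model_package.update_approval_status(
        approval_status=status,
        approval_description=description,
    )
```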

update_customer_metadata(customer_metadata_properties)

Update the customer metadata properties for the model package.

Parameters

customer_metadata_properties (dict[str, str]) – A dictionary of key-value metadata properties.

update_inference_specification(containers=None, image_uris=None, content_types=None, response_types=None, inference_instances=None, transform_instances=None)

Set the inference specification for the model package.

Parameters
  • containers (dict) – The Amazon ECR registry path of the Docker image that contains the inference code.

  • image_uris (List[str]) – The ECR path where inference code is stored.

  • content_types (list[str]) – The supported MIME types for the input data.

  • response_types (list[str]) – The supported MIME types for the output data.

  • inference_instances (list[str]) – A list of the instance types that are used to generate inferences in real-time (default: None).

  • transform_instances (list[str]) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).
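
A call that sets the inference specification from a single image might look like the sketch below. The wrapper name, the MIME types, and the instance types are placeholder values chosen for illustration, and model_package is assumed to be an object exposing update_inference_specification with the keyword arguments listed above.

```python
def set_csv_inference_specification(model_package, image_uri):
    """Set a one-image inference specification with CSV input and output."""
    model_package.update_inference_specification(
        image_uris=[image_uri],                # ECR path of the inference image
        content_types=["text/csv"],            # supported input MIME types
        response_types=["text/csv"],           # supported output MIME types
        inference_instances=["ml.m5.large"],   # real-time inference instance types
        transform_instances=["ml.m5.large"],   # batch transform instance types
    )
```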

update_source_uri(source_uri)

Set the source URI for the model package.

Parameters

source_uri (str) – The URI of the source for the model package.

remove_customer_metadata_properties(customer_metadata_properties_to_remove)

Remove the specified keys from the customer metadata properties.

Parameters
  • customer_metadata_properties_to_remove (list[str]) – A list of the customer metadata property keys to remove.

add_inference_specification(name, containers=None, image_uris=None, description=None, content_types=None, response_types=None, inference_instances=None, transform_instances=None)

Add an additional inference specification to the model package.

Parameters
  • name (str) – Name to identify the additional inference specification.

  • containers (dict) – The Amazon ECR registry path of the Docker image that contains the inference code.

  • image_uris (List[str]) – The ECR path where inference code is stored.

  • description (str) – Description for the additional inference specification.

  • content_types (list[str]) – The supported MIME types for the input data.

  • response_types (list[str]) – The supported MIME types for the output data.

  • inference_instances (list[str]) – A list of the instance types that are used to generate inferences in real-time (default: None).

  • transform_instances (list[str]) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).

update_model_card(model_card)

Update the content of the model card that was created with the model package.

Parameters

model_card (ModelCard | ModelPackageModelCard) – The updated model card content.