Predictors

Make real-time predictions against SageMaker endpoints with Python objects

class sagemaker.predictor.Predictor(endpoint_name, sagemaker_session=None, serializer=<sagemaker.base_serializers.IdentitySerializer object>, deserializer=<sagemaker.base_deserializers.BytesDeserializer object>, component_name=None, **kwargs)

Bases: PredictorBase

Make prediction requests to an Amazon SageMaker endpoint.

Initialize a Predictor.

Behavior for serialization of input data and deserialization of result data can be configured through initializer arguments. If not specified, a sequence of bytes is expected and the API sends it in the request body without modifications. In response, the API returns the sequence of bytes from the prediction result without any modifications.

Parameters
  • endpoint_name (str) – Name of the Amazon SageMaker endpoint to which requests are sent.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.

  • serializer (BaseSerializer) – A serializer object, used to encode data for an inference endpoint (default: IdentitySerializer).

  • deserializer (BaseDeserializer) – A deserializer object, used to decode data from an inference endpoint (default: BytesDeserializer).

  • component_name (str) – Name of the Amazon SageMaker inference component corresponding to the predictor.
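
For example, a minimal sketch of constructing a Predictor that exchanges JSON payloads. The endpoint name is a placeholder; substitute the name of a real endpoint:

    from sagemaker.predictor import Predictor
    from sagemaker.serializers import JSONSerializer
    from sagemaker.deserializers import JSONDeserializer

    # "my-endpoint" is a hypothetical endpoint name.
    predictor = Predictor(
        endpoint_name="my-endpoint",
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer(),
    )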

predict(data, initial_args=None, target_model=None, target_variant=None, inference_id=None, custom_attributes=None, component_name=None)

Return the inference from the specified endpoint.

Parameters
  • data (object) – Input data for which you want the model to provide inference. If a serializer was specified when creating the Predictor, the result of the serializer is sent as input data. Otherwise the data must be a sequence of bytes, and the predict method then sends the bytes in the request body as is.

  • initial_args (dict[str,str]) – Optional. Default arguments for boto3 invoke_endpoint call. Default is None (no default arguments).

  • target_model (str) – S3 model artifact path to run an inference request on, in the case of a multi-model endpoint. Does not apply to endpoints hosting a single model. (Default: None)

  • target_variant (str) – The name of the production variant to run an inference request on (Default: None). Note that the ProductionVariant identifies the model you want to host and the resources you want to deploy for hosting it.

  • inference_id (str) – If you provide a value, it is added to the captured data when you enable data capture on the endpoint (Default: None).

  • custom_attributes (str) –

    Provides additional information about a request for an inference submitted to a model hosted at an Amazon SageMaker endpoint. The information is an opaque value that is forwarded verbatim. You could use this value, for example, to provide an ID that you can use to track a request or to provide other metadata that a service endpoint was programmed to process. The value must consist of no more than 1024 visible US-ASCII characters.

    The code in your model is responsible for setting or updating any custom attributes in the response. If your code does not set this value in the response, an empty value is returned. For example, if a custom attribute represents the trace ID, your model can prepend the custom attribute with Trace ID: in your post-processing function (Default: None).

  • component_name (str) – Optional. Name of the Amazon SageMaker inference component corresponding to the predictor.

Returns

Inference for the given input. If a deserializer was specified when creating the Predictor, the result of the deserializer is returned. Otherwise the response returns the sequence of bytes as is.

Return type

object
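
As an illustration, a sketch of a predict call against the JSON predictor constructed above. The payload shape is hypothetical; your model defines the input format it accepts:

    # Hypothetical payload; the accepted format depends on your model.
    result = predictor.predict({"instances": [[1.5, 2.0, 3.1]]})
    print(result)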

predict_stream(data, initial_args=None, target_variant=None, inference_id=None, custom_attributes=None, component_name=None, target_container_hostname=None, iterator=<class 'sagemaker.iterators.ByteIterator'>)

Return a streaming inference response from the specified endpoint.

Parameters
  • data (object) – Input data for which you want the model to provide inference. If a serializer was specified when creating the Predictor, the result of the serializer is sent as input data. Otherwise the data must be a sequence of bytes, and the predict method then sends the bytes in the request body as is.

  • initial_args (dict[str,str]) – Optional. Default arguments for the boto3 invoke_endpoint_with_response_stream call. (Default: None)

  • target_variant (str) – Optional. The name of the production variant to run an inference request on (Default: None). Note that the ProductionVariant identifies the model you want to host and the resources you want to deploy for hosting it.

  • inference_id (str) – Optional. If you provide a value, it is added to the captured data when you enable data capture on the endpoint (Default: None).

  • custom_attributes (str) –

    Optional. Provides additional information about a request for an inference submitted to a model hosted at an Amazon SageMaker endpoint. The information is an opaque value that is forwarded verbatim. You could use this value, for example, to provide an ID that you can use to track a request or to provide other metadata that a service endpoint was programmed to process. The value must consist of no more than 1024 visible US-ASCII characters.

    The code in your model is responsible for setting or updating any custom attributes in the response. If your code does not set this value in the response, an empty value is returned. For example, if a custom attribute represents the trace ID, your model can prepend the custom attribute with Trace ID: in your post-processing function (Default: None).

  • component_name (str) – Optional. Name of the Amazon SageMaker inference component corresponding the predictor. (Default: None)

  • target_container_hostname (str) – Optional. If the endpoint hosts multiple containers and is configured to use direct invocation, this parameter specifies the host name of the container to invoke. (Default: None).

  • iterator (BaseIterator) – An iterator class that provides an iterable interface over the event stream response from the inference endpoint. An instance of the provided class is returned by the predict_stream method (Default: ByteIterator). Any iterator defined in sagemaker.iterators, or a custom iterator that inherits from BaseIterator, can be specified.

Returns

An iterator object that allows iteration over the EventStream response. The object is instantiated from the class passed as the predict_stream method's iterator parameter.

Return type

object (BaseIterator)
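
For example, a sketch of streaming inference, assuming a model that emits newline-delimited output. LineIterator is provided by sagemaker.iterators; the prompt payload is hypothetical:

    from sagemaker.iterators import LineIterator

    # Hypothetical prompt payload; your model defines the accepted format.
    stream = predictor.predict_stream(
        data=b'{"inputs": "Tell me a story"}',
        iterator=LineIterator,
    )
    for line in stream:
        print(line)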

update_endpoint(initial_instance_count=None, instance_type=None, accelerator_type=None, model_name=None, tags=None, kms_key=None, data_capture_config_dict=None, max_instance_count=None, min_instance_count=None, wait=True)

Update the existing endpoint with the provided attributes.

This creates a new EndpointConfig in the process. If initial_instance_count, instance_type, accelerator_type, or model_name is specified, then a new ProductionVariant configuration is created, and values from the existing configuration are not preserved.

Parameters
  • initial_instance_count (int) – The initial number of instances to run in the endpoint. This is required if instance_type, accelerator_type, or model_name is specified. Otherwise, the values from the existing endpoint configuration’s ProductionVariants are used.

  • instance_type (str) – The EC2 instance type to deploy the endpoint to. This is required if initial_instance_count or accelerator_type is specified. Otherwise, the values from the existing endpoint configuration’s ProductionVariants are used.

  • accelerator_type (str) – The type of Elastic Inference accelerator to attach to the endpoint, e.g. “ml.eia1.medium”. If not specified, and initial_instance_count, instance_type, and model_name are also None, the values from the existing endpoint configuration’s ProductionVariants are used. Otherwise, no Elastic Inference accelerator is attached to the endpoint.

  • model_name (str) – The name of the model to be associated with the endpoint. This is required if initial_instance_count, instance_type, or accelerator_type is specified and if there is more than one model associated with the endpoint. Otherwise, the existing model for the endpoint is used.

  • tags (list[dict[str, str]]) – The list of tags to add to the endpoint config. If not specified, the tags of the existing endpoint configuration are used. If any of the existing tags are reserved AWS ones (i.e. begin with “aws”), they are not carried over to the new endpoint configuration.

  • kms_key (str) – The KMS key that is used to encrypt the data on the storage volume attached to the instance hosting the endpoint. If not specified, the KMS key of the existing endpoint configuration is used.

  • data_capture_config_dict (dict) – The endpoint data capture configuration for use with Amazon SageMaker Model Monitoring. If not specified, the data capture configuration of the existing endpoint configuration is used.

  • max_instance_count (int) – The maximum instance count used when scaling the endpoint's instances.

  • min_instance_count (int) – The minimum instance count used when scaling the endpoint's instances.

Raises

ValueError – If there is not enough information to create a new ProductionVariant:

  • If initial_instance_count, accelerator_type, or model_name is specified, but instance_type is None.

  • If initial_instance_count, instance_type, or accelerator_type is specified and either model_name is None or there are multiple models associated with the endpoint.
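
For example, a sketch of scaling an endpoint out, assuming the endpoint hosts a single model (so model_name can be omitted). The instance type and count are illustrative:

    # Creates a new EndpointConfig with a new ProductionVariant.
    predictor.update_endpoint(
        initial_instance_count=2,
        instance_type="ml.m5.xlarge",
    )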

delete_endpoint(delete_endpoint_config=True)

Delete the Amazon SageMaker endpoint backing this predictor.

This also deletes the endpoint configuration attached to it if delete_endpoint_config is True.

Parameters

delete_endpoint_config (bool, optional) – Flag to indicate whether to delete the endpoint configuration together with the endpoint. Defaults to True. If True, both the endpoint and the endpoint configuration are deleted; if False, only the endpoint is deleted.

delete_predictor(wait=False)

Delete the Amazon SageMaker inference component or endpoint backing this predictor.

If the endpoint is an inference-component-based endpoint, this deletes the corresponding inference component. Otherwise, it deletes the endpoint where this predictor is hosted.

Parameters

wait (bool) – Whether to wait for the delete operation to complete before returning. (Default: False)

Return type

None

update_predictor(model_name=None, image_uri=None, model_data=None, env=None, model_data_download_timeout=None, container_startup_health_check_timeout=None, resources=None)

Updates the predictor.

You can deploy a new Model specification or apply new configurations. The SDK applies your updates by updating the inference component that’s associated with the model.

Parameters
  • model_name (Optional[str]) – The model name to use to update the predictor. (Default: None).

  • image_uri (Optional[str]) – A Docker image URI. (Default: None).

  • model_data (Optional[Union[str, dict]]) – Location of SageMaker model data. (Default: None).

  • env (Optional[dict[str, str]]) – Environment variables to run with image_uri when hosted in SageMaker. (Default: None).

  • model_data_download_timeout (Optional[int]) – The timeout value, in seconds, to download and extract model data from Amazon S3 to the individual inference instance associated with this production variant. (Default: None).

  • container_startup_health_check_timeout (Optional[int]) – The timeout value, in seconds, for your inference container to pass health check by SageMaker Hosting. For more information about health check see: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests (Default: None).

  • resources (Optional[ResourceRequirements]) – The compute resource requirements for a model to be deployed to an endpoint. Only EndpointType.INFERENCE_COMPONENT_BASED supports this feature. (Default: None).
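
A sketch of updating the predictor's inference component with new compute resources; the model name and resource values below are hypothetical:

    from sagemaker.compute_resource_requirements.resource_requirements import (
        ResourceRequirements,
    )

    # Hypothetical resource requests for an inference-component-based endpoint.
    resources = ResourceRequirements(
        requests={"num_cpus": 1, "memory": 1024, "copies": 1},
    )
    predictor.update_predictor(
        model_name="my-updated-model",  # hypothetical model name
        resources=resources,
    )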

list_related_models(variant_name_equals=None, name_contains=None, creation_time_after=None, creation_time_before=None, last_modified_time_after=None, last_modified_time_before=None, status_equals=None, sort_order=None, sort_by=None, max_results=None, next_token=None)

List the deployed models co-located with this predictor.

Calls SageMaker:ListInferenceComponents on the endpoint associated with the predictor.

Parameters
  • variant_name_equals (str) – Optional. A string that matches the name of the variant that was assigned to the inference component. (Default: None).

  • name_contains (str) – Optional. A string that partially matches the names of one or more inference components. Filters inference components by name. (Default: None).

  • creation_time_after (datetime.datetime) – Optional. Use this parameter to search for inference components created after a specific date and time. (Default: None).

  • creation_time_before (datetime.datetime) – Optional. Use this parameter to search for inference components created before a specific date and time. (Default: None).

  • last_modified_time_after (datetime.datetime) – Optional. Use this parameter to search for inference components that were last modified after a specific date and time. (Default: None).

  • last_modified_time_before (datetime.datetime) – Optional. Use this parameter to search for inference components that were last modified before a specific date and time. (Default: None).

  • status_equals (str) – Optional. The inference component status. Filters inference components by status. (Default: None).

  • sort_order (str) – Optional. The order in which inference components are listed in the response. (Default: None).

  • max_results (int) – Optional. The maximum number of results returned by list_related_models. (Default: None).

  • next_token (str) – Optional. A token to resume pagination of list_related_models results. (Default: None).

  • sort_by (str) – Optional. The field by which to sort the inference components in the response. (Default: None).

Returns

A list of Amazon SageMaker inference component objects associated with the endpoint. If a next token is returned, there are more results available. The value of the next token is a unique pagination token.

Return type

Tuple[List[Dict[str, Any]], Optional[str]]
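
For example, a sketch of paging through all related models using the returned pagination token, assuming the signature shown above:

    # Collect all inference components co-located with this predictor.
    models, token = predictor.list_related_models(max_results=10)
    while token is not None:
        more, token = predictor.list_related_models(
            max_results=10, next_token=token
        )
        models.extend(more)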

delete_model()

Delete the Amazon SageMaker model backing this predictor.

enable_data_capture()

Enables data capture by updating DataCaptureConfig.

This function updates the DataCaptureConfig for the Predictor’s associated Amazon SageMaker Endpoint to enable data capture. For a more customized experience, refer to update_data_capture_config instead.

disable_data_capture()

Disables data capture by updating DataCaptureConfig.

This function updates the DataCaptureConfig for the Predictor’s associated Amazon SageMaker Endpoint to disable data capture. For a more customized experience, refer to update_data_capture_config instead.

update_data_capture_config(data_capture_config)

Updates the DataCaptureConfig for the Predictor’s associated Amazon SageMaker Endpoint.

Update is done using the provided DataCaptureConfig.

Parameters

data_capture_config (sagemaker.model_monitor.DataCaptureConfig) – The DataCaptureConfig to update the predictor’s endpoint to use.
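
For example, a sketch of enabling capture on 50% of requests; the S3 destination is a placeholder:

    from sagemaker.model_monitor import DataCaptureConfig

    capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=50,
        destination_s3_uri="s3://my-bucket/data-capture",  # placeholder URI
    )
    predictor.update_data_capture_config(capture_config)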

list_monitors()

Generates ModelMonitor objects (or DefaultModelMonitors).

Objects are generated based on the schedule(s) associated with the endpoint that this predictor refers to.

Returns

A list of ModelMonitor (or DefaultModelMonitor) objects.

Return type

[sagemaker.model_monitor.model_monitoring.ModelMonitor]
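
As an illustration, a sketch of inspecting the monitoring schedules attached to this predictor's endpoint. It assumes the schedule description follows the DescribeMonitoringSchedule response shape:

    for monitor in predictor.list_monitors():
        # describe_schedule returns the DescribeMonitoringSchedule response dict.
        schedule = monitor.describe_schedule()
        print(schedule["MonitoringScheduleName"], schedule["MonitoringScheduleStatus"])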

endpoint_context()

Retrieves the lineage context object representing the endpoint.

Examples

    predictor = Predictor()
    ...
    context = predictor.endpoint_context()
    models = context.models()

Returns

The context for the endpoint.

Return type

ContextEndpoint

property content_type

The MIME type of the data sent to the inference endpoint.

property accept

The content type(s) that are expected from the inference endpoint.

property endpoint

Deprecated attribute. Please use endpoint_name.