Async Inference

This module contains classes related to Amazon SageMaker Async Inference.

A class for AsyncInferenceConfig

Configures an async inference endpoint. Use AsyncInferenceConfig when deploying a model to an async inference endpoint.

class sagemaker.async_inference.async_inference_config.AsyncInferenceConfig(output_path=None, max_concurrent_invocations_per_instance=None, kms_key_id=None, notification_config=None)

Bases: object

Configuration object passed in when deploying models to Amazon SageMaker Endpoints.

This object specifies the configuration for an async endpoint. Use it when creating an async endpoint and making async inference requests.

Initialize an AsyncInferenceConfig object for async inference configuration.

Parameters
  • output_path (str) – Optional. The Amazon S3 location that endpoints upload inference responses to. If no value is provided, Amazon SageMaker uses the default Amazon S3 Async Inference output path. (Default: None)

  • max_concurrent_invocations_per_instance (int) – Optional. The maximum number of concurrent requests sent by the SageMaker client to the model container. If no value is provided, Amazon SageMaker will choose an optimal value for you. (Default: None)

  • kms_key_id (str) – Optional. The Amazon Web Services Key Management Service (Amazon Web Services KMS) key that Amazon SageMaker uses to encrypt the asynchronous inference output in Amazon S3. (Default: None)

  • notification_config (dict) – Optional. Specifies the configuration for notifications of inference results for asynchronous inference. Only one notification is generated per invocation request. (Default: None)

      • success_topic (str): Amazon SNS topic to post a notification to when inference completes successfully. If no topic is provided, no notification is sent on success. The key in notification_config is ‘SuccessTopic’.

      • error_topic (str): Amazon SNS topic to post a notification to when inference fails. If no topic is provided, no notification is sent on failure. The key in notification_config is ‘ErrorTopic’.
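
The parameters above can be assembled as in the following minimal sketch. It only builds the keyword arguments and does not import the SDK, so it runs anywhere. The helper name build_async_config_kwargs, the bucket name, and the SNS topic ARN are hypothetical placeholders; only the parameter names and the ‘SuccessTopic’ / ‘ErrorTopic’ keys come from the reference above.

```python
def build_async_config_kwargs(output_path=None, max_concurrent=None,
                              kms_key_id=None, success_topic=None,
                              error_topic=None):
    """Hypothetical helper: collect keyword arguments for AsyncInferenceConfig."""
    notification_config = {}
    if success_topic is not None:
        # Key name documented for the success notification topic.
        notification_config["SuccessTopic"] = success_topic
    if error_topic is not None:
        # Key name documented for the error notification topic.
        notification_config["ErrorTopic"] = error_topic
    return {
        "output_path": output_path,
        "max_concurrent_invocations_per_instance": max_concurrent,
        "kms_key_id": kms_key_id,
        # Pass None (the documented default) when no topic was given.
        "notification_config": notification_config or None,
    }


# Placeholder values, for illustration only.
kwargs = build_async_config_kwargs(
    output_path="s3://example-bucket/async-output/",
    max_concurrent=4,
    success_topic="arn:aws:sns:us-east-1:111122223333:example-success",
)

# With the SageMaker Python SDK installed, the dictionary could be passed on:
# from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
# config = AsyncInferenceConfig(**kwargs)
```

Omitting both topics leaves notification_config as None, which matches the documented default of sending no notifications.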

A class for AsyncInferenceResponse

class sagemaker.async_inference.async_inference_response.AsyncInferenceResponse(predictor_async, output_path)

Bases: object

Response from Async Inference endpoint

This response object provides a method to check for an async inference result in the specified Amazon S3 output path. If a result object exists at that path, it is retrieved and returned.

Initialize an AsyncInferenceResponse object.

AsyncInferenceResponse helps users retrieve the async inference result from the Amazon S3 output path.

Parameters
  • predictor_async (sagemaker.predictor.AsyncPredictor) – The AsyncPredictor that returned this response.

  • output_path (str) – The Amazon S3 location that endpoints upload inference responses to.

get_result(waiter_config=None)

Get the async inference result from the specified Amazon S3 output path.

Parameters

waiter_config (sagemaker.async_inference.waiter_config.WaiterConfig) – Configuration for the waiter. The default delay between polls is 15 seconds and the default maximum number of attempts is 60.

Raises

ValueError – If an object of the wrong type is provided as waiter_config.

Returns

Inference result in the given Amazon S3 output path. If a deserializer was specified when creating the AsyncPredictor, the result of the deserializer is returned. Otherwise the response returns the sequence of bytes as is.

Return type

object
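
The return rule for get_result can be illustrated with a small stand-alone function. This is a sketch of the documented behavior, not the SDK's implementation: decode_result is a hypothetical name, and the deserializer is modeled as a plain callable on bytes, which is a simplifying assumption about how SDK deserializers are invoked.

```python
import json


def decode_result(raw_bytes, deserializer=None):
    """Hypothetical illustration of the documented return rule.

    Without a deserializer the raw bytes are returned unchanged;
    with one, the deserializer's output is returned instead.
    """
    if deserializer is None:
        return raw_bytes
    return deserializer(raw_bytes)


payload = b'{"label": "cat", "score": 0.9}'

raw = decode_result(payload)                 # bytes, returned as is
parsed = decode_result(payload, json.loads)  # dict produced by the deserializer
```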

A class for WaiterConfig used in async inference

Use it when invoking async inference and waiting for the result.

class sagemaker.async_inference.waiter_config.WaiterConfig(max_attempts=60, delay=15)

Bases: object

Configuration object passed in when invoking async inference and waiting for the result.

Initialize a WaiterConfig object that provides parameters to control waiting behavior.

Parameters
  • max_attempts (int) – The maximum number of attempts to be made. If the max attempts is exceeded, Amazon SageMaker will raise PollingTimeoutError. (Default: 60)

  • delay (int) – The amount of time in seconds to wait between attempts. (Default: 15)
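
How max_attempts and delay interact can be sketched with a generic polling loop. This is an assumption-laden illustration rather than the SDK's waiter: poll, check, and PollingTimeout are hypothetical names, and whether the SDK sleeps after the final attempt is not stated in this reference. With the defaults (60 attempts, 15 seconds apart), the total wait is bounded at roughly 15 minutes.

```python
import time


class PollingTimeout(Exception):
    """Hypothetical stand-in for the SDK's PollingTimeoutError."""


def poll(check, max_attempts=60, delay=15, sleep=time.sleep):
    """Call check() up to max_attempts times, waiting delay seconds between tries.

    check is a hypothetical callable that returns the result once it is
    available and None while it is still pending.
    """
    for attempt in range(1, max_attempts + 1):
        result = check()
        if result is not None:
            return result
        if attempt < max_attempts:
            # Wait before the next attempt; no sleep after the last one.
            sleep(delay)
    raise PollingTimeout(f"no result after {max_attempts} attempts")
```

Passing sleep as a parameter is a design choice of this sketch so the loop can be exercised without real waiting.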