Serverless Inference¶
This module contains classes related to Amazon SageMaker Serverless Inference, including the ServerlessInferenceConfig class. This configuration is used to set up a serverless inference endpoint; pass it when deploying a model to an endpoint.
class sagemaker.serverless.serverless_inference_config.ServerlessInferenceConfig(memory_size_in_mb=2048, max_concurrency=5)¶
Bases: object
Configuration object passed in when deploying models to Amazon SageMaker Endpoints.
This object specifies configuration related to a serverless endpoint. Use it when creating a serverless endpoint and making serverless inference requests.
Initialize a ServerlessInferenceConfig object for serverless inference configuration.
- Parameters
memory_size_in_mb (int) – Optional. The memory size of your serverless endpoint. Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB. If no value is provided, Amazon SageMaker will choose the default value for you. (Default: 2048)
max_concurrency (int) – Optional. The maximum number of concurrent invocations your serverless endpoint can process. If no value is provided, Amazon SageMaker will choose the default value for you. (Default: 5)