AutoMLV2

A class for SageMaker AutoML V2 Jobs.

class sagemaker.automl.automlv2.AutoMLTabularConfig(target_attribute_name, algorithms_config=None, feature_specification_s3_uri=None, generate_candidate_definitions_only=None, mode=None, problem_type=None, sample_weight_attribute_name=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)

Bases: object

Configuration of a tabular problem.

Parameters
  • target_attribute_name (str) – The name of the column in the tabular dataset that contains the values to be predicted.

  • algorithms_config (list(str)) – The selection of algorithms run on a dataset to train the model candidates of an Autopilot job.

  • feature_specification_s3_uri (str) – A URL to the Amazon S3 data source containing selected features and specified data types from the input data source of an AutoML job.

  • generate_candidate_definitions_only (bool) – Whether to generate possible candidates without training the models.

  • mode (str) – The method that the AutoML job uses to train the model. Valid values: AUTO, ENSEMBLING, or HYPERPARAMETER_TUNING.

  • problem_type (str) – Defines the type of supervised learning available for the candidates. Available problem types are: BinaryClassification, MulticlassClassification and Regression.

  • sample_weight_attribute_name (str) – The name of the dataset column representing sample weights.

  • max_candidates (int) – The maximum number of training jobs allowed to run.

  • max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.

  • max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of an AutoML job.
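The valid values for mode and problem_type listed above can be checked before a config is constructed. The following is a minimal sketch, assuming a hypothetical helper named check_tabular_settings (it is not part of the SDK) and a hypothetical target column named "label"; the allowed values are copied from the parameter descriptions above.

```python
# Hypothetical pre-flight check; not part of the SageMaker SDK.
VALID_MODES = {"AUTO", "ENSEMBLING", "HYPERPARAMETER_TUNING"}
VALID_PROBLEM_TYPES = {
    "BinaryClassification",
    "MulticlassClassification",
    "Regression",
}


def check_tabular_settings(mode=None, problem_type=None):
    """Raise ValueError if mode or problem_type is not a documented value."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if problem_type is not None and problem_type not in VALID_PROBLEM_TYPES:
        raise ValueError(
            f"problem_type must be one of {sorted(VALID_PROBLEM_TYPES)}, "
            f"got {problem_type!r}"
        )
    return {"mode": mode, "problem_type": problem_type}


# Keyword arguments that could then be passed to AutoMLTabularConfig(**tabular_kwargs).
# The column name "label" is an illustrative assumption.
tabular_kwargs = {
    "target_attribute_name": "label",
    **check_tabular_settings(mode="ENSEMBLING", problem_type="Regression"),
}
```

Keeping the check separate from the config object means a typo in a mode name can be caught locally, before any request is sent.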

target_attribute_name: str
algorithms_config: Optional[List[str]] = None
feature_specification_s3_uri: Optional[str] = None
generate_candidate_definitions_only: Optional[bool] = None
mode: Optional[str] = None
problem_type: Optional[str] = None
sample_weight_attribute_name: Optional[str] = None
max_candidates: Optional[int] = None
max_runtime_per_training_job_in_seconds: Optional[int] = None
max_total_job_runtime_in_seconds: Optional[int] = None
classmethod from_response_dict(api_problem_type_config)

Convert the API response to the native object.

Parameters

api_problem_type_config (dict) –

to_request_dict()

Convert the native object to the API request format.

class sagemaker.automl.automlv2.AutoMLImageClassificationConfig(max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)

Bases: object

Configuration of an image classification problem.

Parameters
  • max_candidates (int) – The maximum number of training jobs allowed to run.

  • max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.

  • max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of an AutoML job.

max_candidates: Optional[int] = None
max_runtime_per_training_job_in_seconds: Optional[int] = None
max_total_job_runtime_in_seconds: Optional[int] = None
classmethod from_response_dict(api_problem_type_config)

Convert the API response to the native object.

Parameters

api_problem_type_config (dict) –

to_request_dict()

Convert the native object to the API request format.

class sagemaker.automl.automlv2.AutoMLTextClassificationConfig(content_column, target_label_column, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)

Bases: object

Configuration of a text classification problem.

Parameters
  • content_column (str) – The name of the column used to provide the text to be classified. It should not be the same as the target label column.

  • target_label_column (str) – The name of the column used to provide the class labels. It should not be the same as the content column.

  • max_candidates (int) – The maximum number of training jobs allowed to run.

  • max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.

  • max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of an AutoML job.

content_column: str
target_label_column: str
max_candidates: Optional[int] = None
max_runtime_per_training_job_in_seconds: Optional[int] = None
max_total_job_runtime_in_seconds: Optional[int] = None
classmethod from_response_dict(api_problem_type_config)

Convert the API response to the native object.

Parameters

api_problem_type_config (dict) –

to_request_dict()

Convert the native object to the API request format.

class sagemaker.automl.automlv2.AutoMLTextGenerationConfig(base_model_name=None, accept_eula=None, text_generation_hyper_params=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)

Bases: object

Configuration of a text generation problem.

Parameters
  • base_model_name (str) – The name of the base model to fine-tune. Autopilot supports fine-tuning a variety of large language models. For information on the list of supported models, see Text generation models supporting fine-tuning in Autopilot: https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-llms-finetuning-models.html#autopilot-llms-finetuning-supported-llms. If no BaseModelName is provided, the default model used is Falcon7BInstruct.

  • accept_eula (bool) – Specifies agreement to the model end-user license agreement (EULA). The AcceptEula value must be explicitly defined as True in order to accept the EULA that this model requires. For example, LLAMA2 requires accepting the EULA. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using a model.

  • text_generation_hyper_params (dict) –

    The hyperparameters used to configure and optimize the learning process of the base model. You can set any combination of the following hyperparameters for all base models. Supported parameters are:

    • epochCount: The number of times the model goes through the entire training dataset.

    • batchSize: The number of data samples used in each iteration of training.

    • learningRate: The step size at which a model’s parameters are updated during training.

    • learningRateWarmupSteps: The number of training steps during which the learning rate gradually increases before reaching its target or maximum value.

  • max_candidates (int) – The maximum number of training jobs allowed to run.

  • max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.

  • max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of an AutoML job.
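The text_generation_hyper_params dictionary described above accepts only the four listed keys, and the attribute is annotated as Dict[str, str]. The following is a minimal sketch of building such a dictionary, assuming a hypothetical helper named build_text_generation_hyper_params (not part of the SDK); the numeric values are illustrative only and are not recommendations.

```python
# Hypothetical helper; not part of the SageMaker SDK.
SUPPORTED_HYPER_PARAMS = {
    "epochCount",
    "batchSize",
    "learningRate",
    "learningRateWarmupSteps",
}


def build_text_generation_hyper_params(**params):
    """Reject undocumented keys and convert every value to a string."""
    unknown = set(params) - SUPPORTED_HYPER_PARAMS
    if unknown:
        raise ValueError(f"unsupported hyperparameters: {sorted(unknown)}")
    # The attribute is typed Dict[str, str], so values are stringified here.
    return {name: str(value) for name, value in params.items()}


# Illustrative values only.
hyper_params = build_text_generation_hyper_params(epochCount=3, learningRate=0.001)
```

The result could then be passed as text_generation_hyper_params when constructing an AutoMLTextGenerationConfig.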

base_model_name: Optional[str] = None
accept_eula: Optional[bool] = None
text_generation_hyper_params: Optional[Dict[str, str]] = None
max_candidates: Optional[int] = None
max_runtime_per_training_job_in_seconds: Optional[int] = None
max_total_job_runtime_in_seconds: Optional[int] = None
classmethod from_response_dict(api_problem_type_config)

Convert the API response to the native object.

Parameters

api_problem_type_config (dict) –

to_request_dict()

Convert the native object to the API request format.

class sagemaker.automl.automlv2.AutoMLTimeSeriesForecastingConfig(forecast_frequency, forecast_horizon, item_identifier_attribute_name, target_attribute_name, timestamp_attribute_name, grouping_attribute_names=None, feature_specification_s3_uri=None, forecast_quantiles=None, holiday_config=None, aggregation=None, filling=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)

Bases: object

Configuration of a time series forecasting problem.

Parameters
  • forecast_frequency (str) – The frequency of predictions in a forecast. Valid intervals are an integer followed by Y (Year), M (Month), W (Week), D (Day), H (Hour), and min (Minute). For example, 1D indicates every day and 15min indicates every 15 minutes. The value of a frequency must not overlap with the next larger frequency. For example, you must use a frequency of 1H instead of 60min.

  • forecast_horizon (int) – The number of time-steps that the model predicts. The forecast horizon is also called the prediction length. The maximum forecast horizon is the lesser of 500 time-steps or 1/4 of the time-steps in the dataset.

  • item_identifier_attribute_name (str) – The name of the column that represents the set of item identifiers for which you want to predict the target value.

  • target_attribute_name (str) – The name of the column representing the target variable that you want to predict for each item in your dataset. The data type of the target variable must be numerical.

  • timestamp_attribute_name (str) – The name of the column indicating a point in time at which the target value of a given item is recorded.

  • grouping_attribute_names (list(str)) – A set of column names that can be grouped with the item identifier column to create a composite key for which a target value is predicted.

  • feature_specification_s3_uri (str) – A URL to the Amazon S3 data source containing selected features and specified data types from the input data source of an AutoML job.

  • forecast_quantiles (list(str)) – The quantiles used to train the model for forecasts at a specified quantile. You can specify quantiles from 0.01 (p1) to 0.99 (p99), by increments of 0.01 or higher. Up to five forecast quantiles can be specified. When ForecastQuantiles is not provided, the AutoML job uses the quantiles p10, p50, and p90 as default.

  • holiday_config (list(str)) – The country code for the holiday calendar. For the list of public holiday calendars supported by AutoML job V2, see Country Codes: https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-timeseries-forecasting-holiday-calendars.html#holiday-country-codes. Use the country code corresponding to the country of your choice.

  • aggregation (dict) – A key value pair defining the aggregation method for a column, where the key is the column name and the value is the aggregation method. Aggregation is only supported for the target column. The supported aggregation methods are sum (default), avg, first, min, max.

  • filling (dict) –

    A key value pair defining the filling method for a column, where the key is the column name and the value is an object which defines the filling logic. You can specify multiple filling methods for a single column. The supported filling methods and their corresponding options are:

    • frontfill: none (Supported only for target column)

    • middlefill: zero, value, median, mean, min, max

    • backfill: zero, value, median, mean, min, max

    • futurefill: zero, value, median, mean, min, max

    To set a filling method to a specific value, set the fill parameter to the chosen filling method value (for example “backfill”: “value”), and define the filling value in an additional parameter suffixed with “_value”. For example, to set backfill to a value of 2, you must include two parameters: “backfill”: “value” and “backfill_value”: “2”.

  • max_candidates (int) – The maximum number of training jobs allowed to run.

  • max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.

  • max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of an AutoML job.
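Two of the rules above lend themselves to a short worked example: the cap on forecast_horizon and the pair of entries needed to fill a column with a constant. The following is a minimal sketch, assuming two hypothetical helpers (max_forecast_horizon and fill_with_value, neither part of the SDK), a hypothetical column named "demand", and that "1/4 of the time-steps" is rounded down.

```python
# Hypothetical helpers; not part of the SageMaker SDK.
VALUE_FILL_METHODS = {"middlefill", "backfill", "futurefill"}


def max_forecast_horizon(num_time_steps):
    """Lesser of 500 time-steps or 1/4 of the time-steps in the dataset.

    Rounding down with integer division is an assumption of this sketch.
    """
    return min(500, num_time_steps // 4)


def fill_with_value(method, value):
    """Return the two entries needed to fill `method` with a constant value."""
    if method not in VALUE_FILL_METHODS:
        raise ValueError(f"{method!r} does not support the 'value' option")
    return {method: "value", f"{method}_value": str(value)}


# Column name -> filling logic, as described for the filling parameter.
# "demand" is an illustrative column name.
filling = {"demand": fill_with_value("backfill", 2)}
```

With 1,000 time-steps in the dataset, max_forecast_horizon(1000) gives 250; with 4,000 time-steps the cap of 500 applies.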

forecast_frequency: str
forecast_horizon: int
item_identifier_attribute_name: str
target_attribute_name: str
timestamp_attribute_name: str
grouping_attribute_names: Optional[List[str]] = None
feature_specification_s3_uri: Optional[str] = None
forecast_quantiles: Optional[List[str]] = None
holiday_config: Optional[List[str]] = None
aggregation: Optional[Dict[str, str]] = None
filling: Optional[Dict[str, str]] = None
max_candidates: Optional[int] = None
max_runtime_per_training_job_in_seconds: Optional[int] = None
max_total_job_runtime_in_seconds: Optional[int] = None
classmethod from_response_dict(api_problem_type_config)

Convert the API response to the native object.

Parameters

api_problem_type_config (dict) –

to_request_dict()

Convert the native object to the API request format.

class sagemaker.automl.automlv2.AutoMLDataChannel(s3_data_type, s3_uri, channel_type=None, compression_type=None, content_type=None)

Bases: object

Class to represent the data source which will be used for model training.

Parameters
  • s3_data_type (str) – The data type for S3 data source. Valid values: ManifestFile, AugmentedManifestFile or S3Prefix.

  • s3_uri (str) – The URL to the Amazon S3 data source. The Uri refers to the Amazon S3 prefix or ManifestFile depending on the data type.

  • channel_type (str) – The type of channel. Valid values: training or validation. Defines whether the data are used for training or validation. The default value is training. Channels for training and validation must share the same content_type.

  • compression_type (str) – The compression type for input data. Gzip or None.

  • content_type (str) – The content type of the data from the input source.
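Because training and validation channels must share the same content_type, it can help to build both sets of arguments from one place. The following is a minimal sketch, assuming a hypothetical helper named build_channel_kwargs (not part of the SDK), placeholder bucket names, and "text/csv;header=present" as an illustrative content type.

```python
# Hypothetical helper; not part of the SageMaker SDK.
def build_channel_kwargs(train_uri, validation_uri, content_type):
    """Return keyword arguments for a training and a validation channel.

    Both channels reuse the same content_type, as the parameter
    description above requires.
    """
    common = {"s3_data_type": "S3Prefix", "content_type": content_type}
    return [
        {**common, "s3_uri": train_uri, "channel_type": "training"},
        {**common, "s3_uri": validation_uri, "channel_type": "validation"},
    ]


# Bucket names and the content type are placeholders.
channel_kwargs = build_channel_kwargs(
    "s3://example-bucket/train/",
    "s3://example-bucket/validation/",
    "text/csv;header=present",
)
```

Each dictionary could then be expanded into a channel object, for example AutoMLDataChannel(**channel_kwargs[0]).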

s3_data_type: str
s3_uri: str
channel_type: Optional[str] = None
compression_type: Optional[str] = None
content_type: Optional[str] = None
classmethod from_response_dict(data_channel)

Convert the API response to the native object.

Parameters

data_channel (dict) –

to_request_dict()

Convert the native object to the API request format.

class sagemaker.automl.automlv2.LocalAutoMLDataChannel(data_type, path, channel_type=None, compression_type=None, content_type=None)

Bases: object

Class to represent a local data source which will be uploaded to S3.

Parameters
  • data_type (str) – The data type for S3 data source. Valid values: ManifestFile, AugmentedManifestFile or S3Prefix.

  • path (str) – The path to the local data which will be uploaded to S3.

  • channel_type (str) – The type of channel. Valid values: training or validation. Defines whether the data are used for training or validation. The default value is training. Channels for training and validation must share the same content_type.

  • compression_type (str) – The compression type for input data. Gzip or None.

  • content_type (str) – The content type of the data from the input source.

data_type: str
path: str
channel_type: Optional[str] = None
compression_type: Optional[str] = None
content_type: Optional[str] = None
class sagemaker.automl.automlv2.AutoMLV2(problem_config, base_job_name=None, output_path=None, job_objective=None, validation_fraction=None, auto_generate_endpoint_name=None, endpoint_name=None, output_kms_key=None, role=None, volume_kms_key=None, encrypt_inter_container_traffic=None, vpc_config=None, tags=None, sagemaker_session=None)

Bases: object

A class for creating and interacting with SageMaker AutoMLV2 jobs.

Initialize an AutoMLV2 object.

Parameters
  • problem_config (object) –

    A collection of settings specific to the problem type used to configure an AutoML job V2. Exactly one config of the following types must be provided. Supported problem types are:

    • Image Classification (sagemaker.automl.automlv2.AutoMLImageClassificationConfig),

    • Tabular (sagemaker.automl.automlv2.AutoMLTabularConfig),

    • Text Classification (sagemaker.automl.automlv2.AutoMLTextClassificationConfig),

    • Text Generation (sagemaker.automl.automlv2.AutoMLTextGenerationConfig),

    • Time Series Forecasting (sagemaker.automl.automlv2.AutoMLTimeSeriesForecastingConfig).

  • base_job_name (str) – The name of the AutoML job. The name must be unique within the AWS account and is case-insensitive.

  • output_path (str) – The Amazon S3 output path. Must be 128 characters or less.

  • job_objective (dict[str, str]) – Defines the objective metric used to measure the predictive quality of an AutoML job. In the format of: {“MetricName”: str}. Available metrics are listed here: https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-metrics-validation.html

  • validation_fraction (float) – A float that specifies the portion of the input dataset to be used for validation. The value should be in (0, 1) range.

  • auto_generate_endpoint_name (bool) – Whether to automatically generate an endpoint name for a one-click Autopilot model deployment. If auto_generate_endpoint_name is set to True, do not specify the endpoint_name.

  • endpoint_name (str) – Specifies the endpoint name to use for a one-click AutoML model deployment if the endpoint name is not generated automatically. Specify the endpoint_name if and only if auto_generate_endpoint_name is set to False.

  • output_kms_key (str) – The AWS KMS encryption key ID for the output data configuration.

  • role (str) – The ARN of the role that is used to create the job and access the data.

  • volume_kms_key (str) – The key used to encrypt stored data.

  • encrypt_inter_container_traffic (bool) – Whether to use traffic encryption between the container layers.

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • tags (Optional[Tags]) – Tags to attach to this specific endpoint.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions.

Returns

AutoMLV2 object.
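Several of the constructor parameters above carry constraints that can be checked locally. The following is a minimal sketch, assuming a hypothetical helper named check_automl_v2_args (not part of the SDK) that collects violations of the documented rules for output_path, validation_fraction, and the two endpoint-name parameters.

```python
# Hypothetical helper; not part of the SageMaker SDK.
def check_automl_v2_args(
    output_path=None,
    validation_fraction=None,
    auto_generate_endpoint_name=None,
    endpoint_name=None,
):
    """Return a list of messages for arguments that break a documented rule."""
    errors = []
    if output_path is not None and len(output_path) > 128:
        errors.append("output_path must be 128 characters or less")
    if validation_fraction is not None and not 0 < validation_fraction < 1:
        errors.append("validation_fraction must be in the (0, 1) range")
    if auto_generate_endpoint_name and endpoint_name is not None:
        errors.append(
            "do not specify endpoint_name when auto_generate_endpoint_name is True"
        )
    if auto_generate_endpoint_name is False and endpoint_name is None:
        errors.append(
            "specify endpoint_name when auto_generate_endpoint_name is False"
        )
    return errors
```

An empty list means none of these particular rules were violated; it does not guarantee that the service will accept the request.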

classmethod from_auto_ml(auto_ml)

Create an AutoMLV2 object from an AutoML object.

This method maps AutoML properties into an AutoMLV2 object, so you can create AutoMLV2 jobs from the existing AutoML objects.

Parameters

auto_ml (sagemaker.automl.automl.AutoML) – An AutoML object from which an AutoMLV2 object will be created.

Return type

AutoMLV2

fit(inputs, wait=True, logs=True, job_name=None)

Create an AutoML Job with the input dataset.

Parameters
  • inputs (LocalAutoMLDataChannel or list(LocalAutoMLDataChannel) or AutoMLDataChannel or list(AutoMLDataChannel)) – The local path or S3 URI where the training data is stored, given as a LocalAutoMLDataChannel object, an AutoMLDataChannel object, or a list of either. If a local path in LocalAutoMLDataChannel is provided, the dataset will be uploaded to an S3 location. A list of AutoMLDataChannel objects specifies the training and validation input sources. Input sources for training and validation must share the same content type and target attribute name. The list must contain at least 1 item and at most 2 items for list[AutoMLDataChannel].

  • wait (bool) – Whether the call should wait until the job completes (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True). If wait is False, logs will be set to False as well.

  • job_name (str) – The job name. If not specified, the estimator generates a default job name, based on the training image name and current timestamp.
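Putting the constructor and fit together, starting a job might look like the sketch below. It uses only the class and parameter names documented on this page; the function name start_tabular_job and its arguments are hypothetical, and calling it requires the SageMaker SDK, AWS credentials, and a role with suitable permissions. The import is placed inside the function so that the definition itself can be loaded without the SDK installed.

```python
def start_tabular_job(role_arn, train_s3_uri, target_column):
    """Start a tabular AutoML V2 job without blocking and return the AutoMLV2 object."""
    # Imported lazily so that defining this function does not require the SDK.
    from sagemaker.automl.automlv2 import (
        AutoMLDataChannel,
        AutoMLTabularConfig,
        AutoMLV2,
    )

    automl = AutoMLV2(
        problem_config=AutoMLTabularConfig(target_attribute_name=target_column),
        role=role_arn,
    )
    training_channel = AutoMLDataChannel(
        s3_data_type="S3Prefix",
        s3_uri=train_s3_uri,
        channel_type="training",
    )
    # With wait=False the call returns once the job is created,
    # and logs are not streamed.
    automl.fit(inputs=[training_channel], wait=False)
    return automl
```

Passing wait=False suits long-running jobs; the returned object can be used later with describe_auto_ml_job or best_candidate.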

classmethod attach(auto_ml_job_name, sagemaker_session=None)

Attach to an existing AutoML job.

Creates and returns an AutoMLV2 instance bound to an existing AutoML job.

Parameters
  • auto_ml_job_name (str) – AutoML job name

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

Returns

An AutoMLV2 instance with the attached AutoML job.

Return type

sagemaker.automl.AutoML

describe_auto_ml_job(job_name=None)

Returns the job description of an AutoML job for the given job name.

Parameters

job_name (str) – The name of the AutoML job to describe. If None, will use object’s latest_auto_ml_job name.

Returns

A dictionary response with the AutoML Job description.

Return type

dict

best_candidate(job_name=None)

Returns the best candidate of an AutoML job for a given name.

Parameters

job_name (str) – The name of the AutoML job. If None, object’s _current_auto_ml_job_name will be used.

Returns

A dictionary with information of the best candidate.

Return type

dict

list_candidates(job_name=None, status_equals=None, candidate_name=None, candidate_arn=None, sort_order=None, sort_by=None, max_results=None)

Returns the list of candidates of an AutoML job for a given name.

Parameters
  • job_name (str) – The name of the AutoML job. If None, will use object’s _current_job name.

  • status_equals (str) – Filters the results by candidate status. Valid values: “Completed”, “InProgress”, “Failed”, “Stopped”, “Stopping”.

  • candidate_name (str) – The name of a specified candidate to list. Defaults to None.

  • candidate_arn (str) – The ARN of a specified candidate to list. Defaults to None.

  • sort_order (str) – The order in which the candidates are listed in the results. Defaults to None.

  • sort_by (str) – The value that the candidates are sorted by. Defaults to None.

  • max_results (int) – The number of candidates to list in the results, between 1 and 100. Defaults to None. If None, all the candidates are returned.

Returns

A list of dictionaries with candidates information.

Return type

list
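A common use of list_candidates is to fetch only the candidates that finished training. The following is a minimal sketch, assuming a hypothetical helper named completed_candidates (not part of the SDK) and an automl argument that is an AutoMLV2 object with a job already started or attached.

```python
# Hypothetical helper; not part of the SageMaker SDK.
def completed_candidates(automl, limit=10):
    """List up to `limit` completed candidates for the job bound to `automl`."""
    # max_results is documented as accepting values between 1 and 100.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return automl.list_candidates(status_equals="Completed", max_results=limit)
```

Checking the limit locally avoids a round trip for a value the service would not accept.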

create_model(name, sagemaker_session=None, candidate=None, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None, inference_response_keys=None)

Creates a model from a given candidate or the best candidate from the job.

Parameters
  • name (str) – The pipeline model name.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

  • candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False

  • model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked.

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.

  • inference_response_keys (list) – List of keys for response content. The order of the keys will dictate the content order in the response.

Returns

PipelineModel object.

deploy(initial_instance_count, instance_type, serializer=None, deserializer=None, candidate=None, sagemaker_session=None, name=None, endpoint_name=None, tags=None, wait=True, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None, inference_response_keys=None, volume_size=None, model_data_download_timeout=None, container_startup_health_check_timeout=None)

Deploy a candidate to a SageMaker Inference Pipeline.

Parameters
  • initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.

  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.

  • serializer (BaseSerializer) – A serializer object, used to encode data for an inference endpoint (default: None). If serializer is not None, then serializer will override the default serializer. The default serializer is set by the predictor_cls.

  • deserializer (BaseDeserializer) – A deserializer object, used to decode data from an inference endpoint (default: None). If deserializer is not None, then deserializer will override the default deserializer. The default deserializer is set by the predictor_cls.

  • candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

  • name (str) – The pipeline model name. If None, a default model name will be selected on each deploy.

  • endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.

  • tags (Optional[Tags]) – The list of tags to attach to this specific endpoint.

  • wait (bool) – Whether the call should wait until the deployment of model completes (default: True).

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False

  • model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked.

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.

  • inference_response_keys (list) – List of keys for response content. The order of the keys will dictate the content order in the response.

  • volume_size (int) – The size, in GB, of the ML storage volume attached to each individual inference instance associated with the production variant. Currently only Amazon EBS gp2 storage volumes are supported.

  • model_data_download_timeout (int) – The timeout value, in seconds, to download and extract model data from Amazon S3 to the individual inference instance associated with this production variant.

  • container_startup_health_check_timeout (int) – The timeout value, in seconds, for your inference container to pass health check by SageMaker Hosting. For more information about health check see: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests

Returns

If predictor_cls is specified, the invocation of self.predictor_cls on the created endpoint name. Otherwise, None.

Return type

callable[string, sagemaker.session.Session] or None
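Deploying the best candidate combines best_candidate and deploy. The following is a minimal sketch, assuming a hypothetical helper named deploy_best_candidate (not part of the SDK), an automl argument that is an AutoMLV2 object whose job has completed, and "ml.m5.xlarge" as an illustrative instance type.

```python
# Hypothetical helper; not part of the SageMaker SDK.
def deploy_best_candidate(automl, endpoint_name, instance_type="ml.m5.xlarge"):
    """Deploy the job's best candidate to one instance and return its description."""
    # best_candidate() returns a dict; deploy() accepts a dict candidate
    # and creates a CandidateEstimator from it.
    best = automl.best_candidate()
    automl.deploy(
        initial_instance_count=1,
        instance_type=instance_type,  # illustrative default, adjust as needed
        candidate=best,
        endpoint_name=endpoint_name,
        wait=True,
    )
    return best
```

Passing the candidate explicitly makes it clear which model ends up behind the endpoint, even though deploy falls back to the best candidate when candidate is None.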

classmethod validate_and_update_inference_response(inference_containers, inference_response_keys)

Validates the requested inference keys and updates response content.

On validation, also updates the inference containers to emit appropriate response content in the inference response.

Parameters
  • inference_containers (list) – list of inference containers

  • inference_response_keys (list) – list of inference response keys

Raises

ValueError – if one or more of inference_response_keys are unsupported by the model

class sagemaker.automl.automlv2.AutoMLJobV2(sagemaker_session, job_name, inputs)

Bases: _Job

A class for interacting with CreateAutoMLJobV2 API.

classmethod start_new(auto_ml, inputs)

Create a new Amazon SageMaker AutoMLV2 job from auto_ml_v2 object.

Parameters
Returns

Constructed object that captures all information about the started AutoMLV2 job.

Return type

sagemaker.automl.AutoMLJobV2

describe()

Returns a response from the DescribeAutoMLJobV2 API call.

wait(logs=True)

Wait for the AutoML job to finish.

Parameters

logs (bool) – Whether to output logs.