AutoMLV2
A class for SageMaker AutoML V2 Jobs.
- class sagemaker.automl.automlv2.AutoMLTabularConfig(target_attribute_name, algorithms_config=None, feature_specification_s3_uri=None, generate_candidate_definitions_only=None, mode=None, problem_type=None, sample_weight_attribute_name=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)
Bases: object
Configuration of a tabular problem.
- Parameters
target_attribute_name (str) – The name of the column in the tabular dataset that contains the values to be predicted.
algorithms_config (list(str)) – The selection of algorithms run on a dataset to train the model candidates of an Autopilot job.
feature_specification_s3_uri (str) – A URL to the Amazon S3 data source containing selected features and specified data types from the input data source of an AutoML job.
generate_candidate_definitions_only (bool) – Whether to generate possible candidates without training the models.
mode (str) – The method that the AutoML job uses to train the model. Valid values: AUTO, ENSEMBLING, or HYPERPARAMETER_TUNING.
problem_type (str) – The type of supervised learning problem for the candidates. Available problem types are BinaryClassification, MulticlassClassification, and Regression.
sample_weight_attribute_name (str) – The name of the dataset column that represents sample weights.
max_candidates (int) – The maximum number of training jobs allowed to run.
max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.
max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of the AutoML job.
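The valid values for mode and problem_type listed above can be checked locally before a job is submitted. The following is a minimal sketch of a hypothetical helper, check_tabular_options, which is not part of the SageMaker SDK; it only encodes the value lists given in this section.

```python
from typing import Optional

# Valid values as listed in the AutoMLTabularConfig parameter descriptions.
VALID_MODES = {"AUTO", "ENSEMBLING", "HYPERPARAMETER_TUNING"}
VALID_PROBLEM_TYPES = {"BinaryClassification", "MulticlassClassification", "Regression"}


def check_tabular_options(mode: Optional[str] = None, problem_type: Optional[str] = None) -> None:
    """Hypothetical pre-flight check; raises ValueError on an unsupported value.

    None is accepted for both arguments, mirroring the optional constructor parameters.
    """
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"Unsupported mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    if problem_type is not None and problem_type not in VALID_PROBLEM_TYPES:
        raise ValueError(
            f"Unsupported problem_type {problem_type!r}; "
            f"expected one of {sorted(VALID_PROBLEM_TYPES)}"
        )


check_tabular_options(mode="ENSEMBLING", problem_type="Regression")  # passes silently
```

The same values would then be passed as the mode and problem_type arguments of AutoMLTabularConfig(target_attribute_name=...). Note that the values are case-sensitive strings, so "auto" is rejected.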
- classmethod from_response_dict(api_problem_type_config)
Convert the API response to the native object.
- Parameters
api_problem_type_config (dict) – The problem type configuration from the API response.
- to_request_dict()
Convert the native object to the API request format.
- class sagemaker.automl.automlv2.AutoMLImageClassificationConfig(max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)
Bases: object
Configuration of an image classification problem.
- Parameters
max_candidates (int) – The maximum number of training jobs allowed to run.
max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.
max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of the AutoML job.
- classmethod from_response_dict(api_problem_type_config)
Convert the API response to the native object.
- Parameters
api_problem_type_config (dict) – The problem type configuration from the API response.
- to_request_dict()
Convert the native object to the API request format.
- class sagemaker.automl.automlv2.AutoMLTextClassificationConfig(content_column, target_label_column, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)
Bases: object
Configuration of a text classification problem.
- Parameters
content_column (str) – The name of the column used to provide the text to be classified. It should not be the same as the target label column.
target_label_column (str) – The name of the column used to provide the class labels. It should not be the same as the content column.
max_candidates (int) – The maximum number of training jobs allowed to run.
max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.
max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of the AutoML job.
- classmethod from_response_dict(api_problem_type_config)
Convert the API response to the native object.
- Parameters
api_problem_type_config (dict) – The problem type configuration from the API response.
- to_request_dict()
Convert the native object to the API request format.
- class sagemaker.automl.automlv2.AutoMLTextGenerationConfig(base_model_name=None, accept_eula=None, text_generation_hyper_params=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)
Bases: object
Configuration of a text generation problem.
- Parameters
base_model_name (str) – The name of the base model to fine-tune. Autopilot supports fine-tuning a variety of large language models. For the list of supported models, see Text generation models supporting fine-tuning in Autopilot: https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-llms-finetuning-models.html#autopilot-llms-finetuning-supported-llms. If base_model_name is not provided, the default model is Falcon7BInstruct.
accept_eula (bool) – Specifies agreement to the model end-user license agreement (EULA). accept_eula must be explicitly set to True to accept the EULA that the model requires; for example, LLAMA2 requires accepting its EULA. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using a model.
text_generation_hyper_params (dict) –
The hyperparameters used to configure and optimize the learning process of the base model. You can set any combination of the following hyperparameters for all base models. Supported parameters are:
epochCount: The number of times the model goes through the entire training dataset.
batchSize: The number of data samples used in each iteration of training.
learningRate: The step size at which a model’s parameters are updated during training.
learningRateWarmupSteps: The number of training steps during which the learning rate gradually increases before reaching its target or maximum value.
max_candidates (int) – The maximum number of training jobs allowed to run.
max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.
max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of the AutoML job.
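The text_generation_hyper_params dict accepts only the four hyperparameter names listed above. The sketch below is a hypothetical helper, not an SDK function, that rejects unknown names and converts the values to strings. The string conversion is an assumption that the underlying API expects a string-to-string map; check the CreateAutoMLJobV2 API reference before relying on it.

```python
from typing import Dict, Union

# The four hyperparameter names listed for text_generation_hyper_params.
SUPPORTED_HYPER_PARAMS = {"epochCount", "batchSize", "learningRate", "learningRateWarmupSteps"}


def build_text_generation_hyper_params(**params: Union[int, float, str]) -> Dict[str, str]:
    """Hypothetical helper: reject unknown names and stringify the values.

    Stringifying assumes the API expects a string-to-string map.
    """
    unknown = set(params) - SUPPORTED_HYPER_PARAMS
    if unknown:
        raise ValueError(f"Unsupported hyperparameters: {sorted(unknown)}")
    return {name: str(value) for name, value in params.items()}


hyper_params = build_text_generation_hyper_params(epochCount=3, batchSize=8, learningRate=0.0001)
# hyper_params == {"epochCount": "3", "batchSize": "8", "learningRate": "0.0001"}
```

The resulting dict would be passed as the text_generation_hyper_params argument of AutoMLTextGenerationConfig. Any combination of the four names is allowed, so an empty call returns an empty dict.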
- classmethod from_response_dict(api_problem_type_config)
Convert the API response to the native object.
- Parameters
api_problem_type_config (dict) – The problem type configuration from the API response.
- to_request_dict()
Convert the native object to the API request format.
- class sagemaker.automl.automlv2.AutoMLTimeSeriesForecastingConfig(forecast_frequency, forecast_horizon, item_identifier_attribute_name, target_attribute_name, timestamp_attribute_name, grouping_attribute_names=None, feature_specification_s3_uri=None, forecast_quantiles=None, holiday_config=None, aggregation=None, filling=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, max_total_job_runtime_in_seconds=None)
Bases: object
Configuration of a time series forecasting problem.
- Parameters
forecast_frequency (str) – The frequency of predictions in a forecast. Valid intervals are an integer followed by Y (Year), M (Month), W (Week), D (Day), H (Hour), and min (Minute). For example, 1D indicates every day and 15min indicates every 15 minutes. The value of a frequency must not overlap with the next larger frequency. For example, you must use a frequency of 1H instead of 60min.
forecast_horizon (int) – The number of time-steps that the model predicts. The forecast horizon is also called the prediction length. The maximum forecast horizon is the lesser of 500 time-steps or 1/4 of the time-steps in the dataset.
item_identifier_attribute_name (str) – The name of the column that represents the set of item identifiers for which you want to predict the target value.
target_attribute_name (str) – The name of the column representing the target variable that you want to predict for each item in your dataset. The data type of the target variable must be numerical.
timestamp_attribute_name (str) – The name of the column indicating a point in time at which the target value of a given item is recorded.
grouping_attribute_names (list(str)) – A set of column names that can be grouped with the item identifier column to create a composite key for which a target value is predicted.
feature_specification_s3_uri (str) – A URL to the Amazon S3 data source containing selected features and specified data types from the input data source of an AutoML job.
forecast_quantiles (list(str)) – The quantiles used to train the model for forecasts at a specified quantile. You can specify quantiles from 0.01 (p1) to 0.99 (p99), in increments of 0.01 or higher. Up to five forecast quantiles can be specified. When forecast_quantiles is not provided, the AutoML job uses the quantiles p10, p50, and p90 by default.
holiday_config (list(str)) – The country code for the holiday calendar. For the list of public holiday calendars supported by AutoML job V2, see Country Codes: https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-timeseries-forecasting-holiday-calendars.html#holiday-country-codes. Use the country code corresponding to the country of your choice.
aggregation (dict) – A key-value pair defining the aggregation method for a column, where the key is the column name and the value is the aggregation method. Aggregation is only supported for the target column. The supported aggregation methods are sum (default), avg, first, min, and max.
filling (dict) –
A key-value pair defining the filling method for a column, where the key is the column name and the value is an object that defines the filling logic. You can specify multiple filling methods for a single column. The supported filling methods and their corresponding options are:
frontfill: none (Supported only for target column)
middlefill: zero, value, median, mean, min, max
backfill: zero, value, median, mean, min, max
futurefill: zero, value, median, mean, min, max
To set a filling method to a specific value, set the fill parameter to the chosen filling method value (for example, “backfill”: “value”), and define the filling value in an additional parameter whose name is the filling method followed by the suffix “_value”. For example, to set backfill to a value of 2, you must include two parameters: “backfill”: “value” and “backfill_value”: “2”.
max_candidates (int) – The maximum number of training jobs allowed to run.
max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.
max_total_job_runtime_in_seconds (int) – The maximum total runtime, in seconds, of the AutoML job.
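The forecast_frequency, forecast_horizon, and forecast_quantiles rules above are simple enough to check locally. The sketch below uses three hypothetical helpers that are not part of the SDK. Two points are assumptions: the service's exact overlap rule (only the 60min versus 1H case is documented, so treating any exact multiple of the next larger unit as an overlap is a generalization) and rounding 1/4 of the time-steps down.

```python
import re
from typing import List, Optional, Tuple

# An integer followed by Y, M, W, D, H, or min, as listed for forecast_frequency.
_FREQ_RE = re.compile(r"([1-9][0-9]*)(min|Y|M|W|D|H)")

# Assumption: a frequency overlaps the next larger unit when it is an exact
# multiple of it. Only 60min -> 1H is documented; the other rows generalize it.
_NEXT_UNIT = {"min": ("H", 60), "H": ("D", 24), "D": ("W", 7), "M": ("Y", 12)}

# Quantile strings p1 .. p99 (0.01 .. 0.99).
_QUANTILE_RE = re.compile(r"p([1-9][0-9]?)")


def parse_forecast_frequency(freq: str) -> Tuple[int, str]:
    """Split a frequency such as "15min" into (15, "min"), rejecting overlaps."""
    match = _FREQ_RE.fullmatch(freq)
    if match is None:
        raise ValueError(f"Invalid frequency {freq!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit in _NEXT_UNIT:
        larger, factor = _NEXT_UNIT[unit]
        if value % factor == 0:
            raise ValueError(f"{freq!r} overlaps {larger}; use {value // factor}{larger} instead")
    return value, unit


def max_forecast_horizon(num_time_steps: int) -> int:
    """The lesser of 500 time-steps or 1/4 of the dataset's time-steps (rounded down)."""
    return min(500, num_time_steps // 4)


def check_forecast_quantiles(quantiles: Optional[List[str]]) -> List[str]:
    """Validate one to five distinct quantiles between p1 and p99; None yields the default."""
    if quantiles is None:
        return ["p10", "p50", "p90"]
    if not 1 <= len(quantiles) <= 5 or len(set(quantiles)) != len(quantiles):
        raise ValueError("Specify one to five distinct forecast quantiles")
    for q in quantiles:
        if _QUANTILE_RE.fullmatch(q) is None:
            raise ValueError(f"Invalid quantile {q!r}: expected p1 through p99")
    return list(quantiles)


print(parse_forecast_frequency("15min"))  # (15, 'min')
print(max_forecast_horizon(1000))         # 250
```

With these checks, "60min" is rejected with a suggestion to use "1H", and a dataset with 1,000 time-steps allows a forecast horizon of at most 250.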
- classmethod from_response_dict(api_problem_type_config)
Convert the API response to the native object.
- Parameters
api_problem_type_config (dict) – The problem type configuration from the API response.
- to_request_dict()
Convert the native object to the API request format.
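The filling and aggregation parameters of AutoMLTimeSeriesForecastingConfig use a nested dict layout: column name on the outside, method settings on the inside, with a “<method>_value” entry when the option is “value”. The following is a minimal sketch with hypothetical helpers (fill_spec, check_time_series_columns) and hypothetical column names (“demand”, “price”); it is not part of the SDK and only encodes the rules listed for those two parameters.

```python
from typing import Dict, Optional, Union

# Filling methods and their options, as listed for the filling parameter.
_FILL_OPTIONS = {
    "frontfill": {"none"},  # supported only for the target column
    "middlefill": {"zero", "value", "median", "mean", "min", "max"},
    "backfill": {"zero", "value", "median", "mean", "min", "max"},
    "futurefill": {"zero", "value", "median", "mean", "min", "max"},
}
_AGGREGATION_METHODS = {"sum", "avg", "first", "min", "max"}


def fill_spec(method: str, option: str, value: Optional[Union[int, float]] = None) -> Dict[str, str]:
    """Build one filling entry, adding "<method>_value" when the option is "value"."""
    if option not in _FILL_OPTIONS.get(method, set()):
        raise ValueError(f"{option!r} is not a supported option for {method!r}")
    spec = {method: option}
    if option == "value":
        if value is None:
            raise ValueError("A fill value is required when the option is 'value'")
        spec[f"{method}_value"] = str(value)
    return spec


def check_time_series_columns(filling: Dict[str, Dict[str, str]],
                              aggregation: Dict[str, str],
                              target_attribute_name: str) -> None:
    """Enforce the target-column-only rules for frontfill and aggregation."""
    for column, spec in filling.items():
        if "frontfill" in spec and column != target_attribute_name:
            raise ValueError(f"frontfill is only supported for the target column, not {column!r}")
    for column, method in aggregation.items():
        if column != target_attribute_name:
            raise ValueError(f"Aggregation is only supported for the target column, not {column!r}")
        if method not in _AGGREGATION_METHODS:
            raise ValueError(f"Unsupported aggregation method {method!r}")


# "demand" and "price" are hypothetical column names; "demand" is the target.
filling = {
    "demand": {**fill_spec("backfill", "value", 2), **fill_spec("futurefill", "zero")},
    "price": fill_spec("middlefill", "median"),
}
check_time_series_columns(filling, {"demand": "avg"}, target_attribute_name="demand")
print(filling["demand"])  # {'backfill': 'value', 'backfill_value': '2', 'futurefill': 'zero'}
```

Merging several fill_spec results with ** is one way to specify multiple filling methods for a single column, as the filling description allows.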
- class sagemaker.automl.automlv2.AutoMLDataChannel(s3_data_type, s3_uri, channel_type=None, compression_type=None, content_type=None)
Bases: object
Class to represent the data source used for model training.
- Parameters
s3_data_type (str) – The data type for S3 data source. Valid values: ManifestFile, AugmentedManifestFile or S3Prefix.
s3_uri (str) – The URL to the Amazon S3 data source. The URI refers to the Amazon S3 prefix or ManifestFile, depending on the data type.
channel_type (str) – The type of channel. Valid values: training or validation. Defines whether the data are used for training or validation. The default value is training. Channels for training and validation must share the same content_type.
compression_type (str) – The compression type for input data. Gzip or None.
content_type (str) – The content type of the data from the input source.
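The AutoMLDataChannel arguments each have a small set of valid values. The sketch below is a hypothetical check_data_channel helper, not an SDK function, that applies them and resolves the documented “training” default. Because the text says only “Gzip or None”, the sketch accepts both Python None and the string "None" for compression_type; which one the SDK expects is an assumption to verify.

```python
from typing import Optional

# Valid values as listed for AutoMLDataChannel.
VALID_S3_DATA_TYPES = {"ManifestFile", "AugmentedManifestFile", "S3Prefix"}
VALID_CHANNEL_TYPES = {"training", "validation"}
# Assumption: accept both Python None and the string "None".
VALID_COMPRESSION_TYPES = {None, "None", "Gzip"}


def check_data_channel(s3_data_type: str,
                       channel_type: Optional[str] = None,
                       compression_type: Optional[str] = None) -> str:
    """Hypothetical pre-flight check; returns the effective channel type."""
    if s3_data_type not in VALID_S3_DATA_TYPES:
        raise ValueError(f"Unsupported s3_data_type {s3_data_type!r}")
    # "training" is the documented default when channel_type is omitted.
    effective_type = "training" if channel_type is None else channel_type
    if effective_type not in VALID_CHANNEL_TYPES:
        raise ValueError(f"Unsupported channel_type {channel_type!r}")
    if compression_type not in VALID_COMPRESSION_TYPES:
        raise ValueError(f"Unsupported compression_type {compression_type!r}")
    return effective_type


print(check_data_channel("S3Prefix"))  # training
```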
- classmethod from_response_dict(data_channel)
Convert the API response to the native object.
- Parameters
data_channel (dict) – The data channel configuration from the API response.
- to_request_dict()
Convert the native object to the API request format.
- class sagemaker.automl.automlv2.LocalAutoMLDataChannel(data_type, path, channel_type=None, compression_type=None, content_type=None)
Bases: object
Class to represent a local data source that will be uploaded to S3.
- Parameters
data_type (str) – The data type for S3 data source. Valid values: ManifestFile, AugmentedManifestFile or S3Prefix.
path (str) – The path to the local data which will be uploaded to S3.
channel_type (str) – The type of channel. Valid values: training or validation. Defines whether the data are used for training or validation. The default value is training. Channels for training and validation must share the same content_type.
compression_type (str) – The compression type for input data. Gzip or None.
content_type (str) – The content type of the data from the input source.
- class sagemaker.automl.automlv2.AutoMLV2(problem_config, base_job_name=None, output_path=None, job_objective=None, validation_fraction=None, auto_generate_endpoint_name=None, endpoint_name=None, output_kms_key=None, role=None, volume_kms_key=None, encrypt_inter_container_traffic=None, vpc_config=None, tags=None, sagemaker_session=None)
Bases: object
A class for creating and interacting with SageMaker AutoMLV2 jobs.
Initialize an AutoMLV2 object.
- Parameters
problem_config (object) –
A collection of settings specific to the problem type used to configure an AutoML job V2. Exactly one of the following configurations must be provided. Supported problem types are:
Image Classification (sagemaker.automl.automlv2.AutoMLImageClassificationConfig),
Tabular (sagemaker.automl.automlv2.AutoMLTabularConfig),
Text Classification (sagemaker.automl.automlv2.AutoMLTextClassificationConfig),
Text Generation (sagemaker.automl.automlv2.AutoMLTextGenerationConfig),
Time Series Forecasting (sagemaker.automl.automlv2.AutoMLTimeSeriesForecastingConfig).
base_job_name (str) – The name of the AutoML job. The name must be unique within the AWS account and is case-insensitive.
output_path (str) – The Amazon S3 output path. Must be 128 characters or less.
job_objective (dict[str, str]) – Defines the objective metric used to measure the predictive quality of an AutoML job. In the format of: {“MetricName”: str}. Available metrics are listed here: https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-metrics-validation.html
validation_fraction (float) – A float that specifies the portion of the input dataset to be used for validation. The value should be in (0, 1) range.
auto_generate_endpoint_name (bool) – Whether to automatically generate an endpoint name for a one-click Autopilot model deployment. If auto_generate_endpoint_name is set to True, do not specify endpoint_name.
endpoint_name (str) – Specifies the endpoint name to use for a one-click AutoML model deployment if the endpoint name is not generated automatically. Specify endpoint_name if and only if auto_generate_endpoint_name is set to False.
output_kms_key (str) – The AWS KMS encryption key ID for the output data configuration.
role (str) – The ARN of the role that is used to create the job and access the data.
volume_kms_key (str) – The key used to encrypt stored data.
encrypt_inter_container_traffic (bool) – Whether to use traffic encryption between the container layers.
vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.
tags (Optional[Tags]) – Tags to attach to this specific endpoint.
sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions.
- Returns
AutoMLV2 object.
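Several AutoMLV2 constructor parameters carry constraints in their descriptions: the output path length, the job_objective shape, the validation_fraction range, and the pairing of auto_generate_endpoint_name with endpoint_name. The following is a minimal sketch of a hypothetical helper, check_automlv2_args, that is not part of the SDK. It leaves the case where auto_generate_endpoint_name is None unchecked, since the description does not say how that case is treated. The bucket name is a placeholder and "F1" is one example metric name; see the metrics list linked above.

```python
from typing import Dict, Optional


def check_automlv2_args(output_path: Optional[str] = None,
                        job_objective: Optional[Dict[str, str]] = None,
                        validation_fraction: Optional[float] = None,
                        auto_generate_endpoint_name: Optional[bool] = None,
                        endpoint_name: Optional[str] = None) -> None:
    """Hypothetical checks mirroring the AutoMLV2 parameter descriptions."""
    if output_path is not None and len(output_path) > 128:
        raise ValueError("output_path must be 128 characters or less")
    if job_objective is not None and list(job_objective) != ["MetricName"]:
        raise ValueError('job_objective must have the form {"MetricName": str}')
    if validation_fraction is not None and not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be in the range (0, 1)")
    if auto_generate_endpoint_name and endpoint_name is not None:
        raise ValueError("Do not specify endpoint_name when auto_generate_endpoint_name is True")
    if auto_generate_endpoint_name is False and endpoint_name is None:
        raise ValueError("Specify endpoint_name when auto_generate_endpoint_name is False")


check_automlv2_args(
    output_path="s3://amzn-s3-demo-bucket/automl-output",  # placeholder bucket
    job_objective={"MetricName": "F1"},
    validation_fraction=0.2,
    auto_generate_endpoint_name=True,
)
```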
- classmethod from_auto_ml(auto_ml)
Create an AutoMLV2 object from an AutoML object.
This method maps AutoML properties into an AutoMLV2 object, so you can create AutoMLV2 jobs from the existing AutoML objects.
- Parameters
auto_ml (sagemaker.automl.automl.AutoML) – An AutoML object from which an AutoMLV2 object will be created.
- Return type
sagemaker.automl.automlv2.AutoMLV2
- fit(inputs, wait=True, logs=True, job_name=None)
Create an AutoML Job with the input dataset.
- Parameters
inputs (LocalAutoMLDataChannel or list(LocalAutoMLDataChannel) or AutoMLDataChannel or list(AutoMLDataChannel)) – The training data, given as a local path or S3 URI in a LocalAutoMLDataChannel, an AutoMLDataChannel object, or a list of AutoMLDataChannel objects. If a local path is provided in a LocalAutoMLDataChannel, the dataset is uploaded to an S3 location. A list of AutoMLDataChannel objects specifies the training and validation input sources. Input sources for training and validation must share the same content type and target attribute name. A list of AutoMLDataChannel objects must contain at least 1 and at most 2 items.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True). If wait is False, logs is set to False as well.
job_name (str) – The job name. If not specified, a default job name is generated based on the training image name and the current timestamp.
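The list constraints on fit(inputs=...) and the wait/logs interaction can be expressed as a few checks. The sketch below uses a hypothetical ChannelSpec dataclass that holds only the channel_type and content_type fields; it is not the SDK's AutoMLDataChannel class. Requiring exactly one training and one validation channel when two are given is an assumption drawn from the description, and "text/csv" is only an example content type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChannelSpec:
    """Hypothetical stand-in holding only the fields these checks need."""
    channel_type: Optional[str] = None  # None means "training", the documented default
    content_type: Optional[str] = None


def check_fit_inputs(channels: List[ChannelSpec]) -> None:
    """Apply the list constraints described for fit(inputs=...)."""
    if not 1 <= len(channels) <= 2:
        raise ValueError("inputs must contain one or two data channels")
    types = sorted(c.channel_type or "training" for c in channels)
    # Assumption: two channels means exactly one training and one validation channel.
    if len(channels) == 2 and types != ["training", "validation"]:
        raise ValueError("Two channels must be one training and one validation channel")
    if len({c.content_type for c in channels}) > 1:
        raise ValueError("Training and validation channels must share the same content_type")


def effective_logs(wait: bool, logs: bool) -> bool:
    """logs is forced to False when wait is False."""
    return wait and logs


check_fit_inputs([ChannelSpec("training", "text/csv"), ChannelSpec("validation", "text/csv")])
```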
- classmethod attach(auto_ml_job_name, sagemaker_session=None)
Attach to an existing AutoML job.
Creates and returns an AutoMLV2 object bound to an existing AutoML job.
- Parameters
auto_ml_job_name (str) – AutoML job name
sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.
- Returns
An AutoMLV2 instance with the attached AutoML job.
- Return type
sagemaker.automl.automlv2.AutoMLV2
- describe_auto_ml_job(job_name=None)
Returns the job description of an AutoML job for the given job name.
- best_candidate(job_name=None)
Returns the best candidate of an AutoML job for the given job name.
- list_candidates(job_name=None, status_equals=None, candidate_name=None, candidate_arn=None, sort_order=None, sort_by=None, max_results=None)
Returns the list of candidates of an AutoML job for the given job name.
- Parameters
job_name (str) – The name of the AutoML job. If None, the name of the object’s current job (_current_job) is used.
status_equals (str) – Filter the result with candidate status, values could be “Completed”, “InProgress”, “Failed”, “Stopped”, “Stopping”
candidate_name (str) – The name of a specified candidate to list. Defaults to None.
candidate_arn (str) – The ARN of a specified candidate to list. Defaults to None.
sort_order (str) – The order in which the candidates are listed in the result. Defaults to None.
sort_by (str) – The value by which the candidates are sorted. Defaults to None.
max_results (int) – The maximum number of candidates to list, between 1 and 100. Defaults to None. If None, all candidates are returned.
- Returns
A list of dictionaries with candidates information.
- Return type
list(dict)
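The status_equals and max_results filters of list_candidates have documented value ranges. The sketch below is a hypothetical check_list_candidates_args helper, not an SDK function, that validates them before the call; sort_order and sort_by are left out because their valid values are not listed here.

```python
from typing import Optional

# Status values listed for status_equals.
VALID_CANDIDATE_STATUSES = {"Completed", "InProgress", "Failed", "Stopped", "Stopping"}


def check_list_candidates_args(status_equals: Optional[str] = None,
                               max_results: Optional[int] = None) -> None:
    """Hypothetical pre-flight check for list_candidates filters."""
    if status_equals is not None and status_equals not in VALID_CANDIDATE_STATUSES:
        raise ValueError(f"Unsupported status_equals {status_equals!r}")
    if max_results is not None and not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100, or None to return all candidates")


check_list_candidates_args(status_equals="Completed", max_results=10)
```

The same values would then be passed as list_candidates(status_equals="Completed", max_results=10).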
- create_model(name, sagemaker_session=None, candidate=None, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None, inference_response_keys=None)
Creates a model from a given candidate or the best candidate from the job.
- Parameters
name (str) – The pipeline model name.
sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.
candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.
vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.
enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False
model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked.
predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.
inference_response_keys (list) – List of keys for response content. The order of the keys will dictate the content order in the response.
- Returns
PipelineModel object.
- deploy(initial_instance_count, instance_type, serializer=None, deserializer=None, candidate=None, sagemaker_session=None, name=None, endpoint_name=None, tags=None, wait=True, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None, inference_response_keys=None, volume_size=None, model_data_download_timeout=None, container_startup_health_check_timeout=None)
Deploy a candidate to a SageMaker Inference Pipeline.
- Parameters
initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.
instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
serializer (BaseSerializer) – A serializer object, used to encode data for an inference endpoint (default: None). If serializer is not None, then serializer will override the default serializer. The default serializer is set by the predictor_cls.
deserializer (BaseDeserializer) – A deserializer object, used to decode data from an inference endpoint (default: None). If deserializer is not None, then deserializer will override the default deserializer. The default deserializer is set by the predictor_cls.
candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.
sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.
name (str) – The pipeline model name. If None, a default model name will be selected on each deploy.
endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
tags (Optional[Tags]) – The list of tags to attach to this specific endpoint.
wait (bool) – Whether the call should wait until the deployment of model completes (default: True).
vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.
enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False
model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked.
predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.
inference_response_keys (list) – List of keys for response content. The order of the keys will dictate the content order in the response.
volume_size (int) – The size, in GB, of the ML storage volume attached to each inference instance associated with the production variant. Currently, only Amazon EBS gp2 storage volumes are supported.
model_data_download_timeout (int) – The timeout value, in seconds, to download and extract model data from Amazon S3 to the individual inference instance associated with this production variant.
container_startup_health_check_timeout (int) – The timeout value, in seconds, for your inference container to pass health check by SageMaker Hosting. For more information about health check see: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests
- Returns
If predictor_cls is specified, the result of invoking self.predictor_cls on the created endpoint name. Otherwise, None.
- Return type
callable[string, sagemaker.session.Session] or None
- classmethod validate_and_update_inference_response(inference_containers, inference_response_keys)
Validates the requested inference keys and updates response content.
On validation, also updates the inference containers to emit appropriate response content in the inference response.
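- Parameters
inference_containers (list) – The inference containers to validate and update.
inference_response_keys (list) – The requested inference response keys.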
- Raises
ValueError – if one or more of inference_response_keys are unsupported by the model
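The validation half of this method, rejecting unsupported keys while keeping the requested order, can be sketched as below. This is a hypothetical illustration, not the SDK's implementation: it does not update any inference containers, and the supported key names are placeholders chosen for the example.

```python
from typing import Iterable, List


def validate_response_keys(requested: List[str], supported: Iterable[str]) -> List[str]:
    """Raise ValueError for unsupported keys; keep the caller's order otherwise."""
    supported_set = set(supported)
    unsupported = [key for key in requested if key not in supported_set]
    if unsupported:
        raise ValueError(f"Requested inference response keys are unsupported: {unsupported}")
    # The order of the returned keys dictates the content order in the response.
    return list(requested)


# Placeholder supported keys, for illustration only.
supported = ["predicted_label", "probability"]
print(validate_response_keys(["probability", "predicted_label"], supported))
# ['probability', 'predicted_label']
```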
- class sagemaker.automl.automlv2.AutoMLJobV2(sagemaker_session, job_name, inputs)
Bases: _Job
A class for interacting with the CreateAutoMLJobV2 API.
- classmethod start_new(auto_ml, inputs)
Create a new Amazon SageMaker AutoMLV2 job from an AutoMLV2 object.
- Parameters
auto_ml (sagemaker.automl.AutoMLV2) – AutoMLV2 object created by the user.
inputs (AutoMLDataChannel or list[AutoMLDataChannel]) – Parameters used when calling fit().
- Returns
Constructed object that captures all information about the started AutoMLV2 job.
- Return type
sagemaker.automl.AutoMLJobV2
- describe()
Returns a response from the DescribeAutoMLJobV2 API call.