AutoML

A class for SageMaker AutoML Jobs.

class sagemaker.automl.automl.AutoMLInput(inputs, target_attribute_name, compression=None, channel_type=None, content_type=None, s3_data_type=None, sample_weight_attribute_name=None)

Bases: object

Accepts parameters that specify an S3 input for an AutoML job.

Provides a method to turn those parameters into a dictionary.

Converts an S3 URI or a list of S3 URIs to an AutoMLInput object.

Parameters
  • inputs (str, list[str], PipelineVariable) – A string, a list of strings, or a PipelineVariable that points to the S3 location(s) where input data is stored.

  • target_attribute_name (str, PipelineVariable) – The target attribute name for regression or classification.

  • compression (str, PipelineVariable) – The compression type, if the training data is compressed. The default value is None.

  • channel_type (str, PipelineVariable) – The channel type, an enum that specifies whether the input resource is for training or validation. Valid values: training or validation.

  • content_type (str, PipelineVariable) – The content type of the data from the input source.

  • s3_data_type (str, PipelineVariable) – The data type for S3 data source. Valid values: ManifestFile or S3Prefix.

  • sample_weight_attribute_name (str, PipelineVariable) – The name of the dataset column representing sample weights.

to_request_dict()

Generates a request dictionary using the parameters provided to the class.
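As a rough illustration of how the parameters above fit together, the sketch below checks and collects keyword arguments for AutoMLInput without calling the SDK. The helper name automl_input_kwargs, the bucket URI, and the column name are hypothetical; only the parameter names and the valid values for channel_type and s3_data_type are taken from the list above.

```python
# Valid values as listed in the parameter descriptions above.
VALID_CHANNEL_TYPES = {"training", "validation"}
VALID_S3_DATA_TYPES = {"ManifestFile", "S3Prefix"}


def automl_input_kwargs(inputs, target_attribute_name, channel_type=None,
                        s3_data_type=None, content_type=None, compression=None):
    """Hypothetical helper: check and collect keyword arguments for AutoMLInput."""
    if channel_type is not None and channel_type not in VALID_CHANNEL_TYPES:
        raise ValueError(
            f"channel_type must be one of {sorted(VALID_CHANNEL_TYPES)}, got {channel_type!r}"
        )
    if s3_data_type is not None and s3_data_type not in VALID_S3_DATA_TYPES:
        raise ValueError(
            f"s3_data_type must be one of {sorted(VALID_S3_DATA_TYPES)}, got {s3_data_type!r}"
        )
    kwargs = {
        "inputs": inputs,
        "target_attribute_name": target_attribute_name,
        "channel_type": channel_type,
        "s3_data_type": s3_data_type,
        "content_type": content_type,
        "compression": compression,
    }
    # Drop unset optional values so the class defaults apply.
    return {key: value for key, value in kwargs.items() if value is not None}


# Placeholder S3 URI and column name, for illustration only.
train_kwargs = automl_input_kwargs(
    "s3://example-bucket/train/",
    "label",
    channel_type="training",
    s3_data_type="S3Prefix",
)
print(train_kwargs)
```

With the SageMaker Python SDK installed, the result could then be passed on as AutoMLInput(**train_kwargs).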

class sagemaker.automl.automl.AutoML(role=None, target_attribute_name=None, output_kms_key=None, output_path=None, base_job_name=None, compression_type=None, sagemaker_session=None, volume_kms_key=None, encrypt_inter_container_traffic=None, vpc_config=None, problem_type=None, max_candidates=None, max_runtime_per_training_job_in_seconds=None, total_job_runtime_in_seconds=None, job_objective=None, generate_candidate_definitions_only=False, tags=None, content_type=None, s3_data_type=None, feature_specification_s3_uri=None, validation_fraction=None, mode=None, auto_generate_endpoint_name=None, endpoint_name=None, sample_weight_attribute_name=None)

Bases: object

A class for creating and interacting with SageMaker AutoML jobs.

Initializes an AutoML object.

Parameters
  • role (str) – The ARN of the role that is used to create the job and access the data.

  • target_attribute_name (str) – The name of the target variable in supervised learning.

  • output_kms_key (str) – The AWS KMS encryption key ID for output data configuration.

  • output_path (str) – The Amazon S3 output path. Must be 128 characters or less.

  • base_job_name (str) – The name of the AutoML job. The name must be unique within the AWS account and is case-insensitive.

  • compression_type (str) – The compression type for input data. Gzip or None.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions.

  • volume_kms_key (str) – The key used to encrypt stored data.

  • encrypt_inter_container_traffic (bool) – Whether to use traffic encryption between the container layers.

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • problem_type (str) – Defines the type of supervised learning available for the candidates.

  • max_candidates (int) – The maximum number of times a training job is allowed to run.

  • max_runtime_per_training_job_in_seconds (int) – The maximum time, in seconds, that each training job executed inside hyperparameter tuning is allowed to run as part of a hyperparameter tuning job.

  • total_job_runtime_in_seconds (int) – The total wait time, in seconds, of an AutoML job.

  • job_objective (dict[str, str]) – Defines the objective metric used to measure the predictive quality of an AutoML job. In the format of: {“MetricName”: str}

  • generate_candidate_definitions_only (bool) – Whether to generate possible candidates without training the models.

  • tags (Optional[Tags]) – Tags to attach to this specific endpoint.

  • content_type (str) – The content type of the data from the input source.

  • s3_data_type (str) – The data type for S3 data source. Valid values: ManifestFile or S3Prefix.

  • feature_specification_s3_uri (str) – A URL to the Amazon S3 data source containing selected features and specified data types from the input data source of an AutoML job.

  • validation_fraction (float) – A float that specifies the portion of the input dataset to be used for validation.

  • mode (str) – The method that the AutoML job uses to train the model. Valid values: AUTO, ENSEMBLING, or HYPERPARAMETER_TUNING.

  • auto_generate_endpoint_name (bool) – Whether to automatically generate an endpoint name for a one-click Autopilot model deployment. If auto_generate_endpoint_name is set to True, do not specify the endpoint_name.

  • endpoint_name (str) – Specifies the endpoint name to use for a one-click AutoML model deployment if the endpoint name is not generated automatically. Specify the endpoint_name if and only if auto_generate_endpoint_name is set to False.

  • sample_weight_attribute_name (str) – The name of the dataset column representing sample weights.

Returns

AutoML object.
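Several of the constructor parameters above carry constraints that can be checked before any AWS call is made. The sketch below is a hypothetical pre-flight check (check_automl_kwargs is not part of the SDK) covering three of them: the valid values of mode, the 128-character limit on output_path, and the rule that endpoint_name must not be given when auto_generate_endpoint_name is True. The role ARN and S3 path are placeholders.

```python
# Constraints as stated in the parameter descriptions above.
VALID_MODES = {"AUTO", "ENSEMBLING", "HYPERPARAMETER_TUNING"}
MAX_OUTPUT_PATH_LENGTH = 128


def check_automl_kwargs(**kwargs):
    """Hypothetical pre-flight check for AutoML constructor arguments."""
    mode = kwargs.get("mode")
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")

    output_path = kwargs.get("output_path")
    if output_path is not None and len(output_path) > MAX_OUTPUT_PATH_LENGTH:
        raise ValueError(
            f"output_path must be {MAX_OUTPUT_PATH_LENGTH} characters or less"
        )

    # Only the "True plus explicit name" conflict is checked here.
    if kwargs.get("auto_generate_endpoint_name") and kwargs.get("endpoint_name") is not None:
        raise ValueError(
            "do not specify endpoint_name when auto_generate_endpoint_name is True"
        )
    return kwargs


automl_kwargs = check_automl_kwargs(
    role="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder ARN
    target_attribute_name="label",
    output_path="s3://example-bucket/automl-output/",  # placeholder path
    mode="ENSEMBLING",
    max_candidates=10,
    auto_generate_endpoint_name=True,
)
print(sorted(automl_kwargs))
```

The checked dictionary could then be unpacked into the constructor as AutoML(**automl_kwargs). The service may enforce further constraints that this sketch does not model.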

fit(inputs=None, wait=True, logs=True, job_name=None)

Create an AutoML Job with the input dataset.

Parameters
  • inputs (str or list[str] or AutoMLInput or list[AutoMLInput]) – A local path or S3 URI where the training data is stored, an AutoMLInput object, or a list of AutoMLInput objects. If a local path is provided, the dataset is uploaded to an S3 location. A list of AutoMLInput objects specifies the training and validation input sources, which must share the same content type and target attribute name. A list[AutoMLInput] must contain at least 1 and at most 2 items.

  • wait (bool) – Whether the call should wait until the job completes (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True). If wait is False, logs will be set to False as well.

  • job_name (str) – Training job name. If not specified, the estimator generates a default job name, based on the training image name and current timestamp.
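The rules for passing a list of AutoMLInput objects to fit() (one or two items, with a shared content type and target attribute name) can be sketched as a small check. ChannelSpec below is a hypothetical stand-in for AutoMLInput that holds only the fields being compared, and check_fit_channels is an illustrative helper, not an SDK function.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChannelSpec:
    """Hypothetical stand-in for AutoMLInput with only the fields checked here."""
    inputs: str
    target_attribute_name: str
    channel_type: Optional[str] = None
    content_type: Optional[str] = None


def check_fit_channels(channels: Sequence[ChannelSpec]) -> None:
    """Raise ValueError if the list breaks the rules described for fit()."""
    if not 1 <= len(channels) <= 2:
        raise ValueError("expected between 1 and 2 input channels")
    targets = {channel.target_attribute_name for channel in channels}
    if len(targets) > 1:
        raise ValueError("all channels must share the same target attribute name")
    content_types = {channel.content_type for channel in channels}
    if len(content_types) > 1:
        raise ValueError("all channels must share the same content type")


# Placeholder URIs and column name, for illustration only.
channels = [
    ChannelSpec("s3://example-bucket/train/", "label", "training", "text/csv;header=present"),
    ChannelSpec("s3://example-bucket/validation/", "label", "validation", "text/csv;header=present"),
]
check_fit_channels(channels)  # passes silently
```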

classmethod attach(auto_ml_job_name, sagemaker_session=None)

Attach to an existing AutoML job.

Creates and returns an AutoML object bound to an existing AutoML job.

Parameters
  • auto_ml_job_name (str) – AutoML job name

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

Returns

An AutoML instance with the attached AutoML job.

Return type

sagemaker.automl.AutoML

describe_auto_ml_job(job_name=None)

Returns the job description of an AutoML job for the given job name.

Parameters

job_name (str) – The name of the AutoML job to describe. If None, will use object’s latest_auto_ml_job name.

Returns

A dictionary response with the AutoML Job description.

Return type

dict
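A minimal sketch of reading the returned description follows. It assumes the dictionary mirrors the DescribeAutoMLJob API response, with AutoMLJobStatus and AutoMLJobSecondaryStatus keys; FakeAutoML is a hypothetical stand-in so that the example runs without AWS access, and the job name and status values are made up.

```python
def job_status(automl, job_name=None):
    """Return (status, secondary_status) from describe_auto_ml_job().

    Assumes the response uses the DescribeAutoMLJob key names.
    """
    description = automl.describe_auto_ml_job(job_name=job_name)
    return (
        description.get("AutoMLJobStatus"),
        description.get("AutoMLJobSecondaryStatus"),
    )


class FakeAutoML:
    """Hypothetical stand-in that returns a canned description."""

    def describe_auto_ml_job(self, job_name=None):
        return {
            "AutoMLJobName": job_name or "example-job",
            "AutoMLJobStatus": "InProgress",
            "AutoMLJobSecondaryStatus": "FeatureEngineering",
        }


status, secondary = job_status(FakeAutoML(), job_name="example-job")
print(status, secondary)
```

With a real job, the same function could be called on the object returned by AutoML.attach().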

best_candidate(job_name=None)

Returns the best candidate of an AutoML job for a given name.

Parameters

job_name (str) – The name of the AutoML job. If None, will use object’s _current_auto_ml_job_name.

Returns

A dictionary with information of the best candidate.

Return type

dict

list_candidates(job_name=None, status_equals=None, candidate_name=None, candidate_arn=None, sort_order=None, sort_by=None, max_results=None)

Returns the list of candidates of an AutoML job for a given name.

Parameters
  • job_name (str) – The name of the AutoML job. If None, will use object’s _current_job name.

  • status_equals (str) – Filters the results by candidate status. Valid values: “Completed”, “InProgress”, “Failed”, “Stopped”, “Stopping”.

  • candidate_name (str) – The name of a specified candidate to list. Defaults to None.

  • candidate_arn (str) – The ARN of a specified candidate to list. Defaults to None.

  • sort_order (str) – The order in which the candidates are listed in the results. Defaults to None.

  • sort_by (str) – The value by which the candidates are sorted. Defaults to None.

  • max_results (int) – The number of candidates to list in the results, between 1 and 100. Defaults to None, in which case all candidates are returned.

Returns

A list of dictionaries with candidate information.

Return type

list
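Once the list is returned, it can be filtered and ranked locally. The sketch below assumes each candidate dictionary follows the ListCandidatesForAutoMLJob response shape (CandidateName, CandidateStatus, and FinalAutoMLJobObjectiveMetric with a Value field); the sample data is invented for illustration, and the helper name is hypothetical.

```python
def completed_candidates_by_metric(candidates, descending=True):
    """Keep completed candidates that report a final metric, ranked by its value."""
    completed = [
        candidate
        for candidate in candidates
        if candidate.get("CandidateStatus") == "Completed"
        and "FinalAutoMLJobObjectiveMetric" in candidate
    ]
    return sorted(
        completed,
        key=lambda candidate: candidate["FinalAutoMLJobObjectiveMetric"]["Value"],
        reverse=descending,
    )


# Invented sample data in the assumed response shape.
sample_candidates = [
    {
        "CandidateName": "cand-a",
        "CandidateStatus": "Completed",
        "FinalAutoMLJobObjectiveMetric": {"MetricName": "validation:f1", "Value": 0.81},
    },
    {"CandidateName": "cand-b", "CandidateStatus": "Failed"},
    {
        "CandidateName": "cand-c",
        "CandidateStatus": "Completed",
        "FinalAutoMLJobObjectiveMetric": {"MetricName": "validation:f1", "Value": 0.88},
    },
]

ranked = completed_candidates_by_metric(sample_candidates)
print([candidate["CandidateName"] for candidate in ranked])
```

Whether a higher or lower value is better depends on the objective metric, which is why the sort direction is left as a parameter.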

create_model(name, sagemaker_session=None, candidate=None, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None, inference_response_keys=None)

Creates a model from a given candidate or the best candidate from the job.

Parameters
  • name (str) – The pipeline model name.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

  • candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False

  • model_kms_key (str) – The KMS key ARN used to encrypt the repacked model archive file if the model is repacked.

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.

  • inference_response_keys (list) – List of keys for response content. The order of the keys will dictate the content order in the response.

Returns

PipelineModel object.

deploy(initial_instance_count, instance_type, serializer=None, deserializer=None, candidate=None, sagemaker_session=None, name=None, endpoint_name=None, tags=None, wait=True, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None, inference_response_keys=None, volume_size=None, model_data_download_timeout=None, container_startup_health_check_timeout=None)

Deploy a candidate to a SageMaker Inference Pipeline.

Parameters
  • initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.

  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.

  • serializer (BaseSerializer) – A serializer object, used to encode data for an inference endpoint (default: None). If serializer is not None, then serializer will override the default serializer. The default serializer is set by the predictor_cls.

  • deserializer (BaseDeserializer) – A deserializer object, used to decode data from an inference endpoint (default: None). If deserializer is not None, then deserializer will override the default deserializer. The default deserializer is set by the predictor_cls.

  • candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

  • name (str) – The pipeline model name. If None, a default model name will be selected on each deploy.

  • endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.

  • tags (Optional[Tags]) – The list of tags to attach to this specific endpoint.

  • wait (bool) – Whether the call should wait until the deployment of model completes (default: True).

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False

  • model_kms_key (str) – The KMS key ARN used to encrypt the repacked model archive file if the model is repacked.

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.

  • inference_response_keys (list) – List of keys for response content. The order of the keys will dictate the content order in the response.

  • volume_size (int) – The size, in GB, of the ML storage volume attached to each individual inference instance associated with the production variant. Currently only Amazon EBS gp2 storage volumes are supported.

  • model_data_download_timeout (int) – The timeout value, in seconds, to download and extract model data from Amazon S3 to the individual inference instance associated with this production variant.

  • container_startup_health_check_timeout (int) – The timeout value, in seconds, for your inference container to pass health check by SageMaker Hosting. For more information about health check see: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests

Returns

If predictor_cls is specified, the invocation of self.predictor_cls on the created endpoint name. Otherwise, None.

Return type

callable[string, sagemaker.session.Session] or None
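The methods documented above are typically called in sequence: fit(), then best_candidate(), then deploy(). The sketch below shows that order using only the documented signatures. train_and_deploy is a hypothetical wrapper, and RecordingAutoML is a stand-in that records calls so the example runs without AWS access; the S3 URI, candidate name, and instance type are placeholders.

```python
def train_and_deploy(automl, train_uri, instance_type, endpoint_name=None):
    """Run an AutoML job, pick the best candidate, and deploy it."""
    automl.fit(inputs=train_uri, wait=True, logs=False)
    best = automl.best_candidate()
    # deploy() returns None unless predictor_cls is specified.
    return automl.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        candidate=best,
        endpoint_name=endpoint_name,
    )


class RecordingAutoML:
    """Hypothetical stand-in that mimics the documented method signatures."""

    def __init__(self):
        self.calls = []

    def fit(self, inputs=None, wait=True, logs=True, job_name=None):
        self.calls.append("fit")

    def best_candidate(self, job_name=None):
        self.calls.append("best_candidate")
        return {"CandidateName": "example-candidate"}

    def deploy(self, initial_instance_count, instance_type, candidate=None,
               endpoint_name=None, **kwargs):
        self.calls.append("deploy")
        return None


fake = RecordingAutoML()
result = train_and_deploy(fake, "s3://example-bucket/train/", "ml.m5.xlarge")
print(fake.calls, result)
```

With the SDK installed, a real AutoML instance could be passed in place of the stand-in; in that case the calls create billable AWS resources.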

classmethod validate_and_update_inference_response(inference_containers, inference_response_keys)

Validates the requested inference keys and updates response content.

On validation, also updates the inference containers to emit appropriate response content in the inference response.

Parameters
  • inference_containers (list) – list of inference containers

  • inference_response_keys (list) – list of inference response keys

Raises

ValueError – if one or more of inference_response_keys are unsupported by the model

class sagemaker.automl.automl.AutoMLJob(sagemaker_session, job_name, inputs)

Bases: _Job

A class for interacting with CreateAutoMLJob API.

classmethod start_new(auto_ml, inputs)

Create a new Amazon SageMaker AutoML job from auto_ml.

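Parameters
  • auto_ml (sagemaker.automl.AutoML) – The AutoML object from which the job is created.

  • inputs – The input data for the job, as passed to AutoML.fit().

Returns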

Constructed object that captures all information about the started AutoML job.

Return type

sagemaker.automl.AutoMLJob

describe()

Prints out a response from the DescribeAutoMLJob API call.

wait(logs=True)

Wait for the AutoML job to finish.

Parameters

logs (bool) – Whether to output logs.

A class for AutoML Job’s Candidate.

class sagemaker.automl.candidate_estimator.CandidateEstimator(candidate, sagemaker_session=None)

Bases: object

A class for a SageMaker AutoML job candidate.

Constructor of CandidateEstimator.

Parameters
  • candidate (dict) – A candidate dictionary returned by AutoML.list_candidates() or AutoML.best_candidate().

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.

get_steps()

Gets the step jobs of a candidate so that users can construct estimators or transformers.

Returns

A list of dictionaries that provide information about each step job’s name, type, inputs, and description.

Return type

list

fit(inputs, candidate_name=None, volume_kms_key=None, encrypt_inter_container_traffic=None, vpc_config=None, wait=True, logs=True)

Rerun a candidate’s step jobs with new input datasets or security config.

Parameters
  • inputs (str or list[str]) – Local path or S3 Uri where the training data is stored. If a local path is provided, the dataset will be uploaded to an S3 location.

  • candidate_name (str) – The name of the candidate to be rerun. If None, the candidate’s original name will be used.

  • volume_kms_key (str) – The KMS key ID used to encrypt data on the storage volume attached to the ML compute instance(s).

  • encrypt_inter_container_traffic (bool) – Whether to encrypt all communications between ML compute instances in distributed training. If not passed, the value is fetched from sagemaker_config if one is defined there. Default: False.

  • vpc_config (dict) – Specifies a VPC that jobs and hosted models have access to. Control access to and from training and model containers by configuring the VPC.

  • wait (bool) – Whether the call should wait until all jobs complete (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

class sagemaker.automl.candidate_estimator.CandidateStep(name, inputs, step_type, description)

Bases: object

A class that maintains an AutoML Candidate step’s name, inputs, type, and description.

property name

Name of the candidate step -> (str)

property inputs

Inputs of the candidate step -> (dict)

property type

Type of the candidate step, Training or Transform -> (str)

property description

Description of candidate step job -> (dict)
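The four properties above make it straightforward to summarize a candidate's steps. The sketch below groups step names by type. StepStandIn is a hypothetical stand-in that mirrors the documented properties so the example runs without the SDK; the step names are invented, and the exact type strings used by the SDK may differ from the "Training" and "Transform" labels used here.

```python
from collections import defaultdict


class StepStandIn:
    """Hypothetical stand-in mirroring CandidateStep's documented properties."""

    def __init__(self, name, inputs, step_type, description):
        self.name = name
        self.inputs = inputs
        self.type = step_type
        self.description = description


def step_names_by_type(steps):
    """Group step names by each step's type, preserving the original order."""
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.type].append(step.name)
    return dict(grouped)


# Invented steps, for illustration only.
steps = [
    StepStandIn("example-dpp-train", {}, "Training", {}),
    StepStandIn("example-dpp-transform", {}, "Transform", {}),
    StepStandIn("example-tuning-train", {}, "Training", {}),
]
print(step_names_by_type(steps))
```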