AutoML

A class for SageMaker AutoML Jobs.

class sagemaker.automl.automl.AutoML(role, target_attribute_name, output_kms_key=None, output_path=None, base_job_name=None, compression_type=None, sagemaker_session=None, volume_kms_key=None, encrypt_inter_container_traffic=False, vpc_config=None, problem_type=None, max_candidates=500, max_runtime_per_training_job_in_seconds=None, total_job_runtime_in_seconds=None, job_objective=None, generate_candidate_definitions_only=False, tags=None)

Bases: object

A class for creating and interacting with SageMaker AutoML jobs.

fit(inputs=None, wait=True, logs=True, job_name=None)

Create an AutoML Job with the input dataset.

Parameters
  • inputs (str or list[str] or AutoMLInput) – Local path or S3 URI where the training data is stored, or an AutoMLInput object. If a local path is provided, the dataset will be uploaded to an S3 location.

  • wait (bool) – Whether the call should wait until the job completes (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

  • job_name (str) – Name of the AutoML job. If not specified, a default job name is generated.
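
As a minimal sketch of starting a job without blocking, the helper below passes only the fit() parameters listed above. The helper name start_automl_job, the role placeholder, and the S3 URI in the usage comment are hypothetical; automl is assumed to be an already constructed AutoML instance.

```python
def start_automl_job(automl, train_data, job_name=None):
    """Start an AutoML job and return immediately.

    `automl` is assumed to be an AutoML instance (or any object with a
    compatible fit() method). This helper is illustrative, not part of the SDK.
    """
    # logs is only meaningful when wait is True, so both are turned off together.
    automl.fit(inputs=train_data, wait=False, logs=False, job_name=job_name)


# Hypothetical usage (needs the sagemaker package and AWS credentials):
#   from sagemaker.automl.automl import AutoML
#   automl = AutoML(role="<execution-role-arn>", target_attribute_name="label")
#   start_automl_job(automl, "s3://<bucket>/<prefix>/train.csv", job_name="my-automl-job")
```

Because the call does not wait, the job status has to be checked separately, for example with describe_auto_ml_job().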

describe_auto_ml_job(job_name=None)

Returns the job description of an AutoML job for the given job name.

Parameters

job_name (str) – The name of the AutoML job to describe. If None, the name of the object’s latest_auto_ml_job is used.

Returns

A dictionary response with the AutoML Job description.

Return type

dict
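
The returned dictionary can be used to poll a job started with wait=False. The sketch below assumes the dictionary mirrors the DescribeAutoMLJob API response and therefore carries an "AutoMLJobStatus" key; the helper name and the poll_seconds, max_polls, and sleep parameters are hypothetical.

```python
import time

# Terminal values of "AutoMLJobStatus"; assumed to match the
# DescribeAutoMLJob API response that the returned dictionary mirrors.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_automl_job(automl, job_name=None, poll_seconds=30,
                        max_polls=120, sleep=time.sleep):
    """Poll describe_auto_ml_job() until the job reaches a terminal status.

    Returns the final status string, or raises TimeoutError once
    max_polls descriptions have been fetched without a terminal status.
    """
    for _ in range(max_polls):
        description = automl.describe_auto_ml_job(job_name=job_name)
        status = description["AutoMLJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)  # injectable so callers can skip real sleeping
    raise TimeoutError("AutoML job did not finish within the polling budget")
```

Passing sleep as a parameter keeps the helper easy to exercise without real delays.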

best_candidate(job_name=None)

Returns the best candidate of an AutoML job for a given name.

Parameters

job_name (str) – The name of the AutoML job. If None, the object’s _current_auto_ml_job_name is used.

Returns

A dictionary with information about the best candidate.

Return type

dict
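
A short sketch of reading the returned dictionary follows. The keys "CandidateName" and "FinalAutoMLJobObjectiveMetric" are assumed from the AutoMLCandidate structure of the SageMaker API rather than stated in this reference, and the helper name is hypothetical.

```python
def summarize_candidate(candidate):
    """Return (name, metric_name, metric_value) from a candidate dictionary.

    The keys are assumed from the AutoMLCandidate API structure; any
    missing key yields None instead of raising KeyError.
    """
    metric = candidate.get("FinalAutoMLJobObjectiveMetric") or {}
    return (
        candidate.get("CandidateName"),
        metric.get("MetricName"),
        metric.get("Value"),
    )


# Hypothetical usage with an existing AutoML instance:
#   name, metric_name, value = summarize_candidate(automl.best_candidate())
```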

list_candidates(job_name=None, status_equals=None, candidate_name=None, candidate_arn=None, sort_order=None, sort_by=None, max_results=None)

Returns the list of candidates of an AutoML job for a given name.

Parameters
  • job_name (str) – The name of the AutoML job. If None, the object’s _current_job name is used.

  • status_equals (str) – Filters the results by candidate status. Valid values are “Completed”, “InProgress”, “Failed”, “Stopped”, and “Stopping”.

  • candidate_name (str) – The name of a specific candidate to list. Defaults to None.

  • candidate_arn (str) – The ARN of a specific candidate to list. Defaults to None.

  • sort_order (str) – The order in which the candidates are listed in the result. Defaults to None.

  • sort_by (str) – The value by which the candidates are sorted. Defaults to None.

  • max_results (int) – The number of candidates to list in the results, between 1 and 100. Defaults to None, in which case all candidates are returned.

Returns

A list of dictionaries with candidate information.

Return type

list
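
The filters above can be combined, as in this sketch that asks for completed candidates ordered by objective metric. The sort values "FinalObjectiveMetricValue" and "Descending" are assumed from the ListCandidatesForAutoMLJob API, and the helper name and its limit parameter are hypothetical.

```python
def top_completed_candidates(automl, job_name=None, limit=5):
    """List up to `limit` completed candidates, highest metric value first.

    Descending order puts the largest objective metric value first, which
    is only "best first" for metrics where higher is better.
    """
    if not 1 <= limit <= 100:
        # max_results is documented as a value between 1 and 100.
        raise ValueError("limit must be between 1 and 100")
    return automl.list_candidates(
        job_name=job_name,
        status_equals="Completed",
        sort_by="FinalObjectiveMetricValue",  # assumed API value
        sort_order="Descending",              # assumed API value
        max_results=limit,
    )
```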

deploy(initial_instance_count, instance_type, candidate=None, sagemaker_session=None, name=None, endpoint_name=None, tags=None, wait=True, update_endpoint=False, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None)

Deploy a candidate to a SageMaker Inference Pipeline and return a Predictor.

Parameters
  • initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.

  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.

  • candidate (CandidateEstimator or dict) – A CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, the one originally associated with the AutoML instance is used.

  • name (str) – The pipeline model name. If None, a default model name will be selected on each deploy.

  • endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.

  • tags (List[dict[str, str]]) – The list of tags to attach to this specific endpoint.

  • wait (bool) – Whether the call should wait until the deployment of model completes (default: True).

  • update_endpoint (bool) – Flag to update the model in an existing Amazon SageMaker endpoint. If True, this will deploy a new EndpointConfig to an already existing endpoint and delete resources corresponding to the previous EndpointConfig. If False, a new endpoint will be created. Default: False

  • vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.

  • enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False

  • model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked.

  • predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.

Returns

If predictor_cls is specified, the invocation of self.predictor_cls on the created endpoint name. Otherwise, None.

Return type

callable[string, sagemaker.session.Session] or None
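
A minimal deployment sketch using only the parameters documented above is shown here. The helper name, the default instance type, and the endpoint name in the usage comment are illustrative assumptions; automl is assumed to be an AutoML instance whose job has completed.

```python
def deploy_candidate(automl, endpoint_name, candidate=None,
                     instance_type="ml.m5.xlarge", instance_count=1,
                     predictor_cls=None):
    """Deploy a candidate (the best one when candidate is None).

    Returns whatever deploy() returns: per the reference above, the result
    of predictor_cls when it is given, otherwise None.
    """
    return automl.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
        candidate=candidate,
        endpoint_name=endpoint_name,
        wait=True,  # block until the endpoint deployment completes
        predictor_cls=predictor_cls,
    )


# Hypothetical usage:
#   deploy_candidate(automl, "my-automl-endpoint", candidate=automl.best_candidate())
```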

class sagemaker.automl.automl.AutoMLInput(inputs, target_attribute_name, compression=None)

Bases: object

Accepts parameters that specify an S3 input for an AutoML job and provides a method to turn those parameters into a dictionary.

Convert an S3 URI or a list of S3 URIs to an AutoMLInput object.

Parameters
  • inputs (str, list[str]) – A string or a list of strings that point to the S3 location(s) where the input data is stored.

  • target_attribute_name (str) – The target attribute name for regression or classification.

  • compression (str) – If the training data is compressed, the compression type. The default value is None.

to_request_dict()

Generates a request dictionary using the parameters provided to the class.
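
To illustrate the kind of structure such a request describes, the standalone sketch below builds one channel dictionary per S3 URI. The key names are assumed from the InputDataConfig field of the CreateAutoMLJob API, the function name is hypothetical, and the actual output of to_request_dict() may differ.

```python
def build_input_data_config(inputs, target_attribute_name, compression=None):
    """Build a list of channel dictionaries, one per S3 URI.

    Illustrative only: the key names are assumed from the CreateAutoMLJob
    API and this is not the SDK's own implementation.
    """
    # Accept a single URI or any iterable of URIs.
    uris = [inputs] if isinstance(inputs, str) else list(inputs)
    channels = []
    for uri in uris:
        channel = {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": uri}
            },
            "TargetAttributeName": target_attribute_name,
        }
        # The compression entry is added only when a type is given.
        if compression is not None:
            channel["CompressionType"] = compression
        channels.append(channel)
    return channels
```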

class sagemaker.automl.automl.AutoMLJob(sagemaker_session, job_name, inputs)

Bases: sagemaker.job._Job

A class for interacting with the CreateAutoMLJob API.


classmethod start_new(auto_ml, inputs)

Create a new Amazon SageMaker AutoML job from auto_ml.

Parameters
  • auto_ml (sagemaker.automl.AutoML) – AutoML object created by the user.

  • inputs (str, list[str]) – The inputs that were passed to fit().

Returns

Constructed object that captures all information about the started AutoML job.

Return type

sagemaker.automl.AutoMLJob

describe()

Prints out a response from the DescribeAutoMLJob API call.

wait(logs=True)

Wait for the AutoML job to finish.

Parameters

logs (bool) – Whether to output logs.

A class for AutoML Job’s Candidate.

class sagemaker.automl.candidate_estimator.CandidateEstimator(candidate, sagemaker_session=None)

Bases: object

A class for SageMaker AutoML Job Candidate

Constructor of CandidateEstimator.

Parameters
  • candidate (dict) – a dictionary of candidate returned by AutoML.list_candidates() or AutoML.best_candidate().

  • sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.

get_steps()

Get the step jobs of a candidate so that users can construct estimators or transformers.

Returns

A list of dictionaries that provide information about each step job’s name, type, inputs and description.

Return type

list

fit(inputs, candidate_name=None, volume_kms_key=None, encrypt_inter_container_traffic=False, vpc_config=None, wait=True, logs=True)

Rerun a candidate’s step jobs with new input datasets or security config.

Parameters
  • inputs (str or list[str]) – Local path or S3 Uri where the training data is stored. If a local path is provided, the dataset will be uploaded to an S3 location.

  • candidate_name (str) – Name of the candidate to be rerun. If None, the candidate’s original name will be used.

  • volume_kms_key (str) – The KMS key id to encrypt data on the storage volume attached to the ML compute instance(s).

  • encrypt_inter_container_traffic (bool) – To encrypt all communications between ML compute instances in distributed training. Default: False.

  • vpc_config (dict) – Specifies a VPC that jobs and hosted models have access to. Control access to and from training and model containers by configuring the VPC.

  • wait (bool) – Whether the call should wait until all jobs complete (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

class sagemaker.automl.candidate_estimator.CandidateStep(name, inputs, step_type, description)

Bases: object

A class that maintains an AutoML Candidate step’s name, inputs, type, and description.

property name

Name of the candidate step -> (str)

property inputs

Inputs of the candidate step -> (dict)

property type

Type of the candidate step, Training or Transform -> (str)

property description

Description of candidate step job -> (dict)
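
As a small sketch of working with these properties, the helper below groups step names by step type. It assumes steps is an iterable of objects exposing the name and type properties described above; the helper name and the example step names are hypothetical.

```python
from collections import defaultdict


def group_step_names_by_type(steps):
    """Map each step type to the names of the steps of that type.

    `steps` is assumed to be an iterable of CandidateStep-like objects
    exposing `name` and `type` attributes.
    """
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.type].append(step.name)
    return dict(grouped)  # plain dict, so absent types raise KeyError
```

Grouping this way makes it easy to separate training steps from transform steps before rebuilding estimators or transformers.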