AutoML¶
A class for SageMaker AutoML Jobs.
-
class
sagemaker.automl.automl.
AutoML
(role, target_attribute_name, output_kms_key=None, output_path=None, base_job_name=None, compression_type=None, sagemaker_session=None, volume_kms_key=None, encrypt_inter_container_traffic=False, vpc_config=None, problem_type=None, max_candidates=500, max_runtime_per_training_job_in_seconds=None, total_job_runtime_in_seconds=None, job_objective=None, generate_candidate_definitions_only=False, tags=None)¶ Bases:
object
A class for creating and interacting with SageMaker AutoML jobs
-
fit
(inputs=None, wait=True, logs=True, job_name=None)¶ Create an AutoML Job with the input dataset.
Parameters: - inputs (str or list[str] or AutoMLInput) – Local path or S3 URI where the training data is stored, or an AutoMLInput object. If a local path is provided, the dataset will be uploaded to an S3 location.
- wait (bool) – Whether the call should wait until the job completes (default: True).
- logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
- job_name (str) – Training job name. If not specified, the estimator generates a default job name, based on the training image name and current timestamp.
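A minimal sketch of starting a job with the constructor and fit() parameters listed above. The helper name start_automl_job and the max_candidates value are illustrative assumptions, and the import is deferred so the function can be defined without the SDK installed; running it for real requires AWS credentials and a valid IAM role.

```python
def start_automl_job(role, train_s3_uri, target_column, job_name=None):
    """Create an AutoML object and start a job without blocking (hypothetical helper)."""
    # Imported lazily so this module loads even when the sagemaker SDK is absent.
    from sagemaker.automl.automl import AutoML

    automl = AutoML(
        role=role,
        target_attribute_name=target_column,
        max_candidates=10,  # illustrative value; the documented default is 500
    )
    # wait=False returns once the job is created; logs only matters when wait=True.
    automl.fit(inputs=train_s3_uri, wait=False, logs=False, job_name=job_name)
    return automl
```

With wait=False the call returns immediately, so the job status has to be checked separately, for example with describe_auto_ml_job().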
-
describe_auto_ml_job
(job_name=None)¶ Returns the job description of an AutoML job for the given job name.
Parameters: job_name (str) – The name of the AutoML job to describe. If None, will use object’s latest_auto_ml_job name. Returns: A dictionary response with the AutoML Job description. Return type: dict
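One possible use of the returned dictionary is checking whether a job has finished. The sketch below assumes the description carries its status under the "AutoMLJobStatus" key, as in the DescribeAutoMLJob API response; the helper name and the set of terminal statuses are assumptions, not part of this class.

```python
# Statuses treated as terminal in this sketch (an assumption).
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def job_is_finished(description):
    """Return True when a job description reports a terminal status."""
    # "AutoMLJobStatus" is assumed to be the status field of the response.
    return description.get("AutoMLJobStatus") in TERMINAL_STATUSES
```

It would typically be called as job_is_finished(automl.describe_auto_ml_job()), where automl is an AutoML object that has already started a job.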
-
best_candidate
(job_name=None)¶ Returns the best candidate of an AutoML job for a given name
Parameters: job_name (str) – The name of the AutoML job. If None, will use object’s _current_auto_ml_job_name. Returns: a dictionary with information of the best candidate Return type: dict
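The candidate dictionary can be reduced to the few fields usually needed for reporting. This sketch assumes the keys "CandidateName" and "FinalAutoMLJobObjectiveMetric" (with "MetricName" and "Value" inside it), following the service's candidate structure; the helper name summarize_candidate is hypothetical.

```python
def summarize_candidate(candidate):
    """Pull the name and objective metric out of a candidate dictionary."""
    # Key names are assumed from the AutoML candidate structure; missing keys yield None.
    metric = candidate.get("FinalAutoMLJobObjectiveMetric", {})
    return {
        "name": candidate.get("CandidateName"),
        "metric": metric.get("MetricName"),
        "value": metric.get("Value"),
    }
```

For example, summarize_candidate(automl.best_candidate()) would give a compact record for logging.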
-
list_candidates
(job_name=None, status_equals=None, candidate_name=None, candidate_arn=None, sort_order=None, sort_by=None, max_results=None)¶ Returns the list of candidates of an AutoML job for a given name.
Parameters: - job_name (str) – The name of the AutoML job. If None, will use object’s _current_job name.
- status_equals (str) – Filter the results by candidate status. Valid values: “Completed”, “InProgress”, “Failed”, “Stopped”, “Stopping”.
- candidate_name (str) – The name of a specific candidate to list. Defaults to None.
- candidate_arn (str) – The ARN of a specific candidate to list. Defaults to None.
- sort_order (str) – The order in which the candidates are listed. Defaults to None.
- sort_by (str) – The value by which the candidates are sorted. Defaults to None.
- max_results (int) – The number of candidates to list, between 1 and 100. Defaults to None, in which case all candidates are returned.
Returns: A list of dictionaries with candidate information.
Return type: list
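The documented constraints on status_equals and max_results can be checked before the call is made. The following is a sketch only: list_candidates_kwargs is a hypothetical helper, not part of the SDK, and it simply validates the two filters and drops anything left as None.

```python
VALID_STATUSES = ("Completed", "InProgress", "Failed", "Stopped", "Stopping")


def list_candidates_kwargs(status_equals=None, max_results=None, **filters):
    """Validate filters and drop unset ones before calling list_candidates()."""
    if status_equals is not None and status_equals not in VALID_STATUSES:
        raise ValueError(f"status_equals must be one of {VALID_STATUSES}")
    if max_results is not None and not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100")
    kwargs = dict(filters, status_equals=status_equals, max_results=max_results)
    # Leave out anything still None so the method's own defaults apply.
    return {key: value for key, value in kwargs.items() if value is not None}
```

The result can then be unpacked into the call, for example automl.list_candidates(**list_candidates_kwargs(status_equals="Completed", max_results=10)).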
-
deploy
(initial_instance_count, instance_type, candidate=None, sagemaker_session=None, name=None, endpoint_name=None, tags=None, wait=True, update_endpoint=False, vpc_config=None, enable_network_isolation=False, model_kms_key=None, predictor_cls=None)¶ Deploy a candidate to a SageMaker Inference Pipeline and return a Predictor
Parameters: - initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- candidate (CandidateEstimator or dict) – a CandidateEstimator used for deploying to a SageMaker Inference Pipeline. If None, the best candidate will be used. If the candidate input is a dict, a CandidateEstimator will be created from it.
- sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.
- name (str) – The pipeline model name. If None, a default model name will be selected on each deploy.
- endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
- tags (List[dict[str, str]]) – The list of tags to attach to this specific endpoint.
- wait (bool) – Whether the call should wait until the deployment of model completes (default: True).
- update_endpoint (bool) – Flag to update the model in an existing Amazon SageMaker endpoint. If True, this will deploy a new EndpointConfig to an already existing endpoint and delete resources corresponding to the previous EndpointConfig. If False, a new endpoint will be created. Default: False
- vpc_config (dict) – Specifies a VPC that your training jobs and hosted models have access to. Contents include “SecurityGroupIds” and “Subnets”.
- enable_network_isolation (bool) – Isolates the training container. No inbound or outbound network calls can be made, except for calls between peers within a training cluster for distributed training. Default: False
- model_kms_key (str) – KMS key ARN used to encrypt the repacked model archive file if the model is repacked
- predictor_cls (callable[string, sagemaker.session.Session]) – A function to call to create a predictor (default: None). If specified, deploy() returns the result of invoking this function on the created endpoint name.
Returns: If predictor_cls is specified, the invocation of self.predictor_cls on the created endpoint name. Otherwise, None.
Return type: callable[string, sagemaker.session.Session] or None
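A sketch of deploying the best candidate with the parameters described above. The helper name deploy_best_candidate and the instance type "ml.m5.xlarge" are illustrative assumptions; the automl argument is expected to be an AutoML object whose job has completed.

```python
def deploy_best_candidate(automl, endpoint_name=None, instance_type="ml.m5.xlarge"):
    """Deploy the best candidate of a finished AutoML job (hypothetical helper)."""
    return automl.deploy(
        initial_instance_count=1,
        instance_type=instance_type,  # illustrative choice, not a recommendation
        candidate=None,               # None means the best candidate is used
        endpoint_name=endpoint_name,  # None lets a unique name be generated
        wait=True,                    # block until the deployment completes
    )
```

As documented in the Returns entry, the result is None unless predictor_cls is passed, so callers that need a predictor object should supply one.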
-
-
class
sagemaker.automl.automl.
AutoMLInput
(inputs, target_attribute_name, compression=None)¶ Bases:
object
Accepts parameters that specify an S3 input for an auto ml job and provides a method to turn those parameters into a dictionary.
Convert an S3 Uri or a list of S3 Uri to an AutoMLInput object.
Parameters: - inputs (str, list[str]) – a string or a list of strings that point to the S3 location(s) where input data is stored.
- target_attribute_name (str) – the target attribute name for regression or classification.
- compression (str) – if training data is compressed, the compression type. The default value is None.
-
to_request_dict
()¶ Generates a request dictionary using the parameters provided to the class.
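To illustrate what such a request dictionary might contain, the sketch below builds one channel per S3 URI in the shape of the CreateAutoMLJob InputDataConfig structure. It is not the SDK's implementation: sketch_request_dict is a hypothetical function, and the exact output of to_request_dict() may differ.

```python
def sketch_request_dict(inputs, target_attribute_name, compression=None):
    """Build InputDataConfig-style channels from one or more S3 URIs (illustrative only)."""
    # Accept either a single URI string or a list of URIs.
    uris = [inputs] if isinstance(inputs, str) else list(inputs)
    channels = []
    for uri in uris:
        channel = {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": uri}
            },
            "TargetAttributeName": target_attribute_name,
        }
        # Only include the compression type when one was given.
        if compression is not None:
            channel["CompressionType"] = compression
        channels.append(channel)
    return channels
```

Each URI becomes its own channel, and all channels share the same target attribute name.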
-
class
sagemaker.automl.automl.
AutoMLJob
(sagemaker_session, job_name, inputs)¶ Bases:
sagemaker.job._Job
A class for interacting with CreateAutoMLJob API.
-
classmethod
start_new
(auto_ml, inputs)¶ Create a new Amazon SageMaker AutoML job from auto_ml.
Returns: Constructed object that captures all information about the started AutoML job.
Return type: sagemaker.automl.AutoMLJob
-
describe
()¶ Prints out a response from the DescribeAutoMLJob API call.
-
wait
(logs=True)¶ Wait for the AutoML job to finish.
Parameters: logs (bool) – Whether to output logs.
-
A class for AutoML Job’s Candidate.
-
class
sagemaker.automl.candidate_estimator.
CandidateEstimator
(candidate, sagemaker_session=None)¶ Bases:
object
A class for SageMaker AutoML Job Candidate
Constructor of CandidateEstimator.
Parameters: - candidate (dict) – a dictionary of candidate returned by AutoML.list_candidates() or AutoML.best_candidate().
- sagemaker_session (sagemaker.session.Session) – A SageMaker Session object, used for SageMaker interactions (default: None). If not specified, one is created using the default AWS configuration chain.
-
get_steps
()¶ Get the step job of a candidate so that users can construct estimators/transformers
Returns: A list of dictionaries that provide information about each step job’s name, type, inputs and description.
Return type: list
-
fit
(inputs, candidate_name=None, volume_kms_key=None, encrypt_inter_container_traffic=False, vpc_config=None, wait=True, logs=True)¶ Rerun a candidate’s step jobs with new input datasets or security config.
Parameters: - inputs (str or list[str]) – Local path or S3 Uri where the training data is stored. If a local path is provided, the dataset will be uploaded to an S3 location.
- candidate_name (str) – name of the candidate to be rerun, if None, candidate’s original name will be used.
- volume_kms_key (str) – The KMS key id to encrypt data on the storage volume attached to the ML compute instance(s).
- encrypt_inter_container_traffic (bool) – To encrypt all communications between ML compute instances in distributed training. Default: False.
- vpc_config (dict) – Specifies a VPC that jobs and hosted models have access to. Control access to and from training and model containers by configuring the VPC.
- wait (bool) – Whether the call should wait until all jobs complete (default: True).
- logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
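A sketch of rerunning a candidate on a new dataset with stricter security settings, using only the fit() parameters listed above. The helper name rerun_candidate is hypothetical, and candidate_estimator is assumed to be a CandidateEstimator built from a candidate dictionary.

```python
def rerun_candidate(candidate_estimator, new_data_uri, volume_kms_key=None):
    """Rerun a candidate's step jobs on new data with encrypted traffic (hypothetical helper)."""
    candidate_estimator.fit(
        inputs=new_data_uri,
        volume_kms_key=volume_kms_key,         # optional KMS key for the storage volume
        encrypt_inter_container_traffic=True,  # the documented default is False
        wait=False,                            # return without waiting for the jobs
        logs=False,                            # logs only applies when wait is True
    )
```

Leaving candidate_name unset keeps the candidate's original name, as described in the parameter list.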
-
class
sagemaker.automl.candidate_estimator.
CandidateStep
(name, inputs, step_type, description)¶ Bases:
object
A class that maintains an AutoML Candidate step’s name, inputs, type, and description.
-
name
¶ Name of the candidate step -> (str)
-
inputs
¶ Inputs of the candidate step -> (dict)
-
type
¶ Type of the candidate step, Training or Transform -> (str)
-
description
¶ Description of candidate step job -> (dict)
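The four attributes above are enough to organise a candidate's steps. As a small sketch, the hypothetical helper below groups step names by their type attribute; it works on any objects exposing name and type, so it assumes nothing beyond the attributes documented here.

```python
from collections import defaultdict


def step_names_by_type(steps):
    """Group candidate step names under their step type."""
    grouped = defaultdict(list)
    for step in steps:
        # Each step is expected to expose .type and .name, like CandidateStep.
        grouped[step.type].append(step.name)
    return dict(grouped)
```

With steps of type Training and Transform, the result maps each of those two type strings to the list of step names in their original order.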
-