Amazon Estimators

Base class for Amazon Estimator implementations

class sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase(role=None, instance_count=None, instance_type=None, data_location=None, enable_network_isolation=False, **kwargs)

Bases: EstimatorBase

Base class for Amazon first-party Estimator implementations.

This class isn’t intended to be instantiated directly.

Initialize an AmazonAlgorithmEstimatorBase.

Parameters

role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.
instance_count (int or PipelineVariable) – Number of Amazon EC2 instances to use for training. Required.
instance_type (str or PipelineVariable) – Type of EC2 instance to use for training, for example, ‘ml.c4.xlarge’. Required.
data_location (str or None) – The S3 prefix to upload RecordSet objects to, expressed as an S3 URL, for example “s3://example-bucket/some-key-prefix/”. Objects are saved in a unique sub-directory of the specified location. If None, a default data location is used.
enable_network_isolation (bool or PipelineVariable) – Specifies whether container will run in network isolation mode. Network isolation mode restricts the container access to outside networks (such as the internet). Also known as internet-free mode (default: False).
**kwargs – Additional parameters passed to EstimatorBase.
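The expected shape of data_location can be illustrated with a small sketch that does not depend on the SDK. The helper normalize_data_location below is hypothetical (it is not a sagemaker function); it only assumes that a valid value starts with "s3://" and is treated as a key prefix ending in a slash.

```python
from urllib.parse import urlparse


def normalize_data_location(data_location):
    """Hypothetical helper: validate an S3 URL and return (url, bucket, prefix)."""
    if not data_location.startswith("s3://"):
        raise ValueError(
            f'Expected an S3 URL beginning with "s3://", got "{data_location}"'
        )
    # Treat the location as a prefix, so make sure it ends with a slash.
    if not data_location.endswith("/"):
        data_location += "/"
    parsed = urlparse(data_location)
    bucket = parsed.netloc              # e.g. "example-bucket"
    prefix = parsed.path.lstrip("/")    # e.g. "some-key-prefix/"
    return data_location, bucket, prefix


url, bucket, prefix = normalize_data_location("s3://example-bucket/some-key-prefix")
print(url, bucket, prefix)
```

A check like this, run before constructing an estimator, surfaces a malformed location early instead of at upload time.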

Tip

You can find additional parameters for initializing this class at EstimatorBase.

feature_dim: Hyperparameter

An algorithm hyperparameter with optional validation.

Implemented as a Python descriptor object.

mini_batch_size: Hyperparameter

An algorithm hyperparameter with optional validation.

Implemented as a Python descriptor object.
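For readers unfamiliar with descriptors, the following is a minimal sketch of how a validating hyperparameter attribute could work. ValidatedHyperparameter and ToyEstimator are hypothetical names, and the validation rules ("an integer > 0") are assumptions for illustration; this is not the SDK's Hyperparameter class.

```python
class ValidatedHyperparameter:
    """Hypothetical descriptor: converts, validates, and stores a value per instance."""

    def __init__(self, name, validate=lambda value: True, message="", data_type=str):
        self.name = name
        self.validate = validate
        self.message = message
        self.data_type = data_type

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class rather than an instance
        return obj.__dict__.get("_hp_" + self.name)

    def __set__(self, obj, value):
        # None means "unset"; any other value is converted, then validated.
        value = None if value is None else self.data_type(value)
        if value is not None and not self.validate(value):
            raise ValueError(
                f"Invalid value {value!r} for {self.name}: expected {self.message}"
            )
        obj.__dict__["_hp_" + self.name] = value


class ToyEstimator:
    # Attribute names mirror the ones documented above; the rules are assumptions.
    feature_dim = ValidatedHyperparameter("feature_dim", lambda v: v > 0, "an integer > 0", int)
    mini_batch_size = ValidatedHyperparameter("mini_batch_size", lambda v: v > 0, "an integer > 0", int)


est = ToyEstimator()
est.feature_dim = "784"   # converted to int before validation
print(est.feature_dim)
```

Because validation runs on assignment, an invalid value such as `est.feature_dim = 0` raises ValueError immediately in this sketch.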

repo_name: Optional[str] = None

repo_version: Optional[str] = None

DEFAULT_MINI_BATCH_SIZE: Optional[int] = None

training_image_uri(): Placeholder docstring

hyperparameters(): Placeholder docstring

property data_location: Placeholder docstring

prepare_workflow_for_training(records=None, mini_batch_size=None, job_name=None)

Calls _prepare_for_training. Used when setting up a workflow.

Parameters

records (RecordSet) – The records to train this Estimator on.
mini_batch_size (int or None) – The size of each mini-batch to use when training. If None, a default value will be used.
job_name (str) – Name of the training job to be created. If not specified, one is generated, using the base name given to the constructor if applicable.

fit(records, mini_batch_size=None, wait=True, logs=True, job_name=None, experiment_config=None)

Fit this Estimator on serialized Record objects, stored in S3.

records should be an instance of RecordSet. This defines a collection of S3 data files to train this Estimator on.

Training data is expected to be encoded as dense or sparse vectors in the “values” feature on each Record. If the data is labeled, the label is expected to be encoded as a list of scalars in the “values” feature of the Record label.

More information on the Amazon Record format is available at: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

See record_set() to construct a RecordSet object from ndarray arrays.

Parameters

records (RecordSet) – The records to train this Estimator on.
mini_batch_size (int or None) – The size of each mini-batch to use when training. If None, a default value will be used.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Training job name. If not specified, the estimator generates a default job name, based on the training image name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration. Optionally, the dict can contain four keys: ‘ExperimentName’, ‘TrialName’, ‘TrialComponentDisplayName’ and ‘RunName’. The behavior of setting these keys is as follows:
* If ExperimentName is supplied but TrialName is not, a Trial will be automatically created and the job’s Trial Component associated with the Trial.
* If TrialName is supplied and the Trial already exists, the job’s Trial Component will be associated with the Trial.
* If neither ExperimentName nor TrialName is supplied, the trial component will be unassociated.
* TrialComponentDisplayName is used for display in Studio.
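The experiment_config rules can be restated as a small decision function. This is only a sketch of the documented behavior: describe_trial_association is a hypothetical helper, not part of the SDK, and the experiment and trial names are made up. It assumes that a supplied TrialName refers to a Trial that already exists, since that is the only case described.

```python
def describe_trial_association(experiment_config=None):
    """Hypothetical helper: restate the documented association rules."""
    config = experiment_config or {}
    experiment = config.get("ExperimentName")
    trial = config.get("TrialName")
    if trial:
        # Documented only for the case where the Trial already exists.
        return f"associate with existing trial {trial!r}"
    if experiment:
        return f"create a new trial in experiment {experiment!r} and associate with it"
    return "leave the trial component unassociated"


# Example dict using the documented keys; the values are placeholders.
example_config = {
    "ExperimentName": "demo-experiment",
    "TrialComponentDisplayName": "Training",
}
print(describe_trial_association(example_config))
```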

record_set(train, labels=None, channel='train', encrypt=False, distribution='ShardedByS3Key')

Build a RecordSet from a numpy ndarray matrix and label vector.

For the 2D ndarray train, each row is converted to a Record object. The vector is stored in the “values” entry of the features property of each Record. If labels is not None, each corresponding label is assigned to the “values” entry of the labels property of each Record.

The collection of Record objects is protobuf-serialized and uploaded to new S3 locations. A manifest file listing the created objects is also generated and stored in S3.

The number of S3 objects created is controlled by the instance_count property on this Estimator. One S3 object is created per training instance.
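To make the "one object per training instance" rule concrete, here is a pure-Python sketch of splitting rows into as many contiguous chunks as there are instances. split_into_shards is a hypothetical helper and the splitting strategy is an assumption; the text above does not specify how the SDK divides the rows, only how many objects result.

```python
def split_into_shards(rows, num_shards):
    """Hypothetical helper: split rows into num_shards contiguous, non-empty chunks."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shard_size = len(rows) // num_shards
    if shard_size == 0:
        raise ValueError("fewer rows than shards")
    shards = [rows[i * shard_size:(i + 1) * shard_size] for i in range(num_shards - 1)]
    # The last shard takes whatever remains, so no row is dropped.
    shards.append(rows[(num_shards - 1) * shard_size:])
    return shards


train = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
labels = [0, 1, 0, 1, 1]
assert len(labels) == len(train)  # one label per training row

instance_count = 2  # stands in for the estimator's instance_count
pairs = list(zip(train, labels))
for index, shard in enumerate(split_into_shards(pairs, instance_count)):
    print(f"object {index}: {len(shard)} records")
```

With five rows and two instances, this sketch yields one chunk of two records and one of three.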

Parameters

train (numpy.ndarray) – A 2D numpy array of training data.
labels (numpy.ndarray) – A 1D numpy array of labels. Its length must be equal to the number of rows in train.
channel (str) – The SageMaker TrainingJob channel this RecordSet should be assigned to.
encrypt (bool) – Specifies whether the objects uploaded to S3 are encrypted on the server side using AES-256 (default: False).
distribution (str) – The SageMaker TrainingJob channel S3 data distribution type (default: ‘ShardedByS3Key’).

Returns

A RecordSet referencing the encoded, uploaded training and label data.

Return type

RecordSet