Amazon Estimators
Base class for Amazon Estimator implementations.
class sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase(role, train_instance_count, train_instance_type, data_location=None, **kwargs)

    Bases: sagemaker.estimator.EstimatorBase

    Base class for Amazon first-party Estimator implementations. This class is not intended to be instantiated directly.

    Initialize an AmazonAlgorithmEstimatorBase.

    Parameters:
        data_location (str or None) – The S3 prefix to upload RecordSet objects to, expressed as an S3 URL. For example, "s3://example-bucket/some-key-prefix/". Objects will be saved in a unique sub-directory of the specified location. If None, a default data location will be used.
    feature_dim
        An algorithm hyperparameter with optional validation. Implemented as a Python descriptor object.

    mini_batch_size
        An algorithm hyperparameter with optional validation. Implemented as a Python descriptor object.
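The "hyperparameter with optional validation, implemented as a Python descriptor" pattern described above can be sketched in plain Python. This is a minimal illustration of the idea, not the SDK's actual implementation; the `Hyperparameter` and `ExampleEstimator` names are hypothetical.

```python
class Hyperparameter:
    """Minimal descriptor sketch: stores a per-instance value and
    optionally validates it on assignment (illustrative only)."""

    def __init__(self, name, validate=None, message="invalid value"):
        self.name = name
        self.validate = validate
        self.message = message

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # Run the optional validation before storing the value.
        if self.validate is not None and not self.validate(value):
            raise ValueError(f"{self.name}: {self.message}")
        obj.__dict__[self.name] = value


class ExampleEstimator:
    # Hypothetical estimator subclass using the descriptor for its
    # algorithm hyperparameters, mirroring feature_dim / mini_batch_size.
    feature_dim = Hyperparameter("feature_dim", lambda v: v > 0, "must be > 0")
    mini_batch_size = Hyperparameter("mini_batch_size", lambda v: v > 0, "must be > 0")
```

Because `Hyperparameter` defines `__set__`, it is a data descriptor and intercepts every assignment, which is what makes per-attribute validation possible.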
    repo_name = None

    repo_version = None
    train_image()
        Return the Docker image to use for training.

        The fit() method, which does the model training, calls this method to find the image to use for model training.

        Returns: The URI of the Docker image.
        Return type: str
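Training images for first-party algorithms live in Amazon ECR, so the returned URI follows the standard ECR image naming scheme. A hedged sketch of that composition, assuming repo_name and repo_version are set on the subclass; the helper name and the account/region values are illustrative, since the real registry account varies by region and algorithm:

```python
def example_train_image(repo_name, repo_version, account, region):
    """Illustrative only: compose an ECR image URI of the form
    <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{repo_version}"
```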
    hyperparameters()
        Return the hyperparameters as a dictionary to use for training.

        The fit() method, which trains the model, calls this method to find the hyperparameters.

        Returns: The hyperparameters.
        Return type: dict[str, str]
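Note the return type above: every hyperparameter value is passed to the training job as a string. A sketch of that stringification, assuming unset hyperparameters are simply omitted; the helper name is hypothetical:

```python
def example_hyperparameters(estimator, names):
    """Illustrative: collect the named hyperparameter attributes that
    are set and stringify them, matching the dict[str, str] contract."""
    return {
        name: str(getattr(estimator, name))
        for name in names
        if getattr(estimator, name) is not None  # skip unset hyperparameters
    }
```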
    data_location
    fit(records, mini_batch_size=None, wait=True, logs=True, job_name=None)
        Fit this Estimator on serialized Record objects, stored in S3.

        records should be an instance of RecordSet. This defines a collection of S3 data files to train this Estimator on.

        Training data is expected to be encoded as dense or sparse vectors in the "values" feature on each Record. If the data is labeled, the label is expected to be encoded as a list of scalars in the "values" feature of the Record label.

        More information on the Amazon Record format is available at: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

        See record_set() to construct a RecordSet object from ndarray arrays.

        Parameters:
            - records (RecordSet) – The records to train this Estimator on.
            - mini_batch_size (int or None) – The size of each mini-batch to use when training. If None, a default value will be used.
            - wait (bool) – Whether the call should wait until the job completes (default: True).
            - logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
            - job_name (str) – Training job name. If not specified, the estimator generates a default job name, based on the training image name and current timestamp.
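The Record layout described above (the row vector in the "values" feature, the label as a list of scalars in the "values" entry of the label) can be sketched with plain Python dictionaries. The real format is protobuf-serialized; this sketch only mirrors the described structure, and the helper name is hypothetical:

```python
def encode_dense_records(train, labels=None):
    """Illustrative: encode each row of a 2D dataset as a dict mirroring
    the described Record layout (not the actual protobuf schema)."""
    records = []
    for i, row in enumerate(train):
        # Dense vector goes in the "values" feature of the Record.
        record = {"features": {"values": list(row)}}
        if labels is not None:
            # The label is encoded as a list of scalars under "values".
            record["label"] = {"values": [labels[i]]}
        records.append(record)
    return records
```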
    record_set(train, labels=None, channel='train', encrypt=False)
        Build a RecordSet from a numpy ndarray matrix and label vector.

        For the 2D ndarray train, each row is converted to a Record object. The vector is stored in the "values" entry of the features property of each Record. If labels is not None, each corresponding label is assigned to the "values" entry of the labels property of each Record.

        The collection of Record objects are protobuf serialized and uploaded to new S3 locations. A manifest file is generated containing the list of objects created and also stored in S3.

        The number of S3 objects created is controlled by the train_instance_count property on this Estimator. One S3 object is created per training instance.

        Parameters:
            - train (numpy.ndarray) – A 2D numpy array of training data.
            - labels (numpy.ndarray) – A 1D numpy array of labels. Its length must be equal to the number of rows in train.
            - channel (str) – The SageMaker TrainingJob channel this RecordSet should be assigned to.
            - encrypt (bool) – Specifies whether the objects uploaded to S3 are encrypted on the server side using AES-256 (default: False).

        Returns: A RecordSet referencing the encoded, uploaded training and label data.
        Return type: RecordSet
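The one-object-per-training-instance sharding described above can be sketched as splitting the row range of the training matrix into train_instance_count contiguous pieces. A minimal sketch under that assumption; shard_rows is a hypothetical helper, not part of the SDK:

```python
def shard_rows(num_rows, train_instance_count):
    """Illustrative: split num_rows of training data into one contiguous
    shard per training instance; returns (start, end) row ranges."""
    base, extra = divmod(num_rows, train_instance_count)
    shards, start = [], 0
    for i in range(train_instance_count):
        # Distribute any remainder across the first `extra` shards.
        size = base + (1 if i < extra else 0)
        shards.append((start, start + size))
        start += size
    return shards
```

Each (start, end) range would correspond to one serialized S3 object, so a cluster of N training instances reads N objects.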