Object2Vec

The Amazon SageMaker Object2Vec algorithm.

class sagemaker.Object2Vec(role, instance_count=None, instance_type=None, epochs=None, enc0_max_seq_len=None, enc0_vocab_size=None, enc_dim=None, mini_batch_size=None, early_stopping_patience=None, early_stopping_tolerance=None, dropout=None, weight_decay=None, bucket_width=None, num_classes=None, mlp_layers=None, mlp_dim=None, mlp_activation=None, output_layer=None, optimizer=None, learning_rate=None, negative_sampling_rate=None, comparator_list=None, tied_token_embedding_weight=None, token_embedding_storage_type=None, enc0_network=None, enc1_network=None, enc0_cnn_filter_width=None, enc1_cnn_filter_width=None, enc1_max_seq_len=None, enc0_token_embedding_dim=None, enc1_token_embedding_dim=None, enc1_vocab_size=None, enc0_layers=None, enc1_layers=None, enc0_freeze_pretrained_embedding=None, enc1_freeze_pretrained_embedding=None, **kwargs)

Bases: sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase

A general-purpose neural embedding algorithm that is highly customizable.

It can learn low-dimensional dense embeddings of high-dimensional objects. The embeddings are learned in a way that preserves the semantics of the relationship between pairs of objects in the original space in the embedding space.

Object2Vec is an Estimator that learns embeddings for pairs of objects and can be trained with either a classification or a regression objective over those pairs.

This Estimator may be fit via calls to fit(). There is a utility record_set() that can be used to upload data to S3 and create a RecordSet to be passed to the fit() call.

After this Estimator is fit, model data is stored in S3. The model may be deployed to an Amazon SageMaker Endpoint by invoking deploy(). As well as deploying an Endpoint, deploy returns a Predictor object that can be used for inference calls using the trained model hosted in the SageMaker Endpoint.

Object2Vec Estimators can be configured by setting hyperparameters. The available hyperparameters for Object2Vec are documented below.

For further information on the AWS Object2Vec algorithm, please consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/object2vec.html
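
Example

A minimal construction sketch follows; the role ARN and all hyperparameter values are placeholders chosen for illustration, not recommendations, and fit() still requires training data to be supplied.

>>> from sagemaker import Object2Vec
>>> o2v = Object2Vec(
...     role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role ARN
...     instance_count=1,
...     instance_type="ml.c5.xlarge",
...     epochs=20,
...     enc0_max_seq_len=50,
...     enc0_vocab_size=10000,
...     enc_dim=256,
...     enc0_network="pooled_embedding",    # one of the documented encoder networks
...     output_layer="mean_squared_error",  # regression objective
... )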

Parameters
  • role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if accessing AWS resource.

  • instance_count (int) – Number of Amazon EC2 instances to use for training.

  • instance_type (str) – Type of EC2 instance to use for training, for example, ‘ml.c4.xlarge’.

  • epochs (int) – Total number of epochs for SGD training

  • enc0_max_seq_len (int) – Maximum sequence length

  • enc0_vocab_size (int) – Vocabulary size of tokens

  • enc_dim (int) – Optional. Dimension of the output of the embedding layer

  • mini_batch_size (int) – Optional. mini batch size for SGD training

  • early_stopping_patience (int) – Optional. The allowed number of consecutive epochs without improvement before early stopping is applied

  • early_stopping_tolerance (float) – Optional. The value used to determine whether the algorithm has made improvement between two consecutive epochs for early stopping

  • dropout (float) – Optional. Dropout probability on network layers

  • weight_decay (float) – Optional. Weight decay parameter during optimization

  • bucket_width (int) – Optional. The allowed difference between data sequence length when bucketing is enabled

  • num_classes (int) – Optional. Number of classes for classification training (ignored for regression problems)

  • mlp_layers (int) – Optional. Number of MLP layers in the network

  • mlp_dim (int) – Optional. Dimension of the output of MLP layer

  • mlp_activation (str) – Optional. Type of activation function for the MLP layer

  • output_layer (str) – Optional. Type of output layer

  • optimizer (str) – Optional. Type of optimizer for training

  • learning_rate (float) – Optional. Learning rate for SGD training

  • negative_sampling_rate (int) – Optional. Negative sampling rate

  • comparator_list (str) – Optional. Customization of comparator operator

  • tied_token_embedding_weight (bool) – Optional. Tying of token embedding layer weight

  • token_embedding_storage_type (str) – Optional. Type of token embedding storage

  • enc0_network (str) – Optional. Network model of encoder “enc0”

  • enc1_network (str) – Optional. Network model of encoder “enc1”

  • enc0_cnn_filter_width (int) – Optional. CNN filter width

  • enc1_cnn_filter_width (int) – Optional. CNN filter width

  • enc1_max_seq_len (int) – Optional. Maximum sequence length

  • enc0_token_embedding_dim (int) – Optional. Output dimension of token embedding layer

  • enc1_token_embedding_dim (int) – Optional. Output dimension of token embedding layer

  • enc1_vocab_size (int) – Optional. Vocabulary size of tokens

  • enc0_layers (int) – Optional. Number of layers in encoder

  • enc1_layers (int) – Optional. Number of layers in encoder

  • enc0_freeze_pretrained_embedding (bool) – Optional. Freeze pretrained embedding weights

  • enc1_freeze_pretrained_embedding (bool) – Optional. Freeze pretrained embedding weights

  • **kwargs – base class keyword argument values.

Tip

You can find additional parameters for initializing this class at AmazonAlgorithmEstimatorBase and EstimatorBase.

repo_name = 'object2vec'
repo_version = 1
MINI_BATCH_SIZE = 32
CONTAINER_CODE_CHANNEL_SOURCEDIR_PATH = '/opt/ml/input/data/code/sourcedir.tar.gz'
INSTANCE_TYPE = 'sagemaker_instance_type'
LAUNCH_MPI_ENV_NAME = 'sagemaker_mpi_enabled'
LAUNCH_PS_ENV_NAME = 'sagemaker_parameter_server_enabled'
LAUNCH_SM_DDP_ENV_NAME = 'sagemaker_distributed_dataparallel_enabled'
MPI_CUSTOM_MPI_OPTIONS = 'sagemaker_mpi_custom_mpi_options'
MPI_NUM_PROCESSES_PER_HOST = 'sagemaker_mpi_num_of_processes_per_host'
SM_DDP_CUSTOM_MPI_OPTIONS = 'sagemaker_distributed_dataparallel_custom_mpi_options'
classmethod attach(training_job_name, sagemaker_session=None, model_channel_name='model')

Attach to an existing training job.

Create an Estimator bound to an existing training job. Each subclass is responsible for implementing _prepare_init_params_from_job_description(), as this method delegates the actual conversion of a training job description into the arguments that the class constructor expects. After attaching, if the training job has a Complete status, it can be deployed via deploy() to create a SageMaker Endpoint and return a Predictor.

If the training job is in progress, attach will block until the training job completes, but logs of the training job will not display. To see the log content, call logs().

Examples

>>> my_estimator.fit(wait=False)
>>> training_job_name = my_estimator.latest_training_job.name
Later on:
>>> attached_estimator = Estimator.attach(training_job_name)
>>> attached_estimator.logs()
>>> attached_estimator.deploy()
Parameters
  • training_job_name (str) – The name of the training job to attach to.

  • sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the estimator creates one using the default AWS configuration chain.

  • model_channel_name (str) – Name of the channel where pre-trained model data will be downloaded (default: ‘model’). If no channel with the same name exists in the training job, this option will be ignored.

Returns

Instance of the calling Estimator Class with the attached training job.

compile_model(target_instance_family, input_shape, output_path, framework=None, framework_version=None, compile_max_run=900, tags=None, target_platform_os=None, target_platform_arch=None, target_platform_accelerator=None, compiler_options=None, **kwargs)

Compile a Neo model using the input model.

Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

property data_location

Placeholder docstring

delete_endpoint(**kwargs)
deploy(initial_instance_count=None, instance_type=None, serializer=None, deserializer=None, accelerator_type=None, endpoint_name=None, use_compiled_model=False, wait=True, model_name=None, kms_key=None, data_capture_config=None, tags=None, serverless_inference_config=None, async_inference_config=None, **kwargs)

Deploy the trained model to an Amazon SageMaker endpoint.

It then returns a sagemaker.Predictor object.

More information: http://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html

Parameters
  • initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model. If not using serverless inference, it needs to be a number greater than or equal to 1 (default: None).

  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’, or ‘local’ for local mode. If not using serverless inference, this is required in order to deploy a model (default: None).

  • serializer (BaseSerializer) – A serializer object, used to encode data for an inference endpoint (default: None). If serializer is not None, then serializer will override the default serializer. The default serializer is set by the predictor_cls.

  • deserializer (BaseDeserializer) – A deserializer object, used to decode data from an inference endpoint (default: None). If deserializer is not None, then deserializer will override the default deserializer. The default deserializer is set by the predictor_cls.

  • accelerator_type (str) – Type of Elastic Inference accelerator to attach to an endpoint for model loading and inference, for example, ‘ml.eia1.medium’. If not specified, no Elastic Inference accelerator will be attached to the endpoint. For more information: https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html

  • endpoint_name (str) – Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the training job is used.

  • use_compiled_model (bool) – Flag to select whether to use compiled (optimized) model. Default: False.

  • wait (bool) – Whether the call should wait until the deployment of model completes (default: True).

  • model_name (str) – Name to use for creating an Amazon SageMaker model. If not specified, the estimator generates a default job name based on the training image name and current timestamp.

  • kms_key (str) – The ARN of the KMS key that is used to encrypt the data on the storage volume attached to the instance hosting the endpoint.

  • data_capture_config (sagemaker.model_monitor.DataCaptureConfig) – Specifies configuration related to Endpoint data capture for use with Amazon SageMaker Model Monitoring. Default: None.

  • async_inference_config (sagemaker.model_monitor.AsyncInferenceConfig) – Specifies configuration related to async inference. Use this configuration when trying to create async endpoint and make async inference. If empty config object passed through, will use default config to deploy async endpoint. Deploy a real-time endpoint if it’s None. (default: None)

  • serverless_inference_config (sagemaker.serverless.ServerlessInferenceConfig) – Specifies configuration related to serverless endpoint. Use this configuration when trying to create serverless endpoint and make serverless inference. If empty object passed through, will use pre-defined values in ServerlessInferenceConfig class to deploy serverless endpoint. Deploy an instance based endpoint if it’s None. (default: None)

  • tags (List[dict[str, str]]) – Optional. The list of tags to attach to this specific endpoint. Example: tags = [{‘Key’: ‘tagname’, ‘Value’: ‘tagvalue’}]. For more information about tags, see https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.add_tags

  • **kwargs – Passed to invocation of create_model(). Implementations may customize create_model() to accept **kwargs to customize model creation during deploy. For more, see the implementation docs.

Returns

A predictor that provides a predict() method, which can be used to send requests to the Amazon SageMaker endpoint and obtain inferences.

Return type

sagemaker.predictor.Predictor
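
Example

A hedged sketch of deploying and invoking the endpoint, assuming o2v is a fitted Object2Vec estimator and that requests use the JSON inference format described in the AWS Object2Vec documentation; the instance type and token ids are placeholders.

>>> from sagemaker.serializers import JSONSerializer
>>> from sagemaker.deserializers import JSONDeserializer
>>> predictor = o2v.deploy(
...     initial_instance_count=1,
...     instance_type="ml.m5.xlarge",
...     serializer=JSONSerializer(),
...     deserializer=JSONDeserializer(),
... )
>>> # Each instance carries tokenized integer sequences for the two encoder inputs.
>>> payload = {"instances": [{"in0": [6, 17, 606, 19], "in1": [16, 21, 13, 45]}]}
>>> predictor.predict(payload)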

disable_profiling()

Update the current training job in progress to disable profiling.

Debugger stops collecting the system and framework metrics and turns off the Debugger built-in monitoring and profiling rules.

enable_default_profiling()

Update training job to enable Debugger monitoring.

This method enables Debugger monitoring with the default profiler_config parameter to collect system metrics and the default built-in profiler_report rule. Framework metrics won’t be saved. To update the training job to emit framework metrics, use the update_profiler method and specify the framework metrics you want to enable.

This method is callable when the training job is in progress while Debugger monitoring is disabled.

enable_network_isolation()

Return True if this Estimator will need network isolation to run.

Returns

Whether this Estimator needs network isolation or not.

Return type

bool

fit(**kwargs)
get_vpc_config(vpc_config_override='VPC_CONFIG_DEFAULT')

Returns the VpcConfig dict from this Estimator’s subnets and security groups, or else validates and returns an optional override value.

Parameters

vpc_config_override – Optional override for the VpcConfig. Default: use subnets and security groups from this Estimator.

hyperparameters()

Placeholder docstring

latest_job_debugger_artifacts_path()

Gets the path to the DebuggerHookConfig output artifacts.

Returns

An S3 path to the output artifacts.

Return type

str

latest_job_profiler_artifacts_path()

Gets the path to the profiling output artifacts.

Returns

An S3 path to the output artifacts.

Return type

str

latest_job_tensorboard_artifacts_path()

Gets the path to the TensorBoardOutputConfig output artifacts.

Returns

An S3 path to the output artifacts.

Return type

str

logs()

Display the logs for Estimator’s training job.

If the output is a tty or a Jupyter cell, it will be color-coded based on which instance the log entry is from.

property model_data

The model location in S3. Only set if Estimator has been fit().

Type

str

prepare_workflow_for_training(records=None, mini_batch_size=None, job_name=None)

Calls _prepare_for_training. Used when setting up a workflow.

Parameters
  • records (RecordSet) – The records to train this Estimator on.

  • mini_batch_size (int or None) – The size of each mini-batch to use when training. If None, a default value will be used.

  • job_name (str) – Name of the training job to be created. If not specified, one is generated, using the base name given to the constructor if applicable.

record_set(train, labels=None, channel='train', encrypt=False)

Build a RecordSet from a numpy ndarray matrix and label vector.

For the 2D ndarray train, each row is converted to a Record object. The vector is stored in the “values” entry of the features property of each Record. If labels is not None, each corresponding label is assigned to the “values” entry of the labels property of each Record.

The collection of Record objects are protobuf serialized and uploaded to new S3 locations. A manifest file is generated containing the list of objects created and also stored in S3.

The number of S3 objects created is controlled by the instance_count property on this Estimator. One S3 object is created per training instance.

Parameters
  • train (numpy.ndarray) – A 2D numpy array of training data.

  • labels (numpy.ndarray) – A 1D numpy array of labels. Its length must be equal to the number of rows in train.

  • channel (str) – The SageMaker TrainingJob channel this RecordSet should be assigned to.

  • encrypt (bool) – Specifies whether the objects uploaded to S3 are encrypted on the server side using AES-256 (default: False).

Returns

A RecordSet referencing the encoded, uploaded training and label data.

Return type

RecordSet
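
Example

A minimal sketch of the inherited record_set() helper, assuming a dense 2D float32 feature matrix and an Object2Vec estimator o2v constructed as shown earlier; whether such a matrix is a suitable encoding for a given Object2Vec task depends on how you prepare your inputs (the AWS documentation also describes a JSON Lines pair format supplied directly through S3 channels).

>>> import numpy as np
>>> train_features = np.random.rand(1000, 50).astype("float32")  # placeholder data
>>> train_labels = np.random.randint(0, 2, size=1000).astype("float32")
>>> train_records = o2v.record_set(train_features, labels=train_labels, channel="train")
>>> o2v.fit(train_records)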

register(content_types, response_types, inference_instances, transform_instances, image_uri=None, model_package_name=None, model_package_group_name=None, model_metrics=None, metadata_properties=None, marketplace_cert=False, approval_status=None, description=None, compile_model_family=None, model_name=None, drift_check_baselines=None, customer_metadata_properties=None, **kwargs)

Creates a model package for creating SageMaker models or listing on Marketplace.

Parameters
  • content_types (list) – The supported MIME types for the input data.

  • response_types (list) – The supported MIME types for the output data.

  • inference_instances (list) – A list of the instance types that are used to generate inferences in real-time.

  • transform_instances (list) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed.

  • image_uri (str) – The container image uri for Model Package, if not specified, Estimator’s training container image will be used (default: None).

  • model_package_name (str) – Model Package name, exclusive to model_package_group_name, using model_package_name makes the Model Package un-versioned (default: None).

  • model_package_group_name (str) – Model Package Group name, exclusive to model_package_name, using model_package_group_name makes the Model Package versioned (default: None).

  • model_metrics (ModelMetrics) – ModelMetrics object (default: None).

  • metadata_properties (MetadataProperties) – MetadataProperties (default: None).

  • marketplace_cert (bool) – A boolean value indicating if the Model Package is certified for AWS Marketplace (default: False).

  • approval_status (str) – Model Approval Status, values can be “Approved”, “Rejected”, or “PendingManualApproval” (default: “PendingManualApproval”).

  • description (str) – Model Package description (default: None).

  • compile_model_family (str) – Instance family for compiled model, if specified, a compiled model will be used (default: None).

  • model_name (str) – User defined model name (default: None).

  • drift_check_baselines (DriftCheckBaselines) – DriftCheckBaselines object (default: None).

  • customer_metadata_properties (dict[str, str]) – A dictionary of key-value paired metadata properties (default: None).

  • **kwargs – Passed to invocation of create_model(). Implementations may customize create_model() to accept **kwargs to customize model creation during deploy. For more, see the implementation docs.

Returns

A string of SageMaker Model Package ARN.

Return type

str
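
Example

A hedged sketch of registering the trained model to a model package group; the group name, MIME types, and instance types are placeholders and assume JSON-based inference.

>>> package_arn = o2v.register(
...     content_types=["application/json"],
...     response_types=["application/json"],
...     inference_instances=["ml.m5.xlarge"],
...     transform_instances=["ml.m5.xlarge"],
...     model_package_group_name="object2vec-embeddings",  # placeholder group name
...     description="Object2Vec embedding model",
... )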

training_image_uri()

Placeholder docstring

property training_job_analytics

Return a TrainingJobAnalytics object for the current training job.

transformer(instance_count, instance_type, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None, vpc_config_override='VPC_CONFIG_DEFAULT', enable_network_isolation=None, model_name=None)

Return a Transformer that uses a SageMaker Model based on the training job.

It reuses the SageMaker Session and base job name used by the Estimator.

Parameters
  • instance_count (int) – Number of EC2 instances to use.

  • instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.

  • strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.

  • assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.

  • output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.

  • output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).

  • accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.

  • env (dict) – Environment variables to be set for use during the transform job (default: None).

  • max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.

  • max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.

  • tags (list[dict]) – List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.

  • role (str) – The ExecutionRoleArn IAM Role ARN for the Model, which is also used during transform jobs. If not specified, the role from the Estimator will be used.

  • volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).

  • vpc_config_override (dict[str, list[str]]) –

    Optional override for the VpcConfig set on the model. Default: use subnets and security groups from this Estimator.

    • ’Subnets’ (list[str]): List of subnet ids.

    • ’SecurityGroupIds’ (list[str]): List of security group ids.

  • enable_network_isolation (bool) – Specifies whether container will run in network isolation mode. Network isolation mode restricts the container access to outside networks (such as the internet). The container does not make any inbound or outbound network calls. If True, a channel named “code” will be created for any user entry script for inference. Also known as Internet-free mode. If not specified, this setting is taken from the estimator’s current configuration.

  • model_name (str) – Name to use for creating an Amazon SageMaker model. If not specified, the estimator generates a default job name based on the training image name and current timestamp.
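
Example

A hedged sketch of batch inference with the returned Transformer, assuming o2v has been fit; the S3 locations are placeholders, and the JSON Lines content type assumes the input data was prepared in Object2Vec’s JSON Lines format.

>>> transformer = o2v.transformer(
...     instance_count=1,
...     instance_type="ml.m5.xlarge",
...     strategy="MultiRecord",
...     assemble_with="Line",
...     output_path="s3://my-bucket/object2vec/batch-output",  # placeholder bucket
... )
>>> transformer.transform(
...     "s3://my-bucket/object2vec/batch-input",  # placeholder input prefix
...     content_type="application/jsonlines",
...     split_type="Line",
... )
>>> transformer.wait()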

update_profiler(rules=None, system_monitor_interval_millis=None, s3_output_path=None, framework_profile_params=None, disable_framework_metrics=False)

Update training jobs to enable profiling.

This method updates the profiler_config parameter and initiates Debugger built-in rules for profiling.

Parameters
  • rules (list[ProfilerRule]) – A list of ProfilerRule objects to define rules for continuous analysis with SageMaker Debugger. Currently, you can only add new profiler rules during the training job. (default: None)

  • s3_output_path (str) – The location in S3 to store the output. If profiler is enabled once, s3_output_path cannot be changed. (default: None)

  • system_monitor_interval_millis (int) – How often profiling system metrics are collected; Unit: Milliseconds (default: None)

  • framework_profile_params (FrameworkProfile) – A parameter object for framework metrics profiling. Configure it using the FrameworkProfile class. To use the default framework profile parameters, pass FrameworkProfile(). For more information about the default values, see FrameworkProfile. (default: None)

  • disable_framework_metrics (bool) – Specify whether to disable all the framework metrics. This won’t update system metrics and the Debugger built-in rules for monitoring. To stop both monitoring and profiling, use the disable_profiling method. (default: False)

Attention

Updating the profiling configuration for TensorFlow dataloader profiling is currently not available. If you started a TensorFlow training job only with monitoring and want to enable profiling while the training job is running, the dataloader profiling cannot be updated.
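
Example

A minimal sketch of adjusting profiling on a training job that is in progress; the sampling interval is an arbitrary illustrative value.

>>> o2v.update_profiler(system_monitor_interval_millis=500)
>>> # Or turn off framework metrics while keeping system monitoring:
>>> o2v.update_profiler(disable_framework_metrics=True)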

negative_sampling_rate

An algorithm hyperparameter with optional validation.

Implemented as a python descriptor object.

comparator_list

An algorithm hyperparameter with optional validation.

Implemented as a python descriptor object.

tied_token_embedding_weight

An algorithm hyperparameter with optional validation.

Implemented as a python descriptor object.

token_embedding_storage_type

An algorithm hyperparameter with optional validation.

Implemented as a python descriptor object.

create_model(vpc_config_override='VPC_CONFIG_DEFAULT', **kwargs)

Return an Object2VecModel.

It references the latest S3 model data produced by this Estimator.

Parameters
  • vpc_config_override (dict[str, list[str]]) –

    Optional override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator.

    • ’Subnets’ (list[str]): List of subnet ids.

    • ’SecurityGroupIds’ (list[str]): List of security group ids.

  • **kwargs – Additional kwargs passed to the Object2VecModel constructor.
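
Example

A hedged sketch of creating a Model from the latest training artifacts with a VPC override; the subnet and security group ids are placeholders.

>>> model = o2v.create_model(
...     vpc_config_override={
...         "Subnets": ["subnet-0123456789abcdef0"],       # placeholder subnet id
...         "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder security group id
...     }
... )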

class sagemaker.Object2VecModel(model_data, role, sagemaker_session=None, **kwargs)

Bases: sagemaker.model.Model

Reference Object2Vec s3 model data.

Calling deploy() creates an Endpoint and returns a Predictor that performs inference against the hosted Object2Vec model.

Initialization for Object2VecModel class.

Parameters
  • model_data (str) – The S3 location of a SageMaker model data .tar.gz file.

  • role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.

  • sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the estimator creates one using the default AWS configuration chain.

  • **kwargs – Keyword arguments passed to the FrameworkModel initializer.
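
Example

A hedged sketch of hosting previously trained artifacts directly from S3; the model data path and role ARN are placeholders.

>>> from sagemaker import Object2VecModel
>>> model = Object2VecModel(
...     model_data="s3://my-bucket/object2vec/output/model.tar.gz",  # placeholder artifact
...     role="arn:aws:iam::111122223333:role/SageMakerRole",         # placeholder role ARN
... )
>>> predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")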