Debugger

Amazon SageMaker Debugger is a service that provides full visibility into the training of machine learning (ML) models and enables customers to automatically detect several classes of errors. Customers configure Debugger when starting a training job by specifying the debug level, the models, and the location where debug output will be stored. Optionally, customers can also specify custom error conditions that they want to be alerted on. Debugger automatically collects model-specific data, monitors for errors, and raises alerts when it detects errors during training.

sagemaker.debugger.get_rule_container_image_uri(region)

Returns the rule image URI for the given AWS Region and rule type.

Parameters

region (str) – The AWS Region, for example, 'us-east-1'.

Returns

The formatted image URI for the given Region and the rule container type.

Return type

str
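To give a sense of what the returned string looks like, here is a minimal sketch that formats an ECR-style image URI for a Region. It is not the SDK's implementation: the helper name format_rule_image_uri, the account ID (the AWS documentation placeholder 123456789012), and the repository name and tag are all assumptions made for illustration.

```python
# Hypothetical stand-in that mimics the shape of the returned URI.
# The account ID is a documentation placeholder, NOT a real Debugger registry.
PLACEHOLDER_ACCOUNTS = {
    "us-east-1": "123456789012",
    "us-west-2": "123456789012",
}


def format_rule_image_uri(region, repository="sagemaker-debugger-rules", tag="latest"):
    """Return an ECR-style image URI string for the given Region."""
    if region not in PLACEHOLDER_ACCOUNTS:
        raise ValueError(f"No rule image registered for region {region!r}")
    account = PLACEHOLDER_ACCOUNTS[region]
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


uri = format_rule_image_uri("us-west-2")
print(uri)
```

In real code, call sagemaker.debugger.get_rule_container_image_uri(region) to obtain the actual value rather than assembling the string by hand.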

class sagemaker.debugger.Rule(name, image_uri, instance_type, container_local_output_path, s3_output_path, volume_size_in_gb, rule_parameters, collections_to_save)

Bases: object

Rules analyze tensors emitted during the training of a model. They monitor conditions that are critical for the success of a training job.

For example, they can detect whether gradients are getting too large or too small, or whether a model is being overfit. Debugger comes pre-packaged with certain built-in rules (created using the Rule.sagemaker classmethod). You can use these rules or write your own rules using the Amazon SageMaker Debugger APIs. You can also analyze raw tensor data without rules, for example in an Amazon SageMaker notebook, using Debugger’s full set of APIs.

Initialize a Rule instance. Do not call this initializer directly; instead, use either the Rule.sagemaker or Rule.custom class method.

Parameters
  • name (str) – The name of the debugger rule.

  • image_uri (str) – The URI of the image to be used by the debugger rule.

  • instance_type (str) – Type of EC2 instance to use, for example, 'ml.c4.xlarge'.

  • container_local_output_path (str) – The local path to store the Rule output.

  • s3_output_path (str) – The location in S3 to store the output.

  • volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data.

  • rule_parameters (dict) – A dictionary of parameters for the rule.

  • collections_to_save ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be saved.

classmethod sagemaker(base_config, name=None, container_local_output_path=None, s3_output_path=None, other_trials_s3_input_paths=None, rule_parameters=None, collections_to_save=None)

Initialize a Rule instance for a built-in SageMaker Debugging Rule. The Rule analyzes tensors emitted during the training of a model and monitors conditions that are critical for the success of a training job.

Parameters
  • base_config (dict) – The base rule configuration returned by one of the built-in rule functions, for example, rule_configs.dead_relu().

  • name (str) – The name of the debugger rule. If one is not provided, the name of the base_config will be used.

  • container_local_output_path (str) – The local path in the container where the rule output is stored.

  • s3_output_path (str) – The location in S3 to store the output.

  • other_trials_s3_input_paths ([str]) – S3 input paths for other trials.

  • rule_parameters (dict) – A dictionary of parameters for the rule.

  • collections_to_save ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be saved.

Returns

The instance of the built-in Rule.

Return type

sagemaker.debugger.Rule
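The name fallback described for the name parameter can be sketched in plain Python. This is an illustration rather than the SDK's code: the helper resolve_rule_name is hypothetical, and the layout of the base_config dictionary (a DebugRuleConfiguration entry holding RuleConfigurationName) is an assumption about what the built-in rule functions return.

```python
def resolve_rule_name(base_config, name=None):
    """Return the explicit name if one is given, else the name in base_config."""
    if name:
        return name
    # Assumed layout of a built-in rule config; not confirmed by this page.
    return base_config["DebugRuleConfiguration"]["RuleConfigurationName"]


# Hypothetical stand-in for the dict returned by rule_configs.dead_relu().
example_base_config = {
    "DebugRuleConfiguration": {
        "RuleConfigurationName": "DeadRelu",
        "RuleParameters": {"rule_to_invoke": "DeadRelu"},
    }
}

default_name = resolve_rule_name(example_base_config)
custom_name = resolve_rule_name(example_base_config, name="my-dead-relu-check")
print(default_name, custom_name)
```

With the SDK itself, the equivalent call would be along the lines of Rule.sagemaker(rule_configs.dead_relu()), optionally passing name to override the default.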

classmethod custom(name, image_uri, instance_type, volume_size_in_gb, source=None, rule_to_invoke=None, container_local_output_path=None, s3_output_path=None, other_trials_s3_input_paths=None, rule_parameters=None, collections_to_save=None)

Initialize a Rule instance for a custom rule. The Rule analyzes tensors emitted during the training of a model and monitors conditions that are critical for the success of a training job.

Parameters
  • name (str) – The name of the debugger rule.

  • image_uri (str) – The URI of the image to be used by the debugger rule.

  • instance_type (str) – Type of EC2 instance to use, for example, 'ml.c4.xlarge'.

  • volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data.

  • source (str) – A source file containing a rule to invoke. If provided, you must also provide rule_to_invoke. This can be either an S3 URI or a local path.

  • rule_to_invoke (str) – The name of the rule to invoke within the source. If provided, you must also provide source.

  • container_local_output_path (str) – The local path in the container where the rule output is stored.

  • s3_output_path (str) – The location in S3 to store the output.

  • other_trials_s3_input_paths ([str]) – S3 input paths for other trials.

  • rule_parameters (dict) – A dictionary of parameters for the rule.

  • collections_to_save ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be saved.

Returns

The instance of the custom Rule.

Return type

sagemaker.debugger.Rule
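The parameter list states that source and rule_to_invoke must be supplied together. A minimal sketch of that check follows; the helper check_custom_rule_source and the example paths are hypothetical, and the SDK may enforce the constraint differently (or with a different exception type).

```python
def check_custom_rule_source(source=None, rule_to_invoke=None):
    """Validate that source and rule_to_invoke are both set or both omitted.

    Returns True when a custom source file is in use, False otherwise.
    """
    if (source is None) != (rule_to_invoke is None):
        raise ValueError("source and rule_to_invoke must be provided together")
    return source is not None


# Both omitted: the rule is assumed to live inside the image itself.
uses_source_none = check_custom_rule_source()

# Both provided: a hypothetical S3 location and rule class name.
uses_source_both = check_custom_rule_source(
    source="s3://my-bucket/rules/my_rule.py",
    rule_to_invoke="MyRule",
)

# Only one provided: rejected.
try:
    check_custom_rule_source(source="rules/my_rule.py")
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(uses_source_none, uses_source_both, mismatch_error)
```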

to_debugger_rule_config_dict()

Generates a request dictionary using the parameters provided when initializing the object.

Returns

A portion of an API request, as a dictionary.

Return type

dict
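To illustrate what such a request fragment might look like, the sketch below assembles a dictionary from the Rule initialization parameters. The key names follow the DebugRuleConfigurations entry of the SageMaker CreateTrainingJob API; that the method emits exactly these keys and omits unset values is an assumption, and build_rule_request is a hypothetical helper, not part of the SDK.

```python
def build_rule_request(name, image_uri, instance_type=None, volume_size_in_gb=None,
                       container_local_output_path=None, s3_output_path=None,
                       rule_parameters=None):
    """Assemble a DebugRuleConfigurations-style dict, dropping unset values."""
    candidate = {
        "RuleConfigurationName": name,
        "RuleEvaluatorImage": image_uri,
        "InstanceType": instance_type,
        "VolumeSizeInGB": volume_size_in_gb,
        "LocalPath": container_local_output_path,
        "S3OutputPath": s3_output_path,
        "RuleParameters": rule_parameters,
    }
    # Keep only the keys the caller actually supplied.
    return {key: value for key, value in candidate.items() if value is not None}


rule_request = build_rule_request(
    name="my-custom-rule",
    image_uri="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-rule:latest",  # placeholder
    instance_type="ml.c4.xlarge",
    volume_size_in_gb=10,
    rule_parameters={"threshold": "0.5"},  # hypothetical rule parameter
)
print(rule_request)
```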

class sagemaker.debugger.DebuggerHookConfig(s3_output_path=None, container_local_output_path=None, hook_parameters=None, collection_configs=None)

Bases: object

DebuggerHookConfig provides options to customize how debugging information is emitted.

Initialize an instance of DebuggerHookConfig.

Parameters
  • s3_output_path (str) – The location in S3 to store the output.

  • container_local_output_path (str) – The local path in the container where the debugging output is stored.

  • hook_parameters (dict) – A dictionary of parameters for the debugger hook.

  • collection_configs ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be provided to the API.
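As a rough picture of how these four options fit together, the sketch below builds a dictionary shaped like the DebugHookConfig field of the SageMaker CreateTrainingJob API. The helper build_hook_request is hypothetical, the bucket name and the save_interval parameter are placeholders, and whether the SDK produces exactly this structure is an assumption.

```python
def build_hook_request(s3_output_path=None, container_local_output_path=None,
                       hook_parameters=None, collection_configs=None):
    """Assemble a DebugHookConfig-style dict, skipping arguments left as None."""
    request = {}
    if s3_output_path is not None:
        request["S3OutputPath"] = s3_output_path
    if container_local_output_path is not None:
        request["LocalPath"] = container_local_output_path
    if hook_parameters is not None:
        request["HookParameters"] = dict(hook_parameters)
    if collection_configs is not None:
        request["CollectionConfigurations"] = list(collection_configs)
    return request


hook_request = build_hook_request(
    s3_output_path="s3://my-bucket/debugger-output",  # hypothetical bucket
    hook_parameters={"save_interval": "100"},  # assumed parameter name
    collection_configs=[{"CollectionName": "gradients"}],
)
print(hook_request)
```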

class sagemaker.debugger.TensorBoardOutputConfig(s3_output_path, container_local_output_path=None)

Bases: object

TensorBoardOutputConfig provides options to customize debugging visualization using TensorBoard.

Initialize an instance of TensorBoardOutputConfig.

Parameters
  • s3_output_path (str) – The location in S3 to store the output.

  • container_local_output_path (str) – The local path in the container where the TensorBoard output is stored.

class sagemaker.debugger.CollectionConfig(name, parameters=None)

Bases: object

CollectionConfig object for SageMaker Debugger.

Initialize a CollectionConfig object.

Parameters
  • name (str) – The name of the collection configuration.

  • parameters (dict) – The parameters for the collection configuration.
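A collection configuration pairs a collection name with an optional parameter dictionary. The sketch below shows one way to turn that pair into a request fragment using the CollectionName and CollectionParameters keys from the SageMaker CreateTrainingJob API. The helper build_collection_request is hypothetical, the save_interval parameter is an assumed example, and converting values to strings is a choice made in this sketch because the API defines CollectionParameters as a string-to-string map.

```python
def build_collection_request(name, parameters=None):
    """Build a CollectionConfigurations-style entry for one tensor collection."""
    request = {"CollectionName": name}
    if parameters is not None:
        # Coerce keys and values to str to match a string-to-string map.
        request["CollectionParameters"] = {
            str(key): str(value) for key, value in parameters.items()
        }
    return request


gradients = build_collection_request("gradients", {"save_interval": 50})
weights = build_collection_request("weights")
print(gradients)
print(weights)
```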