Debugger
Amazon SageMaker Debugger is a service that provides full visibility into the training of machine learning (ML) models, enabling customers to automatically detect several classes of errors. Customers can configure Debugger when starting their training jobs by specifying the debug level, the models, and the location where debug output will be stored. Optionally, customers can also specify custom error conditions that they want to be alerted about. Debugger automatically collects model-specific data, monitors for errors, and raises alerts when it detects errors during training.
-
sagemaker.debugger.get_rule_container_image_uri(region)
Returns the rule image URI for the given AWS region and rule type.
Parameters: region – AWS Region
Returns: Formatted image URI for the given region and the rule container type
Return type: str
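A minimal sketch of calling this helper, assuming the sagemaker Python SDK is installed. The wrapper name rule_image_uri_for is hypothetical, and the import sits inside the function only so the snippet can be loaded where the SDK is absent.

```python
def rule_image_uri_for(region):
    """Return the Debugger rule container image URI for an AWS region."""
    # Deferred import: requires the sagemaker SDK at call time, not at load time.
    from sagemaker.debugger import get_rule_container_image_uri

    # The SDK function takes only the region and returns a str.
    return get_rule_container_image_uri(region)
```

For example, rule_image_uri_for("us-west-2") would return the image URI string for that region; the region value here is purely illustrative.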
-
class sagemaker.debugger.Rule(name, image_uri, instance_type, container_local_output_path, s3_output_path, volume_size_in_gb, rule_parameters, collections_to_save)
Bases: object
Rules analyze tensors emitted during the training of a model. They monitor conditions that are critical for the success of a training job.
For example, they can detect whether gradients are getting too large or too small or if a model is being overfit. Debugger comes pre-packaged with certain built-in rules (created using the Rule.sagemaker classmethod). You can use these rules or write your own rules using the Amazon SageMaker Debugger APIs. You can also analyze raw tensor data without using rules in, for example, an Amazon SageMaker notebook, using Debugger’s full set of APIs.
Do not use this initialization method. Instead, use either the Rule.sagemaker or Rule.custom class method.
Initialize a Rule instance. The Rule analyzes tensors emitted during the training of a model and monitors conditions that are critical for the success of a training job.
Parameters:
- name (str) – The name of the debugger rule.
- image_uri (str) – The URI of the image to be used by the debugger rule.
- instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
- container_local_output_path (str) – The local path to store the Rule output.
- s3_output_path (str) – The location in S3 to store the output.
- volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data.
- rule_parameters (dict) – A dictionary of parameters for the rule.
- collections_to_save ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be saved.
-
classmethod sagemaker(base_config, name=None, container_local_output_path=None, s3_output_path=None, other_trials_s3_input_paths=None, rule_parameters=None, collections_to_save=None)
Initialize a Rule instance for a built-in SageMaker Debugging Rule. The Rule analyzes tensors emitted during the training of a model and monitors conditions that are critical for the success of a training job.
Parameters:
- base_config (dict) – This is the base rule config returned from the built-in list of rules. For example, ‘rule_configs.dead_relu()’.
- name (str) – The name of the debugger rule. If one is not provided, the name of the base_config will be used.
- container_local_output_path (str) – The path in the container.
- s3_output_path (str) – The location in S3 to store the output.
- other_trials_s3_input_paths ([str]) – S3 input paths for other trials.
- rule_parameters (dict) – A dictionary of parameters for the rule.
- collections_to_save ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be saved.
Returns: The instance of the built-in Rule.
Return type: sagemaker.debugger.Rule
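As an illustration of the classmethod above, the following sketch builds the built-in dead ReLU rule. It assumes the sagemaker Python SDK is installed and that rule_configs can be imported from sagemaker.debugger, which this page implies through ‘rule_configs.dead_relu()’ but does not state. The function name make_dead_relu_rule is hypothetical.

```python
def make_dead_relu_rule(s3_output_path=None):
    """Build the built-in dead ReLU rule, optionally with an S3 output location."""
    # Deferred import: requires the sagemaker SDK at call time.
    # Assumption: rule_configs is importable from sagemaker.debugger.
    from sagemaker.debugger import Rule, rule_configs

    return Rule.sagemaker(
        base_config=rule_configs.dead_relu(),  # base config from the built-in list
        s3_output_path=s3_output_path,         # None leaves the SDK default in place
    )
```

Because name is omitted, the rule takes the name of the base config, as described in the parameter list.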
-
classmethod custom(name, image_uri, instance_type, volume_size_in_gb, source=None, rule_to_invoke=None, container_local_output_path=None, s3_output_path=None, other_trials_s3_input_paths=None, rule_parameters=None, collections_to_save=None)
Initialize a Rule instance for a custom rule. The Rule analyzes tensors emitted during the training of a model and monitors conditions that are critical for the success of a training job.
Parameters:
- name (str) – The name of the debugger rule.
- image_uri (str) – The URI of the image to be used by the debugger rule.
- instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
- volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data.
- source (str) – A source file containing a rule to invoke. If provided, you must also provide rule_to_invoke. This can be either an S3 URI or a local path.
- rule_to_invoke (str) – The name of the rule to invoke within the source. If provided, you must also provide source.
- container_local_output_path (str) – The path in the container.
- s3_output_path (str) – The location in S3 to store the output.
- other_trials_s3_input_paths ([str]) – S3 input paths for other trials.
- rule_parameters (dict) – A dictionary of parameters for the rule.
- collections_to_save ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be saved.
Returns: The instance of the custom Rule.
Return type: sagemaker.debugger.Rule
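The parameter list above states that source and rule_to_invoke must be supplied together. A minimal sketch of checking that pairing before calling Rule.custom follows, assuming the sagemaker Python SDK is installed. Both helper names are hypothetical, and the up-front check is a convenience added here, not something the SDK is documented to require.

```python
def check_source_pairing(source=None, rule_to_invoke=None):
    """Raise ValueError unless source and rule_to_invoke are both set or both unset."""
    if (source is None) != (rule_to_invoke is None):
        raise ValueError("source and rule_to_invoke must be provided together")


def make_custom_rule(name, image_uri, instance_type, volume_size_in_gb,
                     source=None, rule_to_invoke=None):
    """Validate the source/rule_to_invoke pairing, then build a custom Rule."""
    check_source_pairing(source, rule_to_invoke)

    # Deferred import: requires the sagemaker SDK at call time.
    from sagemaker.debugger import Rule

    return Rule.custom(
        name=name,
        image_uri=image_uri,
        instance_type=instance_type,          # for example, "ml.c4.xlarge"
        volume_size_in_gb=volume_size_in_gb,
        source=source,                        # S3 URI or local path, or None
        rule_to_invoke=rule_to_invoke,        # rule name within source, or None
    )
```

Failing early on a mismatched pair keeps the error close to the caller's own arguments.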
-
class sagemaker.debugger.DebuggerHookConfig(s3_output_path=None, container_local_output_path=None, hook_parameters=None, collection_configs=None)
Bases: object
DebuggerHookConfig provides options to customize how debugging information is emitted.
Initialize an instance of DebuggerHookConfig. DebuggerHookConfig provides options to customize how debugging information is emitted.
Parameters:
- s3_output_path (str) – The location in S3 to store the output.
- container_local_output_path (str) – The path in the container.
- hook_parameters (dict) – A dictionary of parameters.
- collection_configs ([sagemaker.debugger.CollectionConfig]) – A list of CollectionConfig objects to be provided to the API.
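A minimal sketch of constructing a hook configuration from the parameters above, assuming the sagemaker Python SDK is installed. The function name make_hook_config is hypothetical, the collection names "losses" and "gradients" are illustrative, and passing name as a keyword to CollectionConfig is an assumption because that constructor is not documented in this section.

```python
def make_hook_config(s3_output_path, collection_names=("losses", "gradients")):
    """Build a DebuggerHookConfig that saves the named tensor collections."""
    # Deferred import: requires the sagemaker SDK at call time.
    from sagemaker.debugger import CollectionConfig, DebuggerHookConfig

    # Assumption: CollectionConfig accepts the collection name as `name`.
    collections = [CollectionConfig(name=n) for n in collection_names]

    return DebuggerHookConfig(
        s3_output_path=s3_output_path,    # where the debug output is stored
        collection_configs=collections,   # list of CollectionConfig objects
    )
```

A call such as make_hook_config("s3://example-bucket/debug-output") uses a placeholder bucket name that you would replace with your own.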
-
class sagemaker.debugger.TensorBoardOutputConfig(s3_output_path, container_local_output_path=None)
Bases: object
TensorBoardOutputConfig provides options to customize debugging visualization using TensorBoard.
Initialize an instance of TensorBoardOutputConfig. TensorBoardOutputConfig provides options to customize debugging visualization using TensorBoard.
Parameters: