Pipelines
ConditionStep
-
class
sagemaker.workflow.condition_step.
ConditionStep
(name, depends_on=None, display_name=None, description=None, conditions=None, if_steps=None, else_steps=None)
Conditional step for pipelines to support conditional branching in the execution of steps.
Construct a ConditionStep for pipelines to support conditional branching.
If all of the conditions in the condition list evaluate to True, the if_steps are marked as ready for execution. Otherwise, the else_steps are marked as ready for execution.
- Parameters
name (str) – The name of the condition step.
depends_on (List[Union[str, Step, StepCollection]]) – The list of Step/StepCollection names, Step instances, or StepCollection instances that the current step depends on.
display_name (str) – The display name of the condition step.
description (str) – The description of the condition step.
conditions (List[Condition]) – A list of sagemaker.workflow.conditions.Condition instances.
if_steps (List[Union[Step, StepCollection]]) – A list of sagemaker.workflow.steps.Step or sagemaker.workflow.step_collections.StepCollection instances that are marked as ready for execution if all of the conditions evaluate to True.
else_steps (List[Union[Step, StepCollection]]) – A list of sagemaker.workflow.steps.Step or sagemaker.workflow.step_collections.StepCollection instances that are marked as ready for execution if any of the conditions evaluates to False.
Deprecated: sagemaker.workflow.condition_step.JsonGet is deprecated; use sagemaker.workflow.functions.JsonGet instead.
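The selection rule described above (all conditions True selects if_steps, anything else selects else_steps) can be modeled in plain Python. This is a minimal sketch, not the SDK: select_branch is a hypothetical helper, and each condition is a zero-argument callable standing in for a Condition instance, which in a real pipeline is evaluated by the Pipelines service rather than by local code.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a "condition" is a zero-argument callable returning bool.
# In the SDK these are sagemaker.workflow.conditions.Condition instances that
# the Pipelines service evaluates at run time.
ConditionFn = Callable[[], bool]


def select_branch(
    conditions: Sequence[ConditionFn],
    if_steps: List[str],
    else_steps: List[str],
) -> List[str]:
    """Return the names of the steps marked ready for execution."""
    # if_steps only when *every* condition holds; otherwise else_steps.
    # (An empty list yields True under Python's all(); this sketch makes no
    # claim about how the service treats an empty condition list.)
    if all(cond() for cond in conditions):
        return if_steps
    return else_steps


accuracy = 0.82
steps = select_branch(
    conditions=[lambda: accuracy >= 0.8, lambda: accuracy <= 1.0],
    if_steps=["RegisterModel"],
    else_steps=["FailStep"],
)
print(steps)  # ['RegisterModel']
```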
Conditions
-
class
sagemaker.workflow.conditions.
ConditionTypeEnum
(*args, value=<object object>, **kwargs)
Condition type enum.
-
class
sagemaker.workflow.conditions.
Condition
(condition_type=NOTHING)
Abstract Condition entity.
- Parameters
condition_type (sagemaker.workflow.conditions.ConditionTypeEnum) –
-
condition_type
The type of condition.
- Type
ConditionTypeEnum
-
class
sagemaker.workflow.conditions.
ConditionComparison
(condition_type=NOTHING, left=None, right=None)
Generic comparison condition that can be used to derive specific condition comparisons.
- Parameters
condition_type (sagemaker.workflow.conditions.ConditionTypeEnum) –
left (Optional[Union[sagemaker.workflow.execution_variables.ExecutionVariable, sagemaker.workflow.parameters.Parameter, sagemaker.workflow.properties.Properties, str, int, bool, float]]) –
right (Optional[Union[sagemaker.workflow.execution_variables.ExecutionVariable, sagemaker.workflow.parameters.Parameter, sagemaker.workflow.properties.Properties, str, int, bool, float]]) –
-
left
The execution variable, parameter, property, or Python primitive value to use in the comparison.
- Type
Union[ConditionValueType, PrimitiveType]
-
right
The execution variable, parameter, property, or Python primitive value to compare to.
- Type
Union[ConditionValueType, PrimitiveType]
-
class
sagemaker.workflow.conditions.
ConditionEquals
(left, right)
A condition for equality comparisons.
Construct a condition for equality comparisons.
- Parameters
left (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
right (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to compare to.
-
class
sagemaker.workflow.conditions.
ConditionGreaterThan
(left, right)
A condition for greater than comparisons.
Construct an instance of ConditionGreaterThan for greater than comparisons.
- Parameters
left (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
right (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to compare to.
-
class
sagemaker.workflow.conditions.
ConditionGreaterThanOrEqualTo
(left, right)
A condition for greater than or equal to comparisons.
Construct an instance of ConditionGreaterThanOrEqualTo for greater than or equal to comparisons.
- Parameters
left (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
right (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to compare to.
-
class
sagemaker.workflow.conditions.
ConditionLessThan
(left, right)
A condition for less than comparisons.
Construct an instance of ConditionLessThan for less than comparisons.
- Parameters
left (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
right (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to compare to.
-
class
sagemaker.workflow.conditions.
ConditionLessThanOrEqualTo
(left, right)
A condition for less than or equal to comparisons.
Construct an instance of ConditionLessThanOrEqualTo for less than or equal to comparisons.
- Parameters
left (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
right (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to compare to.
-
class
sagemaker.workflow.conditions.
ConditionIn
(value, in_values)
A condition to check membership.
Construct a ConditionIn condition to check membership.
- Parameters
value (Union[ConditionValueType, PrimitiveType]) – The execution variable, parameter, property, or Python primitive value to check for membership.
in_values (List[Union[ConditionValueType, PrimitiveType]]) – The list of values to check for membership in.
-
class
sagemaker.workflow.conditions.
ConditionNot
(expression)
A condition for negating another Condition.
Construct a ConditionNot condition for negating another Condition.
- Parameters
expression (sagemaker.workflow.conditions.Condition) – The Condition to negate.
-
class
sagemaker.workflow.conditions.
ConditionOr
(conditions=None)
A condition for taking the logical OR of a list of Condition instances.
Construct a ConditionOr condition.
- Parameters
conditions (List[sagemaker.workflow.conditions.Condition]) – The list of Condition instances to combine with a logical OR.
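To make the comparison and composition semantics above concrete, here is a plain-Python model. It is a sketch, not the SDK: every function name is hypothetical, and the trailing comments map each one to the class it imitates. In a real pipeline you would instead pass instances such as ConditionGreaterThanOrEqualTo(left=..., right=...) in a ConditionStep's conditions list, and the service would evaluate them.

```python
import operator
from typing import Any, Callable, List

# A predicate is a zero-argument callable returning bool (hypothetical model).
Predicate = Callable[[], bool]


def comparison(op: Callable[[Any, Any], bool], left: Any, right: Any) -> Predicate:
    return lambda: op(left, right)


def equals(left: Any, right: Any) -> Predicate:  # ~ ConditionEquals
    return comparison(operator.eq, left, right)


def greater_than_or_equal_to(left: Any, right: Any) -> Predicate:  # ~ ConditionGreaterThanOrEqualTo
    return comparison(operator.ge, left, right)


def less_than(left: Any, right: Any) -> Predicate:  # ~ ConditionLessThan
    return comparison(operator.lt, left, right)


def is_in(value: Any, in_values: List[Any]) -> Predicate:  # ~ ConditionIn
    return lambda: value in in_values


def negate(expression: Predicate) -> Predicate:  # ~ ConditionNot
    return lambda: not expression()


def any_of(conditions: List[Predicate]) -> Predicate:  # ~ ConditionOr
    return lambda: any(c() for c in conditions)


# "The instance type is an allowed GPU type, OR this is not a dry run."
instance_type = "ml.g4dn.xlarge"
dry_run = True
check = any_of([
    is_in(instance_type, ["ml.g4dn.xlarge", "ml.p3.2xlarge"]),
    negate(equals(dry_run, True)),
])
print(check())  # True
```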
CheckJobConfig
-
class
sagemaker.workflow.check_job_config.
CheckJobConfig
(role, instance_count=1, instance_type='ml.m5.xlarge', volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)
Check job config for QualityCheckStep and ClarifyCheckStep.
Constructs a CheckJobConfig instance.
- Parameters
role (str) – An AWS IAM role. The Amazon SageMaker jobs use this role.
instance_count (int) – The number of instances to run the jobs with (default: 1).
instance_type (str) – Type of EC2 instance to use for the job (default: ‘ml.m5.xlarge’).
volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key (str) – A KMS key for the processing volume (default: None).
output_kms_key (str) – The KMS key id for the job’s outputs (default: None).
max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. If not specified, the default is 3600.
base_job_name (str) – Prefix for the job name. If not specified, a default name is generated based on the training image name and current timestamp (default: None).
sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed (default: None). If not specified, one is created using the default AWS configuration chain.
env (dict) – Environment variables to be passed to the job (default: None).
tags ([dict]) – List of tags to be passed to the job (default: None).
network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets (default: None).
Entities
-
class
sagemaker.workflow.entities.
Entity
Base object for workflow entities.
Entities must implement the to_request method.
-
class
sagemaker.workflow.entities.
DefaultEnumMeta
(cls, bases, classdict, **kwds)
An EnumMeta which defaults to the first value in the Enum list.
-
class
sagemaker.workflow.entities.
Expression
Base object for expressions.
Expressions must implement the expr property.
-
class
sagemaker.workflow.entities.
PipelineVariable
Base object for pipeline variables.
PipelineVariable subclasses must implement the expr property. Its subclasses include: Parameter, Properties, Join, JsonGet, and ExecutionVariable.
Execution Variables
-
class
sagemaker.workflow.execution_variables.
ExecutionVariable
(name)
Pipeline execution variables for workflow.
Create a pipeline execution variable.
- Parameters
name (str) – The name of the execution variable.
-
class
sagemaker.workflow.execution_variables.
ExecutionVariables
Provides access to all available execution variables:
ExecutionVariables.START_DATETIME
ExecutionVariables.CURRENT_DATETIME
ExecutionVariables.PIPELINE_NAME
ExecutionVariables.PIPELINE_ARN
ExecutionVariables.PIPELINE_EXECUTION_ID
ExecutionVariables.PIPELINE_EXECUTION_ARN
ExecutionVariables.TRAINING_JOB_NAME
ExecutionVariables.PROCESSING_JOB_NAME
Functions
-
class
sagemaker.workflow.functions.
Join
(on=NOTHING, values=NOTHING)
Join together properties.
Examples: Build an Amazon S3 URI from a bucket name parameter and the pipeline execution ID, and use it as training input:
bucket = ParameterString('bucket', default_value='my-bucket')
TrainingInput(
    s3_data=Join(on='/', values=['s3:/', bucket, ExecutionVariables.PIPELINE_EXECUTION_ID]),
    content_type="text/csv")
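The example joins on '/' and starts from 's3:/' (one slash), which yields 's3://' once the separator is inserted. The following plain-Python model of how a Join resolves at execution time shows this; it is a sketch, not the SDK. Placeholder and resolve_join are hypothetical, and "abc123" is an invented execution ID. The sketch converts values with Python's str(), which may differ from how the service renders non-string values.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Primitive = Union[str, int, float, bool]


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for a parameter or execution variable."""
    name: str


def resolve_join(on: str, values: List[Union[Primitive, Placeholder]],
                 runtime: Dict[str, Primitive]) -> str:
    # Replace each placeholder with its run-time value, then join as strings.
    parts = [runtime[v.name] if isinstance(v, Placeholder) else v for v in values]
    return on.join(str(p) for p in parts)


uri = resolve_join(
    on="/",
    values=["s3:/", Placeholder("bucket"), Placeholder("PIPELINE_EXECUTION_ID")],
    runtime={"bucket": "my-bucket", "PIPELINE_EXECUTION_ID": "abc123"},
)
print(uri)  # s3://my-bucket/abc123
```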
-
values
The primitive type values, parameters, step properties, expressions to join.
- Type
List[Union[PrimitiveType, Parameter, Expression]]
-
class
sagemaker.workflow.functions.
JsonGet
(step_name, property_file, json_path)
Get JSON properties from PropertyFiles.
- Parameters
step_name (str) – The name of the step that produces the property file.
property_file (Union[sagemaker.workflow.properties.PropertyFile, str]) –
json_path (str) – The JSON path used to query the property file.
-
property_file
Either a PropertyFile instance or the name of a property file.
- Type
Union[PropertyFile, str]
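Conceptually, a JsonGet reads a value out of a JSON property file written by an earlier step, for example an evaluation report queried with a json_path such as "regression_metrics.mse.value". The sketch below models that lookup in plain Python. It is not the SDK: json_get is hypothetical, and it supports only dotted keys and integer list indices, whereas the path syntax the service accepts may be richer.

```python
import json
from typing import Any


def json_get(document: str, json_path: str) -> Any:
    """Walk a parsed JSON document along a dotted path (hypothetical model)."""
    node: Any = json.loads(document)
    for part in json_path.split("."):
        # Integer segments index into lists; other segments are dict keys.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


evaluation = '{"regression_metrics": {"mse": {"value": 4.2}}}'
print(json_get(evaluation, "regression_metrics.mse.value"))  # 4.2
```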
Parameters
-
class
sagemaker.workflow.parameters.
ParameterTypeEnum
(*args, value=<object object>, **kwargs)
Parameter type enum.
-
class
sagemaker.workflow.parameters.
Parameter
(name=NOTHING, parameter_type=NOTHING, default_value=None)
Pipeline parameter for workflow.
-
parameter_type
The type of the parameter.
- Type
ParameterTypeEnum
-
default_value
The default value of the parameter.
- Type
PrimitiveType
-
class
sagemaker.workflow.parameters.
ParameterString
(name, default_value=None, enum_values=None)
String parameter for pipelines.
Create a pipeline string parameter.
- Parameters
name (str) – The name of the parameter.
default_value (str) – The default value of the parameter. The default value can be overridden at the start of an execution. If it is not set, or is set to None, a value must be provided at the start of the execution.
enum_values (List[str]) – Enum values for this parameter.
-
class
sagemaker.workflow.parameters.
ParameterInteger
(name, default_value=None)
Integer parameter for pipelines.
Create a pipeline integer parameter.
-
class
sagemaker.workflow.parameters.
ParameterFloat
(name, default_value=None)
Float parameter for pipelines.
Create a pipeline float parameter.
-
sagemaker.workflow.parameters.
ParameterBoolean
Boolean parameter for pipelines.
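The default-versus-override rule stated for ParameterString can be modeled as follows. This is a sketch, and resolve_parameter is a hypothetical helper, not an SDK function. It assumes the same rule holds for the integer, float, and boolean variants, though only ParameterString documents it above. The overrides dict plays the role of the parameters argument to Pipeline.start().

```python
from typing import Dict, Optional, Union

Primitive = Union[str, int, float, bool]


def resolve_parameter(name: str, default_value: Optional[Primitive],
                      overrides: Dict[str, Primitive]) -> Primitive:
    """Pick the value one execution sees for a parameter (hypothetical model)."""
    if name in overrides:
        # A value supplied at the start of the execution wins.
        return overrides[name]
    if default_value is None:
        # No default and no override: the execution cannot start.
        raise ValueError(f"A value for parameter {name!r} must be provided at start.")
    return default_value


print(resolve_parameter("InstanceCount", 1, {}))                    # 1
print(resolve_parameter("InstanceCount", 1, {"InstanceCount": 4}))  # 4
```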
Pipeline
-
class
sagemaker.workflow.pipeline.
Pipeline
(name='', parameters=None, pipeline_experiment_config=<sagemaker.workflow.pipeline_experiment_config.PipelineExperimentConfig object>, steps=None, sagemaker_session=None)
Pipeline for workflow.
Initialize a Pipeline.
- Parameters
name (str) – The name of the pipeline.
parameters (Sequence[Parameter]) – The list of the parameters.
pipeline_experiment_config (Optional[PipelineExperimentConfig]) – If set, the workflow will attempt to create an experiment and trial before executing the steps. Creation will be skipped if an experiment or a trial with the same name already exists. By default, pipeline name is used as experiment name and execution id is used as the trial name. If set to None, no experiment or trial will be created automatically.
steps (Sequence[Union[Step, StepCollection]]) – The list of the non-conditional steps associated with the pipeline. Any steps that are within the if_steps or else_steps of a ConditionStep cannot be listed in the steps of a pipeline. In particular, the workflow service rejects any pipeline definition that lists a step both in the pipeline’s steps and in the if_steps or else_steps of any ConditionStep.
sagemaker_session (sagemaker.session.Session) – Session object that manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the pipeline creates one using the default AWS configuration chain.
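The restriction on steps can be checked locally before a definition is submitted. The sketch below works on step names only and does not call the SDK; FakeConditionStep and find_conflicts are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FakeConditionStep:
    """Hypothetical stand-in that holds only the branch step names."""
    name: str
    if_steps: List[str] = field(default_factory=list)
    else_steps: List[str] = field(default_factory=list)


def find_conflicts(pipeline_steps: List[str],
                   condition_steps: List[FakeConditionStep]) -> Set[str]:
    """Return step names listed both in the pipeline and in a ConditionStep branch."""
    branch_steps: Set[str] = set()
    for cond in condition_steps:
        branch_steps.update(cond.if_steps, cond.else_steps)
    return set(pipeline_steps) & branch_steps


check = FakeConditionStep("CheckMSE", if_steps=["Register"], else_steps=["Fail"])
print(find_conflicts(["Train", "Evaluate", "CheckMSE"], [check]))              # set()
print(find_conflicts(["Train", "Evaluate", "CheckMSE", "Register"], [check]))  # {'Register'}
```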
-
to_request
()
Gets the request structure for workflow service calls.
-
create
(role_arn, description=None, tags=None, parallelism_config=None)
Creates a Pipeline in the Pipelines service.
- Parameters
role_arn (str) – The role ARN that is assumed by the pipeline to create step artifacts.
description (str) – A description of the pipeline.
tags (List[Dict[str, str]]) – A list of {“Key”: “string”, “Value”: “string”} dicts as tags.
parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
- Returns
A response dict from the service.
- Return type
Dict[str, Any]
-
describe
()
Describes a Pipeline in the Workflow service.
- Returns
Response dict from the service. See the boto3 client documentation.
- Return type
Dict[str, Any]
-
update
(role_arn, description=None, parallelism_config=None)
Updates a Pipeline in the Workflow service.
- Parameters
role_arn (str) – The role ARN that is assumed by pipelines to create step artifacts.
description (str) – A description of the pipeline.
parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
- Returns
A response dict from the service.
- Return type
Dict[str, Any]
-
upsert
(role_arn, description=None, tags=None, parallelism_config=None)
Creates a pipeline, or updates it if it already exists.
- Parameters
role_arn (str) – The role ARN that is assumed by workflow to create step artifacts.
description (str) – A description of the pipeline.
tags (List[Dict[str, str]]) – A list of {“Key”: “string”, “Value”: “string”} dicts as tags.
parallelism_config (Optional[sagemaker.workflow.parallelism_config.ParallelismConfiguration]) – Config for parallel steps: the parallelism configuration that is applied to each of the executions.
- Returns
A response dict from the service.
- Return type
Dict[str, Any]
-
delete
()
Deletes a Pipeline in the Workflow service.
- Returns
A response dict from the service.
- Return type
Dict[str, Any]
-
start
(parameters=None, execution_display_name=None, execution_description=None, parallelism_config=None)
Starts a Pipeline execution in the Workflow service.
- Parameters
parameters (Dict[str, Union[str, bool, int, float]]) – Values to override pipeline parameters.
execution_display_name (str) – The display name of the pipeline execution.
execution_description (str) – A description of the execution.
parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
- Returns
A _PipelineExecution instance, if successful.
-
class
sagemaker.workflow.pipeline.
_PipelineExecution
(arn, sagemaker_session=NOTHING)
Internal class for encapsulating pipeline execution instances.
- Parameters
arn (str) – The ARN of the pipeline execution.
sagemaker_session (sagemaker.session.Session) –
-
sagemaker_session
Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the pipeline creates one using the default AWS configuration chain.
-
stop
()
Stops a pipeline execution.
-
describe
()
Describes a pipeline execution.
- Returns
Information about the pipeline execution. See boto3 client describe_pipeline_execution.
-
list_steps
()
Describes a pipeline execution’s steps.
- Returns
Information about the steps of the pipeline execution. See boto3 client list_pipeline_execution_steps.
Pipeline Context
-
class
sagemaker.workflow.pipeline_context.
PipelineSession
(boto_session=None, sagemaker_client=None, default_bucket=None, settings=<sagemaker.session_settings.SessionSettings object>)
Manages interactions with SageMaker APIs and the AWS services needed under a pipeline context.
This class inherits from the SageMaker Session and provides convenient methods for manipulating entities and resources that Amazon SageMaker uses, such as training jobs, endpoints, and input datasets in S3. When composing a SageMaker model-building pipeline, PipelineSession is recommended over the regular SageMaker Session.
Initialize a PipelineSession.
- Parameters
boto_session (boto3.session.Session) – The underlying Boto3 session which AWS service calls are delegated to (default: None). If not provided, one is created with the default AWS configuration chain.
sagemaker_client (boto3.SageMaker.Client) – Client which makes Amazon SageMaker service calls other than InvokeEndpoint (default: None). Estimators created using this Session use this client. If not provided, one is created using this instance’s boto_session.
default_bucket (str) – The default Amazon S3 bucket to be used by this session, for example “sagemaker-my-custom-bucket” (default: None). The bucket is created the next time an Amazon S3 bucket is needed (by calling default_bucket()). If not provided, a default bucket is created with a name in the format “sagemaker-{region}-{aws-account-id}”.
settings (sagemaker.session_settings.SessionSettings) – Optional. A set of optional parameters to apply to the session.
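The default_bucket behavior above is lazy: nothing is created until the bucket is first needed, and an explicit name takes precedence over the generated “sagemaker-{region}-{aws-account-id}” name. The following plain-Python model illustrates that; it is not the SDK class. LazyBucketSession, its constructor arguments, and the created list are hypothetical, and appending to created stands in for actually creating the S3 bucket.

```python
from typing import List, Optional


class LazyBucketSession:
    """Hypothetical model of a lazily resolved default bucket."""

    def __init__(self, region: str, account_id: str,
                 default_bucket: Optional[str] = None) -> None:
        self._region = region
        self._account_id = account_id
        self._requested = default_bucket
        self._bucket: Optional[str] = None
        self.created: List[str] = []  # records simulated bucket creation

    def default_bucket(self) -> str:
        if self._bucket is None:
            # First use: choose the name and "create" the bucket once.
            name = self._requested or f"sagemaker-{self._region}-{self._account_id}"
            self.created.append(name)
            self._bucket = name
        return self._bucket


s = LazyBucketSession("us-west-2", "111122223333")
print(s.created)           # []
print(s.default_bucket())  # sagemaker-us-west-2-111122223333
s.default_bucket()
print(s.created)           # ['sagemaker-us-west-2-111122223333']
```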
-
property
context
Holds contextual information useful to the session.
-
init_model_step_arguments
(model)
Creates a _ModelStepArguments instance (if one does not already exist) as the pipeline context.
- Parameters
model (Model or PipelineModel) – A sagemaker.model.Model or sagemaker.pipeline.PipelineModel instance
-
class
sagemaker.workflow.pipeline_context.
LocalPipelineSession
(boto_session=None, default_bucket=None, s3_endpoint_url=None, disable_local_code=False)
Manages a session that executes SageMaker pipelines and jobs locally in a pipeline context.
This class inherits from the LocalSession and PipelineSession classes. When running SageMaker pipelines locally, this class is preferred over LocalSession.
Initialize a LocalPipelineSession.
- Parameters
boto_session (boto3.session.Session) – The underlying Boto3 session which AWS service calls are delegated to (default: None). If not provided, one is created with the default AWS configuration chain.
default_bucket (str) – The default Amazon S3 bucket to be used by this session, for example “sagemaker-my-custom-bucket” (default: None). The bucket is created the next time an Amazon S3 bucket is needed (by calling default_bucket()). If not provided, a default bucket is created with a name in the format “sagemaker-{region}-{aws-account-id}”.
s3_endpoint_url (str) – Overrides the default endpoint URL for Amazon S3, if set (default: None).
disable_local_code (bool) – Set to True to override the default AWS configuration chain to disable the local.local_code setting, which may not be supported for some SDK features (default: False).
Parallelism Configuration
-
class
sagemaker.workflow.parallelism_config.
ParallelismConfiguration
(max_parallel_execution_steps)
Parallelism config for SageMaker pipeline.
Create a ParallelismConfiguration.
- Parameters
max_parallel_execution_steps (int) – The maximum number of steps that can run in parallel.
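The effect of max_parallel_execution_steps is a cap on how many steps are in flight at once. The sketch below models that cap locally with a thread pool and records the peak concurrency it observes. It is an illustration only: run_with_cap is hypothetical, and the Pipelines service schedules steps on its own side, not through a local executor.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_with_cap(step_names: List[str], cap: int) -> Tuple[List[str], int]:
    """Run dummy steps with at most `cap` in flight; return (order, peak)."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def run_step(name: str) -> str:
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # pretend the step does some work
        with lock:
            state["running"] -= 1
        return name

    with ThreadPoolExecutor(max_workers=cap) as pool:
        finished = list(pool.map(run_step, step_names))  # map keeps input order
    return finished, state["peak"]


finished, peak = run_with_cap(["A", "B", "C", "D", "E"], cap=2)
print(finished)    # ['A', 'B', 'C', 'D', 'E']
print(peak <= 2)   # True
```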
Pipeline Experiment Config
-
class
sagemaker.workflow.pipeline_experiment_config.
PipelineExperimentConfig
(experiment_name, trial_name)
Experiment config for SageMaker pipeline.
Create a PipelineExperimentConfig.
Examples: Use pipeline name as the experiment name and pipeline execution id as the trial name:
PipelineExperimentConfig( ExecutionVariables.PIPELINE_NAME, ExecutionVariables.PIPELINE_EXECUTION_ID)
Use a customized experiment name and pipeline execution id as the trial name:
PipelineExperimentConfig( 'MyExperiment', ExecutionVariables.PIPELINE_EXECUTION_ID)
- Parameters
experiment_name (Union[str, Parameter, ExecutionVariable, Expression]) – the name of the experiment that will be created.
trial_name (Union[str, Parameter, ExecutionVariable, Expression]) – the name of the trial that will be created.
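The Pipeline constructor documents that experiment and trial creation is skipped when an entity with the same name already exists, and that by default the pipeline name becomes the experiment name and the execution ID becomes the trial name. The following plain-Python model illustrates that across two executions. It is a sketch under one assumption: it reads the skip rule as applying to the experiment and the trial independently. ensure_experiment_and_trial and the names "my-pipeline", "exec-1", and "exec-2" are hypothetical.

```python
from typing import Set, Tuple


def ensure_experiment_and_trial(existing_experiments: Set[str], existing_trials: Set[str],
                                experiment_name: str, trial_name: str) -> Tuple[bool, bool]:
    """Create each entity only if its name is absent; return (exp_created, trial_created)."""
    experiment_created = experiment_name not in existing_experiments
    if experiment_created:
        existing_experiments.add(experiment_name)
    trial_created = trial_name not in existing_trials
    if trial_created:
        existing_trials.add(trial_name)
    return experiment_created, trial_created


experiments: Set[str] = set()
trials: Set[str] = set()
# Default naming: pipeline name -> experiment, execution ID -> trial.
print(ensure_experiment_and_trial(experiments, trials, "my-pipeline", "exec-1"))  # (True, True)
print(ensure_experiment_and_trial(experiments, trials, "my-pipeline", "exec-2"))  # (False, True)
```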
Properties
-
class
sagemaker.workflow.properties.
PropertiesMeta
(*args, **kwargs)
Loads an internal shapes attribute from the botocore service model for the SageMaker and EMR services.
Loads up the shapes from the botocore service model.
-
class
sagemaker.workflow.properties.
Properties
(step_name, path=None, shape_name=None, shape_names=None, service_name='sagemaker')
Properties for use in workflow expressions.
Create a Properties instance representing the given shape.
-
class
sagemaker.workflow.properties.
PropertiesList
(step_name, path, shape_name=None, service_name='sagemaker')
PropertiesList for use in workflow expressions.
Create a PropertiesList instance representing the given shape.
-
class
sagemaker.workflow.properties.
PropertyFile
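(name, output_name, path)
Provides a property file struct.
- Parameters
name (str) – The name of the property file, used to reference it from JsonGet functions.
output_name (str) – The name of the processing job output channel.
path (str) – The path to the file at the output channel location.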
Step Collections
-
class
sagemaker.workflow.step_collections.
StepCollection
(name, steps=NOTHING)
A wrapper of pipeline steps for workflow.
- Parameters
name (str) –
steps (List[sagemaker.workflow.steps.Step]) –
-
class
sagemaker.workflow.step_collections.
RegisterModel
(name, content_types, response_types, inference_instances=None, transform_instances=None, estimator=None, model_data=None, depends_on=None, repack_model_step_retry_policies=None, register_model_step_retry_policies=None, model_package_group_name=None, model_metrics=None, approval_status=None, image_uri=None, compile_model_family=None, display_name=None, description=None, tags=None, model=None, drift_check_baselines=None, customer_metadata_properties=None, domain=None, sample_payload_url=None, task=None, framework=None, framework_version=None, nearest_model_name=None, data_input_configuration=None, **kwargs)
Register Model step collection for workflow.
Construct steps _RepackModelStep and _RegisterModelStep based on the estimator.
- Parameters
name (str) – The name of the RegisterModel step collection.
estimator (sagemaker.estimator.EstimatorBase) – The estimator instance.
model_data – The S3 URI to the model data from training.
content_types (list) – The supported MIME types for the input data (default: None).
response_types (list) – The supported MIME types for the output data (default: None).
inference_instances (list) – A list of the instance types that are used to generate inferences in real-time (default: None).
transform_instances (list) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).
depends_on (List[Union[str, Step, StepCollection]]) – The list of Step/StepCollection names or Step instances or StepCollection instances that the first step in the collection depends on (default: None).
repack_model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the repack model step
register_model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for register model step
model_package_group_name (str) – The Model Package Group name or ARN. It is mutually exclusive with model_package_name; using model_package_group_name makes the model package versioned (default: None).
model_metrics (ModelMetrics) – ModelMetrics object (default: None).
approval_status (str) – Model Approval Status; values can be “Approved”, “Rejected”, or “PendingManualApproval” (default: “PendingManualApproval”).
image_uri (str) – The container image URI for the model package. If not specified, the estimator’s training container image is used (default: None).
compile_model_family (str) – The instance family for the compiled model. If specified, a compiled model is used (default: None).
description (str) – Model Package description (default: None).
tags (List[dict[str, str]]) – The list of tags to attach to the model package group. Note that tags will only be applied to newly created model package groups; if the name of an existing group is passed to “model_package_group_name”, tags will not be applied.
model (object or Model) – Either a PipelineModel object, which comprises a list of models executed as a serial inference pipeline, or a Model object.
drift_check_baselines (DriftCheckBaselines) – DriftCheckBaselines object (default: None).
customer_metadata_properties (dict[str, str]) – A dictionary of key-value paired metadata properties (default: None).
domain (str) – Domain values can be “COMPUTER_VISION”, “NATURAL_LANGUAGE_PROCESSING”, “MACHINE_LEARNING” (default: None).
sample_payload_url (str) – The S3 path where the sample payload is stored (default: None).
task (str) – Task values which are supported by Inference Recommender are “FILL_MASK”, “IMAGE_CLASSIFICATION”, “OBJECT_DETECTION”, “TEXT_GENERATION”, “IMAGE_SEGMENTATION”, “CLASSIFICATION”, “REGRESSION”, “OTHER” (default: None).
framework (str) – Machine learning framework of the model package container image (default: None).
framework_version (str) – Framework version of the Model Package Container Image (default: None).
nearest_model_name (str) – Name of a pre-trained machine learning model benchmarked by Amazon SageMaker Inference Recommender (default: None).
data_input_configuration (str) – Input object for the model (default: None).
**kwargs – additional arguments to create_model.
-
class
sagemaker.workflow.step_collections.
EstimatorTransformer
(name, estimator, model_data, model_inputs, instance_count, instance_type, transform_inputs, description=None, display_name=None, image_uri=None, predictor_cls=None, env=None, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, max_concurrent_transforms=None, max_payload=None, tags=None, volume_kms_key=None, depends_on=None, repack_model_step_retry_policies=None, model_step_retry_policies=None, transform_step_retry_policies=None, **kwargs)
Creates a Transformer step collection for workflow.
Construct steps required for a Transformer step collection:
An estimator-centric step collection. It models what happens in workflows when invoking the transform() method on an estimator instance: first, if custom model artifacts are required, a _RepackModelStep is included; second, a CreateModelStep with the model data passed in from a training step or other training job output; finally, a TransformStep.
If repacking the model artifacts is not necessary, only the CreateModelStep and TransformStep are in the step collection.
- Parameters
name (str) – The name of the Transform Step.
estimator (sagemaker.estimator.EstimatorBase) – The estimator instance.
instance_count (int) – The number of EC2 instances to use.
instance_type (str) – The type of EC2 instance to use.
strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.
assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
output_path (str) – The S3 location for saving the transform result. If not specified, results are stored in a default bucket.
output_kms_key (str) – Optional. A KMS key ID for encrypting the transform output (default: None).
accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
env (dict) – The Environment variables to be set for use during the transform job (default: None).
depends_on (List[Union[str, Step, StepCollection]]) – The list of Step/StepCollection names or Step instances or StepCollection instances that the first step in the collection depends on (default: None).
repack_model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the repack model step
model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for model step
transform_step_retry_policies (List[RetryPolicy]) – The list of retry policies for transform step
description (str) –
display_name (str) –
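The composition rule for EstimatorTransformer (an optional repack step, then model creation, then transform) can be summarized as a small function. This is a hypothetical model that returns step type names in order; it does not construct SDK objects, and estimator_transformer_steps is not part of the SDK.

```python
from typing import List


def estimator_transformer_steps(needs_repack: bool) -> List[str]:
    """Return the step types in the collection, in order (hypothetical model)."""
    steps = ["_RepackModelStep"] if needs_repack else []
    return steps + ["CreateModelStep", "TransformStep"]


print(estimator_transformer_steps(True))   # ['_RepackModelStep', 'CreateModelStep', 'TransformStep']
print(estimator_transformer_steps(False))  # ['CreateModelStep', 'TransformStep']
```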
-
class
sagemaker.workflow.model_step.
ModelStep
(name, step_args, depends_on=None, retry_policies=None, display_name=None, description=None)
ModelStep for SageMaker Pipelines Workflows.
Constructs a ModelStep.
- Parameters
name (str) – The name of the ModelStep. A name is required and must be unique within a pipeline.
step_args (_ModelStepArguments) – The arguments for the ModelStep definition, generated by invoking register() or create() under the PipelineSession. Example:
model = Model(sagemaker_session=PipelineSession())
model_step = ModelStep(name="MyModelStep", step_args=model.register())
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step or StepCollection names or Step instances or StepCollection instances that it depends on. If a listed Step name does not exist, an error is returned (default: None).
retry_policies (List[RetryPolicy] or Dict[str, List[RetryPolicy]]) –
The list of retry policies for the ModelStep (default: None).
If a list of retry policies is provided, it is applied to all steps in the ModelStep collection. Note: in this case, SageMakerJobStepRetryPolicy is not allowed, since the create/register model step does not support it. For example:
ModelStep(
    ...
    retry_policies=[
        StepRetryPolicy(...),
    ],
)
If a dict is provided, it can specify different retry policies for different types of steps in the ModelStep collection. Similarly, SageMakerJobStepRetryPolicy is not allowed for the create/register model step. For example:
ModelStep(
    ...
    retry_policies=dict(
        register_model_retry_policies=[
            StepRetryPolicy(...),
        ],
        repack_model_retry_policies=[
            SageMakerJobStepRetryPolicy(...),
        ],
    )
)
or
ModelStep(
    ...
    retry_policies=dict(
        create_model_retry_policies=[
            StepRetryPolicy(...),
        ],
        repack_model_retry_policies=[
            SageMakerJobStepRetryPolicy(...),
        ],
    )
)
display_name (str) – The display name of the ModelStep. The display name provides better UI readability. (default: None).
description (str) – The description of the ModelStep (default: None).
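The retry_policies rules for ModelStep can be expressed as a validation function. This is a plain-Python sketch, not SDK code: the two stub classes only stand in for StepRetryPolicy and SageMakerJobStepRetryPolicy, and validate_retry_policies is hypothetical. It encodes the two documented rules: a list applies to every step, so it may not contain a SageMakerJobStepRetryPolicy; in a dict, only the repack entry may.

```python
from typing import Dict, List, Union


class StepRetryPolicyStub:
    """Hypothetical stand-in for StepRetryPolicy."""


class SageMakerJobStepRetryPolicyStub:
    """Hypothetical stand-in for SageMakerJobStepRetryPolicy."""


Policy = Union[StepRetryPolicyStub, SageMakerJobStepRetryPolicyStub]


def validate_retry_policies(retry_policies: Union[List[Policy], Dict[str, List[Policy]]]) -> None:
    """Raise ValueError when a job retry policy would reach a create/register model step."""
    if isinstance(retry_policies, list):
        # A list applies to all steps, including the create/register model step.
        if any(isinstance(p, SageMakerJobStepRetryPolicyStub) for p in retry_policies):
            raise ValueError("SageMakerJobStepRetryPolicy is not allowed when a list is given.")
        return
    for key, policies in retry_policies.items():
        if key != "repack_model_retry_policies" and any(
            isinstance(p, SageMakerJobStepRetryPolicyStub) for p in policies
        ):
            raise ValueError(f"SageMakerJobStepRetryPolicy is not allowed for {key}.")


# Accepted: the job policy targets only the repack step.
validate_retry_policies({
    "register_model_retry_policies": [StepRetryPolicyStub()],
    "repack_model_retry_policies": [SageMakerJobStepRetryPolicyStub()],
})
try:
    validate_retry_policies([SageMakerJobStepRetryPolicyStub()])
except ValueError as err:
    print(err)  # SageMakerJobStepRetryPolicy is not allowed when a list is given.
```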
-
class
sagemaker.workflow.monitor_batch_transform_step.
MonitorBatchTransformStep
(name, transform_step_args, monitor_configuration, check_job_configuration, monitor_before_transform=False, fail_on_violation=True, supplied_baseline_statistics=None, supplied_baseline_constraints=None, display_name=None, description=None)
Creates a Transformer step with a Quality or Clarify check step.
Used to monitor the inputs and outputs of the batch transform job.
Constructs a step collection consisting of a TransformStep and either a QualityCheckStep or a ClarifyCheckStep.
- Parameters
name (str) – The name of the MonitorBatchTransformStep. The corresponding transform step will be named {name}-transform; and the corresponding check step will be named {name}-monitoring
transform_step_args (_JobStepArguments) – The transform arguments for the transform step.
monitor_configuration (Union[sagemaker.workflow.quality_check_step.QualityCheckConfig, sagemaker.workflow.clarify_check_step.ClarifyCheckConfig]) – The monitoring configuration used to run model monitoring.
check_job_configuration (sagemaker.workflow.check_job_config.CheckJobConfig) – The cluster resource configuration of the check job (a processing job).
monitor_before_transform (bool) – Relevant when the monitoring type is data quality or model explainability: if True, the check step runs before the transform job (default: False).
fail_on_violation (Union[bool, PipelineVariable]) – An opt-out flag: set it to False so that the check step does not fail when a violation is detected (default: True).
supplied_baseline_statistics (Union[str, PipelineVariable]) – The S3 path to the supplied statistics object representing the statistics JSON file that is used for the drift check (default: None).
supplied_baseline_constraints (Union[str, PipelineVariable]) – The S3 path to the supplied constraints object representing the constraints JSON file that is used for the drift check (default: None).
display_name (str) – The display name of the MonitorBatchTransformStep, which improves readability in the UI. The corresponding transform step is named {display_name}-transform and the corresponding check step is named {display_name}-monitoring (default: None).
description (str) – The description of the MonitorBatchTransformStep (default: None).
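The naming convention and the effect of monitor_before_transform can be summarized with a short helper. This is a hypothetical function, not part of the SDK: it only reproduces the documented {name}-transform and {name}-monitoring names and orders them according to the flag, which the parameter list above describes for the data quality and model explainability monitoring types.

```python
from typing import List


def monitor_step_names(name: str, monitor_before_transform: bool = False) -> List[str]:
    """Return the sub-step names of a MonitorBatchTransformStep in run order.

    Hypothetical helper: it mirrors the documented naming convention and
    places the check step first when monitor_before_transform is True.
    """
    transform_step = f"{name}-transform"
    check_step = f"{name}-monitoring"
    if monitor_before_transform:
        # Check the input data before the batch transform job runs.
        return [check_step, transform_step]
    # Default: run the transform job first, then the check step.
    return [transform_step, check_step]
```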
Steps¶
-
class
sagemaker.workflow.steps.
StepTypeEnum
(*args, value=<object object>, **kwargs)¶ Enum of Step types.
-
class
sagemaker.workflow.steps.
Step
(name=NOTHING, display_name=None, description=None, step_type=NOTHING, depends_on=None)¶ Pipeline Step for SageMaker Pipelines Workflows.
- Parameters
name (str) –
display_name (Optional[str]) –
description (Optional[str]) –
step_type (sagemaker.workflow.steps.StepTypeEnum) –
depends_on (Optional[List[Union[str, Step, StepCollection]]]) –
- Return type
-
step_type
¶ The type of the Step.
- Type
-
depends_on
¶ The list of Step/StepCollection names or Step instances or StepCollection instances that the current Step depends on.
- Type
List[Union[str, Step, StepCollection]]
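The depends_on semantics (names and Step instances are interchangeable, and a step waits for every step it lists) can be sketched locally. SageMaker Pipelines resolves the graph on the service side; MiniStep, dependency_names, and execution_order below are hypothetical names, and the sketch only models explicit depends_on edges, not data dependencies created by referencing another step's properties.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Union


@dataclass
class MiniStep:
    """Hypothetical stand-in for sagemaker.workflow.steps.Step."""
    name: str
    depends_on: List[Union[str, "MiniStep"]] = field(default_factory=list)


def dependency_names(step: MiniStep) -> List[str]:
    """Normalize depends_on entries (names or step objects) to step names."""
    return [d if isinstance(d, str) else d.name for d in step.depends_on]


def execution_order(steps: Sequence[MiniStep]) -> List[str]:
    """Order steps so that every step comes after the steps it depends on."""
    by_name: Dict[str, MiniStep] = {s.name: s for s in steps}
    ordered: List[str] = []
    visiting: set = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        if name not in by_name:
            # Mirrors the documented error for an unknown step name.
            raise ValueError(f"unknown step name in depends_on: {name}")
        visiting.add(name)
        for dep in dependency_names(by_name[name]):
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for s in steps:
        visit(s.name)
    return ordered
```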
-
class
sagemaker.workflow.steps.
TrainingStep
(name, step_args=None, estimator=None, display_name=None, description=None, inputs=None, cache_config=None, depends_on=None, retry_policies=None)¶ TrainingStep for SageMaker Pipelines Workflows.
Construct a TrainingStep, given an EstimatorBase instance.
In addition to the EstimatorBase instance, the other arguments are those that are supplied to the fit method of the sagemaker.estimator.Estimator.
- Parameters
name (str) – The name of the TrainingStep.
step_args (_JobStepArguments) – The arguments for the TrainingStep definition.
estimator (EstimatorBase) – A sagemaker.estimator.EstimatorBase instance.
display_name (str) – The display name of the TrainingStep.
description (str) – The description of the TrainingStep.
inputs (Union[str, dict, TrainingInput, FileSystemInput]) –
Information about the training data. This can be one of four types:
(str) - The S3 location where training data is saved, or a file:// path in local mode.
(dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dictionary mapping channel names to strings or TrainingInput() objects.
(sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information as well as the path to the training dataset. See sagemaker.inputs.TrainingInput() for full details.
(sagemaker.inputs.FileSystemInput) - Channel configuration for a file system data source that can provide additional information as well as the path to the training dataset.
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this TrainingStep depends on.
retry_policies (List[RetryPolicy]) – A list of retry policies.
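For the str and dict[str, str] forms of inputs, the resulting channel layout can be sketched as follows. The function is hypothetical; it assumes, as the SDK does for a plain string to the best of our knowledge, that a single URI becomes one channel named "training", and it does not cover TrainingInput or FileSystemInput objects.

```python
from typing import Dict, Union


def normalize_training_inputs(inputs: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Return a {channel_name: uri} mapping for the str and dict[str, str] forms.

    Hypothetical helper. A bare string is assumed to become one channel
    named "training"; TrainingInput and FileSystemInput are not handled.
    """
    if isinstance(inputs, str):
        channels = {"training": inputs}
    elif isinstance(inputs, dict):
        channels = dict(inputs)
    else:
        raise TypeError(f"unsupported inputs type: {type(inputs).__name__}")
    for channel, uri in channels.items():
        # S3 URIs in general; file:// paths only make sense in local mode.
        if not uri.startswith(("s3://", "file://")):
            raise ValueError(f"channel {channel!r} has unsupported URI {uri!r}")
    return channels
```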
-
class
sagemaker.workflow.steps.
TuningStep
(name, step_args=None, tuner=None, display_name=None, description=None, inputs=None, job_arguments=None, cache_config=None, depends_on=None, retry_policies=None)¶ TuningStep for SageMaker Pipelines Workflows.
Construct a TuningStep, given a HyperparameterTuner instance.
In addition to the HyperparameterTuner instance, the other arguments are those that are supplied to the fit method of the sagemaker.tuner.HyperparameterTuner.
- Parameters
name (str) – The name of the TuningStep.
step_args (_JobStepArguments) – The arguments for the TuningStep definition.
tuner (HyperparameterTuner) – A sagemaker.tuner.HyperparameterTuner instance.
display_name (str) – The display name of the TuningStep.
description (str) – The description of the TuningStep.
inputs –
Information about the training data. Refer to the fit() method of the associated estimator, as this can take any of the following forms:
(str) - The S3 location where training data is saved.
(dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dictionary mapping channel names to strings or TrainingInput() objects.
(sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput() for full details.
(sagemaker.inputs.FileSystemInput) - Channel configuration for a file system data source that can provide additional information as well as the path to the training dataset.
(sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
(sagemaker.amazon.amazon_estimator.FileSystemRecordSet) - Amazon SageMaker channel configuration for a file system data source for Amazon algorithms.
(list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
(list[sagemaker.amazon.amazon_estimator.FileSystemRecordSet]) - A list of sagemaker.amazon.amazon_estimator.FileSystemRecordSet objects, where each instance is a different channel of training data.
job_arguments (List[str]) – A list of strings to be passed into the processing job. Defaults to None.
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this TuningStep depends on.
retry_policies (List[RetryPolicy]) – A list of retry policies.
-
sagemaker.workflow.steps.TuningStep.
get_top_model_s3_uri
(self, top_k, s3_bucket, prefix='')¶ Get the model artifact S3 URI from the top performing training jobs.
- Parameters
top_k (int) – The index of the top performing training job. The tuning step stores up to 50 top performing training jobs, so a valid top_k value is from 0 to 49. The best training job model is at index 0.
s3_bucket (str) – The S3 bucket where the training job output artifacts are stored.
prefix (str) – The S3 key prefix under which the training job output artifacts are stored.
- Return type
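The URI that get_top_model_s3_uri points to follows the usual layout of SageMaker training output, s3://<bucket>/<prefix>/<training-job-name>/output/model.tar.gz. In a pipeline the training job name is only known at execution time, so the SDK method builds a pipeline expression rather than a plain string; the hypothetical helper below instead takes an already-resolved, ranked list of job names to show the resulting URI and the 0 to 49 bound on top_k.

```python
from typing import Sequence


def top_model_s3_uri(
    ranked_job_names: Sequence[str], top_k: int, s3_bucket: str, prefix: str = ""
) -> str:
    """Build the model artifact URI of the top_k-th best training job (sketch)."""
    if not 0 <= top_k <= 49:
        raise ValueError("top_k must be between 0 and 49")
    job_name = ranked_job_names[top_k]  # index 0 is the best training job
    parts = ["s3:/", s3_bucket]  # joining with "/" yields "s3://<bucket>"
    if prefix:
        parts.append(prefix.strip("/"))
    parts += [job_name, "output", "model.tar.gz"]
    return "/".join(parts)
```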
-
class
sagemaker.workflow.steps.
TransformStep
(name, step_args=None, transformer=None, inputs=None, display_name=None, description=None, cache_config=None, depends_on=None, retry_policies=None)¶ TransformStep for SageMaker Pipelines Workflows.
Constructs a TransformStep, given a Transformer instance.
In addition to the Transformer instance, the other arguments are those that are supplied to the transform method of the sagemaker.transformer.Transformer.
- Parameters
name (str) – The name of the TransformStep.
step_args (_JobStepArguments) – The arguments for the TransformStep definition.
transformer (Transformer) – A sagemaker.transformer.Transformer instance.
inputs (TransformInput) – A sagemaker.inputs.TransformInput instance.
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
display_name (str) – The display name of the TransformStep.
description (str) – The description of the TransformStep.
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this TransformStep depends on.
retry_policies (List[RetryPolicy]) – A list of retry policies.
-
class
sagemaker.workflow.steps.
ProcessingStep
(name, step_args=None, processor=None, display_name=None, description=None, inputs=None, outputs=None, job_arguments=None, code=None, property_files=None, cache_config=None, depends_on=None, retry_policies=None, kms_key=None)¶ ProcessingStep for SageMaker Pipelines Workflows.
Construct a ProcessingStep, given a Processor instance.
In addition to the Processor instance, the other arguments are those that are supplied to the process method of the sagemaker.processing.Processor.
- Parameters
name (str) – The name of the ProcessingStep.
step_args (_JobStepArguments) – The arguments for the ProcessingStep definition.
processor (Processor) – A sagemaker.processing.Processor instance.
display_name (str) – The display name of the ProcessingStep.
description (str) – The description of the ProcessingStep.
inputs (List[ProcessingInput]) – A list of sagemaker.processing.ProcessingInput instances. Defaults to None.
outputs (List[ProcessingOutput]) – A list of sagemaker.processing.ProcessingOutput instances. Defaults to None.
job_arguments (List[str]) – A list of strings to be passed into the processing job. Defaults to None.
code (str) – This can be an S3 URI or a local path to a file with the framework script to run. Defaults to None.
property_files (List[PropertyFile]) – A list of property files that the workflow looks for and resolves from the configured processing output list.
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this ProcessingStep depends on.
retry_policies (List[RetryPolicy]) – A list of retry policies.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file. Defaults to None.
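The strings in job_arguments reach the processing container as command-line arguments, so the processing script can read them with argparse. A minimal sketch, assuming a step created with job_arguments=["--train-test-split-ratio", "0.2"]; the argument names and the /opt/ml/processing/input default are illustrative, not required by the SDK.

```python
import argparse
from typing import List, Optional


def parse_job_arguments(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the strings a ProcessingStep passes through job_arguments."""
    parser = argparse.ArgumentParser(description="Example preprocessing script")
    # Illustrative arguments; a real script defines whatever it needs.
    parser.add_argument("--train-test-split-ratio", type=float, default=0.3)
    parser.add_argument("--input-dir", default="/opt/ml/processing/input")
    # argv=None makes argparse read sys.argv[1:], i.e. the container arguments.
    return parser.parse_args(argv)
```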
-
class
sagemaker.workflow.steps.
CreateModelStep
(name, step_args=None, model=None, inputs=None, depends_on=None, retry_policies=None, display_name=None, description=None)¶ CreateModelStep for SageMaker Pipelines Workflows.
Construct a CreateModelStep, given a sagemaker.model.Model instance.
In addition to the Model instance, the other arguments are those that are supplied to the sagemaker.model.Model._create_sagemaker_model method.
- Parameters
name (str) – The name of the CreateModelStep.
step_args (dict) – The arguments for the CreateModelStep definition (default: None).
model (Model or PipelineModel) – A sagemaker.model.Model or sagemaker.pipeline.PipelineModel instance (default: None).
inputs (CreateModelInput) – A sagemaker.inputs.CreateModelInput instance (default: None).
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this CreateModelStep depends on (default: None).
retry_policies (List[RetryPolicy]) – A list of retry policies (default: None).
display_name (str) – The display name of the CreateModelStep (default: None).
description (str) – The description of the CreateModelStep (default: None).
-
class
sagemaker.workflow.callback_step.
CallbackStep
(name, sqs_queue_url, inputs, outputs, display_name=None, description=None, cache_config=None, depends_on=None)¶ Callback step for workflow.
Constructs a CallbackStep.
- Parameters
name (str) – The name of the callback step.
sqs_queue_url (str) – An SQS queue URL for receiving callback messages.
inputs (dict) – Input arguments that will be provided in the SQS message body of callback messages.
outputs (List[CallbackOutput]) – Outputs that can be provided when completing a callback.
display_name (str) – The display name of the callback step.
description (str) – The description of the callback step.
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this CallbackStep depends on.
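On the receiving side, an external worker reads the SQS message, does its work, and reports back with the callback token. The sketch below assumes the message body is JSON with a "token" field and an "arguments" field holding the step's inputs; handle_callback_message is a hypothetical name, and the network call is left out so the logic can run locally.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def handle_callback_message(
    body: str, work: Callable[[Dict[str, Any]], Dict[str, Any]]
) -> Tuple[str, List[Dict[str, str]]]:
    """Run `work` on the step inputs and build the reply parameters.

    Assumes a JSON body with "token" and "arguments" fields.
    """
    message = json.loads(body)
    token = message["token"]
    outputs = work(message.get("arguments", {}))
    # One Name/Value pair per output; values are sent as strings.
    output_parameters = [{"Name": name, "Value": str(value)} for name, value in outputs.items()]
    return token, output_parameters
```

A worker would then report success with the boto3 SageMaker client, for example client.send_pipeline_execution_step_success(CallbackToken=token, OutputParameters=output_parameters), or failure with send_pipeline_execution_step_failure. Each Name is expected to match the output_name of a CallbackOutput declared on the step.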
-
class
sagemaker.workflow.steps.
CacheConfig
(enable_caching=False, expire_after=None)¶ Configuration class to enable caching in SageMaker Pipelines Workflows.
If caching is enabled, the pipeline attempts to find a previous execution of a Step that was called with the same arguments. Step caching only considers successful executions. If a successful previous execution is found, the pipeline propagates the values from the previous execution rather than recomputing the Step. When multiple successful executions exist within the timeout period, it uses the result of the most recent successful execution.
-
expire_after
¶ If Step caching is enabled, a timeout also needs to be defined. It defines how old a previous execution can be and still be considered for reuse. The value should be an ISO 8601 duration string. Defaults to None.
Examples:
'P30D'     # 30 days
'P4DT12H'  # 4 days and 12 hours
'PT12H'    # 12 hours
- Type
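Before passing a value to expire_after, it can be checked locally. This hypothetical parser accepts only a subset of ISO 8601 durations (PnW, or days plus an optional T part with hours, minutes, and seconds), uses uppercase designators only, and leaves out years and months because they have no fixed length; the service's own validation may differ.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: "PnW", or "P[nD][T[nH][nM][nS]]" with integers.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W"
    r"|(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)$"
)


def parse_expire_after(value: str) -> timedelta:
    """Convert an expire_after duration string to a timedelta, or raise ValueError."""
    match = _DURATION.match(value)
    fields = {}
    if match is not None:
        fields = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not fields:
        # No match, or a bare "P" with no components.
        raise ValueError(f"not a supported ISO 8601 duration: {value!r}")
    return timedelta(**fields)
```

For example, CacheConfig(enable_caching=True, expire_after="P30D") would consider successful executions from the last 30 days for reuse.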
-
-
class
sagemaker.workflow.lambda_step.
LambdaStep
(name, lambda_func, display_name=None, description=None, inputs=None, outputs=None, cache_config=None, depends_on=None)¶ Lambda step for workflow.
Constructs a LambdaStep.
- Parameters
name (str) – The name of the lambda step.
display_name (str) – The display name of the Lambda step.
description (str) – The description of the Lambda step.
lambda_func (Lambda) – An instance of sagemaker.lambda_helper.Lambda. If a Lambda ARN is specified in the instance, LambdaStep only invokes the function; otherwise, the Lambda function is created when the pipeline is created.
inputs (dict) – Input arguments that will be provided to the lambda function.
outputs (List[LambdaOutput]) – List of outputs from the lambda function.
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this LambdaStep depends on.
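A Lambda function used with LambdaStep receives the step's inputs as its event and returns a JSON-serializable dict. The sketch assumes that the returned keys are what the LambdaOutput entries refer to by output_name; the input and output names used here (model_uri, endpoint_name, model_name) are hypothetical.

```python
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Hypothetical handler for a LambdaStep.

    `event` carries the step's `inputs` dict; each returned key is assumed
    to match the output_name of a LambdaOutput declared on the step.
    """
    model_uri = event["model_uri"]
    endpoint_name = event.get("endpoint_name", "demo-endpoint")
    # Derive a model name from the last path segment of the S3 URI.
    model_name = model_uri.rstrip("/").split("/")[-1]
    return {"statusCode": 200, "model_name": model_name, "endpoint_name": endpoint_name}
```

On the pipeline side, the matching step would declare outputs such as LambdaOutput(output_name="model_name", output_type=LambdaOutputTypeEnum.String).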
-
class
sagemaker.workflow.quality_check_step.
QualityCheckConfig
(baseline_dataset, dataset_format, *, output_s3_uri=None, post_analytics_processor_script=None)¶ Quality Check Config.
- Parameters
baseline_dataset (Union[str, sagemaker.workflow.entities.PipelineVariable]) –
dataset_format (dict) –
output_s3_uri (Union[str, sagemaker.workflow.entities.PipelineVariable]) –
post_analytics_processor_script (str) –
- Return type
-
baseline_dataset
¶ The path to the baseline_dataset file. This can be a local path or an S3 URI string.
- Type
-
output_s3_uri
¶ Desired S3 destination of the constraint_violations and statistics JSON files (default: None). If not specified, an auto-generated path is used: “s3://<default_session_bucket>/model-monitor/baselining/<job_name>/results”
- Type
-
post_analytics_processor_script
¶ The path to the record post-analytics processor script (default: None). This can be a local path or an S3 URI string, but it CANNOT be any type of PipelineVariable.
- Type
-
class
sagemaker.workflow.quality_check_step.
QualityCheckStep
(name, quality_check_config, check_job_config, skip_check=False, fail_on_violation=True, register_new_baseline=False, model_package_group_name=None, supplied_baseline_statistics=None, supplied_baseline_constraints=None, display_name=None, description=None, cache_config=None, depends_on=None)¶ QualityCheck step for workflow.
Constructs a QualityCheckStep.
- Parameters
name (str) – The name of the QualityCheckStep step.
quality_check_config (QualityCheckConfig) – A QualityCheckConfig instance.
check_job_config (CheckJobConfig) – A CheckJobConfig instance.
skip_check (bool or PipelineVariable) – Whether the check should be skipped (default: False).
fail_on_violation (bool or PipelineVariable) – Whether to fail the step if a violation is detected (default: True).
register_new_baseline (bool or PipelineVariable) – Whether the new baseline should be registered (default: False).
model_package_group_name (str or PipelineVariable) – The name of a registered model package group; the baseline is fetched from the latest approved model in this group (default: None).
supplied_baseline_statistics (str or PipelineVariable) – The S3 path to the supplied statistics object representing the statistics JSON file that is used for the drift check (default: None).
supplied_baseline_constraints (str or PipelineVariable) – The S3 path to the supplied constraints object representing the constraints JSON file that is used for the drift check (default: None).
display_name (str) – The display name of the QualityCheckStep step (default: None).
description (str) – The description of the QualityCheckStep step (default: None).
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance (default: None).
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this QualityCheckStep depends on (default: None).
-
class
sagemaker.workflow.clarify_check_step.
ClarifyCheckConfig
(data_config, *, kms_key=None, monitoring_analysis_config_uri=None)¶ Clarify Check Config
- Parameters
data_config (sagemaker.clarify.DataConfig) –
kms_key (str) –
monitoring_analysis_config_uri (str) –
- Return type
-
data_config
¶ Config of the input/output data.
- Type
-
kms_key
¶ The ARN of the KMS key that is used to encrypt the user code file (default: None). This field CANNOT be any type of PipelineVariable.
- Type
-
monitoring_analysis_config_uri
¶ (str) The URI of the monitoring analysis config. This field does not take input; it is generated when the created analysis config file is uploaded.
-
class
sagemaker.workflow.clarify_check_step.
ClarifyCheckStep
(name, clarify_check_config, check_job_config, skip_check=False, fail_on_violation=True, register_new_baseline=False, model_package_group_name=None, supplied_baseline_constraints=None, display_name=None, description=None, cache_config=None, depends_on=None)¶ ClarifyCheckStep step for workflow.
Constructs a ClarifyCheckStep.
- Parameters
name (str) – The name of the ClarifyCheckStep step.
clarify_check_config (ClarifyCheckConfig) – A ClarifyCheckConfig instance.
check_job_config (CheckJobConfig) – A CheckJobConfig instance.
skip_check (bool or PipelineVariable) – Whether the check should be skipped (default: False).
fail_on_violation (bool or PipelineVariable) – Whether to fail the step if a violation is detected (default: True).
register_new_baseline (bool or PipelineVariable) – Whether the new baseline should be registered (default: False).
model_package_group_name (str or PipelineVariable) – The name of a registered model package group; the baseline is fetched from the latest approved model in this group (default: None).
supplied_baseline_constraints (str or PipelineVariable) – The S3 path to the supplied constraints object representing the constraints JSON file that is used for the drift check (default: None).
display_name (str) – The display name of the ClarifyCheckStep step (default: None).
description (str) – The description of the ClarifyCheckStep step (default: None).
cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance (default: None).
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this ClarifyCheckStep depends on (default: None).
-
class
sagemaker.workflow.fail_step.
FailStep
(name, error_message=None, display_name=None, description=None, depends_on=None)¶ FailStep for SageMaker Pipelines Workflows.
Constructs a FailStep.
- Parameters
name (str) – The name of the FailStep. A name is required and must be unique within a pipeline.
error_message (str or PipelineVariable) – An error message defined by the user. Once the FailStep is reached, the execution fails and the error message is set as the failure reason (default: None).
display_name (str) – The display name of the FailStep, which improves readability in the UI (default: None).
description (str) – The description of the FailStep (default: None).
depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this FailStep depends on. If a listed Step name does not exist, an error is returned (default: None).
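A common pattern places a FailStep in the else_steps of a ConditionStep so that an execution stops with a clear reason when a quality gate is not met. The sketch below simulates that control flow in plain Python; FailStepSim, select_branch, execute, and PipelineExecutionFailed are hypothetical names, and a plain string stands in for any other step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union


class PipelineExecutionFailed(Exception):
    """Raised when the simulated execution reaches a FailStep."""


@dataclass
class FailStepSim:
    """Hypothetical stand-in for FailStep."""
    name: str
    error_message: str = ""


SimStep = Union[str, FailStepSim]  # a plain str stands for any other step


def select_branch(
    conditions: Sequence[Callable[[], bool]], if_steps: List[SimStep], else_steps: List[SimStep]
) -> List[SimStep]:
    """Mirror ConditionStep: if_steps are ready only if every condition is True."""
    return if_steps if all(cond() for cond in conditions) else else_steps


def execute(steps: List[SimStep]) -> List[str]:
    """Run steps in order; reaching a FailStep stops the run with its message."""
    completed: List[str] = []
    for step in steps:
        if isinstance(step, FailStepSim):
            raise PipelineExecutionFailed(step.error_message)
        completed.append(step)
    return completed
```

In a real pipeline, the condition would be something like ConditionGreaterThanOrEqualTo(left=..., right=0.8) and the else branch a FailStep(name=..., error_message=...).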