Airflow¶
training_config¶
sagemaker.workflow.airflow.training_config(estimator, inputs=None, job_name=None, mini_batch_size=None)¶
Export Airflow training config from an estimator.
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The estimator to export training config from. Can be a BYO estimator, Framework estimator, or Amazon algorithm estimator.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or s3_input() objects.
  - (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.session.s3_input() for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of RecordSet objects, where each instance is a different channel of training data.
- job_name (str) – Specify a training job name if needed.
- mini_batch_size (int) – Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
Returns: Training config that can be directly used by SageMakerTrainingOperator in Airflow.
Return type: dict
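Example (a minimal sketch, not a prescribed setup): the config is built at DAG-definition time and passed as the operator's config argument. The estimator arguments follow the v1-style Python SDK documented here, the operator import path assumes Airflow 1.10's contrib operators, and the image URI, role ARN, S3 paths, and task ids are placeholders.

from airflow import DAG
from airflow.contrib.operators.sagemaker_training_operator import SageMakerTrainingOperator
from airflow.utils.dates import days_ago

from sagemaker.estimator import Estimator
from sagemaker.workflow.airflow import training_config

# Generic BYO estimator; the image URI, role ARN, and S3 paths are placeholders.
estimator = Estimator(
    image_name="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    train_instance_count=1,
    train_instance_type="ml.c4.xlarge",
    output_path="s3://my-bucket/output/",
)

# Export the training config at DAG-definition time.
train_config = training_config(
    estimator=estimator,
    inputs="s3://my-bucket/train/",
)

dag = DAG(dag_id="sagemaker_example", start_date=days_ago(1), schedule_interval=None)

# The exported dict is passed directly as the operator's `config`.
train_op = SageMakerTrainingOperator(
    task_id="sagemaker_training",
    config=train_config,
    wait_for_completion=True,
    dag=dag,
)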
tuning_config¶
sagemaker.workflow.airflow.tuning_config(tuner, inputs, job_name=None, include_cls_metadata=False, mini_batch_size=None)¶
Export Airflow tuning config from a HyperparameterTuner.
Parameters:
- tuner (sagemaker.tuner.HyperparameterTuner) – The tuner to export tuning config from.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator in the tuner, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or s3_input() objects.
  - (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.session.s3_input() for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of RecordSet objects, where each instance is a different channel of training data.
  - (dict[str, one of the forms above]) - Required only for tuners created via the factory method HyperparameterTuner.create(). The keys should be the same estimator names as the keys of the estimator_dict argument of the HyperparameterTuner.create() method.
- job_name (str) – Specify a tuning job name if needed.
- include_cls_metadata – It can take one of the following two forms:
  - (bool) - Whether or not the hyperparameter tuning job should include information about the estimator class (default: False). This information is passed as a hyperparameter, so if the algorithm you are using cannot handle unknown hyperparameters (e.g. an Amazon SageMaker built-in algorithm that does not have a custom estimator in the Python SDK), then set include_cls_metadata to False.
  - (dict[str, bool]) - This version should be used for tuners created via the factory method HyperparameterTuner.create(), to specify the flag for individual estimators provided in the estimator_dict argument of the method. The keys would be the same estimator names as in estimator_dict. If one estimator doesn't need the flag set, then there is no need to include it in the dictionary. If none of the estimators need the flag set, then an empty dictionary {} must be used.
- mini_batch_size – It can take one of the following two forms:
  - (int) - Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
  - (dict[str, int]) - This version should be used for tuners created via the factory method HyperparameterTuner.create(), to specify the value for individual estimators provided in the estimator_dict argument of the method. The keys would be the same estimator names as in estimator_dict. If one estimator doesn't need the value set, then there is no need to include it in the dictionary. If none of the estimators need the value set, then an empty dictionary {} must be used.
Returns: Tuning config that can be directly used by SageMakerTuningOperator in Airflow.
Return type: dict
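Example (a minimal sketch): exporting a tuning config for SageMakerTuningOperator. The estimator and dag objects are assumed to be those from the training_config sketch above; the metric name, regex, hyperparameter range, and channel are placeholders.

from airflow.contrib.operators.sagemaker_tuning_operator import SageMakerTuningOperator

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner
from sagemaker.workflow.airflow import tuning_config

# Tuner wrapping the estimator defined in the training_config sketch above.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:error",
    objective_type="Minimize",
    metric_definitions=[{"Name": "validation:error", "Regex": "validation-error=([0-9\\.]+)"}],
    hyperparameter_ranges={"learning_rate": ContinuousParameter(0.01, 0.2)},
    max_jobs=4,
    max_parallel_jobs=2,
)

# Export the tuning config and hand it to the tuning operator.
tune_config = tuning_config(
    tuner=tuner,
    inputs={"train": "s3://my-bucket/train/"},
)

tune_op = SageMakerTuningOperator(
    task_id="sagemaker_tuning",
    config=tune_config,
    wait_for_completion=True,
    dag=dag,
)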
model_config¶
sagemaker.workflow.airflow.model_config(instance_type, model, role=None, image=None)¶
Export Airflow model config from a SageMaker model.
Parameters:
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- model (sagemaker.model.FrameworkModel) – The SageMaker model to export Airflow config from.
- role (str) – The ExecutionRoleArn IAM Role ARN for the model.
- image (str) – A container image to use for deploying the model.
Returns: Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator and SageMakerTransformOperator in Airflow.
Return type: dict
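Example (a minimal sketch): a plain sagemaker.model.Model is used here for brevity, even though the parameter above is documented as a FrameworkModel; the artifact path, image URI, and role ARN are placeholders.

from sagemaker.model import Model
from sagemaker.workflow.airflow import model_config

# Generic model over an existing artifact; all values are placeholders.
model = Model(
    model_data="s3://my-bucket/output/model.tar.gz",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)

# The exported dict is used as (or merged into) the `config` of SageMakerModelOperator.
model_conf = model_config(instance_type="ml.m4.xlarge", model=model)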
model_config_from_estimator¶
sagemaker.workflow.airflow.model_config_from_estimator(instance_type, estimator, task_id, task_type, role=None, image=None, name=None, model_server_workers=None, vpc_config_override='VPC_CONFIG_DEFAULT')¶
Export Airflow model config from a SageMaker estimator.
Parameters:
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
- task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The model config is built based on the training job generated in this operator.
- task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means the training job is not from any task).
- role (str) – The ExecutionRoleArn IAM Role ARN for the model.
- image (str) – A container image to use for deploying the model.
- name (str) – Name of the model
- model_server_workers (int) – The number of worker processes used by the inference server. If None, server will use one worker per vCPU. Only effective when estimator is a SageMaker framework.
- vpc_config_override (dict[str, list[str]]) – Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. * ‘Subnets’ (list[str]): List of subnet ids. * ‘SecurityGroupIds’ (list[str]): List of security group ids.
Returns: Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator in Airflow.
Return type: dict
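Example (a minimal sketch): deriving the model config from the training task defined in the training_config sketch above; the task id must match that operator's task_id, and the model name is a placeholder.

from sagemaker.workflow.airflow import model_config_from_estimator

# `estimator` is the estimator trained by the "sagemaker_training" task above.
model_conf = model_config_from_estimator(
    instance_type="ml.m4.xlarge",
    estimator=estimator,
    task_id="sagemaker_training",
    task_type="training",
    name="my-airflow-model",
)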
transform_config¶
sagemaker.workflow.airflow.transform_config(transformer, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None)¶
Export Airflow transform config from a SageMaker transformer.
Parameters:
- transformer (sagemaker.transformer.Transformer) – The SageMaker transformer to export Airflow config from.
- data (str) – Input data location in S3.
- data_type (str) – What the S3 location defines (default: ‘S3Prefix’). Valid values:
  - ‘S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
  - ‘ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
- content_type (str) – MIME type of the input data (default: None).
- compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
- split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
- job_name (str) – job name (default: None). If not specified, one will be generated.
Returns: Transform config that can be directly used by SageMakerTransformOperator in Airflow.
Return type: dict
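Example (a minimal sketch): exporting a transform config from a Transformer over an already-created SageMaker model; the model name and S3 paths are placeholders, and the dag object is assumed to be the one from the training_config sketch above.

from airflow.contrib.operators.sagemaker_transform_operator import SageMakerTransformOperator

from sagemaker.transformer import Transformer
from sagemaker.workflow.airflow import transform_config

# Transformer over an existing SageMaker model; all values are placeholders.
transformer = Transformer(
    model_name="my-airflow-model",
    instance_count=1,
    instance_type="ml.c4.xlarge",
    output_path="s3://my-bucket/transform-output/",
)

xform_config = transform_config(
    transformer=transformer,
    data="s3://my-bucket/transform-input/",
    data_type="S3Prefix",
    content_type="text/csv",
    split_type="Line",
)

transform_op = SageMakerTransformOperator(
    task_id="sagemaker_transform",
    config=xform_config,
    wait_for_completion=True,
    dag=dag,
)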
transform_config_from_estimator¶
sagemaker.workflow.airflow.transform_config_from_estimator(estimator, task_id, task_type, instance_count, instance_type, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None, model_name=None, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None, model_server_workers=None, image=None, vpc_config_override=None)¶
Export Airflow transform config from a SageMaker estimator.
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
- task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The transform config is built based on the training job generated in this operator.
- task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
- instance_count (int) – Number of EC2 instances to use.
- instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
- data (str) – Input data location in S3.
- data_type (str) – What the S3 location defines (default: ‘S3Prefix’). Valid values:
  - ‘S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
  - ‘ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
- content_type (str) – MIME type of the input data (default: None).
- compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
- split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
- job_name (str) – transform job name (default: None). If not specified, one will be generated.
- model_name (str) – model name (default: None). If not specified, one will be generated.
- strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MULTI_RECORD’ and ‘SINGLE_RECORD’.
- assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
- output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.
- output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).
- accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
- env (dict) – Environment variables to be set for use during the transform job (default: None).
- max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.
- max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.
- tags (list[dict]) – List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.
- role (str) – The ExecutionRoleArn IAM Role ARN for the Model, which is also used during transform jobs. If not specified, the role from the Estimator will be used.
- volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
- model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
- image (str) – A container image to use for deploying the model.
- vpc_config_override (dict[str, list[str]]) – Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. * ‘Subnets’ (list[str]): List of subnet ids. * ‘SecurityGroupIds’ (list[str]): List of security group ids.
Returns: Transform config that can be directly used by SageMakerTransformOperator in Airflow.
Return type: dict
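Example (a minimal sketch): building the transform config directly from the training task defined in the training_config sketch above; the task id, model name, and S3 paths are placeholders.

from sagemaker.workflow.airflow import transform_config_from_estimator

# `estimator` is the estimator trained by the "sagemaker_training" task above.
xform_config = transform_config_from_estimator(
    estimator=estimator,
    task_id="sagemaker_training",
    task_type="training",
    instance_count=1,
    instance_type="ml.c4.xlarge",
    data="s3://my-bucket/transform-input/",
    content_type="text/csv",
    model_name="my-airflow-model",
)
# Pass `xform_config` as the `config` of SageMakerTransformOperator.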
deploy_config¶
sagemaker.workflow.airflow.deploy_config(model, initial_instance_count, instance_type, endpoint_name=None, tags=None)¶
Export Airflow deploy config from a SageMaker model.
Parameters:
- model (sagemaker.model.Model) – The SageMaker model to export the Airflow config from.
- initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
- tags (list[dict]) – List of tags for labeling the endpoint. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
Returns: Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
Return type: dict
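Example (a minimal sketch): exporting a deploy config from the Model built in the model_config sketch above; the endpoint name is a placeholder.

from sagemaker.workflow.airflow import deploy_config

# `model` is the Model object from the model_config sketch above.
endpoint_config = deploy_config(
    model=model,
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name="my-airflow-endpoint",
)
# Pass `endpoint_config` as the `config` of SageMakerEndpointOperator.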
deploy_config_from_estimator¶
sagemaker.workflow.airflow.deploy_config_from_estimator(estimator, task_id, task_type, initial_instance_count, instance_type, model_name=None, endpoint_name=None, tags=None, **kwargs)¶
Export Airflow deploy config from a SageMaker estimator.
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
- task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The endpoint config is built based on the training job generated in this operator.
- task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
- initial_instance_count (int) – Minimum number of EC2 instances to deploy to an endpoint for prediction.
- instance_type (str) – Type of EC2 instance to deploy to an endpoint for prediction, for example, ‘ml.c4.xlarge’.
- model_name (str) – Name to use for creating an Amazon SageMaker model. If not specified, one will be generated.
- endpoint_name (str) – Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the SageMaker model is used.
- tags (list[dict]) – List of tags for labeling the endpoint. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
- **kwargs – Passed to invocation of create_model(). Implementations may customize create_model() to accept **kwargs to customize model creation during deploy. For more, see the implementation docs.
Returns: Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
Return type: dict
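Example (a minimal sketch): deriving the endpoint config from the training task defined in the training_config sketch above and wiring the endpoint operator downstream of it; the task id and endpoint name are placeholders, and the operator import path assumes Airflow 1.10's contrib operators.

from airflow.contrib.operators.sagemaker_endpoint_operator import SageMakerEndpointOperator

from sagemaker.workflow.airflow import deploy_config_from_estimator

# `estimator`, `dag`, and `train_op` come from the training_config sketch above.
endpoint_config = deploy_config_from_estimator(
    estimator=estimator,
    task_id="sagemaker_training",
    task_type="training",
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name="my-airflow-endpoint",
)

deploy_op = SageMakerEndpointOperator(
    task_id="sagemaker_endpoint",
    config=endpoint_config,
    wait_for_completion=True,
    dag=dag,
)
deploy_op.set_upstream(train_op)  # deploy only after the training task succeeds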