Airflow

training_config

sagemaker.workflow.airflow.training_config(estimator, inputs=None, job_name=None, mini_batch_size=None)

Export Airflow training config from an estimator

Parameters:
  • estimator (sagemaker.estimator.EstimatorBase) – The estimator to export training config from. Can be a BYO estimator, Framework estimator or Amazon algorithm estimator.
  • inputs

    Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:

    • (str) - The S3 location where training data is saved.

    • (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple
      channels for training data, you can specify a dict mapping channel names to strings or s3_input() objects.
    • (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can
      provide additional information about the training dataset. See sagemaker.session.s3_input() for full details.
    • (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of
      Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
    • (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of
      sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
  • job_name (str) – Specify a training job name if needed.
  • mini_batch_size (int) – Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
Returns:

Training config that can be directly used by SageMakerTrainingOperator in Airflow.

Return type:

dict
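
For example, a minimal sketch of exporting a training config from a bring-your-own-container Estimator and handing it to an Airflow training operator; the image URI, role ARN, S3 paths, and task ids below are placeholders, and the SageMakerTrainingOperator import path depends on the Airflow version:

    from sagemaker.estimator import Estimator
    from sagemaker.workflow.airflow import training_config

    # Placeholder BYO estimator; the image URI, role ARN, and S3 paths are illustrative only.
    estimator = Estimator(
        image_name="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        train_instance_count=1,
        train_instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/output",
    )

    # Single-channel S3 input; a dict mapping channel names to S3 URIs works as well.
    train_op_config = training_config(estimator=estimator, inputs="s3://my-bucket/train")

    # The resulting dict is passed as the `config` argument of SageMakerTrainingOperator, e.g.:
    # SageMakerTrainingOperator(task_id="train", config=train_op_config, dag=dag)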

tuning_config

sagemaker.workflow.airflow.tuning_config(tuner, inputs, job_name=None, include_cls_metadata=False, mini_batch_size=None)

Export Airflow tuning config from a HyperparameterTuner

Parameters:
  • tuner (sagemaker.tuner.HyperparameterTuner) – The tuner to export tuning config from.
  • inputs

    Information about the training data. Please refer to the fit() method of the associated estimator in the tuner, as this can take any of the following forms:

    • (str) - The S3 location where training data is saved.
    • (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple
      channels for training data, you can specify a dict mapping channel names to strings or s3_input() objects.
    • (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can
      provide additional information about the training dataset. See sagemaker.session.s3_input() for full details.
    • (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of
      Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
    • (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of
      sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
    • (dict[str, one of the forms above]) - Required only for tuners created via
      the factory method HyperparameterTuner.create(). The keys should be the same estimator names used as keys in the estimator_dict argument of the HyperparameterTuner.create() method.
  • job_name (str) – Specify a tuning job name if needed.
  • include_cls_metadata

    It can take one of the following two forms.

    • (bool) - Whether or not the hyperparameter tuning job should include information
      about the estimator class (default: False). This information is passed as a hyperparameter, so if the algorithm you are using cannot handle unknown hyperparameters (e.g. an Amazon SageMaker built-in algorithm that does not have a custom estimator in the Python SDK), then set include_cls_metadata to False.
    • (dict[str, bool]) - This version should be used for tuners created via the factory
      method HyperparameterTuner.create(), to specify the flag for individual estimators provided in the estimator_dict argument of the method. The keys would be the same estimator names as in estimator_dict. If an estimator does not need the flag set, it can be omitted from the dictionary. If none of the estimators need the flag set, an empty dictionary {} must be used.
  • mini_batch_size

    It can take one of the following two forms.

    • (int) - Specify this argument only when estimator is a built-in estimator of an
      Amazon algorithm. For other estimators, batch size should be specified in the estimator.
    • (dict[str, int]) - This version should be used for tuners created via the factory
      method HyperparameterTuner.create(), to specify the value for individual estimators provided in the estimator_dict argument of the method. The keys would be the same estimator names as in estimator_dict. If an estimator does not need the value set, it can be omitted from the dictionary. If none of the estimators need the value set, an empty dictionary {} must be used.
Returns:

Tuning config that can be directly used by SageMakerTuningOperator in Airflow.

Return type:

dict
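
A sketch of exporting a tuning config; the tuner below reuses the placeholder estimator from the training_config example, and the objective metric name, regex, and hyperparameter ranges are illustrative only:

    from sagemaker.tuner import HyperparameterTuner, ContinuousParameter
    from sagemaker.workflow.airflow import tuning_config

    # Placeholder tuner wrapping the estimator from the previous example.
    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:accuracy",
        hyperparameter_ranges={"learning_rate": ContinuousParameter(0.01, 0.2)},
        metric_definitions=[{"Name": "validation:accuracy",
                             "Regex": "accuracy = ([0-9\\.]+)"}],
        max_jobs=4,
        max_parallel_jobs=2,
    )

    tune_op_config = tuning_config(tuner=tuner, inputs="s3://my-bucket/train")

    # Passed as the `config` argument of SageMakerTuningOperator, e.g.:
    # SageMakerTuningOperator(task_id="tune", config=tune_op_config, dag=dag)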

model_config

sagemaker.workflow.airflow.model_config(instance_type, model, role=None, image=None)

Export Airflow model config from a SageMaker model

Parameters:
  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’
  • model (sagemaker.model.FrameworkModel) – The SageMaker model to export Airflow config from
  • role (str) – The ExecutionRoleArn IAM Role ARN for the model
  • image (str) – A container image to use for deploying the model
Returns:

Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator and SageMakerTransformOperator in Airflow.

Return type:

dict
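
A sketch of exporting a model config from a Model object; the model artifact path, image URI, and role ARN are placeholders:

    from sagemaker.model import Model
    from sagemaker.workflow.airflow import model_config

    # Placeholder model pointing at a trained artifact in S3.
    model = Model(
        model_data="s3://my-bucket/output/model.tar.gz",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
    )

    model_op_config = model_config(instance_type="ml.m5.xlarge", model=model)

    # Passed as the `config` argument of SageMakerModelOperator, e.g.:
    # SageMakerModelOperator(task_id="create_model", config=model_op_config, dag=dag)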

model_config_from_estimator

sagemaker.workflow.airflow.model_config_from_estimator(instance_type, estimator, task_id, task_type, role=None, image=None, name=None, model_server_workers=None, vpc_config_override='VPC_CONFIG_DEFAULT')

Export Airflow model config from a SageMaker estimator

Parameters:
  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’
  • estimator (sagemaker.model.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
  • task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The model config is built based on the training job generated in this operator.
  • task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
  • role (str) – The ExecutionRoleArn IAM Role ARN for the model
  • image (str) – A container image to use for deploying the model
  • name (str) – Name of the model
  • model_server_workers (int) – The number of worker processes used by the inference server. If None, server will use one worker per vCPU. Only effective when estimator is a SageMaker framework.
  • vpc_config_override (dict[str, list[str]]) –

    Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. Keys:

    • ‘Subnets’ (list[str]) - List of subnet ids.
    • ‘SecurityGroupIds’ (list[str]) - List of security group ids.
Returns:

Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator in Airflow.

Return type:

dict
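
A sketch of building the same kind of config directly from an estimator; "train" is assumed to be the task_id of a SageMakerTrainingOperator earlier in the same DAG, and the model name is a placeholder:

    from sagemaker.workflow.airflow import model_config_from_estimator

    # The config references the training job produced by the upstream "train" task.
    model_op_config = model_config_from_estimator(
        instance_type="ml.m5.xlarge",
        estimator=estimator,
        task_id="train",
        task_type="training",
        name="my-airflow-model",
    )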

transform_config

sagemaker.workflow.airflow.transform_config(transformer, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None)

Export Airflow transform config from a SageMaker transformer

Parameters:
  • transformer (sagemaker.transformer.Transformer) – The SageMaker transformer to export Airflow config from.
  • data (str) – Input data location in S3.
  • data_type (str) –

    What the S3 location defines (default: ‘S3Prefix’). Valid values:

    • ’S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will
      be used as inputs for the transform job.
    • ’ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object
      to use as an input for the transform job.
  • content_type (str) – MIME type of the input data (default: None).
  • compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
  • split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
  • job_name (str) – job name (default: None). If not specified, one will be generated.
Returns:

Transform config that can be directly used by SageMakerTransformOperator in Airflow.

Return type:

dict
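
A sketch of exporting a transform config from a Transformer; the model name and S3 paths are placeholders:

    from sagemaker.transformer import Transformer
    from sagemaker.workflow.airflow import transform_config

    # Placeholder transformer referring to an existing SageMaker model by name.
    transformer = Transformer(
        model_name="my-airflow-model",
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/transform-output",
    )

    transform_op_config = transform_config(
        transformer=transformer,
        data="s3://my-bucket/batch-input",
        data_type="S3Prefix",
        content_type="text/csv",
        split_type="Line",
    )

    # Passed as the `config` argument of SageMakerTransformOperator, e.g.:
    # SageMakerTransformOperator(task_id="transform", config=transform_op_config, dag=dag)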

transform_config_from_estimator

sagemaker.workflow.airflow.transform_config_from_estimator(estimator, task_id, task_type, instance_count, instance_type, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None, model_name=None, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None, model_server_workers=None, image=None, vpc_config_override=None)

Export Airflow transform config from a SageMaker estimator

Parameters:
  • estimator (sagemaker.model.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
  • task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The transform config is built based on the training job generated in this operator.
  • task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
  • instance_count (int) – Number of EC2 instances to use.
  • instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
  • data (str) – Input data location in S3.
  • data_type (str) –

    What the S3 location defines (default: ‘S3Prefix’). Valid values:

    • ’S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will
      be used as inputs for the transform job.
    • ’ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object
      to use as an input for the transform job.
  • content_type (str) – MIME type of the input data (default: None).
  • compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
  • split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
  • job_name (str) – transform job name (default: None). If not specified, one will be generated.
  • model_name (str) – model name (default: None). If not specified, one will be generated.
  • strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MULTI_RECORD’ and ‘SINGLE_RECORD’.
  • assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
  • output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.
  • output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).
  • accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
  • env (dict) – Environment variables to be set for use during the transform job (default: None).
  • max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.
  • max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.
  • tags (list[dict]) – List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.
  • role (str) – The ExecutionRoleArn IAM Role ARN for the Model, which is also used during transform jobs. If not specified, the role from the Estimator will be used.
  • volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
  • model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
  • image (str) – A container image to use for deploying the model
  • vpc_config_override (dict[str, list[str]]) –

    Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. Keys:

    • ‘Subnets’ (list[str]) - List of subnet ids.
    • ‘SecurityGroupIds’ (list[str]) - List of security group ids.
Returns:

Transform config that can be directly used by SageMakerTransformOperator in Airflow.

Return type:

dict
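
A sketch of exporting a transform config directly from an estimator; "train" is assumed to be the task_id of an upstream SageMakerTrainingOperator, and the model name and S3 paths are placeholders:

    from sagemaker.workflow.airflow import transform_config_from_estimator

    # Builds model and transform config from the training job of the upstream "train" task.
    transform_op_config = transform_config_from_estimator(
        estimator=estimator,
        task_id="train",
        task_type="training",
        instance_count=1,
        instance_type="ml.m5.xlarge",
        data="s3://my-bucket/batch-input",
        content_type="text/csv",
        split_type="Line",
        model_name="my-airflow-model",
    )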

deploy_config

sagemaker.workflow.airflow.deploy_config(model, initial_instance_count, instance_type, endpoint_name=None, tags=None)

Export Airflow deploy config from a SageMaker model

Parameters:
  • model (sagemaker.model.Model) – The SageMaker model to export the Airflow config from.
  • initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.
  • instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
  • endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
  • tags (list[dict]) – List of tags for labeling the endpoint. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
Returns:

Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.

Return type:

dict
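
A sketch of exporting a deploy config from the placeholder Model shown in the model_config example; the endpoint name is illustrative:

    from sagemaker.workflow.airflow import deploy_config

    endpoint_op_config = deploy_config(
        model=model,
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        endpoint_name="my-airflow-endpoint",
    )

    # Passed as the `config` argument of SageMakerEndpointOperator, e.g.:
    # SageMakerEndpointOperator(task_id="deploy", config=endpoint_op_config, dag=dag)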

deploy_config_from_estimator

sagemaker.workflow.airflow.deploy_config_from_estimator(estimator, task_id, task_type, initial_instance_count, instance_type, model_name=None, endpoint_name=None, tags=None, **kwargs)

Export Airflow deploy config from a SageMaker estimator

Parameters:
  • estimator (sagemaker.model.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
  • task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The endpoint config is built based on the training job generated in this operator.
  • task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
  • initial_instance_count (int) – Minimum number of EC2 instances to deploy to an endpoint for prediction.
  • instance_type (str) – Type of EC2 instance to deploy to an endpoint for prediction, for example, ‘ml.c4.xlarge’.
  • model_name (str) – Name to use for creating an Amazon SageMaker model. If not specified, one will be generated.
  • endpoint_name (str) – Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the SageMaker model is used.
  • tags (list[dict]) – List of tags for labeling the endpoint. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
  • **kwargs – Passed to invocation of create_model(). Implementations may customize create_model() to accept **kwargs to customize model creation during deploy. For more, see the implementation docs.
Returns:

Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.

Return type:

dict
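
A sketch of exporting a deploy config directly from an estimator; "train" is assumed to be the task_id of an upstream SageMakerTrainingOperator, and the model and endpoint names are placeholders:

    from sagemaker.workflow.airflow import deploy_config_from_estimator

    # Builds model and endpoint config from the training job of the upstream "train" task.
    endpoint_op_config = deploy_config_from_estimator(
        estimator=estimator,
        task_id="train",
        task_type="training",
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        model_name="my-airflow-model",
        endpoint_name="my-airflow-endpoint",
    )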