Airflow¶
training_config¶
sagemaker.workflow.airflow.training_config(estimator, inputs=None, job_name=None, mini_batch_size=None)¶
Export Airflow training config from an estimator.
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The estimator to export training config from. Can be a BYO estimator, Framework estimator, or Amazon algorithm estimator.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or s3_input() objects.
  - (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.session.s3_input() for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of RecordSet objects, where each instance is a different channel of training data.
- job_name (str) – Specify a training job name if needed.
- mini_batch_size (int) – Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
Returns: Training config that can be directly used by SageMakerTrainingOperator in Airflow.
Return type: dict
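Example (a minimal sketch, not a prescribed setup): the config is built at DAG-definition time and passed as the operator's config argument. The estimator arguments follow the v1-style Python SDK documented here, the operator import path assumes Airflow 1.10's contrib operators, and the image URI, role ARN, S3 paths, and task ids are placeholders.

from airflow import DAG
from airflow.contrib.operators.sagemaker_training_operator import SageMakerTrainingOperator
from airflow.utils.dates import days_ago

from sagemaker.estimator import Estimator
from sagemaker.workflow.airflow import training_config

# Generic BYO estimator; the image URI, role ARN, and S3 paths are placeholders.
estimator = Estimator(
    image_name="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    train_instance_count=1,
    train_instance_type="ml.c4.xlarge",
    output_path="s3://my-bucket/output/",
)

# Export the training config at DAG-definition time.
train_config = training_config(
    estimator=estimator,
    inputs="s3://my-bucket/train/",
)

dag = DAG(dag_id="sagemaker_example", start_date=days_ago(1), schedule_interval=None)

# The exported dict is passed directly as the operator's `config`.
train_op = SageMakerTrainingOperator(
    task_id="sagemaker_training",
    config=train_config,
    wait_for_completion=True,
    dag=dag,
)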
tuning_config¶
sagemaker.workflow.airflow.tuning_config(tuner, inputs, job_name=None, include_cls_metadata=False, mini_batch_size=None)¶
Export Airflow tuning config from a HyperparameterTuner.
Parameters:
- tuner (sagemaker.tuner.HyperparameterTuner) – The tuner to export tuning config from.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator in the tuner, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or s3_input() objects.
  - (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.session.s3_input() for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of RecordSet objects, where each instance is a different channel of training data.
  - (dict[str, one of the forms above]) - Required only for tuners created via the factory method HyperparameterTuner.create(). The keys should be the same estimator names as the keys of the estimator_dict argument of the HyperparameterTuner.create() method.
- job_name (str) – Specify a tuning job name if needed.
- include_cls_metadata – It can take one of the following two forms:
  - (bool) - Whether or not the hyperparameter tuning job should include information about the estimator class (default: False). This information is passed as a hyperparameter, so if the algorithm you are using cannot handle unknown hyperparameters (e.g. an Amazon SageMaker built-in algorithm that does not have a custom estimator in the Python SDK), then set include_cls_metadata to False.
  - (dict[str, bool]) - This version should be used for tuners created via the factory method HyperparameterTuner.create(), to specify the flag for individual estimators provided in the estimator_dict argument of the method. The keys would be the same estimator names as in estimator_dict. If one estimator doesn't need the flag set, then there is no need to include it in the dictionary. If none of the estimators need the flag set, then an empty dictionary {} must be used.
- mini_batch_size – It can take one of the following two forms:
  - (int) - Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
  - (dict[str, int]) - This version should be used for tuners created via the factory method HyperparameterTuner.create(), to specify the value for individual estimators provided in the estimator_dict argument of the method. The keys would be the same estimator names as in estimator_dict. If one estimator doesn't need the value set, then there is no need to include it in the dictionary. If none of the estimators need the value set, then an empty dictionary {} must be used.
Returns: Tuning config that can be directly used by SageMakerTuningOperator in Airflow.
Return type: dict
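Example (a minimal sketch): exporting a tuning config for SageMakerTuningOperator. The estimator and dag objects are assumed to be those from the training_config sketch above; the metric name, regex, hyperparameter range, and channel are placeholders.

from airflow.contrib.operators.sagemaker_tuning_operator import SageMakerTuningOperator

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner
from sagemaker.workflow.airflow import tuning_config

# Tuner wrapping the estimator defined in the training_config sketch above.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:error",
    objective_type="Minimize",
    metric_definitions=[{"Name": "validation:error", "Regex": "validation-error=([0-9\\.]+)"}],
    hyperparameter_ranges={"learning_rate": ContinuousParameter(0.01, 0.2)},
    max_jobs=4,
    max_parallel_jobs=2,
)

# Export the tuning config and hand it to the tuning operator.
tune_config = tuning_config(
    tuner=tuner,
    inputs={"train": "s3://my-bucket/train/"},
)

tune_op = SageMakerTuningOperator(
    task_id="sagemaker_tuning",
    config=tune_config,
    wait_for_completion=True,
    dag=dag,
)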
model_config¶
sagemaker.workflow.airflow.model_config(instance_type, model, role=None, image=None)¶
Export Airflow model config from a SageMaker model.
Parameters:
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- model (sagemaker.model.FrameworkModel) – The SageMaker model to export Airflow config from.
- role (str) – The ExecutionRoleArn IAM Role ARN for the model.
- image (str) – A container image to use for deploying the model.
Returns: Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator and SageMakerTransformOperator in Airflow.
Return type: dict
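Example (a minimal sketch): a plain sagemaker.model.Model is used here for brevity, even though the parameter above is documented as a FrameworkModel; the artifact path, image URI, and role ARN are placeholders.

from sagemaker.model import Model
from sagemaker.workflow.airflow import model_config

# Generic model over an existing artifact; all values are placeholders.
model = Model(
    model_data="s3://my-bucket/output/model.tar.gz",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)

# The exported dict is used as (or merged into) the `config` of SageMakerModelOperator.
model_conf = model_config(instance_type="ml.m4.xlarge", model=model)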
model_config_from_estimator¶
sagemaker.workflow.airflow.model_config_from_estimator(instance_type, estimator, task_id, task_type, role=None, image=None, name=None, model_server_workers=None, vpc_config_override='VPC_CONFIG_DEFAULT')¶
Export Airflow model config from a SageMaker estimator.
Parameters:
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
- task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The model config is built based on the training job generated in this operator.
- task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means the training job is not from any task).
- role (str) – The ExecutionRoleArn IAM Role ARN for the model.
- image (str) – A container image to use for deploying the model.
- name (str) – Name of the model
- model_server_workers (int) – The number of worker processes used by the inference server. If None, server will use one worker per vCPU. Only effective when estimator is a SageMaker framework.
- vpc_config_override (dict[str, list[str]]) – Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. * ‘Subnets’ (list[str]): List of subnet ids. * ‘SecurityGroupIds’ (list[str]): List of security group ids.
Returns: Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator in Airflow.
Return type: dict
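Example (a minimal sketch): deriving the model config from the training task defined in the training_config sketch above; the task id must match that operator's task_id, and the model name is a placeholder.

from sagemaker.workflow.airflow import model_config_from_estimator

# `estimator` is the estimator trained by the "sagemaker_training" task above.
model_conf = model_config_from_estimator(
    instance_type="ml.m4.xlarge",
    estimator=estimator,
    task_id="sagemaker_training",
    task_type="training",
    name="my-airflow-model",
)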
transform_config¶
sagemaker.workflow.airflow.transform_config(transformer, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None)¶
Export Airflow transform config from a SageMaker transformer.
Parameters:
- transformer (sagemaker.transformer.Transformer) – The SageMaker transformer to export Airflow config from.
- data (str) – Input data location in S3.
- data_type (str) – What the S3 location defines (default: ‘S3Prefix’). Valid values:
  - ‘S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
  - ‘ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
- content_type (str) – MIME type of the input data (default: None).
- compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
- split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
- job_name (str) – job name (default: None). If not specified, one will be generated.
Returns: Transform config that can be directly used by SageMakerTransformOperator in Airflow.
Return type: dict
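Example (a minimal sketch): exporting a transform config from a Transformer over an already-created SageMaker model; the model name and S3 paths are placeholders, and the dag object is assumed to be the one from the training_config sketch above.

from airflow.contrib.operators.sagemaker_transform_operator import SageMakerTransformOperator

from sagemaker.transformer import Transformer
from sagemaker.workflow.airflow import transform_config

# Transformer over an existing SageMaker model; all values are placeholders.
transformer = Transformer(
    model_name="my-airflow-model",
    instance_count=1,
    instance_type="ml.c4.xlarge",
    output_path="s3://my-bucket/transform-output/",
)

xform_config = transform_config(
    transformer=transformer,
    data="s3://my-bucket/transform-input/",
    data_type="S3Prefix",
    content_type="text/csv",
    split_type="Line",
)

transform_op = SageMakerTransformOperator(
    task_id="sagemaker_transform",
    config=xform_config,
    wait_for_completion=True,
    dag=dag,
)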
transform_config_from_estimator¶
sagemaker.workflow.airflow.transform_config_from_estimator(estimator, task_id, task_type, instance_count, instance_type, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None, model_name=None, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None, model_server_workers=None, image=None, vpc_config_override=None)¶
Export Airflow transform config from a SageMaker estimator.
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
- task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The transform config is built based on the training job generated in this operator.
- task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
- instance_count (int) – Number of EC2 instances to use.
- instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
- data (str) – Input data location in S3.
- data_type (str) – What the S3 location defines (default: ‘S3Prefix’). Valid values:
  - ‘S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
  - ‘ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
- content_type (str) – MIME type of the input data (default: None).
- compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
- split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
- job_name (str) – transform job name (default: None). If not specified, one will be generated.
- model_name (str) – model name (default: None). If not specified, one will be generated.
- strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MULTI_RECORD’ and ‘SINGLE_RECORD’.
- assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
- output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.
- output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).
- accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
- env (dict) – Environment variables to be set for use during the transform job (default: None).
- max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.
- max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.
- tags (list[dict]) – List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.
- role (str) – The ExecutionRoleArn IAM Role ARN for the Model, which is also used during transform jobs. If not specified, the role from the Estimator will be used.
- volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
- model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
- image (str) – A container image to use for deploying the model.
- vpc_config_override (dict[str, list[str]]) – Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. * ‘Subnets’ (list[str]): List of subnet ids. * ‘SecurityGroupIds’ (list[str]): List of security group ids.
Returns: Transform config that can be directly used by SageMakerTransformOperator in Airflow.
Return type: dict
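Example (a minimal sketch): building the transform config directly from the training task defined in the training_config sketch above; the task id, model name, and S3 paths are placeholders.

from sagemaker.workflow.airflow import transform_config_from_estimator

# `estimator` is the estimator trained by the "sagemaker_training" task above.
xform_config = transform_config_from_estimator(
    estimator=estimator,
    task_id="sagemaker_training",
    task_type="training",
    instance_count=1,
    instance_type="ml.c4.xlarge",
    data="s3://my-bucket/transform-input/",
    content_type="text/csv",
    model_name="my-airflow-model",
)
# Pass `xform_config` as the `config` of SageMakerTransformOperator.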
deploy_config¶
sagemaker.workflow.airflow.deploy_config(model, initial_instance_count, instance_type, endpoint_name=None, tags=None)¶
Export Airflow deploy config from a SageMaker model.
Parameters:
- model (sagemaker.model.Model) – The SageMaker model to export the Airflow config from.
- initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.
- instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
- endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
- tags (list[dict]) – List of tags for labeling the endpoint. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
Returns: Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
Return type: dict
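Example (a minimal sketch): exporting a deploy config from the Model built in the model_config sketch above; the endpoint name is a placeholder.

from sagemaker.workflow.airflow import deploy_config

# `model` is the Model object from the model_config sketch above.
endpoint_config = deploy_config(
    model=model,
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name="my-airflow-endpoint",
)
# Pass `endpoint_config` as the `config` of SageMakerEndpointOperator.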
deploy_config_from_estimator¶
sagemaker.workflow.airflow.deploy_config_from_estimator(estimator, task_id, task_type, initial_instance_count, instance_type, model_name=None, endpoint_name=None, tags=None, **kwargs)¶
Export Airflow deploy config from a SageMaker estimator.
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
- task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The endpoint config is built based on the training job generated in this operator.
- task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
- initial_instance_count (int) – Minimum number of EC2 instances to deploy to an endpoint for prediction.
- instance_type (str) – Type of EC2 instance to deploy to an endpoint for prediction, for example, ‘ml.c4.xlarge’.
- model_name (str) – Name to use for creating an Amazon SageMaker model. If not specified, one will be generated.
- endpoint_name (str) – Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the SageMaker model is used.
- tags (list[dict]) – List of tags for labeling the endpoint. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
- **kwargs – Passed to invocation of create_model(). Implementations may customize create_model() to accept **kwargs to customize model creation during deploy. For more, see the implementation docs.
Returns: Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
Return type: dict
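Example (a minimal sketch): deriving the endpoint config from the training task defined in the training_config sketch above and wiring the endpoint operator downstream of it; the task id and endpoint name are placeholders, and the operator import path assumes Airflow 1.10's contrib operators.

from airflow.contrib.operators.sagemaker_endpoint_operator import SageMakerEndpointOperator

from sagemaker.workflow.airflow import deploy_config_from_estimator

# `estimator`, `dag`, and `train_op` come from the training_config sketch above.
endpoint_config = deploy_config_from_estimator(
    estimator=estimator,
    task_id="sagemaker_training",
    task_type="training",
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name="my-airflow-endpoint",
)

deploy_op = SageMakerEndpointOperator(
    task_id="sagemaker_endpoint",
    config=endpoint_config,
    wait_for_completion=True,
    dag=dag,
)
deploy_op.set_upstream(train_op)  # deploy only after the training task succeeds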