Airflow
training_config
sagemaker.workflow.airflow.training_config(estimator, inputs=None, job_name=None, mini_batch_size=None)
Export Airflow training config from an estimator.
- Parameters
estimator (sagemaker.estimator.EstimatorBase) – The estimator to export training config from. Can be a BYO estimator, Framework estimator or Amazon algorithm estimator.
inputs –
Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
- (str) - The S3 location where training data is saved.
- (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or TrainingInput() objects.
- (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput() for full details.
- (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
- (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
job_name (str) – Specify a training job name if needed.
mini_batch_size (int) – Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
- Returns
Training config that can be directly used by SageMakerTrainingOperator in Airflow.
- Return type
dict
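The call described above might be wrapped as in the following sketch. The helper name export_training_config, its exporter parameter, and the channel names are assumptions made for this illustration; only the training_config signature is taken from this page. The SDK import is deferred so that the module loads even where the SageMaker SDK is not installed.

```python
def export_training_config(estimator, channels, job_name=None, exporter=None):
    """Build an Airflow training config for `estimator` (hypothetical helper).

    `channels` maps channel names to S3 URIs, which is one of the `inputs`
    forms listed above. `exporter` is an assumption of this sketch: it lets
    the function be exercised with a stand-in instead of the real SDK call.
    """
    if exporter is None:
        # Deferred import: the real function from the SageMaker Python SDK.
        from sagemaker.workflow.airflow import training_config as exporter
    # Copy the mapping so later changes by the caller do not affect the call.
    return exporter(estimator, inputs=dict(channels), job_name=job_name)
```

The returned config is what would be handed to SageMakerTrainingOperator in a DAG, for example export_training_config(estimator, {"train": "s3://my-bucket/train"}), where the bucket name is a placeholder.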
tuning_config
sagemaker.workflow.airflow.tuning_config(tuner, inputs, job_name=None, include_cls_metadata=False, mini_batch_size=None)
Export Airflow tuning config from a HyperparameterTuner.
- Parameters
tuner (sagemaker.tuner.HyperparameterTuner) – The tuner to export tuning config from.
inputs –
Information about the training data. Please refer to the fit() method of the associated estimator in the tuner, as this can take any of the following forms:
- (str) - The S3 location where training data is saved.
- (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or TrainingInput() objects.
- (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput() for full details.
- (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
- (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
- (dict[str, one of the forms above]) - Required only by tuners created via the factory method HyperparameterTuner.create(). The keys should be the same estimator names as the keys of the estimator_dict argument of the HyperparameterTuner.create() method.
job_name (str) – Specify a tuning job name if needed.
include_cls_metadata –
It can take one of the following two forms.
- (bool) - Whether or not the hyperparameter tuning job should include information about the estimator class (default: False). This information is passed as a hyperparameter, so if the algorithm you are using cannot handle unknown hyperparameters (e.g. an Amazon SageMaker built-in algorithm that does not have a custom estimator in the Python SDK), then set include_cls_metadata to False.
- (dict[str, bool]) - This version should be used for tuners created via the factory method HyperparameterTuner.create(), to specify the flag for individual estimators provided in the estimator_dict argument of the method. The keys are the same estimator names as in estimator_dict. If an estimator doesn’t need the flag set, there is no need to include it in the dictionary. If none of the estimators need the flag set, then an empty dictionary {} must be used.
mini_batch_size –
It can take one of the following two forms.
- (int) - Specify this argument only when the estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator.
- (dict[str, int]) - This version should be used for tuners created via the factory method HyperparameterTuner.create(), to specify the value for individual estimators provided in the estimator_dict argument of the method. The keys are the same estimator names as in estimator_dict. If an estimator doesn’t need the value set, there is no need to include it in the dictionary. If none of the estimators need the value set, then an empty dictionary {} must be used.
- Returns
Tuning config that can be directly used by SageMakerTuningOperator in Airflow.
- Return type
dict
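For tuners created via HyperparameterTuner.create(), the dict forms of include_cls_metadata and mini_batch_size list only the estimators that need a value, and fall back to an empty dictionary when none do. A minimal sketch of building such a dictionary follows; the helper name per_estimator_option and the estimator names in the usage note are hypothetical.

```python
def per_estimator_option(estimator_names, overrides):
    """Return a per-estimator dict for include_cls_metadata or mini_batch_size.

    Only estimators present in `overrides` appear in the result; if none do,
    the result is the empty dict, as the parameter descriptions require.
    """
    unknown = set(overrides) - set(estimator_names)
    if unknown:
        # Keys must match the names used in estimator_dict.
        raise KeyError(f"unknown estimator names: {sorted(unknown)}")
    return {name: overrides[name] for name in estimator_names if name in overrides}
```

With estimator names ["xgboost", "linear-learner"], passing {"linear-learner": 256} yields a dict with that single entry, and passing {} yields {}.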
model_config
sagemaker.workflow.airflow.model_config(model, instance_type=None, role=None, image_uri=None)
Export Airflow model config from a SageMaker model.
- Parameters
model (sagemaker.model.Model) – The Model object from which to export the Airflow config
instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’
role (str) – The ExecutionRoleArn IAM Role ARN for the model
image_uri (str) – A Docker image URI to use for deploying the model
- Returns
Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator and SageMakerTransformOperator in Airflow.
- Return type
dict
model_config_from_estimator
sagemaker.workflow.airflow.model_config_from_estimator(estimator, task_id, task_type, instance_type=None, role=None, image_uri=None, name=None, model_server_workers=None, vpc_config_override='VPC_CONFIG_DEFAULT')
Export Airflow model config from a SageMaker estimator.
- Parameters
estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The model config is built based on the training job generated in this operator.
task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’
role (str) – The ExecutionRoleArn IAM Role ARN for the model
image_uri (str) – A Docker image URI to use for deploying the model
name (str) – Name of the model
model_server_workers (int) – The number of worker processes used by the inference server. If None, server will use one worker per vCPU. Only effective when estimator is a SageMaker framework.
vpc_config_override (dict[str, list[str]]) –
Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator.
- ‘Subnets’ (list[str]): List of subnet ids.
- ‘SecurityGroupIds’ (list[str]): List of security group ids.
- Returns
Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator in Airflow.
- Return type
dict
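Two of the arguments above have a constrained shape: task_type accepts ‘training’, ‘tuning’ or None, and vpc_config_override is a dict with the keys ‘Subnets’ and ‘SecurityGroupIds’. The sketch below prepares both before they would be passed to model_config_from_estimator. The helper names and the IDs in the usage note are hypothetical, not part of the SDK.

```python
# Values of task_type documented on this page.
VALID_TASK_TYPES = ("training", "tuning", None)


def check_task_type(task_type):
    """Return task_type unchanged, or raise if it is not a documented value."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"task_type must be 'training', 'tuning' or None, got {task_type!r}")
    return task_type


def make_vpc_config_override(subnets, security_group_ids):
    """Build a dict in the documented vpc_config_override shape."""
    return {
        "Subnets": list(subnets),
        "SecurityGroupIds": list(security_group_ids),
    }
```

For example, make_vpc_config_override(["subnet-0abc"], ["sg-0def"]) produces a dict with one subnet and one security group, using placeholder IDs.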
transform_config
sagemaker.workflow.airflow.transform_config(transformer, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None, input_filter=None, output_filter=None, join_source=None)
Export Airflow transform config from a SageMaker transformer.
- Parameters
transformer (sagemaker.transformer.Transformer) – The SageMaker transformer to export Airflow config from.
data (str) – Input data location in S3.
data_type (str) –
What the S3 location defines (default: ‘S3Prefix’). Valid values:
- ‘S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
- ‘ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
content_type (str) – MIME type of the input data (default: None).
compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
job_name (str) – job name (default: None). If not specified, one will be generated.
input_filter (str) – A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value ‘$’, representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC format. See Supported JSONPath Operators for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.features” (default: None).
output_filter (str) –
A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.prediction” (default: None).
join_source (str) – The source of data to be joined to the transform output. It can be set to ‘Input’ meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None.
- Returns
Transform config that can be directly used by SageMakerTransformOperator in Airflow.
- Return type
dict
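To make the index-based JSONPaths mentioned for CSV data more concrete, the sketch below applies ‘$’, ‘$[i]’ and ‘$[a:b]’ to a single row held as a Python list. It only illustrates which elements such paths select. It is not SageMaker's implementation, and the function name apply_index_filter is hypothetical.

```python
import re

# "$[3]" style single index and "$[1:]" / "$[:2]" / "$[1:3]" style slices.
_INDEX = re.compile(r"^\$\[(-?\d+)\]$")
_SLICE = re.compile(r"^\$\[(-?\d+)?:(-?\d+)?\]$")


def apply_index_filter(row, path="$"):
    """Select part of a CSV row (a list) using an index-based JSONPath."""
    if path == "$":
        # "$" stands for the entire input.
        return list(row)
    match = _INDEX.match(path)
    if match:
        return row[int(match.group(1))]
    match = _SLICE.match(path)
    if match:
        start = int(match.group(1)) if match.group(1) else None
        stop = int(match.group(2)) if match.group(2) else None
        return list(row[start:stop])
    raise ValueError(f"unsupported path in this sketch: {path!r}")
```

For a row such as ["id-1", 0.5, 0.7], the path ‘$[1:]’ drops the leading ID column and keeps the two feature values, which is the typical use of input_filter.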
transform_config_from_estimator
sagemaker.workflow.airflow.transform_config_from_estimator(estimator, task_id, task_type, instance_count, instance_type, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, job_name=None, model_name=None, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None, model_server_workers=None, image_uri=None, vpc_config_override=None, input_filter=None, output_filter=None, join_source=None)
Export Airflow transform config from a SageMaker estimator.
- Parameters
estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The transform config is built based on the training job generated in this operator.
task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
instance_count (int) – Number of EC2 instances to use.
instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
data (str) – Input data location in S3.
data_type (str) –
What the S3 location defines (default: ‘S3Prefix’). Valid values:
- ‘S3Prefix’ - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
- ‘ManifestFile’ - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
content_type (str) – MIME type of the input data (default: None).
compression_type (str) – Compression type of the input data, if compressed (default: None). Valid values: ‘Gzip’, None.
split_type (str) – The record delimiter for the input object (default: ‘None’). Valid values: ‘None’, ‘Line’, ‘RecordIO’, and ‘TFRecord’.
job_name (str) – transform job name (default: None). If not specified, one will be generated.
model_name (str) – model name (default: None). If not specified, one will be generated.
strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.
assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.
output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).
accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
env (dict) – Environment variables to be set for use during the transform job (default: None).
max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.
max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.
tags (list[dict]) – List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.
role (str) – The ExecutionRoleArn IAM Role ARN for the Model, which is also used during transform jobs. If not specified, the role from the Estimator will be used.
volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
image_uri (str) – A Docker image URI to use for deploying the model
vpc_config_override (dict[str, list[str]]) –
Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator.
- ‘Subnets’ (list[str]): List of subnet ids.
- ‘SecurityGroupIds’ (list[str]): List of security group ids.
input_filter (str) –
A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value ‘$’, representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC format. See Supported JSONPath Operators for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.features” (default: None).
output_filter (str) –
A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob. Some examples: “$[1:]”, “$.prediction” (default: None).
join_source (str) – The source of data to be joined to the transform output. It can be set to ‘Input’ meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None.
- Returns
Transform config that can be directly used by SageMakerTransformOperator in Airflow.
- Return type
dict
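Several of the parameters above accept only a fixed set of values. The sketch below checks keyword arguments against the values listed on this page before they would be passed to transform_config_from_estimator. The table and the function name check_transform_options are assumptions of this example, and None is accepted wherever the signature default is None.

```python
# Valid values as listed in the parameter descriptions above.
ALLOWED = {
    "data_type": {"S3Prefix", "ManifestFile"},
    "compression_type": {"Gzip", None},
    "split_type": {"None", "Line", "RecordIO", "TFRecord", None},
    "strategy": {"MultiRecord", "SingleRecord", None},
    "assemble_with": {"Line", "None", None},
    "join_source": {"Input", None},
}


def check_transform_options(**options):
    """Return the options unchanged, or raise on an undocumented value."""
    for name, value in options.items():
        allowed = ALLOWED.get(name)
        # Options without a fixed value set (e.g. max_payload) pass through.
        if allowed is not None and value not in allowed:
            choices = sorted(allowed, key=str)
            raise ValueError(f"{name}={value!r} is not one of {choices}")
    return options
```

A call such as check_transform_options(strategy="MultiRecord", assemble_with="Line", max_payload=6) returns the same keyword arguments as a dict, ready to be unpacked into the real call.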
deploy_config
sagemaker.workflow.airflow.deploy_config(model, initial_instance_count, instance_type, endpoint_name=None, tags=None)
Export Airflow deploy config from a SageMaker model.
- Parameters
model (sagemaker.model.Model) – The SageMaker model to export the Airflow config from.
initial_instance_count (int) – The initial number of instances to run in the Endpoint created from this Model.
instance_type (str) – The EC2 instance type to deploy this Model to. For example, ‘ml.p2.xlarge’.
endpoint_name (str) – The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
tags (list[dict]) – List of tags for labeling a training job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
- Returns
Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
- Return type
dict
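The tags argument is a list of dicts. Assuming the Key/Value shape described in the Tag API page linked above, a small helper can turn a plain mapping into that list. The helper name make_tags and the labels in the usage note are hypothetical.

```python
def make_tags(labels):
    """Convert {"name": "value"} labels into a list of Key/Value tag dicts.

    Entries are sorted by key so the resulting list is deterministic.
    """
    return [{"Key": str(key), "Value": str(value)} for key, value in sorted(labels.items())]
```

For example, make_tags({"team": "ml", "env": "dev"}) produces two tag dicts, with the "env" entry first.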
deploy_config_from_estimator
sagemaker.workflow.airflow.deploy_config_from_estimator(estimator, task_id, task_type, initial_instance_count, instance_type, model_name=None, endpoint_name=None, tags=None, **kwargs)
Export Airflow deploy config from a SageMaker estimator.
- Parameters
estimator (sagemaker.estimator.EstimatorBase) – The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
task_id (str) – The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The endpoint config is built based on the training job generated in this operator.
task_type (str) – Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be ‘training’, ‘tuning’ or None (which means training job is not from any task).
initial_instance_count (int) – Minimum number of EC2 instances to deploy to an endpoint for prediction.
instance_type (str) – Type of EC2 instance to deploy to an endpoint for prediction, for example, ‘ml.c4.xlarge’.
model_name (str) – Name to use for creating an Amazon SageMaker model. If not specified, one will be generated.
endpoint_name (str) – Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the SageMaker model is used.
tags (list[dict]) – List of tags for labeling a training job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
**kwargs – Passed to invocation of create_model(). Implementations may customize create_model() to accept **kwargs to customize model creation during deploy. For more, see the implementation docs.
- Returns
Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
- Return type
dict