sagemaker.train.evaluate.mtrl_pipeline_templates

MTRL (Multi-Turn RL) SageMaker Pipelines templates.

Canonical templates for the MTRL Agentic Eval pipeline contract: see Quip “MTRL Agentic Eval — SM Pipeline Template”. Three template shapes are exported:

MTRL_TEMPLATE_BASE_MODEL_ONLY — evaluate a base model only.
MTRL_TEMPLATE_FINE_TUNED_ONLY — evaluate a fine-tuned model only.
MTRL_TEMPLATE — base vs fine-tuned side-by-side.
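As a rough illustration of how a caller might choose among the three exported shapes, the sketch below maps two flags to a template name. The helper `pick_template_name` and its boolean parameters are hypothetical and are not part of the module; only the three template names come from the list above.

```python
def pick_template_name(evaluate_base, evaluate_fine_tuned):
    """Return the name of the template shape matching the requested evaluation.

    Hypothetical helper: the two flags are illustrative inputs, not a
    documented interface of the module.
    """
    if evaluate_base and evaluate_fine_tuned:
        # Both models: side-by-side comparison.
        return "MTRL_TEMPLATE"
    if evaluate_base:
        return "MTRL_TEMPLATE_BASE_MODEL_ONLY"
    if evaluate_fine_tuned:
        return "MTRL_TEMPLATE_FINE_TUNED_ONLY"
    raise ValueError("at least one model must be evaluated")


chosen = pick_template_name(evaluate_base=True, evaluate_fine_tuned=True)
```

The returned name could then be looked up on the imported module, for example with `getattr`, assuming the templates are exposed as module-level attributes.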

Contract summary

Step Type = "Job"; Arguments.JobCategory = "AgentRFTEvaluation"; Arguments.JobConfigSchemaVersion = "1.0.0".
Input channel name = "evaluation".
MlflowConfig lives inside JobConfigDocument.OutputDataConfig.MlflowConfig (not top-level).
Distinct MlflowRunName per eval step: base-model-eval vs fine-tuned-model-eval — both runs land in the same experiment for side-by-side comparison.
VpcConfig is at the step Arguments level (not inside the JobConfigDocument).
Hyperparameters (eval_group_size, sampling_temperature, top_p, max_tokens, pass_k_values, success_threshold) are emitted as string values under EvaluationConfig.HyperParameters.
Lineage DAG: CreateEvaluationAction → (Evaluate…) → AssociateLineage; AssociateLineage reads ServiceOutput.MlflowDetails.* from the eval step(s) via Get expressions.
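The nesting rules above can be sketched as a plain Python dictionary builder. This is a minimal illustration under stated assumptions, not the templates' actual rendering logic: the helper names, the step name `EvaluateBaseModel`, the `MlflowExperimentName` key, the `RunId` field, the placement of `EvaluationConfig` inside `JobConfigDocument`, and the JSON encoding of non-string hyperparameters are all assumptions made for the example.

```python
import json


def stringify_hyperparameters(params):
    """Return a copy of params with every value rendered as a string."""
    rendered = {}
    for key, value in params.items():
        # Assumption: non-string values (numbers, lists) are JSON-encoded.
        rendered[key] = value if isinstance(value, str) else json.dumps(value)
    return rendered


def build_eval_step(step_name, run_name, hyperparameters, mlflow_config, vpc_config=None):
    """Assemble one evaluation step that follows the contract summary."""
    job_config_document = {
        # Assumption: EvaluationConfig sits inside the JobConfigDocument.
        "EvaluationConfig": {
            "HyperParameters": stringify_hyperparameters(hyperparameters)
        },
        "OutputDataConfig": {
            # MlflowConfig is nested here, not at the top level.
            "MlflowConfig": {**mlflow_config, "MlflowRunName": run_name},
        },
    }
    arguments = {
        "JobCategory": "AgentRFTEvaluation",
        "JobConfigSchemaVersion": "1.0.0",
        "JobConfigDocument": job_config_document,
    }
    if vpc_config is not None:
        # VpcConfig belongs at the Arguments level, outside the JobConfigDocument.
        arguments["VpcConfig"] = vpc_config
    return {"Name": step_name, "Type": "Job", "Arguments": arguments}


def mlflow_detail_ref(step_name, field):
    """Build a Get expression pointing at one MLflow detail of an eval step."""
    return {"Get": f"Steps.{step_name}.ServiceOutput.MlflowDetails.{field}"}


base_step = build_eval_step(
    "EvaluateBaseModel",  # hypothetical step name
    "base-model-eval",
    {"eval_group_size": 8, "sampling_temperature": 0.7, "pass_k_values": [1, 4]},
    {"MlflowExperimentName": "demo-experiment"},  # hypothetical key and value
)
run_ref = mlflow_detail_ref("EvaluateBaseModel", "RunId")  # "RunId" is hypothetical
```

A fine-tuned step would be built the same way with the run name `fine-tuned-model-eval`, so that both runs share one experiment while staying distinguishable.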

Template context keys (all three templates consume the same context shape; each template omits keys it does not need):

pipeline_name, role_arn, base_model_arn, agent_arn, agent_qualifier, dataset_uri, s3_output_path, mlflow_resource_arn, mlflow_experiment_name, eval_group_size, sampling_temperature, top_p, max_tokens, pass_k_values, success_threshold, model_package_group_arn, source_model_package_arn, action_arn_prefix, dataset_artifact_arn, kms_key_arn, vpc_config, vpc_security_group_ids, vpc_subnets, tags
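Because all three templates share one context shape, a caller may want to catch misspelled keys before rendering. The sketch below is one way to do that; `CONTEXT_KEYS` simply restates the key list above, and `unknown_context_keys` is a hypothetical helper, not something the module is known to provide.

```python
# The context keys listed above, collected into a set for membership checks.
CONTEXT_KEYS = frozenset({
    "pipeline_name", "role_arn", "base_model_arn", "agent_arn",
    "agent_qualifier", "dataset_uri", "s3_output_path",
    "mlflow_resource_arn", "mlflow_experiment_name", "eval_group_size",
    "sampling_temperature", "top_p", "max_tokens", "pass_k_values",
    "success_threshold", "model_package_group_arn",
    "source_model_package_arn", "action_arn_prefix", "dataset_artifact_arn",
    "kms_key_arn", "vpc_config", "vpc_security_group_ids", "vpc_subnets",
    "tags",
})


def unknown_context_keys(context):
    """Return, in sorted order, the keys of context that no template consumes."""
    return sorted(set(context) - CONTEXT_KEYS)


# Example values are placeholders chosen for illustration only.
context = {
    "pipeline_name": "demo-pipeline",
    "eval_group_size": 8,
    "top_k": 5,  # not a documented context key
}
unexpected = unknown_context_keys(context)
```

Missing keys are deliberately not flagged here, since each template omits the keys it does not need.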
