sagemaker.train.evaluate.mtrl_pipeline_templates
MTRL (Multi-Turn RL) SageMaker Pipelines templates.
Canonical templates for the MTRL Agentic Eval pipeline contract: see Quip “MTRL Agentic Eval — SM Pipeline Template”. Three template shapes are exported:
MTRL_TEMPLATE_BASE_MODEL_ONLY: evaluate a base model only.
MTRL_TEMPLATE_FINE_TUNED_ONLY: evaluate a fine-tuned model only.
MTRL_TEMPLATE: evaluate base and fine-tuned models side by side.
Contract summary
Step Type = "Job"; Arguments.JobCategory = "AgentRFTEvaluation"; Arguments.JobConfigSchemaVersion = "1.0.0".
Input channel name = "evaluation".
MlflowConfig lives inside JobConfigDocument.OutputDataConfig.MlflowConfig (not top-level).
Distinct MlflowRunName per eval step: base-model-eval vs fine-tuned-model-eval; both runs land in the same experiment for side-by-side comparison.
VpcConfig is at the step Arguments level (not inside the JobConfigDocument).
Hyperparameters (eval_group_size, sampling_temperature, top_p, max_tokens, pass_k_values, success_threshold) are emitted as string values under EvaluationConfig.HyperParameters.
Lineage DAG: CreateEvaluationAction → (Evaluate…) → AssociateLineage; AssociateLineage reads ServiceOutput.MlflowDetails.* from the eval step(s) via Get expressions.
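The field placement described in the contract can be sketched as a plain Python dict. This is an illustration only, not the module's actual template: the helper name build_eval_step, the step name, the keys passed in mlflow_config, and the choice to nest MlflowRunName inside MlflowConfig are all assumptions, and JobConfigDocument is shown as a nested dict for readability (the real template may serialize it differently). The input channel and lineage steps are left out.

```python
def build_eval_step(name, run_name, hyperparameters, mlflow_config, vpc_config=None):
    """Hypothetical helper: assemble one evaluation step following the contract summary."""
    job_config_document = {
        "EvaluationConfig": {
            # Contract: hyperparameters are emitted as string values.
            # str() is an assumed serialization; the real format is not documented here.
            "HyperParameters": {key: str(value) for key, value in hyperparameters.items()},
        },
        "OutputDataConfig": {
            # Contract: MlflowConfig lives here, not at the top level.
            # Assumption: the per-step run name sits inside MlflowConfig.
            "MlflowConfig": {**mlflow_config, "MlflowRunName": run_name},
        },
    }
    arguments = {
        "JobCategory": "AgentRFTEvaluation",
        "JobConfigSchemaVersion": "1.0.0",
        "JobConfigDocument": job_config_document,
    }
    if vpc_config is not None:
        # Contract: VpcConfig is at the step Arguments level,
        # not inside the JobConfigDocument.
        arguments["VpcConfig"] = vpc_config
    return {"Name": name, "Type": "Job", "Arguments": arguments}


# Placeholder values; the key names inside mlflow_config and vpc_config are hypothetical.
base_step = build_eval_step(
    name="EvaluateBaseModel",
    run_name="base-model-eval",
    hyperparameters={"top_p": 0.95, "max_tokens": 1024, "pass_k_values": [1, 5]},
    mlflow_config={"MlflowExperimentName": "example-experiment"},
    vpc_config={"Subnets": ["subnet-example"]},
)
```

Calling the same helper with run_name="fine-tuned-model-eval" and the same experiment would give the second step of the side-by-side shape, with both runs landing in one experiment.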
Template context keys (all three templates consume the same context shape; each template omits keys it does not need):
pipeline_name, role_arn, base_model_arn, agent_arn, agent_qualifier, dataset_uri, s3_output_path, mlflow_resource_arn, mlflow_experiment_name, eval_group_size, sampling_temperature, top_p, max_tokens, pass_k_values, success_threshold, model_package_group_arn, source_model_package_arn, action_arn_prefix, dataset_artifact_arn, kms_key_arn, vpc_config, vpc_security_group_ids, vpc_subnets, tags
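Because every template draws from the same context shape, a caller may want to catch misspelled keys before rendering. The sketch below is one way to do that; CONTEXT_KEYS simply restates the list above, and unknown_context_keys is a hypothetical helper, not part of this module. It does not check which keys a given template requires, since that is not specified here.

```python
# The documented context keys, copied from the list above.
CONTEXT_KEYS = frozenset({
    "pipeline_name", "role_arn", "base_model_arn", "agent_arn", "agent_qualifier",
    "dataset_uri", "s3_output_path", "mlflow_resource_arn", "mlflow_experiment_name",
    "eval_group_size", "sampling_temperature", "top_p", "max_tokens",
    "pass_k_values", "success_threshold", "model_package_group_arn",
    "source_model_package_arn", "action_arn_prefix", "dataset_artifact_arn",
    "kms_key_arn", "vpc_config", "vpc_security_group_ids", "vpc_subnets", "tags",
})


def unknown_context_keys(context):
    """Hypothetical helper: return, sorted, the keys not in the documented context shape."""
    return sorted(set(context) - CONTEXT_KEYS)


# Placeholder values; "top_k" is deliberately not a documented key.
context = {
    "pipeline_name": "example-mtrl-eval",
    "dataset_uri": "s3://example-bucket/eval/dataset.jsonl",
    "top_p": 0.95,
    "top_k": 40,
}
typos = unknown_context_keys(context)
```

Here typos would contain only "top_k", which a caller could reject or log before passing the context to a template.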