sagemaker.train.evaluate.mtrl_pipeline_templates


MTRL (Multi-Turn RL) SageMaker Pipelines templates.

Canonical templates for the MTRL Agentic Eval pipeline contract: see Quip “MTRL Agentic Eval — SM Pipeline Template”. Three template shapes are exported:

  • MTRL_TEMPLATE_BASE_MODEL_ONLY — evaluate a base model only.

  • MTRL_TEMPLATE_FINE_TUNED_ONLY — evaluate a fine-tuned model only.

  • MTRL_TEMPLATE — base vs fine-tuned side-by-side.
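The choice between the three shapes can be sketched as a small helper. This is only an illustration: `choose_template_name` and its two boolean parameters are hypothetical and not part of the module. It returns the name of the exported constant as a string, so it makes no assumption about what type the templates themselves are.

```python
def choose_template_name(evaluate_base: bool, evaluate_fine_tuned: bool) -> str:
    """Return the name of the exported template constant for an evaluation setup.

    Hypothetical helper: only the three constant names come from the module docs.
    """
    if evaluate_base and evaluate_fine_tuned:
        # Base vs fine-tuned, side by side.
        return "MTRL_TEMPLATE"
    if evaluate_base:
        return "MTRL_TEMPLATE_BASE_MODEL_ONLY"
    if evaluate_fine_tuned:
        return "MTRL_TEMPLATE_FINE_TUNED_ONLY"
    raise ValueError("at least one of the two models must be evaluated")


if __name__ == "__main__":
    print(choose_template_name(True, False))
```

With the package installed, the returned name could presumably be resolved with `getattr` on the imported `mtrl_pipeline_templates` module, since the three constants are exported under exactly these names.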

Contract summary

  • Step Type = "Job"; Arguments.JobCategory = "AgentRFTEvaluation"; Arguments.JobConfigSchemaVersion = "1.0.0".

  • Input channel name = "evaluation".

  • MlflowConfig lives inside JobConfigDocument.OutputDataConfig.MlflowConfig (not top-level).

  • Each eval step uses a distinct MlflowRunName (base-model-eval vs fine-tuned-model-eval); both runs land in the same experiment for side-by-side comparison.

  • VpcConfig is at the step Arguments level (not inside the JobConfigDocument).

  • Hyperparameters (eval_group_size, sampling_temperature, top_p, max_tokens, pass_k_values, success_threshold) are emitted as string values under EvaluationConfig.HyperParameters.

  • Lineage DAG: CreateEvaluationAction → (Evaluate…) → AssociateLineage; AssociateLineage reads ServiceOutput.MlflowDetails.* from the eval step(s) via Get expressions.
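As a rough illustration of the rules above, the following sketch checks a single eval step, represented as a plain Python dict, against the contract. Several things here are assumptions rather than facts from the module: the helper name `check_eval_step` is hypothetical, JobConfigDocument is assumed to be a nested dict under Arguments, EvaluationConfig is assumed to sit inside JobConfigDocument, and "top-level" is read as "a direct key of Arguments or of JobConfigDocument". The input channel rule is left out because the summary does not say where the channel is declared. The hyperparameter values in the example are illustrative only.

```python
from __future__ import annotations

from typing import Any


def check_eval_step(step: dict[str, Any]) -> list[str]:
    """Return contract violations for one eval step (empty list if none)."""
    problems: list[str] = []
    args = step.get("Arguments", {})

    if step.get("Type") != "Job":
        problems.append('Type must be "Job"')
    if args.get("JobCategory") != "AgentRFTEvaluation":
        problems.append('Arguments.JobCategory must be "AgentRFTEvaluation"')
    if args.get("JobConfigSchemaVersion") != "1.0.0":
        problems.append('Arguments.JobConfigSchemaVersion must be "1.0.0"')

    # ASSUMPTION: JobConfigDocument is a nested dict under Arguments.
    doc = args.get("JobConfigDocument", {})

    # MlflowConfig belongs under JobConfigDocument.OutputDataConfig only.
    if "MlflowConfig" in args or "MlflowConfig" in doc:
        problems.append("MlflowConfig must live under OutputDataConfig")

    # VpcConfig belongs at the Arguments level, never inside the document.
    if "VpcConfig" in doc:
        problems.append("VpcConfig must be at the Arguments level")

    # ASSUMPTION: EvaluationConfig sits inside JobConfigDocument.
    hyper = doc.get("EvaluationConfig", {}).get("HyperParameters", {})
    for key, value in hyper.items():
        if not isinstance(value, str):
            problems.append(f"HyperParameters[{key!r}] must be a string")

    return problems


# Illustrative step; values are made up, key placement follows the summary.
example_step = {
    "Type": "Job",
    "Arguments": {
        "JobCategory": "AgentRFTEvaluation",
        "JobConfigSchemaVersion": "1.0.0",
        "VpcConfig": {},
        "JobConfigDocument": {
            "OutputDataConfig": {
                "MlflowConfig": {"MlflowRunName": "base-model-eval"},
            },
            "EvaluationConfig": {
                "HyperParameters": {"eval_group_size": "8", "top_p": "0.9"},
            },
        },
    },
}

if __name__ == "__main__":
    print(check_eval_step(example_step))
```

A check like this is only a reading aid for the summary; the templates exported by the module remain the source of truth for the actual step layout.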

Template context keys (all three templates consume the same context shape; each template omits keys it does not need):

pipeline_name, role_arn, base_model_arn, agent_arn, agent_qualifier, dataset_uri, s3_output_path, mlflow_resource_arn, mlflow_experiment_name, eval_group_size, sampling_temperature, top_p, max_tokens, pass_k_values, success_threshold, model_package_group_arn, source_model_package_arn, action_arn_prefix, dataset_artifact_arn, kms_key_arn, vpc_config, vpc_security_group_ids, vpc_subnets, tags
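A context dict can be sanity-checked against this key list before it is handed to a template. The sketch below is an assumption-laden illustration: `unknown_context_keys` and `stringify_hyperparameters` are hypothetical helpers, not module functions, and using `str()` for the conversion is a guess at how the string values are produced (the serialization of list-valued keys such as pass_k_values in particular is not documented here).

```python
from __future__ import annotations

from typing import Any

# Key names copied from the documented context shape.
CONTEXT_KEYS = frozenset({
    "pipeline_name", "role_arn", "base_model_arn", "agent_arn",
    "agent_qualifier", "dataset_uri", "s3_output_path",
    "mlflow_resource_arn", "mlflow_experiment_name", "eval_group_size",
    "sampling_temperature", "top_p", "max_tokens", "pass_k_values",
    "success_threshold", "model_package_group_arn",
    "source_model_package_arn", "action_arn_prefix", "dataset_artifact_arn",
    "kms_key_arn", "vpc_config", "vpc_security_group_ids", "vpc_subnets",
    "tags",
})

# The subset emitted as strings under EvaluationConfig.HyperParameters.
HYPERPARAMETER_KEYS = (
    "eval_group_size", "sampling_temperature", "top_p",
    "max_tokens", "pass_k_values", "success_threshold",
)


def unknown_context_keys(context: dict[str, Any]) -> list[str]:
    """Return keys in the context that are not part of the documented shape."""
    return sorted(set(context) - CONTEXT_KEYS)


def stringify_hyperparameters(context: dict[str, Any]) -> dict[str, str]:
    """Collect the hyperparameter keys present in the context as strings.

    ASSUMPTION: plain str() conversion; the real templates may format
    values differently.
    """
    return {k: str(context[k]) for k in HYPERPARAMETER_KEYS if k in context}


if __name__ == "__main__":
    ctx = {"pipeline_name": "demo", "eval_group_size": 8, "top_p": 0.9}
    print(unknown_context_keys(ctx))
    print(stringify_hyperparameters(ctx))
```

Catching a misspelled key early is useful because a template that omits keys it does not need gives no hint on its own that a key was mistyped.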