sagemaker.train.agent_rft_job#

AgentRFTJob — wrapper around sagemaker-core Job for AgentRFT job category.

Classes

AgentRFTJob(job)

Wrapper around sagemaker-core Job for AgentRFT job category.

class sagemaker.train.agent_rft_job.AgentRFTJob(job: Job)[source]#

Bases: object

Wrapper around sagemaker-core Job for AgentRFT job category.

Delegates lifecycle methods to the underlying Job and adds typed convenience properties by parsing the JobConfigDocument JSON string.

Parameters:

job – The sagemaker-core Job instance to wrap.

JOB_CATEGORY = 'AgentRFT'#
property agent_config: dict | None#

Full AgentConfig section from JobConfigDocument.
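As a rough illustration of what "parsing the JobConfigDocument JSON string" can look like, the sketch below pulls one section out of a JSON document. It is not the library's implementation. The helper name `config_section`, the sample document, and the assumption that AgentConfig and TrainingConfig are top-level keys are all hypothetical.

```python
import json
from typing import Optional


def config_section(job_config_document: Optional[str], section: str) -> Optional[dict]:
    """Return one top-level section of a JSON config string, or None if absent."""
    if not job_config_document:
        return None
    document = json.loads(job_config_document)
    if not isinstance(document, dict):
        return None
    value = document.get(section)
    # Only hand back the section when it is a JSON object.
    return value if isinstance(value, dict) else None


# Made-up document for illustration; the real field names inside each section may differ.
sample_document = json.dumps({
    "AgentConfig": {"ExampleAgentSetting": "value"},
    "TrainingConfig": {"ExampleTrainingSetting": 1},
})

agent_config = config_section(sample_document, "AgentConfig")
training_config = config_section(sample_document, "TrainingConfig")
print(agent_config, training_config)
```

Returning None for a missing section mirrors the `dict | None` type documented for agent_config and training_config.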

property billable_token_usage: dict | None#

Billable token usage from ServiceOutput.

Returns dict with keys: TrainTokenCount, PrefillTokenCount, SampleTokenCount.
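A minimal sketch of reading the returned dict defensively, since the property may be None. The helper `summarize_token_usage` and the sample numbers are hypothetical; only the three key names come from the description above.

```python
from typing import Optional

# Key names as documented for billable_token_usage.
TOKEN_KEYS = ("TrainTokenCount", "PrefillTokenCount", "SampleTokenCount")


def summarize_token_usage(usage: Optional[dict]) -> str:
    """Format the token counts on one line, tolerating None and missing keys."""
    if usage is None:
        return "token usage not available"
    parts = [f"{key}={usage.get(key, 'n/a')}" for key in TOKEN_KEYS]
    return ", ".join(parts)


# Made-up values; in practice this would be job.billable_token_usage.
sample_usage = {"TrainTokenCount": 1200, "PrefillTokenCount": 300, "SampleTokenCount": 450}
print(summarize_token_usage(sample_usage))
```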

property creation_time#
delete()[source]#

Delete the job via DeleteJob API.

property end_time#
property failure_reason: str | None#
classmethod from_job(job: Job) → AgentRFTJob[source]#

Create an AgentRFTJob from a sagemaker-core Job instance.

classmethod get(job_name: str, session=None) → AgentRFTJob[source]#

Attach to an existing AgentRFT job by name.

Parameters:
  • job_name – The name of the job.

  • session – Optional boto3 session.

Returns:

AgentRFTJob wrapping the existing job.

classmethod get_all(session=None, **kwargs)[source]#

List all AgentRFT jobs.

Delegates to Job.get_all with job_category pre-filled. Additional keyword arguments (e.g. creation_time_after, name_contains, sort_by, sort_order, status_equals) are forwarded.

Parameters:
  • session – Optional boto3 session.

  • **kwargs – Additional filter arguments forwarded to Job.get_all.

Yields:

AgentRFTJob instances.

get_mlflow_url() → str | None[source]#

Generate a fresh presigned MLflow URL for this job’s experiment/run.

In Jupyter notebooks, also renders a clickable link.

Returns:

Presigned URL string, or None if MLflow is not configured.

get_training_metrics() → list[dict][source]#

Fetch per-step MTRL training metrics from MLflow.

Retrieves rollout/reward/mean, rollout/turns/mean, training/total_tokens, and training/num_trajectories for each training step and prints a summary table.

Returns:

List of dicts, one per step, with keys step, rollout/reward/mean, rollout/turns/mean, training/total_tokens, and training/num_trajectories.
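The returned list can be post-processed with ordinary Python. As a small sketch, the helper below picks the step with the highest mean rollout reward. `best_step` and the sample rows are hypothetical; the rows only imitate the documented keys, and real values would come from `job.get_training_metrics()`.

```python
def best_step(metrics):
    """Return the row with the highest rollout/reward/mean, or None if no row has one."""
    scored = [row for row in metrics if row.get("rollout/reward/mean") is not None]
    if not scored:
        return None
    return max(scored, key=lambda row: row["rollout/reward/mean"])


# Made-up rows shaped like the documented return value.
sample_metrics = [
    {"step": 1, "rollout/reward/mean": 0.21, "rollout/turns/mean": 4.0,
     "training/total_tokens": 1000, "training/num_trajectories": 8},
    {"step": 2, "rollout/reward/mean": 0.47, "rollout/turns/mean": 3.5,
     "training/total_tokens": 2100, "training/num_trajectories": 8},
    {"step": 3, "rollout/reward/mean": 0.39, "rollout/turns/mean": 3.8,
     "training/total_tokens": 3150, "training/num_trajectories": 8},
]

best = best_step(sample_metrics)
print(best["step"], best["rollout/reward/mean"])
```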

property job_arn: str#
property job_name: str#
property job_status: str#
property last_modified_time#
property mlflow_details: dict | None#

MLflow experiment/run details from ServiceOutput.

Returns dict with keys: ExperimentName, RunName, ExperimentId, RunId.

property output_model_package_arn: str | None#

ARN of the output model package from ServiceOutput, or None.

property progress_info: dict | None#

Training progress from ServiceOutput.

Supports two formats:
  • Epoch-based: dict with MaxEpoch, StepsPerEpoch, CurrentEpoch, CurrentStep.

  • Step-only: dict with MaxSteps, CurrentStep.

Returns None if not available.
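Code that consumes progress_info has to handle both dict shapes as well as None. A minimal sketch, assuming the presence of MaxEpoch is a reliable way to tell the two formats apart; the helper `describe_progress` and the sample values are hypothetical.

```python
from typing import Optional


def describe_progress(info: Optional[dict]) -> str:
    """Render progress_info as text for either documented format."""
    if info is None:
        return "progress not available"
    if "MaxEpoch" in info:
        # Epoch-based format.
        return (
            f"epoch {info['CurrentEpoch']} of {info['MaxEpoch']}, "
            f"step {info['CurrentStep']} ({info['StepsPerEpoch']} steps per epoch)"
        )
    # Step-only format.
    return f"step {info['CurrentStep']} of {info['MaxSteps']}"


epoch_based = {"MaxEpoch": 3, "StepsPerEpoch": 100, "CurrentEpoch": 2, "CurrentStep": 40}
step_only = {"MaxSteps": 500, "CurrentStep": 120}
print(describe_progress(epoch_based))
print(describe_progress(step_only))
```

The sketch reports the raw fields rather than a percentage, because the description does not say whether CurrentStep counts steps within the current epoch or across the whole run.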

refresh()[source]#

Refresh job state from DescribeJob API.

property s3_output_path: str | None#

S3 output path from OutputDataConfig.

property secondary_status: str#
property secondary_status_transitions: list#
stop()[source]#

Stop the job via StopJob API.

property training_config: dict | None#

Full TrainingConfig section from JobConfigDocument.

wait(poll: int = 5, timeout: int | None = 3000, max_log_lines: int = 20)[source]#

Wait for the job to reach a terminal status.

Parameters:
  • poll – Seconds between polls.

  • timeout – Maximum seconds to wait.

  • max_log_lines – Maximum number of log lines to display. Defaults to 20.
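A common pattern is to block on wait, refresh the job state, and then inspect job_status and failure_reason. The sketch below uses only the methods and properties documented on this page. `_StubJob` is a stand-in so the example runs offline, its status strings are placeholders, and how the real wait behaves when the timeout expires is not covered here.

```python
class _StubJob:
    """Stand-in for AgentRFTJob that records wait() calls and then 'finishes'."""

    def __init__(self):
        self.job_status = "InProgress"   # placeholder status string
        self.failure_reason = None
        self.wait_calls = []
        self._finished = False

    def wait(self, poll=5, timeout=3000, max_log_lines=20):
        self.wait_calls.append(
            {"poll": poll, "timeout": timeout, "max_log_lines": max_log_lines}
        )
        self._finished = True

    def refresh(self):
        if self._finished:
            self.job_status = "Completed"  # placeholder status string


def wait_and_report(job, poll=30, timeout=3600):
    """Wait for the job, refresh its state, and return (status, failure reason)."""
    job.wait(poll=poll, timeout=timeout)
    job.refresh()
    return job.job_status, job.failure_reason


stub = _StubJob()
status, reason = wait_and_report(stub)
print(status, reason)
```

With the real class, `job` would be obtained from `AgentRFTJob.get(job_name)` rather than constructed directly.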

wait_for_delete()[source]#

Wait for job deletion to complete.