sagemaker.train.agent_rft_job


AgentRFTJob: a wrapper around the sagemaker-core Job for the AgentRFT job category.

Classes

AgentRFTJob(job)

Wrapper around the sagemaker-core Job for the AgentRFT job category.

class sagemaker.train.agent_rft_job.AgentRFTJob(job: Job)

Bases: object

Wrapper around the sagemaker-core Job for the AgentRFT job category.

Delegates lifecycle methods to the underlying Job and adds typed convenience properties by parsing the JobConfigDocument JSON string.

Parameters:

job – The sagemaker-core Job instance to wrap.
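The delegate-and-parse pattern described above can be illustrated with a stand-in object. This is a minimal, hypothetical sketch and not the SDK's implementation: `StubJob`, its `job_config_document` attribute, and `ConfigWrapper` are illustrative names, and only the AgentConfig and TrainingConfig section names come from this page.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubJob:
    """Hypothetical stand-in for a sagemaker-core Job; not the real class."""

    job_name: str
    job_config_document: Optional[str] = None  # assumed attribute name; JSON string

    def refresh(self) -> "StubJob":
        # A real Job would re-describe itself here; the stub does nothing.
        return self


class ConfigWrapper:
    """Illustrative wrapper: delegates lifecycle calls, parses the config lazily."""

    def __init__(self, job: StubJob):
        self._job = job

    @property
    def job_name(self) -> str:
        # Plain attribute passthrough to the wrapped job.
        return self._job.job_name

    def refresh(self) -> "ConfigWrapper":
        # Lifecycle call delegated to the wrapped job.
        self._job.refresh()
        return self

    def _config(self) -> dict[str, Any]:
        # Re-parse on every access so a refreshed document is picked up.
        raw = self._job.job_config_document
        return json.loads(raw) if raw else {}

    @property
    def agent_config(self) -> Optional[dict]:
        return self._config().get("AgentConfig")

    @property
    def training_config(self) -> Optional[dict]:
        return self._config().get("TrainingConfig")


# Usage with a made-up document; the inner keys are placeholders.
doc = json.dumps({"AgentConfig": {"example_key": 1}, "TrainingConfig": {}})
wrapped = ConfigWrapper(StubJob("demo-job", doc))
print(wrapped.job_name, wrapped.agent_config, wrapped.training_config)
```

Parsing on each property access, rather than once in the constructor, keeps the typed properties consistent with whatever the most recent refresh loaded.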

JOB_CATEGORY = 'AgentRFT'

property agent_config: dict | None

Full AgentConfig section from JobConfigDocument.

property billable_token_usage: dict | None

Billable token usage from ServiceOutput.

Returns a dict with keys TrainTokenCount, PrefillTokenCount, and SampleTokenCount, or None.
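As an illustration of consuming this property, the hypothetical helper below normalizes the three documented counters and tolerates a None result. The key names come from this page; the `summarize_token_usage` name and the choice to treat missing keys as zero are assumptions.

```python
from typing import Optional

# Key names as documented for billable_token_usage.
TOKEN_KEYS = ("TrainTokenCount", "PrefillTokenCount", "SampleTokenCount")


def summarize_token_usage(usage: Optional[dict]) -> dict[str, int]:
    """Return each documented counter plus a Total; missing data counts as 0."""
    usage = usage or {}
    summary = {key: int(usage.get(key, 0)) for key in TOKEN_KEYS}
    summary["Total"] = sum(summary[key] for key in TOKEN_KEYS)
    return summary


# Example with invented numbers (in practice: job.billable_token_usage).
print(summarize_token_usage(
    {"TrainTokenCount": 1200, "PrefillTokenCount": 300, "SampleTokenCount": 500}
))
```

The Total is a plain sum; if the counters are priced at different rates, weight them separately instead of relying on it.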

property creation_time

delete()

Delete the job via the DeleteJob API.

property end_time

property failure_reason: str | None

classmethod from_job(job: Job) → AgentRFTJob

Create an AgentRFTJob from a sagemaker-core Job instance.

classmethod get(job_name: str, session=None) → AgentRFTJob

Attach to an existing AgentRFT job by name.

Parameters:

job_name – The name of the job.
session – Optional boto3 session.

Returns:

AgentRFTJob wrapping the existing job.

classmethod get_all(session=None, **kwargs)

List all AgentRFT jobs.

Delegates to Job.get_all with job_category pre-filled. Additional keyword arguments (e.g. creation_time_after, name_contains, sort_by, sort_order, status_equals) are forwarded.

Parameters:

session – Optional boto3 session.
**kwargs – Additional filter arguments forwarded to Job.get_all.

Yields:

AgentRFTJob instances.
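The pre-fill-and-forward behavior described for get_all can be sketched as a generator. `fake_job_get_all` is a hypothetical stand-in for Job.get_all that filters an in-memory list, `Wrapped` stands in for AgentRFTJob, and the status strings are invented; the real call's server-side filtering and pagination are not modeled.

```python
from typing import Any, Iterator


def fake_job_get_all(session=None, job_category=None, **kwargs) -> Iterator[dict]:
    """Hypothetical stand-in for Job.get_all: filters an in-memory list."""
    jobs = [
        {"JobName": "rft-a", "JobCategory": "AgentRFT", "JobStatus": "Completed"},
        {"JobName": "other", "JobCategory": "Other", "JobStatus": "Completed"},
        {"JobName": "rft-b", "JobCategory": "AgentRFT", "JobStatus": "InProgress"},
    ]
    for job in jobs:
        if job_category and job["JobCategory"] != job_category:
            continue
        if "status_equals" in kwargs and job["JobStatus"] != kwargs["status_equals"]:
            continue
        yield job


class Wrapped:
    JOB_CATEGORY = "AgentRFT"

    def __init__(self, job: Any):
        self.job = job

    @classmethod
    def get_all(cls, session=None, **kwargs) -> Iterator["Wrapped"]:
        # Pre-fill the category, forward every other filter untouched,
        # and wrap each result lazily as it is yielded.
        for job in fake_job_get_all(session=session, job_category=cls.JOB_CATEGORY, **kwargs):
            yield cls(job)


print([w.job["JobName"] for w in Wrapped.get_all()])
```

Against the real class, a filtered call would look like `AgentRFTJob.get_all(name_contains="rft")`, where the filter value is illustrative. Because the category is pre-filled, passing `job_category` yourself would collide with it in this sketch.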

get_mlflow_url() → str | None

Generate a fresh presigned MLflow URL for this job’s experiment/run.

In Jupyter notebooks, it also renders a clickable link.

Returns:

Presigned URL string, or None if MLflow is not configured.

get_training_metrics() → list[dict]

Fetch per-step MTRL training metrics from MLflow.

Retrieves rollout/reward/mean, rollout/turns/mean, training/total_tokens, and training/num_trajectories for each training step and prints a summary table.

Returns:

List of dicts, one per step, with keys step, rollout/reward/mean, rollout/turns/mean, training/total_tokens, and training/num_trajectories.
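Because the returned rows are plain dicts keyed by metric name, they can be post-processed without MLflow. The sketch below uses invented placeholder rows (not real results) to find the step with the highest mean reward and format a compact table; the `best_step` and `format_table` helpers are hypothetical and not part of the SDK.

```python
from typing import Optional

# Columns documented for get_training_metrics(); "step" is the row key.
COLUMNS = [
    "step",
    "rollout/reward/mean",
    "rollout/turns/mean",
    "training/total_tokens",
    "training/num_trajectories",
]

# Placeholder rows in the documented shape; the numbers are invented.
rows = [
    {"step": 1, "rollout/reward/mean": 0.20, "rollout/turns/mean": 3.5,
     "training/total_tokens": 10000, "training/num_trajectories": 64},
    {"step": 2, "rollout/reward/mean": 0.35, "rollout/turns/mean": 3.1,
     "training/total_tokens": 20500, "training/num_trajectories": 64},
    {"step": 3, "rollout/reward/mean": 0.31, "rollout/turns/mean": 2.9,
     "training/total_tokens": 30800, "training/num_trajectories": 64},
]


def best_step(rows: list[dict]) -> Optional[int]:
    """Return the step with the highest mean reward, ignoring missing values."""
    scored = [r for r in rows if r.get("rollout/reward/mean") is not None]
    if not scored:
        return None
    return max(scored, key=lambda r: r["rollout/reward/mean"])["step"]


def format_table(rows: list[dict]) -> str:
    """Render rows as a pipe-separated table; absent keys become blanks."""
    lines = [" | ".join(COLUMNS)]
    for r in rows:
        lines.append(" | ".join(str(r.get(c, "")) for c in COLUMNS))
    return "\n".join(lines)


print(format_table(rows))
print("best step:", best_step(rows))
```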

property job_arn: str

property job_name: str

property job_status: str

property last_modified_time

property mlflow_details: dict | None

MLflow experiment/run details from ServiceOutput.

Returns a dict with keys ExperimentName, RunName, ExperimentId, and RunId, or None.

property output_model_package_arn: str | None

ARN of the output model package from ServiceOutput, or None.

property progress_info: dict | None

Training progress from ServiceOutput.

Supports two formats:

- Epoch-based: dict with MaxEpoch, StepsPerEpoch, CurrentEpoch, CurrentStep.
- Step-only: dict with MaxSteps, CurrentStep.

Returns None if not available.
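A caller that wants a single completion fraction has to branch on the two formats. The hypothetical `progress_fraction` helper below is a sketch under stated assumptions: it treats CurrentEpoch as 1-based and CurrentStep as the step within the current epoch in the epoch-based format, neither of which this page confirms, so verify against real ServiceOutput before relying on it.

```python
from typing import Optional


def progress_fraction(info: Optional[dict]) -> Optional[float]:
    """Approximate completed fraction for either documented progress format.

    Assumption: in the epoch-based format CurrentEpoch is 1-based and
    CurrentStep counts steps within the current epoch.
    """
    if not info:
        return None
    if "MaxEpoch" in info:
        total = info["MaxEpoch"] * info["StepsPerEpoch"]
        done = (info["CurrentEpoch"] - 1) * info["StepsPerEpoch"] + info["CurrentStep"]
    elif "MaxSteps" in info:
        total = info["MaxSteps"]
        done = info["CurrentStep"]
    else:
        return None
    if total <= 0:
        return None
    # Clamp so an indexing mismatch never reports below 0% or above 100%.
    return max(0.0, min(1.0, done / total))


print(progress_fraction({"MaxSteps": 200, "CurrentStep": 50}))
```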

refresh()

Refresh the job state from the DescribeJob API.

property s3_output_path: str | None

S3 output path from OutputDataConfig.

property secondary_status: str

property secondary_status_transitions: list

stop()

Stop the job via the StopJob API.

property training_config: dict | None

Full TrainingConfig section from JobConfigDocument.

wait(poll: int = 5, timeout: int | None = 3000, max_log_lines: int = 20)

Wait for the job to reach a terminal status.

Parameters:

poll – Seconds between polls. Defaults to 5.
timeout – Maximum seconds to wait. Defaults to 3000.
max_log_lines – Maximum number of log lines to display. Defaults to 20.
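The poll/timeout contract of wait can be illustrated with a generic polling loop. This is a sketch of the technique, not the SDK's code: the `get_status` callable and the TERMINAL set (Completed, Failed, Stopped) are assumptions, the handling of timeout=None is this sketch's own choice, and log streaming controlled by max_log_lines is omitted.

```python
import time
from typing import Callable, Optional

# Assumed terminal statuses; check the service documentation for the real set.
TERMINAL = {"Completed", "Failed", "Stopped"}


def wait_until_terminal(get_status: Callable[[], str], poll: float = 5,
                        timeout: Optional[float] = 3000) -> str:
    """Poll get_status() every `poll` seconds until a terminal status.

    Raises TimeoutError once `timeout` seconds have elapsed; in this sketch
    timeout=None waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(poll)


# Simulated status sequence; poll=0 keeps the demo instant.
statuses = iter(["InProgress", "InProgress", "Completed"])
print(wait_until_terminal(lambda: next(statuses), poll=0))
```

Using `time.monotonic` for the deadline keeps the timeout immune to wall-clock adjustments during a long wait.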

wait_for_delete()

Wait for the job deletion to complete.
