sagemaker.train.rlvr_trainer
Classes
RLVRTrainer – Class that performs Reinforcement Learning from Verifiable Rewards (RLVR) fine-tuning on foundation models using AWS SageMaker.
- class sagemaker.train.rlvr_trainer.RLVRTrainer(model: str | ModelPackage, training_type: TrainingType | str = TrainingType.LORA, model_package_group: str | ModelPackageGroup | None = None, custom_reward_function: str | Evaluator | None = None, mlflow_resource_arn: str | MlflowTrackingServer | None = None, mlflow_experiment_name: str | None = None, mlflow_run_name: str | None = None, training_dataset: str | DataSet | None = None, validation_dataset: str | DataSet | None = None, s3_output_path: str | None = None, kms_key_id: str | None = None, networking: VpcConfig | None = None, accept_eula: bool = False, stopping_condition: StoppingCondition | None = None, **kwargs)
Bases: BaseTrainer
Class that performs Reinforcement Learning from Verifiable Rewards (RLVR) fine-tuning on foundation models using AWS SageMaker.
Example:
    from sagemaker.train import RLVRTrainer
    from sagemaker.train.common import TrainingType

    trainer = RLVRTrainer(
        model="meta-llama/Llama-2-7b-hf",
        training_type=TrainingType.LORA,
        model_package_group="my-model-group",
        custom_reward_function="arn:aws:sagemaker:us-east-1:123456789012:hub-content/SageMakerPublicHub/JsonDoc/my-evaluator/1.0",
        training_dataset="s3://bucket/rlvr_data.jsonl"
    )
    trainer.train()

    # Complete workflow: create -> wait -> get model package ARN
    trainer = RLVRTrainer(
        model="meta-llama/Llama-2-7b-hf",
        model_package_group="my-rlvr-models",
        custom_reward_function="arn:aws:sagemaker:us-east-1:123456789012:hub-content/SageMakerPublicHub/JsonDoc/my-evaluator/1.0"
    )

    # Create training job (non-blocking)
    training_job = trainer.train(
        training_dataset="s3://bucket/rlvr_data.jsonl",
        wait=False
    )

    # Wait for completion
    training_job.wait()

    # Refresh job status
    training_job.refresh()

    # Get the fine-tuned model package ARN
    model_package_arn = training_job.output_model_package_arn
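The custom_reward_function value in the example is an ARN string. As a rough illustration of how such a string breaks down, the sketch below splits it into the generic ARN fields (partition, service, region, account ID, resource). The helper split_arn and the ArnParts type are hypothetical names introduced here, not part of the SageMaker SDK, and the sketch does not attempt to interpret the segments of the resource path.

```python
from typing import NamedTuple


class ArnParts(NamedTuple):
    """Generic ARN fields (hypothetical helper type, not an SDK class)."""
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def split_arn(arn: str) -> ArnParts:
    """Split 'arn:partition:service:region:account-id:resource' into fields.

    The resource part may itself contain ':' or '/', so split at most 5 times.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    return ArnParts(partition, service, region, account_id, resource)


# Evaluator ARN taken from the example above.
evaluator_arn = (
    "arn:aws:sagemaker:us-east-1:123456789012:"
    "hub-content/SageMakerPublicHub/JsonDoc/my-evaluator/1.0"
)
parts = split_arn(evaluator_arn)
print(parts.region, parts.account_id, parts.resource)
```

A check like this can catch a malformed ARN locally before it is passed to the trainer; whether the SDK performs similar validation is not stated here.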
- Parameters:
model (Union[str, ModelPackage]) – The foundation model to fine-tune. Can be a model name string, model package ARN, or ModelPackage object.
training_type (Union[TrainingType, str]) – The fine-tuning approach. Valid values are TrainingType.LORA (default) and TrainingType.FULL.
model_package_group (Optional[Union[str, ModelPackageGroup]]) – The model package group for storing the fine-tuned model. Can be a group name, ARN, or ModelPackageGroup object. Required when model is not a ModelPackage.
custom_reward_function (Optional[Union[str, Evaluator]]) – The custom reward function evaluator. Can be an evaluator ARN string or Evaluator object. Required for RLVR training to provide reward signals.
mlflow_resource_arn (Optional[Union[str, MlflowTrackingServer]]) – The MLflow tracking server ARN for experiment tracking. If not specified, the default MLflow experience is used.
mlflow_experiment_name (Optional[str]) – The MLflow experiment name for organizing runs.
mlflow_run_name (Optional[str]) – The MLflow run name for this training job.
training_dataset (Optional[Union[str, DataSet]]) – The training dataset. Can be an S3 URI, dataset ARN, or DataSet object.
validation_dataset (Optional[Union[str, DataSet]]) – The validation dataset. Can be an S3 URI, dataset ARN, or DataSet object.
s3_output_path (Optional[str]) – The S3 path for training job outputs. If not specified, defaults to s3://sagemaker-<region>-<account>/output.
kms_key_id (Optional[str]) – The KMS key ID for encrypting training job outputs.
networking (Optional[VpcConfig]) – The VPC configuration for the training job.
stopping_condition (Optional[StoppingCondition]) – The stopping condition that overrides the training runtime limit. If not specified, the SageMaker service default is used (24 hours for serverless training).
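The fallback described for s3_output_path can be sketched as a small resolution step. This is only an illustration of the documented default pattern s3://sagemaker-<region>-<account>/output; the function name resolve_output_path is hypothetical, and in the real SDK the region and account would presumably come from the active session rather than from explicit arguments.

```python
from typing import Optional


def resolve_output_path(
    s3_output_path: Optional[str], region: str, account_id: str
) -> str:
    """Return the explicit path if given, else the documented default.

    Hypothetical helper: mirrors the rule stated for s3_output_path,
    not the SDK's actual implementation.
    """
    if s3_output_path is not None:
        return s3_output_path
    return f"s3://sagemaker-{region}-{account_id}/output"


# No explicit path: fall back to the default bucket pattern.
default_path = resolve_output_path(None, "us-east-1", "123456789012")
# Explicit path: returned unchanged.
custom_path = resolve_output_path("s3://my-bucket/rlvr/out", "us-east-1", "123456789012")
print(default_path)
print(custom_path)
```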
- train(training_dataset: str | DataSet | None = None, validation_dataset: str | DataSet | None = None, wait: bool = True, wait_timeout: int | None = None, poll: int = 5)
Execute the RLVR training job.
- Parameters:
training_dataset (Optional[Union[str, DataSet]]) – The training dataset for this job. Overrides the dataset specified in __init__. Can be an S3 URI, dataset ARN, or DataSet object.
validation_dataset (Optional[Union[str, DataSet]]) – The validation dataset for this job. Overrides the dataset specified in __init__. Can be an S3 URI, dataset ARN, or DataSet object.
wait (bool) – Whether to wait for the training job to complete. Defaults to True.
wait_timeout (Optional[int]) – Maximum time in seconds to wait for the training job to complete. Only used when wait=True. If None, uses the default timeout from the wait utility.
poll (int) – Polling interval in seconds for checking training job status. Defaults to 5.
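How wait, wait_timeout, and poll interact can be pictured with a generic polling loop. The sketch below is not the SDK's wait utility: wait_for_job, get_status, and the injectable sleep and clock arguments are hypothetical names chosen so the loop can run without AWS access. The terminal states used are the standard SageMaker training job statuses Completed, Failed, and Stopped.

```python
import time
from typing import Callable, Optional

# Training job statuses after which polling can stop.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(
    get_status: Callable[[], str],
    poll: int = 5,
    wait_timeout: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status every `poll` seconds until a terminal status.

    Raises TimeoutError if `wait_timeout` seconds elapse first.
    With wait_timeout=None this sketch waits indefinitely.
    """
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if wait_timeout is not None and clock() - start >= wait_timeout:
            raise TimeoutError(f"job still {status!r} after {wait_timeout}s")
        sleep(poll)


# Simulated job that completes on the third status check.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
sleeps = []  # records each sleep interval instead of actually sleeping
final_status = wait_for_job(lambda: next(fake_statuses), poll=5, sleep=sleeps.append)
print(final_status, sleeps)
```

A smaller poll value notices completion sooner at the cost of more status requests; with wait=False none of this applies and the caller decides when to check the job.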
- Returns:
The SageMaker training job object.
- Return type: