CPT Training on HyperPod


This notebook demonstrates Continued Pre-Training (CPT) using CPTTrainer on HyperPod.

CPT operates on a raw corpus rather than instruction pairs, extending the model’s knowledge in a specific domain.
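To make the "raw corpus" idea concrete, the sketch below writes plain-text documents to a JSONL file, one JSON object per line. The `text` field name and the helper `write_corpus_jsonl` are assumptions for illustration only; the schema CPTTrainer actually expects is not stated here, so check it before uploading data.

```python
import json
import os
import tempfile


def write_corpus_jsonl(documents, path, text_key="text"):
    """Write non-empty raw-text documents as one JSON object per line.

    `text_key` is an assumed field name, not a confirmed CPTTrainer schema.
    Returns the number of records written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            doc = doc.strip()
            if not doc:
                continue  # skip empty documents
            f.write(json.dumps({text_key: doc}, ensure_ascii=False) + "\n")
            written += 1
    return written


# Raw domain text: no prompt/response structure, just documents.
raw_documents = [
    "A lien is a legal claim against an asset used as collateral.",
    "",
    "Escrow accounts hold funds until contract conditions are met.",
]

corpus_path = os.path.join(tempfile.mkdtemp(), "cpt-corpus.jsonl")
num_records = write_corpus_jsonl(raw_documents, corpus_path)
print(f"Wrote {num_records} records to {corpus_path}")
```

The resulting file could then be uploaded to the S3 location used for the training dataset.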

Note: CPTTrainer is supported for Nova models only.

What you will learn

Create a CPTTrainer with HyperPod compute
Submit a CPT training job

1. Setup

# === Fill in your AWS resources ===
S3_BUCKET = "<your-s3-bucket>"  # e.g. "sagemaker-us-east-1-123456789012"
TRAINING_DATASET = f"s3://{S3_BUCKET}/cpt-data/cpt-corpus.jsonl"
S3_OUTPUT_PATH = f"s3://{S3_BUCKET}/cpt-hyperpod/output/"
CLUSTER_NAME = "<your-cluster-name>"  # e.g. "my-cluster"
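Because the values above start out as placeholders, it can help to check them before submitting a job. The following is a minimal sketch using only the standard library; `check_s3_uri` is a hypothetical helper, not part of the SageMaker SDK, and it only inspects the string, it does not contact S3.

```python
import re


def check_s3_uri(uri):
    """Return (bucket, key) for an s3:// URI, or raise ValueError.

    Hypothetical helper: validates the string shape only and rejects
    unfilled "<...>" placeholders in the bucket name.
    """
    match = re.fullmatch(r"s3://([^/]+)/(.*)", uri)
    if match is None:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, key = match.groups()
    if "<" in bucket or ">" in bucket:
        raise ValueError(f"Placeholder bucket still present in {uri!r}")
    return bucket, key


# Example values that mirror the setup cell, with the bucket filled in.
example_bucket = "sagemaker-us-east-1-123456789012"
example_dataset = f"s3://{example_bucket}/cpt-data/cpt-corpus.jsonl"

bucket, key = check_s3_uri(example_dataset)
print(bucket, key)

# An unfilled placeholder is rejected.
try:
    check_s3_uri("s3://<your-s3-bucket>/cpt-data/cpt-corpus.jsonl")
    placeholder_rejected = False
except ValueError:
    placeholder_rejected = True
```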

2. Create CPTTrainer with HyperPod Compute

Use CPTTrainer with HyperPodCompute for distributed pre-training on a managed cluster.

from sagemaker.train import CPTTrainer
from sagemaker.core.training.configs import HyperPodCompute

compute = HyperPodCompute(
    cluster_name=CLUSTER_NAME,
    instance_type="ml.p5.48xlarge",
    node_count=2,
)

cpt_trainer = CPTTrainer(
    model="nova-textgeneration-micro",
    compute=compute,
    training_dataset=TRAINING_DATASET,
    s3_output_path=S3_OUTPUT_PATH,
)

3. Submit Training Job

job_name = cpt_trainer.train(wait=False)
print(f"HyperPod CPT job submitted: {job_name}")
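Since the job is submitted with `wait=False`, the call returns without blocking until training finishes. One way to follow up later is a generic polling loop, sketched below. The `get_status` callable and the status names ("Pending", "Running", "Completed", "Failed") are hypothetical stand-ins; how job status is actually queried for a HyperPod CPT job is not covered here, so substitute whatever status lookup your environment provides.

```python
import time


def poll_until_terminal(get_status, terminal_states,
                        interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or the timeout elapses.

    `get_status` is any zero-argument callable returning a status string.
    Raises TimeoutError if no terminal state is seen within `timeout_s`.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"Still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Stand-in status source for demonstration only (no real job is queried).
_fake_statuses = iter(["Pending", "Running", "Running", "Completed"])

final_status = poll_until_terminal(
    get_status=lambda: next(_fake_statuses),
    terminal_states={"Completed", "Failed"},
    interval_s=0.0,
    timeout_s=10.0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(f"Final status: {final_status}")
```

Injecting `sleep` and `get_status` keeps the loop independent of any particular SDK and easy to exercise without a running cluster.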