
Distributed Training APIs

The SageMaker distributed training libraries offer both data parallel and model parallel training strategies. They combine software and hardware technologies to improve inter-GPU and inter-node communication, and they extend SageMaker's training capabilities with built-in options that require only small changes to your training scripts.

The SageMaker Distributed Data Parallel Library

  • The SageMaker Distributed Data Parallel Library Overview
  • Use the Library to Adapt Your Training Script
    • For versions between 1.4.0 and 1.7.0 (Latest)
    • Documentation Archive
  • Launch a Distributed Training Job Using the SageMaker Python SDK
  • Release Notes
    • SageMaker Distributed Data Parallel 1.7.0 Release Notes
    • Release History
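As a quick orientation before the pages above, here is a minimal sketch of what "launching a distributed training job" looks like with this library: you enable it through the `distribution` argument of a SageMaker framework estimator. The configuration dictionary below follows the documented shape; the commented-out estimator call uses placeholder values (script name, IAM role, instance type, framework version) that you would replace with your own, and it is commented out because it requires AWS credentials to run.

```python
# Enabling the SageMaker distributed data parallel library is a
# configuration change on the estimator, not a rewrite of your script.
distribution = {
    "smdistributed": {
        "dataparallel": {
            "enabled": True
        }
    }
}

# Hypothetical launch (placeholders throughout; requires AWS credentials):
# from sagemaker.pytorch import PyTorch
# estimator = PyTorch(
#     entry_point="train.py",           # your adapted training script
#     role="<your-iam-role-arn>",       # placeholder
#     instance_type="ml.p4d.24xlarge",  # a supported multi-GPU instance
#     instance_count=2,
#     framework_version="1.12",         # example version; check the docs
#     py_version="py38",
#     distribution=distribution,
# )
# estimator.fit()
```

Inside the training script itself, the adaptation is similarly small: the library registers as a backend for the framework's native distributed API, so existing data-parallel scripts need only minor changes, as the "Use the Library to Adapt Your Training Script" page describes.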

The SageMaker Distributed Model Parallel Library

  • The SageMaker Distributed Model Parallel Library Overview
  • Use the Library’s API to Adapt Training Scripts
    • Version 1.11.0, 1.13.0, 1.14.0 (Latest)
    • Documentation Archive
  • Run a Distributed Training Job Using the SageMaker Python SDK
    • Configuration Parameters for distribution
    • Ranking Basics without Tensor Parallelism
    • Placement Strategy with Tensor Parallelism
    • Prescaled Batch
  • Release Notes
    • SageMaker Distributed Model Parallel 1.14.0 Release Notes
    • Release History
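To give a feel for the "Configuration Parameters for distribution" topic above, here is a hedged sketch of a `distribution` dictionary that enables the model parallel library alongside MPI. The parameter names follow the library's documented configuration, but the values are illustrative placeholders; consult the configuration reference for the options valid in your library version.

```python
# Sketch: enabling the SageMaker model parallel library. The model is
# split across GPUs ("partitions"), each batch is split into microbatches
# for pipelined execution, and MPI launches one process per GPU.
distribution = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "partitions": 2,           # number of model partitions
                "microbatches": 4,         # per-batch splits for pipelining
                "pipeline": "interleaved", # pipeline schedule
                "ddp": True,               # combine with data parallelism
            },
        }
    },
    "mpi": {
        "enabled": True,
        "processes_per_host": 8,  # e.g. one process per GPU on the host
    },
}
```

This dictionary is passed to a framework estimator's `distribution` argument in the same way as the data parallel configuration; the "Run a Distributed Training Job Using the SageMaker Python SDK" page covers the full set of parameters, including the tensor parallelism and placement options listed above.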

© Copyright 2023, Amazon Revision 6ce0d99f.
