Distributed Training APIs
SageMaker distributed training libraries offer both data parallel and model parallel training strategies. They combine software and hardware technologies to improve inter-GPU and inter-node communications. They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
The SageMaker Distributed Data Parallel Library
- The SageMaker Distributed Data Parallel Library Overview
- Use the Library to Adapt Your Training Script
- Launch a Distributed Training Job Using the SageMaker Python SDK
- Release Notes
  - SageMaker Distributed Data Parallel 1.4.0 Release Notes
  - Release History
    - SageMaker Distributed Data Parallel 1.2.2 Release Notes
    - SageMaker Distributed Data Parallel 1.2.1 Release Notes
    - SageMaker Distributed Data Parallel 1.2.0 Release Notes
    - SageMaker Distributed Data Parallel 1.1.2 Release Notes
    - SageMaker Distributed Data Parallel 1.1.1 Release Notes
    - SageMaker Distributed Data Parallel 1.1.0 Release Notes
    - SageMaker Distributed Data Parallel 1.0.0 Release Notes
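As a quick orientation before the linked pages above: enabling the data parallel library is typically a small change to the `distribution` argument of a SageMaker framework estimator. The sketch below is a minimal, hedged example; the entry point name, IAM role ARN, instance type, and framework versions are placeholders, and the estimator construction is commented out because it requires AWS credentials and a live SageMaker session.

```python
# Minimal sketch of enabling the SageMaker distributed data parallel
# library through the SageMaker Python SDK. Only the shape of the
# `distribution` dictionary is exercised here; the estimator call is
# shown in comments for context.
distribution = {
    "smdistributed": {
        "dataparallel": {
            "enabled": True,
        }
    }
}

# from sagemaker.pytorch import PyTorch
#
# estimator = PyTorch(
#     entry_point="train.py",                               # your adapted script (placeholder name)
#     role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role ARN
#     instance_type="ml.p3.16xlarge",                       # a supported multi-GPU instance type
#     instance_count=2,
#     framework_version="1.10",                             # placeholder framework version
#     py_version="py38",
#     distribution=distribution,
# )
# estimator.fit()

print(distribution)
```

Passing this dictionary as `distribution` is what activates the library for the training job; the training script itself is adapted separately, as described in the pages listed above.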
The SageMaker Distributed Model Parallel Library
- The SageMaker Distributed Model Parallel Library Overview
- Use the Library’s API to Adapt Training Scripts
- Run a Distributed Training Job Using the SageMaker Python SDK
- Release Notes
  - SageMaker Distributed Model Parallel 1.7.0 Release Notes
  - Release History
    - SageMaker Distributed Model Parallel 1.6.0 Release Notes
    - SageMaker Distributed Model Parallel 1.5.0 Release Notes
    - SageMaker Distributed Model Parallel 1.4.0 Release Notes
    - SageMaker Distributed Model Parallel 1.3.1 Release Notes
    - SageMaker Distributed Model Parallel 1.3.0 Release Notes
    - SageMaker Distributed Model Parallel 1.2.0 Release Notes
    - SageMaker Distributed Model Parallel 1.1.0 Release Notes
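The model parallel library is likewise enabled through the estimator's `distribution` argument, but it also takes a `parameters` block for partitioning and pipelining options and requires MPI to be enabled. The sketch below is a hedged illustration of that configuration shape; the specific parameter values (partition count, microbatches, processes per host) are example choices, not recommendations, and should be checked against the configuration reference linked above.

```python
# Minimal sketch of a `distribution` configuration for the SageMaker
# distributed model parallel library. The numeric values below are
# illustrative assumptions; consult the library's configuration pages
# for the options valid in your framework version.
distribution = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "partitions": 2,    # example: split the model across 2 partitions
                "microbatches": 4,  # example: pipeline each batch as 4 microbatches
            },
        }
    },
    "mpi": {
        "enabled": True,            # the model parallel library runs on top of MPI
        "processes_per_host": 8,    # example: one process per GPU on an 8-GPU host
    },
}

print(distribution)
```

This dictionary would be passed as the `distribution` argument of a SageMaker framework estimator, just as in the data parallel case; the training script must also be adapted with the library's API, as covered in the pages listed above.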