Distributed Training APIs
The SageMaker distributed training libraries support both data parallel and model parallel training strategies. They combine software and hardware technologies to speed up inter-GPU and inter-node communication, and they extend SageMaker’s training capabilities with built-in options that require only small changes to your training scripts.
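As a rough illustration of how small the launch-side change can be, the sketch below builds the `distribution` dictionary that the SageMaker Python SDK framework estimators accept. The helper name `smdistributed_config` is hypothetical (it is not part of the SDK), and the `partitions` and `microbatches` values are placeholder assumptions, not recommendations.

```python
from typing import Any, Dict


def smdistributed_config(strategy: str, processes_per_host: int = 8) -> Dict[str, Any]:
    """Build a `distribution` dict for a SageMaker framework estimator.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    if strategy == "dataparallel":
        # Enables the SageMaker distributed data parallel library.
        return {"smdistributed": {"dataparallel": {"enabled": True}}}
    if strategy == "modelparallel":
        return {
            "smdistributed": {
                "modelparallel": {
                    "enabled": True,
                    # Placeholder values (assumption); tune them for your model.
                    "parameters": {"partitions": 2, "microbatches": 4},
                }
            },
            # The model parallel library is launched through MPI.
            "mpi": {"enabled": True, "processes_per_host": processes_per_host},
        }
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The resulting dictionary would typically be passed as the `distribution` argument of an estimator such as `sagemaker.pytorch.PyTorch`, alongside the usual entry point, role, and instance settings. See the library pages listed below for the options each strategy actually supports.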
The SageMaker Distributed Data Parallel Library
The SageMaker Distributed Model Parallel Library
- The SageMaker Distributed Model Parallel Library Overview
- Use the Library’s API to Adapt Training Scripts
- Run a Distributed Training Job Using the SageMaker Python SDK
- Release Notes