Use the Library to Adapt Your Training Script
This section contains the API documentation for the SageMaker distributed data parallel library. If you are new to the library, use this guide alongside SageMaker’s Distributed Data Parallel Library.
The library provides framework-specific APIs for TensorFlow and PyTorch.
Select the latest or one of the previous versions of the API documentation depending on the version of the library you use.
The distributed data parallel library supports training jobs using CUDA 11 or later. When you define an estimator with the data parallel library enabled, SageMaker uses CUDA 11. When you extend or customize your own training image, you must use a base image with CUDA 11 or later. See SageMaker Python SDK’s distributed data parallel library APIs for more information.
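As a minimal sketch of what enabling the library on an estimator can look like, the snippet below builds the keyword arguments a SageMaker Python SDK framework estimator accepts for this purpose. The helper name `data_parallel_estimator_kwargs`, the default instance type and count, and the `train.py` entry point mentioned in the comments are illustrative assumptions, not values taken from this guide; check the SDK documentation for the instance types and framework versions your library version supports.

```python
def data_parallel_estimator_kwargs(instance_type="ml.p4d.24xlarge", instance_count=2):
    """Build estimator keyword arguments that turn on the data parallel library.

    The helper name and the default values are hypothetical, chosen only
    for illustration.
    """
    return {
        "instance_type": instance_type,
        "instance_count": instance_count,
        # The `distribution` argument is how the SageMaker Python SDK
        # enables the distributed data parallel library on an estimator.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


kwargs = data_parallel_estimator_kwargs()
print(kwargs["distribution"])

# With the SageMaker Python SDK installed and AWS credentials configured,
# these arguments could be passed to a framework estimator, for example:
#
#   from sagemaker.pytorch import PyTorch
#   estimator = PyTorch(
#       entry_point="train.py",        # hypothetical training script
#       role=my_execution_role,        # assumed: your IAM role ARN
#       framework_version=my_version,  # assumed: a supported version
#       py_version=my_py_version,      # assumed: matching Python version
#       **kwargs,
#   )
```

Keeping the configuration in a plain dictionary lets the same settings be reused for either a PyTorch or a TensorFlow estimator.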