Use the Library to Adapt Your Training Script

This section contains the API documentation for the SageMaker distributed data parallel library. If you are new to this library, we recommend that you use this guide alongside SageMaker’s Distributed Data Parallel Library.

The library provides framework-specific APIs for TensorFlow and PyTorch.
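As an illustration, the following is a minimal sketch of adapting a PyTorch training script with library v1.4.0 or later, where importing the library registers it as the `smddp` backend for `torch.distributed`. The model and the local-rank derivation are placeholders for this sketch, not part of the library's API.

```python
import torch
import torch.distributed as dist

# Importing this module registers the library as the "smddp" backend
# for torch.distributed (library v1.4.0 and later).
import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401

dist.init_process_group(backend="smddp")

# Pin each process to one GPU. Deriving the local rank from the global
# rank is an assumption for this sketch; use your launcher's local-rank
# mechanism if it provides one.
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Placeholder model; wrap it with PyTorch DDP as usual.
model = torch.nn.Linear(10, 1).cuda(local_rank)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```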

Select the latest version or one of the previous versions of the API documentation, depending on the version of the library that you use.

Important

The distributed data parallel library supports training jobs using CUDA 11 or later. When you define a sagemaker.tensorflow.estimator.TensorFlow or sagemaker.pytorch.estimator.PyTorch estimator with the data parallel library enabled, SageMaker uses CUDA 11. When you extend or customize your own training image, you must use a base image with CUDA 11 or later. See SageMaker Python SDK’s distributed data parallel library APIs for more information.
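For example, the following sketch enables the library on a `sagemaker.pytorch.estimator.PyTorch` estimator through the `distribution` parameter. The entry point, IAM role, framework version, and instance settings are placeholders that you would replace with your own; the framework version must resolve to a training image with CUDA 11 or later.

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",            # placeholder: your adapted training script
    role="<your-iam-role-arn>",        # placeholder: an IAM role with SageMaker permissions
    framework_version="1.12",          # placeholder: a version whose image ships CUDA 11 or later
    py_version="py38",
    instance_type="ml.p4d.24xlarge",   # placeholder: a GPU instance type the library supports
    instance_count=2,
    # Enable the SageMaker distributed data parallel library for this job.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit()
```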

Versions 1.4.0, 1.4.1, 1.5.0 (Latest)

Documentation Archive

To find the API documentation for the previous versions of the library, choose one of the following: