Remote function classes and methods specification

@remote decorator

client.remote(*, dependencies=None, pre_execution_commands=None, pre_execution_script=None, environment_variables=None, image_uri=None, include_local_workdir=None, custom_file_filter=None, instance_count=1, instance_type=None, job_conda_env=None, job_name_prefix=None, keep_alive_period_in_seconds=0, max_retry_attempts=1, max_runtime_in_seconds=86400, role=None, s3_kms_key=None, s3_root_uri=None, sagemaker_session=None, security_group_ids=None, subnets=None, tags=None, volume_kms_key=None, volume_size=30, encrypt_inter_container_traffic=None, spark_config=None, use_spot_instances=False, max_wait_time_in_seconds=None, use_torchrun=False, nproc_per_node=1)

Decorator for running the annotated function as a SageMaker training job.

This decorator wraps the annotated code and runs it as a new SageMaker job synchronously with the provided runtime settings.

If a parameter value is not set, the decorator first looks up the value from the SageMaker configuration file. If no value is specified in the configuration file or no configuration file is found, the decorator selects the default as specified below. For more information, see Configuring and using defaults with the SageMaker Python SDK.

Parameters
  • _func (Optional) – A Python function to run as a SageMaker training job.

  • dependencies (str) –

    Either the path to a dependencies file or the reserved keyword auto_capture. Defaults to None. If dependencies is provided, the value must be one of the following:

    • A path to a conda environment.yml file. The following conditions apply.

      • If job_conda_env is set, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the specified conda environment must already exist in the image.

      • If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.

      • If none of the previous conditions are met, a new conda environment named sagemaker-runtime-env is created and the function annotated with the remote decorator is invoked in that conda environment.

    • A path to a requirements.txt file. The following conditions apply.

      • If job_conda_env is set in the remote decorator, dependencies are installed within that conda environment and the function annotated with the remote decorator is invoked in the same conda environment. For this to succeed, the specified conda environment must already exist in the image.

      • If an environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, dependencies are installed within that conda environment and the function annotated with the remote decorator is invoked in the same conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.

      • If none of the above conditions are met, conda is not used. Dependencies are installed at the system level, without any virtual environment, and the function annotated with the remote decorator is invoked using the Python runtime available in the system path.

    • The parameter dependencies is set to auto_capture. SageMaker will automatically generate an env_snapshot.yml corresponding to the current active conda environment’s snapshot. You do not need to provide a dependencies file. The following conditions apply:

      • You must run the remote function within an active conda environment.

      • When installing the dependencies on the training job, the same conditions as when dependencies is set to a path to a conda environment file apply. These conditions are as follows:

        • If job_conda_env is set, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the specified conda environment must already exist in the image.

        • If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.

        • If none of the previous conditions are met, a new conda environment with name sagemaker-runtime-env is created and the function annotated with the remote decorator is invoked in that conda environment.

    • None. SageMaker will assume that there are no dependencies to install while executing the remote annotated function in the training job.

  • pre_execution_commands (List[str]) – List of commands to be executed prior to executing remote function. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.

  • pre_execution_script (str) – Path to script file to be executed prior to executing remote function. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.

  • environment_variables (Dict) – The environment variables used inside the decorator function. Defaults to None.

  • image_uri (str) –

    The Uniform Resource Identifier (URI) of a Docker image in Amazon Elastic Container Registry (ECR). Defaults to the following, based on where the SDK is running:

    • For users who specify spark_config and want to run the function in a Spark application, image_uri should be None; a SageMaker Spark image is then used for training. Otherwise, a ValueError is thrown.

    • For users on SageMaker Studio notebooks, the image used as the kernel image for the notebook is used.

    • For other users, the image URI resolves to a base Python image with the same Python version as the environment running the local code.

    If no compatible image is found, a ValueError is thrown.

  • include_local_workdir (bool) – A flag to indicate that the remote function should include local directories. Set to True if the remote function code imports local modules and methods that are not available via PyPI or conda. Only Python files are included. Default value is False.

  • custom_file_filter (Callable[[str, List], List], CustomFileFilter) – Either a function that filters job dependencies to be uploaded to S3 or a CustomFileFilter object that specifies the local directories and files to be included in the remote function. If a callable is passed in, the function should follow the protocol of the ignore argument of shutil.copytree. Defaults to None, which means only Python files are accepted and uploaded to S3.

  • instance_count (int) – The number of instances to use. Defaults to 1. NOTE: Remote function does not support instance_count > 1 for non-Spark jobs.

  • instance_type (str) – The Amazon Elastic Compute Cloud (EC2) instance type to use to run the SageMaker job, e.g. ml.c4.xlarge. If not provided, a ValueError is thrown.

  • job_conda_env (str) – The name of the conda environment to activate during job’s runtime. Defaults to None.

  • job_name_prefix (str) – The prefix used to create the underlying SageMaker job.

  • keep_alive_period_in_seconds (int) – The duration in seconds to retain and reuse provisioned infrastructure after the completion of a training job, also known as SageMaker managed warm pools. Using warm pools reduces the latency spent provisioning new resources. The default value for keep_alive_period_in_seconds is 0. NOTE: Additional charges associated with warm pools may apply. Using this parameter also activates a persistent cache feature, which further reduces job start-up latency beyond using SageMaker managed warm pools alone, by caching the package sources downloaded in previous runs.

  • max_retry_attempts (int) – The max number of times the job is retried on InternalServerFailure Error from SageMaker service. Defaults to 1.

  • max_runtime_in_seconds (int) – The upper limit in seconds to be used for training. After this specified amount of time, SageMaker terminates the job regardless of its current status. Defaults to 1 day (86400 seconds).

  • role (str) –

    The IAM role (either name or full ARN) used to run your SageMaker training job. Defaults to:

    • The SageMaker default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks.

    • Otherwise, a ValueError is thrown.

  • s3_kms_key (str) – The key used to encrypt the input and output data. Defaults to None.

  • s3_root_uri (str) – The root S3 folder to which the code archives and data are uploaded. Defaults to s3://<sagemaker-default-bucket>.

  • sagemaker_session (sagemaker.session.Session) – The underlying SageMaker session to which SageMaker service calls are delegated (default: None). If not provided, one is created using a default configuration chain.

  • security_group_ids (List[str]) – A list of security group IDs. Defaults to None and the training job is created without VPC config.

  • subnets (List[str]) – A list of subnet IDs. Defaults to None and the job is created without VPC config.

  • tags (List[Tuple[str, str]]) – A list of tags attached to the job. Defaults to None and the training job is created without tags.

  • volume_kms_key (str) – An Amazon Key Management Service (KMS) key used to encrypt an Amazon Elastic Block Storage (EBS) volume attached to the training instance. Defaults to None.

  • volume_size (int) – The size in GB of the storage volume for storing input and output data during training. Defaults to 30.

  • encrypt_inter_container_traffic (bool) – A flag that specifies whether traffic between training containers is encrypted for the training job. Defaults to False.

  • spark_config (SparkConfig) – Configurations to the Spark application that runs on the Spark image. If spark_config is specified, a SageMaker Spark image URI will be used for training. Note that image_uri cannot be specified at the same time; otherwise a ValueError is thrown. Defaults to None.

  • use_spot_instances (bool) – Specifies whether to use SageMaker Managed Spot instances for training. If enabled, the max_wait_time_in_seconds argument should also be set. Defaults to False.

  • max_wait_time_in_seconds (int) – Timeout in seconds waiting for the spot training job. After this amount of time, Amazon SageMaker stops waiting for the managed spot training job to complete. Defaults to None.

  • use_torchrun (bool) – Specifies whether to use torchrun for distributed training. Defaults to False.

  • nproc_per_node (int) – Specifies the number of processes per node for distributed training. Defaults to 1.
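Putting the key parameters together, a minimal usage sketch follows. The instance type, requirements.txt path, and environment variables are illustrative assumptions, not defaults; the sketch is wrapped in a function with a deferred import so nothing runs until it is called (running it requires the sagemaker package, AWS credentials, and an execution role).

```python
def run_remote_example():
    # Deferred import: nothing below executes until this function is called.
    from sagemaker.remote_function import remote

    @remote(
        instance_type="ml.m5.xlarge",        # required unless set in the SageMaker config file
        dependencies="./requirements.txt",   # or an environment.yml path, or "auto_capture"
        keep_alive_period_in_seconds=600,    # keep a warm pool to speed up subsequent runs
        environment_variables={"STAGE": "dev"},
        max_retry_attempts=2,
    )
    def divide(x, y):
        return x / y

    # Calling the decorated function runs it synchronously as a SageMaker
    # training job and returns the deserialized result.
    return divide(10, 2)
```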

RemoteExecutor

class sagemaker.remote_function.RemoteExecutor(*, dependencies=None, pre_execution_commands=None, pre_execution_script=None, environment_variables=None, image_uri=None, include_local_workdir=None, custom_file_filter=None, instance_count=1, instance_type=None, job_conda_env=None, job_name_prefix=None, keep_alive_period_in_seconds=0, max_parallel_jobs=1, max_retry_attempts=1, max_runtime_in_seconds=86400, role=None, s3_kms_key=None, s3_root_uri=None, sagemaker_session=None, security_group_ids=None, subnets=None, tags=None, volume_kms_key=None, volume_size=30, encrypt_inter_container_traffic=None, spark_config=None, use_spot_instances=False, max_wait_time_in_seconds=None, use_torchrun=False, nproc_per_node=1)

Run Python functions asynchronously as SageMaker jobs.

Constructor for RemoteExecutor

If a parameter value is not set, the constructor first looks up the value from the SageMaker configuration file. If no value is specified in the configuration file or no configuration file is found, the constructor selects the default as specified below. For more information, see Configuring and using defaults with the SageMaker Python SDK.

Parameters
  • dependencies (str) – Either the path to a dependencies file or the reserved keyword auto_capture. Defaults to None. If dependencies is provided, the value must be one of the following:

  • A path to a conda environment.yml file. The following conditions apply.

    • If job_conda_env is set, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the specified conda environment must already exist in the image.

    • If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.

    • If none of the previous conditions are met, a new conda environment named sagemaker-runtime-env is created and the function annotated with the remote decorator is invoked in that conda environment.

  • A path to a requirements.txt file. The following conditions apply.

    • If job_conda_env is set in the remote decorator, dependencies are installed within that conda environment and the function annotated with the remote decorator is invoked in the same conda environment. For this to succeed, the specified conda environment must already exist in the image.

    • If an environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, dependencies are installed within that conda environment and the function annotated with the remote decorator is invoked in the same conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.

    • If none of the above conditions are met, conda is not used. Dependencies are installed at the system level, without any virtual environment, and the function annotated with the remote decorator is invoked using the Python runtime available in the system path.

  • The parameter dependencies is set to auto_capture. SageMaker will automatically generate an env_snapshot.yml corresponding to the current active conda environment’s snapshot. You do not need to provide a dependencies file. The following conditions apply:

    • You must run the remote function within an active conda environment.

    • When installing the dependencies on the training job, the same conditions as when dependencies is set to a path to a conda environment file apply. These conditions are as follows:

      • If job_conda_env is set, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the specified conda environment must already exist in the image.

      • If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.

      • If none of the previous conditions are met, a new conda environment named sagemaker-runtime-env is created and the function annotated with the remote decorator is invoked in that conda environment.

  • None. SageMaker will assume that there are no dependencies to install while executing the remote annotated function in the training job.

  • pre_execution_commands (List[str]) – List of commands to be executed prior to executing remote function. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.

  • pre_execution_script (str) – Path to script file to be executed prior to executing remote function. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.

  • environment_variables (Dict) – The environment variables used inside the decorator function. Defaults to None.

  • image_uri (str) –

    The universal resource identifier (URI) location of a Docker image on Amazon Elastic Container Registry (ECR). Defaults to the following based on where the SDK is running:

    • For users who specify spark_config and want to run the function in a Spark application, image_uri should be None; a SageMaker Spark image is then used for training. Otherwise, a ValueError is thrown.

    • For users on SageMaker Studio notebooks, the image used as the kernel image for the notebook is used.

    • For other users, the image URI resolves to a base Python image with the same Python version as the environment running the local code.

    If no compatible image is found, a ValueError is thrown.

  • include_local_workdir (bool) – A flag to indicate that the remote function should include local directories. Set to True if the remote function code imports local modules and methods that are not available via PyPI or conda. Default value is False.

  • custom_file_filter (Callable[[str, List], List], CustomFileFilter) – Either a function that filters job dependencies to be uploaded to S3 or a CustomFileFilter object that specifies the local directories and files to be included in the remote function. If a callable is passed in, that function is passed to the ignore argument of shutil.copytree. Defaults to None, which means only python files are accepted and uploaded to S3.

  • instance_count (int) – The number of instances to use. Defaults to 1. NOTE: Remote function does not support instance_count > 1 for non-Spark jobs.

  • instance_type (str) – The Amazon Elastic Compute Cloud (EC2) instance type to use to run the SageMaker job, e.g. ml.c4.xlarge. If not provided, a ValueError is thrown.

  • job_conda_env (str) – The name of the conda environment to activate during job’s runtime. Defaults to None.

  • job_name_prefix (str) – The prefix used to create the underlying SageMaker job.

  • keep_alive_period_in_seconds (int) – The duration in seconds to retain and reuse provisioned infrastructure after the completion of a training job, also known as SageMaker managed warm pools. Using warm pools reduces the latency spent provisioning new resources. The default value for keep_alive_period_in_seconds is 0. NOTE: Additional charges associated with warm pools may apply. Using this parameter also activates a persistent cache feature, which further reduces job start-up latency beyond using SageMaker managed warm pools alone, by caching the package sources downloaded in previous runs.

  • max_parallel_jobs (int) – Maximum number of jobs that run in parallel. Defaults to 1.

  • max_retry_attempts (int) – The max number of times the job is retried on InternalServerFailure Error from SageMaker service. Defaults to 1.

  • max_runtime_in_seconds (int) – The upper limit in seconds to be used for training. After this specified amount of time, SageMaker terminates the job regardless of its current status. Defaults to 1 day (86400 seconds).

  • role (str) –

    The IAM role (either name or full ARN) used to run your SageMaker training job. Defaults to:

    • The SageMaker default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks.

    • Otherwise, a ValueError is thrown.

  • s3_kms_key (str) – The key used to encrypt the input and output data. Defaults to None.

  • s3_root_uri (str) – The root S3 folder to which the code archives and data are uploaded. Defaults to s3://<sagemaker-default-bucket>.

  • sagemaker_session (sagemaker.session.Session) – The underlying SageMaker session to which SageMaker service calls are delegated (default: None). If not provided, one is created using a default configuration chain.

  • security_group_ids (List[str]) – A list of security group IDs. Defaults to None and the training job is created without VPC config.

  • subnets (List[str]) – A list of subnet IDs. Defaults to None and the job is created without VPC config.

  • tags (List[Tuple[str, str]]) – A list of tags attached to the job. Defaults to None and the training job is created without tags.

  • volume_kms_key (str) – An Amazon Key Management Service (KMS) key used to encrypt an Amazon Elastic Block Storage (EBS) volume attached to the training instance. Defaults to None.

  • volume_size (int) – The size in GB of the storage volume for storing input and output data during training. Defaults to 30.

  • encrypt_inter_container_traffic (bool) – A flag that specifies whether traffic between training containers is encrypted for the training job. Defaults to False.

  • spark_config (SparkConfig) – Configurations to the Spark application that runs on the Spark image. If spark_config is specified, a SageMaker Spark image URI will be used for training. Note that image_uri cannot be specified at the same time; otherwise a ValueError is thrown. Defaults to None.

  • use_spot_instances (bool) – Specifies whether to use SageMaker Managed Spot instances for training. If enabled, the max_wait_time_in_seconds argument should also be set. Defaults to False.

  • max_wait_time_in_seconds (int) – Timeout in seconds waiting for the spot training job. After this amount of time, Amazon SageMaker stops waiting for the managed spot training job to complete. Defaults to None.

  • use_torchrun (bool) – Specifies whether to use torchrun for distributed training. Defaults to False.

  • nproc_per_node (int) – Specifies the number of processes per node. Defaults to 1.

submit(func, *args, **kwargs)

Execute the input function as a SageMaker job asynchronously.

Parameters
  • func – Python function to run as a SageMaker job.

  • *args – Positional arguments to the input function.

  • **kwargs – Keyword arguments to the input function.

map(func, *iterables)

Return an iterator that applies func to every item of the iterables, yielding the results.

If additional iterables are passed, func must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted.

Parameters
  • func – Python function to run as a SageMaker job.

  • iterables – Arguments of the input python function.

shutdown()

Prevent more function executions from being submitted to this executor.
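As a sketch of how submit, map, and shutdown fit together (the instance type and function are illustrative assumptions; the sketch is wrapped in a function with a deferred import so nothing runs until it is called, and running it requires the sagemaker package and AWS credentials):

```python
def run_executor_example():
    # Deferred import: nothing below executes until this function is called.
    from sagemaker.remote_function import RemoteExecutor

    def square(x):
        return x * x

    # Using the executor as a context manager calls shutdown() on exit,
    # which blocks further submissions and waits for pending jobs.
    with RemoteExecutor(instance_type="ml.m5.xlarge", max_parallel_jobs=2) as executor:
        future = executor.submit(square, 7)              # returns a Future immediately
        squares = list(executor.map(square, [1, 2, 3]))  # one job per item, up to 2 in parallel

    return future.result(), squares
```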

Future

class sagemaker.remote_function.client.Future

Class representing a reference to a SageMaker job result.

Reference to the SageMaker job created as a result of the remote function run. The job may or may not have finished running.

static from_describe_response(describe_training_job_response, sagemaker_session)

Construct a Future from a describe_training_job_response object.

result(timeout=None)

Returns the SageMaker job result.

This method waits for the SageMaker job created from the remote function execution to complete for up to the timeout value (if specified). If timeout is None, this method will wait until the SageMaker job completes.

Parameters

timeout (float) – Timeout in seconds to wait until the job is completed. None by default.

Returns

The Python object returned by the remote function.

Return type

Any

wait(timeout=None)

Wait for the underlying SageMaker job to complete.

This method waits for the SageMaker job created as a result of the remote function run to complete for up to the timeout value (if specified). If timeout is None, this method will block until the job is completed.

Parameters

timeout (int) – Timeout in seconds to wait for the job to complete before this method returns. Defaults to None.

Returns

None

Return type

None

cancel()

Cancel the function execution.

This method prevents the SageMaker job from being created, or stops the underlying SageMaker job early if it is already in progress.

Returns

True if the underlying SageMaker job created as a result of the remote function run is cancelled.

Return type

bool

running()

Check if the underlying SageMaker job is running.

Returns

True if the underlying SageMaker job is still running. False, otherwise.

Return type

bool

cancelled()

Check if the underlying SageMaker job was cancelled.

Returns

True if the underlying SageMaker job was cancelled. False, otherwise.

Return type

bool

done()

Check if the underlying SageMaker job is finished.

Returns

True if the underlying SageMaker job finished running. False, otherwise.

Return type

bool
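The status checks above can be combined into a simple polling loop. The helper below is hypothetical, not part of the SDK; it relies only on the done, cancelled, and result methods documented here:

```python
import time

def poll_result(future, poll_seconds=30):
    """Poll a Future until the underlying SageMaker job finishes.

    Returns the job result, or None if the job was cancelled.
    """
    while not future.done():
        if future.cancelled():
            return None
        time.sleep(poll_seconds)
    # Guard against a job that was cancelled but still reports done().
    return None if future.cancelled() else future.result()
```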

client.list_futures(job_name_prefix, sagemaker_session=None)

Generates Future objects with information about jobs with the given job_name_prefix.

Parameters
  • job_name_prefix (str) – A prefix used to identify the SageMaker jobs associated with the remote function run.

  • sagemaker_session (sagemaker.session.Session) – A session object that manages interactions with Amazon SageMaker APIs and any other AWS services needed.

Yields

A sagemaker.remote_function.client.Future instance.

client.get_future(job_name, sagemaker_session=None)

Get a Future object with information about a job with the given job_name.

Parameters
  • job_name (str) – The name of the underlying SageMaker job created as a result of the remote function run.

  • sagemaker_session (sagemaker.session.Session) – A session object that manages interactions with Amazon SageMaker APIs and any other AWS services needed.

Returns

A sagemaker.remote_function.client.Future instance.

Return type

Future
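A hypothetical sketch of re-attaching to previously launched jobs with list_futures (the prefix value is an assumption; the sketch is wrapped in a function with a deferred import, and running it requires the sagemaker package and AWS credentials):

```python
def collect_finished_results(job_name_prefix="my-remote-fn"):
    # Deferred import: nothing below executes until this function is called.
    from sagemaker.remote_function.client import list_futures

    # Rebuild Future handles for earlier remote-function jobs and gather
    # the results of the ones that have already finished.
    results = []
    for future in list_futures(job_name_prefix=job_name_prefix):
        if future.done() and not future.cancelled():
            results.append(future.result())
    return results
```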

CustomFileFilter

class sagemaker.remote_function.custom_file_filter.CustomFileFilter(*, ignore_name_patterns=None)

Configuration that specifies how the local working directory should be packaged.

Initialize a CustomFileFilter.

Parameters

ignore_name_patterns (List[str]) – Ignore files or directories with names that match one of the glob-style patterns. Defaults to None.
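A callable passed as custom_file_filter follows the ignore protocol of shutil.copytree: it receives a directory path and a list of its entries, and returns the entry names to exclude. A minimal sketch (the kept extensions are an illustrative assumption):

```python
def ignore_non_source(directory, contents):
    # shutil.copytree-style ignore callable: return the names to skip.
    keep = (".py", ".txt", ".json")
    return [
        name for name in contents
        # Crude heuristic: names without "." are treated as directories and kept.
        if "." in name and not name.endswith(keep)
    ]

# Example: only b.csv is excluded; directories and source files are kept.
# ignore_non_source("/project", ["a.py", "b.csv", "data", "c.txt"]) -> ["b.csv"]
```

The pattern-based equivalent would be a CustomFileFilter such as CustomFileFilter(ignore_name_patterns=["*.csv", "data*"]).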