Inputs

Amazon SageMaker channel configurations for S3 data sources and file system data sources

class sagemaker.inputs.s3_input(s3_data, distribution=None, compression=None, content_type=None, record_wrapping=None, s3_data_type='S3Prefix', input_mode=None, attribute_names=None, target_attribute_name=None, shuffle_config=None)

Bases: object

Amazon SageMaker channel configurations for S3 data sources.

config

A SageMaker DataSource referencing a SageMaker S3DataSource.

Type

dict[str, dict]
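
As a rough illustration of the nested shape this attribute describes, the dict below follows the Channel structure of the CreateTrainingJob API. The bucket name and prefix are hypothetical, and the exact keys the class emits may vary by SDK version, so treat this as a sketch rather than the class's literal output.

```python
# Hypothetical example of the nested dict shape; the bucket and prefix are made up.
example_config = {
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",
            "S3DataDistributionType": "FullyReplicated",
        }
    }
}

# The S3 location sits two levels down, under DataSource -> S3DataSource.
s3_source = example_config["DataSource"]["S3DataSource"]
print(s3_source["S3Uri"])
```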

Create a definition for input data used by a SageMaker training job. See AWS documentation on the CreateTrainingJob API for more details on the parameters.

Parameters
  • s3_data (str) – The location of the S3 data to train on.

  • distribution (str) – Valid values: ‘FullyReplicated’, ‘ShardedByS3Key’ (default: ‘FullyReplicated’).

  • compression (str) – Valid values: ‘Gzip’, None (default: None). This is used only in Pipe input mode.

  • content_type (str) – MIME type of the input data (default: None).

  • record_wrapping (str) – Valid values: ‘RecordIO’ (default: None).

  • s3_data_type (str) – Valid values: ‘S3Prefix’, ‘ManifestFile’, ‘AugmentedManifestFile’. If ‘S3Prefix’, s3_data defines a prefix of S3 objects to train on. All objects with S3 keys beginning with s3_data will be used to train. If ‘ManifestFile’ or ‘AugmentedManifestFile’, then s3_data defines a single S3 manifest file or augmented manifest file (respectively), listing the S3 data to train on. Both the ManifestFile and AugmentedManifestFile formats are described in the SageMaker API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html

  • input_mode (str) –

    Optional override for this channel’s input mode (default: None). By default, channels will use the input mode defined on sagemaker.estimator.EstimatorBase.input_mode, but they will ignore that setting if this parameter is set.

    • None - Amazon SageMaker will use the input mode specified in the Estimator.

    • ‘File’ - Amazon SageMaker copies the training dataset from the S3 location to a local directory.

    • ‘Pipe’ - Amazon SageMaker streams data directly from S3 to the container via a Unix named pipe.

  • attribute_names (list[str]) – A list of one or more attribute names to use that are found in a specified AugmentedManifestFile.

  • target_attribute_name (str) – The name of the attribute that will be predicted (classified) in a SageMaker AutoML job. It is required if the input is for a SageMaker AutoML job.

  • shuffle_config (ShuffleConfig) – If specified, this configuration enables shuffling on this channel. See the SageMaker API documentation for more info: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
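
The parameters above can be pictured as a mapping onto a channel dict. The sketch below is a standalone approximation and not the library's implementation: build_s3_channel and effective_input_mode are hypothetical helpers, the key names are assumed from the CreateTrainingJob Channel structure, and target_attribute_name and shuffle_config are left out for brevity.

```python
from typing import Any, Dict, List, Optional

# Valid values as listed in the parameter descriptions above.
VALID_DISTRIBUTIONS = {"FullyReplicated", "ShardedByS3Key"}
VALID_S3_DATA_TYPES = {"S3Prefix", "ManifestFile", "AugmentedManifestFile"}
VALID_INPUT_MODES = {"File", "Pipe"}


def build_s3_channel(
    s3_data: str,
    distribution: Optional[str] = None,
    compression: Optional[str] = None,
    content_type: Optional[str] = None,
    record_wrapping: Optional[str] = None,
    s3_data_type: str = "S3Prefix",
    input_mode: Optional[str] = None,
    attribute_names: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a channel dict from s3_input-style arguments."""
    distribution = distribution or "FullyReplicated"  # documented default
    if distribution not in VALID_DISTRIBUTIONS:
        raise ValueError(f"Unsupported distribution: {distribution!r}")
    if s3_data_type not in VALID_S3_DATA_TYPES:
        raise ValueError(f"Unsupported s3_data_type: {s3_data_type!r}")
    if input_mode is not None and input_mode not in VALID_INPUT_MODES:
        raise ValueError(f"Unsupported input_mode: {input_mode!r}")

    # Assumed key names, following the CreateTrainingJob Channel structure.
    s3_source: Dict[str, Any] = {
        "S3DataType": s3_data_type,
        "S3Uri": s3_data,
        "S3DataDistributionType": distribution,
    }
    if attribute_names is not None:
        s3_source["AttributeNames"] = list(attribute_names)

    channel: Dict[str, Any] = {"DataSource": {"S3DataSource": s3_source}}
    optional = {
        "CompressionType": compression,
        "ContentType": content_type,
        "RecordWrapperType": record_wrapping,
        "InputMode": input_mode,
    }
    # Only settings that were actually provided end up in the channel dict.
    channel.update({k: v for k, v in optional.items() if v is not None})
    return channel


def effective_input_mode(channel_mode: Optional[str], estimator_mode: str) -> str:
    """A channel-level input_mode, when set, overrides the estimator's mode."""
    return channel_mode if channel_mode is not None else estimator_mode


# Hypothetical bucket; Gzip compression applies only in Pipe mode.
channel = build_s3_channel(
    "s3://example-bucket/train/", input_mode="Pipe", compression="Gzip"
)
print(channel["InputMode"])
print(effective_input_mode(None, "File"))
```

Leaving unset options out of the dict, rather than storing None, keeps the result close to what an API request would carry.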

class sagemaker.inputs.FileSystemInput(file_system_id, file_system_type, directory_path, file_system_access_mode='ro', content_type=None)

Bases: object

Amazon SageMaker channel configurations for file system data sources.

config

A SageMaker File System DataSource.

Type

dict[str, dict]
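
A minimal sketch of what such a dict might look like, built from the constructor arguments in the class signature. build_file_system_channel is a hypothetical helper, not part of the SDK; the key names and the accepted values for the file system type and access mode are assumed from the FileSystemDataSource structure in the SageMaker API, and the file system ID and path are made up.

```python
from typing import Any, Dict, Optional

# Assumed from the SageMaker FileSystemDataSource API, not from this section.
FILE_SYSTEM_TYPES = {"EFS", "FSxLustre"}
ACCESS_MODES = {"ro", "rw"}


def build_file_system_channel(
    file_system_id: str,
    file_system_type: str,
    directory_path: str,
    file_system_access_mode: str = "ro",
    content_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a channel dict for a file system data source."""
    if file_system_type not in FILE_SYSTEM_TYPES:
        raise ValueError(f"Unsupported file_system_type: {file_system_type!r}")
    if file_system_access_mode not in ACCESS_MODES:
        raise ValueError(
            f"Unsupported file_system_access_mode: {file_system_access_mode!r}"
        )

    channel: Dict[str, Any] = {
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,
                "DirectoryPath": directory_path,
                "FileSystemAccessMode": file_system_access_mode,
            }
        }
    }
    if content_type is not None:
        channel["ContentType"] = content_type
    return channel


# Hypothetical file system ID and directory; read-only access is the default.
fs_channel = build_file_system_channel(
    "fs-0123456789abcdef0", "EFS", "/data/train"
)
print(fs_channel["DataSource"]["FileSystemDataSource"]["FileSystemAccessMode"])
```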

Create a new file system input used by a SageMaker training job.

Parameters