Inputs
Amazon SageMaker channel configurations for S3 data sources and file system data sources
class sagemaker.inputs.s3_input(s3_data, distribution=None, compression=None, content_type=None, record_wrapping=None, s3_data_type='S3Prefix', input_mode=None, attribute_names=None, target_attribute_name=None, shuffle_config=None)
Bases: object
Amazon SageMaker channel configurations for S3 data sources.
- config (dict[str, dict]) – A SageMaker DataSource referencing a SageMaker S3DataSource.
Create a definition for input data used by a SageMaker training job. See the AWS documentation on the CreateTrainingJob API for more details on the parameters.
Parameters:
- s3_data (str) – Defines the location of the S3 data to train on.
- distribution (str) – Valid values: ‘FullyReplicated’, ‘ShardedByS3Key’ (default: ‘FullyReplicated’).
- compression (str) – Valid values: ‘Gzip’, None (default: None). This is used only in Pipe input mode.
- content_type (str) – MIME type of the input data (default: None).
- record_wrapping (str) – Valid values: ‘RecordIO’ (default: None).
- s3_data_type (str) – Valid values: ‘S3Prefix’, ‘ManifestFile’, ‘AugmentedManifestFile’ (default: ‘S3Prefix’). If ‘S3Prefix’, s3_data defines a prefix of S3 objects to train on. All objects with S3 keys beginning with s3_data will be used to train. If ‘ManifestFile’ or ‘AugmentedManifestFile’, then s3_data defines a single S3 manifest file or augmented manifest file (respectively), listing the S3 data to train on. Both the ManifestFile and AugmentedManifestFile formats are described in the SageMaker API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
- input_mode (str) – Optional override for this channel’s input mode (default: None). By default, channels use the input mode defined on sagemaker.estimator.EstimatorBase.input_mode, but they ignore that setting if this parameter is set.
  - None - Amazon SageMaker will use the input mode specified in the Estimator.
  - ‘File’ - Amazon SageMaker copies the training dataset from the S3 location to a local directory.
  - ‘Pipe’ - Amazon SageMaker streams data directly from S3 to the container via a Unix named pipe.
- attribute_names (list[str]) – A list of one or more attribute names to use that are found in a specified AugmentedManifestFile.
- target_attribute_name (str) – The name of the attribute that will be predicted (classified) in a SageMaker AutoML job. It is required if the input is for a SageMaker AutoML job.
- shuffle_config (ShuffleConfig) – If specified, this configuration enables shuffling on this channel. See the SageMaker API documentation for more info: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
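To make the parameter list above more concrete, the following is a minimal, standalone sketch of how these arguments could map onto a channel dictionary shaped like the Channel structure of the CreateTrainingJob API. It does not import the SDK and is not the library's implementation: the helper name build_s3_channel_config, the shuffle_seed argument (standing in for a ShuffleConfig object), and the bucket name are assumptions made for illustration. target_attribute_name is left out on purpose.

```python
from typing import Any, Dict, List, Optional


def build_s3_channel_config(
    s3_data: str,
    distribution: Optional[str] = None,
    compression: Optional[str] = None,
    content_type: Optional[str] = None,
    record_wrapping: Optional[str] = None,
    s3_data_type: str = "S3Prefix",
    input_mode: Optional[str] = None,
    attribute_names: Optional[List[str]] = None,
    shuffle_seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a Channel-style dict for an S3 data source."""
    s3_source: Dict[str, Any] = {
        "S3DataType": s3_data_type,
        "S3Uri": s3_data,
        # The documented default distribution is 'FullyReplicated'.
        "S3DataDistributionType": distribution or "FullyReplicated",
    }
    if attribute_names is not None:
        # Only meaningful together with s3_data_type='AugmentedManifestFile'.
        s3_source["AttributeNames"] = attribute_names

    config: Dict[str, Any] = {"DataSource": {"S3DataSource": s3_source}}

    # Optional channel-level settings are emitted only when they are set.
    optional_fields = {
        "CompressionType": compression,
        "ContentType": content_type,
        "RecordWrapperType": record_wrapping,
        "InputMode": input_mode,
    }
    for key, value in optional_fields.items():
        if value is not None:
            config[key] = value

    if shuffle_seed is not None:
        config["ShuffleConfig"] = {"Seed": shuffle_seed}
    return config


# Example: a gzip-compressed CSV channel streamed in Pipe mode.
# "example-bucket" is a placeholder, not a real bucket.
train_channel = build_s3_channel_config(
    "s3://example-bucket/train/",
    content_type="text/csv",
    compression="Gzip",
    input_mode="Pipe",
)
print(train_channel)
```

Leaving unset options out of the dictionary, rather than writing them as None, keeps the channel definition limited to the values the caller actually chose, so the estimator-level input mode still applies when input_mode is None.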
class sagemaker.inputs.FileSystemInput(file_system_id, file_system_type, directory_path, file_system_access_mode='ro', content_type=None)
Bases: object
Amazon SageMaker channel configurations for file system data sources.
- config (dict[str, dict]) – A SageMaker file system DataSource.
Create a new file system input used by a SageMaker training job.
Parameters:
- file_system_id (str) – An Amazon file system ID starting with ‘fs-’.
- file_system_type (str) – The type of file system used for the input. Valid values: ‘EFS’, ‘FSxLustre’.
- directory_path (str) – Absolute or normalized path to the root directory (mount point) in the file system. Reference: https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html and https://docs.aws.amazon.com/fsx/latest/LustreGuide/mount-fs-auto-mount-onreboot.html
- file_system_access_mode (str) – Permissions for read and write. Valid values: ‘ro’ or ‘rw’. Defaults to ‘ro’.
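As an illustration of the parameters above, here is a minimal, standalone sketch that validates the documented values and assembles a channel dictionary shaped like the FileSystemDataSource structure of the CreateTrainingJob API. The helper name build_file_system_channel_config, the validation rules, and the example ID and path are assumptions for illustration. This is not the SDK's own code.

```python
from typing import Any, Dict, Optional

# Valid values as listed in the parameter documentation above.
VALID_FILE_SYSTEM_TYPES = {"EFS", "FSxLustre"}
VALID_ACCESS_MODES = {"ro", "rw"}


def build_file_system_channel_config(
    file_system_id: str,
    file_system_type: str,
    directory_path: str,
    file_system_access_mode: str = "ro",
    content_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a Channel-style dict for a file system source."""
    if not file_system_id.startswith("fs-"):
        raise ValueError("file_system_id must start with 'fs-'")
    if file_system_type not in VALID_FILE_SYSTEM_TYPES:
        raise ValueError(
            "file_system_type must be one of %s" % sorted(VALID_FILE_SYSTEM_TYPES)
        )
    if file_system_access_mode not in VALID_ACCESS_MODES:
        raise ValueError(
            "file_system_access_mode must be one of %s" % sorted(VALID_ACCESS_MODES)
        )

    config: Dict[str, Any] = {
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,
                "DirectoryPath": directory_path,
                "FileSystemAccessMode": file_system_access_mode,
            }
        }
    }
    if content_type is not None:
        config["ContentType"] = content_type
    return config


# Example: a read-only EFS channel. The ID and path are placeholders.
efs_channel = build_file_system_channel_config(
    file_system_id="fs-0123abcd",
    file_system_type="EFS",
    directory_path="/training-data",
)
print(efs_channel)
```

Rejecting unknown file system types and access modes up front surfaces a typo such as ‘efs’ or ‘r’ before a training job is submitted.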