Inputs
Amazon SageMaker channel configurations for S3 data sources and file system data sources
class sagemaker.inputs.TrainingInput(s3_data, distribution=None, compression=None, content_type=None, record_wrapping=None, s3_data_type='S3Prefix', input_mode=None, attribute_names=None, target_attribute_name=None, shuffle_config=None)
Bases: object
Amazon SageMaker channel configurations for S3 data sources.
Create a definition for input data used by a SageMaker training job.
See the AWS documentation on the CreateTrainingJob API for more details on the parameters.
- Parameters
s3_data (str) – Defines the location of the S3 data to train on.
distribution (str) – Valid values: ‘FullyReplicated’, ‘ShardedByS3Key’ (default: ‘FullyReplicated’).
compression (str) – Valid values: ‘Gzip’, None (default: None). This is used only in Pipe input mode.
content_type (str) – MIME type of the input data (default: None).
record_wrapping (str) – Valid values: ‘RecordIO’ (default: None).
s3_data_type (str) – Valid values: ‘S3Prefix’, ‘ManifestFile’, ‘AugmentedManifestFile’. If ‘S3Prefix’, s3_data defines a prefix of S3 objects to train on. All objects with S3 keys beginning with s3_data will be used to train. If ‘ManifestFile’ or ‘AugmentedManifestFile’, then s3_data defines a single S3 manifest file or augmented manifest file (respectively), listing the S3 data to train on. Both the ManifestFile and AugmentedManifestFile formats are described in the SageMaker API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
input_mode (str) – Optional override for this channel’s input mode (default: None). By default, channels will use the input mode defined on sagemaker.estimator.EstimatorBase.input_mode, but they will ignore that setting if this parameter is set.
- None - Amazon SageMaker will use the input mode specified in the Estimator.
- ‘File’ - Amazon SageMaker copies the training dataset from the S3 location to a local directory.
- ‘Pipe’ - Amazon SageMaker streams data directly from S3 to the container via a Unix named pipe.
- ‘FastFile’ - Amazon SageMaker streams data from S3 on demand instead of downloading the entire dataset before training begins.
attribute_names (list[str]) – A list of one or more attribute names to use that are found in a specified AugmentedManifestFile.
target_attribute_name (str) – The name of the attribute that will be predicted (classified) in a SageMaker AutoML job. It is required if the input is for a SageMaker AutoML job.
shuffle_config (sagemaker.inputs.ShuffleConfig) – If specified, this configuration enables shuffling on this channel. See the SageMaker API documentation for more info: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
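As a rough illustration of how these parameters fit together, the sketch below assembles a channel dictionary in the shape used by the CreateTrainingJob API and resolves the input-mode override described above. The helper names `resolve_input_mode` and `build_s3_channel` and the bucket path are hypothetical. This is a stdlib-only approximation of the documented behavior, not the SDK's implementation.

```python
from typing import Optional

VALID_INPUT_MODES = {"File", "Pipe", "FastFile"}


def resolve_input_mode(channel_mode: Optional[str], estimator_mode: str) -> str:
    """Use the channel's own mode when set, otherwise fall back to the estimator's."""
    mode = estimator_mode if channel_mode is None else channel_mode
    if mode not in VALID_INPUT_MODES:
        raise ValueError(f"unsupported input mode: {mode!r}")
    return mode


def build_s3_channel(
    s3_data: str,
    distribution: str = "FullyReplicated",
    s3_data_type: str = "S3Prefix",
    content_type: Optional[str] = None,
    compression: Optional[str] = None,
    input_mode: Optional[str] = None,
    shuffle_seed: Optional[int] = None,
) -> dict:
    """Hypothetical helper: build a channel dict containing only the fields that were set."""
    channel = {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": s3_data_type,
                "S3Uri": s3_data,
                "S3DataDistributionType": distribution,
            }
        }
    }
    optional_fields = {
        "ContentType": content_type,
        "CompressionType": compression,
        "InputMode": input_mode,
    }
    channel.update({k: v for k, v in optional_fields.items() if v is not None})
    if shuffle_seed is not None:
        channel["ShuffleConfig"] = {"Seed": shuffle_seed}
    return channel


channel = build_s3_channel(
    "s3://example-bucket/train/",  # hypothetical bucket and prefix
    content_type="text/csv",
    input_mode="FastFile",
    shuffle_seed=42,
)
# The channel-level mode wins over the estimator-level default.
effective_mode = resolve_input_mode(channel.get("InputMode"), estimator_mode="File")
print(effective_mode)
```

With the SDK installed, the same values would be passed as keyword arguments to TrainingInput, with the seed wrapped in a ShuffleConfig.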
class sagemaker.inputs.ShuffleConfig(seed)
Bases: object
For configuring channel shuffling using a seed.
For more detail, see the AWS documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
Create a ShuffleConfig.
- Parameters
seed (long) – the long value used to seed the shuffled sequence.
class sagemaker.inputs.CreateModelInput(instance_type: str = None, accelerator_type: str = None)
Bases: object
A class containing parameters which can be used to create a SageMaker Model
- Parameters
Method generated by attrs for class CreateModelInput.
class sagemaker.inputs.TransformInput(data: str, data_type: str = 'S3Prefix', content_type: str = None, compression_type: str = None, split_type: str = None, input_filter: str = None, output_filter: str = None, join_source: str = None, model_client_config: dict = None)
Bases: object
Create a class containing all the parameters.
It can be used when calling sagemaker.transformer.Transformer.transform().
Method generated by attrs for class TransformInput.
class sagemaker.inputs.FileSystemInput(file_system_id, file_system_type, directory_path, file_system_access_mode='ro', content_type=None)
Bases: object
Amazon SageMaker channel configurations for file system data sources.
Create a new file system input used by a SageMaker training job.
- Parameters
file_system_id (str) – An Amazon file system ID starting with ‘fs-’.
file_system_type (str) – The type of file system used for the input. Valid values: ‘EFS’, ‘FSxLustre’.
directory_path (str) – Absolute or normalized path to the root directory (mount point) in the file system. Reference: https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html and https://docs.aws.amazon.com/fsx/latest/LustreGuide/mount-fs-auto-mount-onreboot.html
file_system_access_mode (str) – Permissions for read and write. Valid values: ‘ro’ or ‘rw’. Defaults to ‘ro’.
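The constraints listed above (ID prefix, file system type, access mode) can be checked before a job is submitted. The following is a minimal sketch under stated assumptions: `build_file_system_channel` is a hypothetical helper, the file system ID and directory are made-up values, and the output mirrors the FileSystemDataSource shape of the CreateTrainingJob API rather than anything the SDK is confirmed to produce internally.

```python
VALID_FILE_SYSTEM_TYPES = ("EFS", "FSxLustre")
VALID_ACCESS_MODES = ("ro", "rw")


def build_file_system_channel(
    file_system_id,
    file_system_type,
    directory_path,
    file_system_access_mode="ro",
    content_type=None,
):
    """Hypothetical helper: validate the documented constraints and build a channel dict."""
    if not file_system_id.startswith("fs-"):
        raise ValueError("file_system_id must start with 'fs-'")
    if file_system_type not in VALID_FILE_SYSTEM_TYPES:
        raise ValueError(f"file_system_type must be one of {VALID_FILE_SYSTEM_TYPES}")
    if file_system_access_mode not in VALID_ACCESS_MODES:
        raise ValueError(f"file_system_access_mode must be one of {VALID_ACCESS_MODES}")

    channel = {
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,
                "DirectoryPath": directory_path,
                "FileSystemAccessMode": file_system_access_mode,
            }
        }
    }
    if content_type is not None:
        channel["ContentType"] = content_type
    return channel


# Hypothetical EFS file system mounted read-only (the default access mode).
efs_channel = build_file_system_channel("fs-0123456789abcdef0", "EFS", "/training-data")
print(efs_channel["DataSource"]["FileSystemDataSource"]["FileSystemAccessMode"])
```

Failing early on a malformed ID or an unsupported type is a design choice of this sketch; whether the SDK itself validates these values at construction time is not stated here.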
The input configs for DatasetDefinition.
DatasetDefinition supports data sources such as S3 that can be queried via Athena and Redshift. It gives customers a mechanism to generate datasets from Athena or Redshift queries and to retrieve the data using Processing jobs, making it available to other downstream processes.
class sagemaker.dataset_definition.inputs.RedshiftDatasetDefinition(**kwargs)
Bases: sagemaker.apiutils._base_types.ApiObject
DatasetDefinition for Redshift.
With this input, SQL queries will be executed using Redshift to generate datasets to S3.
- Parameters
cluster_id (str) – The Redshift cluster Identifier.
database (str) – The name of the Redshift database used in Redshift query execution.
db_user (str) – The database user name used in Redshift query execution.
query_string (str) – The SQL query statements to be executed.
cluster_role_arn (str) – The IAM role attached to your Redshift cluster that Amazon SageMaker uses to generate datasets.
output_s3_uri (str) – The location in Amazon S3 where the Redshift query results are stored.
kms_key_id (str) – The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt data from a Redshift execution.
output_format (str) – The data storage format for Redshift query results. Valid options are “PARQUET”, “CSV”
output_compression (str) – The compression used for Redshift query results. Valid options are “None”, “GZIP”, “SNAPPY”, “ZSTD”, “BZIP2”
Init ApiObject.
cluster_id = None
database = None
db_user = None
query_string = None
cluster_role_arn = None
output_s3_uri = None
kms_key_id = None
output_format = None
output_compression = None
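Because the class accepts its fields as keyword arguments, it can help to see how the snake_case names above line up with the PascalCase field names of the SageMaker API's RedshiftDatasetDefinition structure. The sketch below is an assumption-laden illustration: `to_pascal_case` and `redshift_request` are hypothetical helpers, all values are placeholders, and the conversion shown is not claimed to be how ApiObject works internally.

```python
def to_pascal_case(name: str) -> str:
    """Convert a snake_case name such as 'output_s3_uri' to 'OutputS3Uri'."""
    return "".join(part.capitalize() for part in name.split("_"))


def redshift_request(**kwargs) -> dict:
    """Hypothetical helper: drop unset fields and rename keys to PascalCase."""
    return {to_pascal_case(key): value for key, value in kwargs.items() if value is not None}


# Placeholder values only; none of these resources exist.
request = redshift_request(
    cluster_id="example-cluster",
    database="dev",
    db_user="analyst",
    query_string="SELECT * FROM sales",
    cluster_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    output_s3_uri="s3://example-bucket/redshift-output/",
    kms_key_id=None,  # left unset, so it is omitted from the request
    output_format="PARQUET",
    output_compression="SNAPPY",
)
print(sorted(request))
```

With the SDK installed, the same keyword arguments would be passed directly to RedshiftDatasetDefinition(**kwargs).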
class sagemaker.dataset_definition.inputs.AthenaDatasetDefinition(**kwargs)
Bases: sagemaker.apiutils._base_types.ApiObject
DatasetDefinition for Athena.
With this input, SQL queries will be executed using Athena to generate datasets to S3.
- Parameters
catalog (str) – The name of the data catalog used in Athena query execution.
database (str) – The name of the database used in the Athena query execution.
query_string (str) – The SQL query statements to be executed.
output_s3_uri (str) – The location in Amazon S3 where Athena query results are stored.
work_group (str) – The name of the workgroup in which the Athena query is being started.
kms_key_id (str) – The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt data generated from an Athena query execution.
output_format (str) – The data storage format for Athena query results. Valid options are “PARQUET”, “ORC”, “AVRO”, “JSON”, “TEXTFILE”
output_compression (str) – The compression used for Athena query results. Valid options are “GZIP”, “SNAPPY”, “ZLIB”
Init ApiObject.
catalog = None
database = None
query_string = None
output_s3_uri = None
work_group = None
kms_key_id = None
output_format = None
output_compression = None
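The valid options for output_format and output_compression listed above are easy to mistype, since the Athena and Redshift lists differ. A minimal sketch of a pre-flight check follows, assuming a hypothetical helper named `check_athena_options`. It only encodes the options stated in this section and does not call any AWS service.

```python
ATHENA_OUTPUT_FORMATS = ("PARQUET", "ORC", "AVRO", "JSON", "TEXTFILE")
ATHENA_OUTPUT_COMPRESSIONS = ("GZIP", "SNAPPY", "ZLIB")


def check_athena_options(output_format, output_compression=None):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    if output_format not in ATHENA_OUTPUT_FORMATS:
        problems.append(f"output_format {output_format!r} is not one of {ATHENA_OUTPUT_FORMATS}")
    # Compression is treated as optional here because its class-level default is None.
    if output_compression is not None and output_compression not in ATHENA_OUTPUT_COMPRESSIONS:
        problems.append(
            f"output_compression {output_compression!r} is not one of {ATHENA_OUTPUT_COMPRESSIONS}"
        )
    return problems


ok = check_athena_options("PARQUET", "SNAPPY")
# "CSV" and "ZSTD" are listed for Redshift above, but not for Athena.
bad = check_athena_options("CSV", "ZSTD")
print(len(ok), len(bad))
```

Returning a list of problems instead of raising on the first one is a choice made for this sketch, so that both fields can be reported at once.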
class sagemaker.dataset_definition.inputs.DatasetDefinition(**kwargs)
Bases: sagemaker.apiutils._base_types.ApiObject
DatasetDefinition input.
- Parameters
data_distribution_type (str) – Whether the generated dataset is FullyReplicated or ShardedByS3Key (default).
input_mode (str) – Whether to use File or Pipe input mode. In File (default) mode, Amazon SageMaker copies the data from the input source onto the local Amazon Elastic Block Store (Amazon EBS) volumes before starting your training algorithm. This is the most commonly used input mode. In Pipe mode, Amazon SageMaker streams input data from the source directly to your algorithm without using the EBS volume.
local_path (str) – The local path where you want Amazon SageMaker to download the Dataset Definition inputs to run a processing job. LocalPath is an absolute path to the input data. This is a required parameter when AppManaged is False (default).
redshift_dataset_definition (RedshiftDatasetDefinition) – Configuration for Redshift Dataset Definition input.
athena_dataset_definition (AthenaDatasetDefinition) – Configuration for Athena Dataset Definition input.
Init ApiObject.
data_distribution_type = 'ShardedByS3Key'
input_mode = 'File'
local_path = None
redshift_dataset_definition = None
athena_dataset_definition = None
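To show how the defaults and the nested Athena or Redshift configuration combine, the sketch below builds a dictionary in the shape of the SageMaker API's DatasetDefinition structure. `build_dataset_definition` is a hypothetical helper and every value is a placeholder. The check on local_path reflects the note above that it is required when AppManaged is False, and the sketch assumes that case.

```python
def build_dataset_definition(
    local_path,
    athena=None,
    redshift=None,
    data_distribution_type="ShardedByS3Key",  # documented default
    input_mode="File",  # documented default
):
    """Hypothetical helper: combine the defaults with one or more nested source configs."""
    if local_path is None:
        # Assumes AppManaged is False, where the text above says local_path is required.
        raise ValueError("local_path is required when AppManaged is False")
    definition = {
        "DataDistributionType": data_distribution_type,
        "InputMode": input_mode,
        "LocalPath": local_path,
    }
    if athena is not None:
        definition["AthenaDatasetDefinition"] = athena
    if redshift is not None:
        definition["RedshiftDatasetDefinition"] = redshift
    return definition


# Placeholder Athena configuration, already in API (PascalCase) form.
athena_config = {
    "Catalog": "AwsDataCatalog",
    "Database": "example_db",
    "QueryString": "SELECT * FROM events",
    "OutputS3Uri": "s3://example-bucket/athena-output/",
    "OutputFormat": "PARQUET",
}
definition = build_dataset_definition("/opt/ml/processing/input/athena", athena=athena_config)
print(definition["InputMode"], definition["DataDistributionType"])
```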
class sagemaker.dataset_definition.inputs.S3Input(**kwargs)
Bases: sagemaker.apiutils._base_types.ApiObject
Metadata of data objects stored in S3.
Two options are provided: specifying an S3 prefix, or explicitly listing the files in a manifest file and referencing the manifest file’s S3 path. Note: Strong consistency is not guaranteed if S3Prefix is provided here. S3 list operations are not strongly consistent. Use ManifestFile if strong consistency is required.
- Parameters
s3_uri (str) – the path to a specific S3 object or an S3 prefix
local_path (str) – the path to a local directory. If not provided, skips data download by SageMaker platform.
s3_data_type (str) – Valid options are “ManifestFile” or “S3Prefix”.
s3_input_mode (str) – Valid options are “Pipe” or “File”.
s3_data_distribution_type (str) – Valid options are “FullyReplicated” or “ShardedByS3Key”.
s3_compression_type (str) – Valid options are “None” or “Gzip”.
Init ApiObject.
s3_uri = None
local_path = None
s3_data_type = 'S3Prefix'
s3_input_mode = 'File'
s3_data_distribution_type = 'FullyReplicated'
s3_compression_type = None
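The class-level defaults above mean that only s3_uri (and usually local_path) needs to be supplied in the common case. The sketch below mirrors those defaults with a standalone dataclass. `S3InputSketch` is a hypothetical stand-in, not the SDK class, and the URIs and paths are placeholders.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class S3InputSketch:
    """Hypothetical stand-in that mirrors the documented S3Input fields and defaults."""

    s3_uri: Optional[str] = None
    local_path: Optional[str] = None
    s3_data_type: str = "S3Prefix"
    s3_input_mode: str = "File"
    s3_data_distribution_type: str = "FullyReplicated"
    s3_compression_type: Optional[str] = None

    def set_fields(self) -> dict:
        """Return only the fields that carry a value."""
        return {name: value for name, value in asdict(self).items() if value is not None}


# Prefix-based input: every object under the prefix is included.
by_prefix = S3InputSketch(
    s3_uri="s3://example-bucket/data/",
    local_path="/opt/ml/processing/input",
)

# Manifest-based input: the manifest file lists the objects explicitly.
by_manifest = S3InputSketch(
    s3_uri="s3://example-bucket/manifests/train.manifest",
    local_path="/opt/ml/processing/input",
    s3_data_type="ManifestFile",
)
print(by_prefix.set_fields())
```

With the SDK installed, the same keyword arguments would be passed to S3Input(**kwargs).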