The Amazon SageMaker Factorization Machines algorithm.
FactorizationMachines(role, train_instance_count, train_instance_type, num_factors, predictor_type, epochs=None, clip_gradient=None, eps=None, rescale_grad=None, bias_lr=None, linear_lr=None, factors_lr=None, bias_wd=None, linear_wd=None, factors_wd=None, bias_init_method=None, bias_init_scale=None, bias_init_sigma=None, bias_init_value=None, linear_init_method=None, linear_init_scale=None, linear_init_sigma=None, linear_init_value=None, factors_init_method=None, factors_init_scale=None, factors_init_sigma=None, factors_init_value=None, **kwargs)
Factorization Machines is an Estimator for general-purpose supervised learning.
Amazon SageMaker Factorization Machines is a general-purpose supervised learning algorithm that you can use for both classification and regression tasks. It is an extension of a linear model that is designed to parsimoniously capture interactions between features within high dimensional sparse datasets.
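To make "an extension of a linear model" concrete, the following minimal sketch computes the standard second-order factorization machine score in plain Python: a bias, a linear term, and pairwise interactions weighted by dot products of per-feature factor vectors. The function names and toy values are illustrative assumptions and not part of the SageMaker API; the length of each factor vector plays the role of num_factors.

```python
# Illustrative sketch of the standard second-order factorization machine score.
# All names here are hypothetical; this is not SageMaker code.

def fm_score_naive(x, w0, w, V):
    """Bias + linear terms + explicit sum over feature pairs i < j."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_score_fast(x, w0, w, V):
    """Same score via the O(n * k) identity, one pass per factor."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


x = [1.0, 0.0, 2.0]                        # feature vector with one zero entry
w0 = 0.5                                   # bias term
w = [0.1, -0.2, 0.3]                       # linear weights
V = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # one k=2 factor vector per feature

naive = fm_score_naive(x, w0, w, V)
fast = fm_score_fast(x, w0, w, V)
```

Both functions return the same value; the second form shows why the interaction term stays cheap when the number of features is large.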
This Estimator may be fit via calls to fit(). It requires Amazon Record protobuf serialized data to be stored in S3. The utility method record_set() can be used to upload data to S3 and create a RecordSet to be passed to the fit call.
To learn more about the Amazon protobuf Record class and how to prepare bulk data in this format, please consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html
After this Estimator is fit, model data is stored in S3. The model may be deployed to an Amazon SageMaker Endpoint by invoking deploy(). As well as deploying an Endpoint, deploy returns a FactorizationMachinesPredictor object that can be used for inference calls using the trained model hosted in the SageMaker Endpoint.
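The upload, fit, and deploy sequence described above can be sketched as a small helper. This is a sketch under assumptions: `train_and_deploy` is a hypothetical name, `estimator` is assumed to be an already constructed FactorizationMachines instance, and the instance type is only an example value.

```python
def train_and_deploy(estimator, features, labels,
                     instance_type="ml.c4.xlarge"):
    """Upload data, train, and deploy using the methods documented here.

    `estimator` is assumed to be an already constructed FactorizationMachines
    instance; `features` is a 2D numpy array and `labels` a 1D numpy array.
    """
    # record_set() serializes the arrays to Record protobuf and uploads to S3.
    records = estimator.record_set(features, labels=labels)
    # fit() starts the training job and, by default, waits for it to finish.
    estimator.fit(records)
    # deploy() creates the endpoint and returns a predictor bound to it.
    return estimator.deploy(initial_instance_count=1,
                            instance_type=instance_type)
```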
FactorizationMachines Estimators can be configured by setting hyperparameters. The available hyperparameters for FactorizationMachines are documented below.
For further information on the AWS FactorizationMachines algorithm, please consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/fact-machines.html
- role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access an AWS resource.
- train_instance_count (int) – Number of Amazon EC2 instances to use for training.
- train_instance_type (str) – Type of EC2 instance to use for training, for example, ‘ml.c4.xlarge’.
- num_factors (int) – Dimensionality of factorization.
- predictor_type (str) – Type of predictor: ‘binary_classifier’ or ‘regressor’.
- epochs (int) – Number of training epochs to run.
- clip_gradient (float) – Optimizer parameter. Clip the gradient by projecting onto the box [-clip_gradient, +clip_gradient].
- eps (float) – Optimizer parameter. Small value to avoid division by 0.
- rescale_grad (float) – Optimizer parameter. If set, multiplies the gradient by rescale_grad before updating. Often chosen to be 1.0/batch_size.
- bias_lr (float) – Non-negative learning rate for the bias term.
- linear_lr (float) – Non-negative learning rate for linear terms.
- factors_lr (float) – Non-negative learning rate for factorization terms.
- bias_wd (float) – Non-negative weight decay for the bias term.
- linear_wd (float) – Non-negative weight decay for linear terms.
- factors_wd (float) – Non-negative weight decay for factorization terms.
- bias_init_method (string) – Initialization method for the bias term: ‘normal’, ‘uniform’ or ‘constant’.
- bias_init_scale (float) – Non-negative range for initialization of the bias term that takes effect when bias_init_method parameter is ‘uniform’.
- bias_init_sigma (float) – Non-negative standard deviation for initialization of the bias term that takes effect when bias_init_method parameter is ‘normal’.
- bias_init_value (float) – Initial value of the bias term that takes effect when bias_init_method parameter is ‘constant’.
- linear_init_method (string) – Initialization method for linear terms: ‘normal’, ‘uniform’ or ‘constant’.
- linear_init_scale (float) – Non-negative range for initialization of linear terms that takes effect when linear_init_method parameter is ‘uniform’.
- linear_init_sigma (float) – Non-negative standard deviation for initialization of linear terms that takes effect when linear_init_method parameter is ‘normal’.
- linear_init_value (float) – Initial value of linear terms that takes effect when linear_init_method parameter is ‘constant’.
- factors_init_method (string) – Initialization method for factorization terms: ‘normal’, ‘uniform’ or ‘constant’.
- factors_init_scale (float) – Non-negative range for initialization of factorization terms that takes effect when factors_init_method parameter is ‘uniform’.
- factors_init_sigma (float) – Non-negative standard deviation for initialization of factorization terms that takes effect when factors_init_method parameter is ‘normal’.
- factors_init_value (float) – Initial value of factorization terms that takes effect when factors_init_method parameter is ‘constant’.
- **kwargs – base class keyword argument values.
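As a rough illustration of how these parameters fit together, the sketch below collects a few of them in a dictionary and passes them to the constructor. The values are illustrative assumptions rather than recommended settings, `build_estimator` is a hypothetical helper, and the call assumes an SDK version whose constructor matches the signature documented above.

```python
# Sketch only: argument names follow the constructor signature documented
# above; the values are illustrative assumptions, not tuned defaults.
fm_arguments = {
    "train_instance_count": 1,
    "train_instance_type": "ml.c4.xlarge",
    "num_factors": 10,
    "predictor_type": "binary_classifier",
    "epochs": 20,
    "factors_init_method": "normal",
    "factors_init_sigma": 0.01,   # only takes effect with the 'normal' method
}


def build_estimator(role, **overrides):
    """Create the estimator; requires the sagemaker package and AWS access."""
    # Imported lazily so the argument dictionary can be inspected offline.
    from sagemaker import FactorizationMachines
    arguments = dict(fm_arguments, **overrides)
    return FactorizationMachines(role=role, **arguments)
```

Pairing each init method with its matching parameter (sigma for ‘normal’, scale for ‘uniform’, value for ‘constant’) avoids setting values that have no effect.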
Attach to an existing training job.
Create an Estimator bound to an existing training job. Each subclass is responsible for implementing _prepare_init_params_from_job_description(), because this method delegates to it the actual conversion of a training job description into the arguments that the class constructor expects. After attaching, if the training job has a Complete status, it can be deploy()ed to create a SageMaker Endpoint and return a predictor.
If the training job is in progress, attach will block and display log messages from the training job, until the training job completes.
- training_job_name (str) – The name of the training job to attach to.
- sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the estimator creates one using the default AWS configuration chain.
>>> my_estimator.fit(wait=False)
>>> training_job_name = my_estimator.latest_training_job.name
Later on:
>>> attached_estimator = Estimator.attach(training_job_name)
>>> attached_estimator.deploy()
Returns: Instance of the calling Estimator class with the attached training job.
Delete an Amazon SageMaker endpoint.
Raises: ValueError – If the endpoint does not exist.
deploy(initial_instance_count, instance_type, endpoint_name=None, **kwargs)
Deploy the trained model to an Amazon SageMaker endpoint and return a predictor.
- initial_instance_count (int) – Minimum number of EC2 instances to deploy to an endpoint for prediction.
- instance_type (str) – Type of EC2 instance to deploy to an endpoint for prediction, for example, ‘ml.c4.xlarge’.
- endpoint_name (str) – Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the training job is used.
- **kwargs – Passed to invocation of create_model(). Implementations may customize **kwargs to customize model creation during deploy. For more, see the implementation docs.
Returns: A predictor that provides a predict() method, which can be used to send requests to the Amazon SageMaker endpoint and obtain inferences.
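One way to combine deploy() with endpoint cleanup is sketched below. `predict_then_clean_up` is a hypothetical helper, `estimator` is assumed to be an already fitted estimator that exposes delete_endpoint() as documented here, and the instance type is an example value.

```python
def predict_then_clean_up(estimator, rows, endpoint_name=None):
    """Deploy, run one prediction, and always delete the endpoint afterwards.

    `estimator` is assumed to be a fitted estimator; `rows` is a 2D numpy array.
    """
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type="ml.c4.xlarge",
                                 endpoint_name=endpoint_name)
    try:
        return predictor.predict(rows)
    finally:
        # Remove the endpoint even if predict() raises.
        estimator.delete_endpoint()
```

The try/finally form is a design choice for short-lived experiments; a long-running service would keep the endpoint and reuse the predictor.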
fit(records, mini_batch_size=None, wait=True, logs=True, job_name=None)
Fit this Estimator on serialized Record objects, stored in S3.
records should be an instance of RecordSet. This defines a collection of S3 data files to train this Estimator on.
Training data is expected to be encoded as dense or sparse vectors in the “values” feature on each Record. If the data is labeled, the label is expected to be encoded as a list of scalars in the “values” feature of the Record label.
More information on the Amazon Record format is available at: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html
See record_set() to construct a RecordSet from numpy arrays.
- records (RecordSet) – The records to train this Estimator on.
- mini_batch_size (int or None) – The size of each mini-batch to use when training. If None, a default value will be used.
- wait (bool) – Whether the call should wait until the job completes (default: True).
- logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
- job_name (str) – Training job name. If not specified, the estimator generates a default job name, based on the training image name and current timestamp.
Return a VpcConfig dict taken either from this Estimator’s subnets and security groups or, if an optional override value is given, from that override after validating it.
Return the hyperparameters as a dictionary to use for training.
The fit() method, which trains the model, calls this method to find the hyperparameters.
Returns: The hyperparameters. Return type: dict[str, str]
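The documented return type, dict[str, str], means every hyperparameter value is passed to training as a string. A minimal sketch of that conversion follows; `as_training_hyperparameters` is a hypothetical helper, and dropping unset (None) entries is an assumption of this sketch rather than documented behavior.

```python
def as_training_hyperparameters(values):
    """Mimic the documented dict[str, str] return type.

    Dropping unset (None) entries is an assumption of this sketch.
    """
    return {name: str(value) for name, value in values.items()
            if value is not None}


example = as_training_hyperparameters(
    {"num_factors": 10, "predictor_type": "regressor", "epochs": None}
)
```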
record_set(train, labels=None, channel='train')
Build a RecordSet from a numpy ndarray matrix and label vector.
For the 2D numpy ndarray train, each row is converted to a Record object. The vector is stored in the “values” entry of the features property of each Record. If labels is not None, each corresponding label is assigned to the “values” entry of the labels property of each Record.
The collection of Record objects is protobuf serialized and uploaded to new S3 locations. A manifest file listing the objects created is generated and also stored in S3.
The number of S3 objects created is controlled by the train_instance_count property on this Estimator. One S3 object is created per training instance.
- train (numpy.ndarray) – A 2D numpy array of training data.
- labels (numpy.ndarray) – A 1D numpy array of labels. Its length must be equal to the number of rows in train.
- channel (str) – The SageMaker TrainingJob channel this RecordSet should be assigned to.
Returns: A RecordSet referencing the encoded, uploaded training and label data.
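The "one S3 object per training instance" rule can be pictured as splitting the rows of train into train_instance_count contiguous shards. The sketch below is an illustration only: `split_into_shards` is a hypothetical helper, and the exact split policy (even shards, remainder in the last one) is an assumption, not something this page specifies.

```python
def split_into_shards(rows, num_shards):
    """Split rows into num_shards contiguous chunks, one per training instance.

    The even-split policy (remainder goes to the last shard) is an assumption.
    """
    shard_size = len(rows) // num_shards
    if shard_size == 0:
        raise ValueError("Need at least one row per shard")
    shards = [rows[i * shard_size:(i + 1) * shard_size]
              for i in range(num_shards - 1)]
    shards.append(rows[(num_shards - 1) * shard_size:])
    return shards


# Ten rows across three training instances.
shards = split_into_shards(list(range(10)), 3)
```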
Return the Docker image to use for training.
The fit() method, which does the model training, calls this method to find the image to use for model training.
Returns: The URI of the Docker image. Return type: str
Return a TrainingJobAnalytics object for the current training job.
transformer(instance_count, instance_type, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None)
Return a Transformer that uses a SageMaker Model based on the training job. It reuses the SageMaker Session and base job name used by the Estimator.
- instance_count (int) – Number of EC2 instances to use.
- instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
- strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MULTI_RECORD’ and ‘SINGLE_RECORD’.
- assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
- output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.
- output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).
- accept (str) – The content type accepted by the endpoint deployed during the transform job.
- env (dict) – Environment variables to be set for use during the transform job (default: None).
- max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.
- max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.
- tags (list[dict]) – List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job.
- role (str) – The ExecutionRoleArn IAM Role ARN for the Model, which is also used during transform jobs. If not specified, the role from the Estimator will be used.
- volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
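A small wrapper can check the documented valid values for strategy and assemble_with before calling transformer(). This is a sketch under assumptions: `make_transformer` is a hypothetical helper, `estimator` is assumed to be a fitted estimator, and the instance count and type are example values.

```python
VALID_STRATEGIES = ("MULTI_RECORD", "SINGLE_RECORD")
VALID_ASSEMBLE_WITH = ("Line", "None")


def make_transformer(estimator, output_path, strategy="MULTI_RECORD",
                     assemble_with="Line"):
    """Validate the documented option values, then call transformer()."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError("strategy must be one of %s" % (VALID_STRATEGIES,))
    if assemble_with not in VALID_ASSEMBLE_WITH:
        raise ValueError(
            "assemble_with must be one of %s" % (VALID_ASSEMBLE_WITH,))
    return estimator.transformer(instance_count=1,
                                 instance_type="ml.c4.xlarge",
                                 strategy=strategy,
                                 assemble_with=assemble_with,
                                 output_path=output_path)
```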
Return a FactorizationMachinesModel referencing the latest S3 model data produced by this Estimator.
Parameters: vpc_config_override (dict[str, list[str]]) – Optional override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator.
- ‘Subnets’ (list[str]): List of subnet ids.
- ‘SecurityGroupIds’ (list[str]): List of security group ids.
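For reference, a vpc_config_override value with the two documented keys might look like the dictionary below. The ids are placeholders, and `check_vpc_config` is a hypothetical helper that only checks the shape described above; it is not the validation the SDK itself performs.

```python
def check_vpc_config(vpc_config):
    """Check that a VpcConfig-style dict has the two documented list keys."""
    for key in ("Subnets", "SecurityGroupIds"):
        ids = vpc_config.get(key)
        if not isinstance(ids, list) or not all(
                isinstance(item, str) for item in ids):
            raise ValueError("%s must be a list of strings" % key)
    return vpc_config


override = check_vpc_config({
    "Subnets": ["subnet-0123456789abcdef0"],        # placeholder id
    "SecurityGroupIds": ["sg-0123456789abcdef0"],   # placeholder id
})
```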
FactorizationMachinesModel(model_data, role, sagemaker_session=None, **kwargs)
Performs binary-classification or regression prediction from input vectors.
The implementation of predict() in this RealTimePredictor requires a numpy ndarray as input. The array should contain the same number of columns as the feature dimension of the data used to fit the model this Predictor performs inference on.
predict() returns a list of Record objects, one for each row in the input ndarray. The prediction is stored in the "score" key of the Record.label field. For details on the formats, see: https://docs.aws.amazon.com/sagemaker/latest/dg/fm-in-formats.html
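Reading the prediction out of the returned Record objects can be sketched as follows. `extract_scores` is a hypothetical helper, and the sketch assumes each score is stored as a float32 tensor whose first value is the prediction for that row.

```python
def extract_scores(records):
    """Collect the first "score" value from each Record returned by predict().

    Assumes the score is stored as a float32 tensor on the Record label.
    """
    return [record.label["score"].float32_tensor.values[0]
            for record in records]
```

With a deployed predictor, usage would look like `scores = extract_scores(predictor.predict(rows))`, where `rows` is a 2D numpy array with the same number of columns as the training data.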