SparkML Serving

SparkML Model

class sagemaker.sparkml.model.SparkMLModel(model_data, role=None, spark_version=2.2, sagemaker_session=None, **kwargs)

Bases: sagemaker.model.Model

Model data and S3 location holder for an MLeap serialized SparkML model. Calling deploy() creates an Endpoint and returns a Predictor that performs predictions against an MLeap serialized SparkML model.

Initialize a SparkMLModel.

Parameters:
  • model_data (str) – The S3 location of a SageMaker model data .tar.gz file. For SparkML, this is the output produced by the Spark job after serializing the model via MLeap.
  • role (str) – An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.
  • spark_version (str) – Spark version you want to use for executing the inference (default: ‘2.2’).
  • sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain. For local mode, do not pass this argument.
  • **kwargs – Additional parameters passed to the Model constructor.

Tip

You can find additional parameters for initializing this class at Model.
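
As an illustration of the constructor and deploy() described above, the following is a minimal sketch rather than a definitive recipe. The helper name deploy_sparkml_model, the default instance type, and the S3 prefix check are assumptions made for this example; the check is not something the SDK is stated to enforce. The SDK import is placed inside the function so the snippet can be loaded without the SDK installed.

```python
def deploy_sparkml_model(model_data, role, instance_type="ml.c4.xlarge"):
    """Create a SparkMLModel and deploy it to a real-time endpoint.

    Hypothetical helper for illustration only; the name, the default
    instance type and the S3 check below are assumptions of this sketch.
    """
    # Illustrative guard: the parameter is documented as an S3 location.
    if not model_data.startswith("s3://"):
        raise ValueError("model_data should be an S3 URI, got %r" % model_data)

    # Imported lazily so the helper can be defined without the SDK present.
    from sagemaker.sparkml.model import SparkMLModel

    model = SparkMLModel(model_data=model_data, role=role, spark_version="2.2")

    # deploy() creates the endpoint and returns a predictor bound to it.
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Calling the helper requires AWS credentials and incurs endpoint costs, so it is meant to be adapted rather than run as-is.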

SparkML Predictor

class sagemaker.sparkml.model.SparkMLPredictor(endpoint, sagemaker_session=None)

Bases: sagemaker.predictor.RealTimePredictor

Performs predictions against an MLeap serialized SparkML model.

The implementation of predict() in this RealTimePredictor requires JSON as input. The input should follow the documented JSON format.

predict() returns CSV output; if the output is a list, its values are comma-separated.

Initializes a SparkMLPredictor, which should be used with SparkMLModel to perform predictions against SparkML models serialized via MLeap. The response is returned in text/csv format, which is the default response format for the SparkML Serving container.

Parameters:
  • endpoint (str) – The name of the endpoint to perform inference on.
  • sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the predictor creates one using the default AWS configuration chain.
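
To make the JSON-in, CSV-out behaviour described above more concrete, here is a hedged sketch. The helper names (build_payload, parse_csv_response, predict_csv) are hypothetical. The payload field names ("schema", "input", "output", "data") are assumptions based on the SparkML Serving container's request format and should be verified against its documentation. Whether predict() sends a JSON string unchanged depends on the predictor's configured serializer, which this sketch does not verify.

```python
import json


def build_payload(columns, row, output):
    """Build a JSON request body (assumed format, see the note above).

    columns: list of (name, type) pairs describing the input columns.
    row: one list of values, in the same order as columns.
    output: dict describing the desired output column.
    """
    if len(columns) != len(row):
        raise ValueError("row must contain one value per input column")
    schema = {
        "input": [{"name": name, "type": type_} for name, type_ in columns],
        "output": output,
    }
    return json.dumps({"schema": schema, "data": list(row)})


def parse_csv_response(body):
    """Split a single-line CSV response into a list of string fields."""
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    return [field.strip() for field in text.strip().split(",")]


def predict_csv(endpoint_name, payload):
    """Send the payload to an existing endpoint and parse the CSV reply."""
    # Imported lazily so the helpers above work without the SDK installed.
    from sagemaker.sparkml.model import SparkMLPredictor

    predictor = SparkMLPredictor(endpoint=endpoint_name)
    return parse_csv_response(predictor.predict(payload))
```

Only predict_csv contacts AWS; build_payload and parse_csv_response can be exercised locally to check the request and response handling before an endpoint exists.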