Feature Store APIs

Feature group

class sagemaker.feature_store.feature_group.FeatureGroup(name: str = NOTHING, sagemaker_session: sagemaker.session.Session = <class 'sagemaker.session.Session'>, feature_definitions: Sequence[sagemaker.feature_store.feature_definition.FeatureDefinition] = NOTHING)

Bases: object

FeatureGroup definition.

This class instantiates a FeatureGroup object that comprises a name for the FeatureGroup, a session instance, and a list of feature definition objects (FeatureDefinition).

name

name of the FeatureGroup instance.

Type

str

sagemaker_session

session instance to perform boto calls.

Type

Session

feature_definitions

list of FeatureDefinitions.

Type

Sequence[FeatureDefinition]

Method generated by attrs for class FeatureGroup.

create(s3_uri: Union[str, bool], record_identifier_name: str, event_time_feature_name: str, role_arn: str, online_store_kms_key_id: str = None, enable_online_store: bool = False, offline_store_kms_key_id: str = None, disable_glue_table_creation: bool = False, data_catalog_config: sagemaker.feature_store.inputs.DataCatalogConfig = None, description: str = None, tags: List[Dict[str, str]] = None) → Dict[str, Any]

Create a SageMaker FeatureStore FeatureGroup.

Parameters
  • s3_uri (Union[str, bool]) – S3 URI of the offline store, set to False to disable offline store.

  • record_identifier_name (str) – name of the record identifier feature.

  • event_time_feature_name (str) – name of the event time feature.

  • role_arn (str) – ARN of the role used to call CreateFeatureGroup.

  • online_store_kms_key_id (str) – KMS key id for online store.

  • enable_online_store (bool) – whether to enable online store or not.

  • offline_store_kms_key_id (str) – KMS key id for offline store. If a KMS encryption key is not specified, SageMaker encrypts all data at rest using the default AWS KMS key. By defining your bucket-level key for SSE, you can reduce the cost of AWS KMS requests. For more information, see Bucket Key in the Amazon S3 User Guide.

  • disable_glue_table_creation (bool) – whether to turn off Glue table creation or not.

  • data_catalog_config (DataCatalogConfig) – configuration for Metadata store.

  • description (str) – description of the FeatureGroup.

  • tags (List[Dict[str, str]]) – list of tags for labeling a FeatureGroup.

Returns

Response dict from service.
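As a hedged illustration of the call above, the following sketch wraps create() in a small helper. The helper name create_offline_and_online_group, the bucket and prefix arguments, and the feature names record_id and event_time are assumptions made for this example. The feature_group argument is expected to be a FeatureGroup instance, but any object with a compatible create() method works.

```python
from typing import Any, Dict


def create_offline_and_online_group(
    feature_group: Any,
    bucket: str,
    prefix: str,
    role_arn: str,
) -> Dict[str, Any]:
    """Call create() with both the offline and the online store enabled.

    `feature_group` is expected to be a FeatureGroup. The feature names
    below are hypothetical and must exist in its feature definitions.
    """
    return feature_group.create(
        s3_uri=f"s3://{bucket}/{prefix}",      # pass False to disable the offline store
        record_identifier_name="record_id",    # hypothetical feature name
        event_time_feature_name="event_time",  # hypothetical feature name
        role_arn=role_arn,
        enable_online_store=True,              # the documented default is False
    )
```

The return value is whatever create() returns, which this reference describes as the response dict from the service.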

delete()

Delete a FeatureGroup.

describe(next_token: str = None) → Dict[str, Any]

Describe a FeatureGroup.

Parameters

next_token (str) – next_token to get next page of features.

Returns

Response dict from the service.

put_record(record: Sequence[sagemaker.feature_store.inputs.FeatureValue])

Put a single record in the FeatureGroup.

Parameters

record (Sequence[FeatureValue]) – a list of feature values.
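A record for put_record can be built from a plain mapping. The sketch below is one possible helper, not part of the SDK: the name row_to_record and the choice to skip None values are assumptions. It relies only on the FeatureValue constructor arguments feature_name and value_as_string documented in the Inputs section.

```python
from typing import Any, Callable, List, Mapping, Optional


def row_to_record(
    row: Mapping[str, Any],
    feature_value_cls: Optional[Callable[..., Any]] = None,
) -> List[Any]:
    """Turn a {feature name: value} mapping into a list of FeatureValue objects.

    Values are converted with str() because FeatureValue carries its value
    as a string. None values are skipped (a choice made for this sketch).
    """
    if feature_value_cls is None:
        # Imported lazily so the helper can be exercised without the SDK.
        from sagemaker.feature_store.inputs import FeatureValue

        feature_value_cls = FeatureValue
    return [
        feature_value_cls(feature_name=name, value_as_string=str(value))
        for name, value in row.items()
        if value is not None
    ]
```

With a FeatureGroup at hand, the result can be passed as feature_group.put_record(record=row_to_record(row)).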

ingest(data_frame: pandas.core.frame.DataFrame, max_workers: int = 1, max_processes: int = 1, wait: bool = True, timeout: Union[int, float] = None) → sagemaker.feature_store.feature_group.IngestionManagerPandas

Ingest the content of a pandas DataFrame to feature store.

max_workers threads are created to work on different partitions of the data_frame in parallel.

max_processes processes are created to work on different partitions of the data_frame in parallel, each with max_workers threads.

The ingest function attempts to ingest all records in the data frame. If wait is True and any record fails to be ingested, an exception is thrown after all records have been processed. If wait is False, the exception is thrown by a later call to wait() on the returned IngestionManagerPandas instance.

Zero-based indices of the rows that failed to be ingested can be found in the exception. They are also available from the failed_rows property of IngestionManagerPandas after the exception is thrown.
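The partitioning idea described above can be sketched with the standard library alone. This is a simplified illustration, not the SDK's implementation: it uses threads only (no max_processes), splits rows into contiguous chunks, and takes a hypothetical put_row callable in place of the real service call. The names split_indices and ingest_rows are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def split_indices(n_rows: int, n_parts: int) -> List[range]:
    """Split range(n_rows) into n_parts contiguous, near-equal chunks."""
    base, extra = divmod(n_rows, n_parts)
    chunks: List[range] = []
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def ingest_rows(
    rows: Sequence[Any],
    put_row: Callable[[Any], None],
    max_workers: int = 1,
) -> List[int]:
    """Process every row, one chunk per thread, and return failed indices."""

    def ingest_chunk(chunk: range) -> List[int]:
        failed = []
        for index in chunk:
            try:
                put_row(rows[index])
            except Exception:
                # Keep going so that all rows are attempted.
                failed.append(index)
        return failed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = list(pool.map(ingest_chunk, split_indices(len(rows), max_workers)))
    return sorted(index for failed in per_chunk for index in failed)
```

As in the description above, every row is attempted before failures are reported, and failures are reported as zero-based row indices.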

Parameters
  • data_frame (DataFrame) – data_frame to be ingested to feature store.

  • max_workers (int) – number of threads to be created.

  • max_processes (int) – number of processes to be created. Each process spawns max_workers threads.

  • wait (bool) – whether to wait for the ingestion to finish or not.

  • timeout (Union[int, float]) – concurrent.futures.TimeoutError will be raised if timeout is reached.

Returns

An instance of IngestionManagerPandas.
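The non-blocking path can be sketched as follows. The helper name ingest_and_collect_failures is an assumption, and because this reference does not name the exception type raised on failure, the sketch catches Exception broadly. The feature_group argument is expected to be a FeatureGroup and data_frame a pandas DataFrame.

```python
from typing import Any, List


def ingest_and_collect_failures(
    feature_group: Any,
    data_frame: Any,
    max_workers: int = 4,
) -> List[int]:
    """Ingest without blocking, then wait and return failed row indices."""
    manager = feature_group.ingest(
        data_frame=data_frame,
        max_workers=max_workers,
        wait=False,  # return immediately; errors surface in manager.wait()
    )
    try:
        manager.wait()
    except Exception:
        # The reference does not name the exception type, so this sketch
        # catches broadly; narrow it to the type your SDK version raises.
        return list(manager.failed_rows)
    return []
```

An empty list means wait() returned without raising. A caller could retry only the returned indices, for example by selecting those rows from the DataFrame by position.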

athena_query() → sagemaker.feature_store.feature_group.AthenaQuery

Create an AthenaQuery instance.

Returns

An instance of AthenaQuery initialized with data catalog configurations.

as_hive_ddl(database: str = 'sagemaker_featurestore', table_name: str = None) → str

Generate Hive DDL commands to define or change structure of tables or databases in Hive.

The schema of the table is generated from the feature definitions. Columns are named after the features, and data types are inferred from the feature types: the Integral feature type maps to the INT data type, Fractional maps to FLOAT, and String maps to STRING.

Parameters
  • database – name of the database. If not set, “sagemaker_featurestore” is used.

  • table_name – name of the table. If not set, the name of this feature group is used.

Returns

Generated create table DDL string.
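The type mapping described above can be illustrated with a small standalone function. This is a sketch under stated assumptions: it renders only the column list for the given features, the statement layout is invented for the example, and the actual string returned by as_hive_ddl may contain additional columns and clauses. The names HIVE_TYPES, hive_columns and create_table_sketch are hypothetical.

```python
from typing import List, Tuple

# Feature type name -> Hive data type, following the mapping described above.
HIVE_TYPES = {"Integral": "INT", "Fractional": "FLOAT", "String": "STRING"}


def hive_columns(features: List[Tuple[str, str]]) -> str:
    """Render (feature name, feature type) pairs as Hive column definitions."""
    lines = [f"  {name} {HIVE_TYPES[ftype]}" for name, ftype in features]
    return ",\n".join(lines)


def create_table_sketch(
    database: str,
    table_name: str,
    features: List[Tuple[str, str]],
) -> str:
    """Build a simplified CREATE TABLE statement (columns only)."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table_name} (\n"
        f"{hive_columns(features)}\n)"
    )
```

An unknown feature type name raises KeyError here, which keeps the mapping explicit rather than silently defaulting to STRING.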

class sagemaker.feature_store.feature_group.AthenaQuery(catalog: str, database: str, table_name: str, sagemaker_session: sagemaker.session.Session)

Bases: object

Class to manage querying of feature store data with AWS Athena.

This class instantiates an AthenaQuery object that is used to retrieve data from feature store via standard SQL queries.

catalog

name of the data catalog.

Type

str

database

name of the database.

Type

str

table_name

name of the table.

Type

str

sagemaker_session

instance of the Session class to perform boto calls.

Type

Session

Method generated by attrs for class AthenaQuery.

run(query_string: str, output_location: str, kms_key: str = None) → str

Execute a SQL query given a query string, an output location and an optional KMS key.

This method executes the SQL query using Athena and outputs the results to output_location and returns the execution id of the query.

Parameters
  • query_string – SQL query string.

  • output_location – S3 URI of the query result.

  • kms_key – KMS key id. If set, will be used to encrypt the query result file.

Returns

Execution id of the query.

wait()

Wait for the current query to finish.

get_query_execution() → Dict[str, Any]

Get execution status of the current query.

Returns

Response dict from Athena.

as_dataframe() → pandas.core.frame.DataFrame

Download the result of the current query and load it into a DataFrame.

Returns

A pandas DataFrame containing the query result.
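The AthenaQuery methods are typically called in sequence: run, wait, then as_dataframe. The sketch below strings them together. The helper name query_all_rows and the LIMIT clause are assumptions for illustration, and the status check reads the State field of the Athena GetQueryExecution response. The feature_group argument is expected to be a FeatureGroup.

```python
from typing import Any


def query_all_rows(feature_group: Any, output_location: str, limit: int = 10) -> Any:
    """Run a SELECT against the feature group's table and return a DataFrame."""
    query = feature_group.athena_query()
    sql = f'SELECT * FROM "{query.table_name}" LIMIT {int(limit)}'
    query.run(query_string=sql, output_location=output_location)
    query.wait()  # block until Athena finishes the query

    # get_query_execution() returns the Athena response dict.
    state = query.get_query_execution()["QueryExecution"]["Status"]["State"]
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return query.as_dataframe()  # download the result into a DataFrame
```

Here output_location is an S3 URI where Athena writes the result file, as described for run().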

class sagemaker.feature_store.feature_group.IngestionManagerPandas(feature_group_name: str, sagemaker_fs_runtime_client_config: botocore.config.Config, max_workers: int = 1, max_processes: int = 1, async_result: multiprocessing.pool.ApplyResult = None, processing_pool: pathos.multiprocessing.ProcessPool = None, failed_indices: List[int] = NOTHING)

Bases: object

Class to manage the multi-threaded data ingestion process.

Ingestion runs in up to max_processes processes, each with max_workers threads, and the indices of rows that fail to be ingested are recorded.

feature_group_name

name of the Feature Group.

Type

str

sagemaker_fs_runtime_client_config

instance of the Config class for boto calls.

Type

Config

data_frame

pandas DataFrame to be ingested to the given feature group.

Type

DataFrame

max_workers

number of threads to create.

Type

int

max_processes

number of processes to create. Each process spawns max_workers threads.

Type

int

Method generated by attrs for class IngestionManagerPandas.

property failed_rows

Get rows that failed to ingest.

Returns

List of row indices that failed to be ingested.

wait(timeout=None)

Wait for the ingestion process to finish.

Parameters

timeout (Union[int, float]) – concurrent.futures.TimeoutError will be raised if timeout is reached.

run(data_frame: pandas.core.frame.DataFrame, wait=True, timeout=None)

Start the ingestion process.

Parameters
  • data_frame (DataFrame) – source DataFrame to be ingested.

  • wait (bool) – whether to wait for the ingestion to finish or not.

  • timeout (Union[int, float]) – concurrent.futures.TimeoutError will be raised if timeout is reached.

Feature definition

class sagemaker.feature_store.feature_definition.FeatureDefinition(feature_name: str, feature_type: sagemaker.feature_store.feature_definition.FeatureTypeEnum)

Bases: sagemaker.feature_store.inputs.Config

Feature definition.

This instantiates a Feature Definition object where FeatureDefinition is a subclass of Config.

feature_name

The name of the feature

Type

str

feature_type

The type of the feature

Type

FeatureTypeEnum

Method generated by attrs for class FeatureDefinition.

to_dict() → Dict[str, Any]

Construct a dictionary based on each attribute.

class sagemaker.feature_store.feature_definition.FractionalFeatureDefinition(feature_name: str)

Bases: sagemaker.feature_store.feature_definition.FeatureDefinition

Fractional feature definition.

This class instantiates a FractionalFeatureDefinition object, a subclass of FeatureDefinition where the data type of the feature being defined is Fractional.

feature_name

The name of the feature

Type

str

feature_type

A FeatureTypeEnum.FRACTIONAL type

Type

FeatureTypeEnum

Construct an instance of FractionalFeatureDefinition.

Parameters

feature_name (str) – the name of the feature.

class sagemaker.feature_store.feature_definition.IntegralFeatureDefinition(feature_name: str)

Bases: sagemaker.feature_store.feature_definition.FeatureDefinition

Integral feature definition.

This class instantiates an IntegralFeatureDefinition object, a subclass of FeatureDefinition where the data type of the feature being defined is Integral.

feature_name

the name of the feature.

Type

str

feature_type

a FeatureTypeEnum.INTEGRAL type.

Type

FeatureTypeEnum

Construct an instance of IntegralFeatureDefinition.

Parameters

feature_name (str) – the name of the feature.

class sagemaker.feature_store.feature_definition.StringFeatureDefinition(feature_name: str)

Bases: sagemaker.feature_store.feature_definition.FeatureDefinition

String feature definition.

This class instantiates a StringFeatureDefinition object, a subclass of FeatureDefinition where the data type of the feature being defined is String.

feature_name

the name of the feature.

Type

str

feature_type

a FeatureTypeEnum.STRING type.

Type

FeatureTypeEnum

Construct an instance of StringFeatureDefinition.

Parameters

feature_name (str) – the name of the feature.

class sagemaker.feature_store.feature_definition.FeatureTypeEnum(value)

Bases: enum.Enum

Enum of feature types.

The data type of a feature can be Fractional, Integral or String.
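To show how the three feature types relate to plain Python values, here is a standalone sketch. FeatureTypeSketch is a local stand-in for FeatureTypeEnum with the member names used in this reference; its string values, the helper infer_feature_type, and the decision to reject bool are assumptions made for the example.

```python
from enum import Enum
from typing import Any


class FeatureTypeSketch(Enum):
    """Local stand-in for FeatureTypeEnum; the string values are assumed."""

    FRACTIONAL = "Fractional"
    INTEGRAL = "Integral"
    STRING = "String"


def infer_feature_type(value: Any) -> FeatureTypeSketch:
    """Pick a feature type for a plain Python value."""
    # bool is a subclass of int, so it is rejected before the int check.
    if isinstance(value, bool):
        raise TypeError("bool has no matching feature type in this sketch")
    if isinstance(value, int):
        return FeatureTypeSketch.INTEGRAL
    if isinstance(value, float):
        return FeatureTypeSketch.FRACTIONAL
    if isinstance(value, str):
        return FeatureTypeSketch.STRING
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

In SDK code, the equivalent choice is made by instantiating IntegralFeatureDefinition, FractionalFeatureDefinition or StringFeatureDefinition with the feature name.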

Inputs

class sagemaker.feature_store.inputs.Config

Bases: abc.ABC

Base config object for FeatureStore.

Configs must implement the to_dict method.

abstract to_dict() → Dict[str, Any]

Get the dictionary from attributes.

Returns

A dict containing the attributes.

classmethod construct_dict(**kwargs) → Dict[str, Any]

Construct the dictionary based on the args.

Parameters

kwargs – args to be used to construct the dict.

Returns

A dict representing the given kwargs.
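The pattern of an abstract to_dict plus a shared construct_dict helper can be sketched as below. ConfigSketch and S3Sketch are stand-ins written for this example, not the SDK classes. Dropping None values, expanding nested configs, and the key names S3Uri and KmsKeyId are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ConfigSketch(ABC):
    """Stand-in for the Config base class described above."""

    @abstractmethod
    def to_dict(self) -> Dict[str, Any]:
        """Return the attributes as a dictionary."""

    @classmethod
    def construct_dict(cls, **kwargs: Any) -> Dict[str, Any]:
        """Build a dict from kwargs.

        Assumption for this sketch: None values are dropped and nested
        configs are expanded through their own to_dict().
        """
        result: Dict[str, Any] = {}
        for key, value in kwargs.items():
            if value is None:
                continue
            result[key] = value.to_dict() if isinstance(value, ConfigSketch) else value
        return result


class S3Sketch(ConfigSketch):
    """Hypothetical config with one required and one optional attribute."""

    def __init__(self, s3_uri: str, kms_key_id: Optional[str] = None) -> None:
        self.s3_uri = s3_uri
        self.kms_key_id = kms_key_id

    def to_dict(self) -> Dict[str, Any]:
        return self.construct_dict(S3Uri=self.s3_uri, KmsKeyId=self.kms_key_id)
```

Because to_dict is abstract, ConfigSketch itself cannot be instantiated; only subclasses that implement to_dict can.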

class sagemaker.feature_store.inputs.DataCatalogConfig(table_name: str = NOTHING, catalog: str = NOTHING, database: str = NOTHING)

Bases: sagemaker.feature_store.inputs.Config

DataCatalogConfig for FeatureStore.

table_name

name of the table.

Type

str

catalog

name of the catalog.

Type

str

database

name of the database.

Type

str

Method generated by attrs for class DataCatalogConfig.

to_dict() → Dict[str, Any]

Construct a dictionary based on the attributes provided.

Returns

A dict representing the attributes.

class sagemaker.feature_store.inputs.OfflineStoreConfig(s3_storage_config: sagemaker.feature_store.inputs.S3StorageConfig, disable_glue_table_creation: bool = False, data_catalog_config: sagemaker.feature_store.inputs.DataCatalogConfig = None)

Bases: sagemaker.feature_store.inputs.Config

OfflineStoreConfig for FeatureStore.

s3_storage_config

configuration of S3 storage.

Type

S3StorageConfig

disable_glue_table_creation

whether to disable the Glue table creation.

Type

bool

data_catalog_config

configuration of the data catalog.

Type

DataCatalogConfig

Method generated by attrs for class OfflineStoreConfig.

to_dict() → Dict[str, Any]

Construct a dictionary based on the attributes.

Returns

A dict representing the attributes.

class sagemaker.feature_store.inputs.OnlineStoreConfig(enable_online_store: bool = True, online_store_security_config: sagemaker.feature_store.inputs.OnlineStoreSecurityConfig = None)

Bases: sagemaker.feature_store.inputs.Config

OnlineStoreConfig for FeatureStore.

enable_online_store

whether to enable the online store.

Type

bool

online_store_security_config

configuration of security setting.

Type

OnlineStoreSecurityConfig

Method generated by attrs for class OnlineStoreConfig.

to_dict() → Dict[str, Any]

Construct a dictionary based on the attributes.

Returns

A dict representing the attributes.

class sagemaker.feature_store.inputs.OnlineStoreSecurityConfig(kms_key_id: str = NOTHING)

Bases: sagemaker.feature_store.inputs.Config

OnlineStoreSecurityConfig for FeatureStore.

kms_key_id

KMS key id.

Type

str

Method generated by attrs for class OnlineStoreSecurityConfig.

to_dict() → Dict[str, Any]

Construct a dictionary based on the attributes.

class sagemaker.feature_store.inputs.S3StorageConfig(s3_uri: str, kms_key_id: str = None)

Bases: sagemaker.feature_store.inputs.Config

S3StorageConfig for FeatureStore.

s3_uri

S3 URI.

Type

str

kms_key_id

KMS key id.

Type

str

Method generated by attrs for class S3StorageConfig.

to_dict() → Dict[str, Any]

Construct a dictionary based on the attributes provided.

Returns

A dict representing the attributes.

class sagemaker.feature_store.inputs.FeatureValue(feature_name: str = None, value_as_string: str = None)

Bases: sagemaker.feature_store.inputs.Config

FeatureValue for FeatureStore.

feature_name

name of the Feature.

Type

str

value_as_string

value of the Feature in string form.

Type

str

Method generated by attrs for class FeatureValue.

to_dict() → Dict[str, Any]

Construct a dictionary based on the attributes provided.

Returns

A dict representing the attributes.