Feature Store APIs¶
Feature group¶
-
class
sagemaker.feature_store.feature_group.FeatureGroup(name: str = NOTHING, sagemaker_session: sagemaker.session.Session = <class 'sagemaker.session.Session'>, feature_definitions: Sequence[sagemaker.feature_store.feature_definition.FeatureDefinition] = NOTHING)¶ Bases:
object
FeatureGroup definition.
This class instantiates a FeatureGroup object that comprises a name for the FeatureGroup, a session instance, and a list of feature definition objects (FeatureDefinition).
-
feature_definitions¶ list of FeatureDefinitions.
- Type
Sequence[FeatureDefinition]
Method generated by attrs for class FeatureGroup.
-
create(s3_uri: Union[str, bool], record_identifier_name: str, event_time_feature_name: str, role_arn: str, online_store_kms_key_id: str = None, enable_online_store: bool = False, offline_store_kms_key_id: str = None, disable_glue_table_creation: bool = False, data_catalog_config: sagemaker.feature_store.inputs.DataCatalogConfig = None, description: str = None, tags: List[Dict[str, str]] = None) → Dict[str, Any]¶ Create a SageMaker FeatureStore FeatureGroup.
- Parameters
s3_uri (Union[str, bool]) – S3 URI of the offline store, set to False to disable offline store.
record_identifier_name (str) – name of the record identifier feature.
event_time_feature_name (str) – name of the event time feature.
role_arn (str) – ARN of the role used to call CreateFeatureGroup.
online_store_kms_key_id (str) – KMS key id for online store.
enable_online_store (bool) – whether to enable online store or not.
offline_store_kms_key_id (str) – KMS key id for offline store. If a KMS encryption key is not specified, SageMaker encrypts all data at rest using the default AWS KMS key. By defining your bucket-level key for SSE, you can reduce the cost of AWS KMS requests. For more information, see Bucket Key in the Amazon S3 User Guide.
disable_glue_table_creation (bool) – whether to turn off Glue table creation or not.
data_catalog_config (DataCatalogConfig) – configuration for Metadata store.
description (str) – description of the FeatureGroup.
tags (List[Dict[str, str]]) – list of tags for labeling a FeatureGroup.
- Returns
Response dict from service.
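A minimal sketch of calling create(), assuming a FeatureGroup instance has already been constructed with its feature definitions. The helper names, bucket, prefix, and feature names are hypothetical; only the keyword names are taken from the signature above.

```python
from typing import Any, Dict


def build_create_kwargs(
    bucket: str,
    prefix: str,
    role_arn: str,
    record_identifier_name: str = "customer_id",  # hypothetical feature name
    event_time_feature_name: str = "event_time",  # hypothetical feature name
    enable_online_store: bool = True,
) -> Dict[str, Any]:
    """Assemble keyword arguments that match the create() signature."""
    return {
        "s3_uri": f"s3://{bucket}/{prefix}",  # offline store location
        "record_identifier_name": record_identifier_name,
        "event_time_feature_name": event_time_feature_name,
        "role_arn": role_arn,
        "enable_online_store": enable_online_store,
    }


def create_feature_group(feature_group: Any, **create_kwargs: Any) -> Dict[str, Any]:
    """Call create() on an existing FeatureGroup and return the service response."""
    return feature_group.create(**create_kwargs)
```

Keeping the argument assembly separate from the call makes it possible to inspect or log the arguments before any request is sent.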
-
delete()¶ Delete a FeatureGroup.
-
describe(next_token: str = None) → Dict[str, Any]¶ Describe a FeatureGroup.
- Parameters
next_token (str) – next_token to get next page of features.
- Returns
Response dict from the service.
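The next_token parameter suggests that describe() returns features one page at a time. The following sketch collects every page, assuming the response dict carries the "FeatureDefinitions" and "NextToken" keys of the underlying DescribeFeatureGroup API; the function name is hypothetical.

```python
from typing import Any, Dict, List


def list_all_feature_definitions(feature_group: Any) -> List[Dict[str, Any]]:
    """Follow next_token until the service reports no further pages."""
    features: List[Dict[str, Any]] = []
    next_token = None
    while True:
        response = feature_group.describe(next_token=next_token)
        # Key names are assumptions based on the DescribeFeatureGroup response shape.
        features.extend(response.get("FeatureDefinitions", []))
        next_token = response.get("NextToken")
        if not next_token:
            return features
```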
-
update(feature_additions: Sequence[sagemaker.feature_store.feature_definition.FeatureDefinition]) → Dict[str, Any]¶ Update a FeatureGroup and add new features from the given feature definitions.
-
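update_feature_metadata(feature_name: str, description: str = None, parameter_additions: Sequence[sagemaker.feature_store.inputs.FeatureParameter] = None, parameter_removals: Sequence[str] = None) → Dict[str, Any]¶ Update a feature's metadata, adding or removing feature parameters.
- Parameters
feature_name (str) – name of the feature to update.
description (str) – new description of the feature.
parameter_additions (Sequence[FeatureParameter]) – feature parameters to add.
parameter_removals (Sequence[str]) – keys of the feature parameters to remove.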
- Returns
Response dict from service.
-
describe_feature_metadata(feature_name: str) → Dict[str, Any]¶ Describe feature metadata by feature name.
- Parameters
feature_name (str) – name of the feature.
- Returns
Response dict from service.
-
put_record(record: Sequence[sagemaker.feature_store.inputs.FeatureValue])¶ Put a single record in the FeatureGroup.
- Parameters
record (Sequence[FeatureValue]) – a list containing feature values.
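A record is a list with one FeatureValue per feature, and each value is passed as a string. The sketch below converts a plain dict into such a list. The helper name is hypothetical, and make_value stands in for the FeatureValue class so that the snippet runs without AWS access.

```python
from typing import Any, Callable, Dict, List


def to_record(row: Dict[str, Any], make_value: Callable[..., Any]) -> List[Any]:
    """Build one feature value per dict entry, converting each value to str."""
    return [
        make_value(feature_name=name, value_as_string=str(value))
        for name, value in row.items()
    ]
```

With the SDK installed, passing `sagemaker.feature_store.inputs.FeatureValue` as make_value yields a list that can be handed to put_record(record=...).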
-
ingest(data_frame: pandas.core.frame.DataFrame, max_workers: int = 1, max_processes: int = 1, wait: bool = True, timeout: Union[int, float] = None, profile_name: str = None) → sagemaker.feature_store.feature_group.IngestionManagerPandas¶ Ingest the content of a pandas DataFrame to feature store.
max_workers threads will be created to work on different partitions of the data_frame in parallel.
max_processes processes will be created to work on different partitions of the data_frame in parallel, each with max_workers threads.
The ingest function will attempt to ingest all records in the data frame. If wait is True and any rows fail to ingest, an exception is thrown after all records have been processed. If wait is False, a later call to wait() on the returned IngestionManagerPandas instance will throw the exception.
The zero-based indices of the rows that failed to be ingested can be found in the exception. They are also available from the failed_rows property of IngestionManagerPandas after the exception is thrown.
The profile_name argument is optional. If None is passed, the default credentials are used. This profile_name is used only in the sagemaker_featurestore_runtime client. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html for more about the default credentials.
- Parameters
data_frame (DataFrame) – data_frame to be ingested to feature store.
max_workers (int) – number of threads to be created.
max_processes (int) – number of processes to be created. Each process spawns max_workers threads.
wait (bool) – whether to wait for the ingestion to finish or not.
timeout (Union[int, float]) – concurrent.futures.TimeoutError will be raised if timeout is reached.
profile_name (str) – the credential profile to be used for PutRecord (default: None).
- Returns
An instance of IngestionManagerPandas.
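The non-blocking flow described above can be sketched as follows: start ingestion with wait=False, wait on the returned manager, and read failed_rows if an exception is raised. The function name is hypothetical, and the broad except clause is a simplification because this page does not name the exception type.

```python
from typing import Any, List


def ingest_and_collect_failures(
    feature_group: Any,
    data_frame: Any,
    max_workers: int = 4,
    max_processes: int = 1,
) -> List[int]:
    """Ingest without blocking, then return the indices of rows that failed."""
    manager = feature_group.ingest(
        data_frame=data_frame,
        max_workers=max_workers,
        max_processes=max_processes,
        wait=False,  # return immediately with an ingestion manager
    )
    try:
        manager.wait()  # raises here if any row failed to ingest
    except Exception:
        failed = list(manager.failed_rows)
        if not failed:
            raise  # no failed rows recorded, so propagate the error
        return failed
    return []
```

Returning the failed indices lets the caller retry only those rows, for example by selecting them from the original DataFrame by position.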
-
athena_query() → sagemaker.feature_store.feature_group.AthenaQuery¶ Create an AthenaQuery instance.
- Returns
An instance of AthenaQuery initialized with data catalog configurations.
-
as_hive_ddl(database: str = 'sagemaker_featurestore', table_name: str = None) → str¶ Generate Hive DDL commands to define or change structure of tables or databases in Hive.
The schema of the table is generated from the feature definitions. Columns are named after the feature names, and data types are inferred from the feature types: the Integral feature type is mapped to INT, Fractional to FLOAT, and String to STRING.
- Parameters
database – name of the database. If not set “sagemaker_featurestore” will be used.
table_name – name of the table. If not set the name of this feature group will be used.
- Returns
Generated create table DDL string.
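The type mapping described above can be illustrated in plain Python. This is only a sketch of the mapping rule, not the library's implementation or its exact DDL output; the helper name and the string labels for the feature types are assumptions.

```python
from typing import Iterable, Tuple

# Feature type label -> Hive data type, as stated in the description above.
HIVE_TYPES = {"Integral": "INT", "Fractional": "FLOAT", "String": "STRING"}


def hive_columns(features: Iterable[Tuple[str, str]]) -> str:
    """Render (feature_name, feature_type) pairs as Hive column definitions."""
    return ",\n".join(
        f"  {name} {HIVE_TYPES[feature_type]}" for name, feature_type in features
    )
```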
-
-
class
sagemaker.feature_store.feature_group.AthenaQuery(catalog: str, database: str, table_name: str, sagemaker_session: sagemaker.session.Session)¶ Bases:
object
Class to manage querying of feature store data with AWS Athena.
This class instantiates an AthenaQuery object that is used to retrieve data from feature store via standard SQL queries.
Method generated by attrs for class AthenaQuery.
-
run(query_string: str, output_location: str, kms_key: str = None, workgroup: str = None) → str¶ Execute a SQL query given a query string, output location and kms key.
This method executes the SQL query using Athena and outputs the results to output_location and returns the execution id of the query.
- Parameters
query_string – SQL query string.
output_location – S3 URI of the query result.
kms_key – KMS key id. If set, will be used to encrypt the query result file.
workgroup (str) – The name of the workgroup in which the query is being started.
- Returns
Execution id of the query.
-
wait()¶ Wait for the current query to finish.
-
get_query_execution() → Dict[str, Any]¶ Get execution status of the current query.
- Returns
Response dict from Athena.
-
as_dataframe() → pandas.core.frame.DataFrame¶ Download the result of the current query and load it into a DataFrame.
- Returns
A pandas DataFrame containing the query result.
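The AthenaQuery methods are typically used in sequence: run, wait, then as_dataframe. A minimal sketch, assuming the table_name passed to the AthenaQuery constructor is readable as an attribute; the function name and the SQL statement are hypothetical.

```python
from typing import Any


def query_to_dataframe(feature_group: Any, output_location: str, limit: int = 10) -> Any:
    """Run a SELECT against the offline store and load the result."""
    query = feature_group.athena_query()
    # table_name is assumed to be exposed as an attribute of AthenaQuery.
    sql = f'SELECT * FROM "{query.table_name}" LIMIT {int(limit)}'
    query.run(query_string=sql, output_location=output_location)
    query.wait()  # block until the current query finishes
    return query.as_dataframe()
```

Here output_location is the S3 URI where Athena writes the result file, for example a prefix in the same bucket as the offline store.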
-
-
class
sagemaker.feature_store.feature_group.IngestionManagerPandas(feature_group_name: str, sagemaker_fs_runtime_client_config: botocore.config.Config, max_workers: int = 1, max_processes: int = 1, profile_name: str = None, async_result: multiprocessing.pool.ApplyResult = None, processing_pool: pathos.multiprocessing.ProcessPool = None, failed_indices: List[int] = NOTHING)¶ Bases:
object
Class to manage the multi-threaded data ingestion process.
This class will manage the data ingestion process which is multi-threaded.
-
data_frame¶ pandas DataFrame to be ingested to the given feature group.
- Type
DataFrame
Method generated by attrs for class IngestionManagerPandas.
-
property
failed_rows¶ Get rows that failed to ingest.
- Returns
List of row indices that failed to be ingested.
-
wait(timeout=None)¶ Wait for the ingestion process to finish.
-
run(data_frame: pandas.core.frame.DataFrame, wait=True, timeout=None)¶ Start the ingestion process.
-
Feature definition¶
-
class
sagemaker.feature_store.feature_definition.FeatureDefinition(feature_name: str, feature_type: sagemaker.feature_store.feature_definition.FeatureTypeEnum)¶ Bases:
sagemaker.feature_store.inputs.Config
Feature definition.
This instantiates a Feature Definition object where FeatureDefinition is a subclass of Config.
-
feature_type¶ The type of the feature.
- Type
FeatureTypeEnum
Method generated by attrs for class FeatureDefinition.
-
-
class
sagemaker.feature_store.feature_definition.FractionalFeatureDefinition(feature_name: str)¶ Bases:
sagemaker.feature_store.feature_definition.FeatureDefinition
Fractional feature definition.
This class instantiates a FractionalFeatureDefinition object, a subclass of FeatureDefinition where the data type of the feature being defined is a Fractional.
-
feature_type¶ a FeatureTypeEnum.FRACTIONAL type.
- Type
FeatureTypeEnum
Construct an instance of FractionalFeatureDefinition.
- Parameters
feature_name (str) – the name of the feature.
-
-
class
sagemaker.feature_store.feature_definition.IntegralFeatureDefinition(feature_name: str)¶ Bases:
sagemaker.feature_store.feature_definition.FeatureDefinition
Integral feature definition.
This class instantiates an IntegralFeatureDefinition object, a subclass of FeatureDefinition where the data type of the feature being defined is Integral.
-
feature_type¶ a FeatureTypeEnum.INTEGRAL type.
- Type
FeatureTypeEnum
Construct an instance of IntegralFeatureDefinition.
- Parameters
feature_name (str) – the name of the feature.
-
-
class
sagemaker.feature_store.feature_definition.StringFeatureDefinition(feature_name: str)¶ Bases:
sagemaker.feature_store.feature_definition.FeatureDefinition
String feature definition.
This class instantiates a StringFeatureDefinition object, a subclass of FeatureDefinition where the data type of the feature being defined is a String.
-
feature_type¶ a FeatureTypeEnum.STRING type.
- Type
FeatureTypeEnum
Construct an instance of StringFeatureDefinition.
- Parameters
feature_name (str) – the name of the feature.
-
Inputs¶
-
class
sagemaker.feature_store.inputs.Config¶ Bases:
abc.ABC
Base config object for FeatureStore.
Configs must implement the to_dict method.
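The to_dict requirement can be pictured with a small stand-in hierarchy. This is not the library's code: the class names are hypothetical, and the dictionary keys are assumptions modelled on the CreateFeatureGroup request shape.

```python
import abc
from typing import Any, Dict, Optional


class ConfigSketch(abc.ABC):
    """Stand-in for the base config: subclasses must provide to_dict()."""

    @abc.abstractmethod
    def to_dict(self) -> Dict[str, Any]:
        """Return the request-ready dictionary form of this config."""


class S3StorageConfigSketch(ConfigSketch):
    """Stand-in config holding an S3 URI and an optional KMS key id."""

    def __init__(self, s3_uri: str, kms_key_id: Optional[str] = None) -> None:
        self.s3_uri = s3_uri
        self.kms_key_id = kms_key_id

    def to_dict(self) -> Dict[str, Any]:
        # Key names are illustrative assumptions; unset fields are omitted.
        result: Dict[str, Any] = {"S3Uri": self.s3_uri}
        if self.kms_key_id is not None:
            result["KmsKeyId"] = self.kms_key_id
        return result
```

Because to_dict is abstract, instantiating the base class directly raises TypeError, which is how the requirement is enforced in this sketch.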
-
class
sagemaker.feature_store.inputs.DataCatalogConfig(table_name: str = NOTHING, catalog: str = NOTHING, database: str = NOTHING)¶ Bases:
sagemaker.feature_store.inputs.Config
DataCatalogConfig for FeatureStore.
Method generated by attrs for class DataCatalogConfig.
-
class
sagemaker.feature_store.inputs.OfflineStoreConfig(s3_storage_config: sagemaker.feature_store.inputs.S3StorageConfig, disable_glue_table_creation: bool = False, data_catalog_config: sagemaker.feature_store.inputs.DataCatalogConfig = None)¶ Bases:
sagemaker.feature_store.inputs.Config
OfflineStoreConfig for FeatureStore.
-
s3_storage_config¶ configuration of S3 storage.
- Type
S3StorageConfig
-
data_catalog_config¶ configuration of the data catalog.
- Type
DataCatalogConfig
Method generated by attrs for class OfflineStoreConfig.
-
-
class
sagemaker.feature_store.inputs.OnlineStoreConfig(enable_online_store: bool = True, online_store_security_config: sagemaker.feature_store.inputs.OnlineStoreSecurityConfig = None)¶ Bases:
sagemaker.feature_store.inputs.Config
OnlineStoreConfig for FeatureStore.
-
online_store_security_config¶ configuration of security setting.
Method generated by attrs for class OnlineStoreConfig.
-
-
class
sagemaker.feature_store.inputs.OnlineStoreSecurityConfig(kms_key_id: str = NOTHING)¶ Bases:
sagemaker.feature_store.inputs.Config
OnlineStoreSecurityConfig for FeatureStore.
Method generated by attrs for class OnlineStoreSecurityConfig.
-
class
sagemaker.feature_store.inputs.S3StorageConfig(s3_uri: str, kms_key_id: str = None)¶ Bases:
sagemaker.feature_store.inputs.Config
S3StorageConfig for FeatureStore.
Method generated by attrs for class S3StorageConfig.
-
class
sagemaker.feature_store.inputs.FeatureValue(feature_name: str = None, value_as_string: str = None)¶ Bases:
sagemaker.feature_store.inputs.Config
FeatureValue for FeatureStore.
Method generated by attrs for class FeatureValue.