FeatureGroup Utilities

Utilities for working with FeatureGroups and FeatureStores.

sagemaker.feature_store.feature_utils.get_session_from_role(region, assume_role=None)

Gets a sagemaker.session.Session from a region and, optionally, a role.

Description:

If the caller's current role lacks the required permissions, this function can temporarily assume another role to perform the task. If assume_role is not specified, it attempts to use the default SageMaker execution role to build the session, which is then used with the Feature Store runtime client.

Parameters
  • assume_role (str) – (Optional) role name to be assumed

  • region (str) – region name

Returns

sagemaker.session.Session

Return type

Session
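
A minimal usage sketch, based only on the signature documented above. The wrapper name open_feature_store_session is hypothetical and not part of the SDK. The import is deferred so the snippet can be loaded without the SageMaker SDK installed; actually calling it assumes the SDK is available and valid AWS credentials are configured.

```python
from typing import Optional


def open_feature_store_session(region: str, role: Optional[str] = None):
    """Return a sagemaker.session.Session for the given region.

    Hypothetical convenience wrapper: `role` is forwarded as `assume_role`.
    When it is None, the default SageMaker execution role is used, as
    described in the documentation above.
    """
    # Deferred import: the SageMaker SDK is only required when this runs.
    from sagemaker.feature_store.feature_utils import get_session_from_role

    return get_session_from_role(region=region, assume_role=role)
```

Calling open_feature_store_session("eu-west-1") would then return the session for that region under the default execution role.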

sagemaker.feature_store.feature_utils.get_feature_group_as_dataframe(feature_group_name, athena_bucket, query='SELECT * FROM "sagemaker_featurestore"."#{table}"\n                    WHERE is_deleted=False ', role=None, region=None, session=None, event_time_feature_name=None, latest_ingestion=True, verbose=True, **kwargs)

Gets a sagemaker.feature_store.feature_group.FeatureGroup as a pandas.DataFrame.

Examples

>>> from sagemaker.feature_store.feature_utils import get_feature_group_as_dataframe
>>>
>>> region = "eu-west-1"
>>> fg_data = get_feature_group_as_dataframe(feature_group_name="feature_group",
...                                          athena_bucket="s3://bucket/athena_queries",
...                                          region=region,
...                                          event_time_feature_name="EventTimeId"
...                                          )
>>>
>>> type(fg_data)
<class 'pandas.core.frame.DataFrame'>

Description:

Runs an Athena query over a sagemaker.feature_store.feature_group.FeatureGroup in a Feature Store to retrieve its data. It needs either the sagemaker.session.Session linked to a role, or the region and/or role used to work with Feature Stores (in that case it calls the function sagemaker.feature_store.feature_utils.get_session_from_role to get the session).

Parameters
  • region (str) – region of the target Feature Store

  • feature_group_name (str) – feature group name

  • query (str) – query to run. By default, it takes the latest ingestion of data that was not deleted. If latest_ingestion is False, it takes all the data in the feature group that was not deleted. The query must use the keyword "#{table}" to refer to the FeatureGroup name, e.g.: 'SELECT * FROM "sagemaker_featurestore"."#{table}"'. It must not end with ';'.

  • athena_bucket (str) – Amazon S3 bucket for running the query

  • role (str) – role to be assumed to extract data from the feature store. If not specified, the default SageMaker execution role is used.

  • session (sagemaker.session.Session) – SageMaker session used to work with the feature store. Optional: if omitted, the session is inferred from the role and region parameters.

  • event_time_feature_name (str) – name of the event time feature. Required only if latest_ingestion is True.

  • latest_ingestion (bool) – if True, only the data from the latest ingestion is retrieved. If False, the query is run as specified or, if no query is given, all the data that was not deleted is retrieved.

  • verbose (bool) – if True, messages are shown; if False, the function is silent.

  • **kwargs (object) – keyword arguments passed to pandas.read_csv for finer control over how the data is read. For more info read: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Returns

dataset with the data retrieved from the feature group

Return type

pandas.DataFrame
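
The rules for the query parameter (the "#{table}" placeholder, no trailing ';', and the extra filter when latest_ingestion is True) can be sketched in plain Python. This is an illustration under stated assumptions, not the SDK's implementation: build_query and DEFAULT_QUERY are hypothetical names, the default query is a whitespace-simplified copy of the one in the signature, and the MAX(...) subquery is one plausible way to select the latest ingestion. It also assumes the query already contains a WHERE clause, as the default does.

```python
# Simplified copy of the default query from the signature above.
DEFAULT_QUERY = 'SELECT * FROM "sagemaker_featurestore"."#{table}" WHERE is_deleted=False '


def build_query(table, query=DEFAULT_QUERY, event_time_feature_name=None,
                latest_ingestion=True):
    """Render a query template for a feature group table (illustrative only)."""
    # The documentation states that the query must not end with ';'.
    if query.rstrip().endswith(";"):
        raise ValueError("query must not end with ';'")
    if latest_ingestion:
        # The event time feature is required to identify the latest ingestion.
        if event_time_feature_name is None:
            raise ValueError(
                "event_time_feature_name is required when latest_ingestion=True"
            )
        # Assumed filter: keep only rows carrying the newest event time.
        query += (
            f"AND {event_time_feature_name}=(SELECT MAX({event_time_feature_name}) "
            'FROM "sagemaker_featurestore"."#{table}")'
        )
    # Substitute every occurrence of the placeholder with the table name.
    return query.replace("#{table}", table)


print(build_query("my_table", event_time_feature_name="EventTimeId"))
```

With latest_ingestion=False the template is returned with only the placeholder substituted, which mirrors the documented behaviour of running the query as given.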

sagemaker.feature_store.feature_utils.prepare_fg_from_dataframe_or_file(dataframe_or_path, feature_group_name, role=None, region=None, session=None, record_id='record_id', event_id='data_as_of_date', verbose=False, **kwargs)

Prepares a dataframe to create a sagemaker.feature_store.feature_group.FeatureGroup

Description:

Prepares a pandas.DataFrame, either passed directly or read from a path to a CSV file, to create a sagemaker.feature_store.feature_group.FeatureGroup. The data needs proper dtypes, feature names and the mandatory features (record_id, event_id). It needs either the sagemaker.session.Session linked to a role, or the region and/or role used to work with Feature Stores (in that case it calls the function sagemaker.feature_store.feature_utils.get_session_from_role to get the session). If record_id or event_id are not specified, default ones are created with the names 'record_id' and 'data_as_of_date'.

Parameters
  • feature_group_name (str) – feature group name

  • dataframe_or_path (str, Path, pandas.DataFrame) – pandas.DataFrame or path to the data

  • verbose (bool) – True for displaying messages, False for silent method.

  • record_id (str, 'record_id') – (Optional) Feature identifier of the rows. If specified, each value of that feature has to be unique. If not specified, or if record_id='record_id', a new feature is created from the index of the pandas.DataFrame.

  • event_id (str) – (Optional) Feature with the creation time of the data rows. If not specified, a feature called data_as_of_date is created with the current time.

  • role (str) – role used to get the session.

  • region (str) – region used to get the session.

  • session (sagemaker.session.Session) – SageMaker session used to work with the feature store

  • **kwargs (object) – keyword arguments passed to pandas.read_csv for finer control over how the data is read. For more info read: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Returns

FeatureGroup prepared with all its methods and definitions properly set up

Return type

sagemaker.feature_store.feature_group.FeatureGroup
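
The default handling of record_id and event_id described above can be sketched without the SDK. This is a simplified illustration, not the library's code: add_default_identifiers is a hypothetical helper, rows are plain dictionaries standing in for a pandas.DataFrame, and the ISO-8601 UTC timestamp format is an assumption.

```python
from datetime import datetime, timezone


def add_default_identifiers(rows, record_id="record_id", event_id="data_as_of_date"):
    """Return copies of `rows` with identifier and event-time features filled in."""
    # Assumed timestamp format: ISO-8601 in UTC, shared by all rows of this call.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    prepared = []
    for index, row in enumerate(rows):
        new_row = dict(row)  # copy so the caller's data is left untouched
        # Missing record identifier: derive it from the row position (the index).
        new_row.setdefault(record_id, index)
        # Missing event time: stamp the row with the current time.
        new_row.setdefault(event_id, now)
        prepared.append(new_row)
    # The documentation requires each record identifier value to be unique.
    ids = [r[record_id] for r in prepared]
    if len(set(ids)) != len(ids):
        raise ValueError(f"values of '{record_id}' must be unique")
    return prepared


rows = [{"price": 1.5}, {"price": 2.0}]
for row in add_default_identifiers(rows):
    print(row)
```

Rows that already carry the named features keep their values; only missing features are filled in, and duplicate identifiers are rejected.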