FeatureGroup Utilities
Utilities for working with FeatureGroups and FeatureStores.
- sagemaker.feature_store.feature_utils.get_session_from_role(region, assume_role=None)
Gets the sagemaker.session.Session for a region and, optionally, a role.
- Description:
If invoked from a session whose role lacks permissions, it can temporarily assume another role to perform certain tasks. If assume_role is not specified, it attempts to use the default SageMaker execution role to obtain a session that uses the Feature Store runtime client.
- Parameters
- Returns
- Return type
- sagemaker.feature_store.feature_utils.get_feature_group_as_dataframe(feature_group_name, athena_bucket, query='SELECT * FROM "sagemaker_featurestore"."#{table}"\n WHERE is_deleted=False ', role=None, region=None, session=None, event_time_feature_name=None, latest_ingestion=True, verbose=True, **kwargs)
Returns the data of a sagemaker.feature_store.feature_group.FeatureGroup as a pandas.DataFrame.
Examples
>>> from sagemaker.feature_store.feature_utils import get_feature_group_as_dataframe
>>>
>>> region = "eu-west-1"
>>> fg_data = get_feature_group_as_dataframe(
...     feature_group_name="feature_group",
...     athena_bucket="s3://bucket/athena_queries",
...     region=region,
...     event_time_feature_name="EventTimeId",
... )
>>>
>>> type(fg_data)
<class 'pandas.core.frame.DataFrame'>
- Description:
Runs an Athena query over a sagemaker.feature_store.feature_group.FeatureGroup in a Feature Store to retrieve its data. It needs either the sagemaker.session.Session linked to a role, or the region and/or role used to work with Feature Stores (it calls sagemaker.feature_store.feature_utils.get_session_from_role to get the session).
- Parameters
region (str) – region of the target Feature Store
feature_group_name (str) – feature group name
query (str) – query to run. By default, it takes the latest ingestion of the data that wasn't deleted. If latest_ingestion is False, it takes all the data in the feature group that wasn't deleted. The query must use the keyword "#{table}" to refer to the FeatureGroup name, e.g. 'SELECT * FROM "sagemaker_featurestore"."#{table}"'. It must not end with ';'.
athena_bucket (str) – Amazon S3 bucket for running the query
role (str) – role to be assumed to extract data from the feature store. If not specified, the default SageMaker execution role is used.
session (str) – sagemaker.session.Session used to work with the feature store. Optional; if omitted, the session is inferred from the role and region parameters.
event_time_feature_name (str) – name of the event time feature. Required only if latest_ingestion is True.
latest_ingestion (bool) – if True, it gets the data only from the latest ingestion. If False, it takes whatever is specified in the query or, if no query is specified, all the data that wasn't deleted.
verbose (bool) – if True, show messages; if False, run silently.
**kwargs (object) – keyword arguments passed to pandas.read_csv for finer control over how the data is read. For more info read: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- Returns
dataset with the data retrieved from feature group
- Return type
pandas.DataFrame
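The two constraints on the query parameter (it must contain the "#{table}" keyword and must not end with ';') can be illustrated with a small, self-contained sketch. The helper render_query and the table name below are hypothetical and are not part of the SageMaker SDK; how the library itself substitutes the keyword is not shown in this reference, so treat this only as an illustration of the documented rules.

```python
TABLE_PLACEHOLDER = "#{table}"


def render_query(template: str, table_name: str) -> str:
    """Hypothetical helper: check a query template and fill in the table name."""
    if TABLE_PLACEHOLDER not in template:
        raise ValueError('the query must refer to the table as "#{table}"')
    if template.rstrip().endswith(";"):
        raise ValueError("the query must not end with ';'")
    # Swap the keyword for the concrete table name.
    return template.replace(TABLE_PLACEHOLDER, table_name)


# A template in the same shape as the documented default query.
template = 'SELECT * FROM "sagemaker_featurestore"."#{table}" WHERE is_deleted=False'

# "my_feature_group_table" is a made-up table name used only for illustration.
sql = render_query(template, "my_feature_group_table")
print(sql)
```

A template written this way can be passed as the query argument; the check on the trailing semicolon mirrors the restriction stated in the parameter description.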
- sagemaker.feature_store.feature_utils.prepare_fg_from_dataframe_or_file(dataframe_or_path, feature_group_name, role=None, region=None, session=None, record_id='record_id', event_id='data_as_of_date', verbose=False, **kwargs)
Prepares a dataframe to create a sagemaker.feature_store.feature_group.FeatureGroup.
- Description:
Prepares a pandas.DataFrame, either read from a path to a CSV file or passed directly, to create a sagemaker.feature_store.feature_group.FeatureGroup. The data needs proper dtypes, feature names and the mandatory features (record_id, event_id). The function needs either the sagemaker.session.Session linked to a role, or the region and/or role used to work with Feature Stores (it calls sagemaker.feature_store.feature_utils.get_session_from_role to get the session). If record_id or event_id are not specified, it creates them by default with the names 'record_id' and 'data_as_of_date'.
- Parameters
feature_group_name (str) – feature group name
dataframe_or_path (str, Path, pandas.DataFrame) – pandas.DataFrame or path to the data
verbose (bool) – True to display messages, False to run silently.
record_id (str, 'record_id') – (Optional) Feature that identifies the rows. If specified, each value of that feature has to be unique. If not specified, or if record_id='record_id', a new feature is created from the index of the pandas.DataFrame.
event_id (str) – (Optional) Feature with the creation time of the data rows. If not specified, one called data_as_of_date is created with the current time.
role (str) – role used to get the session.
region (str) – region used to get the session.
session (str) – SageMaker session used to work with the feature store
**kwargs (object) – keyword arguments passed to pandas.read_csv for finer control over how the data is read. For more info read: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- Returns
FeatureGroup prepared with all its methods and feature definitions properly set
- Return type
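The default handling of record_id and event_id described for prepare_fg_from_dataframe_or_file can be pictured with a short pandas sketch. The helper add_default_identifiers is hypothetical and is not the library's implementation; it assumes the identifier is taken from the DataFrame index and that the event time is stored as the current Unix time in seconds, which this reference does not specify.

```python
import time

import pandas as pd


def add_default_identifiers(df, record_id="record_id", event_id="data_as_of_date"):
    """Hypothetical helper: return a copy of df with identifier and event-time columns."""
    out = df.copy()
    if record_id not in out.columns:
        # Assumption: build the identifier from the DataFrame index.
        out[record_id] = out.index
    elif not out[record_id].is_unique:
        # A user-supplied identifier must have one distinct value per row.
        raise ValueError(f"values in {record_id!r} must be unique")
    if event_id not in out.columns:
        # Assumption: stamp every row with the current Unix time in seconds.
        out[event_id] = float(round(time.time()))
    return out


raw = pd.DataFrame({"price": [10.5, 7.25, 3.0]})
prepared = add_default_identifiers(raw)
print(prepared.columns.tolist())
```

The sketch leaves the input DataFrame untouched and only adds columns that are missing, so a dataset that already carries its own unique identifier and event time passes through unchanged.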