sagemaker.train.data_mixing_config
Configuration for blending customer training data with Nova curated datasets.
Classes
DataMixingConfig | Configuration for blending customer training data with Nova curated datasets.
- class sagemaker.train.data_mixing_config.DataMixingConfig(*, customer_data_percent: float, nova_data_percentages: Dict[str, float] | None = None)
Bases: BaseModel
Configuration for blending customer training data with Nova curated datasets.
- customer_data_percent
Percentage of total training mix that is customer data (0-100).
- Type:
float
- nova_data_percentages
Optional per-category percentage distribution within the Nova data portion. Keys are category names (e.g., “en-entertainment”, “code”, “math”). Values must each be 0-100 and must sum to 100 when provided. If None, all default percentages from the recipe template are used at submission time. If provided, unspecified recipe categories are set to 0.
- Type:
Dict[str, float] | None
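The constraints on the two fields can be illustrated with a small stand-alone check. This is a minimal sketch, not the library's validation: `check_mix` is a hypothetical helper, and the exception type and the floating-point tolerance are assumptions that this page does not specify.

```python
import math
from typing import Dict, Optional


def check_mix(customer_data_percent: float,
              nova_data_percentages: Optional[Dict[str, float]] = None) -> None:
    """Hypothetical stand-in for the documented constraints; not the library's code."""
    if not 0 <= customer_data_percent <= 100:
        raise ValueError("customer_data_percent must be within 0-100")
    if nova_data_percentages is None:
        return  # recipe template defaults would apply at submission time
    for category, percent in nova_data_percentages.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"{category!r} must be within 0-100")
    total = sum(nova_data_percentages.values())
    # Assumed tolerance for floating-point sums.
    if not math.isclose(total, 100.0, abs_tol=1e-6):
        raise ValueError(f"Nova percentages sum to {total}, expected 100")


check_mix(70.0, {"code": 60.0, "math": 40.0})  # category values sum to 100
check_mix(70.0)                                # None: defaults are used
```

Passing `{"code": 60.0, "math": 30.0}` would raise in this sketch, because the category values sum to 90 rather than 100.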
- customer_data_percent: float
- classmethod from_recipe_config(config: Dict[str, Any]) -> DataMixingConfig
Reconstruct a DataMixingConfig from the serialized recipe config format.
- Parameters:
config – Dictionary in the to_recipe_config() output format, with “customer_data” containing a “percent” field and “nova_data” containing per-category entries each with a “percent” field.
- Returns:
A DataMixingConfig instance equivalent to the one that produced the config.
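To make the expected input layout concrete, the sketch below parses a dictionary of that shape into the two field values. It is an illustration only: `parse_recipe_config` is a hypothetical function rather than the classmethod itself, and mapping an empty "nova_data" section back to None is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def parse_recipe_config(
    config: Dict[str, Any],
) -> Tuple[float, Optional[Dict[str, float]]]:
    """Illustrative parser for the documented layout; not the library's code."""
    customer = float(config["customer_data"]["percent"])
    nova = {
        name: float(entry["percent"])
        for name, entry in config.get("nova_data", {}).items()
    }
    # Assumption: an empty "nova_data" mapping corresponds to
    # nova_data_percentages=None.
    return customer, (nova or None)


config = {
    "customer_data": {"percent": 70.0},
    "nova_data": {"code": {"percent": 60.0}, "math": {"percent": 40.0}},
}
customer_percent, nova_percentages = parse_recipe_config(config)
```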
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- nova_data_percentages: Dict[str, float] | None
- to_hyperparameters() -> Dict[str, str]
Serialize to flat hyperparameter format for SMTJ serverless.
Returns a dictionary of hyperparameter names to string values, using the naming convention expected by the serverless training platform:
- customer_data_percent for the customer data portion
- nova_<category>_percent for each Nova data category
When nova_data_percentages is None, only customer_data_percent is returned.
- Returns:
Dictionary with flat hyperparameter keys and string values, e.g.
{
"customer_data_percent": "70", "nova_code_percent": "30", "nova_math_percent": "20", …
}
- Return type:
Dict[str, str]
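The naming convention can be sketched with a stand-alone function. `flatten_mix` is a hypothetical name, and the number formatting is an assumption: the `g` format is used only because it renders 70.0 as "70", matching the documented example. How the library handles category names containing hyphens is not shown on this page, so the sketch uses only simple names.

```python
from typing import Dict, Optional


def flatten_mix(customer_data_percent: float,
                nova_data_percentages: Optional[Dict[str, float]] = None
                ) -> Dict[str, str]:
    """Illustrative flattening to string hyperparameters; not the library's code."""
    # Assumed formatting: "g" drops a trailing ".0" (70.0 becomes "70").
    flat = {"customer_data_percent": f"{customer_data_percent:g}"}
    for category, percent in (nova_data_percentages or {}).items():
        flat[f"nova_{category}_percent"] = f"{percent:g}"
    return flat


with_categories = flatten_mix(70.0, {"code": 60.0, "math": 40.0})
defaults_only = flatten_mix(70.0)  # only customer_data_percent is present
```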
- to_recipe_config() -> Dict[str, Any]
Serialize to the recipe configuration format.
Returns a dictionary structured for the data_mixing.sources section of the training recipe format.
- Returns:
Dictionary with structure
{
"customer_data": {"percent": <float>}, "nova_data": {"<category>": {"percent": <float>}, …}
}
When nova_data_percentages is None, nova_data will be an empty dictionary.
- Return type:
Dict[str, Any]
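A concrete instance of this structure may help. The sketch below builds it with a hypothetical `build_recipe_sources` function, not the method itself. Nesting the result under "data_mixing" and "sources" keys is an assumption about how the dotted section name maps onto a recipe dictionary.

```python
from typing import Any, Dict, Optional


def build_recipe_sources(customer_data_percent: float,
                         nova_data_percentages: Optional[Dict[str, float]] = None
                         ) -> Dict[str, Any]:
    """Illustrative builder for the documented structure; not the library's code."""
    return {
        "customer_data": {"percent": customer_data_percent},
        # None yields an empty "nova_data" dictionary, as documented.
        "nova_data": {
            category: {"percent": percent}
            for category, percent in (nova_data_percentages or {}).items()
        },
    }


sources = build_recipe_sources(70.0, {"code": 60.0, "math": 40.0})
# Assumed placement inside a larger recipe dictionary.
recipe = {"data_mixing": {"sources": sources}}
```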