sagemaker.train.evaluate.llmaj_inference_benchmark

This module generates the InspectAI benchmark Python file and supporting configuration that runs inside the InspectAI container to produce inference responses for LLM-as-Judge evaluation. It also handles dataset format conversion from LLMAJ format to InspectAI format.

Functions

convert_dataset_to_inspectai_format(...)

Convert LLMAJ dataset format to InspectAI format.

generate_benchmark_files()

Generate the benchmark directory file contents.

sagemaker.train.evaluate.llmaj_inference_benchmark.convert_dataset_to_inspectai_format(dataset_content: str) → str

Convert LLMAJ dataset format to InspectAI format.

Transform each JSONL line from {"prompt": "..."} or {"query": "..."} to {"input": "...", "target": ""} as expected by InspectAI’s json_dataset loader.

Parameters:

dataset_content (str) – Raw JSONL content from the customer’s dataset. Each non-empty line must be a JSON object containing either a "prompt" or "query" field.

Returns:

Converted JSONL string in InspectAI format with one {"input": ..., "target": ""} object per line.

Return type:

str

Raises:

ValueError – If a line contains neither a "prompt" nor a "query" field.
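
The conversion described above can be sketched in plain Python. This is an illustrative stand-in, not the library's source: the name convert_sketch is hypothetical, and two behaviors are assumptions that this reference does not state, namely that blank lines are skipped and that "prompt" takes precedence when a line contains both fields.

```python
import json


def convert_sketch(dataset_content: str) -> str:
    """Hypothetical re-implementation of the documented behavior."""
    converted = []
    for line_number, line in enumerate(dataset_content.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        record = json.loads(line)
        if "prompt" in record:  # assumption: "prompt" wins if both are present
            text = record["prompt"]
        elif "query" in record:
            text = record["query"]
        else:
            raise ValueError(
                f"Line {line_number} contains neither 'prompt' nor 'query'"
            )
        # InspectAI-style record: the question goes in "input", "target" stays empty
        converted.append(json.dumps({"input": text, "target": ""}))
    return "\n".join(converted)


sample = '{"prompt": "What is 2 + 2?"}\n\n{"query": "Name a prime number."}\n'
result = convert_sketch(sample)
print(result)
```

The "target" field is left empty because the dataset carries no reference answer at this stage; the responses are scored later by the judge model.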

sagemaker.train.evaluate.llmaj_inference_benchmark.generate_benchmark_files() → dict[str, str]

Generate the benchmark directory file contents.

Returns:

Dictionary mapping each filename to its file content as a string.

Return type:

dict[str, str]
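
Because the return value is a plain filename-to-content mapping, a caller would typically write it to a directory before handing it to the container. The sketch below shows one way to do that. The helper write_benchmark_dir and the example filenames are hypothetical; the actual filenames returned by generate_benchmark_files() are not listed in this reference, so a placeholder dictionary stands in for its output.

```python
import tempfile
from pathlib import Path


def write_benchmark_dir(files: dict[str, str], out_dir: str) -> list[str]:
    """Write each filename/content pair under out_dir and return the sorted names."""
    root = Path(out_dir)
    root.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (root / name).write_text(content, encoding="utf-8")
    return sorted(files)


# Placeholder mapping standing in for generate_benchmark_files() output.
example_files = {
    "example_task.py": "# benchmark code\n",
    "example_config.json": "{}\n",
}

with tempfile.TemporaryDirectory() as tmp:
    written = write_benchmark_dir(example_files, tmp)
    # Read the files back to confirm the directory mirrors the mapping.
    round_trip = {
        name: (Path(tmp) / name).read_text(encoding="utf-8") for name in written
    }
```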