Once a Test Run is complete, the Portal gives you three layers of information:
- Details: what configuration was used + timestamps
- Results: macro metrics and score distribution
- Requests: per-request scores, criteria breakdowns, and response comparisons
Run details (what was executed)
On the Test Run page, the Details section includes:
- Date created / completed
- Status (e.g. COMPLETED, ERROR)
- Description
- Configuration
  - The UI lets you open a read-only config panel to inspect the run’s config (including the model).
Results (macro view)
The Results section includes:
- # Requests: total requests in the run
- Completed % / Error %: how many requests completed successfully vs errored
- Response time percentiles
  - The UI shows a headline percentile and exposes additional percentiles (e.g. p50/p90/p95) in a tooltip.
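The percentiles shown in the tooltip can be sketched with a simple nearest-rank computation. This is illustrative only: the sample latencies are made up, and the Portal's exact percentile method is not documented here.

```python
# Illustrative nearest-rank percentile over per-request response times.
# Sample data is hypothetical; the Portal computes these for you.
def percentile(values, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

response_times_ms = [120, 95, 310, 180, 250, 90, 400, 150, 220, 175]
for p in (50, 90, 95):
    print(f"p{p}: {percentile(response_times_ms, p)} ms")
```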
Score distribution (1–5)
The score distribution shows the run broken into five buckets:
- 1 — Poor
- 2 — Fair
- 3 — Good
- 4 — Great
- 5 — Perfect
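Conceptually, the distribution is just per-request scores counted into these five buckets. A minimal sketch, assuming integer scores 1–5 (the sample scores are hypothetical):

```python
# Bucket per-request scores (1-5) into the five distribution buckets.
# Labels follow the Portal's scale; sample scores are made up.
from collections import Counter

LABELS = {1: "Poor", 2: "Fair", 3: "Good", 4: "Great", 5: "Perfect"}

def score_distribution(scores):
    """Return each bucket's share of the run as a percentage."""
    counts = Counter(scores)
    total = len(scores)
    return {
        f"{s} - {LABELS[s]}": round(100 * counts.get(s, 0) / total, 1)
        for s in range(1, 6)
    }

scores = [5, 4, 4, 3, 5, 2, 4, 5, 1, 3]
print(score_distribution(scores))
```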
Requests table (debugging view)
The requests table is where you debug individual cases. You’ll see:
- Status for each request execution
- Tags associated with the underlying test request
- Score
  - Hover to see the criteria breakdown (criteria name + 1–5 star value + optional description)
- Time (per-request response time, when available)
- Actions
  - View: open the original request for deeper inspection
  - Responses: compare the baseline/original response against the test-run response side-by-side
  - Compare: open a comparison modal for the same request across multiple runs
Comparing runs
Maitai supports two comparison workflows in the Portal:
- Compare Runs (test-set level): select multiple completed runs from the Test Set page and open a table that shows request-by-request scores across runs. You can also open a “Responses” comparison to view multiple run outputs side-by-side.
- Compare (single request across runs): from a Test Run’s requests table, open a modal that lists how that specific request performed across different runs.