- Run-level score distribution (1–5): used for the score distribution visualization and “Pass Rate”.
- Request-level criteria breakdown: used to compute a normalized score and show which criteria passed/failed.
Run-level scoring (1–5)
On a completed Test Run, the Results section shows a Score Distribution broken into 5 buckets:
- 1 — Poor
- 2 — Fair
- 3 — Good
- 4 — Great
- 5 — Perfect
- Pass Rate: computed as “share of completed requests with score ≥ 3”.
- Filtering: clicking a score bucket filters the run’s requests list to that score.
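As a minimal sketch, the run-level summary above can be computed as follows. The function name and shape are illustrative, not part of the Portal's API; only the 1–5 buckets and the "score ≥ 3" pass-rate definition come from the text:

```python
from collections import Counter

def summarize_run(scores: list[int]) -> tuple[dict[int, int], float]:
    """Bucket run-level scores (1-5) and compute the pass rate.

    Pass rate = share of completed requests with score >= 3,
    matching the definition above.
    """
    # Start with all five buckets so empty buckets still appear.
    distribution = {bucket: 0 for bucket in range(1, 6)}
    distribution.update(Counter(scores))
    passed = sum(1 for s in scores if s >= 3)
    pass_rate = passed / len(scores) if scores else 0.0
    return distribution, pass_rate

dist, rate = summarize_run([5, 4, 3, 2, 1, 3])
print(dist)  # {1: 1, 2: 1, 3: 2, 4: 1, 5: 1}
print(rate)  # 0.666... (4 of 6 requests scored >= 3)
```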
Request-level criteria (1–5 stars per criterion)
Each request in a Test Run can include a list of qualitative evaluation results. Each result contains:
- Criteria name (a human-readable label)
- Score value (1–5)
- Description (optional explanatory text)
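The three fields above can be sketched as a small data structure. The class and field names are illustrative assumptions; only the fields themselves (name, 1–5 score, optional description) come from the list above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CriteriaResult:
    """One qualitative evaluation result attached to a request.

    Field names are hypothetical; they mirror the fields listed above.
    """
    name: str                          # Criteria name (human-readable label)
    score: int                         # Score value, 1-5
    description: Optional[str] = None  # Optional explanatory text

    def __post_init__(self) -> None:
        # Enforce the documented 1-5 range.
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")

result = CriteriaResult(name="Accuracy", score=4, description="Minor factual slip")
print(result.score)  # 4
```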
Normalized “percent score” (0–100)
In the requests table, the Portal derives a normalized percent score for each request by aggregating the scores across all of its criteria.
How to use rubrics when debugging
- Start with the distribution: click into low-scoring buckets to focus your debugging.
- Use criteria tooltips: the criteria list tells you why a request scored the way it did.
- Compare responses: for any request, use the response comparison tools to see the original vs. the run output side-by-side (see Interpreting Results).
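One plausible way to derive the normalized percent score described under "Normalized 'percent score' (0–100)" is to average the criteria scores and rescale the 1–5 range onto 0–100. This is an assumption for illustration; the Portal's actual aggregation formula may differ:

```python
def percent_score(criteria_scores: list[int]) -> float:
    """Assumed normalization: map the mean criteria score from 1-5 onto 0-100.

    Under this assumption, an all-1 request maps to 0% and an all-5
    request maps to 100%. The Portal's real formula may differ.
    """
    if not criteria_scores:
        return 0.0
    mean = sum(criteria_scores) / len(criteria_scores)
    return (mean - 1) / 4 * 100

print(percent_score([5, 5, 5]))  # 100.0
print(percent_score([3, 4]))     # 62.5
```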