Performance & Quality

Model Drift Monitor

Track whether repeated prompt outputs change in length, refusals, correctness, formatting quality, or tone over time.

How to use this dashboard

Use this monitor to compare repeated model answers over time for length, refusal rate, correctness, formatting quality, and tone.
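As a rough illustration of that comparison, the sketch below summarizes two batches of repeated answers and reports how output length and refusal rate changed between them. Everything in it is an assumption made for illustration, not part of this dashboard: the `RunRecord` fields, the `REFUSAL_MARKERS` phrases, and the choice of word count as the length measure are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical refusal phrases; tune these for the model under test.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i am unable")


@dataclass
class RunRecord:
    date: str       # ISO date of the run, e.g. "2026-04-29"
    prompt_id: str  # e.g. "reasoning-001"
    output: str     # raw model answer


def is_refusal(output: str) -> bool:
    """Crude keyword check; borderline answers still need manual review."""
    text = output.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def summarize(records: list[RunRecord]) -> dict[str, float]:
    """Return mean output length in words and the fraction of refusals."""
    if not records:
        raise ValueError("no records to summarize")
    return {
        "mean_words": mean(len(r.output.split()) for r in records),
        "refusal_rate": sum(is_refusal(r.output) for r in records) / len(records),
    }


def drift(baseline: list[RunRecord], current: list[RunRecord]) -> dict[str, float]:
    """Return current minus baseline for each summary metric."""
    base, cur = summarize(baseline), summarize(current)
    return {key: cur[key] - base[key] for key in base}
```

A negative `mean_words` value would suggest answers are getting shorter, and a positive `refusal_rate` value would suggest more refusals. Correctness, formatting quality, and tone are not covered here and would still need scoring by hand.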

3 records


2026-04-29	Manual test model	Manual log template	Reasoning baseline	reasoning-001	Not checked	Not scored	Not scored	Not scored	Baseline template	Paste repeat-test results here after running the same prompt against the same model over time.
2026-04-29	Manual test model	Manual log template	Coding baseline	coding-001	Not checked	Not scored	Not scored	Not scored	Baseline template	Track whether repeated coding answers become shorter, less direct, more restrictive, or less accurate.
2026-04-29	Manual test model	Manual log template	Writing baseline	writing-001	Not checked	Not scored	Not scored	Not scored	Baseline template	Use a consistent 1–5 scoring rubric to compare clarity, directness, and usefulness over time.
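One possible way to keep the 1–5 rubric consistent between runs is to validate every score sheet before comparing it. The sketch below assumes the three criteria named in the writing baseline note (clarity, directness, usefulness); the function names and the dictionary layout are hypothetical.

```python
RUBRIC_CRITERIA = ("clarity", "directness", "usefulness")


def validate_scores(scores: dict[str, int]) -> dict[str, int]:
    """Check that every criterion has an integer score from 1 to 5."""
    missing = [c for c in RUBRIC_CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in RUBRIC_CRITERIA:
        value = scores[criterion]
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{criterion} must be an integer from 1 to 5")
    # Return only the rubric criteria, in a fixed order.
    return {c: scores[c] for c in RUBRIC_CRITERIA}


def score_change(earlier: dict[str, int], later: dict[str, int]) -> dict[str, int]:
    """Return later minus earlier for each criterion; negative means a drop."""
    before, after = validate_scores(earlier), validate_scores(later)
    return {c: after[c] - before[c] for c in RUBRIC_CRITERIA}
```

Rejecting out-of-range or missing scores up front keeps a single mistyped entry from showing up later as apparent drift.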