Performance & Quality

Model Drift Monitor

Track whether repeated prompt outputs change in length, refusals, correctness, formatting quality, or tone over time.

How to use this dashboard


Run the same prompt against the same model at regular intervals and log each result in this monitor. Then compare the logged answers over time for length, refusal rate, correctness, formatting quality, and tone.
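As a rough illustration of the length and refusal-rate comparison, the sketch below summarizes a batch of repeated outputs and reports the relative change in mean length against a baseline. It is not part of the dashboard: the names `RunSummary`, `summarize`, `length_change`, and the `REFUSAL_MARKERS` phrase list are all hypothetical, and the substring match is only a crude stand-in for a real refusal check.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical refusal phrases (an assumption); adjust for the model under test.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i won't")


@dataclass
class RunSummary:
    mean_length: float   # mean output length in characters
    refusal_rate: float  # fraction of outputs that look like refusals


def is_refusal(text: str) -> bool:
    """Crude heuristic: flag outputs containing a known refusal phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def summarize(outputs: list[str]) -> RunSummary:
    """Summarize one batch of outputs produced by the same prompt."""
    if not outputs:
        raise ValueError("need at least one output")
    return RunSummary(
        mean_length=mean(len(o) for o in outputs),
        refusal_rate=sum(is_refusal(o) for o in outputs) / len(outputs),
    )


def length_change(baseline: RunSummary, current: RunSummary) -> float:
    """Relative change in mean length; negative means shorter answers."""
    if baseline.mean_length == 0:
        raise ValueError("baseline mean length is zero")
    return (current.mean_length - baseline.mean_length) / baseline.mean_length
```

Calling `summarize` on the baseline batch and on each later batch gives two `RunSummary` values whose `refusal_rate` fields can be compared directly, while `length_change` expresses the length shift as a fraction of the baseline.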

Model Drift Monitor

3 records
2026-04-29 | Manual test model | Manual log template | Reasoning baseline | reasoning-001 | 0 | Not checked | Not scored | Not scored | Not scored | Baseline template | Paste repeat-test results here after running the same prompt against the same model over time.
2026-04-29 | Manual test model | Manual log template | Coding baseline | coding-001 | 0 | Not checked | Not scored | Not scored | Not scored | Baseline template | Track whether repeated coding answers become shorter, less direct, more restrictive, or less accurate.
2026-04-29 | Manual test model | Manual log template | Writing baseline | writing-001 | 0 | Not checked | Not scored | Not scored | Not scored | Baseline template | Use a consistent 1–5 scoring rubric to compare clarity, directness, and usefulness over time.
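The 1–5 rubric mentioned in the writing baseline record could be applied along the following lines. This is a minimal sketch under stated assumptions: the criteria names come from that record, but `RUBRIC`, `validate_scores`, and `score_deltas` are hypothetical helpers, and integer-only scores are an assumption rather than a rule of the dashboard.

```python
# Criteria named in the writing baseline record.
RUBRIC = ("clarity", "directness", "usefulness")


def validate_scores(scores: dict[str, int]) -> dict[str, int]:
    """Check that every rubric criterion has an integer score from 1 to 5."""
    for criterion in RUBRIC:
        value = scores.get(criterion)
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{criterion} must be an integer from 1 to 5")
    return {criterion: scores[criterion] for criterion in RUBRIC}


def score_deltas(baseline: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Per-criterion change from baseline; negative means the answer got worse."""
    base = validate_scores(baseline)
    cur = validate_scores(current)
    return {criterion: cur[criterion] - base[criterion] for criterion in RUBRIC}
```

Scoring every repeat run on the same fixed criteria keeps the deltas comparable across dates, which is the point of using one rubric consistently.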