Performance & Quality

Frontier Leaderboard

A leaderboard framework for frontier models, built from public human-preference and hard-reasoning benchmark signals.

How to use this dashboard

Use this leaderboard to compare frontier-model quality signals across public benchmark, arena, and model-catalog references.

Frontier Leaderboard

40 records
Note for all 40 rows: Free proxy from public model metadata. Replace with sourced HLE/GPQA/LMArena scores when exact rows are available.

Rank | Model | Provider | Score | Source | Signal | Family | Modalities
1 | Google Gemini Pro Latest | ~Google | 95 | Public benchmark slot | Reasoning signal | Gemini | audio, file, image, text, video
2 | Google Gemini Flash Latest | ~Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, file, audio, video
3 | Google: Gemini 3.1 Flash Lite Preview | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, video, file, audio
4 | Google: Gemini 3.1 Pro Preview Custom Tools | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, audio, image, video, file
5 | Google: Gemini 3.1 Pro Preview | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | audio, file, image, text, video
6 | Google: Gemini 3 Flash Preview | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, file, audio, video
7 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, file, audio, video
8 | Google: Gemini 2.5 Flash Lite | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, file, audio, video
9 | Google: Gemini 2.5 Flash | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | file, image, text, audio, video
10 | Google: Gemini 2.5 Pro | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, file, audio, video
11 | Google: Gemini 2.5 Pro Preview 06-05 | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | file, image, text, audio
12 | Google: Gemini 2.5 Pro Preview 05-06 | Google | 95 | Public benchmark slot | Reasoning signal | Gemini | text, image, file, audio, video
13 | Anthropic Claude Sonnet Latest | ~Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image
14 | OpenAI GPT Latest | ~OpenAI | 91 | Public benchmark slot | Reasoning signal | GPT | file, image, text
15 | Qwen: Qwen3.5 Plus 2026-04-20 | Qwen | 91 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
16 | Qwen: Qwen3.6 Flash | Qwen | 91 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
17 | OpenAI: GPT-5.5 Pro | OpenAI | 91 | Public benchmark slot | Reasoning signal | GPT | file, image, text
18 | OpenAI: GPT-5.5 | OpenAI | 91 | Public benchmark slot | Reasoning signal | GPT | file, image, text
19 | Anthropic: Claude Opus Latest | ~Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image
20 | Anthropic: Claude Opus 4.7 | Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image
21 | Anthropic: Claude Opus 4.6 (Fast) | Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image
22 | Qwen: Qwen3.6 Plus | Qwen | 91 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
23 | xAI: Grok 4.20 Multi-Agent | xAI | 91 | Public benchmark slot | Reasoning signal | Grok | text, image, file
24 | xAI: Grok 4.20 | xAI | 91 | Public benchmark slot | Reasoning signal | Grok | text, image, file
25 | OpenAI: GPT-5.4 Pro | OpenAI | 91 | Public benchmark slot | Reasoning signal | GPT | text, image, file
26 | OpenAI: GPT-5.4 | OpenAI | 91 | Public benchmark slot | Reasoning signal | GPT | text, image, file
27 | Qwen: Qwen3.5-Flash | Qwen | 91 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
28 | Anthropic: Claude Sonnet 4.6 | Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image
29 | Qwen: Qwen3.5 Plus 2026-02-15 | Qwen | 91 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
30 | Anthropic: Claude Opus 4.6 | Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image
31 | xAI: Grok 4.1 Fast | xAI | 91 | Public benchmark slot | Reasoning signal | Grok | text, image, file
32 | Anthropic: Claude Sonnet 4.5 | Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | text, image, file
33 | xAI: Grok 4 Fast | xAI | 91 | Public benchmark slot | Reasoning signal | Grok | text, image, file
34 | Anthropic: Claude Sonnet 4 | Anthropic | 91 | Public benchmark slot | Reasoning signal | Claude | image, text, file
35 | OpenAI: GPT-4.1 | OpenAI | 91 | Public benchmark slot | Reasoning signal | GPT | image, text, file
36 | Anthropic Claude Haiku Latest | ~Anthropic | 85 | Public benchmark slot | Reasoning signal | Claude | image, text
37 | OpenAI GPT Mini Latest | ~OpenAI | 85 | Public benchmark slot | Reasoning signal | GPT | file, image, text
38 | MoonshotAI Kimi Latest | ~MoonshotAI | 85 | Public benchmark slot | Reasoning signal | Kimi / Moonshot | text, image
39 | Qwen: Qwen3.6 35B A3B | Qwen | 85 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
40 | Qwen: Qwen3.6 27B | Qwen | 85 | Public benchmark slot | Reasoning signal | Qwen | text, image, video
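
The note on these rows says the proxy scores should be replaced with sourced HLE/GPQA/LMArena scores once exact rows are available. A minimal sketch of that replacement step might look like the following. The `Record` type, the `apply_sourced_scores` helper, the model names, and the sourced value are all hypothetical placeholders for illustration; they are not part of the dashboard and not real benchmark results.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """One leaderboard row (hypothetical shape, for illustration only)."""
    model: str
    provider: str
    score: float
    source: str = "proxy"  # "proxy" = derived from public model metadata


def apply_sourced_scores(
    records: List[Record],
    sourced: Dict[str, Tuple[float, str]],
) -> List[Record]:
    """Swap in sourced scores where available, then re-rank by score.

    `sourced` maps a model name to (score, source label). Rows without
    a sourced entry keep their proxy score and "proxy" label.
    """
    updated = []
    for record in records:
        if record.model in sourced:
            score, label = sourced[record.model]
            updated.append(replace(record, score=score, source=label))
        else:
            updated.append(record)
    # sorted() is stable, so tied rows keep their original relative order.
    return sorted(updated, key=lambda r: r.score, reverse=True)


# Placeholder rows using the three proxy tiers seen in the table (95, 91, 85).
records = [
    Record("example-model-a", "ExampleProvider", 95),
    Record("example-model-b", "ExampleProvider", 91),
    Record("example-model-c", "ExampleProvider", 85),
]

# Made-up sourced value; a real one would come from a published benchmark row.
sourced = {"example-model-b": (97.0, "sourced (placeholder)")}

for rank, row in enumerate(apply_sourced_scores(records, sourced), start=1):
    print(rank, row.model, row.score, row.source)
```

Keeping the `source` label on each row makes it visible which ranks still rest on proxy values and which rest on sourced benchmark scores, so mixed rankings can be read with appropriate caution.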