Reasoning

MMLU-Pro

An upgraded version of MMLU with harder questions and 10 answer options per question instead of 4.

30 models have a published score.
| Rank | Model | Company | Score |
|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 91.5 |
| 2 | Claude Opus 4.6 | Anthropic | 90.5 |
| 3 | Claude Opus 4.5 | Anthropic | 90.0 |
| 4 | Gemini 3 Pro | Google DeepMind | 89.8 |
| 5 | Qwen3.7-Max | Alibaba | 89.6 |
| 6 | Doubao Seed 2.0 Lite | ByteDance | 87.7 |
| 7 | DeepSeek V4 Pro | DeepSeek | 87.5 |
| 8 | Kimi K2.5 | Moonshot AI | 87.1 |
| 9 | Grok 4 | xAI | 87.0 |
| 10 | Doubao Seed 2.0 Pro | ByteDance | 87.0 |
| 11 | Nemotron 3 Ultra 550B-A55B | Nvidia | 86.8 |
| 12 | Qwen3-Max-Thinking | Alibaba | 85.7 |
| 13 | Gemma 4 (31B dense) | Google DeepMind | 85.2 |
| 14 | Qwen3.6-35B-A3B | Alibaba | 85.2 |
| 15 | DeepSeek V3.2 | DeepSeek | 85.0 |
| 16 | K-EXAONE 236B-A23B | LG AI Research | 83.8 |
| 17 | Nemotron 3 Super | Nvidia | 83.7 |
| 18 | EXAONE 4.5 33B | LG AI Research | 83.3 |
| 19 | Gemma 4 26B-A4B | Google DeepMind | 82.6 |
| 20 | Llama 4 Behemoth | Meta | 82.2 |
| 21 | Nova 2 Lite | Amazon | 80.9 |
| 22 | Llama 4 Maverick | Meta | 80.5 |
| 23 | Nemotron 3 Nano | Nvidia | 78.3 |
| 24 | Mistral Large 3 | Mistral AI | 78.0 |
| 25 | Llama 4 Scout | Meta | 74.3 |
| 26 | Nova Premier | Amazon | 73.3 |
| 27 | Gemma 4 E4B | Google DeepMind | 69.4 |
| 28 | Reka Flash 3 | Reka | 65.0 |
| 29 | Gemma 4 E2B | Google DeepMind | 60.0 |
| 30 | Jamba 1.7 Large | AI21 Labs | 57.7 |