Frontier Benchmarks AI
Reasoning
MMLU-Pro
A harder version of MMLU: more difficult questions, and 10 answer options per question instead of 4.
30 models published a score
#   Model                       Company          Score
1   Claude Opus 4.7             Anthropic        91.5
2   Claude Opus 4.6             Anthropic        90.5
3   Claude Opus 4.5             Anthropic        90.0
4   Gemini 3 Pro                Google DeepMind  89.8
5   Qwen3.7-Max                 Alibaba          89.6
6   Doubao Seed 2.0 Lite        ByteDance        87.7
7   DeepSeek V4 Pro             DeepSeek         87.5
8   Kimi K2.5                   Moonshot AI      87.1
9   Grok 4                      xAI              87.0
10  Doubao Seed 2.0 Pro         ByteDance        87.0
11  Nemotron 3 Ultra 550B-A55B  Nvidia           86.8
12  Qwen3-Max-Thinking          Alibaba          85.7
13  Gemma 4 (31B dense)         Google DeepMind  85.2
14  Qwen3.6-35B-A3B             Alibaba          85.2
15  DeepSeek V3.2               DeepSeek         85.0
16  K-EXAONE 236B-A23B          LG AI Research   83.8
17  Nemotron 3 Super            Nvidia           83.7
18  EXAONE 4.5 33B              LG AI Research   83.3
19  Gemma 4 26B-A4B             Google DeepMind  82.6
20  Llama 4 Behemoth            Meta             82.2
21  Nova 2 Lite                 Amazon           80.9
22  Llama 4 Maverick            Meta             80.5
23  Nemotron 3 Nano             Nvidia           78.3
24  Mistral Large 3             Mistral AI       78.0
25  Llama 4 Scout               Meta             74.3
26  Nova Premier                Amazon           73.3
27  Gemma 4 E4B                 Google DeepMind  69.4
28  Reka Flash 3                Reka             65.0
29  Gemma 4 E2B                 Google DeepMind  60.0
30  Jamba 1.7 Large             AI21 Labs        57.7
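Because MMLU-Pro offers 10 answer options, random guessing scores lower than on a 4-option benchmark, which affects how the numbers above compare with original MMLU results. The sketch below illustrates that arithmetic. It assumes the scores are percent accuracy and that every question has exactly 10 options; the page confirms neither in detail. The function names `chance_accuracy` and `above_chance` are hypothetical helpers, not part of any benchmark tooling.

```python
def chance_accuracy(num_options: int) -> float:
    """Expected percent accuracy of uniform random guessing."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 100.0 / num_options


def above_chance(score: float, num_options: int) -> float:
    """Rescale a percent score so that chance maps to 0 and perfect to 100.

    Assumption: `score` is plain percent accuracy on questions that all
    have `num_options` choices.
    """
    baseline = chance_accuracy(num_options)
    return 100.0 * (score - baseline) / (100.0 - baseline)


mmlu_pro_chance = chance_accuracy(10)  # 10 options, as described above
mmlu_chance = chance_accuracy(4)       # 4 options in original MMLU

print(f"Chance on MMLU-Pro: {mmlu_pro_chance:.1f}%")
print(f"Chance on MMLU:     {mmlu_chance:.1f}%")
print(f"Lowest listed score, chance-adjusted: {above_chance(57.7, 10):.1f}")
```

Under these assumptions, even the lowest listed score (57.7) sits well above the 10% guessing floor.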