Frontier Benchmarks AI
Reasoning
GPQA-Diamond
Graduate-level physics, chemistry, and biology: doctoral-level questions.
61 models have published a score.
| Rank | Model | Company | Score |
|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | 94.6 |
| 2 | GPT-5.4 Pro | OpenAI | 94.4 |
| 3 | Gemini 3.1 Pro | Google DeepMind | 94.3 |
| 4 | Claude Opus 4.7 | Anthropic | 94.2 |
| 5 | Claude Mythos 5 | Anthropic | 94.1 |
| 6 | GPT-5.5 | OpenAI | 93.6 |
| 7 | Claude Opus 4.8 | Anthropic | 93.6 |
| 8 | GPT-5.2 Pro | OpenAI | 93.2 |
| 9 | GPT-5.4 | OpenAI | 92.8 |
| 10 | GPT-5.2 | OpenAI | 92.4 |
| 11 | Qwen3.7-Max | Alibaba | 92.4 |
| 12 | Gemini 3 Pro | Google DeepMind | 91.9 |
| 13 | Claude Opus 4.6 | Anthropic | 91.3 |
| 14 | GLM-5.2 | Zhipu AI | 91.2 |
| 15 | Kimi K2.6 | Moonshot AI | 90.5 |
| 16 | Gemini 3 Flash | Google DeepMind | 90.4 |
| 17 | DeepSeek V4 Pro | DeepSeek | 90.1 |
| 18 | Claude Sonnet 4.6 | Anthropic | 89.9 |
| 19 | Doubao Seed 2.0 Pro | ByteDance | 88.9 |
| 20 | Ring-2.6-1T | Ant Group | 88.3 |
| 21 | GPT-5.4 mini | OpenAI | 88.0 |
| 22 | Grok 4 Heavy | xAI | 88.0 |
| 23 | Grok 4 | xAI | 88.0 |
| 24 | Kimi K2.5 | Moonshot AI | 87.6 |
| 25 | Qwen3-Max-Thinking | Alibaba | 87.4 |
| 26 | Hunyuan Hy3-preview | Tencent | 87.2 |
| 27 | Claude Opus 4.5 | Anthropic | 87.0 |
| 28 | Nemotron 3 Ultra 550B-A55B | Nvidia | 87.0 |
| 29 | Gemini 3.1 Flash-Lite | Google DeepMind | 86.9 |
| 30 | GLM-5.1 | Zhipu AI | 86.2 |
| 31 | Qwen3.6-35B-A3B | Alibaba | 86.0 |
| 32 | GLM-5 | Zhipu AI | 86.0 |
| 33 | DeepSeek V3.2 Speciale | DeepSeek | 85.7 |
| 34 | Gemma 4 (31B dense) | Google DeepMind | 84.3 |
| 35 | MiMo V2 Flash | Xiaomi | 83.7 |
| 36 | GLM-4.6 | Zhipu AI | 82.9 |
| 37 | GPT-5.4 nano | OpenAI | 82.8 |
| 38 | DeepSeek V3.2 | DeepSeek | 82.4 |
| 39 | Gemma 4 26B-A4B | Google DeepMind | 82.3 |
| 40 | DeepSeek R1 0528 | DeepSeek | 81.0 |
| 41 | EXAONE 4.5 33B | LG AI Research | 80.5 |
| 42 | Nova 2 Lite | Amazon | 79.6 |
| 43 | Nemotron 3 Super | Nvidia | 79.2 |
| 44 | K-EXAONE 236B-A23B | LG AI Research | 79.1 |
| 45 | Step 3.7 Flash | StepFun | 77.8 |
| 46 | Qwen3-Max | Alibaba | 76.4 |
| 47 | Magistral Medium 1.2 | Mistral AI | 76.3 |
| 48 | Llama 4 Behemoth | Meta | 73.7 |
| 49 | Nemotron 3 Nano | Nvidia | 73.0 |
| 50 | Mistral Small 4 | Mistral AI | 71.2 |
| 51 | Step-3 | StepFun | 70.0 |
| 52 | Llama 4 Maverick | Meta | 69.8 |
| 53 | Gemma 4 E4B | Google DeepMind | 58.6 |
| 54 | Llama 4 Scout | Meta | 57.2 |
| 55 | Reka Flash 3 | Reka | 52.9 |
| 56 | Yi-Lightning | 01.AI | 50.9 |
| 57 | Command A | Cohere | 50.8 |
| 58 | Nova Pro | Amazon | 46.9 |
| 59 | Mistral Large 3 | Mistral AI | 43.9 |
| 60 | Gemma 4 E2B | Google DeepMind | 43.4 |
| 61 | Jamba 1.7 Large | AI21 Labs | 39.0 |