Reasoning

GPQA-Diamond

Graduate-level physics, chemistry, and biology: doctoral-level questions.

61 models with published scores
| # | Model | Company | Score |
|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | 94.6 |
| 2 | GPT-5.4 Pro | OpenAI | 94.4 |
| 3 | Gemini 3.1 Pro | Google DeepMind | 94.3 |
| 4 | Claude Opus 4.7 | Anthropic | 94.2 |
| 5 | Claude Mythos 5 | Anthropic | 94.1 |
| 6 | GPT-5.5 | OpenAI | 93.6 |
| 7 | Claude Opus 4.8 | Anthropic | 93.6 |
| 8 | GPT-5.2 Pro | OpenAI | 93.2 |
| 9 | GPT-5.4 | OpenAI | 92.8 |
| 10 | GPT-5.2 | OpenAI | 92.4 |
| 11 | Qwen3.7-Max | Alibaba | 92.4 |
| 12 | Gemini 3 Pro | Google DeepMind | 91.9 |
| 13 | Claude Opus 4.6 | Anthropic | 91.3 |
| 14 | GLM-5.2 | Zhipu AI | 91.2 |
| 15 | Kimi K2.6 | Moonshot AI | 90.5 |
| 16 | Gemini 3 Flash | Google DeepMind | 90.4 |
| 17 | DeepSeek V4 Pro | DeepSeek | 90.1 |
| 18 | Claude Sonnet 4.6 | Anthropic | 89.9 |
| 19 | Doubao Seed 2.0 Pro | ByteDance | 88.9 |
| 20 | Ring-2.6-1T | Ant Group | 88.3 |
| 21 | GPT-5.4 mini | OpenAI | 88.0 |
| 22 | Grok 4 Heavy | xAI | 88.0 |
| 23 | Grok 4 | xAI | 88.0 |
| 24 | Kimi K2.5 | Moonshot AI | 87.6 |
| 25 | Qwen3-Max-Thinking | Alibaba | 87.4 |
| 26 | Hunyuan Hy3-preview | Tencent | 87.2 |
| 27 | Claude Opus 4.5 | Anthropic | 87.0 |
| 28 | Nemotron 3 Ultra 550B-A55B | Nvidia | 87.0 |
| 29 | Gemini 3.1 Flash-Lite | Google DeepMind | 86.9 |
| 30 | GLM-5.1 | Zhipu AI | 86.2 |
| 31 | Qwen3.6-35B-A3B | Alibaba | 86.0 |
| 32 | GLM-5 | Zhipu AI | 86.0 |
| 33 | DeepSeek V3.2 Speciale | DeepSeek | 85.7 |
| 34 | Gemma 4 (31B dense) | Google DeepMind | 84.3 |
| 35 | MiMo V2 Flash | Xiaomi | 83.7 |
| 36 | GLM-4.6 | Zhipu AI | 82.9 |
| 37 | GPT-5.4 nano | OpenAI | 82.8 |
| 38 | DeepSeek V3.2 | DeepSeek | 82.4 |
| 39 | Gemma 4 26B-A4B | Google DeepMind | 82.3 |
| 40 | DeepSeek R1 0528 | DeepSeek | 81.0 |
| 41 | EXAONE 4.5 33B | LG AI Research | 80.5 |
| 42 | Nova 2 Lite | Amazon | 79.6 |
| 43 | Nemotron 3 Super | Nvidia | 79.2 |
| 44 | K-EXAONE 236B-A23B | LG AI Research | 79.1 |
| 45 | Step 3.7 Flash | StepFun | 77.8 |
| 46 | Qwen3-Max | Alibaba | 76.4 |
| 47 | Magistral Medium 1.2 | Mistral AI | 76.3 |
| 48 | Llama 4 Behemoth | Meta | 73.7 |
| 49 | Nemotron 3 Nano | Nvidia | 73.0 |
| 50 | Mistral Small 4 | Mistral AI | 71.2 |
| 51 | Step-3 | StepFun | 70.0 |
| 52 | Llama 4 Maverick | Meta | 69.8 |
| 53 | Gemma 4 E4B | Google DeepMind | 58.6 |
| 54 | Llama 4 Scout | Meta | 57.2 |
| 55 | Reka Flash 3 | Reka | 52.9 |
| 56 | Yi-Lightning | 01.AI | 50.9 |
| 57 | Command A | Cohere | 50.8 |
| 58 | Nova Pro | Amazon | 46.9 |
| 59 | Mistral Large 3 | Mistral AI | 43.9 |
| 60 | Gemma 4 E2B | Google DeepMind | 43.4 |
| 61 | Jamba 1.7 Large | AI21 Labs | 39.0 |