Coding

SWE-bench-Verified

Real GitHub issues from 12 popular Python repositories.

47 models have published a score
| # | Model | Company | Score |
|---|-------|---------|-------|
| 1 | Claude Mythos 5 | Anthropic | 95.5 |
| 2 | Claude Fable 5 | Anthropic | 95.0 |
| 3 | Claude Mythos Preview | Anthropic | 93.9 |
| 4 | Claude Opus 4.8 | Anthropic | 88.6 |
| 5 | Claude Opus 4.7 | Anthropic | 87.6 |
| 6 | Claude Opus 4.5 | Anthropic | 80.9 |
| 7 | Claude Opus 4.6 | Anthropic | 80.8 |
| 8 | Gemini 3.1 Pro | Google DeepMind | 80.6 |
| 9 | DeepSeek V4 Pro | DeepSeek | 80.6 |
| 10 | Qwen3.7-Max | Alibaba | 80.4 |
| 11 | MiniMax M2.5 | MiniMax | 80.2 |
| 12 | Kimi K2.6 | Moonshot AI | 80.2 |
| 13 | GPT-5.4 | OpenAI | 80.0 |
| 14 | GPT-5.2 | OpenAI | 80.0 |
| 15 | Claude Sonnet 4.6 | Anthropic | 79.6 |
| 16 | MiMo V2.5 Pro | Xiaomi | 78.9 |
| 17 | Qwen3.6-Plus | Alibaba | 78.8 |
| 18 | Gemini 3 Flash | Google DeepMind | 78.0 |
| 19 | MiniMax M2.7 | MiniMax | 78.0 |
| 20 | GLM-5 | Zhipu AI | 77.8 |
| 21 | GLM-5.1 | Zhipu AI | 77.8 |
| 22 | Mistral Medium 3.5 | Mistral AI | 77.6 |
| 23 | Qwen3.6-27B | Alibaba | 77.2 |
| 24 | Kimi K2.5 | Moonshot AI | 76.8 |
| 25 | Doubao Seed 2.0 Pro | ByteDance | 76.5 |
| 26 | Qwen3.5-397B-A17B | Alibaba | 76.4 |
| 27 | Gemini 3 Pro | Google DeepMind | 76.2 |
| 28 | Qwen3-Max-Thinking | Alibaba | 75.3 |
| 29 | Grok 4 | xAI | 75.0 |
| 30 | Step 3.5 Flash | StepFun | 74.4 |
| 31 | Hunyuan Hy3-preview | Tencent | 74.4 |
| 32 | Ring-2.6-1T | Ant Group | 74.0 |
| 33 | GLM-4.7 | Zhipu AI | 73.8 |
| 34 | Doubao Seed 2.0 Lite | ByteDance | 73.5 |
| 35 | Qwen3.6-35B-A3B | Alibaba | 73.4 |
| 36 | Claude Haiku 4.5 | Anthropic | 73.3 |
| 37 | DeepSeek V3.2 | DeepSeek | 73.1 |
| 38 | Devstral 2 | Mistral AI | 72.2 |
| 39 | Nemotron 3 Ultra 550B-A55B | Nvidia | 71.9 |
| 40 | Qwen3-Coder-Next | Alibaba | 70.6 |
| 41 | Qwen3-Max | Alibaba | 69.6 |
| 42 | Devstral Small 2 | Mistral AI | 68.0 |
| 43 | GLM-4.6 | Zhipu AI | 68.0 |
| 44 | GLM-4.5 | Zhipu AI | 64.2 |
| 45 | Nemotron 3 Super | Nvidia | 60.5 |
| 46 | DeepSeek R1 0528 | DeepSeek | 57.6 |
| 47 | K-EXAONE 236B-A23B | LG AI Research | 49.4 |