Frontier Benchmarks AI
Coding
SWE-bench-Verified
Real GitHub issues from 12 popular Python repos.
47 models have a published score.
| # | Model | Company | Score |
|---|-------|---------|-------|
| 1 | Claude Mythos 5 | Anthropic | 95.5 |
| 2 | Claude Fable 5 | Anthropic | 95.0 |
| 3 | Claude Mythos Preview | Anthropic | 93.9 |
| 4 | Claude Opus 4.8 | Anthropic | 88.6 |
| 5 | Claude Opus 4.7 | Anthropic | 87.6 |
| 6 | Claude Opus 4.5 | Anthropic | 80.9 |
| 7 | Claude Opus 4.6 | Anthropic | 80.8 |
| 8 | Gemini 3.1 Pro | Google DeepMind | 80.6 |
| 9 | DeepSeek V4 Pro | DeepSeek | 80.6 |
| 10 | Qwen3.7-Max | Alibaba | 80.4 |
| 11 | MiniMax M2.5 | MiniMax | 80.2 |
| 12 | Kimi K2.6 | Moonshot AI | 80.2 |
| 13 | GPT-5.4 | OpenAI | 80.0 |
| 14 | GPT-5.2 | OpenAI | 80.0 |
| 15 | Claude Sonnet 4.6 | Anthropic | 79.6 |
| 16 | MiMo V2.5 Pro | Xiaomi | 78.9 |
| 17 | Qwen3.6-Plus | Alibaba | 78.8 |
| 18 | Gemini 3 Flash | Google DeepMind | 78.0 |
| 19 | MiniMax M2.7 | MiniMax | 78.0 |
| 20 | GLM-5 | Zhipu AI | 77.8 |
| 21 | GLM-5.1 | Zhipu AI | 77.8 |
| 22 | Mistral Medium 3.5 | Mistral AI | 77.6 |
| 23 | Qwen3.6-27B | Alibaba | 77.2 |
| 24 | Kimi K2.5 | Moonshot AI | 76.8 |
| 25 | Doubao Seed 2.0 Pro | ByteDance | 76.5 |
| 26 | Qwen3.5-397B-A17B | Alibaba | 76.4 |
| 27 | Gemini 3 Pro | Google DeepMind | 76.2 |
| 28 | Qwen3-Max-Thinking | Alibaba | 75.3 |
| 29 | Grok 4 | xAI | 75.0 |
| 30 | Step 3.5 Flash | StepFun | 74.4 |
| 31 | Hunyuan Hy3-preview | Tencent | 74.4 |
| 32 | Ring-2.6-1T | Ant Group | 74.0 |
| 33 | GLM-4.7 | Zhipu AI | 73.8 |
| 34 | Doubao Seed 2.0 Lite | ByteDance | 73.5 |
| 35 | Qwen3.6-35B-A3B | Alibaba | 73.4 |
| 36 | Claude Haiku 4.5 | Anthropic | 73.3 |
| 37 | DeepSeek V3.2 | DeepSeek | 73.1 |
| 38 | Devstral 2 | Mistral AI | 72.2 |
| 39 | Nemotron 3 Ultra 550B-A55B | Nvidia | 71.9 |
| 40 | Qwen3-Coder-Next | Alibaba | 70.6 |
| 41 | Qwen3-Max | Alibaba | 69.6 |
| 42 | Devstral Small 2 | Mistral AI | 68.0 |
| 43 | GLM-4.6 | Zhipu AI | 68.0 |
| 44 | GLM-4.5 | Zhipu AI | 64.2 |
| 45 | Nemotron 3 Super | Nvidia | 60.5 |
| 46 | DeepSeek R1 0528 | DeepSeek | 57.6 |
| 47 | K-EXAONE 236B-A23B | LG AI Research | 49.4 |