Reasoning

MMLU

Massive Multitask Language Understanding: 57 academic subjects, ~16K questions.

12 models have published a score
#   Model             Company          Score
1   Gemini 3 Pro      Google DeepMind  91.8
2   DeepSeek R1 0528  DeepSeek         90.8
3   Nova Premier      Amazon           87.4
4   Grok 4            xAI              86.6
5   Nova Pro          Amazon           85.9
6   Llama 4 Maverick  Meta             85.5
7   Mistral Large 3   Mistral AI       85.5
8   Command A         Cohere           85.5
9   Nova Lite         Amazon           80.5
10  AFM Server        Apple            80.0
11  Llama 4 Scout     Meta             79.6
12  AFM On-Device     Apple            67.8