Reasoning

MMLU

Massive Multitask Language Understanding: 57 academic subjects, ~16K questions.

12 models with a published score

#	Model	Company	Score (%)
1	Gemini 3 Pro	Google DeepMind	91.8
2	DeepSeek R1 0528	DeepSeek	90.8
3	Nova Premier	Amazon	87.4
4	Grok 4	xAI	86.6
5	Nova Pro	Amazon	85.9
6	Llama 4 Maverick	Meta	85.5
7	Mistral Large 3	Mistral AI	85.5
8	Command A	Cohere	85.5
9	Nova Lite	Amazon	80.5
10	AFM Server	Apple	80.0
11	Llama 4 Scout	Meta	79.6
12	AFM On-Device	Apple	67.8
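MMLU is a four-option multiple-choice benchmark, so each score above is an accuracy percentage. The sketch below shows how such a number could be computed from per-question results. Everything in it is an assumption: the `Result` record, its field names, and the toy data are hypothetical and are not a format defined by MMLU or used by any of the listed vendors. Published reports do not all average the same way. Some weight every question equally (micro average) and others weight every subject equally (macro average), so the sketch computes both to show how they can differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-question record (not an official MMLU format).
    subject: str
    gold: str  # correct option letter, "A".."D"
    pred: str  # option letter chosen by the model


def micro_accuracy(results):
    """Percent of all questions answered correctly (each question weighs the same)."""
    correct = sum(r.pred == r.gold for r in results)
    return 100.0 * correct / len(results)


def macro_accuracy(results):
    """Mean of per-subject accuracies (each subject weighs the same)."""
    by_subject = defaultdict(list)
    for r in results:
        by_subject[r.subject].append(r.pred == r.gold)
    per_subject = [sum(v) / len(v) for v in by_subject.values()]
    return 100.0 * sum(per_subject) / len(per_subject)


# Toy data for illustration only, not real benchmark results.
toy = [
    Result("astronomy", "A", "A"),
    Result("astronomy", "B", "C"),
    Result("virology", "D", "D"),
    Result("virology", "C", "C"),
    Result("virology", "A", "A"),
]

print(round(micro_accuracy(toy), 1))  # 80.0
print(round(macro_accuracy(toy), 1))  # 75.0
```

With unevenly sized subjects the two averages diverge (80.0 vs 75.0 on the toy data), which is one reason scores from different sources are not always directly comparable.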