Apple Intelligence Foundation Model Server. PT-MoE transformer. Multilingual, multimodal. (MGSM removed: Apple only publishes regression % rel...
3B params, optimized for Apple Silicon. 2-bit QAT. Outperforms Qwen-2.5-3B, Gemma-3-4B. (MMLU corrected 67.9→67.8 per arXiv 2507.13575; MGSM rem...