Ollama has released a performance-focused update for macOS that uses Apple’s MLX framework to speed up local AI models on Apple silicon hardware.
Key points
- Ollama version 0.19 delivers a 1.6x increase in prompt prefill speeds and nearly doubles response generation rates.
- The update utilizes Apple’s GPU Neural Accelerators, providing the most substantial performance gains for Macs equipped with M5-series chips.
- Improved memory management enhances responsiveness for AI-powered coding agents and personal assistants during extended sessions.
- The current preview release requires a Mac with at least 32GB of unified memory.
- Initial support is limited to Alibaba’s Qwen3.5 model, with plans to expand compatibility to additional AI models in future updates.
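Readers who want to check the prefill and generation figures on their own hardware can derive both rates from the timing fields Ollama returns with a completed response. The sketch below is not part of the release. It assumes a non-streaming response dictionary containing `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration`, with durations in nanoseconds. The helper names `throughput` and `speedup` are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


def throughput(response: dict) -> dict:
    """Return prefill and generation rates in tokens per second.

    `response` is assumed to be the parsed JSON of a finished,
    non-streaming Ollama response. Missing or zero durations yield 0.0.
    """

    def rate(count_key: str, duration_key: str) -> float:
        count = response.get(count_key, 0)
        duration_ns = response.get(duration_key, 0)
        if not duration_ns:
            return 0.0
        return count * NS_PER_SECOND / duration_ns

    return {
        # Prompt prefill: tokens of the prompt processed per second.
        "prefill_tps": rate("prompt_eval_count", "prompt_eval_duration"),
        # Response generation: output tokens produced per second.
        "generation_tps": rate("eval_count", "eval_duration"),
    }


def speedup(before_tps: float, after_tps: float) -> float:
    """Ratio of the new rate to the old one (1.6 means 1.6x faster)."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return after_tps / before_tps
```

Running the same prompt before and after upgrading, then passing the two `prefill_tps` values (or the two `generation_tps` values) to `speedup`, gives a ratio that can be compared against the figures quoted above. Results will vary with the chip, the model, and the prompt length.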
This update lowers the barrier for developers and power users to run sophisticated AI models locally on high-end Apple hardware without relying on cloud-based services. By optimizing for Apple silicon, Ollama makes private, high-speed AI tools more practical for resource-intensive tasks like coding and data analysis.