Most of your phone’s AI features quietly rely on a server somewhere. Ask a question and you get an answer, but behind the scenes your data zips up to the cloud. Google wants to change that, and Gemma 4 on Android is a big part of the effort.
Google DeepMind launched Gemma 4 in partnership with Arm last week, optimizing the model to run directly on Arm-based Android devices. According to Google, Gemma 4 on Android is up to 4 times faster than previous generations and uses 60% less battery. Smaller E2B and E4B models, built specifically for phones, can handle text, images and audio without touching the Internet.
Arm’s SME2 instruction set, built into new Armv9 CPUs, accelerates the matrix math that AI models rely on. Arm’s initial engineering tests show an average of 5.5 times faster processing of user input and 1.6 times faster response generation on the Gemma 4 E2B model. Those gains come through Arm’s KleidiAI software layer, which plugs into Google’s existing runtime libraries, so developers do not need to change their code to benefit.
What does this mean in practice?
The clearest real-world example comes from EnVision, an accessibility app for blind and low-vision users. Visual-interpretation apps like this have historically relied on cloud connectivity. In a prototype running Gemma 4 locally on an Arm CPU, a user can take a photo and get a detailed description of the scene in front of them without any network connection. For this kind of feature, offline availability is not a nice-to-have; it is the whole point.
Google is also using Gemma 4 as the foundation for Gemini Nano 4, the next generation of its on-device model for Android. Developers building Android apps with Gemma 4 today will get automatic compatibility with Gemini Nano 4 when it arrives on flagship devices later this year. Gemini Nano already powers native Android features like smart replies and audio summaries, and chip makers such as MediaTek have been optimizing for on-device AI for some time. Gemma 4 extends this with multimodal support and agentic capabilities.
Developers can access the E2B and E4B models through the Google AI Edge Gallery on Android and iOS today under the Apache 2.0 license.
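For developers who want to try an on-device model, the flow looks roughly like the sketch below, which uses Google's MediaPipe LLM Inference API for Android. The model file name, path, and token limit are illustrative assumptions, not values from the announcement, and exact option names may differ across library versions.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a locally stored Gemma task bundle and run a prompt
// entirely on-device. The path below is a hypothetical example location.
fun describeSceneOffline(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/gemma-e2b.task") // assumed local model file
        .setMaxTokens(512) // illustrative generation budget
        .build()

    // Inference runs on the device CPU/accelerator; no network call is made.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}
```

In practice the model bundle would be downloaded once (for example via the Google AI Edge Gallery) and then reused, so the app keeps working with no connectivity at all.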
