News

8 April 20264 min read

AI Is Leaving the Cloud — And It's Happening Faster Than You Think

Google just shipped Gemma 4 running fully offline on your iPhone. Ollama just got MLX-powered on Apple Silicon. Local AI is not a power-user trick anymore — it's going mainstream.

Delv Editorial

Delv Team

For years, local AI meant convincing your laptop to melt through a model that answered questions slightly worse than a Magic 8-Ball. That era is over.

This week, Google shipped the AI Edge Gallery app for iPhone — and it's actually good. Running Gemma 4 directly on your device, no internet connection, no monthly subscription, no data going anywhere. You download a 2.54GB model file and off you go: chat, image questions, audio transcription, even tool use via Agent Skills that hook into Wikipedia, interactive maps, and community-built widgets.

This is Google's first official app for on-device AI. And it's not a demo — it works.

Why this matters more than another chatbot launch

We've had a lot of model releases lately. Each one slightly smarter than the last, each one living on someone else's server, reading your prompts, billing you by the token.

Local AI flips that entirely. When the model runs on your hardware:

Your data never leaves your device. Full stop.
No internet required. Use it on a plane, in a tunnel, in a country with censored AI access.
No subscription. Download once, use forever.
No latency from the cloud. Just your phone's chips doing the work.

The trade-off used to be capability — local models were noticeably worse. Gemma 4's 2B model is small by today's standards, but it's fast, genuinely useful for most everyday tasks, and the gap is closing faster than anyone expected.

The Ollama angle: this is happening on desktop too

Ollama just shipped MLX support for Apple Silicon (March 30, 2026 — barely a week ago). MLX is Apple's own machine learning framework optimised for M-series chips, and the result is significantly faster local model inference on Mac.

Combined with the OpenAI Codex CLI integration — which lets you point Codex at open-weight models running locally via Ollama instead of OpenAI's API — you can now run a capable coding assistant entirely offline, on your own hardware, with open-source models.

That's not a power-user trick anymore. That's a realistic setup for anyone with a recent Mac.

What's the catch?

Size matters. The better the model, the bigger the download. If you want something closer to GPT-4o quality locally, you're looking at 70B+ parameter models — which means serious hardware, not a phone. Context windows are still limited. Local models running on mobile can't hold long conversations the way cloud models can. You'll hit memory limits faster. No persistent history — at least in Google's Edge Gallery app right now. Conversations are ephemeral. Close the app and it's gone. The ecosystem is fragmented. Different apps, different model formats, different UIs. There's no single local AI experience yet. Ollama is the closest thing to a standard on desktop, but it's still a bit of a wild west.

The tools to watch

Google AI Edge Gallery — the easiest on-ramp for iPhone users right now. Free, official, Gemma 4 support, Agent Skills baked in.
Ollama — the backbone of local AI on desktop. MLX-powered on Apple Silicon as of last week.
LM Studio — polished GUI for running local models on Mac, Windows, and Linux, with a built-in model browser.
Jan — open-source, privacy-first alternative. Local by design, no cloud fallback.

The bigger picture

There's a growing group of people who don't want their prompts going to San Francisco. Privacy-conscious users, enterprise teams with compliance requirements, people in regions with restricted internet access, developers who want to work offline — they all have good reasons to want AI that runs on their own hardware.

The cloud is not going anywhere. For heavy lifting — long documents, complex reasoning, multimodal tasks — the big cloud models are still significantly better. But good enough, private, free, and offline is a compelling package, and the tools delivering that are improving at a remarkable pace.

Watch this space. In 12 months, the conversation will not be cloud vs local — it'll be about which tasks you route where, and why.

Delv Editorial

Delv Team

The Delv editorial team reviews AI tools, MCP servers, Agent Skills, and autonomous agents. Reviews are drafted with AI assistance and human oversight. Every install command and config snippet is verified against the source. We're independent, we don't sell tools, and we say when something isn't worth it.

AI ToolsMCPSkillsAgents

AI Is Leaving the Cloud — And It's Happening Faster Than You Think

Google just shipped Gemma 4 running fully offline on your iPhone. Ollama just got MLX-powered on Apple Silicon. Local AI is not a power-user trick anymore — it's going mainstream.

By Delv Editorial8 April 20264 min read

For years, local AI meant convincing your laptop to melt through a model that answered questions slightly worse than a Magic 8-Ball. That era is over.

This is Google's first official app for on-device AI. And it's not a demo — it works.

Why this matters more than another chatbot launch

We've had a lot of model releases lately. Each one slightly smarter than the last, each one living on someone else's server, reading your prompts, billing you by the token.

Local AI flips that entirely. When the model runs on your hardware: - Your data never leaves your device. Full stop. - No internet required. Use it on a plane, in a tunnel, in a country with censored AI access. - No subscription. Download once, use forever. - No latency from the cloud. Just your phone's chips doing the work.

The Ollama angle: this is happening on desktop too

ollama just shipped MLX support for Apple Silicon (March 30, 2026 — barely a week ago). MLX is Apple's own machine learning framework optimised for M-series chips, and the result is significantly faster local model inference on Mac.

That's not a power-user trick anymore. That's a realistic setup for anyone with a recent Mac.

What's the catch?

Size matters. The better the model, the bigger the download. If you want something closer to GPT-4o quality locally, you're looking at 70B+ parameter models — which means serious hardware, not a phone.

Context windows are still limited. Local models running on mobile can't hold long conversations the way cloud models can. You'll hit memory limits faster.

No persistent history — at least in Google's Edge Gallery app right now. Conversations are ephemeral. Close the app and it's gone.

The ecosystem is fragmented. Different apps, different model formats, different UIs. There's no single local AI experience yet. Ollama is the closest thing to a standard on desktop, but it's still a bit of a wild west.

The tools to watch - Google AI Edge Gallery — the easiest on-ramp for iPhone users right now. Free, official, Gemma 4 support, Agent Skills baked in. - ollama — the backbone of local AI on desktop. MLX-powered on Apple Silicon as of last week. - LM Studio — polished GUI for running local models on Mac, Windows, and Linux, with a built-in model browser. - Jan — open-source, privacy-first alternative. Local by design, no cloud fallback.

The bigger picture

Watch this space. In 12 months, the conversation will not be cloud vs local — it'll be about which tasks you route where, and why.

It felt sudden. It wasn't. A short history of how the iceberg surfaced.

8 min read

Karpathy's actual CLAUDE.md is boring. The viral one is something else entirely.

5 min read

I installed Osaurus on my Mac this week. Here's what it actually changes.

5 min read

AI Is Leaving the Cloud — And It's Happening Faster Than You Think

Why this matters more than another chatbot launch

The Ollama angle: this is happening on desktop too

What's the catch?

The tools to watch

The bigger picture

Related Articles

It felt sudden. It wasn't. A short history of how the iceberg surfaced.

Karpathy's actual CLAUDE.md is boring. The viral one is something else entirely.

I installed Osaurus on my Mac this week. Here's what it actually changes.