Google’s Gemini AI models have improved by leaps and bounds over the past year,
Sorry about that. Google gave us the wrong link!
Ollama link is incorrect: https://ollama.com/library/gemma4
Why wait? It's Google! The sooner you leave, the better. No-brainer stuff.
If Assistant goes away and Gemini becomes mandatory, I'm buying an iPhone.
These LLMs need an option for concise response.
If this is a leaps and bounds improvement, well, compared to what? Assistant is short and to the point. Gemini won't shut up. Ever. It will stop talking when it is damn good and ready. Your desire for it to just shut up and go is immaterial.
Gemini is annoying, stupid, and, like Trump, it has to continually demonstrate how not stupid it is.
If Assistant goes away and Gemini becomes mandatory, I'm buying an iPhone.
Ha! They probably had Gemini compile the links.
Sorry about that. Google gave us the wrong link!
I'm not surprised. Google and Microsoft compete for the dumbest dropped balls.
Sorry about that. Google gave us the wrong link!
You know, I have begun to suspect that Google is using LLMs to assemble press kits. Some oddly consistent errors keep happening.
Ha! They probably had Gemini compile the links.
Why would they when they are paid per token?
These LLMs need an option for concise response.
The non-determinism (or at least, highly obfuscated determinism) of LLMs is fascinating. I have never had Gemini pop back with suggestions when navigating, but it must have some sort of context switch that tells it to be more or less
Apparently, Google forced Gemini onto my phone last night. It came up this morning when I gave an address to navigate to. Thereafter followed an annoying and stupid conversation.
"I know what's at that address, the Emory Brain Health Center. There are several clinics in the building, are you going to one in particular?"
No. shut up and go.
"Here are some places nearby. Would you like to add any of these to your trip?"
No. Shut up and go.
"It seems you want to navigate to the Emory Brain Health Center, Is that correct?"
Yes. Shut up and go.
"To do this, you need to give me permission to access Maps. Please press the OK button on the screen."
Which I did. And the damn thing forgot the entire conversation we just had. It no longer knew where I wanted to go. I gave it the address and had to go through the whole stupid conversation a second time.
If this is a leaps and bounds improvement, well, compared to what? Assistant is short and to the point. Gemini won't shut up. Ever. It will stop talking when it is damn good and ready. Your desire for it to just shut up and go is immaterial.
Gemini is annoying, stupid, and, like Trump, it has to continually demonstrate how not stupid it is.
If Assistant goes away and Gemini becomes mandatory, I'm buying an iPhone.
"That is a great point! A concise answer would be helpful because it would allow you, the user, to receive the exact answer you needed–with minimal fluff! I can see how that sort of option would be beneficial so of course I'm afraid I can't do that, buback."
These LLMs need an option for concise response.
Management probably thought it would be a fine showcase of their service and how it can assist businesses in future!
You know, I have begun to suspect that Google is using LLMs to assemble press kits. Some oddly consistent errors keep happening.
I think it largely varies across devices (Samsung has like an AI Select thing going on if you can get it working), but it's usually done by using an app like Google AI Edge Gallery/AnythingLLM or by using Termux to install and build the Ollama client locally. But you'd need to allow internet access at least as long as it takes to download the model.
Does anyone have advice for how to run local models on Android phones? Specifically, I'd like to run Gemma E4B on a Pixel in airplane mode.
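For the Ollama route, here is a rough sketch of what talking to the model offline could look like once it has been downloaded. It assumes an Ollama server is already running on the phone (for example inside Termux) on its default local port 11434, and it uses Ollama's `/api/generate` endpoint. The model name `example-model` is a hypothetical placeholder, not a real tag; substitute whatever tag you actually pulled. Nothing leaves the device, since the request goes to localhost.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # "example-model" is a placeholder tag, used here for illustration only.
    print(generate("example-model", "Answer in one sentence: what is Termux?"))
```

Because it only touches localhost, this should keep working in airplane mode as long as the server process stays alive.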
Actually under Strix Halo Linux will dynamically allocate however much of the shared memory pool it needs/can. And yes.
80GB GPU. If that means VRAM, does that mean the 128GB Framework Desktop could theoretically run this monster?
You can allocate up to 112GB of the unified memory to the GPU under Linux after all.
You do realize you can just disable it, right?
Apparently, Google forced Gemini onto my phone last night. It came up this morning when I gave an address to navigate to. Thereafter followed an annoying and stupid conversation.
Because the cost per token generated appears to be more than the revenue per token generated.
Why would they when they are paid per token?
These LLMs need an option for concise response.
80GB GPU. If that means VRAM, does that mean the 128GB Framework Desktop could theoretically run this monster?
You can allocate up to 112GB of the unified memory to the GPU under Linux after all.
Wait til you hear who is powering Siri/"Apple Intelligence."
Apparently, Google forced Gemini onto my phone last night. It came up this morning when I gave an address to navigate to. Thereafter followed an annoying and stupid conversation.
The Strix Halo iGPU is roughly equivalent to a 5070 or 5070 dGPU laptop chip in terms of gaming compute performance. I have a 395 128GB Framework Desktop for my daily machine at home running Linux.
How powerful are the iGPU/NPUs in there? Obviously they aren't earth shattering, but are they good enough to do something useful when you care about the response time? How would it compare to running on something like an RTX5090 with the memory shared over PCIe? My impression is that these large models end up mostly memory bound and the penalty for swapping out over PCIe is a bigger impact than FLOPs, but a slow enough processor might flip that.
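To put rough numbers on the memory-bound intuition in that question, here is a back-of-envelope sketch. It assumes generation is purely bandwidth-limited, so each token costs one full pass over the weights, and it ignores compute, caching, and mixture-of-experts models that only read part of their weights per token. Every bandwidth and capacity figure below is an illustrative assumption, not a measurement from this thread.

```python
def decode_tokens_per_second(segments):
    """Upper bound on decode speed when each token streams all weights once.

    segments: list of (gigabytes_read_per_token, bandwidth_gb_per_s) pairs,
    one pair per memory pool the weights are spread across.
    """
    seconds_per_token = sum(gb / bandwidth for gb, bandwidth in segments)
    return 1.0 / seconds_per_token


# Illustrative assumptions only; swap in figures for your own hardware.
MODEL_GB = 80.0                  # model size mentioned in the thread
UNIFIED_BW = 256.0               # assumed unified-memory bandwidth, GB/s
VRAM_GB, VRAM_BW = 32.0, 1792.0  # assumed discrete GPU capacity and bandwidth
PCIE_BW = 64.0                   # assumed PCIe 5.0 x16 ceiling, GB/s

# Case 1: the whole model sits in one unified memory pool.
unified = decode_tokens_per_second([(MODEL_GB, UNIFIED_BW)])

# Case 2: part of the model fits in VRAM, the rest is read over PCIe per token.
split = decode_tokens_per_second(
    [(VRAM_GB, VRAM_BW), (MODEL_GB - VRAM_GB, PCIE_BW)]
)

print(f"unified memory: {unified:.1f} tok/s")
print(f"VRAM plus PCIe spill: {split:.1f} tok/s")
```

Under these assumptions the PCIe leg dominates the per-token time, which matches the impression that spilling over PCIe hurts more than a slower processor does. Real results will differ, especially for models that only activate a fraction of their weights per token.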
Neat. Thanks for the response.
Actually under Strix Halo Linux will dynamically allocate however much of the shared memory pool it needs/can. And yes.
An 80GB model will not be quick. And a lighter quant will probably get similar output, much faster. My favorite model on LMStudio on my Framework Desktop is a ~30GB(?) Nemotron model from Nvidia. It is one of the newer models that is larger but can still do 60 tokens/second without needing to spend my life trying to optimize configuration. There are much larger models, but for casual experimentation and play, the output isn't so much better that it's worth the much slower generation.
What makes you think that the huge environmental impact of AI so far and until local becomes predominant is irrelevant?
The Open Source and local AI models are conveniently always forgotten in the arguments against AI. The anti-AI arguments usually focus on the environmental cost of data centers and the idea of a few greedy billionaires pushing AI on the masses. But these locally running open source models are probably the future of AI. They kind of puncture those arguments because things that run locally won't require datacenters and the intense cooling needs. And the Open Source nature shows that technology is not just driven by a few people at the top, but is something that comes from the community tinkering and experiment, many times just for the love of it and not for pure greed.
I haven't looked into the new Gemma 4 models yet. But I've been running a quantized Gemma 3 12B on my Pixel 9 Pro with llama.cpp in Termux. While there is a pre-compiled llama.cpp package in the Termux repositories, it lacks Vulkan support for the Pixel Mali GPU. Compiling llama.cpp with Vulkan support on the Pixel fixes this. With this setup, text is generated at decent speed on the GPU rather than the CPU.
Does anyone have advice for how to run local models on Android phones? Specifically, I'd like to run Gemma E4B on a Pixel in airplane mode.
Actually under Strix Halo Linux will dynamically allocate however much of the shared memory pool it needs/can. And yes.
An 80GB model will not be quick. And a lighter quant will probably get similar output, much faster. My favorite model on LMStudio on my Framework Desktop is a ~30GB(?) Nemotron model from Nvidia. It is one of the newer models that is larger but can still do 60 tokens/second without needing to spend my life trying to optimize configuration. There are much larger models, but for casual experimentation and play, the output isn't so much better that it's worth the much slower generation.
Cool.
I'll try the gemma model tonight. It really looks like it could be a great model to run on Strix Halo, if Google's claims end up being correct.
So far my favourite model on Strix Halo is the GPT-OSS-120B model, which should end up at a similar size and is quite good at a variety of tasks. I get 30-40 tokens/s as well.
The main problem is prompt processing, which is quite slow compared to running models on a GPU.
You can't, though. Gemini is the default for Android Auto and you can't revert once they force you onto it.
You do realize you can just disable it right?
The Open Source and local AI models are conveniently always forgotten in the arguments against AI. The anti-AI arguments usually focus on the environmental cost of data centers and the idea of a few greedy billionaires pushing AI on the masses. But these locally running open source models are probably the future of AI. They kind of puncture those arguments because things that run locally won't require datacenters and the intense cooling needs. And the Open Source nature shows that technology is not just driven by a few people at the top, but is something that comes from the community tinkering and experiment, many times just for the love of it and not for pure greed.
If you’re running on a 5080 you’ll most likely be relying on community quants, and you’ll probably need at least 24GB for the dense model at any reasonable precision. More if you use a longer context window without limiting it to, say, 32k. (I think there’s a 32GB 5080 variant?)
I wish they would have benchmarked with a 5080, I assume the performance would be somewhere between the Mac and the 5090.
I am not able to find the requirements needed for their 26B and 31B models in this release.
https://blogs.nvidia.com/blog/rtx-ai-garage-open-models-google-gemma-4/
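Since the requirements aren't listed, a rough way to sanity-check the "at least 24GB" estimate is to compute the memory taken by the weights alone at a few precisions. This is only a sketch: it counts parameters times bits per weight and leaves out the KV cache (which grows with context length), activations, and runtime overhead, so real usage will be higher. The 26B and 31B sizes come from the model names in the thread; the bit widths are common quantization levels picked for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for params in (26, 31):          # model sizes named in the thread
        for bits in (4, 8, 16):      # illustrative precisions
            gb = weight_memory_gb(params, bits)
            print(f"{params}B at {bits}-bit: about {gb:.1f} GB of weights")
```

On this arithmetic a 16GB card has no room left over for a 31B dense model even at 4-bit once context is added, which is consistent with needing a larger card or a more aggressive quant.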
Honestly not sure what to tell ya. For all I know, we're both right. I AM aware that some OEMs work differently and there IS per-region software weirdness with phones, but I just checked on both my S25 Ultra and my parents' S21 FE and S24 FE: the S25U and S24 FE both allow you to disable it and uninstall its updates, and the S21 FE allows you to fully uninstall it. All 3 of them were purchased in the EU, 2 from Samsung directly and 1 from a local tech store.
You can't, though. Gemini is the default for Android Auto and you can't revert once they force you onto it.
I tried. If they've changed that I'm all ears but last I checked once you got moved...you're stuck with Gemini.