Show HN: Running Gemma-4 26B at 124 tokens/sec on a CPU, no GPU
By arun-prasath · 2026-06-30 · 6 points · 0 comments
https://apeg.dev/writing/running-gemma4-26b-on-a-cpu/
I wanted to know how fast a 26B mixture-of-experts model could run on a desktop CPU with no GPU. I got ~40 tok/s single-stream (lossless) and ~124 tok/s batched. The surprising part was the byte budget: for this model you compress the output head (32% of per-token bytes), not the e…
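To make the byte-budget point concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream decoding is memory-bandwidth-bound, so tokens/sec scales inversely with bytes read per token; that assumption is mine, not something the excerpt states. The 32% figure comes from the post. The function name and the compression ratios are hypothetical, chosen only for illustration.

```python
def speedup_from_compressing(fraction: float, ratio: float) -> float:
    """Amdahl-style estimate of the throughput gain from shrinking one component.

    fraction: share of per-token bytes taken by the component (0..1).
    ratio:    how many times smaller the component becomes (hypothetical).
    Assumes throughput is inversely proportional to bytes read per token.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    # Bytes left per token, relative to the uncompressed baseline.
    remaining = (1.0 - fraction) + fraction / ratio
    return 1.0 / remaining


HEAD_FRACTION = 0.32  # output head share of per-token bytes, from the post

if __name__ == "__main__":
    # Hypothetical compression ratios, not figures from the post.
    for ratio in (2.0, 4.0, 8.0):
        gain = speedup_from_compressing(HEAD_FRACTION, ratio)
        print(f"head {ratio:.0f}x smaller -> about {gain:.2f}x tokens/sec")
```

Under that assumption the gain is capped at 1 / (1 - 0.32), roughly 1.47x, no matter how hard the head is compressed, which is why it matters which component holds the largest share of the bytes.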