I patched llama.cpp to gain 20% in prompt-processing TPS. Help me make a PR
By i_am_rocoe · 2026-06-27 · 4 points · 2 comments
I've been running Qwen3.6-35B-A3B locally on llama.cpp and noticed that prompt-processing (PP) throughput drops noticeably when MTP (multi-token prediction) is enabled. I got nerd-sniped: what started as curiosity turned into a two-week rabbit hole of experiments and ended with a PoC that fully recovers the MTP PP overhead…