I was curious why MTP affects PP TPS in llama.cpp. My PoC recovers it?
By i_am_rocoe · 2026-06-25 · 2 points · 0 comments
I've been running Qwen3.6-35B-A3B locally on llama.cpp and noticed that prompt processing (PP) throughput drops noticeably when MTP is enabled. I got nerd-sniped. I'm not a C++ dev, I know almost nothing about ML, and I'm only scratching the surface of how LLMs work. What started as curiosity turne…