Device-optimized quant variants of Qwen3-30B-A3B-Instruct-2507, without output quality falling off a cliff.
A 30B model runs on a Raspberry Pi 5 (16GB), achieving 8.03 TPS (tokens per second) at 2.70 BPW (bits per weight) while retaining 94.18% of BF16 quality. ShapeLearn tends to find better TPS/quality tradeoffs than the alternatives.
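To see why 2.70 BPW matters on a 16GB board, here is a rough back-of-the-envelope sketch of the weight footprint. It assumes about 30e9 parameters (read off the "30B" in the model name, not an exact count) and ignores the KV cache, activations, and runtime overhead, so treat the result as a lower bound on memory use.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no overhead)."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: ~30e9 parameters, taken from the "30B" in the model name.
N_PARAMS = 30e9

if __name__ == "__main__":
    for label, bpw in [("BF16", 16.0), ("2.70 BPW", 2.70)]:
        print(f"{label:>9}: {weight_footprint_gib(N_PARAMS, bpw):.1f} GiB")
```

Under these assumptions the weights come to roughly 9.4 GiB at 2.70 BPW versus about 56 GiB at BF16, which is the difference between fitting in 16GB and not.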
What’s new/interesting in this one
- CPU behavior is mostly sane
On CPUs, once you’re past “it fits,” smaller tends to be faster in a fairly monotonic way. The tradeoff curve behaves like you’d expect.
- GPU behavior is quirky
On GPUs, performance depends as much on kernel choice as on memory footprint. So you often get sweet spots (especially around ~4-bit) where the kernels are “golden path,” and pushing to lower bit widths can get weird.
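The CPU trend is what a simple bandwidth-bound model would predict. The sketch below assumes decode speed is capped by how fast the active weights can be streamed from memory, which is an assumption for illustration and not something the post states. `ACTIVE_PARAMS` is read off the "A3B" in the model name, and `HYPOTHETICAL_BANDWIDTH_GB_S` is a placeholder, not a measured figure for any device.

```python
def decode_tps_upper_bound(active_params: float,
                           bits_per_weight: float,
                           mem_bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


# Assumption: ~3e9 active parameters per token, from "A3B" in the model name.
ACTIVE_PARAMS = 3e9
# Hypothetical placeholder; substitute the measured bandwidth of your device.
HYPOTHETICAL_BANDWIDTH_GB_S = 10.0

if __name__ == "__main__":
    for bpw in (4.0, 3.5, 3.0, 2.70):
        bound = decode_tps_upper_bound(ACTIVE_PARAMS, bpw,
                                       HYPOTHETICAL_BANDWIDTH_GB_S)
        print(f"{bpw:.2f} BPW -> at most {bound:.1f} tokens/s")
```

In this model the bound rises monotonically as BPW drops, matching the "smaller tends to be faster" behavior on CPUs. It deliberately ignores kernel efficiency, so it says nothing about the GPU sweet spots, where the kernel in use can matter as much as the bytes moved.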
A 30B Qwen Model Walks Into a Raspberry Pi… and Runs in Real Time
Submitted 1 day ago by cm0002@lemdro.id to technology@lemmy.zip
https://byteshape.com/blogs/Qwen3-30B-A3B-Instruct-2507/