Twitter/X

@ivanfioravanti:

GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀

Decode speed: 17.9 tokens/sec 🔥
Memory used: ~ 760GB 👀

Again, keep in mind it's a preliminary PR by super @pcuenq, still a WIP!
