GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀
Decode speed: 17.9 tokens/sec 🔥
Memory used: ~760GB 👀
Again, keep in mind this is a preliminary PR by the super @pcuenq and is still a WIP!