Twitter/X

@ivanfioravanti:

MLX GLM 5.2 Distributed on two M3 Ultra 512GB 🔥

One M3 Ultra: 18.8 tokens/sec
Two M3 Ultra: 23.4 tokens/sec
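
For a sense of scale, the two throughput figures above can be turned into a speedup and a rough scaling efficiency. This is only arithmetic on the numbers in the post; the function names are illustrative and "efficiency" here simply means speedup divided by the number of machines.

```python
# Figures copied from the post; everything derived below is plain arithmetic.
single_tps = 18.8  # one M3 Ultra, tokens/sec
dual_tps = 23.4    # two M3 Ultra, tokens/sec


def speedup(baseline_tps: float, distributed_tps: float) -> float:
    """Ratio of distributed throughput to single-machine throughput."""
    return distributed_tps / baseline_tps


def scaling_efficiency(baseline_tps: float, distributed_tps: float, n_machines: int) -> float:
    """Speedup divided by machine count (1.0 would be perfect linear scaling)."""
    return speedup(baseline_tps, distributed_tps) / n_machines


s = speedup(single_tps, dual_tps)
e = scaling_efficiency(single_tps, dual_tps, 2)
print(f"speedup: {s:.2f}x, efficiency: {e:.0%}")  # speedup: 1.24x, efficiency: 62%
```

So the second machine adds roughly 24% decode throughput in this early test.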

Context:
- The PR by @pcuenq is still open, so there is probably room for improvement: github.com/ml-explore/mlx-lm…
- This is a basic generation test to measure decoding performance; I will run a full-context benchmark once the PR is more mature
- nvfp4 quantization used
- The video alternates between standard speed and 20x, showing one Mac first and the distributed run later.
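
A decoding-speed test of the kind mentioned above usually comes down to counting generated tokens over wall-clock time. Below is a minimal sketch of that idea, not the script used for these numbers: `decode_tokens_per_sec` and `fake_stream` are hypothetical names, and the stand-in generator would be replaced by whatever token stream the real model exposes.

```python
import time
from typing import Callable, Iterable, Iterator


def decode_tokens_per_sec(
    token_stream: Iterable,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Tokens per second over the decode phase of a token stream.

    The first token is consumed before the timer starts, because its
    latency is dominated by prompt processing rather than decoding.
    """
    it = iter(token_stream)
    next(it, None)  # skip the first token (prompt processing)
    start = clock()
    count = 0
    for _ in it:
        count += 1
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else 0.0


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[int]:
    """Hypothetical stand-in for a model: yields a token every delay_s seconds."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    tps = decode_tokens_per_sec(fake_stream(n_tokens=20, delay_s=0.001))
    print(f"{tps:.1f} tokens/sec")
```

Excluding the first token keeps prompt processing out of the figure, which is the difference between a decode-only number like this and a full-context benchmark.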

Enjoy! 🙌🏻

Video