@jxmnop:

There is a simple reason why Gemini is so much worse than GPT or Claude

engineers at OpenAI or Ant can read incoming user queries. all the data is visible

but at Google there are tons of privacy restrictions preventing ppl from looking at data

basically building a model blind

Arena.ai (@arena)

GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking), and is the #1 open model by a wide margin!

In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.

Compared to 5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are in confirmed task success and in user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).

GLM-5.2 remains the same price as GLM-5.1, $1.4/$4.4 per input/output MTokens. 1M context window.

Huge congrats @Zai_org for the incredible release!

See thread for details on how GLM-5.2 (Max) performs across 5 different signals.

— https://nitter.net/arena/status/2066943450914943025#m