Namiru.ai

    GLM-5: China's Open-Source Giant That Rivals Claude and GPT

    10 min read
    Ing. Patrik Kelemen

    Zhipu AI's GLM-5 packs 744 billion parameters, ships under the MIT license, and benchmarks within striking distance of Claude Opus 4.5 and GPT-5.2. Trained entirely on Huawei chips and priced at roughly one-sixth of its proprietary rivals, it's one of the strongest open-source models available today.

    On February 11, 2026, Chinese AI lab Zhipu AI (now rebranded as Z.ai) released GLM-5, a new open-source large language model that competes directly with Claude Opus 4.5, GPT-5.2, and Gemini 3 Pro on coding, reasoning, and agentic benchmarks.

    The market reacted strongly. Zhipu's Hong Kong shares surged 28.7% on the day of release. Interestingly, before the official announcement, a mysterious model called "Pony Alpha" had already been posting top scores on OpenRouter. It turned out to be GLM-5 running under a different name.

    Here's what's inside and how it compares.


    What is GLM-5?

    GLM-5 is the fifth-generation large language model from Zhipu AI, a company that spun out of Tsinghua University in 2019 and completed a Hong Kong IPO in January 2026, raising approximately $558 million.

    The model is built on a Mixture-of-Experts (MoE) architecture with 744 billion total parameters, of which only 40 billion are active per token. That is more than double the size of its predecessor GLM-4.5, which had 355 billion parameters. Pre-training data also jumped from 23 trillion to 28.5 trillion tokens.
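To put those numbers in perspective, here is a quick back-of-envelope calculation of what the MoE split means for inference cost. This is a deliberate simplification: real MoE stacks also include shared dense layers, so the expert fraction and the active-parameter fraction don't line up exactly.

```python
# Back-of-envelope MoE math from GLM-5's published specs.
total_params = 744e9    # 744B total parameters
active_params = 40e9    # 40B active per token
experts_total = 256
experts_active = 8

# Fraction of weights actually exercised for each token
active_fraction = active_params / total_params
print(f"Active parameters per token: {active_fraction:.1%}")  # ≈ 5.4%

# Fraction of experts the router selects per token
expert_fraction = experts_active / experts_total
print(f"Experts routed per token: {experts_active}/{experts_total}")
```

In other words, each token pays roughly the compute cost of a ~40B dense model while the full 744B of capacity is available to the router.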

    GLM-5 integrates DeepSeek Sparse Attention (DSA), a technique originally developed by DeepSeek, to reduce deployment costs while preserving long-context capacity up to 200K tokens.

    GLM-5 at a Glance

| Specification | Details |
| --- | --- |
| Total Parameters | 744B |
| Active Parameters (per token) | 40B |
| Architecture | Mixture-of-Experts (MoE), 256 experts, 8 active per token |
| Pre-training Data | 28.5 trillion tokens |
| Context Window | 200K tokens |
| Attention Mechanism | DeepSeek Sparse Attention (DSA) |
| License | MIT |
| Training Hardware | Huawei Ascend chips (fully domestic) |
| Availability | HuggingFace, Z.ai API, OpenRouter |

    GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework, achieving complete independence from US-manufactured hardware. Given the current US export restrictions on advanced AI chips, this is a significant strategic milestone for China's AI ecosystem.


    Benchmark Performance: How Does GLM-5 Stack Up?

    Zhipu AI positions GLM-5 as the most capable open-source model available, and the benchmark numbers largely back that up. Here's how it compares against the current frontier models.

    Coding & Engineering

| Benchmark | GLM-5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro | DeepSeek-V3.2 | Kimi K2.5 |
| --- | --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | 77.8% | 80.9% | 80.0% | 76.2% | 73.1% | 76.8% |
| SWE-bench Multilingual | 73.3% | 77.5% | 72.0% | 65.0% | 70.2% | 73.0% |
| Terminal-Bench 2.0 | 56.2 | 59.3 | 54.0 | 54.2 | 39.3 | 50.8 |

    Claude Opus 4.5 still leads in coding, but GLM-5 trails by only a few points, and it's open source with freely available weights.

    Reasoning

| Benchmark | GLM-5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro | DeepSeek-V3.2 | Kimi K2.5 |
| --- | --- | --- | --- | --- | --- | --- |
| HLE (Humanity's Last Exam) | 30.5 | 28.4 | 35.4 | 37.2 | 25.1 | 31.5 |
| HLE w/ Tools | 50.4 | 43.4 | 45.5 | 45.8 | 40.8 | 51.8 |
| AIME 2026 I | 92.7 | 93.3 | - | 90.6 | 92.7 | 92.5 |
| GPQA-Diamond | 86.0 | 87.0 | 92.4 | 91.9 | 82.4 | 87.6 |

    GLM-5 outperforms Claude Opus 4.5 on Humanity's Last Exam (both text-only and with tools) and holds its own against GPT-5.2 and Gemini 3 Pro on math-heavy benchmarks.

    Agentic Tasks

| Benchmark | GLM-5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro | DeepSeek-V3.2 | Kimi K2.5 |
| --- | --- | --- | --- | --- | --- | --- |
| BrowseComp (w/ Context) | 75.9 | 67.8 | 65.8 | 59.2 | 67.6 | 74.9 |
| τ²-Bench | 89.7 | 91.6 | 85.5 | 90.7 | 85.3 | 80.2 |
| MCP-Atlas | 67.8 | 65.2 | 68.0 | 66.6 | 62.2 | 63.8 |
| Vending Bench 2 | $4,432 | $4,967 | $3,591 | $5,478 | $1,034 | $1,198 |

    The agentic benchmarks are worth a closer look. On BrowseComp (agent-based web search and context management), GLM-5 outperforms every model in the comparison, including the proprietary ones. Vending Bench 2, where a model runs a simulated vending machine business for 365 days, shows GLM-5 close behind Claude and Gemini, and well ahead of GPT-5.2 and DeepSeek.


    Hallucination: A Record-Low Rate

    GLM-5 scored -1 on the Artificial Analysis AA-Omniscience Index, which represents a 35-point improvement over its predecessor. This makes it the top-performing model when it comes to recognizing the limits of its own knowledge and abstaining from generating false information. It currently leads all tested models from OpenAI, Anthropic, and Google in this category.

    For enterprise use cases where accuracy matters more than creativity, this is a meaningful advantage.


    Pricing: 6x Cheaper Than Claude Opus

    GLM-5 is available on OpenRouter and Z.ai's API at competitive pricing:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| GLM-5 | ~$0.80 | ~$2.56 |
| GPT-5.2 | $2.50 | $10.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |

    That's approximately 6x cheaper on input and nearly 10x cheaper on output compared to Claude Opus 4.6. For teams running high-volume inference, the cost savings add up quickly.
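As a sanity check on those multiples, here is the arithmetic at the listed prices. GLM-5's prices are approximate, as marked in the table, and the monthly token volumes are made up purely for illustration:

```python
# Price comparison per 1M tokens, taken from the table above (USD).
glm5 = {"input": 0.80, "output": 2.56}   # approximate
opus = {"input": 5.00, "output": 25.00}  # Claude Opus 4.6

print(f"Input:  {opus['input'] / glm5['input']:.2f}x cheaper")    # 6.25x
print(f"Output: {opus['output'] / glm5['output']:.2f}x cheaper")  # ~9.77x

# Illustrative workload: 50M input + 10M output tokens per month
def monthly_cost(prices, m_in=50, m_out=10):
    """Monthly cost in USD for m_in/m_out million tokens."""
    return m_in * prices["input"] + m_out * prices["output"]

print(f"GLM-5:           ${monthly_cost(glm5):,.2f}/month")
print(f"Claude Opus 4.6: ${monthly_cost(opus):,.2f}/month")
```

At that illustrative volume the gap is roughly $65 versus $500 per month, which is where the "high-volume workloads" argument comes from.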


    Document Generation: Beyond Chat

    GLM-5 introduces native "Agent Mode" capabilities that go beyond traditional chat. Instead of just returning text, the model can take a prompt and produce a finished .docx, .pdf, or .xlsx file directly. The Z.ai platform (chat.z.ai) has this mode built in.

    In practice, this means you can describe what you need, for example "create a quarterly sales report with revenue breakdown by region", and GLM-5 will output a formatted document rather than raw text you'd have to copy-paste into Word yourself. The same applies to spreadsheets with formulas and structured PDFs.

    The model is also compatible with popular coding agents like Claude Code, OpenCode, and Roo Code, as well as OpenClaw, a framework for cross-app and cross-device agentic workflows. This makes it possible to plug GLM-5 into existing development pipelines without building custom integrations from scratch.


    Serve GLM-5 Locally

    One of the biggest advantages of an MIT-licensed model: you can run it yourself. GLM-5 supports deployment via vLLM, SGLang, and xLLM.

    Using vLLM (Docker)

```bash
docker pull vllm/vllm-openai:nightly
```

    Or via pip:

```bash
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

    Deploy

```bash
vllm serve zai-org/GLM-5-FP8 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.95
```

    Using SGLang (Docker)

```bash
# For Hopper GPUs
docker pull lmsysorg/sglang:glm5-hopper

# For Blackwell GPUs
docker pull lmsysorg/sglang:glm5-blackwell
```

    The FP8 quantized version is recommended for production deployments, balancing performance with memory efficiency.

    Model weights are available on HuggingFace: zai-org/GLM-5


    China's Open-Source AI Wave

    GLM-5 isn't happening in isolation. It's part of an accelerating wave of Chinese open-source AI releases. According to a Stanford study, Chinese AI models have historically lagged about seven months behind their US counterparts. GLM-5 arrived only about three months after the latest releases from Anthropic, Google, and OpenAI, cutting that delay by more than half.

    The competition within China's AI landscape is also intensifying. Moonshot AI's Kimi K2.5 takes a different architectural approach, using swarms of agents working in parallel. Meanwhile, DeepSeek-V3.2, despite the massive attention it received in early 2025, now trails both GLM-5 and Kimi K2.5 on multiple benchmarks.

    For developers and enterprises, this translates to more options, lower prices, and growing pressure on proprietary models to justify their premium pricing.


    Should You Try GLM-5?

    GLM-5 makes the most sense if you need a capable model but can't or don't want to rely on proprietary APIs. A few scenarios where it stands out:

    • Self-hosted inference with data sovereignty. If you're operating under GDPR or similar regulations and need full control over where your data goes, GLM-5 with an MIT license and local deployment is one of the strongest options available right now.
    • High-volume workloads on a budget. At ~$0.80/1M input tokens, teams running thousands of daily requests can cut costs significantly compared to Claude or GPT without a major drop in quality.
    • Coding and agentic tasks. GLM-5's benchmark scores on SWE-bench and BrowseComp put it in the same league as proprietary models, making it a viable backbone for AI-powered development tools or autonomous agents.

    The usual caveats apply: benchmark scores don't always translate to real-world usability, and open-source models sometimes underperform proprietary alternatives in everyday use. But the gap is narrowing, and GLM-5 raises the bar for what's available as a free, open model.

    Try it at chat.z.ai or grab the weights from HuggingFace.

