By Thorsten Meyer — May 2026
DeepSeek V4-Pro is the largest open-weight model ever released. One trillion parameters. One million tokens of context. Free to download.
One week earlier, Alibaba shipped Qwen 3.6-35B-A3B. Earlier in April, Meta dropped Llama 4 (Scout + Maverick), Mistral released Small 4, Google released Gemma 4, and Zhipu AI open-sourced GLM-5.1.
Six labs. Six competitive open-weight families. One quarter.
The benchmark gap between the best open and the best closed model is now in the single digits on every evaluation enterprises actually pay for.
If you were planning a 2026 AI budget around proprietary API pricing, that budget is wrong.
Executive Summary
| Open-Weight Model | Lab | Released | Differentiator |
|---|---|---|---|
| DeepSeek V4-Pro | DeepSeek (CN) | 2026-04-23 | ~1T MoE, 1M context, multimodal |
| DeepSeek V4-Flash | DeepSeek (CN) | 2026-04-23 | Cheaper inference variant |
| Qwen 3.6-35B-A3B | Alibaba | 2026-04-16 | Smaller, fast, MoE |
| Llama 4 Scout | Meta | 2026-04 | 109B / 17B active, 16 experts |
| Llama 4 Maverick | Meta | 2026-04 | 400B raw capability |
| Gemma 4 | Google | 2026-04 | Open, on-device-friendly |
| Mistral Small 4 | Mistral | 2026-04 | Apache-2 license |
| GLM-5.1 | Zhipu AI (CN) | 2026-04 | Open weights |
The list is not exhaustive. It is what shipped in a single month.

1. The Closed-Model Premium Was Just Re-Priced
For three years, “frontier model” meant “API model”: closed weights, paid per token, accessible only through the lab that built it. Enterprises paid the premium because the open alternatives were measurably worse.
The April 2026 benchmark numbers no longer say that.
| Eval Category | Closed Frontier (Mar 2026) | Best Open Weight (Apr 2026) | Gap |
|---|---|---|---|
| Reasoning (MATH, GSM8K) | 95.1 | 92.4 | 2.7 pts |
| Code (HumanEval, MBPP) | 94.8 | 91.2 | 3.6 pts |
| Long-context retrieval (128K+) | 89.3 | 87.8 | 1.5 pts |
| Multimodal (MMMU) | 76.4 | 71.1 | 5.3 pts |
| Tool use / agentic (TAU-bench) | 82.1 | 77.5 | 4.6 pts |
A 3-point gap on a benchmark does not justify a 30× API pricing differential.
For a CTO running a customer-support agent at scale, the math now reads: spend on the order of €10K to host an open model on your own GPUs, where the marginal token costs nothing, or pay €30K/month to a frontier lab in perpetuity. The crossover used to be three years. It is now three months.
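The crossover math above can be sketched in a few lines. The €30K/month API figure comes from the text; the €60K upfront GPU outlay and €10K/month hosting cost are illustrative assumptions, not figures from the article.

```python
# Break-even sketch: self-hosting (upfront + monthly) vs. a closed API bill.
def breakeven_months(upfront: float, monthly_hosting: float,
                     monthly_api_bill: float) -> float:
    """Months until cumulative self-hosting spend undercuts the API spend."""
    monthly_saving = monthly_api_bill - monthly_hosting
    if monthly_saving <= 0:
        return float("inf")  # hosting never pays for itself
    return upfront / monthly_saving

# Assumed €60K of GPUs, €10K/month hosting, €30K/month API bill:
print(breakeven_months(60_000, 10_000, 30_000))  # -> 3.0 (months)
```

Swap in your own hardware and traffic numbers; the point is that the payback period is now measured in months, not years.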

2. What This Means for the Frankenstein Thesis
In February 2026, this site published Rent-and-Distill — the playbook by which a Chinese cohort siphoned reasoning traces from closed Western models, ran fine-tuning on rented U.S. compute, and shipped open-weight Frankenstein models at €10–20M per launch.
The April releases are the empirical proof.
DeepSeek V4 was not built by a lab with thousands of PhDs. It was built by a lab with engineering discipline, access to open base weights, and a distillation pipeline. The gap to Anthropic’s frontier model is single digits.
The deeper reading: distillation is not just theoretically effective. It is now demonstrably scalable to the frontier.
“The moat is not the weights. The moat is whatever you refuse to show.” That was the closing line of Rent-and-Distill in February. Six weeks later, the open-weight benchmark gap closed by another two points.

3. The Three Strategic Shifts This Forces
Shift 1: Inference economics flip. When a 70B-class model runs on a single H200 node at $4/hour, per-token cost drops below any API. Every token-heavy workflow — call summarization, document extraction, code review, ticket triage — has different unit economics in May than it did in March.
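The per-token arithmetic behind Shift 1 is straightforward. The $4/hour node price is from the text; the throughput figure is a hypothetical placeholder, since real numbers depend on model, batch size, and serving stack.

```python
# Per-token cost of self-hosted inference at a fixed hourly node price.
def usd_per_million_tokens(node_usd_per_hour: float,
                           tokens_per_second: float) -> float:
    """Convert an hourly node price into a cost per 1M generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return node_usd_per_hour / tokens_per_hour * 1_000_000

# At $4/hour and an assumed 2,500 tok/s aggregate throughput:
print(f"${usd_per_million_tokens(4.0, 2500):.2f} per 1M tokens")  # -> $0.44
```

At those assumptions the node produces a million tokens for well under a dollar, which is the flip the paragraph describes.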
Shift 2: Model selection becomes a portfolio question. No serious enterprise will run on one model. Closed APIs for the hardest 5% of queries. Open weights for the 95% the open models now handle as well as closed ones did six months ago. Routing logic — not model quality — becomes the new differentiator.
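The portfolio idea in Shift 2 reduces to a routing decision. A minimal sketch, where the difficulty score, threshold, and backend names are all illustrative placeholders rather than any real system:

```python
# Route the hardest slice of traffic to a closed API, the rest to
# self-hosted open weights.
def route(difficulty: float, threshold: float = 0.95) -> str:
    """Pick a backend for a query, given a difficulty estimate in [0, 1]."""
    return "closed-api" if difficulty >= threshold else "open-self-hosted"

assert route(0.99) == "closed-api"        # hardest ~5% of queries
assert route(0.30) == "open-self-hosted"  # the routine 95%
```

In production the difficulty estimate is the hard part, which is exactly why routing logic, not raw model quality, becomes the differentiator.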
Shift 3: Sovereignty and licensing matter again. Llama 4’s license still excludes companies above a certain MAU threshold. DeepSeek V4 is unrestricted but Chinese-origin. Mistral Small 4 is Apache-2. The license is now a procurement criterion that matters as much as the benchmark.

4. What Closed Frontier Labs Will Do Next
Three predictions for the next two quarters.
Prediction 1: The closed labs raise the bar. Expect GPT-6 / Claude 5 / Gemini 3 in summer 2026, with capability gaps re-opened to double digits — for six months. Then the open weights catch up again. This is now the rhythm.
Prediction 2: The closed labs move up the stack. API models are commoditizing. The defensible product is the agent platform — long memory, tool integration, organizational context. Watch Anthropic, OpenAI, and Google ship platform offerings that make the underlying model less important. Google already did, on 2026-04-22, with the $750M Gemini Enterprise Agent Platform launch.
Prediction 3: The closed labs lobby for compute restrictions on open-weight training. The Remote Access Security Act (RASA) blocked one cloud loophole. Expect the next regulatory front to be FLOP thresholds for open releases — which only the closed labs would benefit from.
5. The Quiet Winner
While the open-weight race accelerates, one player wins quietly: NVIDIA.
A 1T-parameter open model needs hardware to run. Self-hosted inference at enterprise scale means H200s, B200s, and the entire datacenter retrofit. The same Chinese labs that built Frankenstein models on rented U.S. compute are now selling the inference dependency to every Western enterprise that downloads the weights.
This is the second loop the regulators did not anticipate. RASA (January 2026) closed the training loophole. The April releases just reopened the inference one — except this time, NVIDIA is the beneficiary, not the threat.
What Leaders Should Do This Quarter
1. If you spend more than €1M/yr on closed APIs: run a hosted open-weight pilot on the next refresh. The crossover math is real.
2. If you sell an AI product: assume your moat is not your model. Build the data, the workflow, and the trust layer. The weights underneath you will commoditize.
3. If you set procurement policy: treat license terms (MAU caps, country-of-origin, redistribution rights) as a first-class procurement criterion, not a footnote.
4. If you set national policy: RASA needs a sequel. The next loophole is in the inference layer, not the training one.
The Strategic Read
April 2026 was the month the open-weight curve crossed the closed-weight curve on the metrics that matter to enterprises. It will be remembered as the inflection point.
The closed labs are not finished. They will pull ahead again, briefly, with the next release. But the structural fact is now established: every frontier capability shipped by a closed lab has an 18-month half-life before it is replicated in open weights.
The strategic question for any enterprise is no longer “which closed API do we sign?” It is “what part of our stack would we still pay for if the model underneath was free?”
The benchmark gap is in the single digits. The pricing gap is not. That is the arbitrage.
About the Author
Thorsten Meyer is a Munich-based futurist, post-labor economist, and recipient of OpenAI’s 10 Billion Token Award. He spent two decades managing €1B+ portfolios in enterprise ICT before deciding that writing about the transition was more useful than managing quarterly slides through it. More at ThorstenMeyerAI.com.
Related Dispatches
- Your AI Vendor’s AI Vendor — File 0426 — agent supply chain compromise (Vercel × Context AI)
- This file — File 0427 — the April 2026 open-weight inflection
- AI-Washed — File 0428 — the 47.9% / 9% layoff narrative gap
- The 27% Problem — File 0429 — Anthropic’s enterprise lead and Google’s $750M check
- The Bubble Is Not in Valuations — File 0430 — the productivity gap
- The Agent Trap — File 0431 — why 90% of AI “launches” are infrastructure lies
Sources
- llm-stats, AI Updates Today: Latest AI Model Releases (2026-04)
- DeepSeek, V4 Technical Report (2026-04-23)
- Alibaba Qwen Team, Qwen 3.6-35B-A3B Release Notes (2026-04-16)
- Meta, Llama 4 Model Card: Scout & Maverick (2026-04)
- Sebastian Raschka, A Dream of Spring for Open-Weight LLMs (2026-02)
- Lushbinary, Best Open-Source LLMs April 2026 (2026-04)
- BuildFastWithAI, Best AI Models April 2026: Ranked by Benchmarks (2026-04)
- Renovate, Chinese AI Models in April 2026: DeepSeek V4, Qwen 3.5, Kimi K2.5, GLM-5 (2026-04)