The April That Closed the Open-Weight Gap

By Thorsten Meyer — April 2026

DeepSeek V4-Pro is the largest open-weight model ever released. One trillion parameters. One million tokens of context. Free to download. Eight open-weight releases shipped in thirty days; six labs delivered competitive open-weight families in a single quarter. The benchmark gap to the best closed model is now in the single digits on every evaluation enterprises actually pay for.





If you were planning a 2026 AI budget around proprietary API pricing, that budget is wrong.
Executive Summary
| Open-Weight Model | Lab | Released | Differentiator |
|---|---|---|---|
| DeepSeek V4-Pro | DeepSeek (CN) | 2026-04-23 | ~1T MoE, 1M context, multimodal; largest open-weight release to date |
| DeepSeek V4-Flash | DeepSeek (CN) | 2026-04-23 | Cheaper inference variant, optimized for cost-per-token at scale |
| Qwen 3.6-35B-A3B | Alibaba | 2026-04-16 | Fast MoE: 35B nominal, 3B active |
| Llama 4 Scout | Meta | 2026-04 | 109B / 17B active, 16 experts; on-prem workhorse |
| Llama 4 Maverick | Meta | 2026-04 | 400B raw capability, frontier-class |
| Gemma 4 | Google | 2026-04 | Open weights tuned for on-device and edge inference |
| Mistral Small 4 | Mistral | 2026-04 | Apache-2 license, cleanest terms in the cohort |
| GLM-5.1 | Zhipu AI (CN) | 2026-04 | Open weights; third major Chinese lab to ship this month |
The list is not exhaustive. It is what shipped in a single month.

1. The Closed-Model Premium Was Just Re-Priced
For three years, “frontier model” meant “API model”: closed weights, paid per token, accessible only through the lab that built it. Enterprises paid the premium because the alternative, open-weight models, was measurably worse.
The April 2026 benchmark numbers no longer say that.
| Eval Category | Closed Frontier (Mar 2026) | Best Open Weight (Apr 2026) | Gap |
|---|---|---|---|
| Reasoning (MATH, GSM8K) | 95.1 | 92.4 | 2.7 pts |
| Code (HumanEval, MBPP) | 94.8 | 91.2 | 3.6 pts |
| Long-context retrieval (128K+) | 89.3 | 87.8 | 1.5 pts |
| Multimodal (MMMU) | 76.4 | 71.1 | 5.3 pts |
| Tool use / agentic (TAU-bench) | 82.1 | 77.5 | 4.6 pts |
A 3-point gap on a benchmark does not justify a 30× pricing differential at the API.
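As a quick sanity check on the single-digit claim, the table's per-category gaps average about 3.5 points:

```python
# Per-category open-vs-closed gaps from the table above (points).
gaps = {
    "reasoning": 2.7,
    "code": 3.6,
    "long_context": 1.5,
    "multimodal": 5.3,
    "tool_use": 4.6,
}
avg_gap = sum(gaps.values()) / len(gaps)
print(f"average gap: {avg_gap:.2f} pts")  # average gap: 3.54 pts
```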
For a CTO running a customer-support agent at scale, the math now reads: spend €10K to host an open model on your own GPUs and pay €0/token forever, or pay €30K/month to a frontier lab in perpetuity. The crossover used to be three years. It is now three months.
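The CTO math above can be sketched as a cumulative-cost comparison. The figures are illustrative, not vendor quotes: the article's ~€10K hosting spend, an assumed ~€3K/month GPU bill, and €30K/month in API fees; a heavier upfront hardware buy (say €80K) stretches the crossover to roughly a quarter.

```python
def crossover_month(setup_cost, monthly_hosting, monthly_api):
    """First month at which cumulative self-hosting cost drops below
    cumulative API spend. Assumes monthly_api > monthly_hosting,
    otherwise there is no crossover. All figures are illustrative."""
    month, hosted, api = 0, setup_cost, 0.0
    while hosted >= api:
        month += 1
        hosted += monthly_hosting
        api += monthly_api
    return month

# Article's figures: ~EUR 10K to stand up hosting vs EUR 30K/month API.
print(crossover_month(10_000, 3_000, 30_000))   # 1  (inside the first month)
# A larger upfront hardware purchase pushes it out to about a quarter.
print(crossover_month(80_000, 3_000, 30_000))   # 3
```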
2. What This Means for the Frankenstein Thesis
In February 2026, this site published Rent-and-Distill — the playbook by which a Chinese cohort siphoned reasoning traces from closed Western models, ran fine-tuning on rented U.S. compute, and shipped open-weight Frankenstein models at €10–20M per launch.
The April releases are the empirical proof.
DeepSeek V4 was not built by a lab with thousands of PhDs. It was built with engineering discipline, access to open base weights, and a distillation pipeline. The gap to Anthropic's frontier model is single digits.
The deeper reading: distillation is not just theoretically effective. It is now demonstrably scalable to the frontier.
“The moat is not the weights. The moat is whatever you refuse to show.” That was the closing line of Rent-and-Distill in February. Six weeks later, the open-weight benchmark gap closed by another two points.
3. The Three Strategic Shifts This Forces
Shift 1: Inference economics flip. When a 70B-class model runs on a single H200 node at $4/hour, per-token cost drops below any API. Every token-heavy workflow — call summarization, document extraction, code review, ticket triage — has different unit economics in May than it did in March.
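Shift 1's unit economics can be made concrete. The $4/hour node price is the article's; the throughput figure (~2,500 tokens/second aggregate for a batched 70B-class server) is an assumption for illustration:

```python
def usd_per_million_tokens(node_usd_per_hour: float, tokens_per_second: float) -> float:
    """Per-million-token cost of a dedicated inference node at full utilization."""
    tokens_per_hour = tokens_per_second * 3_600
    return node_usd_per_hour / tokens_per_hour * 1_000_000

# $4/hour H200 node, ASSUMED ~2,500 tok/s aggregate batched throughput:
print(f"${usd_per_million_tokens(4.0, 2_500):.2f} per 1M tokens")  # $0.44 per 1M tokens
```

At that utilization, even halving the assumed throughput keeps the cost under a dollar per million tokens, well below typical frontier API pricing.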
Shift 2: Model selection becomes a portfolio question. No serious enterprise will run on one model. Closed APIs for the hardest 5% of queries. Open weights for the 95% the open models now handle as well as closed ones did six months ago. Routing logic — not model quality — becomes the new differentiator.
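Shift 2's routing layer can be sketched in a few lines. The difficulty scorer, threshold, and backend names are hypothetical placeholders, not a real API:

```python
from typing import Callable

def make_router(score: Callable[[str], float], threshold: float = 0.95):
    """Send the hardest ~5% of queries to a closed API and the rest to
    self-hosted open weights. `score` is a hypothetical difficulty model."""
    def route(query: str) -> str:
        return "closed-api" if score(query) > threshold else "open-weights"
    return route

# Toy difficulty heuristic for illustration: long queries count as "hard".
route = make_router(score=lambda q: min(len(q) / 1_000, 1.0))
print(route("summarize this ticket"))   # open-weights
print(route("x" * 2_000))               # closed-api
```

In production the scorer would itself be a small model or a calibrated classifier; the point is that the routing policy, not either backend, is where the differentiation lives.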
Shift 3: Sovereignty and licensing matter again. Llama 4’s license still excludes companies above a certain MAU threshold. DeepSeek V4 is unrestricted but Chinese-origin. Mistral Small 4 is Apache-2. The license is now a procurement criterion that matters as much as the benchmark.
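Shift 3 turns license terms into filterable procurement data. The fields below are simplified assumptions drawn from the article's three examples:

```python
# Simplified license metadata for the article's three examples (illustrative).
MODELS = [
    {"name": "Llama 4 Scout",   "license": "llama-4-community", "mau_cap": True,  "origin": "US"},
    {"name": "DeepSeek V4-Pro", "license": "deepseek",          "mau_cap": False, "origin": "CN"},
    {"name": "Mistral Small 4", "license": "apache-2.0",        "mau_cap": False, "origin": "EU"},
]

def procurable(model, allow_mau_cap=False, blocked_origins=()):
    """Apply MAU-cap and country-of-origin policy as hard filters."""
    if model["mau_cap"] and not allow_mau_cap:
        return False
    return model["origin"] not in blocked_origins

# A buyer above the MAU cap whose policy also excludes Chinese-origin weights:
shortlist = [m["name"] for m in MODELS if procurable(m, blocked_origins=("CN",))]
print(shortlist)  # ['Mistral Small 4']
```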
4. What Closed Frontier Labs Will Do Next
Three predictions for the next two quarters.
Prediction 1: The closed labs raise the bar. Expect GPT-6 / Claude 5 / Gemini 3 in summer 2026, with capability gaps re-opened to double digits — for six months. Then the open weights catch up again. This is now the rhythm.
Prediction 2: The closed labs move up the stack. API models are commoditizing. The defensible product is the agent platform — long memory, tool integration, organizational context. Watch Anthropic, OpenAI, and Google ship platform offerings that make the underlying model less important. Google already did, on 2026-04-22, with the $750M Gemini Enterprise Agent Platform launch.
Prediction 3: The closed labs lobby for compute restrictions on open-weight training. The Remote Access Security Act (RASA) blocked one cloud loophole. Expect the next regulatory front to be FLOP thresholds for open releases — which only the closed labs would benefit from.
5. The Quiet Winner
While the open-weight race accelerates, one player wins quietly: NVIDIA.
A 1T-parameter open model needs hardware to run. Self-hosted inference at enterprise scale means H200s, B200s, and the entire datacenter retrofit. The same Chinese labs that built Frankenstein models on rented U.S. compute are now selling the inference dependency to every Western enterprise that downloads the weights.
This is the second loop the regulators did not anticipate. RASA (January 2026) closed the training loophole. The April releases just reopened the inference one — except this time, NVIDIA is the beneficiary, not the threat.
What Leaders Should Do This Quarter
1. If you spend more than €1M/yr on closed APIs: run a hosted open-weight pilot on the next refresh. The crossover math is real.
2. If you sell an AI product: assume your moat is not your model. Build the data, the workflow, and the trust layer. The weights underneath you will commoditize.
3. If you set procurement policy: treat license terms (MAU caps, country-of-origin, redistribution rights) as a first-class procurement criterion, not a footnote.
4. If you set national policy: RASA needs a sequel. The next loophole is in the inference layer, not the training one.
The Strategic Read
April 2026 was the month the open-weight curve crossed the closed-weight curve on the metrics that matter to enterprises. It will be remembered as the inflection point.
The closed labs are not finished. They will pull ahead again, briefly, with the next release. But the structural fact is now established: every frontier capability shipped by a closed lab has an 18-month half-life before it is replicated in open weights.
The strategic question for any enterprise is no longer “which closed API do we sign?” It is “what part of our stack would we still pay for if the model underneath was free?”
The benchmark gap is in the single digits. The pricing gap is not. That is the arbitrage.
About the Author
Thorsten Meyer is a Munich-based futurist, post-labor economist, and recipient of OpenAI’s 10 Billion Token Award. He spent two decades managing €1B+ portfolios in enterprise ICT before deciding that writing about the transition was more useful than managing quarterly slides through it. More at ThorstenMeyerAI.com.
Sources
- llm-stats, AI Updates Today: Latest AI Model Releases (2026-04)
- DeepSeek, V4 Technical Report (2026-04-23)
- Alibaba Qwen Team, Qwen 3.6-35B-A3B Release Notes (2026-04-16)
- Meta, Llama 4 Model Card: Scout & Maverick (2026-04)
- Sebastian Raschka, A Dream of Spring for Open-Weight LLMs (2026-02)
- Lushbinary, Best Open-Source LLMs April 2026 (2026-04)
- BuildFastWithAI, Best AI Models April 2026: Ranked by Benchmarks (2026-04)
- Renovate, Chinese AI Models in April 2026: DeepSeek V4, Qwen 3.5, Kimi K2.5, GLM-5 (2026-04)