By Thorsten Meyer — April 2026


Single Digits — The April That Closed the Open-Weight Gap
DISPATCH / APRIL 2026 FILE NO. 0427 — OPEN WEIGHTS Q2

Single digits.

The April that closed the open-weight gap.

DeepSeek V4-Pro is the largest open-weight model ever released. One trillion parameters. One million tokens of context. Free to download. Six labs shipped competitive open-weight families in a single quarter. The benchmark gap to the best closed model is now in the single digits on every evaluation enterprises actually pay for.

6 · Labs · One Quarter · Five competitive open-weight families
1T · DeepSeek V4-Pro · Largest open-weight ever · 1M context
30× · The Pricing Gap · €10K self-host vs €30K/mo API
The April drop

Eight open-weight releases. Thirty days.

The list below is not exhaustive. It is what shipped in a single month — across six labs, three jurisdictions, and the full size spectrum from on-device to frontier.

DeepSeek V4-Pro
2026-04-23
CN · DeepSeek

~1T parameters MoE · 1M token context · multimodal · the largest open-weight release to date.

DeepSeek V4-Flash
2026-04-23
CN · DeepSeek

Cheaper inference variant — same family, optimized for cost-per-token at scale.

Qwen 3.6-35B-A3B
2026-04-16
CN · Alibaba

Smaller, fast MoE — 35B nominal, 3B active. Built for speed-per-dollar workloads.

Llama 4 Scout
2026-04
US · Meta

109B / 17B active, 16 experts. The on-prem workhorse for regulated industries.

Llama 4 Maverick
2026-04
US · Meta

400B raw capability — Meta’s frontier-class swing at the closed labs.

Gemma 4
2026-04
US · Google

Open weights tuned for on-device and edge inference. The mobile-first answer.

Mistral Small 4
2026-04
FR · Mistral

Apache-2 licensed. The cleanest license terms in the cohort — a procurement gift.

GLM-5.1
2026-04
CN · Zhipu AI

Open weights from the third major Chinese lab to ship this month. The pattern, not the outlier.

The crossing curves

Two trajectories. One inflection point.

For three years, “frontier model” meant “API model.” In April 2026 that sentence stopped being true.

[Chart: composite capability score (normalized), Q1 '24 to Apr '26 · closed frontier (proprietary API) vs. best open-weight (downloadable) · crossover April 2026, single-digit gap across all major evals]
Eval-by-eval breakdown

A 3-point gap does not justify a 30× pricing differential.

Closed frontier (March 2026) versus the best open-weight model (April 2026), across the five evaluation categories enterprises actually buy against.

Reasoning (MATH · GSM8K): 95.1 vs 92.4 · 2.7 points
Code (HumanEval · MBPP): 94.8 vs 91.2 · 3.6 points
Long context (128K+ retrieval): 89.3 vs 87.8 · 1.5 points
Multimodal (MMMU): 76.4 vs 71.1 · 5.3 points
Tool use · Agentic (TAU-bench): 82.1 vs 77.5 · 4.6 points
The crossover math

The arbitrage is not subtle.

Closed API · Recurring
€30K
/ month, in perpetuity

Per-token pricing on a customer-support agent at scale. Every successful workflow is a permanent line item.

Open weights · Self-host
€10K
one-time GPU outlay

Run a 70B-class open model on a single H200 node at $4 / hour. Per-token cost falls below any API.

Crossover used to be 3 years. Now it is 3 months.
Three strategic shifts this forces

The curve crossed. So does the budget.

SHIFT 01 / ECONOMICS

Inference flips.

When a 70B-class model runs on a single H200 node, every token-heavy workflow — summarization, extraction, code review, ticket triage — has different unit economics in May than it did in March.

SHIFT 02 / SELECTION

Models become a portfolio.

Closed APIs for the hardest 5%. Open weights for the 95% that open models now handle as well as closed ones did six months ago. Routing logic is the new differentiator.

SHIFT 03 / LICENSE

Sovereignty matters again.

Llama 4 has a MAU cap. DeepSeek V4 is unrestricted but Chinese-origin. Mistral Small 4 is Apache-2. The license is a first-class procurement criterion now, not a footnote.

The strategic question is no longer which closed API do we sign — it is what part of our stack would we still pay for if the model underneath was free.

What closed labs do next

Three predictions for the next two quarters.

01

The bar gets raised — for six months.

Expect GPT-6, Claude 5, Gemini 3 in summer 2026, with capability gaps re-opened to double digits. Then open weights catch up again. This is now the rhythm: an 18-month half-life on every frontier capability.

02

The labs move up the stack.

API models commoditize; the defensible product is the agent platform — long memory, tool integration, organizational context. Google already moved on 2026-04-22 with the $750M Gemini Enterprise Agent Platform launch.

03

Lobbying turns to inference.

RASA blocked the training loophole. Expect the next regulatory front to be FLOP thresholds for open releases — which only the closed labs would benefit from.

Source dossier
  • 04-23 · DeepSeek — V4 Technical Report
  • 04-16 · Alibaba Qwen — 3.6-35B-A3B Release Notes
  • 04 · Meta — Llama 4 Model Card · Scout & Maverick
  • 04 · llm-stats — Latest AI Model Releases
  • 02 · Raschka — A Dream of Spring for Open-Weight LLMs
  • 04 · BuildFastWithAI — Best AI Models, Ranked by Benchmarks
Colophon

Set in Bricolage Grotesque, Source Serif 4, & IBM Plex Mono. Composed for ThorstenMeyerAI.com, April 2026. Free to embed with attribution.

thorstenmeyerai.com

One week earlier, Alibaba shipped Qwen 3.6-35B-A3B. Earlier in April, Meta dropped Llama 4 (Scout + Maverick), Mistral released Small 4, Google released Gemma 4, and Zhipu AI open-sourced GLM-5.1.

Six labs. Five competitive open-weight families. One quarter.

The benchmark gap between the best open and the best closed model is now in the single digits on every evaluation enterprises actually pay for.

If you were planning a 2026 AI budget around proprietary API pricing, that budget is wrong.


Executive Summary

Open-Weight Model  | Lab           | Released   | Differentiator
DeepSeek V4-Pro    | DeepSeek (CN) | 2026-04-23 | ~1T MoE, 1M context, multimodal
DeepSeek V4-Flash  | DeepSeek (CN) | 2026-04-23 | Cheaper inference variant
Qwen 3.6-35B-A3B   | Alibaba       | 2026-04-16 | Smaller, fast, MoE
Llama 4 Scout      | Meta          | 2026-04    | 109B / 17B active, 16 experts
Llama 4 Maverick   | Meta          | 2026-04    | 400B raw capability
Gemma 4            | Google        | 2026-04    | Open, on-device-friendly
Mistral Small 4    | Mistral       | 2026-04    | Apache-2 license
GLM-5.1            | Zhipu AI (CN) | 2026-04    | Open weights

The list is not exhaustive. It is what shipped in a single month.


1. The Closed-Model Premium Was Just Re-Priced

For three years, “frontier model” meant “API model.” Closed weights, paid per token, accessible only through the lab that built it. Enterprises paid the premium because the open alternatives were measurably worse.

The April 2026 benchmark numbers no longer say that.

Eval Category                  | Closed Frontier (Mar 2026) | Best Open Weight (Apr 2026) | Gap
Reasoning (MATH, GSM8K)        | 95.1 | 92.4 | 2.7 pts
Code (HumanEval, MBPP)         | 94.8 | 91.2 | 3.6 pts
Long-context retrieval (128K+) | 89.3 | 87.8 | 1.5 pts
Multimodal (MMMU)              | 76.4 | 71.1 | 5.3 pts
Tool use / agentic (TAU-bench) | 82.1 | 77.5 | 4.6 pts

A 3-point gap on a benchmark does not justify a 30× pricing differential at the API.

For a CTO running a customer-support agent at scale, the math now reads: spend €10K to host an open model on your own GPUs and pay €0/token forever, or pay €30K/month to a frontier lab in perpetuity. The crossover used to be three years. It is now three months.
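The crossover arithmetic can be made explicit. A minimal sketch, using the €10K capex and €30K/month API figures from the text; the ~€2.9K/month self-host operating line (one H200 node at $4/hour, run around the clock) is my assumption, not the article's:

```python
def breakeven_months(capex_eur: float, selfhost_eur_per_month: float,
                     api_eur_per_month: float) -> float:
    """Months until cumulative self-host spend undercuts the API bill."""
    monthly_saving = api_eur_per_month - selfhost_eur_per_month
    if monthly_saving <= 0:
        return float("inf")  # the API is cheaper; no crossover
    return capex_eur / monthly_saving

months = breakeven_months(capex_eur=10_000,
                          selfhost_eur_per_month=2_900,
                          api_eur_per_month=30_000)
print(f"Break-even after ~{months:.2f} months")  # → Break-even after ~0.37 months
```

Under these assumed operating costs the hardware pays for itself inside the first billing cycle; the "three months" figure in the text leaves far more headroom for staffing and serving overhead.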


2. What This Means for the Frankenstein Thesis

In February 2026, this site published Rent-and-Distill — the playbook by which a Chinese cohort siphoned reasoning traces from closed Western models, ran fine-tuning on rented U.S. compute, and shipped open-weight Frankenstein models at €10–20M per launch.

The April releases are the empirical proof.

DeepSeek V4 was not built by a lab with thousands of PhDs. It was built by a lab with engineering discipline, access to open base weights, and a distillation pipeline. The gap to a model built by Anthropic with thousands of PhDs is single digits.

The deeper reading: distillation is not just theoretically effective. It is now demonstrably scalable to the frontier.

“The moat is not the weights. The moat is whatever you refuse to show.” That was the closing line of Rent-and-Distill in February. Six weeks later, the open-weight benchmark gap closed by another two points.


3. The Three Strategic Shifts This Forces

Shift 1: Inference economics flip. When a 70B-class model runs on a single H200 node at $4/hour, per-token cost drops below any API. Every token-heavy workflow — call summarization, document extraction, code review, ticket triage — has different unit economics in May than it did in March.
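To see why the unit economics flip, amortize the node cost over throughput. A sketch: the $4/hour is from the text, while the 1,500 tokens/second of aggregate batched throughput is an assumed illustrative figure:

```python
def cost_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Amortized node cost per million generated tokens."""
    return usd_per_hour / (tokens_per_second * 3600) * 1_000_000

print(f"${cost_per_million_tokens(4.0, 1_500):.2f} per 1M tokens")
# → $0.74 per 1M tokens
```

At sub-dollar cost per million tokens, the comparison with per-token API pricing is what drives the re-pricing of every token-heavy workflow.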

Shift 2: Model selection becomes a portfolio question. No serious enterprise will run on one model. Closed APIs for the hardest 5% of queries. Open weights for the 95% the open models now handle as well as closed ones did six months ago. Routing logic — not model quality — becomes the new differentiator.
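The portfolio logic reduces to a routing decision. A toy sketch; the model names are placeholders and the difficulty heuristic is deliberately crude, because the real engineering lives in the `is_hard` signal (a classifier score, task type, or a failed first pass on the open model):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    model: str
    cost_tier: str

def route_query(query: str, is_hard: Callable[[str], bool]) -> Route:
    """Send the hard tail to the closed API; default to open weights."""
    if is_hard(query):
        return Route(model="closed-frontier-api", cost_tier="premium")
    return Route(model="self-hosted-open-70b", cost_tier="commodity")

# Crude length heuristic standing in for a real difficulty signal.
hard = lambda q: len(q.split()) > 200
print(route_query("summarize this support ticket", hard).model)
# → self-hosted-open-70b
```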

Shift 3: Sovereignty and licensing matter again. Llama 4’s license still excludes companies above a certain MAU threshold. DeepSeek V4 is unrestricted but Chinese-origin. Mistral Small 4 is Apache-2. The license is now a procurement criterion that matters as much as the benchmark.
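Treating the license as a first-class criterion is straightforward to encode in procurement tooling. A sketch; the constraint flags are paraphrased from the text, and the exact license terms of each model are assumptions to verify before any real decision:

```python
CANDIDATES = [
    {"model": "Llama 4 Scout",   "mau_cap": True,  "origin": "US"},
    {"model": "DeepSeek V4-Pro", "mau_cap": False, "origin": "CN"},
    {"model": "Mistral Small 4", "mau_cap": False, "origin": "FR"},
]

def shortlist(candidates, *, over_mau_cap: bool, allowed_origins: set[str]) -> list[str]:
    """Drop models that the license cap or sovereignty policy rules out."""
    return [c["model"] for c in candidates
            if not (over_mau_cap and c["mau_cap"])
            and c["origin"] in allowed_origins]

# A large enterprise (over the MAU cap) with a US/EU-origin policy:
print(shortlist(CANDIDATES, over_mau_cap=True, allowed_origins={"US", "FR"}))
# → ['Mistral Small 4']
```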


4. What Closed Frontier Labs Will Do Next

Three predictions for the next two quarters.

Prediction 1: The closed labs raise the bar. Expect GPT-6 / Claude 5 / Gemini 3 in summer 2026, with capability gaps re-opened to double digits — for six months. Then the open weights catch up again. This is now the rhythm.

Prediction 2: The closed labs move up the stack. API models are commoditizing. The defensible product is the agent platform — long memory, tool integration, organizational context. Watch Anthropic, OpenAI, and Google ship platform offerings that make the underlying model less important. Google already did, on 2026-04-22, with the $750M Gemini Enterprise Agent Platform launch.

Prediction 3: The closed labs lobby for compute restrictions on open-weight training. The Remote Access Security Act blocked one cloud loophole. Expect the next regulatory front to be FLOP thresholds for open releases — which only the closed labs would benefit from.


5. The Quiet Winner

While the open-weight race accelerates, one player wins quietly: NVIDIA.

A 1T-parameter open model needs hardware to run. Self-hosted inference at enterprise scale means H200s, B200s, and the entire datacenter retrofit. The same Chinese labs that built Frankenstein models on rented U.S. compute are now selling the inference dependency to every Western enterprise that downloads the weights.

This is the second loop the regulators did not anticipate. RASA (January 2026) closed the training loophole. The April releases just reopened the inference one — except this time, NVIDIA is the beneficiary, not the threat.


What Leaders Should Do This Quarter

1. If you spend more than €1M/yr on closed APIs: run a hosted open-weight pilot on the next refresh. The crossover math is real.

2. If you sell an AI product: assume your moat is not your model. Build the data, the workflow, and the trust layer. The weights underneath you will commoditize.

3. If you set procurement policy: treat license terms (MAU caps, country-of-origin, redistribution rights) as a first-class procurement criterion, not a footnote.

4. If you set national policy: RASA needs a sequel. The next loophole is in the inference layer, not the training one.


The Strategic Read

April 2026 was the month the open-weight curve crossed the closed-weight curve on the metrics that matter to enterprises. It will be remembered as the inflection point.

The closed labs are not finished. They will pull ahead again, briefly, with the next release. But the structural fact is now established: every frontier capability shipped by a closed lab has an 18-month half-life before it is replicated in open weights.

The strategic question for any enterprise is no longer “which closed API do we sign?” It is “what part of our stack would we still pay for if the model underneath was free?”


The benchmark gap is in the single digits. The pricing gap is not. That is the arbitrage.


About the Author

Thorsten Meyer is a Munich-based futurist, post-labor economist, and recipient of OpenAI’s 10 Billion Token Award. He spent two decades managing €1B+ portfolios in enterprise ICT before deciding that writing about the transition was more useful than managing quarterly slides through it. More at ThorstenMeyerAI.com.


Sources

  • llm-stats, AI Updates Today: Latest AI Model Releases (2026-04)
  • DeepSeek, V4 Technical Report (2026-04-23)
  • Alibaba Qwen Team, Qwen 3.6-35B-A3B Release Notes (2026-04-16)
  • Meta, Llama 4 Model Card: Scout & Maverick (2026-04)
  • Sebastian Raschka, A Dream of Spring for Open-Weight LLMs (2026-02)
  • Lushbinary, Best Open-Source LLMs April 2026 (2026-04)
  • BuildFastWithAI, Best AI Models April 2026: Ranked by Benchmarks (2026-04)
  • Renovate, Chinese AI Models in April 2026: DeepSeek V4, Qwen 3.5, Kimi K2.5, GLM-5 (2026-04)