Apple researchers unveiled a retrofit that lets standard autoregressive LLMs predict several future tokens at once, then verify them, cutting latency without hurting quality. In “Your LLM Knows the Future,” they add learned mask tokens, a small gated-LoRA adapter, and a lightweight sampler head; proposed tokens are verified via linear or quadratic decoding. The team frames the approach as a minimal supervised fine-tune that preserves next-token performance. On Tulu3-8B fine-tuned to predict eight future tokens, Apple reports ~2.5× speedups on chat and knowledge tasks and up to ~5× on code and math, with no quality loss.

TL;DR

Apple researchers published a new paper showing how to make standard autoregressive LLMs predict several future tokens at once with minimal retraining. Using special mask tokens, a tiny gated LoRA adapter, a lightweight sampler head, and a verification step they call linear and quadratic decoding, they report ~2.5× speedups on general chat/QA and up to ~5× on coding and math, with no quality loss in their tests on Tulu3-8B.
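To make the propose-then-verify loop concrete, here is a minimal sketch of the control flow under stated assumptions: `model_forward` is a toy stand-in (a per-position lookup, not a real transformer), and `MASK_ID`, `propose`, and `verify` are my hypothetical names for the paper's learned mask tokens and decoding steps, not Apple's actual code. It shows the key idea: one forward pass over the context plus appended mask tokens yields several future proposals, and a second pass over the proposed tokens accepts the longest prefix the model would have produced autoregressively anyway, which is what makes greedy decoding lossless.

```python
# Sketch of mask-token multi-token proposal + verification ("linear decoding").
# Everything below is a toy illustration, not the paper's implementation.
import numpy as np

VOCAB, MASK_ID, K = 100, 99, 4          # toy vocab, learned mask id, futures per step
rng = np.random.default_rng(0)
W = rng.standard_normal((VOCAB, VOCAB))  # toy "weights" standing in for the LLM

def model_forward(tokens):
    """Toy stand-in for the LLM: one logits row per input position."""
    onehot = np.eye(VOCAB)[tokens]
    return onehot @ W

def propose(context):
    """Append K mask tokens; one forward pass yields K+1 future proposals."""
    padded = context + [MASK_ID] * K
    logits = model_forward(padded)
    # Position len(context)-1 predicts the next token; each mask position
    # predicts one token further into the future.
    return [int(np.argmax(row)) for row in logits[len(context) - 1:]]

def verify(context, proposal):
    """One extra pass re-scores all proposals at once; keep the longest prefix
    that matches what greedy autoregressive decoding would have produced."""
    logits = model_forward(context + proposal)
    accepted = []
    for i, tok in enumerate(proposal):
        # Position len(context)-1+i predicts proposal token i, conditioned on
        # the already-accepted prefix proposal[:i].
        check = int(np.argmax(logits[len(context) - 1 + i]))
        accepted.append(check)
        if check != tok:
            break  # first mismatch: keep the corrected token, drop the rest
    return accepted

context = [1, 2, 3]
prop = propose(context)
print("proposed:", prop)
print("accepted:", verify(context, prop))
```

In the best case all K+1 proposals survive verification and the model commits several tokens for roughly the cost of one sequential step; in the worst case it still commits one correct token per round, so quality never degrades under greedy decoding.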

Why it matters: Lower latency and compute per user, especially in predictable domains, without a second draft model or major architecture changes; the recipe is viable for on-device or server inference.

How it compares: Speculative decoding uses a smaller helper model to draft tokens; Apple’s approach keeps a single model proposing and verifying its own futures, aiming for lossless quality. Early coverage pegs gains at 2–3× on average, up to ~5× for code and math.
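The structural difference is easy to see side by side. The sketch below reuses the toy `model_forward`, `propose`, `verify`, and `K` from the earlier snippet; `draft_forward` is a hypothetical smaller model that classic speculative decoding would require, and both step functions are my own illustration rather than either method's real code.

```python
# Contrast sketch: draft-model speculation vs. self-speculation.
# Requires model_forward, propose, verify, and K from the sketch above.
import numpy as np

def draft_forward(tokens):
    # In practice this would be a second, cheaper model kept in memory;
    # here it just aliases the toy target model.
    return model_forward(tokens)

def classic_speculative_step(context):
    # The draft model proposes K tokens one at a time (K cheap passes)...
    proposal = []
    for _ in range(K):
        logits = draft_forward(context + proposal)
        proposal.append(int(np.argmax(logits[-1])))
    # ...then the target model verifies them in a single pass.
    return verify(context, proposal)

def self_speculative_step(context):
    # One model plays both roles: mask tokens propose, the same pass structure verifies.
    return verify(context, propose(context))

print(classic_speculative_step([1, 2, 3]))
print(self_speculative_step([1, 2, 3]))
```

Dropping the draft model removes the extra memory footprint and the draft/target distribution mismatch, which is what makes the single-model recipe attractive for constrained on-device deployments.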

Bottom line: A light, practical recipe to accelerate existing LLMs with minimal retraining. It’s research, not a shipping feature, but it is likely to influence Apple Intelligence, open-model ecosystems, and broader developer tooling.
