Most teams using AI coding agents are still doing the equivalent of retyping the same instructions every morning. Anthropic just published what it learned from the opposite approach — packaging that knowledge once, into reusable units it calls Skills, and running hundreds of them across its own engineering org.

The write-up, from a Claude Code engineer, is ostensibly a how-to. Read it as a business memo and a different point jumps out: this is how ad-hoc prompting becomes durable institutional capability — the standard operating procedures your agents actually follow, versioned and shared like any other asset. Here’s the useful core, translated for the person building the skills and the person paying for them.

AI Dispatch · Insights · 1 July 2026

A Skill is a folder, not a prompt

Anthropic published what it learned running hundreds of Skills across its own engineering org. Read as a business memo, the point is bigger than a coding trick: this is how ad-hoc prompting becomes durable institutional capability — the SOPs your agents actually follow, versioned and shared.

✕ The misconception

“A Skill is just a clever markdown prompt you save in a file.”

✓ What it actually is

A folder the agent can discover, read & run — instructions, scripts, references, templates, config & on-demand hooks.

Anatomy of a Skill — the file system is context engineering

my-skill/         the unit you share & version
├─ SKILL.md       root instructions + a description written for the model (its trigger)
├─ references/    deep detail pulled in only when needed (progressive disclosure)
├─ scripts/       real code, so the agent composes instead of rebuilding boilerplate
├─ assets/        templates & files to copy into the output
├─ config.json    setup the agent asks for if it’s missing (e.g. which Slack channel)
└─ hooks + memory on-demand guardrails + an append-only log so it remembers

Why it matters: the folder itself is the knowledge base. The agent reads the root, then reaches deeper only when the task demands it — the same way you’d hand a new hire a one-pager that points to the detailed docs.
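
To make the layout concrete, here is a minimal Python sketch that scaffolds an empty Skill folder. The folder and file names come from the anatomy above. The `scaffold_skill` helper, the frontmatter fields written into SKILL.md, and the placeholder description are illustrative assumptions, so check Anthropic's Skills documentation for the exact format. Hooks and the memory log are left out.

```python
import json
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create an empty Skill folder with the layout sketched above (hypothetical helper)."""
    skill = root / name
    for sub in ("references", "scripts", "assets"):
        (skill / sub).mkdir(parents=True, exist_ok=True)

    # Assumed frontmatter: a name plus a description written for the model.
    (skill / "SKILL.md").write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n"
        "## Gotchas\n\n- (add the first hard-won caveat here)\n",
        encoding="utf-8",
    )
    # Empty config: the agent is expected to ask for missing values.
    (skill / "config.json").write_text(json.dumps({}, indent=2), encoding="utf-8")
    return skill


workdir = Path(tempfile.mkdtemp())
skill_dir = scaffold_skill(workdir, "my-skill", "Use when someone asks to babysit a PR")
created = sorted(p.relative_to(skill_dir).as_posix() for p in skill_dir.rglob("*"))
print(created)
```

The script matters less than the shape it produces: one short root file, with everything heavier parked in subfolders the agent opens only when a task calls for it.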

The nine types — a gap-analysis map for your own library

1. Library / API reference
2. Product verification (★ top impact)
3. Data fetching & analysis
4. Business-process automation
5. Code scaffolding & templates
6. Code quality & review
7. CI/CD & deployment
8. Runbooks
9. Infrastructure operations

By Anthropic’s own measurement, verification Skills — the ones that check the work — moved output quality the most. If you build one category well, build that one.

The craft — what separates a good Skill from a useless one

Gotchas = highest-signal section
Describe for the model, not humans (it’s the trigger)
Don’t state the obvious
Ship scripts, not just prose
On-demand guardrail hooks (/careful, /freeze)
Let it remember (log / SQLite)
Don’t railroad: leave room to adapt

The take

The knowledge of how your organization actually operates can be captured, versioned, shared & executed — and the thing capturing it is a humble folder with a script and a gotchas list inside. For the builder, that’s context engineering with real tools attached. For whoever owns the budget, it’s the difference between AI that starts from zero every morning and an asset that compounds. Caveats: best practices are still evolving, checked-in Skills cost context, and curation beats accumulation. Start with one Skill, one gotcha, and the category that catches your mistakes.

Source: “Lessons from building Claude Code: How we use skills,” Thariq Shihipar (Anthropic), Claude blog, 3 June 2026. Categories, examples & measured claims are Anthropic’s; framing is the author’s. Docs: code.claude.com/docs/en/skills.

thorstenmeyerai.com

First, kill the “it’s just markdown” idea

The single most load-bearing correction in the piece is definitional. A Skill is not a clever prompt saved in a text file. It’s a folder — one that can contain instructions, reference documents, runnable scripts, templates, data, configuration, and even hooks that fire only while the Skill is active. The agent can discover that folder, read what’s in it, and execute the scripts inside it.

For a technical reader, that reframing changes everything about how you design one, and we’ll get to the mechanics. For a business reader, the translation is simpler and more important: a Skill is a container for how your organization actually does a thing — with the tribal knowledge, the guardrails, and the tools bundled in — not a sticky note. That’s the difference between a tip and an asset.

Why this is a business story, not just a developer trick

Strip away the syntax and a Skill does three things a company should care about.

It makes agent output consistent: the same task gets done the same way whether it’s run by a senior engineer or a new hire’s assistant. It compresses onboarding: the knowledge that used to live in one person’s head, or in a wiki nobody reads, becomes something the agent applies automatically. And it compounds: Anthropic’s own framing is that its best Skills started as a few lines and one hard-won caveat, then got better every time the agent hit a new edge case and someone wrote it down.

That last property is the one to internalize. A Skills library is not a cost; it’s an appreciating asset — a record of how your organization gets work done that keeps getting sharper. The tell that Anthropic believes this: it says a team can justify spending an entire engineer-week making a single category of Skill excellent. Companies don’t spend engineer-weeks on sticky notes.

The nine-category map — use it to find your gaps

After cataloging its internal Skills, Anthropic found they cluster into nine types. The list matters less as a taxonomy than as a gap-analysis tool: run your own team against it and the holes are obvious. In plain terms, the nine are:

1. Library and API reference: how to correctly use an internal or fiddly library, with a “gotchas” list.
2. Product verification: how to actually test that the work works, for example by driving a signup flow or a checkout in a headless browser.
3. Data fetching and analysis: which table holds the canonical ID, which dashboard answers which question.
4. Business-process automation: turning a repetitive workflow (the standup post, the weekly recap, the ticket with all its required fields) into one command.
5. Code scaffolding: generating your boilerplate with your conventions pre-wired.
6. Code quality and review: enforcing house style, running a fresh-eyes critique pass.
7. CI/CD and deployment: babysitting a pull request through flaky tests, doing a gradual rollout with auto-rollback.
8. Runbooks: taking an alert or error and walking a structured investigation to a report.
9. Infrastructure operations: routine, sometimes destructive maintenance done with guardrails.

The business read on that spread: it runs from “help me write code” all the way to “run our operational procedures safely.” The highest-value category, by Anthropic’s own measurement, is an unglamorous one: verification, the Skills that check the work, because that is what moved output quality the most. If you only build one category well, build the one that catches mistakes.

The craft: what separates a good Skill from a useless one

This is where the technical reader gets paid, and several of the lessons are counterintuitive enough to be worth the price of admission.

Don’t tell the agent what it already knows. A Skill that restates the obvious just burns context for no gain. The valuable content is the stuff that pushes the model off its defaults — Anthropic’s example is a design Skill built specifically to steer the model away from the tired tells of AI-generated UI, like the same overused font and gradient. The business analogy is exact: good documentation captures what’s non-obvious and specific to you, not the generic.

The Gotchas section is the highest-signal part of any Skill. These are the traps the agent keeps falling into: the table that’s append-only so you want the highest version, not the latest timestamp; the field that’s named one thing in one service and another thing elsewhere; the staging environment that returns success even when it silently failed. This is institutional memory in its purest form — the knowledge that only exists because someone got burned once.
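
The append-only gotcha is easy to show in a few lines of Python. The rows below are invented for illustration (the `order_id`, `version`, `status`, and `written_at` fields are all hypothetical), with one late-arriving backfill that carries the newest timestamp but not the newest state.

```python
from datetime import datetime

# Hypothetical rows from an append-only table: nothing is updated in place.
rows = [
    {"order_id": "A1", "version": 1, "status": "pending", "written_at": datetime(2026, 5, 1, 9, 0)},
    {"order_id": "A1", "version": 3, "status": "shipped", "written_at": datetime(2026, 5, 2, 9, 0)},
    # A backfilled version 2, written last: newest timestamp, stale state.
    {"order_id": "A1", "version": 2, "status": "paid", "written_at": datetime(2026, 5, 3, 9, 0)},
]


def current_row(table, order_id):
    """Return the authoritative row: the highest version, not the latest write."""
    matching = [r for r in table if r["order_id"] == order_id]
    return max(matching, key=lambda r: r["version"])


naive = max(rows, key=lambda r: r["written_at"])  # the trap
correct = current_row(rows, "A1")                 # what the gotcha prescribes

print(naive["status"], correct["status"])
```

The naive query reports the order as paid when it has already shipped. One sentence in a Gotchas section is enough to stop the agent from repeating that mistake on every run.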

Write the description for the model, not for a human. This is the subtlest lesson and the one most people get wrong. When an agent starts up, it scans a list of every available Skill’s description to decide which, if any, applies. So the description isn’t a summary — it’s a trigger definition, and it should include the actual words a user would say, right down to internal slang like “babysit” the PR. A perfect Skill that never fires because its description didn’t match the request is worthless.
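
A toy Python sketch shows why the vocabulary matters. To be clear about the assumption: the agent decides with language understanding, not keyword overlap, so the `overlap` function here is only a crude stand-in, and both descriptions are invented for illustration.

```python
import re


def words(text):
    """Lowercase word set; a crude stand-in for how a request might be matched."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(request, description):
    """Count the words a request and a description have in common."""
    return len(words(request) & words(description))


request = "can you babysit my PR until CI is green"

# Written like a summary for a human skimming a catalog.
summary_style = "Pull request lifecycle management utilities."

# Written like a trigger: the phrases a user would actually type.
trigger_style = (
    "Use when the user asks to babysit a PR, watch CI, retry flaky tests, "
    "or get a pull request to green."
)

print(overlap(request, summary_style), overlap(request, trigger_style))
```

The summary shares no words with the request, while the trigger-style description shares four ("babysit", "PR", "CI", "green"). A real model is far more forgiving than this, but the direction of the effect is the one the lesson describes.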

Give the agent scripts, not just prose. One of the most powerful moves is bundling real code — helper functions, libraries — into the Skill. That lets the agent spend its effort on composition (deciding what to do) instead of reconstructing boilerplate every time. Hand it a set of data-fetching functions and it can assemble them on the fly to answer “what happened on Tuesday?” This is the technical heart of the whole idea: the Skill’s file system is context engineering — a root instruction file that points to deeper references, templates, and scripts the agent pulls in only when it needs them (what Anthropic calls progressive disclosure).
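
As a sketch of what composing looks like, suppose a Skill's scripts folder shipped two small helpers. Everything here is hypothetical: the function names, the record fields, and the in-memory data standing in for real sources. The last function is the kind of glue the agent could write on the fly once the helpers exist.

```python
from datetime import date

# Hypothetical in-memory stand-ins for real data sources.
_DEPLOYS = [
    {"day": date(2026, 6, 2), "service": "checkout", "sha": "abc123"},
    {"day": date(2026, 6, 3), "service": "search", "sha": "def456"},
]
_ALERTS = [
    {"day": date(2026, 6, 2), "service": "checkout", "name": "error-rate-high"},
]


# Helpers the Skill would ship in scripts/ (names are illustrative).
def fetch_deploys(day):
    return [d for d in _DEPLOYS if d["day"] == day]


def fetch_alerts(day):
    return [a for a in _ALERTS if a["day"] == day]


# Composition the agent writes per question: alerts on services deployed that day.
def what_happened(day):
    deployed = {d["service"] for d in fetch_deploys(day)}
    return [a["name"] for a in fetch_alerts(day) if a["service"] in deployed]


tuesday = date(2026, 6, 2)
print(what_happened(tuesday))
```

The helpers encode where the data lives once. The question-specific logic stays a few lines long, which is the split between boilerplate and composition the lesson is after.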

Add guardrails that only exist when you want them. Skills can carry hooks that activate for a single session — a “careful” mode that blocks destructive commands like force-pushes or table drops when you know you’re touching production, or a “freeze” that stops the agent from editing anything outside the directory you’re working in. Always-on, these would be maddening; on-demand, they’re a safety net. For anyone nervous about handing an agent real access, this is the mechanism that makes it defensible.
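
The decision logic behind such guardrails can be sketched in Python. This shows only the checks, not how a hook is registered with the agent (see Anthropic's documentation for that). The pattern list, the function names, and the example paths are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical deny-list for a "careful" mode; extend it for your own stack.
DESTRUCTIVE_PATTERNS = [
    r"\bgit\s+push\b.*\s(--force|-f)\b",  # force-pushes
    r"\bdrop\s+table\b",                  # table drops
    r"\brm\s+-rf\b",                      # recursive deletes
]


def careful_block_reason(command):
    """Return a reason string if the command looks destructive, else None."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return f"blocked by careful mode: matches {pattern!r}"
    return None


def freeze_allows(path, workdir):
    """Allow an edit only if the resolved path stays inside the working directory."""
    target = Path(path).resolve()
    root = Path(workdir).resolve()
    return target == root or root in target.parents


workdir = Path(tempfile.mkdtemp())
print(careful_block_reason("git push --force origin main"))
print(freeze_allows(workdir / ".." / "secrets.env", workdir))
```

A deny-list like this is deliberately blunt, which is why it only makes sense as something you switch on for a risky session and not as a permanent default.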

Let it remember. A Skill can keep its own memory — as simple as an append-only log or as structured as a small database — so the next run knows what the last run did. The standup Skill that reads its own history can report only what changed. That’s the leap from a stateless tool to something that behaves like a colleague who was there yesterday.
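
Here is a minimal sketch of the append-only-log variant in Python. The JSON-lines format, the file name, the ticket IDs, and the `standup_delta` helper are all assumptions made for illustration, since the source says only that the memory can be a log or a small database.

```python
import json
import tempfile
from pathlib import Path


def append_run(log_path, open_tickets):
    """Append one record per run; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"open_tickets": sorted(open_tickets)}) + "\n")


def last_run(log_path):
    """Return the most recent record, or None on the first run."""
    if not log_path.exists():
        return None
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1]) if lines else None


def standup_delta(log_path, open_tickets):
    """Report only what changed since the previous run, then log this run."""
    previous = last_run(log_path)
    before = set(previous["open_tickets"]) if previous else set()
    now = set(open_tickets)
    append_run(log_path, open_tickets)
    return {"new": sorted(now - before), "closed": sorted(before - now)}


log = Path(tempfile.mkdtemp()) / "standup.log.jsonl"
first = standup_delta(log, ["T-1", "T-2"])
second = standup_delta(log, ["T-2", "T-3"])
print(first, second)
```

The second run reports one new ticket and one closed ticket without repeating the ticket that did not change, which is the behavior the standup example describes.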

The knowledge only pays off if it spreads, and Anthropic describes two routes. Small teams simply check Skills into the repo where the code lives. At scale, that starts to cost — every checked-in Skill adds a little to what the agent has to hold in context — so larger orgs move to an internal plugin marketplace people opt into.

The governance detail is the interesting part for anyone who’s watched a company try to run a template library: there’s no central committee deciding what’s official. Skills earn their place organically — someone drops one in a shared folder, it gets traction, and only then does it graduate into the marketplace. It’s bottom-up curation, which is usually the only kind that survives contact with real engineers. Skills can also call each other by name, so small, single-purpose units compose into bigger workflows rather than bloating into do-everything monsters.

The honest caveats

A few things the enthusiasm can obscure. Anthropic is candid that best practices here are still evolving — this is a snapshot of what’s working now, not a settled standard. The context cost of checked-in Skills is real and grows with your library, so “add a Skill for everything” is a trap. There’s a genuine tension between giving the agent enough guidance and over-specifying — railroad it with rigid instructions and you lose the adaptability that made it useful. And it’s worth stating plainly that this is Anthropic’s account of its own tool, Claude Code; the underlying pattern — packaging agent knowledge into discoverable, executable folders — is becoming a general one across the industry, but the specifics here are theirs.

None of that undercuts the core idea. It just means a Skills library, like any asset, needs curation, not accumulation.

The take

The framing that makes this worth your time isn’t “here’s a Claude Code feature.” It’s that the messy, valuable, in-people’s-heads knowledge of how your organization actually operates can be captured, versioned, shared, and executed — and that the thing capturing it is a humble folder with a script and a gotchas list inside.

For the builder, that’s context engineering with real tools attached. For whoever owns the budget, it’s the difference between paying for AI that starts from zero every morning and building an institutional capability that compounds. The teams that treat their Skills library as an appreciating asset — curated, sharpened every time the agent trips on something new — will pull away from the ones still retyping instructions. Start with one Skill, one gotcha, and the category that catches your mistakes.

Source: “Lessons from building Claude Code: How we use skills,” by Thariq Shihipar (Anthropic), published June 3, 2026, on the Claude blog. Anthropic’s documentation for Skills lives at code.claude.com/docs/en/skills, with example Skills at github.com/anthropics/skills. This piece summarizes and interprets Anthropic’s published guidance; the categories, examples, and measured claims are Anthropic’s. Analysis and framing are the author’s.

A Skill Is a Folder, Not a Prompt: What Anthropic Learned Running Hundreds of Them

Author

Thorsten Meyer
