The Problem: Shadow AI Is Already in Your Org
The adoption question is settled. Your engineers are already using AI assistants; the open question is whether that use is governed. Ungoverned, it shows up in three forms, and all three are velocity you are buying with risk you can't see. A governed standard turns all three into an asset. The goal isn't to police AI use; it's to capture it.
Ungrounded copy/paste
Output that was never checked against the real systems, pasted in because it looked right. The plausible-but-wrong answer is the dangerous one: it passes a glance and fails in production.
Quiet, low-quality use
People who treat AI as cheating, so they use it in private and carry the risk without ever submitting it for review. The work never gets a second set of eyes precisely because it's hidden.
Siloed sophistication
One engineer works out how to safely do something hard, and that hard-won method dies in their session history instead of becoming something the team inherits.
Each failure mode has the same root: AI use that is invisible and unreviewed. You can't ban it (adoption already won) and you shouldn't want to. The move is to make the safe path the easy path: capture the expertise once, review it, and hand it to everyone.
The Core Bet: The Substrate, Not the Agent
The value isn't any single clever automation. Anyone can demo an agent; almost no one builds the foundation. The value is the shared, trusted foundation that lets the whole team build on top of it safely.
A reviewed library compounds; a thousand private prompts do not.
The unit of delivery is a skill
A reviewed, versioned instruction module (a folder of instructions, scripts, and resources an agent loads on demand) with a declared safety posture and an explicit list of the systems it touches. One engineer captures "here's how we safely do X" once; it's reviewed once; then anyone on the team loads the reviewed copy.
Why it compounds
Knowledge stops dying in session histories. Every contributed skill makes the next problem cheaper to solve and the library more valuable. Private prompts have no such network effect: each one is solved, used once, and lost.
| Dimension | A thousand private prompts | A reviewed skill library |
|---|---|---|
| Reuse | None; dies in one session | Loaded by the whole team |
| Review | Never seen by a second person | Reviewed once, inherited by all |
| Safety posture | Implicit, per-person, invisible | Declared and enforced per skill |
| Cost / usage | Unmeasured | Instrumented per skill |
| Value over time | Flat; resets every prompt | Compounds with each contribution |
The Golden Rule
Everything rests on one rule. A skill exists to make a knowledgeable person faster, not to replace their judgment.
LLMs are safe when a knowledgeable human validates output against a known source of truth. This is the single load-bearing principle; every invariant below is a corollary of it.
Knowledgeable human
Not any human, but someone who can tell a right answer from a plausible one. The skill accelerates an expert; it doesn't manufacture expertise that isn't in the room.
Validates output
An active check, not a rubber stamp. The human's job is the verification step the model can't be trusted to do on itself.
Known source of truth
Validation needs a referent: the authoritative system, spec, or record the answer can be checked against. No source of truth → no safe validation.
The Four Invariants
From the Golden Rule follow four properties that must hold for every skill, always. These are architectural guarantees, not policy reminders: build them so they can't be violated; don't just ask people to remember them.
1. The system of record always wins
The system of record is the authoritative store the business runs on: the canonical truth, not a copy that can be overwritten. The agent reads that truth and drafts against it; it never silently mutates a shared environment or invents state. This is the same discipline any serious business applies to its books: nothing, human or agent, writes to the record without a person accountable for the write.
2. Read-only by default; writes are earned and gated
Read-only is an architectural property, not a reminder. The agent drafts or diagnoses; a human approves anything that changes a shared system. A skill that can only read cannot break the system of record; that's the whole point of the default.
3. Higher-stakes paths route to a named human
Anything touching regulated data, money, or a production write is flagged and owned by a named reviewer. Critically, the human checkpoint sits where prompt injection cannot reach it: one layer below the agent, not inside it. A check the agent can talk its way past is not a check.
4. Everything is observable; nothing drifts
Each skill declares what it touches, the copy that runs is the reviewed copy (no personal forks), and usage and cost are visible per skill. Observability is the gap most orgs hit first, so build it in from day one, not after the first incident. And it's two jobs, not one: operational analytics (who uses what, at what cost) and agent quality (why a bad output happened). Stand up both.
| Invariant | What it prevents | How you enforce it (not just ask) |
|---|---|---|
| System of record wins | Silent mutation, invented state, an agent overwriting canonical truth | Agent gets read creds to the record; writes go through a separate, accountable path |
| Read-only by default | Any unreviewed change to a shared system | Default scope = read; write scopes granted explicitly, per skill |
| Named human on high-stakes | Unowned writes to regulated data / money / prod; injection bypassing the check | Route flag → named reviewer; checkpoint outside the agent's context |
| Observable, no drift | Personal forks, blind spend, "which version ran?" | Auto-synced reviewed copy; declared systems-touched; per-skill usage/cost telemetry |
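The "read-only by default" row can be made structural by checking every requested action against scopes granted per skill. The following is a minimal sketch under stated assumptions, not a reference implementation: `SkillGrant`, `authorize`, and the scope names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass, field
from typing import Optional

READ = "read"
WRITE = "write"


@dataclass(frozen=True)
class SkillGrant:
    """Scopes granted to one skill (hypothetical structure)."""
    skill: str
    # Maps system name -> set of scopes. A system that is absent has no access.
    scopes: dict = field(default_factory=dict)


def authorize(grant: SkillGrant, system: str, action: str,
              approved_by: Optional[str] = None) -> bool:
    """Allow reads only on granted systems; allow writes only with an explicit
    write scope AND a named human approver."""
    granted = grant.scopes.get(system, set())
    if action == READ:
        return READ in granted
    if action == WRITE:
        return WRITE in granted and bool(approved_by)
    return False  # unknown actions are denied


# A read-only skill: no write scope exists, so no approval can unlock a write.
triage = SkillGrant("incident-triage", {"incident-tracker": {READ}})
```

The design point is that a write fails for a read-only skill even when an approver is supplied: the missing scope, not anyone's caution, is what blocks it.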
If your human approval lives inside the agent's prompt or output, a hostile input can rewrite the very text the human is reading. Put the gate one layer down (in the deploy pipeline, the merge, the write path) where the model can't author what the reviewer sees. And isolate the sessions that read untrusted content (inbound email, scraped pages) from the sessions that can act, so the injection chain can't form. In one regulated-lender deployment, that isolation plus a native send-confirmation checkpoint cut prompt-injection success from ~24% unprotected to ~1%.
The Development Lifecycle
Every capability moves through the same reviewed path, from a real, repeated problem to a governed, instrumented skill whose autonomy is earned, never assumed. Distribution is not permission.
Intake: repeated problem → candidate
A real, repeated problem a person already solves by hand becomes a candidate skill. The bar is repetition plus a known-good method, not novelty. If it only happens once, it's not a skill; if nobody can yet do it safely by hand, it's not ready to encode.
Build: declared posture + tests
Author it as a versioned skill with three non-negotiables:
- Declared safety posture: read-only vs. drafts-only vs. gated-write.
- Explicit systems-touched list: every system of record or data store it reads or proposes to write.
- Tests: so the reviewed behavior is the behavior that ships.
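The three non-negotiables can be checked mechanically before a skill ever reaches review. A minimal sketch follows; `SkillDraft` and `missing_non_negotiables` are hypothetical names, and treating an empty systems-touched list as a problem is an assumption you may want to relax for skills that touch nothing.

```python
from dataclasses import dataclass, field

ALLOWED_POSTURES = {"read-only", "drafts-only", "gated-write"}


@dataclass
class SkillDraft:
    """What an author submits (hypothetical structure)."""
    name: str
    safety_posture: str = ""
    systems_touched: list = field(default_factory=list)
    test_files: list = field(default_factory=list)


def missing_non_negotiables(skill: SkillDraft) -> list:
    """Return human-readable problems; an empty list means the draft is complete."""
    problems = []
    if skill.safety_posture not in ALLOWED_POSTURES:
        problems.append(
            "safety_posture must be one of: " + ", ".join(sorted(ALLOWED_POSTURES))
        )
    if not skill.systems_touched:
        # Assumption: every skill declares at least one system.
        problems.append("systems_touched must list every system the skill reads or writes")
    if not skill.test_files:
        problems.append("at least one test is required")
    return problems
```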
Review: report, not verdict
A merge gate runs deterministic checks and routes higher-stakes skills to a named reviewer:
- Secrets & PII scan.
- Naming conventions.
- Declared-vs-actual systems: does what it touches match what it claims?
The gate produces a report, not a verdict: a human merges; the gate never merges. Automation finds; a person decides.
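The three deterministic checks and the "report, not verdict" rule can be sketched as a single function that returns findings and nothing else. This is an illustration under assumptions: the regular expressions are deliberately simplistic stand-ins for a maintained scanner, the kebab-case naming rule is an assumed convention, and how `observed_systems` gets collected (static analysis, a sandboxed run) is out of scope here.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; a real scanner would use a maintained rule set.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|password|secret)\s*[:=]\s*\S+")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # assumed naming convention


@dataclass
class GateReport:
    findings: list = field(default_factory=list)
    needs_named_reviewer: bool = False
    # Deliberately no "merged" or "approved" field: the gate reports, a person decides.


def run_gate(name, body, declared_systems, observed_systems, stakes="standard"):
    """Run the deterministic checks and return a report for a human to act on."""
    report = GateReport(needs_named_reviewer=(stakes == "high"))
    if SECRET_PATTERN.search(body):
        report.findings.append("possible secret in skill body")
    if SSN_PATTERN.search(body):
        report.findings.append("possible PII (SSN-shaped value) in skill body")
    if not NAME_PATTERN.match(name):
        report.findings.append("name is not lowercase kebab-case")
    undeclared = sorted(set(observed_systems) - set(declared_systems))
    if undeclared:
        report.findings.append("touches undeclared systems: " + ", ".join(undeclared))
    return report
```

Keeping any merge or approve capability out of the return type is the code-level form of "automation finds; a person decides".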
Governed deploy: no drifted forks
The skill ships through an auto-synced channel (e.g. CI-distributed) so everyone loads the reviewed copy. No personal forks, no "the version I'm running is different." The reviewed copy is the running copy.
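One way to verify that the reviewed copy is the running copy is to compare a content digest of the installed files against the digest recorded at review time. The sketch below assumes the review pipeline publishes such a digest; `skill_digest` and `has_drifted` are hypothetical helper names.

```python
import hashlib


def skill_digest(files: dict) -> str:
    """Hash a skill's files (path -> bytes) in a stable, sorted order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator so path and content cannot run together
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


def has_drifted(installed_files: dict, reviewed_digest: str) -> bool:
    """True when the local copy differs from the copy that passed review."""
    return skill_digest(installed_files) != reviewed_digest
```

A loader could refuse to activate a skill for which `has_drifted` returns true, which turns "no personal forks" from a request into a property.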
Observe: cost-per-skill
Usage, cost-per-skill, and outcomes are instrumented. You cannot reason about adoption or safety you can't see, and you can't justify autonomy (next step) without the telemetry to back it.
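Per-skill instrumentation can start as a simple aggregation over usage events. The sketch below assumes each invocation emits one event; the `UsageEvent` fields and the `summarize` helper are hypothetical, not the schema of any particular observability backend.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageEvent:
    skill: str
    user: str
    cost_usd: float
    accepted: bool  # did a human accept the draft?


def summarize(events):
    """Aggregate invocations, distinct users, cost, and acceptance rate per skill."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event.skill].append(event)
    summary = {}
    for skill, evs in buckets.items():
        summary[skill] = {
            "invocations": len(evs),
            "users": len({e.user for e in evs}),
            "cost_usd": round(sum(e.cost_usd for e in evs), 4),
            "acceptance_rate": sum(e.accepted for e in evs) / len(evs),
        }
    return summary
```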
Earn autonomy: distribution ≠ permission
Read-only and drafts-only by default; broader autonomy is granted per-skill, only when telemetry justifies it. A skill being widely distributed does not mean it has earned the right to write. Autonomy is a privilege a skill earns from its own track record, one capability at a time. Calibrate the bar to reality: comparable sales-AI workflows see roughly 70-80% draft acceptance, not 99%+. The telemetry, not an aspirational target, tells you when a skill has earned more rope.
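The "earned from its own track record" rule can be written as a function that recommends at most one step up the posture ladder. This is a sketch only: the thresholds are placeholder assumptions to tune against your own telemetry, and the output is a recommendation for a human to approve, not a grant.

```python
POSTURE_LADDER = ["read-only", "drafts-only", "gated-write"]


def next_posture(current, invocations, acceptance_rate, incidents,
                 min_invocations=200, min_acceptance=0.75):
    """Suggest the next posture for ONE skill, or return the current one.

    min_invocations and min_acceptance are illustrative placeholders.
    """
    if current not in POSTURE_LADDER:
        raise ValueError(f"unknown posture: {current}")
    earned = (
        incidents == 0
        and invocations >= min_invocations
        and acceptance_rate >= min_acceptance
    )
    idx = POSTURE_LADDER.index(current)
    if earned and idx < len(POSTURE_LADDER) - 1:
        return POSTURE_LADDER[idx + 1]  # one step at a time, never a jump
    return current
```

Note that invocation count alone never promotes a skill: a widely used skill with a poor acceptance rate stays where it is.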
Anatomy of a Skill
A skill is a folder whose entry point is a SKILL.md file: YAML frontmatter (required: name + description) plus instructions, and optionally bundled scripts, templates, and reference files the agent loads on demand. The governance metadata (safety posture and systems-touched) is what turns a prompt into a governed capability.
---
name: incident-triage
description: Diagnose a production incident from logs and metrics and
  draft a root-cause hypothesis. Read-only; never writes or restarts services.
# --- governance metadata (org convention, enforced by the review gate) ---
safety_posture: read-only # read-only | drafts-only | gated-write
systems_touched:
  - observability-platform # READ
  - incident-tracker # READ
stakes: standard # standard | high (routes to a named reviewer)
owner: sre-oncall
---
# Incident Triage
1. Pull the error spike window from the observability platform.
2. Correlate with recent deploys and the incident tracker.
3. **Draft** a ranked root-cause hypothesis for a human to confirm.
Never restart, roll back, or modify any service: propose, don't act.
What every governed skill declares
- name / description: the only fields the open spec requires; the description is how the agent decides when to load it.
- safety_posture: read-only, drafts-only, or gated-write.
- systems_touched: every record/store it reads or proposes to write, with the access kind.
- stakes / owner: high-stakes routes to a named reviewer; an owner is accountable.
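To show how these declared fields become machine-checkable, here is a minimal sketch that reads the frontmatter of a SKILL.md file. It is not a general YAML parser: it handles only flat `key: value` pairs, `- item` lists, and `#` comments, and a real pipeline would more likely use a full YAML library. `parse_frontmatter`, `missing_required`, and `SAMPLE` are hypothetical names.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a restricted YAML subset between the two --- delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- delimiter")
    meta, current_key = {}, None
    for raw in lines[1:]:
        if raw.strip() == "---":
            return meta
        stripped = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("- "):
            if current_key is None:
                raise ValueError("list item without a key")
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"cannot parse line: {raw!r}")
        current_key = key.strip()
        meta[current_key] = value.strip()
    raise ValueError("missing closing --- delimiter")


def missing_required(meta: dict) -> list:
    """name and description are the only fields the open spec requires."""
    return [k for k in ("name", "description") if not meta.get(k)]


SAMPLE = """---
name: incident-triage
description: Diagnose a production incident and draft a root-cause hypothesis.
# governance metadata (org convention)
safety_posture: read-only # read-only | drafts-only | gated-write
systems_touched:
  - observability-platform # READ
  - incident-tracker # READ
owner: sre-oncall
---
# Incident Triage
"""
```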
Progressive disclosure
A skill costs only ~30-50 tokens at startup (just its name + description); the full instructions load only when the skill activates for a task. An agent with 100 skills installed pays roughly 3,000-5,000 tokens at session start. That's what makes a large governed library cheap to carry: you can install everything and pay only for what fires. (Figures as of the Dec 2025 spec.)
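The startup figure is simple multiplication, which a small helper makes easy to redo for your own library size. The per-skill range is taken from the paragraph above; the function name is hypothetical.

```python
def startup_token_range(n_skills, per_skill_low=30, per_skill_high=50):
    """Tokens spent at session start when only name + description load."""
    return n_skills * per_skill_low, n_skills * per_skill_high


low, high = startup_token_range(100)  # 100 installed skills
```

For 100 skills this gives the 3,000 to 5,000 token range quoted above; full instructions add to it only for the skills that actually activate.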
Build on the Open Standard, Not a Vendor
A crucial design choice: don't lock your institutional knowledge to one model vendor. What you standardize is the format and the governance, not the model. Your reviewed library outlives any single vendor's moment in the lead.
The SKILL.md / Agent Skills standard
Originally from Anthropic and published as an open spec on 18 Dec 2025 (hosted at agentskills.io). The same skill files are portable across tools; only discovery and distribution differ per tool. By 2026, 30+ tools read the same SKILL.md from the same folder structure.
Why portability is the safety play
A team that arrives on a different toolset joins the same disciplined way of building. The standard travels; your governance rides on it. You're insulated from any one vendor's pricing, deprecation, or fall from the lead.
| Standardize this | Don't lock this |
|---|---|
| The skill format (SKILL.md + bundled resources) | The model vendor / specific LLM |
| The governance (posture, systems-touched, review gate) | The tool that discovers and runs skills |
| The review & distribution pipeline | A proprietary, single-vendor skill schema |
SKILL.md: Claude Code, OpenAI Codex, Google Gemini CLI, GitHub Copilot, Cursor, and a growing list of others (30+ by 2026) load the same files. Adoption count is volatile, so verify the current list at agentskills.io before quoting a number.
The Maturity Path: Closing the Loop
A skill library matures in a predictable sequence: monitor → diagnose → fix → ship. You start with isolated, read-only helpers and add connective tissue over time. The point isn't full autonomy; it's that the org moves from "isolated helpers" to "a reviewed pipeline" without ever giving up the human checkpoint on anything that writes. The loop tightens; oversight never leaves.
| Stage | Example skill | Posture | Human checkpoint |
|---|---|---|---|
| Monitor | Safely diagnose a class of production issue; pr-code-review encoding senior reviewers' judgment | read-only | Not yet writing; informational |
| Diagnose | incident-triage orchestrator that composes the read-only helpers into one flow | read-only | Human reads the hypothesis |
| Fix | A skill that turns a diagnosis into a draft fix | drafts-only | Human approves the draft |
| Ship | change-verification step that checks a fix against the source of truth before it ships | gated-write | Named reviewer on the write |
Manual toil falls as the stages connect into a pipeline. But the human checkpoint on anything that writes is not one of the things you optimize away; it's the constant that lets you optimize everything around it.
Governance Is the Velocity Play, Not the Brake
The instinct is that guardrails slow you down. The opposite is true: the guardrails are what let you go fast and keep going fast. Ungoverned AI use is quick, right up until the first bad write to a system of record, and that single incident sets you back further than the slow status quo ever would.
Guardrails are why adoption scales
They're the reason adoption gets past a handful of enthusiasts without an incident forcing a halt. The org that never has the headline-making bad write keeps compounding; the one that does spends its next two quarters on remediation and a moratorium.
The shape lesson
Big, all-or-nothing platform programs concentrate risk in a single basket and a single cutover, and most of us have watched one fail. A governed skill library is the opposite shape: small reviewed increments, each owned, each earning the next.
| Dimension | Big-bang AI platform program | Governed skill library |
|---|---|---|
| Risk shape | One basket, one cutover | Many small reviewed increments |
| Origin | Mandated top-down | Emerges from a real problem a team solved |
| Failure mode | Whole program fails at once | One skill is rejected; the rest stand |
| Return over time | Binary: works or doesn't | Compounds with each contribution |
The pattern is shared, but the skills aren't dictated from the top: each emerges from a real problem a specific team already solved, captured once and handed to everyone. That's why it compounds: a reviewed library gets stronger with every skill a team contributes. A thousand private prompts do not.
Anti-Patterns & What to Do Instead
The fastest way to a safe, fast AI practice is usually to stop doing these, not to add a new tool.
| Anti-pattern | What to do instead |
|---|---|
| Banning AI assistants outright, driving the use underground. | Make the governed path the easy path; capture the use, don't police it. |
| Letting the agent write directly to a shared system "to save a step." | Read-only by default; the agent drafts, a human approves every write. |
| Putting the human approval inside the agent's prompt/output. | Place the checkpoint one layer below the agent, beyond prompt injection's reach. |
| A gate that auto-merges when checks pass: automation as judge. | The gate produces a report, not a verdict; a person merges. |
| Retrofitting governance after the library already exists. | Build the review gate before your third skill; it is far cheaper than bolting it on. |
| Shipping skills with no usage/cost instrumentation. | Instrument from day one; you can't manage safety or spend you can't see. |
| Personal forks of a skill drifting from the reviewed copy. | Distribute via an auto-synced channel; the reviewed copy is the running copy. |
| Treating wide distribution as permission to write. | Distribution ≠ permission. Earn autonomy per skill, from telemetry. |
| Encoding institutional knowledge in one vendor's proprietary format. | Use the open SKILL.md format; standardize format + governance, not the model. |
| Launching a big-bang AI platform with one risky cutover. | Grow a library of small reviewed increments, each earning the next. |
| Status inflation: calling a draft "built" or a prototype "in production." | Use precise status words (drafted ≠ prototyped ≠ built); status inflation is how an AI program loses credibility with risk & audit. |
| Reviewing skills only one at a time, letting two authors ship overlapping ones. | Add a catalog-level overlap review; overlapping authorship is the failure mode that quietly kills a skills platform. |
| Relying on catalog or department segmentation as a security boundary. | Treat the connector credential as the real access boundary; catalog membership is documentation & ergonomics, not permission. |
Getting Started
You don't need a program or a budget line; you need the first reviewed skill and the discipline to grow from there. Work the six steps in order. The whole thing is one bet, repeated: capture expertise once, review it, and let everyone build on it.
1. Start with one real, repeated problem
2. Declare what it touches, default it to read-only
   Set safety_posture: read-only. If it can only read, it can't break the system of record; that property is the safety, not a promise to be careful.
3. Build the review gate before your third skill
4. Instrument from day one
5. Keep the human checkpoint on every write
6. Use the open SKILL.md format
Each new skill is the same loop again: a repeated problem → declared posture → reviewed once → governed deploy → instrumented → autonomy earned. Do it once well and the seventh skill is far cheaper than the first.
A Worked Example: A Regulated Lender
This was built end-to-end as the embedded AI lead at a regulated commercial lender (~3 dozen people across 9 departments) in a 2-month engagement: a CI-distributed skill marketplace, a publish-and-audit governance gate, and AI observability stood up from zero.
On its first run, a follow-up skill was about to request a borrower's Social Security number from the wrong people: 4 of the first 5 CRM contacts were brokers, not borrowers, because the skill had inherited a wrong assumption about the source of truth. The gate, the named reviewer, and the drafts-only default existed precisely so this surfaced as a caught request, not a leaked SSN to an intermediary. The fixes became rules: treat the CRM lead-status as authoritative over the application API, gate the SSN ask on a tax-ID-on-file flag, and skip broker-intermediated deals entirely. Every failure like that became an architectural rule, which is how the four invariants above were earned, not theorized.
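The three rules that came out of this incident can be expressed as one guard function. The sketch below is an illustration, not the lender's actual code: the `Deal` fields, the status value "application-in-progress", and the reading of the tax-ID flag (do not ask when a tax ID is already on file) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    crm_lead_status: str         # authoritative source (hypothetical values)
    api_application_status: str  # deliberately not consulted below
    contact_role: str            # "borrower" or "broker"
    broker_intermediated: bool
    tax_id_on_file: bool


def should_draft_ssn_request(deal: Deal) -> bool:
    """Decide whether to DRAFT (never send) an SSN request for human review."""
    if deal.broker_intermediated or deal.contact_role != "borrower":
        return False  # skip broker-intermediated deals entirely
    if deal.tax_id_on_file:
        return False  # assumed meaning of the flag: nothing left to ask for
    # The CRM lead status wins over whatever the application API reports.
    return deal.crm_lead_status == "application-in-progress"
```

Even when this returns true, the skill stays drafts-only: the function decides whether a draft is worth a reviewer's time, not whether anything is sent.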
Skill marketplace
CI-distributed so everyone loads the reviewed copy, with no drifted forks. (Invariant 4.)
Publish-and-audit gate
A compliance-routing review gate: deterministic checks, high-stakes skills to a named reviewer. (Lifecycle step 3.)
AI observability
Usage and cost per skill, from zero, so autonomy could be earned on evidence. (Invariant 4 + step 5.)
In a small, regulated, system-of-record-centric business, the highest-leverage AI work is not the cleverest agent; it's the platform that lets non-engineers ship reviewed, observable, drafts-only automation on top of the existing stack, without ever competing with the system of record. Get the substrate right and capability compounds safely; get it wrong and you accumulate unauditable risk that someone eventually has to unwind.
Full writeup, including the observability backend and the compliance-routing gate: Founding the AI delivery function at a regulated lender.
Glossary
| Term | Meaning |
|---|---|
| Skill | A reviewed, versioned instruction module: a folder of instructions, scripts, and resources an agent loads on demand, with a declared safety posture and systems-touched list. |
| SKILL.md | The entry-point file of a skill: YAML frontmatter (required: name, description) plus instructions. The core of the open Agent Skills standard. |
| Agent Skills | The open standard for the skill format, published by Anthropic on 18 Dec 2025 at agentskills.io; read by 30+ tools by 2026. |
| System of record | The authoritative store the business runs on: canonical truth, not a copy that can be overwritten. |
| Safety posture | A skill's declared write-capability: read-only, drafts-only, or gated-write. |
| Systems-touched | The explicit list of records/stores a skill reads or proposes to write, declared in the skill and checked by the gate. |
| Read-only by default | The architectural default: a skill can read but not change shared systems unless a write scope is explicitly earned. |
| Review gate | A merge-time check that runs deterministic scans and routes high-stakes skills to a named reviewer. Produces a report, not a verdict. |
| Named reviewer | The accountable human who owns approval of a higher-stakes skill (regulated data, money, production writes). |
| Human checkpoint | The point where a person validates output against the source of truth, placed one layer below the agent, beyond prompt injection's reach. |
| Prompt injection | A hostile input that manipulates an agent's behavior or the text a human is asked to approve; why checkpoints must sit outside the agent. |
| Observability | Per-skill visibility into usage, cost, and outcomes. The gap most orgs hit first; build it in from day one. |
| Drift | Divergence between the reviewed copy of a skill and what's actually running (e.g. personal forks). Prevented by auto-synced distribution. |
| Governed deploy | Shipping a skill through an auto-synced channel so everyone loads the reviewed copy. |
| Autonomy | How much a skill may do without per-action human approval; earned per skill from telemetry, never granted by default. |
| Progressive disclosure | The design that loads only a skill's name + description (~30-50 tokens) at startup, expanding to full instructions only when it activates. |
| Shadow AI | Ungoverned, invisible AI use inside an org: ungrounded copy/paste, quiet low-quality use, and siloed sophistication. |
| Distribution ≠ permission | The rule that a skill being widely available does not grant it the right to write or act autonomously. |