Governing Agentic AI

A field guide for software organizations: how to turn scattered, private AI use into a shared, reviewed capability without putting your systems of record at risk. Most writing is about the agent. This is about the substrate underneath it: the review, the observability, and the guardrails that let a whole team build on AI safely.

  • Golden Rule: Validate vs. truth. A human checks output against a source of record.
  • Default posture: Read-only. Writes are earned & gated.
  • Unit of delivery: The Skill, a reviewed, versioned module.
  • Review gate: Report, not verdict. A human merges, never the gate.
  • Build the gate by: Skill #3. Before, not after the incident.
  • Format: SKILL.md, an open spec that ~30+ tools read.
  • Autonomy: Earned, per-skill, from telemetry.
  • Maturity loop: Monitor→Ship. Tighten it; never drop oversight.

The Problem: Shadow AI Is Already in Your Org

The adoption question is settled. Your engineers are already using AI assistants; the open question is whether that use is governed. Ungoverned, it shows up in three forms, and all three are velocity you are buying with risk you can't see. A governed standard turns all three into an asset. The goal isn't to police AI use; it's to capture it.

Ungrounded copy/paste

Output that was never checked against the real systems, pasted in because it looked right. The plausible-but-wrong answer is the dangerous one: it passes a glance and fails in production.

Quiet, low-quality use

People who treat AI as cheating, so they use it in private and carry the risk without ever submitting it for review. The work never gets a second set of eyes precisely because it's hidden.

Siloed sophistication

One engineer works out how to safely do something hard, and that hard-won method dies in their session history instead of becoming something the team inherits.

The reframe

Each failure mode has the same root: AI use that is invisible and unreviewed. You can't ban it (adoption already won) and you shouldn't want to. The move is to make the safe path the easy path: capture the expertise once, review it, and hand it to everyone.

The Core Bet: The Substrate, Not the Agent

The value isn't any single clever automation. Anyone can demo an agent; almost no one builds the foundation. The value is the shared, trusted foundation that lets the whole team build on top of it safely.

A reviewed library compounds; a thousand private prompts do not.

The unit of delivery is a skill

A reviewed, versioned instruction module (a folder of instructions, scripts, and resources an agent loads on demand) with a declared safety posture and an explicit list of the systems it touches. One engineer captures "here's how we safely do X" once; it's reviewed once; then anyone on the team loads the reviewed copy.

Why it compounds

Knowledge stops dying in session histories. Every contributed skill makes the next problem cheaper to solve and the library more valuable. Private prompts have no such network effect: each one is solved, used once, and lost.

| Dimension | A thousand private prompts | A reviewed skill library |
| --- | --- | --- |
| Reuse | None; dies in one session | Loaded by the whole team |
| Review | Never seen by a second person | Reviewed once, inherited by all |
| Safety posture | Implicit, per-person, invisible | Declared and enforced per skill |
| Cost / usage | Unmeasured | Instrumented per skill |
| Value over time | Flat; resets every prompt | Compounds with each contribution |

The Golden Rule

Everything rests on one rule. A skill exists to make a knowledgeable person faster, not to replace their judgment.

"LLMs are safe when a knowledgeable human validates output against a known source of truth." This is the single load-bearing principle; every invariant below is a corollary of it.

Knowledgeable human

Not any human, but someone who can tell a right answer from a plausible one. The skill accelerates an expert; it doesn't manufacture expertise that isn't in the room.

Validates output

An active check, not a rubber stamp. The human's job is the verification step the model can't be trusted to do on itself.

Known source of truth

Validation needs a referent: the authoritative system, spec, or record the answer can be checked against. No source of truth → no safe validation.

The Four Invariants

From the Golden Rule follow four properties that must hold for every skill, always. These are architectural guarantees, not policy reminders: build them so they can't be violated; don't just ask people to remember them.

1. The system of record always wins

The system of record is the authoritative store the business runs on: the canonical truth, not a copy that can be overwritten. The agent reads that truth and drafts against it; it never silently mutates a shared environment or invents state. It is the same discipline any serious business applies to its books: nothing, human or agent, writes to the record without a person accountable for the write.

2. Read-only by default; writes are earned and gated

Read-only is an architectural property, not a reminder. The agent drafts or diagnoses; a human approves anything that changes a shared system. A skill that can only read cannot break the system of record. That's the whole point of the default.

3. Higher-stakes paths route to a named human

Anything touching regulated data, money, or a production write is flagged and owned by a named reviewer. Critically, the human checkpoint sits where prompt injection cannot reach it: one layer below the agent, not inside it. A check the agent can talk its way past is not a check.

4. Everything is observable; nothing drifts

Each skill declares what it touches, the copy that runs is the reviewed copy (no personal forks), and usage and cost are visible per skill. Observability is the gap most orgs hit first, so build it in from day one, not after the first incident. And it's two jobs, not one: operational analytics (who uses what, at what cost) and agent-quality (why a bad output happened). Stand up both.

| Invariant | What it prevents | How you enforce it (not just ask) |
| --- | --- | --- |
| System of record wins | Silent mutation, invented state, an agent overwriting canonical truth | Agent gets read creds to the record; writes go through a separate, accountable path |
| Read-only by default | Any unreviewed change to a shared system | Default scope = read; write scopes granted explicitly, per skill |
| Named human on high-stakes | Unowned writes to regulated data / money / prod; injection bypassing the check | Route flag → named reviewer; checkpoint outside the agent's context |
| Observable, no drift | Personal forks, blind spend, "which version ran?" | Auto-synced reviewed copy; declared systems-touched; per-skill usage/cost telemetry |
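
The "default scope = read; write scopes granted explicitly" row can be sketched as a deny-by-default check. This is a minimal illustration, not a real authorization system: `SkillGrant` and `is_allowed` are hypothetical names, and in practice the boundary would be enforced by the credentials the skill is issued rather than by in-process code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillGrant:
    """Access granted to one skill (hypothetical structure for illustration)."""
    name: str
    read: frozenset = field(default_factory=frozenset)
    # Write scopes are empty unless someone grants them explicitly.
    write: frozenset = field(default_factory=frozenset)


def is_allowed(grant: SkillGrant, system: str, action: str) -> bool:
    """Allow an action only if it falls inside an explicit grant."""
    if action == "read":
        return system in grant.read
    if action == "write":
        return system in grant.write
    return False  # unknown actions are denied


triage = SkillGrant(
    name="incident-triage",
    read=frozenset({"observability-platform", "incident-tracker"}),
)
print(is_allowed(triage, "incident-tracker", "read"))   # True
print(is_allowed(triage, "incident-tracker", "write"))  # False
```

Because `write` defaults to an empty set, a skill author has to do something visible and reviewable to gain a write scope; forgetting a field can only make the skill less capable.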
The injection corollary

If your human approval lives inside the agent's prompt or output, a hostile input can rewrite the very text the human is reading. Put the gate one layer down (in the deploy pipeline, the merge, the write-path) where the model can't author what the reviewer sees. And isolate the sessions that read untrusted content (inbound email, scraped pages) from the sessions that can act, so the injection chain can't form. In one regulated-lender deployment, that isolation plus a native send-confirmation checkpoint cut prompt-injection success from ~24% unprotected to ~1%.
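
One way to picture a checkpoint that sits below the agent is to bind the approval to the exact bytes being written. The sketch below is an assumption-laden illustration, not the deployment described above: `APPROVAL_KEY`, `sign_approval`, and `execute_write` are hypothetical, and the key is assumed to live in the write-path service where the agent cannot read it.

```python
import hashlib
import hmac

# Hypothetical key held by the write-path service, never exposed to the agent.
APPROVAL_KEY = b"example-key-held-outside-the-agent"


def payload_digest(payload: bytes) -> str:
    """Fingerprint of the exact bytes a reviewer was shown."""
    return hashlib.sha256(payload).hexdigest()


def sign_approval(payload: bytes, reviewer: str) -> str:
    """Issued by the review step after a named human approves this payload."""
    message = f"{reviewer}:{payload_digest(payload)}".encode()
    return hmac.new(APPROVAL_KEY, message, hashlib.sha256).hexdigest()


def execute_write(payload: bytes, reviewer: str, approval: str) -> bool:
    """Write path: proceeds only if the approval matches these exact bytes."""
    expected = sign_approval(payload, reviewer)
    if not hmac.compare_digest(expected, approval):
        return False  # payload changed after review, or approval was forged
    # ... the real write to the shared system would happen here ...
    return True
```

If an injected instruction alters the payload after review, the digest no longer matches and the write is refused, regardless of what the agent's output claims was approved.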

The Development Lifecycle

Every capability moves through the same reviewed path, from a real, repeated problem to a governed, instrumented skill whose autonomy is earned, never assumed. Distribution is not permission.

1. Intake (Candidate): A repeated problem → a candidate skill, not a private prompt.
2. Build (Author): Versioned, with declared posture, systems-touched list, and tests.
3. Review (Gate): Deterministic checks + named reviewer. Report, not verdict.
4. Deploy (Govern): Auto-synced channel; everyone loads the reviewed copy.
5. Observe (Instrument): Usage, cost-per-skill, outcomes.
6. Autonomy (Earn): Read-only by default; broaden only when telemetry justifies.

Intake: repeated problem → candidate

A real, repeated problem a person already solves by hand becomes a candidate skill. The bar is repetition + a known-good method, not novelty. If it only happens once, it's not a skill; if nobody can yet do it safely by hand, it's not ready to encode.

Build: declared posture + tests

Author it as a versioned skill with three non-negotiables:

  • Declared safety posture: read-only vs. drafts vs. gated-write.
  • Explicit systems-touched list: every system of record / data store it reads or proposes to write.
  • Tests, so the reviewed behavior is the behavior that ships.

Review: report, not verdict

A merge gate runs deterministic checks and routes higher-stakes skills to a named reviewer:

  • Secrets & PII scan.
  • Naming conventions.
  • Declared-vs-actual systems: does what it touches match what it claims?

The gate produces a report, not a verdict: a human merges. The gate never merges. Automation finds; a person decides.
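
The three deterministic checks above can be sketched as a function that returns findings and nothing else. Everything here is illustrative: the regexes are toy patterns rather than a vetted scanner, the kebab-case naming rule is an assumed convention, and `observed_systems` is assumed to come from some separate analysis of what the skill actually calls.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; a real gate would use a vetted rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)password\s*[:=]\s*\S+"),
]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # assumed convention


@dataclass
class GateReport:
    skill: str
    findings: list = field(default_factory=list)
    needs_named_reviewer: bool = False


def run_gate(name, body, declared_systems, observed_systems, stakes="standard"):
    """Run the deterministic checks and return a report for a human to read."""
    report = GateReport(skill=name, needs_named_reviewer=(stakes == "high"))
    if any(p.search(body) for p in SECRET_PATTERNS):
        report.findings.append("possible secret in skill body")
    if SSN_PATTERN.search(body):
        report.findings.append("possible PII (SSN-like pattern)")
    if not NAME_PATTERN.match(name):
        report.findings.append("name violates naming convention")
    undeclared = sorted(set(observed_systems) - set(declared_systems))
    if undeclared:
        report.findings.append("undeclared systems: " + ", ".join(undeclared))
    return report  # deliberately no merge step: a person decides
```

The design choice worth noticing is the return type: the function has no code path that merges or blocks. It can only hand a `GateReport` to a reviewer.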

Governed deploy: no drifted forks

The skill ships through an auto-synced channel (e.g. CI-distributed) so everyone loads the reviewed copy. No personal forks, no "the version I'm running is different." The reviewed copy is the running copy.
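
"The reviewed copy is the running copy" is a property you can check mechanically. A minimal sketch, assuming the review pipeline records a content digest for each approved skill; `skill_digest` and `has_drifted` are hypothetical helpers, and files are passed as an in-memory mapping to keep the example self-contained.

```python
import hashlib


def skill_digest(files: dict) -> str:
    """Hash a skill's files (relative path -> bytes) in a stable order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")  # separator so path and content cannot blur together
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


def has_drifted(local_files: dict, reviewed_digest: str) -> bool:
    """True if the local copy differs from the copy that passed review."""
    return skill_digest(local_files) != reviewed_digest
```

A loader could refuse to run, or at least log, any skill for which `has_drifted` returns True, which turns a personal fork from a silent risk into a visible event.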

Observe: cost-per-skill

Usage, cost-per-skill, and outcomes are instrumented. You cannot reason about adoption or safety you can't see, and you can't justify autonomy (next step) without the telemetry to back it.
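
Per-skill instrumentation can start very small. The sketch below assumes each session emits a `(skill, cost_usd)` event; that event shape and the `cost_per_skill` name are hypothetical, and a real deployment would read from whatever telemetry store it already has.

```python
from collections import defaultdict


def cost_per_skill(events):
    """Aggregate (skill, cost_usd) session events into per-skill totals."""
    totals = defaultdict(lambda: {"sessions": 0, "cost_usd": 0.0})
    for skill, cost in events:
        totals[skill]["sessions"] += 1
        totals[skill]["cost_usd"] += cost
    return {
        skill: {**t, "cost_per_session": round(t["cost_usd"] / t["sessions"], 2)}
        for skill, t in totals.items()
    }


events = [("incident-triage", 1.0), ("incident-triage", 2.0), ("pr-code-review", 0.5)]
print(cost_per_skill(events)["incident-triage"])
# {'sessions': 2, 'cost_usd': 3.0, 'cost_per_session': 1.5}
```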

Earn autonomy: distribution ≠ permission

Read-only and drafts-only by default; broader autonomy is granted per-skill, only when telemetry justifies it. A skill being widely distributed does not mean it has earned the right to write. Autonomy is a privilege a skill earns from its own track record, one capability at a time. Calibrate the bar to reality: comparable sales-AI workflows see roughly 70–80% draft acceptance, not 99%+. The telemetry tells you when a skill has earned more rope, not an aspirational target.
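
"Only when telemetry justifies it" can be made concrete as a check that flags a skill for a human decision. The thresholds below are assumptions chosen for illustration (a 200-draft minimum sample and a 75% acceptance bar, which sits inside the 70–80% range mentioned above); `eligible_for_autonomy_review` is a hypothetical name, and a True result only queues the skill for review.

```python
def eligible_for_autonomy_review(
    drafts_reviewed: int,
    drafts_accepted: int,
    min_sample: int = 200,         # assumed: enough history to mean something
    min_acceptance: float = 0.75,  # assumed: calibrated to observed rates
) -> bool:
    """Flag a drafts-only skill for human review of a broader scope.

    Returning True grants nothing; a named reviewer still decides.
    """
    if drafts_reviewed < max(min_sample, 1):
        return False  # too little track record to judge
    return drafts_accepted / drafts_reviewed >= min_acceptance


print(eligible_for_autonomy_review(500, 390))  # True  (78% accepted)
print(eligible_for_autonomy_review(50, 50))    # False (sample too small)
```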

Anatomy of a Skill

A skill is a folder whose entry point is a SKILL.md file: YAML frontmatter (required: name + description) plus instructions, and optionally bundled scripts, templates, and reference files the agent loads on demand. The governance metadata (safety posture and systems-touched) is what turns a prompt into a governed capability.

incident-triage/SKILL.md — a read-only diagnostic skill
---
name: incident-triage
description: Diagnose a production incident from logs and metrics and
  draft a root-cause hypothesis. Read-only; never writes or restarts services.
# --- governance metadata (org convention, enforced by the review gate) ---
safety_posture: read-only        # read-only | drafts-only | gated-write
systems_touched:
  - observability-platform   # READ
  - incident-tracker         # READ
stakes: standard               # standard | high (routes to a named reviewer)
owner: sre-oncall
---

# Incident Triage
1. Pull the error spike window from the observability platform.
2. Correlate with recent deploys and the incident tracker.
3. **Draft** a ranked root-cause hypothesis for a human to confirm.
   Never restart, roll back, or modify any service: propose, don't act.

What every governed skill declares

  • name / description: the only fields the open spec requires; the description is how the agent decides when to load it.
  • safety_posture: read-only, drafts-only, or gated-write.
  • systems_touched: every record/store it reads or proposes to write, with the access kind.
  • stakes / owner: high-stakes routes to a named reviewer; an owner is accountable.
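
The declarations above lend themselves to a mechanical check at review time. A minimal sketch, assuming the frontmatter has already been parsed into a dict by whatever YAML loader you use; `validate_frontmatter` is a hypothetical helper, and the rule that `systems_touched` must be non-empty is an assumption for illustration, not part of the open spec.

```python
REQUIRED_BY_SPEC = {"name", "description"}
REQUIRED_BY_ORG = {"safety_posture", "systems_touched", "stakes", "owner"}
ALLOWED_POSTURES = {"read-only", "drafts-only", "gated-write"}
ALLOWED_STAKES = {"standard", "high"}


def validate_frontmatter(meta: dict) -> list:
    """Return a list of problems; an empty list means the metadata is complete."""
    required = REQUIRED_BY_SPEC | REQUIRED_BY_ORG
    problems = [f"missing field: {key}" for key in sorted(required - set(meta))]
    posture = meta.get("safety_posture")
    if posture is not None and posture not in ALLOWED_POSTURES:
        problems.append(f"unknown safety_posture: {posture}")
    stakes = meta.get("stakes")
    if stakes is not None and stakes not in ALLOWED_STAKES:
        problems.append(f"unknown stakes: {stakes}")
    # Assumed org rule: a governed skill must declare at least one system.
    if "systems_touched" in meta and not meta["systems_touched"]:
        problems.append("systems_touched must list at least one system")
    return problems
```

Keeping the spec-required and org-required fields in separate sets makes it clear which failures are portability problems and which are local governance problems.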

Progressive disclosure

A skill costs only ~30–50 tokens at startup (just its name + description); the full instructions load only when the skill activates for a task. An agent with 100 skills installed pays roughly 3,000–5,000 tokens at session start. That's what makes a large governed library cheap to carry: you can install everything and pay only for what fires. (Figures as of the Dec 2025 spec.)
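
The startup figure is simple multiplication, shown here so you can plug in your own library size. The 30 and 50 token bounds are the estimates quoted above; `startup_tokens` is a hypothetical helper.

```python
def startup_tokens(num_skills: int, tokens_per_skill=(30, 50)):
    """Estimate the (low, high) token cost of listing installed skills."""
    low, high = tokens_per_skill
    return num_skills * low, num_skills * high


print(startup_tokens(100))  # (3000, 5000)
```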

Build on the Open Standard, Not a Vendor

A crucial design choice: don't lock your institutional knowledge to one model vendor. What you standardize is the format and the governance, not the model. Your reviewed library outlives any single vendor's moment in the lead.

The SKILL.md / Agent Skills standard

Originally from Anthropic and published as an open spec on 18 Dec 2025 (hosted at agentskills.io). The same skill files are portable across tools; only discovery and distribution differ per tool. By 2026, 30+ tools read the same SKILL.md from the same folder structure.

Why portability is the safety play

A team that arrives on a different toolset joins the same disciplined way of building. The standard travels; your governance rides on it. You're insulated from any one vendor's pricing, deprecation, or fall from the lead.

| Standardize this | Don't lock this |
| --- | --- |
| The skill format (SKILL.md + bundled resources) | The model vendor / specific LLM |
| The governance (posture, systems-touched, review gate) | The tool that discovers and runs skills |
| The review & distribution pipeline | A proprietary, single-vendor skill schema |

Tools that read SKILL.md

Claude Code, OpenAI Codex, Google Gemini CLI, GitHub Copilot, Cursor, and a growing list of others (30+ by 2026) load the same files. Adoption count is volatile; verify the current list at agentskills.io before quoting a number.

The Maturity Path: Closing the Loop

A skill library matures in a predictable sequence: monitor → diagnose → fix → ship. You start with isolated, read-only helpers and add connective tissue over time. The point isn't full autonomy; it's that the org moves from "isolated helpers" to "a reviewed pipeline" without ever giving up the human checkpoint on anything that writes. The loop tightens; oversight never leaves.

  • Stage 1, Monitor: Read-only helpers that surface a class of issue.
  • Stage 2, Diagnose: Skills that safely root-cause; encode reviewer judgment.
  • Stage 3, Fix (draft): Turn a diagnosis into a draft fix; a human approves.
  • Stage 4, Ship (verify): Check the fix vs. the source of truth before it ships.

| Stage | Example skill | Posture | Human checkpoint |
| --- | --- | --- | --- |
| Monitor | Safely diagnose a class of production issue; pr-code-review encoding senior reviewers' judgment | read-only | Not yet writing; informational |
| Diagnose | incident-triage orchestrator that composes the read-only helpers into one flow | read-only | Human reads the hypothesis |
| Fix | A skill that turns a diagnosis into a draft fix | drafts-only | Human approves the draft |
| Ship | change-verification step that checks a fix against the source of truth before it ships | gated-write | Named reviewer on the write |

The loop tightens; oversight stays

Manual toil falls as the stages connect into a pipeline. But the human checkpoint on anything that writes is not one of the things you optimize away; it's the constant that lets you optimize everything around it.

Governance Is the Velocity Play, Not the Brake

The instinct is that guardrails slow you down. The opposite is true: the guardrails are what let you go fast and keep going fast. Ungoverned AI use is quick, right up until the first bad write to a system of record, and that single incident sets you back further than the slow status quo ever would.

Guardrails are why adoption scales

They're the reason adoption gets past a handful of enthusiasts without an incident forcing a halt. The org that never has the headline-making bad write keeps compounding; the one that does spends its next two quarters on remediation and a moratorium.

The shape lesson

Big, all-or-nothing platform programs concentrate risk in a single basket and a single cutover, and most of us have watched one fail. A governed skill library is the opposite shape: small reviewed increments, each owned, each earning the next.

| | Big-bang AI platform program | Governed skill library |
| --- | --- | --- |
| Risk shape | One basket, one cutover | Many small reviewed increments |
| Origin | Mandated top-down | Emerges from a real problem a team solved |
| Failure mode | Whole program fails at once | One skill is rejected; the rest stand |
| Return over time | Binary; works or doesn't | Compounds with each contribution |

Shared pattern, not mandated skills

The pattern is shared, but the skills aren't dictated from the top. Each emerges from a real problem a specific team already solved, captured once and handed to everyone. That's why it compounds: a reviewed library gets stronger with every skill a team contributes. A thousand private prompts do not.

Anti-Patterns & What to Do Instead

The fastest way to a safe, fast AI practice is usually to stop doing these, not to add a new tool.

Anti-pattern

Banning AI assistants outright, driving the use underground.

Do instead

Make the governed path the easy path; capture the use, don't police it.

Anti-pattern

Letting the agent write directly to a shared system "to save a step."

Do instead

Read-only by default; the agent drafts, a human approves every write.

Anti-pattern

Putting the human approval inside the agent's prompt/output.

Do instead

Place the checkpoint one layer below the agent, beyond prompt injection's reach.

Anti-pattern

A gate that auto-merges when checks pass: automation as judge.

Do instead

The gate produces a report, not a verdict; a person merges.

Anti-pattern

Retrofitting governance after the library already exists.

Do instead

Build the review gate before your third skill; it is far cheaper than bolting it on.

Anti-pattern

Shipping skills with no usage/cost instrumentation.

Do instead

Instrument from day one; you can't manage safety or spend you can't see.

Anti-pattern

Personal forks of a skill drifting from the reviewed copy.

Do instead

Distribute via an auto-synced channel; the reviewed copy is the running copy.

Anti-pattern

Treating wide distribution as permission to write.

Do instead

Distribution ≠ permission. Earn autonomy per skill, from telemetry.

Anti-pattern

Encoding institutional knowledge in one vendor's proprietary format.

Do instead

Use the open SKILL.md format; standardize format + governance, not the model.

Anti-pattern

Launching a big-bang AI platform with one risky cutover.

Do instead

Grow a library of small reviewed increments, each earning the next.

Anti-pattern

Status inflation: calling a draft "built" or a prototype "in production."

Do instead

Use precise status words (drafted ≠ prototyped ≠ built); status inflation is how an AI program loses credibility with risk & audit.

Anti-pattern

Reviewing skills only one at a time, letting two authors ship overlapping ones.

Do instead

Add a catalog-level overlap review; overlapping authorship is the failure mode that quietly kills a skills platform.

Anti-pattern

Relying on catalog or department segmentation as a security boundary.

Do instead

Treat the connector credential as the real access boundary; catalog membership is documentation & ergonomics, not permission.

Getting Started

You don't need a program or a budget line; you need the first reviewed skill and the discipline to grow from there. Work the six steps in order. The whole thing is one bet, repeated: capture expertise once, review it, and let everyone build on it.

1. Start with one real, repeated problem. Pick something a person already solves by hand. Capture it as a single skill, not a platform. Repetition plus a known-good method is the bar; resist the urge to build a framework.
2. Declare what it touches, default it to read-only. List every system of record it reads. Set safety_posture: read-only. If it can only read, it can't break the system of record; that property is the safety, not a promise to be careful.
3. Build the review gate before your third skill. Deterministic checks (secrets/PII scan, naming, declared-vs-actual systems) + a named reviewer for high-stakes skills. The gate emits a report, not a verdict. Retrofitting governance later is far harder than starting with it.
4. Instrument from day one. Usage and cost per skill, in one place. You can't reason about adoption or safety you can't see, and you'll need this telemetry to justify autonomy later.
5. Keep the human checkpoint on every write. Earn autonomy per skill, from telemetry; never grant it by default. Place the checkpoint one layer below the agent so prompt injection can't reach it.
6. Use the open SKILL.md format, so nothing you build is hostage to one vendor. Standardize the format and the governance, not the model. Your reviewed library then travels to whatever tool the team lands on.

The repeated bet

Each new skill is the same loop again: a repeated problem → declared posture → reviewed once → governed deploy → instrumented → autonomy earned. Do it once well and the seventh skill is far cheaper than the first.

A Worked Example: A Regulated Lender

This was built end-to-end as the embedded AI lead at a regulated commercial lender (~3 dozen people across 9 departments, in a 2-month engagement): a CI-distributed skill marketplace, a publish-and-audit governance gate, and AI observability stood up from zero.

  • Skills shipped: ~30 (19 firm-authored, across 10 plugins)
  • 30-day model spend: ~$8,980 (3,685 sessions · ~$2.44 each)
  • Latent capacity: 30–80 hr/wk (measured, gated on adoption)

The single most instructive moment

On its first run, a follow-up skill was about to request a borrower's Social Security number from the wrong people: 4 of the first 5 CRM contacts were brokers, not borrowers, because the skill had inherited a wrong assumption about the source of truth. The gate, the named reviewer, and the drafts-only default existed precisely so this surfaced as a caught request, not a leaked SSN to an intermediary. The fixes became rules: treat the CRM lead-status as authoritative over the application API, gate the SSN ask on a tax-ID-on-file flag, and skip broker-intermediated deals entirely. Every failure like that became an architectural rule, which is how the four invariants above were earned, not theorized.

Skill marketplace

CI-distributed so everyone loads the reviewed copy, with no drifted forks. (Invariant 4.)

Publish-and-audit gate

A compliance-routing review gate: deterministic checks, high-stakes skills to a named reviewer. (Lifecycle step 3.)

AI observability

Usage and cost per skill, from zero, so autonomy could be earned on evidence. (Invariant 4 + step 5.)

The thesis, in one line

In a small, regulated, system-of-record-centric business, the highest-leverage AI work is not the cleverest agent; it's the platform that lets non-engineers ship reviewed, observable, drafts-only automation on top of the existing stack, without ever competing with the system of record. Get the substrate right and capability compounds safely; get it wrong and you accumulate unauditable risk that someone eventually has to unwind.

Full writeup, including the observability backend and the compliance-routing gate: Founding the AI delivery function at a regulated lender.

Glossary

| Term | Meaning |
| --- | --- |
| Skill | A reviewed, versioned instruction module: a folder of instructions, scripts, and resources an agent loads on demand, with a declared safety posture and systems-touched list. |
| SKILL.md | The entry-point file of a skill: YAML frontmatter (required: name, description) plus instructions. The core of the open Agent Skills standard. |
| Agent Skills | The open standard for the skill format, published by Anthropic on 18 Dec 2025 at agentskills.io; read by 30+ tools by 2026. |
| System of record | The authoritative store the business runs on: canonical truth, not a copy that can be overwritten. |
| Safety posture | A skill's declared write-capability: read-only, drafts-only, or gated-write. |
| Systems-touched | The explicit list of records/stores a skill reads or proposes to write, declared in the skill and checked by the gate. |
| Read-only by default | The architectural default: a skill can read but not change shared systems unless a write scope is explicitly earned. |
| Review gate | A merge-time check that runs deterministic scans and routes high-stakes skills to a named reviewer. Produces a report, not a verdict. |
| Named reviewer | The accountable human who owns approval of a higher-stakes skill (regulated data, money, production writes). |
| Human checkpoint | The point where a person validates output against the source of truth, placed one layer below the agent, beyond prompt injection's reach. |
| Prompt injection | A hostile input that manipulates an agent's behavior or the text a human is asked to approve; why checkpoints must sit outside the agent. |
| Observability | Per-skill visibility into usage, cost, and outcomes. The gap most orgs hit first; build it in from day one. |
| Drift | Divergence between the reviewed copy of a skill and what's actually running (e.g. personal forks). Prevented by auto-synced distribution. |
| Governed deploy | Shipping a skill through an auto-synced channel so everyone loads the reviewed copy. |
| Autonomy | How much a skill may do without per-action human approval; earned per skill from telemetry, never granted by default. |
| Progressive disclosure | The design that loads only a skill's name + description (~30–50 tokens) at startup, expanding to full instructions only when it activates. |
| Shadow AI | Ungoverned, invisible AI use inside an org: ungrounded copy/paste, quiet low-quality use, and siloed sophistication. |
| Distribution ≠ permission | The rule that a skill being widely available does not grant it the right to write or act autonomously. |