Governing Agentic AI: A Field Guide for Software Organizations

The Problem: Shadow AI Is Already in Your Org

The adoption question is settled. Your engineers are already using AI assistants; the open question is whether that use is governed. Ungoverned use shows up in three forms, and in all three you are buying velocity with risk you can't see. A governed standard turns all three into an asset. The goal isn't to police AI use; it's to capture it.

Ungrounded copy/paste

Output that was never checked against the real systems, pasted in because it looked right. The plausible-but-wrong answer is the dangerous one — it passes a glance and fails in production.

Quiet, low-quality use

People who see AI as cheating use it in private, so they carry the risk without ever submitting the work for review. The work never gets a second set of eyes precisely because it's hidden.

Siloed sophistication

One engineer works out how to safely do something hard — and that hard-won method dies in their session history instead of becoming something the team inherits.

The reframe

Each failure mode has the same root: AI use that is invisible and unreviewed. You can't ban it (adoption already won) and you shouldn't want to. The move is to make the safe path the easy path — capture the expertise once, review it, and hand it to everyone.

The Core Bet: The Substrate, Not the Agent

The value isn't any single clever automation. Anyone can demo an agent; almost no one builds the foundation. The value is the shared, trusted foundation that lets the whole team build on top of it safely.

A reviewed library compounds; a thousand private prompts do not.

The unit of delivery is a skill

A reviewed, versioned instruction module — a folder of instructions, scripts, and resources an agent loads on demand — with a declared safety posture and an explicit list of the systems it touches. One engineer captures "here's how we safely do X" once; it's reviewed once; then anyone on the team loads the reviewed copy.

Why it compounds

Knowledge stops dying in session histories. Every contributed skill makes the next problem cheaper to solve and the library more valuable. Private prompts have no such network effect — each one is solved, used once, and lost.

Dimension	A thousand private prompts	A reviewed skill library
Reuse	None — dies in one session	Loaded by the whole team
Review	Never seen by a second person	Reviewed once, inherited by all
Safety posture	Implicit, per-person, invisible	Declared and enforced per skill
Cost / usage	Unmeasured	Instrumented per skill
Value over time	Flat — resets every prompt	Compounds with each contribution

The Golden Rule

Everything rests on one rule. A skill exists to make a knowledgeable person faster, not to replace their judgment.

"LLMs are safe when a knowledgeable human validates output against a known source of truth."

This is the single load-bearing principle; every invariant below is a corollary of it.

Knowledgeable human

Not any human — someone who can tell a right answer from a plausible one. The skill accelerates an expert; it doesn't manufacture expertise that isn't in the room.

Validates output

An active check, not a rubber stamp. The human's job is the verification step the model can't be trusted to do on itself.

Known source of truth

Validation needs a referent: the authoritative system, spec, or record the answer can be checked against. No source of truth → no safe validation.

The Four Invariants

Four properties follow from the Golden Rule, and they must hold for every skill, always. They are architectural guarantees, not policy reminders: build them so they can't be violated rather than asking people to remember them.

1. The system of record always wins

The system of record is the authoritative store the business runs on — the canonical truth, not a copy that can be overwritten. The agent reads that truth and drafts against it; it never silently mutates a shared environment or invents state. The same discipline any serious business applies to its books: nothing — human or agent — writes to the record without a person accountable for the write.

2. Read-only by default; writes are earned and gated

Read-only is an architectural property, not a reminder. The agent drafts or diagnoses; a human approves anything that changes a shared system. A skill that can only read cannot break the system of record — that's the whole point of the default.

3. Higher-stakes paths route to a named human

Anything touching regulated data, money, or a production write is flagged and owned by a named reviewer. Critically, the human checkpoint sits where prompt injection cannot reach it: one layer below the agent, not inside it. A check the agent can talk its way past is not a check.

4. Everything is observable; nothing drifts

Each skill declares what it touches, the copy that runs is the reviewed copy (no personal forks), and usage and cost are visible per skill. Observability is the gap most orgs hit first — build it in from day one, not after the first incident. And it's two jobs, not one: operational analytics (who uses what, at what cost) and agent-quality (why a bad output happened) — stand up both.

Invariant	What it prevents	How you enforce it (not just ask)
System of record wins	Silent mutation, invented state, an agent overwriting canonical truth	Agent gets read creds to the record; writes go through a separate, accountable path
Read-only by default	Any unreviewed change to a shared system	Default scope = read; write scopes granted explicitly, per skill
Named human on high-stakes	Unowned writes to regulated data / money / prod; injection bypassing the check	Route flag → named reviewer; checkpoint outside the agent's context
Observable, no drift	Personal forks, blind spend, "which version ran?"	Auto-synced reviewed copy; declared systems-touched; per-skill usage/cost telemetry
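A minimal sketch of the "How you enforce it" column for the first three invariants, assuming a hypothetical in-house broker that sits between skills and the connector credentials. `Grant`, `authorize`, and `Denied` are illustrative names, not part of any real product. Reads are the default; a write needs an explicit per-skill scope, and a high-stakes write also needs a named reviewer whose approval arrives through the write-path rather than from the agent.

```python
from dataclasses import dataclass, field
from typing import Optional


class Denied(Exception):
    """Raised when a skill asks for access its grant does not cover."""


@dataclass
class Grant:
    """Hypothetical per-skill grant, held by a broker outside the agent."""
    skill: str
    read: set = field(default_factory=set)    # systems the skill may read
    write: set = field(default_factory=set)   # write scopes, granted explicitly
    high_stakes: bool = False                 # writes route to a named reviewer


def authorize(grant: Grant, system: str, action: str,
              reviewer: Optional[str] = None) -> str:
    """Return an audit line if the access is allowed; raise Denied otherwise."""
    if action == "read":
        if system in grant.read:
            return f"{grant.skill}: read {system}"
        raise Denied(f"{grant.skill} has no read access to {system}")
    if action == "write":
        if system not in grant.write:
            # Read-only by default: without an explicit write scope, no write.
            raise Denied(f"{grant.skill} has no write scope on {system}")
        if grant.high_stakes and reviewer is None:
            # The approval comes from the write-path, never from agent output.
            raise Denied(f"{grant.skill} write to {system} needs a named reviewer")
        suffix = f" (approved by {reviewer})" if reviewer else ""
        return f"{grant.skill}: write {system}{suffix}"
    raise Denied(f"unknown action {action!r}")
```

The grant table lives with whatever holds the connector credentials, which is consistent with treating the credential, not catalog membership, as the real access boundary.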

The injection corollary

If your human approval lives inside the agent's prompt or output, a hostile input can rewrite the very text the human is reading. Put the gate one layer down — in the deploy pipeline, the merge, the write-path — where the model can't author what the reviewer sees. And isolate the sessions that read untrusted content (inbound email, scraped pages) from the sessions that can act, so the injection chain can't form. In one regulated-lender deployment, that isolation plus a native send-confirmation checkpoint cut prompt-injection success from ~24% unprotected to ~1%.
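Both halves of this corollary can be sketched in a few lines, assuming a hypothetical harness that sees every tool call a session makes. A session that has read untrusted content is marked tainted and loses access to action tools; separately, the release check runs in the pipeline and compares a digest of the exact payload against the digest the reviewer approved, so the reviewer signs off on bytes the pipeline rendered, not text the model wrote. The tool names and helpers are illustrative only.

```python
import hashlib
from dataclasses import dataclass

UNTRUSTED_READS = {"read_inbound_email", "fetch_web_page"}  # hypothetical tools
ACTION_TOOLS = {"send_email", "update_record"}              # hypothetical tools


class IsolationError(Exception):
    """Raised when a session that saw untrusted content tries to act."""


@dataclass
class Session:
    tainted: bool = False  # flips once untrusted content enters the context

    def call(self, tool: str) -> str:
        if tool in ACTION_TOOLS and self.tainted:
            # Break the chain: a session that read untrusted input cannot act.
            raise IsolationError(f"{tool} blocked: session read untrusted content")
        if tool in UNTRUSTED_READS:
            self.tainted = True
        return f"called {tool}"


def digest(payload: bytes) -> str:
    """Fingerprint of the exact bytes the write-path will execute."""
    return hashlib.sha256(payload).hexdigest()


def release(payload: bytes, approved_digest: str) -> bool:
    """Runs in the pipeline, outside the agent. The reviewer approved a digest
    of the payload the pipeline rendered, so any later edit fails the check."""
    return digest(payload) == approved_digest
```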

The Development Lifecycle

Every capability moves through the same reviewed path — from a real, repeated problem to a governed, instrumented skill whose autonomy is earned, never assumed. Distribution is not permission.

Step	Action	What happens
1 · Intake	Candidate	A repeated problem → a candidate skill, not a private prompt.
2 · Build	Author	Versioned: declared posture, systems-touched list, tests.
3 · Review	Gate	Deterministic checks + named reviewer. Report, not verdict.
4 · Deploy	Govern	Auto-synced channel; everyone loads the reviewed copy.
5 · Observe	Instrument	Usage, cost-per-skill, outcomes.
6 · Autonomy	Earn	Read-only by default; broaden only when telemetry justifies it.

Intake: repeated problem → candidate

A real, repeated problem a person already solves by hand becomes a candidate skill. The bar is repetition + a known-good method — not novelty. If it only happens once, it's not a skill; if nobody can yet do it safely by hand, it's not ready to encode.

Build: declared posture + tests

Author it as a versioned skill with three non-negotiables:

Declared safety posture — read-only vs. drafts vs. gated-write.
Explicit systems-touched list — every system of record / data store it reads or proposes to write.
Tests — so the reviewed behavior is the behavior that ships.

Review: report, not verdict

A merge gate runs deterministic checks and routes higher-stakes skills to a named reviewer:

Secrets & PII scan.
Naming conventions.
Declared-vs-actual systems — does what it touches match what it claims?

The gate produces a report, not a verdict: it never merges; a human does. Automation finds; a person decides.
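A sketch of such a gate, assuming the skill's frontmatter has already been parsed into a dict and its bundled scripts are available as text. "Actual systems" is approximated by scanning the scripts for hypothetical connector names, and the secret/PII patterns are deliberately crude examples rather than a production scanner. Note what it returns: a list of findings for a person, with no merge decision anywhere in it.

```python
import re

# Deliberately crude example patterns; a real gate would use a dedicated scanner.
SECRET_AND_PII = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us-ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Assumed naming rule: lowercase words joined by single hyphens, at most 64 chars.
NAME_RULE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
# Hypothetical connector names a skill's scripts might reference.
KNOWN_SYSTEMS = {"observability-platform", "incident-tracker", "crm"}


def review_report(meta: dict, scripts: dict) -> list:
    """Return findings for a human reviewer. It never merges and never approves."""
    findings = []
    for path, text in scripts.items():
        for label, pattern in SECRET_AND_PII.items():
            if pattern.search(text):
                findings.append(f"{path}: possible {label}")
    name = meta.get("name", "")
    if len(name) > 64 or not NAME_RULE.match(name):
        findings.append(f"name {name!r} breaks the naming convention")
    # Declared-vs-actual: compare what the skill claims with what it references.
    declared = set(meta.get("systems_touched", []))
    used = {s for s in KNOWN_SYSTEMS if any(s in t for t in scripts.values())}
    for s in sorted(used - declared):
        findings.append(f"touches undeclared system: {s}")
    for s in sorted(declared - used):
        findings.append(f"declares a system it never uses: {s}")
    if meta.get("stakes") == "high":
        findings.append("stakes: high, route to a named reviewer")
    return findings
```

The declared-vs-actual check is the one that earns its keep: it is how a skill that claims to read two systems gets caught reaching a third.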

Governed deploy: no drifted forks

The skill ships through an auto-synced channel (e.g. CI-distributed) so everyone loads the reviewed copy. No personal forks, no "the version I'm running is different." The reviewed copy is the running copy.
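One way to make "the reviewed copy is the running copy" checkable, assuming the review step records a content digest of the skill folder at merge time: each client recomputes the digest of what it is about to load and refuses anything that doesn't match. The folder is modeled as an in-memory mapping of relative paths to bytes so the sketch is self-contained; `folder_digest` and `check_drift` are hypothetical helpers.

```python
import hashlib


def folder_digest(files: dict) -> str:
    """SHA-256 over (path, content) pairs, independent of dict insertion order."""
    h = hashlib.sha256()
    for path in sorted(files):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        for part in (path.encode(), files[path]):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
    return h.hexdigest()


def check_drift(running: dict, reviewed_digest: str) -> bool:
    """True if the copy about to run is byte-identical to the reviewed copy."""
    return folder_digest(running) == reviewed_digest
```

A personal fork that changes even one byte of any file produces a different digest, so drift is detected mechanically instead of by asking people to keep their copies current.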

Observe: cost-per-skill

Usage, cost-per-skill, and outcomes are instrumented. You cannot reason about adoption or safety you can't see — and you can't justify autonomy (next step) without the telemetry to back it.
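A minimal per-skill roll-up, assuming each session emits a record with the skill name, its model cost in dollars, and whether the human accepted the draft. The `SessionRecord` shape is hypothetical; in practice these rows come from whatever telemetry backend you stand up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    skill: str
    cost_usd: float
    accepted: bool  # did the human accept the skill's draft?


def per_skill_summary(records: list) -> dict:
    """Group session records by skill and compute usage, cost, and acceptance."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r.skill].append(r)
    summary = {}
    for skill, rows in buckets.items():
        total = sum(r.cost_usd for r in rows)
        summary[skill] = {
            "sessions": len(rows),
            "total_cost_usd": round(total, 2),
            "cost_per_session_usd": round(total / len(rows), 2),
            "acceptance_rate": sum(r.accepted for r in rows) / len(rows),
        }
    return summary
```

The same division is how the worked example later in this guide arrives at roughly $2.44 per session from about $8,980 over 3,685 sessions.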

Earn autonomy: distribution ≠ permission

Read-only and drafts-only by default; broader autonomy is granted per skill, only when telemetry justifies it. A skill being widely distributed does not mean it has earned the right to write. Autonomy is a privilege a skill earns from its own track record, one capability at a time. Calibrate the bar to reality: comparable sales-AI workflows see roughly 70–80% draft acceptance, not 99%+. Let the telemetry, not an aspirational target, tell you when a skill has earned more rope.
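The "earned, not assumed" rule can be written as a recommendation function over that telemetry. This is a sketch with hypothetical thresholds: the minimum sample size is arbitrary, and the 0.75 acceptance bar is picked from the 70–80% range above purely for illustration. It suggests at most one posture step at a time and changes nothing itself; a named reviewer still makes the grant.

```python
POSTURES = ["read-only", "drafts-only", "gated-write"]  # least to most autonomy


def recommend_promotion(posture: str, sessions: int, accepted: int,
                        min_sessions: int = 200, min_acceptance: float = 0.75):
    """Return (suggested next posture or None, reason) for a human to act on.
    Nothing here changes a grant; it only reads telemetry."""
    if posture not in POSTURES:
        raise ValueError(f"unknown posture {posture!r}")
    i = POSTURES.index(posture)
    if i == len(POSTURES) - 1:
        return None, "already at the most permissive posture"
    if sessions < max(1, min_sessions):
        return None, f"only {sessions} sessions; need at least {min_sessions}"
    rate = accepted / sessions
    if rate < min_acceptance:
        return None, f"acceptance {rate:.0%} is below the {min_acceptance:.0%} bar"
    # One step at a time: read-only can only be promoted to drafts-only,
    # never straight to gated-write.
    return POSTURES[i + 1], f"acceptance {rate:.0%} over {sessions} sessions"
```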

Anatomy of a Skill

A skill is a folder whose entry point is a SKILL.md file: YAML frontmatter (required: name + description) plus instructions, and optionally bundled scripts, templates, and reference files the agent loads on demand. The governance metadata — safety posture and systems-touched — is what turns a prompt into a governed capability.

incident-triage/SKILL.md — a read-only diagnostic skill

```markdown
---
name: incident-triage
description: Diagnose a production incident from logs and metrics and
  draft a root-cause hypothesis. Read-only; never writes or restarts services.
# --- governance metadata (org convention, enforced by the review gate) ---
safety_posture: read-only        # read-only | drafts-only | gated-write
systems_touched:
  - observability-platform   # READ
  - incident-tracker         # READ
stakes: standard               # standard | high (routes to a named reviewer)
owner: sre-oncall
---

# Incident Triage
1. Pull the error spike window from the observability platform.
2. Correlate with recent deploys and the incident tracker.
3. **Draft** a ranked root-cause hypothesis for a human to confirm.
   Never restart, roll back, or modify any service; propose, don't act.
```

What every governed skill declares

name / description — the only fields the open spec requires; the description is how the agent decides when to load it.
safety_posture — read-only, drafts-only, or gated-write.
systems_touched — every record/store it reads or proposes to write, with the access kind.
stakes / owner — high-stakes routes to a named reviewer; an owner is accountable.

Progressive disclosure

At startup a skill costs only its name and description, roughly 30–50 tokens for a concise description; the full instructions load only when the skill activates for a task. An agent with 100 skills installed therefore pays roughly 3,000–5,000 tokens at session start. That is what makes a large governed library cheap to carry: you can install everything and pay only for what fires. (Figures as of the Dec 2025 spec.)
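The startup arithmetic as a tiny sketch. The ~4 characters per token ratio in `estimate_metadata_tokens` is a rough rule of thumb for English text, not a property of any particular tokenizer, so treat its output as an order-of-magnitude estimate; both helpers are hypothetical.

```python
def estimate_metadata_tokens(name: str, description: str,
                             chars_per_token: float = 4.0) -> int:
    """Rough token estimate for the name + description loaded at startup."""
    return max(1, round((len(name) + len(description)) / chars_per_token))


def startup_cost(n_skills: int, tokens_per_skill: int) -> int:
    """Tokens paid at session start, before any skill activates."""
    return n_skills * tokens_per_skill


low, high = startup_cost(100, 30), startup_cost(100, 50)
print(f"100 skills at 30-50 tokens each: {low:,}-{high:,} tokens at startup")
# prints: 100 skills at 30-50 tokens each: 3,000-5,000 tokens at startup
```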

Build on the Open Standard, Not a Vendor

A crucial design choice: don't lock your institutional knowledge to one model vendor. What you standardize is the format and the governance, not the model. Your reviewed library outlives any single vendor's moment in the lead.

The `SKILL.md` / Agent Skills standard

Originally from Anthropic and published as an open spec on 18 Dec 2025 (hosted at agentskills.io). The same skill files are portable across tools — only discovery and distribution differ per tool. By 2026, 30+ tools read the same SKILL.md from the same folder structure.

Why portability is the safety play

A team that arrives on a different toolset joins the same disciplined way of building. The standard travels; your governance rides on it. You're insulated from any one vendor's pricing, deprecation, or fall from the lead.

Standardize this	Don't lock this
The skill format (`SKILL.md` + bundled resources)	The model vendor / specific LLM
The governance (posture, systems-touched, review gate)	The tool that discovers and runs skills
The review & distribution pipeline	A proprietary, single-vendor skill schema

Tools that read SKILL.md

Claude Code, OpenAI Codex, Google Gemini CLI, GitHub Copilot, Cursor, and a growing list of others (30+ by 2026) load the same files. Adoption count is volatile — verify the current list at agentskills.io before quoting a number.

The Maturity Path: Closing the Loop

A skill library matures in a predictable sequence — monitor → diagnose → fix → ship. You start with isolated, read-only helpers and add connective tissue over time. The point isn't full autonomy; it's that the org moves from "isolated helpers" to "a reviewed pipeline" without ever giving up the human checkpoint on anything that writes. The loop tightens; oversight never leaves.

Stage	Name	What it does
Stage 1	Monitor	Read-only helpers that surface a class of issue.
Stage 2	Diagnose	Skills that safely root-cause; encode reviewer judgment.
Stage 3	Fix (draft)	Turn a diagnosis into a draft fix; a human approves.
Stage 4	Ship (verify)	Check the fix against the source of truth before it ships.

Stage	Example skill	Posture	Human checkpoint
Monitor	Safely diagnose a class of production issue; `pr-code-review` encoding senior reviewers' judgment	read-only	Not yet writing — informational
Diagnose	`incident-triage` orchestrator that composes the read-only helpers into one flow	read-only	Human reads the hypothesis
Fix	A skill that turns a diagnosis into a draft fix	drafts-only	Human approves the draft
Ship	`change-verification` step that checks a fix against the source of truth before it ships	gated-write	Named reviewer on the write
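The Ship stage's `change-verification` step can be sketched as a compare-before-write check. This is one possible interpretation, assuming the drafted fix records the field values it expects the system of record to hold right now: if the record has moved since the draft was made, the fix is stale and goes back to a human; if it matches, the write still waits for the named reviewer. `DraftFix` and `verify_and_apply` are hypothetical names, and the system of record is modeled as a plain dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftFix:
    record_id: str
    expected: dict                     # values the draft assumed when written
    changes: dict                      # values the draft proposes to set
    approved_by: Optional[str] = None  # set by the write-path, not the agent


def verify_and_apply(fix: DraftFix, source_of_truth: dict) -> str:
    """Check the draft against the system of record, then write only if approved."""
    current = source_of_truth.get(fix.record_id)
    if current is None:
        return "rejected: record not found in system of record"
    stale = {k: v for k, v in fix.expected.items() if current.get(k) != v}
    if stale:
        return f"rejected: record changed since draft ({sorted(stale)})"
    if not fix.approved_by:
        return "held: awaiting named reviewer"
    current.update(fix.changes)  # the only write, after both checks pass
    return f"applied by {fix.approved_by}"
```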

The loop tightens; oversight stays

Manual toil falls as the stages connect into a pipeline. But the human checkpoint on anything that writes is not one of the things you optimize away — it's the constant that lets you optimize everything around it.

Governance Is the Velocity Play, Not the Brake

The instinct is that guardrails slow you down. The opposite is true: the guardrails are what let you go fast and keep going fast. Ungoverned AI use is quick — right up until the first bad write to a system of record, and that single incident sets you back further than the slow status quo ever would.

Guardrails are why adoption scales

They're the reason adoption gets past a handful of enthusiasts without an incident forcing a halt. The org that never has the headline-making bad write keeps compounding; the one that does spends its next two quarters on remediation and a moratorium.

The shape lesson

Big, all-or-nothing platform programs concentrate risk in a single basket and a single cutover — and most of us have watched one fail. A governed skill library is the opposite shape: small reviewed increments, each owned, each earning the next.

Dimension	Big-bang AI platform program	Governed skill library
Risk shape	One basket, one cutover	Many small reviewed increments
Origin	Mandated top-down	Emerges from a real problem a team solved
Failure mode	Whole program fails at once	One skill is rejected; the rest stand
Return over time	Binary — works or doesn't	Compounds with each contribution

Shared pattern, not mandated skills

The pattern is shared, but the skills aren't dictated from the top — each emerges from a real problem a specific team already solved, captured once and handed to everyone. That's why it compounds: a reviewed library gets stronger with every skill a team contributes. A thousand private prompts do not.

Anti-Patterns & What to Do Instead

The fastest way to a safe, fast AI practice is usually to stop doing these — not to add a new tool.

Anti-pattern	Do instead
Banning AI assistants outright, driving the use underground.	Make the governed path the easy path; capture the use, don't police it.
Letting the agent write directly to a shared system "to save a step."	Read-only by default; the agent drafts, a human approves every write.
Putting the human approval inside the agent's prompt/output.	Place the checkpoint one layer below the agent, beyond prompt injection's reach.
A gate that auto-merges when checks pass: automation as judge.	The gate produces a report, not a verdict; a person merges.
Retrofitting governance after the library already exists.	Build the review gate before your third skill; that is far cheaper than bolting it on later.
Shipping skills with no usage/cost instrumentation.	Instrument from day one; you can't manage safety or spend you can't see.
Personal forks of a skill drifting from the reviewed copy.	Distribute via an auto-synced channel; the reviewed copy is the running copy.
Treating wide distribution as permission to write.	Distribution ≠ permission. Earn autonomy per skill, from telemetry.
Encoding institutional knowledge in one vendor's proprietary format.	Use the open SKILL.md format; standardize format + governance, not the model.
Launching a big-bang AI platform with one risky cutover.	Grow a library of small reviewed increments, each earning the next.
Status inflation: calling a draft "built" or a prototype "in production."	Use precise status words (drafted ≠ prototyped ≠ built); status inflation is how an AI program loses credibility with risk and audit.
Reviewing skills only one at a time, letting two authors ship overlapping ones.	Add a catalog-level overlap review; overlapping authorship is the failure mode that quietly kills a skills platform.
Relying on catalog or department segmentation as a security boundary.	Treat the connector credential as the real access boundary; catalog membership is documentation and ergonomics, not permission.
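A catalog-level overlap review can start very simply: compare every pair of skills on the systems they touch and the words in their descriptions, and flag pairs above a threshold for a human to look at. Jaccard similarity, the equal weighting, and the 0.5 threshold are illustrative choices made for this sketch, not recommendations from this guide; the catalog shape reuses the hypothetical `systems_touched` and `description` fields from the SKILL.md example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def words(text: str) -> set:
    """Lowercased description words longer than three characters."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return {w for w in tokens if len(w) > 3}


def overlap_candidates(catalog: dict, threshold: float = 0.5) -> list:
    """Flag skill pairs whose systems and descriptions look alike."""
    flagged = []
    for (a, ma), (b, mb) in combinations(sorted(catalog.items()), 2):
        score = (jaccard(set(ma["systems_touched"]), set(mb["systems_touched"]))
                 + jaccard(words(ma["description"]), words(mb["description"]))) / 2
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged
```

The output is a review queue, not a verdict: two flagged skills may legitimately coexist, and a person decides whether to merge them.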

Getting Started

You don't need a program or a budget line; you need the first reviewed skill and the discipline to grow from there. Work the six steps in order. The whole thing is one bet, repeated: capture expertise once, review it, and let everyone build on it.

Start with one real, repeated problem

Pick something a person already solves by hand. Capture it as a single skill — not a platform. Repetition plus a known-good method is the bar; resist the urge to build a framework.

Declare what it touches, default it to read-only

List every system of record it reads. Set safety_posture: read-only. If it can only read, it can't break the system of record — that property is the safety, not a promise to be careful.

Build the review gate before your third skill

Deterministic checks (secrets/PII scan, naming, declared-vs-actual systems) + a named reviewer for high-stakes skills. The gate emits a report, not a verdict. Retrofitting governance later is far harder than starting with it.

Instrument from day one

Usage and cost per skill, in one place. You can't reason about adoption or safety you can't see — and you'll need this telemetry to justify autonomy later.

Keep the human checkpoint on every write

Earn autonomy per skill, from telemetry — never grant it by default. Place the checkpoint one layer below the agent so prompt injection can't reach it.

Use the open SKILL.md format

So nothing you build is hostage to one vendor. Standardize the format and the governance, not the model. Your reviewed library then travels to whatever tool the team lands on.

The repeated bet

Each new skill is the same loop again: a repeated problem → declared posture → reviewed once → governed deploy → instrumented → autonomy earned. Do it once well and the seventh skill is far cheaper than the first.

A Worked Example: A Regulated Lender

This was built end-to-end by the embedded AI lead at a regulated commercial lender of about three dozen people across 9 departments, over a 2-month engagement: a CI-distributed skill marketplace, a publish-and-audit governance gate, and AI observability stood up from zero.

Metric	Value	Detail
Skills shipped	~30	19 firm-authored, across 10 plugins
30-day model spend	~$8,980	3,685 sessions, ~$2.44 each
Latent capacity	30–80 hr/wk	Measured, gated on adoption

The single most instructive moment

On its first run, a follow-up skill was about to request a borrower's Social Security number — from the wrong people: 4 of the first 5 CRM contacts were brokers, not borrowers, because the skill had inherited a wrong assumption about the source of truth. The gate, the named reviewer, and the drafts-only default existed precisely so this surfaced as a caught request, not a leaked SSN to an intermediary. The fixes became rules: treat the CRM lead-status as authoritative over the application API, gate the SSN ask on a tax-ID-on-file flag, and skip broker-intermediated deals entirely. Every failure like that became an architectural rule — which is how the four invariants above were earned, not theorized.
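The three rules that came out of that incident can be written as a guard the skill runs before drafting any SSN request. Field names such as `crm_lead_status`, `tax_id_on_file`, and `broker_intermediated` are hypothetical stand-ins for whatever the CRM and application API actually expose, and this sketch assumes "gate the SSN ask on a tax-ID-on-file flag" means asking only when no tax ID is already on file. Even when every rule passes, the output is a draft for human review, never a sent message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    name: str
    crm_lead_status: Optional[str]  # authoritative, e.g. "borrower" or "broker"
    app_api_role: Optional[str]     # deliberately not consulted (rule 1)
    tax_id_on_file: bool


@dataclass
class Deal:
    broker_intermediated: bool


def ssn_request_decision(contact: Contact, deal: Deal) -> tuple:
    """Decide whether to DRAFT an SSN request for human review; never sends."""
    if deal.broker_intermediated:
        return False, "skip: broker-intermediated deal"            # rule 3
    role = contact.crm_lead_status                                  # rule 1: CRM wins
    if role != "borrower":
        return False, f"skip: {contact.name} is {role!r}, not the borrower"
    if contact.tax_id_on_file:
        return False, "skip: tax ID already on file"                # rule 2
    return True, f"draft SSN request to {contact.name} for human review"
```

Note that the broker in the first-run failure would be stopped by rule 1 even if the application API called them the borrower, because the CRM status is the only role this guard reads.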

Skill marketplace

CI-distributed so everyone loads the reviewed copy — no drifted forks. (Invariant 4.)

Publish-and-audit gate

A compliance-routing review gate: deterministic checks, high-stakes skills to a named reviewer. (Lifecycle step 3.)

AI observability

Usage and cost per skill, from zero — so autonomy could be earned on evidence. (Invariant 4 + step 5.)

The thesis, in one line

In a small, regulated, system-of-record-centric business, the highest-leverage AI work is not the cleverest agent — it's the platform that lets non-engineers ship reviewed, observable, drafts-only automation on top of the existing stack, without ever competing with the system of record. Get the substrate right and capability compounds safely; get it wrong and you accumulate unauditable risk that someone eventually has to unwind.

Full writeup — including the observability backend and the compliance-routing gate: Founding the AI delivery function at a regulated lender.

Glossary

Term	Meaning
Skill	A reviewed, versioned instruction module — a folder of instructions, scripts, and resources an agent loads on demand, with a declared safety posture and systems-touched list.
SKILL.md	The entry-point file of a skill: YAML frontmatter (required: `name`, `description`) plus instructions. The core of the open Agent Skills standard.
Agent Skills	The open standard for the skill format, published by Anthropic on 18 Dec 2025 at agentskills.io; read by 30+ tools by 2026.
System of record	The authoritative store the business runs on — canonical truth, not a copy that can be overwritten.
Safety posture	A skill's declared write-capability: read-only, drafts-only, or gated-write.
Systems-touched	The explicit list of records/stores a skill reads or proposes to write, declared in the skill and checked by the gate.
Read-only by default	The architectural default: a skill can read but not change shared systems unless a write scope is explicitly earned.
Review gate	A merge-time check that runs deterministic scans and routes high-stakes skills to a named reviewer. Produces a report, not a verdict.
Named reviewer	The accountable human who owns approval of a higher-stakes skill (regulated data, money, production writes).
Human checkpoint	The point where a person validates output against the source of truth — placed one layer below the agent, beyond prompt injection's reach.
Prompt injection	A hostile input that manipulates an agent's behavior or the text a human is asked to approve; why checkpoints must sit outside the agent.
Observability	Per-skill visibility into usage, cost, and outcomes — the gap most orgs hit first; build it in from day one.
Drift	Divergence between the reviewed copy of a skill and what's actually running (e.g. personal forks). Prevented by auto-synced distribution.
Governed deploy	Shipping a skill through an auto-synced channel so everyone loads the reviewed copy.
Autonomy	How much a skill may do without per-action human approval — earned per skill from telemetry, never granted by default.
Progressive disclosure	The design that loads only a skill's name + description (~30–50 tokens) at startup, expanding to full instructions only when it activates.
Shadow AI	Ungoverned, invisible AI use inside an org — ungrounded copy/paste, quiet low-quality use, and siloed sophistication.
Distribution ≠ permission	The rule that a skill being widely available does not grant it the right to write or act autonomously.
