Quick Reference — the mental model in one table
| Concept | What it really is |
|---|---|
| Blob | Raw bytes of one file's content — no filename, no mode, no history. Identical content = one blob, shared by every file/commit that has it. |
| Tree | A directory listing: (mode, name) → blob/tree SHA. Nested trees make a nested directory structure. |
| Commit | A pointer to one tree + zero-or-more parent commits + author/committer/message. A snapshot, never a diff. |
| SHA (object ID) | The object's address and its integrity check — hash of the object's own bytes. Change one byte anywhere, the hash (and every descendant's hash) changes. |
| Branch | A 41-byte text file (.git/refs/heads/<name>) holding a commit SHA. Moving a branch is a single file write — no data is copied. |
| HEAD | Usually a symbolic pointer to the current branch ref. Points directly at a commit SHA instead = "detached HEAD." |
| Index (staging area) | A separate binary file (.git/index) listing what the next commit's tree will contain — not a commit-in-progress, not the working directory. |
| The DAG | Commits form a directed acyclic graph via parent pointers. Merges have 2+ parents; the graph can only grow forward because a commit's hash depends on its parent's hash, which must already exist. |
| Merge base | The best common ancestor of two branch tips in the DAG — the reference point for a three-way merge. |
| Rebase | Re-apply commits onto a new parent, one at a time. New parent ⇒ new hash for every commit from that point forward, even if the content is identical. |
§1
The big idea: a content-addressable filesystem
Git is a key-value store, not a "diff tracker"
Linus Torvalds described Git's core as "a stupid content tracker" — and that's the accurate mental model. Underneath the porcelain (add, commit, merge...) sits a tiny key-value database: the key is a SHA hash, the value is a compressed blob of bytes, and there are exactly four kinds of values (objects). Everything Git does — history, branching, diffing, merging — is built on top of that one primitive.
Commits are snapshots, not diffs — a common misconception
Unlike older centralized VCS tools (CVS, Subversion) that store a file's history as a chain of diffs/deltas, every Git commit points at a complete tree — the full state of every file, as it existed at that commit. git diff, git log -p, and friends compute differences on the fly by comparing two full snapshots; nothing is stored as a diff at the object level.
Three separate areas, three separate jobs
Every Git command can be understood as copying data between three places:
- Working directory — the plain files on disk you edit with a normal editor. Not part of the object database at all.
- Index / staging area: a snapshot-in-progress recorded in .git/index. What git commit will turn into a tree.
- Repository (object database + refs): the immutable object graph plus the mutable pointers (branches, HEAD, tags) into it.
§2
The object model: blobs, trees, commits & tags
The four object types — what each one stores
| Type | Stores | Inspect with |
|---|---|---|
| blob | Raw file bytes only. No name, no path, no permission bits — those live in the tree that references it. | git cat-file -p <sha> |
| tree | A sorted list of entries: mode name\0<20-byte binary sha> per entry — one directory level. Subdirectories are just entries whose SHA points at another tree. | git ls-tree <sha> |
| commit | One tree SHA, zero-or-more parent SHAs, author, committer, timestamps, and the message. Never a diff. | git cat-file -p <sha> / git show <sha> |
| tag (annotated) | A target object's type + SHA, tagger identity, message, and optionally a GPG/SSH signature. A durable, first-class object — unlike a lightweight tag. | git cat-file -p <sha> |
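The tree row above can be made concrete with a minimal sketch of how a tree body might be serialized. The helper names (`tree_sort_key`, `serialize_tree`, `object_id`) are hypothetical, and the sketch assumes SHA-1 object IDs. One detail worth noting: inside the object a directory's mode is stored as `40000`, while `git ls-tree` pads it to `040000` for display.

```python
import hashlib


def tree_sort_key(entry: tuple[str, str, str]) -> bytes:
    """Order entries by name, comparing directories as if their name ended in '/'."""
    mode, name, _hex_sha = entry
    return (name + "/" if mode == "40000" else name).encode("utf-8")


def serialize_tree(entries: list[tuple[str, str, str]]) -> bytes:
    """Encode (mode, name, hex_sha) entries as the body of one tree object."""
    body = b""
    for mode, name, hex_sha in sorted(entries, key=tree_sort_key):
        # "<mode> <name>\0" followed by the 20 raw bytes of the SHA-1 (not hex text).
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(hex_sha)
    return body


def object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <length>\\0" + body, the same scheme used for every object type."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # ID of a zero-byte blob
    body = serialize_tree([("100644", "README.md", empty_blob)])
    print(object_id("tree", body))
```

Because the entry holds only a mode, a name, and a child ID, renaming a file changes the tree's hash but leaves the blob untouched.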
git tag v1.0 (no -a) just writes a ref file at .git/refs/tags/v1.0 containing a commit SHA: structurally identical to a branch, except convention says you don't move it. Use annotated tags (git tag -a) for anything you'll ship; they carry metadata and can be signed.
Gitlinks: the quasi-fifth reference (submodules)
A tree entry can carry mode 160000 — a gitlink — whose "SHA" is a commit hash in an entirely different repository, not a blob/tree in this one. That's the whole mechanism behind submodules: the parent repo records exactly one pinned commit of the child repo, with no copy of its objects. git submodule update is what actually fetches and checks out that pinned commit into the working directory.
§3
Content addressing & the object database
How a SHA is actually computed
An object's ID is the hash of a small header plus its content: sha1("<type> <byte-length>\0" + content). The header means a blob containing exactly the same bytes as a commit object (astronomically unlikely, but conceptually) would still hash differently, because the header text differs. This is why identical file content always produces the identical blob SHA, regardless of filename, path, or which commit it's in — the hash only ever depends on the bytes.
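The formula above can be reproduced outside Git in a few lines. This is a minimal sketch assuming a SHA-1 repository; `git_object_id` is a hypothetical helper name, not a Git API.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for `content` stored as an object of `obj_type`."""
    # The header is "<type> <byte-length>\0", prepended to the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    data = b"hello\n"
    print(git_object_id("blob", data))    # depends only on the bytes, never on a filename
    print(git_object_id("commit", data))  # same bytes, different type header, different ID
```

For a zero-byte blob this yields e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID `git hash-object` reports for an empty file.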
$ echo -n "hello git internals" | git hash-object --stdin
19a3d3d4a52002ac7f7ef476ffc2ba1de1471ec9
$ echo -n "hello git internals" | git hash-object --stdin
19a3d3d4a52002ac7f7ef476ffc2ba1de1471ec9 # identical input → identical hash, every time
On-disk storage: loose objects vs. packfiles
A freshly-created object is written as a loose object: zlib-deflate-compressed and stored at .git/objects/xx/yyyy…, where xx is the first two hex characters of its SHA (a directory, so no single directory ever needs millions of entries) and the rest is the filename.
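As a rough illustration of the loose-object layout just described, the sketch below writes and reads one object in a temporary directory that stands in for .git/objects. The function names are hypothetical, SHA-1 is assumed, and real Git adds details (temporary files, permissions) that are left out here.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store one object in the xx/yyyy... layout and return its ID."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    sha = hashlib.sha1(store).hexdigest()
    path = objects_dir / sha[:2] / sha[2:]    # first two hex chars become the directory
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))    # header and content are deflated together
    return sha


def read_loose_object(objects_dir: Path, sha: str) -> tuple[str, bytes]:
    """Inflate a loose object and split it back into (type, content)."""
    raw = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("length in header does not match content")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / "objects"       # stand-in for .git/objects
        sha = write_loose_object(objects, "blob", b"hello\n")
        print(sha[:2], sha[2:])
        print(read_loose_object(objects, sha))
```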
Loose objects are simple but wasteful — one per object, no cross-object compression. Periodically (or explicitly via git gc), Git collapses many loose objects into a single .pack file plus a .idx index for O(log n) lookup by SHA prefix, using delta compression: similar objects (e.g. successive versions of one file) are stored as a small diff against a chosen base object rather than in full. Full detail in §8.
Inspecting the object graph with plumbing
Porcelain commands (log, show, diff) are convenience wrappers. The plumbing commands underneath let you walk the exact same graph a commit points at:
$ git cat-file -p HEAD
tree 8f3d44b1c9e2a0f5b6d7c8a9b0e1f2a3b4c5d6e7
parent 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
author David Veksler <[email protected]> 1782900000 -0700
committer David Veksler <[email protected]> 1782900000 -0700

Add README
$ git cat-file -p 8f3d44b1c9e2a0f5b6d7c8a9b0e1f2a3b4c5d6e7
100644 blob 3f2504e04f8964a875e69bfff8c5a4b8b5f2c1f3 README.md
100644 blob a1b2c3d4e5f60718293a4b5c6d7e8f9a0b1c2d3e main.py
040000 tree 9d0e2f1a2b3c4d5e6f708192a3b4c5d6e7f80912 src
$ git cat-file -p 3f2504e04f8964a875e69bfff8c5a4b8b5f2c1f3
# Hello, Git
$ git cat-file -t 8f3d44b1c9e2a0f5b6d7c8a9b0e1f2a3b4c5d6e7
tree
Sample hashes above are illustrative; run this against your own repo to see the real graph.
§4
Refs, HEAD & the index — the mutable layer
A ref is just a file containing a SHA
Everything in .git/objects is content-addressed and immutable. Branches, HEAD, and tags are the opposite: small, deliberately mutable pointers that make the immutable graph usable as version control.
| Ref | On disk | Contents |
|---|---|---|
| refs/heads/main | .git/refs/heads/main | A commit SHA: the branch tip. Moves on every commit to that branch. |
| HEAD | .git/HEAD | Normally ref: refs/heads/main (symbolic, i.e. "whatever main points at"). A raw SHA instead = detached HEAD. |
| refs/tags/v1.0 | .git/refs/tags/v1.0 | A commit (or tag-object) SHA. Immutable by convention only; nothing stops you from moving it. |
| refs/remotes/origin/main | .git/refs/remotes/origin/main | Local record of where origin's main was as of your last fetch/push. Read-only in normal use. |
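Since HEAD and branch refs are plain text files, resolving HEAD can be sketched with ordinary file reads. This is a simplified illustration: `resolve_head` is a hypothetical helper, and it deliberately ignores packed refs (.git/packed-refs) and branches that have no commits yet, both of which real Git handles.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_head(git_dir: Path) -> tuple[str, Optional[str]]:
    """Return (commit_sha, branch_ref); branch_ref is None when HEAD is detached."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                      # e.g. "refs/heads/main"
        return (git_dir / ref).read_text().strip(), ref
    return head, None                                  # raw SHA: detached HEAD


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)                            # stand-in for a .git directory
        (git_dir / "refs" / "heads").mkdir(parents=True)
        sha = "1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b"  # illustrative SHA
        (git_dir / "refs" / "heads" / "main").write_text(sha + "\n")
        (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
        print(resolve_head(git_dir))
```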
The index (staging area) is a file, not a commit-in-progress
.git/index is a binary file: a header (signature, format version, entry count), then one sorted entry per staged path — mode, file size, mtime cache, and the blob SHA it would commit — followed by a checksum of the file itself. git add doesn't touch the object graph beyond writing a blob; it edits this one file.
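The header and trailing checksum described above are simple enough to check by hand. The following sketch reads only the 12-byte header and verifies the trailer; it assumes a SHA-1 repository (20-byte checksum) and does not parse the entries themselves. `parse_index_header` is a hypothetical name.

```python
import hashlib
import struct


def parse_index_header(data: bytes) -> dict:
    """Read signature, version and entry count, then verify the trailing checksum."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    # The last 20 bytes are a SHA-1 over everything that precedes them.
    checksum_ok = hashlib.sha1(data[:-20]).digest() == data[-20:]
    return {"version": version, "entries": entry_count, "checksum_ok": checksum_ok}


if __name__ == "__main__":
    # Build an empty version-2 index in memory rather than reading a real repo.
    body = b"DIRC" + struct.pack(">II", 2, 0)
    print(parse_index_header(body + hashlib.sha1(body).digest()))
```

Against a real repository the same function could be fed the bytes of .git/index.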
git status mostly compares cached stat info in the index against the filesystem, only re-hashing files whose mtime/size actually changed.
Build a commit entirely by hand (no porcelain)
The clearest way to internalize "porcelain is just choreography" is to do a commit's job yourself with plumbing:
$ echo "hello" > file.txt
$ BLOB=$(git hash-object -w file.txt) # writes a blob object, prints its SHA
$ git update-index --add --cacheinfo 100644 $BLOB file.txt # stage it (edit .git/index)
$ TREE=$(git write-tree) # index → tree object
$ COMMIT=$(git commit-tree $TREE -m "manual commit") # tree (+ parent, if any) → commit object
$ git update-ref refs/heads/main $COMMIT # move the branch pointer: that's the "commit"
Four plumbing calls, one line of pointer-moving at the end. git commit is in essence this sequence, plus reading the previous HEAD as -p <parent> and appending a reflog entry.
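The same choreography can be modeled in memory to show how little state is involved. This is a toy sketch, not Git's implementation: the `objects` and `refs` dictionaries, the `commit_file` helper, and the fixed author identity are all assumptions made for illustration, and the tree holds a single file.

```python
import hashlib

objects: dict[str, bytes] = {}   # object database: SHA -> "<type> <len>\0<body>"
refs: dict[str, str] = {}        # mutable layer: ref name -> commit SHA


def write_object(obj_type: str, body: bytes) -> str:
    store = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    sha = hashlib.sha1(store).hexdigest()
    objects[sha] = store         # same bytes always land on the same key
    return sha


def commit_file(name: str, data: bytes, message: str,
                branch: str = "refs/heads/main") -> str:
    blob = write_object("blob", data)                          # like hash-object -w
    entry = f"100644 {name}".encode() + b"\x00" + bytes.fromhex(blob)
    tree = write_object("tree", entry)                         # like write-tree
    lines = [f"tree {tree}"]
    if branch in refs:
        lines.append(f"parent {refs[branch]}")                 # like -p <parent>
    ident = "Example Author <author@example.com> 0 +0000"      # hypothetical identity
    lines += [f"author {ident}", f"committer {ident}", "", message, ""]
    commit = write_object("commit", "\n".join(lines).encode()) # like commit-tree
    refs[branch] = commit                                      # like update-ref
    return commit


if __name__ == "__main__":
    print(commit_file("file.txt", b"hello\n", "manual commit"))
    print(sorted(obj.split(b" ")[0].decode() for obj in objects.values()))
```

Three objects are written and exactly one ref changes, which mirrors the plumbing sequence above.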
Detached HEAD isn't dangerous, it's just unreachable
Checking out a commit or tag directly (git checkout <sha>) puts a raw SHA into .git/HEAD instead of a branch reference. You can commit, build, and test normally — those new commits are entirely valid objects. The only real risk: since no branch ref points at them, switching away leaves them unreachable and eligible for garbage collection (§8) unless you first anchor them with git branch <name> or git switch -c <name>.
§5
The commit DAG & merge base
Why the graph can only be acyclic
Each commit stores its parent(s) by hash, and a hash can only be computed after the thing it names already exists. A commit literally cannot reference a descendant, because that descendant's hash — which would have to include this commit as its parent — doesn't exist yet. The acyclic property isn't enforced by a rule Git checks; it falls directly out of hashing being one-directional in time.
Merge base: the reference point for a three-way merge
When you run git merge feature from main, Git first walks the DAG backward from both tips to find their best common ancestor. In the running example, where main's tip is C and feature's tip is F, that ancestor is B. Git then diffs base→ours (B→C) and base→theirs (B→F), applies non-overlapping changes automatically, and flags anything both sides touched as a conflict. Inspect it yourself with git merge-base main feature.
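The idea of a "best" common ancestor can be sketched over a plain parent map. This is a naive set-based version meant only to show the definition; Git's own traversal is considerably more efficient, and the commit names and graph shape here are illustrative assumptions.

```python
def ancestors(parents: dict[str, list[str]], tip: str) -> set[str]:
    """Every commit reachable from `tip` via parent pointers, including `tip`."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merge_bases(parents: dict[str, list[str]], ours: str, theirs: str) -> set[str]:
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(parents, ours) & ancestors(parents, theirs)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


if __name__ == "__main__":
    # main: A <- B <- C, feature: B <- F (assumed shape of the running example)
    graph = {"A": [], "B": ["A"], "C": ["B"], "F": ["B"]}
    print(merge_bases(graph, "C", "F"))
```

The function returns a set because, after criss-cross merges, more than one commit can satisfy the definition.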
Criss-cross merges & the "virtual" merge base
If two branches have merged each other more than once, there can be multiple lowest common ancestors with no single "best" one. Git's default recursive/ort strategy handles this by first merging the candidate ancestors with each other to synthesize one virtual base, then doing the normal three-way merge against that. Most people never notice this machinery — until a merge conflicts in a spot that seems to have "nothing to do" with either branch's real changes, which is usually this case.
Ancestry shorthand: ~ vs. ^
- HEAD~N: walk N steps back along first parents only. Meaningful for any commit, since it ignores side branches.
- HEAD^N: the Nth parent of a commit. N greater than 1 is only meaningful at a merge (the only kind of commit with more than one parent); HEAD^1 = HEAD^ = HEAD~1. HEAD^2 on the merge commit M = F (the second parent, i.e. the branch that got merged in).
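The difference between the two operators is easy to see in a small resolver over a parent map. This is a sketch of the semantics only, not of Git's revision parser; the graph (with M merging C and F) and the `resolve` helper are assumptions for illustration.

```python
import re


def resolve(parents: dict[str, list[str]], start: str, suffix: str) -> str:
    """Apply a chain of ~N / ^N steps (for example "~2" or "^2~1") to `start`."""
    commit = start
    for op, count in re.findall(r"([~^])(\d*)", suffix):
        n = int(count) if count else 1
        if op == "~":
            for _ in range(n):                 # N generations back, first parent each time
                commit = parents[commit][0]
        elif n > 0:                            # "^0" means the commit itself
            commit = parents[commit][n - 1]    # the Nth parent of this one commit
    return commit


if __name__ == "__main__":
    # M is a merge whose first parent is C (main) and second parent is F (feature).
    graph = {"A": [], "B": ["A"], "C": ["B"], "F": ["B"], "M": ["C", "F"]}
    for suffix in ("~1", "^", "^2", "~2", "^2~1"):
        print("M" + suffix, "->", resolve(graph, "M", suffix))
```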
§6
Porcelain → plumbing: what commands really do
Every command, decoded at the object/ref level
| Command | What actually happens |
|---|---|
| git add <file> | hash-object writes a blob for the file's current bytes; the index gets a new/updated entry (path, mode, blob SHA). No commit, no tree, yet. |
| git commit | write-tree turns the index into a tree object (nested trees for subdirectories) → commit-tree wraps it with the old HEAD as parent → update-ref moves the branch to the new commit → a reflog line is appended. |
| git branch <name> | update-ref refs/heads/<name> $(rev-parse HEAD): one new 41-byte file. Index and working directory are untouched. |
| git switch/checkout <branch> | HEAD becomes a symbolic ref to refs/heads/<branch>; index and working directory are overwritten to match that branch tip's tree. |
| git checkout <commit> | Same tree checkout, but HEAD becomes a raw SHA: detached HEAD (§4). |
| git merge (fast-forward) | Target is a descendant of current HEAD, so the branch ref is simply reassigned forward. No new commit object is created. |
| git merge (true merge) | Merge-base found (§5), three-way diff computed, a new commit object is created with two parents, branch ref moves to it. |
| git rebase <base> | Each source commit is replayed (cherry-picked) onto the new parent in turn. Every one becomes a brand-new commit object with a new hash (§7), because the parent changed. Branch ref is moved to the final new commit only at the end. |
| git tag -a | Writes a tag object pointing at the target + a ref file at refs/tags/<name>. |
| git stash | Creates commit objects (working-tree state, and index state) that are not reachable from any branch; they are referenced only via refs/stash and the stash reflog. |
git reset's three modes, precisely
| Mode | Moves branch ref | Resets index | Resets working dir |
|---|---|---|---|
| --soft | ✅ | ❌ | ❌ |
| --mixed (default) | ✅ | ✅ (unstages) | ❌ |
| --hard | ✅ | ✅ | ✅ (discards edits) |
In every mode the commits you "reset away from" aren't deleted — they simply become unreachable from the branch ref, recoverable via the reflog (§8) until it expires or gc prunes them.
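The three modes can be summarized as a tiny state model in which each of the three areas is reduced to a label naming the commit whose tree it currently matches. This is a deliberate simplification (real areas hold per-file state), and the `Repo` class and `reset` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    branch: str     # commit the current branch ref points at
    index: str      # commit whose tree the index currently matches
    worktree: str   # commit whose tree the files on disk currently match


def reset(repo: Repo, target: str, mode: str = "mixed") -> Repo:
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    repo.branch = target                # every mode moves the branch ref
    if mode in ("mixed", "hard"):
        repo.index = target             # unstage: index now matches the target's tree
    if mode == "hard":
        repo.worktree = target          # discard edits in the working directory
    return repo


if __name__ == "__main__":
    for mode in ("soft", "mixed", "hard"):
        print(mode, reset(Repo("C", "C", "C"), "B", mode))
```

Nothing in the model deletes a commit, which matches the point above: the old commit simply stops being what the branch points at.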
§7
Why rebase changes every hash downstream
A commit's hash is a function of its parent's hash
A commit object's bytes include the literal text of its parent SHA. Change the parent — even with the tree and message held perfectly identical — and the commit's own hash changes, because the input to the hash function changed. Every commit downstream references this commit's hash as its parent, so the change cascades through the entire remainder of the branch.
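The cascade can be demonstrated by hashing simplified commit objects. In this sketch the identity and timestamp are fixed, hypothetical values and each replayed commit keeps its original tree; a real rebase also refreshes the committer timestamp and usually produces new trees, so the hashes would differ for those reasons too.

```python
import hashlib
from typing import Optional

IDENT = "Example Author <author@example.com> 0 +0000"  # hypothetical fixed identity


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")   # the parent's hash is part of the hashed bytes
    lines += [f"author {IDENT}", f"committer {IDENT}", "", message, ""]
    body = "\n".join(lines).encode()
    return hashlib.sha1(f"commit {len(body)}".encode() + b"\x00" + body).hexdigest()


def replay(commits: list[tuple[str, str]], new_base: str) -> list[str]:
    """Re-create (tree, message) commits in order on top of `new_base`."""
    ids, parent = [], new_base
    for tree, message in commits:
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids


if __name__ == "__main__":
    empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
    b = commit_id(empty_tree, None, "B")
    c = commit_id(empty_tree, b, "C")
    work = [(empty_tree, "D"), (empty_tree, "E")]
    print("on B:", replay(work, b))
    print("on C:", replay(work, c))   # same trees and messages, all-new IDs
```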
[Diagrams: the commit graph for git rebase main with feature branched from B, and the result of running git rebase main while on feature.]
The original commits D and E still exist, but no branch reaches them once feature points at E', and they stay on disk only until reflog expiry / git gc prunes them (§8). This unreachability, multiplied across everyone who already pulled D/E, is exactly why rebasing shared/published history breaks collaborators: their branch's parent chain no longer matches anyone else's.
Fast-forward vs. true merge: a graph-shape decision, not a preference
Git doesn't "choose" to fast-forward stylistically — it's forced whenever the target is a straight-line descendant of the current tip (no divergence to reconcile), and impossible otherwise. git merge --no-ff exists specifically to force a merge commit even when a fast-forward is possible, purely to keep a visible marker of "a branch merged here" in the graph.
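Whether a fast-forward is possible reduces to an ancestry test, which the following sketch expresses over a parent map. The graph and the helper names (`is_ancestor`, `merge_kind`) are illustrative assumptions; the returned strings are labels for this example, not Git's output.

```python
def is_ancestor(parents: dict[str, list[str]], candidate: str, tip: str) -> bool:
    """True if `candidate` is `tip` itself or reachable from it via parent pointers."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit == candidate:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


def merge_kind(parents: dict[str, list[str]], current: str, target: str) -> str:
    if is_ancestor(parents, target, current):
        return "already up to date"   # nothing new to bring in
    if is_ancestor(parents, current, target):
        return "fast-forward"         # the ref can simply move forward
    return "true merge"               # histories diverged: a merge commit is required


if __name__ == "__main__":
    graph = {"A": [], "B": ["A"], "C": ["B"], "F": ["B"]}
    print(merge_kind(graph, "B", "F"))
    print(merge_kind(graph, "C", "F"))
```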
§8
Packfiles, garbage collection & the reflog
Reachability is the whole GC model
Git never "deletes on undo." An object is kept as long as it's reachable — findable by walking parent/tree/blob pointers starting from some ref (a branch, tag, stash, or a reflog entry). Reset, rebase, branch deletion, and amend all just make certain commits unreachable from normal refs; the bytes remain on disk until an explicit collection pass decides otherwise.
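Reachability is a plain graph walk from the set of roots. The sketch below works at commit level only (trees and blobs are omitted) and uses a hypothetical scenario in which commit C was amended into C2; the ref names and helper functions are assumptions for illustration.

```python
from typing import Iterable


def reachable(parents: dict[str, list[str]], roots: Iterable[str]) -> set[str]:
    """All commits findable by walking parent pointers from any root."""
    seen, stack = set(), list(roots)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def unreachable(parents: dict[str, list[str]], refs: dict[str, str]) -> set[str]:
    """Commits that exist in the database but that no ref leads to."""
    return set(parents) - reachable(parents, refs.values())


if __name__ == "__main__":
    # C was amended: C2 replaced it as the tip of main, but C still exists.
    graph = {"A": [], "B": ["A"], "C": ["B"], "C2": ["B"]}
    print(unreachable(graph, {"refs/heads/main": "C2"}))
    # A reflog entry counts as a root, so C stays protected while it is listed.
    print(unreachable(graph, {"refs/heads/main": "C2", "HEAD@{1}": "C"}))
```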
git gc & pruning — the actual defaults
- gc.auto = 6700: once loose objects exceed roughly this count, ordinary commands opportunistically trigger git gc --auto to pack them.
- gc.autoPackLimit = 50: once there are more than this many packs, they get consolidated into one.
- gc.pruneExpire = "2 weeks ago": unreachable loose objects younger than this are kept (as a safety margin) even during an explicit gc.
The reflog: your real local undo history
Every time a ref (HEAD, a branch) moves on your machine, Git appends a line to that ref's reflog — a purely local, never-pushed journal of "where this pointer has been." It's the practical safety net under reset/rebase/amend: the old commit is still there, you just need its former reflog position.
- gc.reflogExpire = 90 days for entries still reachable some other way.
- gc.reflogExpireUnreachable = 30 days for entries that are not reachable from any current ref (the common case after --amend/rebase).
$ git reflog # find the entry just before the mistake, e.g. HEAD@{2}
$ git branch recovered HEAD@{2} # anchor it to a real ref so it survives gc
# or, if you don't even remember a ref name — search every dangling object directly:
$ git fsck --no-reflogs --unreachable --full | grep commit
$ git show <dangling-sha>
§9
Edge cases & advanced internals
SHA-1 → SHA-256: where the transition actually stands
Git has used a collision-detecting SHA-1 variant (hardened against the 2017 "SHAttered" attack) since 2017, so practical collision forgery against a Git repo isn't a live threat. A parallel SHA-256 object format exists (git init --object-format=sha256) and Git 2.51 marked it as the default hash for the planned Git 3.0 — but as of mid-2026, major hosting platforms (GitHub, GitLab) still don't support SHA-256 repositories, so the transition is gated on ecosystem support, not the Git client. SHA-1 and SHA-256 repos are designed to interoperate (a SHA-256 client can push/fetch to a SHA-1 server) once tooling catches up.
Shallow & partial clones deliberately break the graph
git clone --depth N fetches only the most recent N commits and synthesizes a "grafted" boundary — older parents genuinely don't exist locally. Anything that needs to walk past that boundary (full blame, some rebases, bisect across old history) fails or behaves oddly until git fetch --unshallow. --filter=blob:none (partial clone) takes the opposite approach: the full commit/tree graph is present, but blob content is fetched lazily on first access — different trade-off, same principle of deferring part of the object graph.
What a signature actually covers
Signing a commit or tag (git commit -S or git tag -s, GPG or SSH) signs the object's own canonical serialized bytes (tree/parent/author/message for a commit), everything except the signature itself. The signature is then embedded in the object: as a gpgsig header in a commit, or appended to the message in a tag. It authenticates that exact snapshot and metadata, not "the diff the author intended"; git verify-commit / git log --show-signature re-derive the signed payload and check the signature against it.
Worktrees: multiple working directories, one object database
git worktree add ../hotfix hotfix-branch creates a second working directory + index, both pointed at the same .git/objects and refs as the original. It solves "I need two branches checked out simultaneously" without the disk/network cost of a second clone — a direct consequence of the working directory being just one of the three separate areas in §1.
Sparse checkout: the index doesn't have to mirror the whole tree
git sparse-checkout set <patterns> (cone mode) restricts which paths get materialized into the working directory, while the index and full commit history remain complete. Useful on monorepos where checking out every path would be prohibitively slow — the object graph is unaffected, only what gets written to disk for you to see.
Common misconceptions & anti-patterns
- […] --force-with-lease. Plain --force blindly overwrites the remote ref; --force-with-lease refuses if the remote moved since your last fetch.
- "[…] .gitignore untracks files." It only affects untracked files. Already-tracked files need git rm --cached first.
- […] reset --hard destroys data instantly. The commits become unreachable loose objects, recoverable via reflog for weeks (§8), not deleted on the spot.
- […] (git filter-repo/BFG) + force-push + everyone re-clones + rotate the credential.
- […] fetch and pull at the ref level. fetch only updates refs/remotes/*; it never touches your branches or working dir. pull = fetch + merge/rebase into your current branch.
- […] git fetch --unshallow.
- […] git branch <name> before switching away, or it becomes GC-eligible.