DVCS Philosophy
Distributed Version Control Systems (DVCS) such as Git and Mercurial give every developer a full copy of the repository, enabling offline work, faster operations, and flexible workflows.
Core Idea
Unlike older Centralized VCS (like SVN or CVS) where developers checked out files from a single central server, DVCS provides each user with a complete, independent copy (clone) of the entire repository history.
Key Advantages
- Offline Work: Commit, branch, view history, and merge without needing a network connection.
- Performance: Most operations (commit, branch, merge, diff) are local and thus significantly faster.
- Redundancy: Every clone is effectively a full backup of the repository.
- Workflow Flexibility: Enables complex branching strategies and collaboration models (e.g., pull requests).
- Collaboration: Easily share changes between any two repositories, not just client-server.
Shift from Centralized
The move to DVCS addressed key limitations of centralized version control systems (CVCS), particularly branching and merging complexity, performance bottlenecks, and reliance on network connectivity.
Repository (Repo)
A collection of files and the history of their changes. In DVCS, you have a local copy and interact with remote copies (e.g., on GitHub).
What it Contains
A repository stores all the files belonging to a project, plus the complete history of modifications made to those files over time. This history is typically stored as a series of commits.
Local vs. Remote
- Local Repository: The full copy residing on your own computer. This is where you do your work: edit files, commit changes, create branches.
- Remote Repository: A copy hosted on a server (e.g., GitHub, GitLab, a company server). Used for collaboration, backup, and sharing code with others. A common conventional name for the primary remote is origin.
Operations
You typically clone a remote repository to create your local copy, pull changes from a remote (a fetch followed by a merge into your current branch), and push your local commits to a remote. Fetch on its own retrieves changes without merging them into your branch.
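The difference between fetch and pull can be sketched with a toy model in which a repository is just a table of named pointers. This is an illustration under stated assumptions, not how any tool is implemented: the commit labels (c2, c3) and the helper functions are hypothetical, the name origin/main follows Git's convention for a remote-tracking reference, and pull only models the simple case where the local branch can move straight forward.

```python
from typing import Dict

Refs = Dict[str, str]  # ref name -> commit ID (IDs here are made-up labels)


def fetch(local: Refs, remote: Refs) -> None:
    """Update the remote-tracking ref only; the local branch is untouched."""
    local["origin/main"] = remote["main"]


def pull(local: Refs, remote: Refs) -> None:
    """Fetch, then integrate. Only the fast-forward case is modelled here."""
    fetch(local, remote)
    local["main"] = local["origin/main"]


def push(local: Refs, remote: Refs) -> None:
    """Publish the local branch tip and record that the remote now has it."""
    remote["main"] = local["main"]
    local["origin/main"] = local["main"]


remote = {"main": "c3"}                        # a teammate has pushed c3
local = {"main": "c2", "origin/main": "c2"}    # state left by an earlier clone

fetch(local, remote)
after_fetch = dict(local)   # main is still c2; origin/main is now c3
pull(local, remote)
after_pull = dict(local)    # main has caught up to c3
```

After the fetch, only the record of the remote's state has changed; the local branch moves only when the pull integrates it.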
Commit
A snapshot of your project's tracked files at a specific point in time, saved to the repository's history. Each commit has a unique ID (e.g., Git SHA-1 hash) and a message.
Purpose
Commits are the fundamental building blocks of a project's history. They represent saved states that you can reference, revert to, or compare against.
Components
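- Snapshot: Records the state of all tracked files at that moment. Storage stays efficient because unchanged content is not duplicated: Git reuses stored objects for unchanged files and delta-compresses objects in packfiles, while Mercurial stores most revisions as deltas against earlier ones.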
- Metadata: Includes the author, committer, timestamp, and a unique identifier (e.g., a SHA-1 hash in Git).
- Parent(s): Points to the preceding commit(s). A standard commit has one parent; a merge commit has two or more.
- Commit Message: A description written by the author explaining the changes made in the commit. Crucial for understanding history.
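The components above can be sketched as a small content-addressed record. This is a minimal illustration, not Git's actual object format: the Commit class, its field names, and the make_commit helper are hypothetical, and the ID is simply a SHA-1 hash over all fields so that changing any of them (including the parents) yields a different ID.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit: snapshot, metadata, and parent IDs (not Git's real format)."""
    snapshot: Tuple[Tuple[str, str], ...]  # sorted (path, content) pairs
    author: str
    timestamp: int
    message: str
    parents: Tuple[str, ...] = ()

    @property
    def commit_id(self) -> str:
        # Hash every field so that changing any of them changes the ID.
        payload = repr((self.snapshot, self.author, self.timestamp,
                        self.message, self.parents)).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


def make_commit(files: Dict[str, str], author: str, timestamp: int,
                message: str, parents: Tuple[str, ...] = ()) -> Commit:
    """Build a commit from a path -> content mapping."""
    return Commit(tuple(sorted(files.items())), author, timestamp,
                  message, parents)


root = make_commit({"README": "hello\n"}, "alice", 1, "Initial commit")
child = make_commit({"README": "hello\nworld\n"}, "alice", 2,
                    "Extend README", parents=(root.commit_id,))
```

Because the parent ID is part of what gets hashed, a commit's ID indirectly depends on its entire ancestry.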
Workflow (Git Example)
Typically, you modify files, stage the specific changes you want to include (git add), and then commit them with a message (git commit).
Branch
An independent line of development. Branches allow you to work on features or fixes without affecting the main codebase (e.g., main or master).
Concept
Think of a branch as a movable pointer to a specific commit. When you create a branch, you create a new pointer. As you make commits on that branch, the pointer moves forward along with your changes.
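The pointer idea can be made concrete with two dictionaries. This is a sketch under assumptions: the commit labels (c1, c2, c3) and the helpers create_branch and commit_on are hypothetical names for illustration, not commands from any tool.

```python
from typing import Dict, Optional

# commit ID -> parent commit ID (None for the root); IDs are made-up labels.
parents: Dict[str, Optional[str]] = {"c1": None, "c2": "c1"}

# A branch is just a name mapped to a commit ID.
branches: Dict[str, str] = {"main": "c2"}


def create_branch(name: str, start: str) -> None:
    """Create a new pointer at the commit another branch points to."""
    branches[name] = branches[start]


def commit_on(branch: str, new_id: str) -> None:
    """Record a commit whose parent is the branch tip, then advance the pointer."""
    parents[new_id] = branches[branch]
    branches[branch] = new_id


create_branch("feature-x", "main")   # both names now point at c2
commit_on("feature-x", "c3")         # only feature-x moves forward
```

Creating the branch copies no files and no history; it only adds one entry to the table, which is why branching is cheap in this model.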
Why Use Branches?
- Isolation: Develop features, experiment, or fix bugs without destabilizing the main line of code.
- Parallel Development: Multiple team members can work on different features simultaneously on separate branches.
- Organization: Keep related changes grouped together (e.g., a 'feature/user-login' branch).
- Workflow Enablement: Basis for workflows like Gitflow or GitHub Flow, often involving Pull/Merge Requests.
Common Operations (Git)
Creating a branch (git branch feature-x), switching to it (git checkout feature-x or the newer git switch feature-x), making commits, and eventually merging it back.
Merge
The process of combining changes from different branches back into one. Resolves differences between the development histories.
Goal
To integrate work done on a separate branch (e.g., a feature branch) into another branch (e.g., main).
How it Works
VCS tools analyze the histories of the branches being merged.
- Fast-Forward Merge: If the target branch hasn't diverged (no new commits since the feature branch was created), the target branch pointer simply moves forward to the source branch's latest commit. Simple and results in a linear history.
- Three-Way Merge (or Merge Commit): If both branches have diverged, the VCS finds a common ancestor commit and combines the changes from both branches since that ancestor. This typically creates a new merge commit with two parents, explicitly showing where the histories were combined.
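The choice between the two cases comes down to an ancestry check, sketched below on a hypothetical five-commit history. The function names (ancestors, merge_kind, merge_base) and the commit labels are assumptions made for illustration; real tools use more efficient graph walks.

```python
from typing import Dict, List, Set

# commit ID -> list of parent IDs; a hypothetical history for illustration.
#   a - b - c        (main)
#        \
#         d - e      (feature)
history: Dict[str, List[str]] = {
    "a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"],
}


def ancestors(commit: str) -> Set[str]:
    """Return the commit itself plus everything reachable via parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(history[current])
    return seen


def merge_kind(target: str, source: str) -> str:
    """Classify merging `source` into `target`."""
    if source in ancestors(target):
        return "already up to date"
    if target in ancestors(source):
        return "fast-forward"      # target only needs its pointer moved
    return "three-way"             # histories diverged; a merge commit is needed


def merge_base(x: str, y: str) -> List[str]:
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(x) & ancestors(y)
    return sorted(c for c in common
                  if not any(c != o and c in ancestors(o) for o in common))
```

With this history, merging e into c is a three-way merge whose common ancestor is b, while merging e into b would be a fast-forward. merge_base returns a list because unusual histories can have more than one best common ancestor.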
Merge Conflicts
Occur when both branches modified the same part of the same file differently. The VCS cannot automatically decide which change to keep, requiring manual intervention by the developer to resolve the conflict before completing the merge.
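A simplified version of the three-way decision rule, working on whole files rather than on regions within a file, might look like the sketch below. The function three_way_merge and the Snapshot type are hypothetical names; real tools apply the same comparison against the common ancestor at the level of changed line ranges and write conflict markers into the file instead of keeping one side.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]  # path -> file content


def three_way_merge(base: Snapshot, ours: Snapshot,
                    theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Merge whole files; return (merged snapshot, conflicting paths)."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:            # both sides agree (or both deleted the file)
            result = o
        elif o == b:          # only theirs changed it
            result = t
        elif t == b:          # only ours changed it
            result = o
        else:                 # both changed it differently: conflict
            conflicts.append(path)
            result = o        # keep ours as a placeholder until resolved
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "new"}
theirs = {"a.txt": "3", "b.txt": "9"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

Here a.txt is reported as a conflict because both sides changed it differently, b.txt takes the only change made to it, and c.txt is kept because only one side added it.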
Alternatives
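Rebasing (e.g., git rebase) is another way to integrate changes. It replays the commits of one branch on top of another, producing new commits with new IDs and a linear history. Because the original commits are replaced, rebasing a branch that has already been shared can cause problems for anyone who based work on it. Compared with merging, it gives a cleaner history but loses the explicit record of where the histories were combined, and it is less safe for collaboration.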