Foundational Concepts

DVCS Philosophy

Distributed Version Control Systems (DVCS) like Git and Mercurial give every developer a full copy of the repository, enabling offline work, faster operations, and flexible workflows.

Core Idea

Unlike older Centralized VCS (like SVN or CVS) where developers checked out files from a single central server, DVCS provides each user with a complete, independent copy (clone) of the entire repository history.

Key Advantages
  • Offline Work: Commit, branch, view history, and merge without needing a network connection.
  • Performance: Most operations (commit, branch, merge, diff) are local and thus significantly faster.
  • Redundancy: Every clone is effectively a full backup of the repository.
  • Workflow Flexibility: Enables complex branching strategies and collaboration models (e.g., pull requests).
  • Collaboration: Easily share changes between any two repositories, not just client-server.
Shift from Centralized

The move to DVCS addressed key limitations of centralized VCS (CVCS), particularly around branching/merging complexity, performance bottlenecks, and reliance on network connectivity.

Repository (Repo)

A collection of files and the history of their changes. In DVCS, you have a local copy and interact with remote copies (e.g., on GitHub).

What it Contains

A repository stores all the files belonging to a project, plus the complete history of modifications made to those files over time. This history is typically stored as a series of commits.

Local vs. Remote
  • Local Repository: The full copy residing on your own computer. This is where you do your work: edit files, commit changes, create branches.
  • Remote Repository: A copy hosted on a server (e.g., GitHub, GitLab, a company server). Used for collaboration, backup, and sharing code with others. A common conventional name for the primary remote is origin.
Operations

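You typically clone a remote repository to create your local copy, fetch to download new commits from a remote without changing your own branches, pull to fetch and then integrate those commits into your current branch (by merge or rebase), and push to send your local commits to a remote.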
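
These operations can be sketched with a toy model. Everything here is an illustrative assumption rather than Git's implementation: repositories are plain dictionaries, commits are bare IDs, only a single `main` branch is tracked, and `pull` models only the fast-forward case.

```python
import copy


def clone(remote: dict) -> dict:
    """Create a local repository holding the remote's full history."""
    local = copy.deepcopy(remote)
    local["refs"]["origin/main"] = remote["refs"]["main"]
    return local


def fetch(local: dict, remote: dict) -> None:
    """Download missing commits and update the remote-tracking ref; no merge."""
    local["commits"] |= remote["commits"]
    local["refs"]["origin/main"] = remote["refs"]["main"]


def pull(local: dict, remote: dict) -> None:
    """Fetch, then integrate (only a fast-forward of main is modeled here)."""
    fetch(local, remote)
    local["refs"]["main"] = local["refs"]["origin/main"]


def push(local: dict, remote: dict) -> None:
    """Upload local commits and move the remote branch (no rejection modeled)."""
    remote["commits"] |= local["commits"]
    remote["refs"]["main"] = local["refs"]["main"]
    local["refs"]["origin/main"] = local["refs"]["main"]


remote = {"commits": {"c1"}, "refs": {"main": "c1"}}
local = clone(remote)

# A teammate publishes c2 to the remote.
remote["commits"].add("c2")
remote["refs"]["main"] = "c2"

fetch(local, remote)
refs_after_fetch = dict(local["refs"])   # local main has not moved yet
pull(local, remote)
refs_after_pull = dict(local["refs"])    # local main now matches origin/main
```

The point of the sketch is the difference between the two snapshots: after `fetch` only `origin/main` has advanced, while `pull` also moves the local `main`.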

Commit

A snapshot of your project's tracked files at a specific point in time, saved to the repository's history. Each commit has a unique ID (e.g., Git SHA-1 hash) and a message.

Purpose

Commits are the fundamental building blocks of a project's history. They represent saved states that you can reference, revert to, or compare against.

Components
  • Snapshot: Records the state of all tracked files at that moment. Git stores each commit as a full snapshot but keeps this efficient by reusing stored objects for unchanged files and delta-compressing objects in packfiles; Mercurial's storage is delta-based.
  • Metadata: Includes the author, committer, timestamp, and a unique identifier (e.g., a SHA-1 hash in Git).
  • Parent(s): Points to the preceding commit(s). The first commit in a repository has no parent, a standard commit has one, and a merge commit has two or more.
  • Commit Message: A description written by the author explaining the changes made in the commit. Crucial for understanding history.
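
The components above can be sketched as a small data structure. This is a toy model, not Git's actual object format: the `Commit` class, its field names, and the `repr`-based serialization are illustrative assumptions. It only shows the idea that a commit's ID is derived from its snapshot, parents, metadata, and message.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Toy commit: field names and serialization are illustrative, not Git's format."""
    snapshot: tuple   # sorted (path, content) pairs for all tracked files
    parents: tuple    # parent commit IDs: () for the first commit, 2+ for a merge
    author: str
    timestamp: int    # seconds since the epoch
    message: str

    @property
    def commit_id(self) -> str:
        # Hash every component, so changing any one of them changes the ID.
        payload = repr((self.snapshot, self.parents, self.author,
                        self.timestamp, self.message)).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


root = Commit((("README.md", "hello"),), (), "alice", 1700000000, "Initial commit")
child = Commit((("README.md", "hello world"),), (root.commit_id,),
               "alice", 1700000100, "Expand README")
```

Because the parent ID is part of the hashed payload, `child` is tied to `root`: the same change recorded on top of a different parent would get a different ID.
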
Workflow (Git Example)

Typically, you modify files, stage the specific changes you want to include (git add), and then commit them with a message (git commit).

Branch

An independent line of development. Branches allow you to work on features or fixes without affecting the main codebase (e.g., main or master).

Concept

Think of a branch as a movable pointer to a specific commit. When you create a branch, you create a new pointer. As you make commits on that branch, the pointer moves forward along with your changes.
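
A minimal sketch of the pointer idea, assuming a toy `Repo` class with sequential commit IDs (`c1`, `c2`, ...); the class and method names are hypothetical and only loosely mirror the Git commands of the same name.

```python
class Repo:
    """Toy repository: a branch is just a name mapped to a commit ID."""

    def __init__(self):
        self.parents = {}              # commit ID -> parent commit ID (or None)
        self.branches = {"main": None}
        self.current = "main"
        self._counter = 0

    def commit(self) -> str:
        """Create a commit on the current branch and advance its pointer."""
        self._counter += 1
        commit_id = f"c{self._counter}"
        self.parents[commit_id] = self.branches[self.current]
        self.branches[self.current] = commit_id
        return commit_id

    def branch(self, name: str) -> None:
        """Create a new pointer at the current commit; no files are copied."""
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        """Make another branch the one that future commits will advance."""
        self.current = name


repo = Repo()
repo.commit()               # c1 on main
repo.branch("feature-x")    # feature-x points at c1 as well
repo.switch("feature-x")
repo.commit()               # c2: only the feature-x pointer moves
```

After these calls `main` still points at `c1` while `feature-x` points at `c2`, whose parent is `c1`.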

Why Use Branches?
  • Isolation: Develop features, experiment, or fix bugs without destabilizing the main line of code.
  • Parallel Development: Multiple team members can work on different features simultaneously on separate branches.
  • Organization: Keep related changes grouped together (e.g., a 'feature/user-login' branch).
  • Workflow Enablement: Basis for workflows like Gitflow or GitHub Flow, often involving Pull/Merge Requests.
Common Operations (Git)

Creating a branch (git branch feature-x), switching to it (git checkout feature-x or the newer git switch feature-x), making commits, and eventually merging it back.

Merge

The process of combining changes from different branches back into one. Resolves differences between the development histories.

Goal

To integrate work done on a separate branch (e.g., a feature branch) into another branch (e.g., main).

How it Works

VCS tools analyze the histories of the branches being merged.

  • Fast-Forward Merge: If the target branch hasn't diverged (no new commits since the feature branch was created), the target branch pointer simply moves forward to the source branch's latest commit. Simple and results in a linear history.
  • Three-Way Merge (or Merge Commit): If both branches have diverged, the VCS finds a common ancestor commit and combines the changes from both branches since that ancestor. This typically creates a new merge commit with two parents, explicitly showing where the histories were combined.
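
The choice between the two cases can be sketched over a commit graph. This is a simplified illustration, not how any particular VCS implements it: the history is a hypothetical dictionary mapping each commit ID to its parent IDs, and the function names are made up for the example.

```python
def ancestors(parents: dict, commit: str) -> set:
    """Return the commit itself plus every commit reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge_kind(parents: dict, target: str, source: str) -> str:
    """Classify what merging `source` into `target` would require."""
    if source in ancestors(parents, target):
        return "already up to date"   # target already contains source
    if target in ancestors(parents, source):
        return "fast-forward"         # target has no commits of its own
    return "three-way"                # both sides have diverged


def merge_bases(parents: dict, a: str, b: str) -> set:
    """Common ancestors that are not reachable from another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != other and c in ancestors(parents, other)
                       for other in common)}


# A <- B <- C (feature) and A <- B <- D (main after it moved on)
history = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
```

With this history, merging `C` into a target at `B` is a fast-forward, while merging `C` into a target at `D` needs a three-way merge whose common ancestor is `B`.
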
Merge Conflicts

Conflicts occur when both branches modified the same part of the same file differently. The VCS cannot automatically decide which change to keep, so the developer must resolve the conflict manually before completing the merge.
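
The decision rule can be sketched as follows. This is a deliberate simplification: real tools compare regions of lines within a file, whereas this hypothetical `three_way_merge` function treats each file as a single unit and compares whole contents against the common ancestor.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two snapshots (path -> content) against their common ancestor.

    Returns (merged, conflicts). A missing path means the file does not exist.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree, or neither side changed the file
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both sides changed it differently
            conflicts.append(path)
            continue
        if result is not None:   # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "ours"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "theirs"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

Here `a.txt` and `b.txt` merge cleanly because only one side touched each, while `c.txt` is reported as a conflict because both sides changed it in different ways.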

Alternatives

Rebasing (e.g., git rebase) is another way to integrate changes. It rewrites the history of one branch on top of another, creating a linear history but potentially causing issues if the branch has already been shared. It has different tradeoffs compared to merging regarding history clarity and collaboration safety.
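
Why rebasing counts as rewriting history can be illustrated with a toy ID scheme. The assumption here (not Git's real format) is that a commit ID is a hash of its parent ID and a description of the change; `commit_id`, `build`, and the `main@1`/`main@2` labels are hypothetical.

```python
import hashlib


def commit_id(parent: str, change: str) -> str:
    """Toy ID that depends on the parent, so moving a commit changes its ID."""
    return hashlib.sha1(f"{parent}\n{change}".encode("utf-8")).hexdigest()


def build(base: str, changes: list) -> list:
    """Apply changes in order on top of `base` and return the new commit IDs."""
    ids, parent = [], base
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


feature_changes = ["add login form", "validate password"]
original = build("main@1", feature_changes)   # branch started from main@1
rebased = build("main@2", feature_changes)    # same changes replayed on main@2
```

The changes are identical, but none of the rebased IDs match the original ones. Anyone who already pulled the original commits now holds history that the rebased branch no longer contains, which is the source of the collaboration risk described above.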

Distributed VCS

Git (git-scm.com)

The dominant DVCS, known for speed, flexibility, powerful branching/merging, and a vast ecosystem. Features a staging area (index) for crafting commits.

Philosophy & Core

Designed by Linus Torvalds for Linux kernel development. Focuses on performance, data integrity (content-addressable storage using SHA-1 hashes), and robust support for non-linear workflows (branching and merging).
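
As a concrete illustration of content-addressable storage, the sketch below computes the ID Git assigns to a file's contents (a "blob") in a SHA-1 repository: the hash of a short header (`blob`, the byte length, a NUL byte) followed by the content. The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git uses for a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
readme_id = git_blob_id(b"# My project\n")
```

Identical content always yields the same ID, so a file that is unchanged between commits is stored once, and any corruption of stored content shows up as an ID mismatch. The result should agree with `git hash-object` run on the same bytes.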

Key Features
  • Staging Area (Index): An intermediate step between your working directory and the repository history. Allows you to selectively stage parts of files to craft precise, logical commits.
  • Branching Model: Extremely lightweight and fast branching and merging are core strengths.
  • Distributed Nature: Every clone is a full repository, facilitating offline work and peer-to-peer sharing.
  • Performance: Most operations are very fast as they are performed locally.
  • Flexibility: Supports diverse workflows (Gitflow, GitHub Flow, etc.). Commands like rebase allow history manipulation (powerful, but use with care on shared branches).
  • Large File Support: Handles large binary files better with extensions like Git Large File Storage (LFS).
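
The staging area from the first item above can be sketched as a third copy of the files sitting between the working directory and the history. This toy `WorkingCopy` class is an assumption for illustration; it stages whole files only, whereas Git can also stage individual parts of a file.

```python
class WorkingCopy:
    """Toy model of working directory, index, and history (names are illustrative)."""

    def __init__(self):
        self.files = {}      # working directory: path -> content
        self.index = {}      # staged snapshot for the next commit
        self.history = []    # committed snapshots, oldest first

    def add(self, path: str) -> None:
        """Stage the current content of one file (like `git add <path>`)."""
        self.index[path] = self.files[path]

    def commit(self) -> dict:
        """Record the index, not the working directory (like `git commit`)."""
        snapshot = dict(self.index)
        self.history.append(snapshot)
        return snapshot


wc = WorkingCopy()
wc.files = {"app.py": "print('v2')", "notes.txt": "scratch"}
wc.add("app.py")            # notes.txt stays unstaged
snapshot = wc.commit()
```

The commit contains only `app.py`; `notes.txt` remains in the working directory but is not part of the recorded snapshot.
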
Common Commands

git clone, git add, git commit, git status, git log, git branch, git checkout / git switch, git merge, git rebase, git pull, git push, git fetch.

Strengths
  • Industry standard; huge community and ecosystem.
  • Extensive tooling and platform support (GitHub, GitLab, etc.).
  • Excellent performance for most common operations.
  • Powerful and flexible branching/merging.
  • Staging area allows granular commit control.
Tradeoffs
  • Steeper initial learning curve for some concepts (staging area, reset, rebase).
  • Command-line interface can feel complex due to numerous options.
  • Default handling of very large binaries isn't ideal without LFS.
Use Cases

Software development of all sizes, open-source projects, infrastructure as code, documentation, collaborative writing; essentially anywhere changes to text-based files need to be tracked.

Mercurial (Hg) (mercurial-scm.org)

A DVCS focused on simplicity, ease of use, and performance. Often considered more intuitive than Git initially, but less widely used today.

Philosophy & Core

Designed with user-friendliness and interface consistency as high priorities. Aims to provide powerful DVCS features with a simpler command set and conceptual model compared to Git's defaults.

Key Features
  • Simpler Interface: Commands are often perceived as more consistent and intuitive (e.g., no staging area by default; hg commit commits all tracked changes).
  • Branching Concepts: Supports multiple ways to handle parallel lines of development (named branches, bookmarks, anonymous heads). Named branches are permanent parts of history, unlike Git branches which are just pointers. This can offer power but also cause confusion.
  • Extensibility: Core functionality can be extended via built-in extensions (e.g., `rebase`, `histedit` for history modification, `largefiles` for binary file support).
  • Performance: Generally excellent performance, comparable to Git.
  • Windows Support: Historically known for strong native Windows support.
Common Commands

hg clone, hg add, hg commit (or hg ci), hg status, hg log, hg branch, hg update (switches branches/commits), hg merge, hg pull, hg push, hg heads.

Strengths
  • Potentially easier initial learning curve for basic workflows.
  • More consistent command-line interface structure.
  • Good performance.
  • Built-in web interface (hg serve).
Tradeoffs
  • Significantly smaller community and ecosystem compared to Git.
  • Little support on major hosting platforms (Bitbucket Cloud removed Mercurial support in 2020; GitHub has never hosted Mercurial repositories; GitLab does not support it natively, though the Heptapod fork does).
  • Multiple branching models (named branches vs bookmarks) can be confusing.
  • Some advanced operations require enabling non-default extensions.
Use Cases

Legacy projects, specific large organizations (e.g., Meta has long relied on a customized Mercurial-based system internally), and teams that strongly prefer its model over Git and manage their own hosting.

Hosting Platforms

GitHub (github.com)

The most popular platform for hosting Git repositories. Strong focus on collaboration, open source, and developer community features.

Core Offerings

Git repository hosting, Pull Requests (code review/merge workflow), issue tracking, project management (Projects v2 - boards/tables), CI/CD (GitHub Actions), package hosting (Packages), wikis, security scanning (Dependabot, Code Scanning), large community.

Key Features
  • Pull Requests: Central mechanism for collaborative code review and merging.
  • GitHub Actions: Powerful, integrated CI/CD and workflow automation platform defined in YAML files within the repo.
  • Community Focus: The de facto hub for open-source projects; social coding features (following, starring, forks).
  • Marketplace: Integrations with numerous third-party developer tools.
  • Codespaces: Cloud-based development environments accessible via browser or VS Code.
  • Copilot: AI pair programmer deeply integrated into the developer workflow.
Strengths
  • Massive user base and network effect.
  • Generally excellent UI/UX, considered intuitive by many.
  • Powerful and flexible GitHub Actions CI/CD.
  • Generous free tier for public and private repositories; GitHub Actions is free for public repositories on standard runners, and private repositories get a monthly allowance of minutes.
  • Strong focus on security features (Dependabot, code scanning).
Tradeoffs
  • Owned by Microsoft (a potential concern for some).
  • CI/CD minutes/storage limits on free/lower tiers can be restrictive for large private projects.
  • Integrated project management features may be less comprehensive than specialized tools like Jira or Azure Boards for complex needs.
GitLab (gitlab.com)

An integrated DevOps platform built around Git hosting. Offers a wide toolchain from planning to monitoring. Self-hosting option available.

Core Offerings

Git repository hosting, Merge Requests (their term for PRs), issue tracking with boards, powerful integrated CI/CD (GitLab CI/CD), container registry, extensive security scanning (SAST, DAST, dependency, secrets), monitoring features, planning tools (Epics, Roadmaps), wiki.

Key Features
  • Single Application: Aims to provide the entire DevOps lifecycle within one platform, reducing toolchain complexity.
  • GitLab CI/CD: Highly integrated, powerful, and configurable CI/CD using a .gitlab-ci.yml file. Runners can be shared or self-hosted.
  • Self-Managed Option: Can be installed on your own infrastructure (Community Edition is open source, Enterprise Edition offers more features).
  • Security Focus (DevSecOps): Comprehensive suite of security scanning tools integrated into the development workflow and MRs.
  • Auto DevOps: Opinionated pipeline for automating build, test, deploy, and monitoring for simple projects.
Strengths
  • All-in-one DevOps platform can simplify tooling.
  • Mature and powerful built-in CI/CD is a core strength.
  • Strong security features integrated throughout the lifecycle.
  • Viable and popular self-hosting option with the open-source Community Edition.
  • Transparent development and release process.
Tradeoffs
  • The sheer number of features can make the UI feel complex or overwhelming initially.
  • UI/UX perceived by some users as less polished than GitHub's.
  • Free tier CI/CD limits may be reached quickly by active teams.
  • Self-hosting requires operational overhead (maintenance, scaling, security).
Bitbucket Cloud (bitbucket.org)

Atlassian's Git hosting platform. Known for excellent integration with other Atlassian tools like Jira and Trello. (Mercurial support was removed in 2020.)

Core Offerings

Git repository hosting (Mercurial support ended in 2020: creation of new Hg repositories was disabled early that year, and existing Hg repositories were removed later the same year), Pull Requests, integrated CI/CD (Bitbucket Pipelines), code search, deep issue tracking integration (Jira), project tracking integration (Trello).

Key Features
  • Atlassian Ecosystem Integration: Seamless workflow between Bitbucket commits/PRs and Jira issues is its main draw. Also integrates with Trello, Confluence, Opsgenie etc.
  • Bitbucket Pipelines: Integrated CI/CD configured via a bitbucket-pipelines.yml file within the repository.
  • Code Insights: Allows surfacing information from scanning/testing tools directly within Pull Requests.
  • Free Tier: Offers free private repositories for small teams (up to 5 users), including CI/CD minutes.
  • Deployment tracking: Integrates deployment information back into Jira issues.
Strengths
  • Unbeatable integration if your team is heavily invested in the Atlassian suite (Jira especially).
  • Competitive pricing, potentially cost-effective for small teams needing private repos and basic CI/CD.
  • Simple and effective built-in CI/CD with Pipelines for many common use cases.
  • Clean and straightforward user interface.
Tradeoffs
  • Smaller developer community compared to GitHub or GitLab.
  • CI/CD (Pipelines) may be less flexible or powerful than GitHub Actions or GitLab CI for very complex scenarios.
  • Fewer built-in features compared to GitLab's all-in-one approach (relies more on integrating other Atlassian tools).
  • Less focus on open-source community features than GitHub.
  • Mercurial is no longer supported.
Use Cases

Teams already using Jira extensively for project management, organizations standardized on Atlassian tools, small teams looking for cost-effective private repositories with integrated CI/CD.

Azure DevOps (azure.microsoft.com)

Microsoft's suite of DevOps services, including Azure Repos for Git hosting. Strong integration with Azure cloud and other DevOps services.

Core Offerings (Suite)

Azure DevOps Services is a suite including: Azure Repos (Git hosting, TFVC also supported), Azure Pipelines (CI/CD), Azure Boards (Agile planning, work item tracking), Azure Test Plans, and Azure Artifacts (package management).

Key Features (Focus on Repos & Pipelines)
  • Integrated Suite: Provides tools across the development lifecycle, similar in scope to GitLab but within the Microsoft Azure ecosystem. Services can often be used independently.
  • Azure Repos: Unlimited private Git repositories. Also supports Microsoft's legacy centralized VCS (Team Foundation Version Control - TFVC).
  • Azure Pipelines: Very powerful and flexible CI/CD system. Supports building/deploying almost anywhere (Azure, AWS, GCP, on-prem). Can be defined with YAML or a classic visual editor.
  • Azure Boards: Mature and highly customizable Agile planning and work tracking tools (comparable to Jira).
  • Enterprise Focus: Strong support for granular permissions, branch policies, auditing, and integration with Azure Active Directory (now Microsoft Entra ID).
Strengths
  • Excellent integration with Azure cloud services for build and deployment.
  • Mature, powerful, and flexible CI/CD (Azure Pipelines).
  • Comprehensive and customizable work tracking features (Azure Boards).
  • Strong enterprise governance, security policies, and access control features.
  • Generous free tier for small teams (up to 5 users) and open source projects (more users, parallel jobs).
  • Continued support for TFVC for teams migrating from older Microsoft VCS.
Tradeoffs
  • UI can feel complex or slightly dated compared to GitHub/GitLab, especially navigating between the different services (Repos, Pipelines, Boards).
  • Less prominent in the general open-source community compared to GitHub.
  • Can feel heavily tied to the Microsoft ecosystem, although Pipelines are platform-agnostic.
  • The split between different services (Repos, Pipelines, Boards) might feel less unified than GitLab's single-application approach for some users.
Use Cases

Organizations heavily invested in Microsoft Azure, enterprise teams needing strong governance/planning tools, teams migrating from TFVC, projects requiring complex CI/CD pipelines regardless of deployment target.

Key Workflows

Branching Strategies

Defined patterns for using branches to manage development, releases, and fixes. Common examples include Gitflow, GitHub Flow, and Trunk-Based Development.

Common Strategies
  • Gitflow: A robust model with dedicated branches: `main` (production releases), `develop` (integration), `feature/*` (new work), `release/*` (release prep), `hotfix/*` (urgent production fixes). Good for projects with distinct, scheduled releases. Can be complex.
  • GitHub Flow: A simpler model optimized for continuous deployment. `main` is always deployable. New work happens in feature branches created from `main`. Changes are merged back into `main` via Pull Request and typically deployed immediately.
  • GitLab Flow: Similar to GitHub Flow but adds more flexibility, often including environment branches (e.g., `production`, `staging`) or release branches, bridging the gap between continuous deployment and versioned releases.
  • Trunk-Based Development (TBD): Developers work in very short-lived feature branches or commit directly to the main line (`trunk`/`main`). Relies heavily on feature flags, comprehensive automated testing, and robust CI/CD pipelines. Aims for continuous integration and fast feedback.

The best strategy depends on team size, release cadence, deployment practices, required stability guarantees, and team discipline.
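
As one concrete illustration, the Gitflow conventions can be written down as a lookup table: where each kind of work branch starts and where it is merged when finished. The `GITFLOW_RULES` table and `gitflow_plan` helper are hypothetical names for this sketch; the branch prefixes follow the `feature/*`, `release/*`, and `hotfix/*` pattern listed above.

```python
# Gitflow conventions: where each branch type starts and where it is merged.
GITFLOW_RULES = {
    "feature": {"base": "develop", "merge_into": ["develop"]},
    "release": {"base": "develop", "merge_into": ["main", "develop"]},
    "hotfix": {"base": "main", "merge_into": ["main", "develop"]},
}


def gitflow_plan(branch: str) -> dict:
    """Look up the Gitflow rule for a branch named `<type>/<description>`."""
    branch_type, separator, description = branch.partition("/")
    if not separator or not description or branch_type not in GITFLOW_RULES:
        raise ValueError(f"not a Gitflow work branch: {branch!r}")
    return GITFLOW_RULES[branch_type]


feature_plan = gitflow_plan("feature/user-login")
hotfix_plan = gitflow_plan("hotfix/payment-timeout")
```

The table makes the extra bookkeeping of Gitflow visible: release and hotfix branches must be merged into two places, which is part of why the simpler flows only keep `main` plus short-lived branches.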

Pull/Merge Requests (PR/MR)

A core collaboration feature on hosting platforms to propose changes from one branch to another, facilitating code review, discussion, automated checks, and controlled merging.

Process Overview
  1. A developer creates a feature branch, makes commits locally, and pushes the branch to the remote platform.
  2. The developer opens a Pull Request (GitHub, Bitbucket, Azure Repos) or Merge Request (GitLab) targeting the desired integration branch (e.g., `main` or `develop`).
  3. Team members (reviewers) examine the code changes ('diff'), leave comments, ask questions, and request modifications.
  4. Automated checks configured via CI/CD (builds, tests, linters, security scans) run against the proposed changes, reporting status back to the PR/MR.
  5. Discussion and code updates occur until the changes are approved and all checks pass.
  6. Finally, an authorized user merges the changes into the target branch, often directly via the platform's UI, which usually closes the PR/MR.
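
The gatekeeping logic in steps 3 to 6 can be sketched as a small check. The `PullRequest` class, its fields, and the status strings are hypothetical and do not correspond to any platform's API; real platforms expose similar rules as branch protection or merge policies.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical PR state; field names are illustrative, not a platform API."""
    source: str
    target: str
    approvals: int = 0
    checks: dict = field(default_factory=dict)   # check name -> status string


def can_merge(pr: PullRequest, required_approvals: int = 1) -> bool:
    """Allow merging only when reviews and all automated checks have passed."""
    reviewed = pr.approvals >= required_approvals
    # Note: with no checks configured, all() is True and only reviews gate the merge.
    checks_passed = all(status == "success" for status in pr.checks.values())
    return reviewed and checks_passed


pr = PullRequest("feature/user-login", "main",
                 checks={"build": "success", "tests": "pending"})
blocked_without_review = can_merge(pr)    # no approvals yet
pr.approvals = 1
blocked_by_pending_check = can_merge(pr)  # tests have not finished
pr.checks["tests"] = "success"
ready = can_merge(pr)
```
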
Benefits
  • Code Quality: Facilitates peer review to catch bugs, improve design, ensure coding standards, and enforce consistency.
  • Knowledge Sharing: Team members learn from each other's code and gain context on changes across the project.
  • Gatekeeping: Acts as a checkpoint, ensuring changes meet quality standards and pass automated checks before entering critical branches.
  • Discussion Record: Provides a valuable history of decisions, discussions, and context related to specific changes.
CI/CD Integration

VCS events (like pushes or PRs) trigger Continuous Integration (automated build/test) and Continuous Delivery/Deployment pipelines, automating the software delivery process.

Continuous Integration (CI)

The practice of frequently merging all developers' working copies to a shared mainline (e.g., `main` or `develop`), several times a day. Each merge triggers an automated build and execution of automated tests.

  • Trigger: Typically runs on every push to any branch, or specifically on pushes/updates to PRs/MRs targeting key branches.
  • Goal: Detect integration errors, build failures, and test regressions as early as possible in the development cycle.
  • Benefit: Provides a fast feedback loop to developers, reduces integration problems ("merge hell"), and improves overall code quality and stability.
Continuous Delivery / Continuous Deployment (CD)

Extends CI by automating the release process beyond just building and testing.

  • Continuous Delivery: Automatically builds, tests, and prepares code changes for release to production. The pipeline ensures the software *can* be released at any time, but the final deployment to production usually requires a manual approval step.
  • Continuous Deployment: Automatically deploys every validated change that passes the full CI/CD pipeline directly to production without human intervention. Requires high confidence in automated testing and infrastructure.
  • Trigger: Typically runs after CI succeeds on specific branches (e.g., merge to `main`) or upon creation of tags.
  • Benefit: Enables faster, more frequent, and more reliable releases; reduces manual deployment effort and associated errors.
Role of VCS & Platforms

The VCS (Git) and the Hosting Platform (GitHub, GitLab, etc.) are central. VCS events (push, merge, tag creation) act as the primary triggers for CI/CD pipelines. Platforms provide integrated CI/CD tools (Actions, GitLab CI, Pipelines) or integrate tightly with external tools (like Jenkins, CircleCI, etc.). The pipeline definitions are often stored as code (e.g., YAML files) within the repository itself.
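
The trigger relationship can be sketched as a function from VCS events to pipeline stages. The event shape, stage names, and rules below are assumptions chosen for illustration; on real platforms the equivalent rules live in the pipeline definition files stored in the repository.

```python
def pipelines_for(event: dict) -> list:
    """Map a VCS event to pipeline stages (hypothetical event shape and rules)."""
    kind = event["type"]
    if kind == "pull_request":
        return ["build", "test"]                         # CI only
    if kind == "push" and event.get("branch") == "main":
        return ["build", "test", "deploy-staging"]       # CI, then delivery
    if kind == "push":
        return ["build", "test"]                         # CI on other branches
    if kind == "tag":
        return ["build", "test", "deploy-production"]    # release on tag creation
    return []                                            # unknown events run nothing


feature_push = pipelines_for({"type": "push", "branch": "feature/user-login"})
main_push = pipelines_for({"type": "push", "branch": "main"})
release_tag = pipelines_for({"type": "tag", "name": "v1.2.0"})
```

In this sketch every event runs the CI stages, and only merges to `main` or new tags go on to a deployment stage.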

Choosing Tools

System & Platform Selection Factors

Generally, choose Git due to its ubiquity. Select a Hosting Platform based on ecosystem integration (Atlassian, Microsoft, Open Source), desired feature depth (CI/CD, security, project management), community vs. enterprise focus, self-hosting needs, and pricing.

Choosing the VCS System (Git vs. Mercurial)
  • Default Choice: Git. It's the overwhelming industry standard. Choosing Git ensures maximum compatibility, community support, available talent, tooling options, and platform choices.
  • Consider Mercurial Only If: You have a strong, specific reason, such as maintaining large legacy Hg projects, being part of an organization deeply committed to it (like Meta), or having a team that vastly prefers its specific model and is aware of the ecosystem limitations. Be prepared for fewer hosting options and potentially less community support.
Choosing the Hosting Platform (GitHub vs. GitLab vs. Bitbucket vs. Azure DevOps)
  • Ecosystem Integration: How well does it fit with your other tools?
    • Bitbucket: Best for deep Atlassian (Jira, Confluence) integration.
    • Azure DevOps: Best for deep Microsoft Azure and Microsoft Entra ID integration.
    • GitHub: Strong integrations via Marketplace, excellent for open source workflows.
    • GitLab: Aims to *be* the ecosystem (all-in-one).
  • Feature Set & Depth: What capabilities are crucial?
    • GitLab: Most comprehensive built-in feature set across the DevOps lifecycle (especially strong CI/CD and integrated security scanning).
    • Azure DevOps: Very mature and powerful Pipelines (CI/CD) and Boards (planning).
    • GitHub: Excellent Actions (CI/CD), leading community features, strong security addons (Dependabot, Code Scanning), integrated Packages and Codespaces.
    • Bitbucket: Core Git hosting, solid Pipelines (CI/CD), primary strength is Jira integration.
  • Hosting Model: Cloud SaaS vs. Self-Managed?
    • GitLab: Strong offering for both SaaS and self-managed (Community/Enterprise editions).
    • GitHub: Primarily SaaS, but offers GitHub Enterprise Server for self-hosting.
    • Azure DevOps: Primarily SaaS (Services), but Azure DevOps Server exists for on-premises use (less common now).
    • Bitbucket: Cloud (SaaS) and Data Center (self-managed).
  • Community vs. Enterprise Focus:
    • GitHub: Leader in open source and general developer community. Strong enterprise offering too.
    • GitLab: Strong in both, appeals to companies wanting an integrated platform.
    • Azure DevOps: Historically strong enterprise focus, good free tiers attract smaller teams/OSS too.
    • Bitbucket: Traditionally strong with SMBs and enterprises using Atlassian products.
  • Pricing & Tiers: Carefully compare free tier limitations (users, private repos, CI/CD minutes, storage) and the costs of paid plans based on your team size and expected usage.
  • User Interface & Experience (UI/UX): This is subjective. Explore the interfaces or use free tiers to see which platform your team finds most intuitive and pleasant to work with.
Standardization vs. Polyglot

While it is technically possible to use different platforms for different needs (e.g., GitHub for open source, Azure DevOps for internal enterprise apps), most organizations benefit from standardizing on one primary platform for consistency, simpler billing and user management, and a unified workflow, integrating specific best-of-breed tools where necessary.