git fetch explained

The key to understanding git is that it is a content-addressable store: it keeps a collection of files in the .git/objects directory, each named with a hash of its contents, so identical content is only ever stored once. (These files are what git gc cleans up; they take up most of the space in the .git directory, and in the hsglobalscript3 git mirror they take up most of the space in the checkout.)
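To make "named with a hash of its contents" concrete, here is a minimal sketch of how the name of a file object can be computed. It assumes the default SHA-1 object format, where git hashes a short header followed by the raw bytes. The function names git_blob_hash and object_path are made up for this illustration.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the object name git gives a file with these contents."""
    # git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def object_path(object_name: str) -> str:
    """Where a loose (not yet packed) object with this name lives."""
    # The first two hex digits become a directory, the rest the file name.
    return ".git/objects/%s/%s" % (object_name[:2], object_name[2:])

name = git_blob_hash(b"hello\n")
print(name)
print(object_path(name))
```

The printed name should match what `git hash-object` reports for a file containing the same bytes. Two files with the same contents always get the same name, which is where the deduplication comes from.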

The most basic objects git stores are copies of the files in the tree. git keeps a copy of every version you’ve ever checked in of every file in the repo. Identical versions are deduplicated by the content-addressable store, and when git packs objects together (as git gc does) it delta-compresses them, so versions with common portions can share storage as well.

(This is different from how rcs/cvs/svn used to do things; on those systems, a file was a sequence of versions, with the latest version stored in full and older versions stored as a sequence of diffs from the next newer version. That minimized storage, usually, but git’s approach makes branching much easier since each version of the file, across all branches, is, in principle, available completely independently.)

git also stores commits. Each commit is a snapshot of the content of the tree at the time it was made: in addition to the commit metadata and the names of its parent commits, it stores the name of a tree object, which in turn lists the name of every file along with the name of the object holding that file’s contents.
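The commit, tree, and file relationship can be sketched as a toy in-memory store. This is only a model: the JSON encoding, the flat tree, and the names store, put, and commit_tree are assumptions made for this example, and real git uses its own binary formats plus author and date metadata.

```python
import hashlib
import json

store = {}  # object name -> serialized object (stands in for .git/objects)

def put(kind: str, payload) -> str:
    """Store an object under the hash of its contents and return that name."""
    data = json.dumps([kind, payload], sort_keys=True).encode()
    name = hashlib.sha1(data).hexdigest()
    store[name] = data  # storing identical content twice is a no-op
    return name

def commit_tree(files: dict, message: str, parents: list) -> str:
    """Snapshot a tree: one blob per file, one tree listing them, one commit."""
    tree = put("tree", {path: put("blob", text) for path, text in files.items()})
    return put("commit", {"tree": tree, "parents": parents, "message": message})

first = commit_tree({"README": "hello\n", "main.py": "print(1)\n"}, "initial", [])
second = commit_tree({"README": "hello\n", "main.py": "print(2)\n"}, "bump", [first])

# Two blobs, a tree, and a commit for the first snapshot; the second adds only
# one new blob, a tree, and a commit, because README is unchanged.
print(len(store))
```

Each commit still describes the complete tree, yet the unchanged README is stored only once.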

So what git essentially does is store a backup of every version of your tree for you, in a way that minimizes how much duplicate data it stores, along with some very simple history; basically just commit authors, messages, and a ‘previous version’ pointer.

(This is why backing up your git repo is essential, but keeping backups within your git repo is not – git already is a backup system.)

Commits, trees, and files are the fundamental objects in git; branches are implemented as a thin layer over commits. A branch, in git, is simply a file in .git/refs/heads (or an entry in .git/packed-refs, once refs have been packed) that contains the hash of the latest commit on that branch. Commands like git checkout, git branch, git commit, git pull, etc. read and update that file as needed.
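Since a branch is just a small file holding a hash, reading one takes only a few lines. The sketch below assumes the usual layout (loose refs under refs/heads, with a fallback to the packed-refs file). The function name read_branch and the hash in the demo are made up, and in a real script `git rev-parse` is the safer way to ask.

```python
import tempfile
from pathlib import Path
from typing import Optional

def read_branch(git_dir: str, branch: str) -> Optional[str]:
    """Return the commit hash a local branch points at, or None if not found."""
    ref_name = "refs/heads/" + branch
    loose = Path(git_dir) / ref_name
    if loose.is_file():
        return loose.read_text().strip()
    # Packed refs live in one file, one "<hash> <refname>" pair per line.
    packed = Path(git_dir) / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comments and peeled-tag lines
            parts = line.split(" ", 1)
            if len(parts) == 2 and parts[1] == ref_name:
                return parts[0]
    return None

# Demo against a throwaway directory that mimics .git (the hash is invented).
fake_hash = "0123456789abcdef0123456789abcdef01234567"
with tempfile.TemporaryDirectory() as demo_git_dir:
    heads = Path(demo_git_dir) / "refs" / "heads"
    heads.mkdir(parents=True)
    (heads / "devel").write_text(fake_hash + "\n")
    demo_hash = read_branch(demo_git_dir, "devel")
    demo_missing = read_branch(demo_git_dir, "no-such-branch")
print(demo_hash, demo_missing)
```

Against a real checkout, something like read_branch(".git", "devel") should return the same hash that `git rev-parse devel` prints.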

git stores all of this structure (at least) twice: once in your repo, and once in the remote repo. While git will synchronize commits between the two repos whenever you fetch or push, it’s up to you to make sure branches get synchronized. As far as git is concerned, the ‘devel’ branch in the remote repo and the ‘devel’ branch in your local repo are just two independent files that happen to store commit hashes; if you want a closer relationship between them, you have to set it up yourself.
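Here is a toy model of that independence, with each repo reduced to a couple of dictionaries. It is a sketch under loud assumptions: the dictionary layout and the fetch function are invented for illustration, and a real fetch transfers only the objects it needs. The one detail borrowed from real git is that fetch records the remote's branch positions under a separate name (origin/devel here) and leaves your own branch alone.

```python
def fetch(local: dict, remote: dict, remote_name: str = "origin") -> None:
    """Copy missing objects and note where the remote's branches point."""
    for name, data in remote["objects"].items():
        local["objects"].setdefault(name, data)
    for branch, commit in remote["heads"].items():
        # Recorded under a separate name; local["heads"] is never touched.
        local["remotes"][remote_name + "/" + branch] = commit

# Hypothetical repos: the remote's devel is one commit ahead of the local one.
remote = {"objects": {"c1": "first", "c2": "second"}, "heads": {"devel": "c2"}}
local = {"objects": {"c1": "first"}, "heads": {"devel": "c1"}, "remotes": {}}

fetch(local, remote)
print(local["heads"]["devel"], local["remotes"]["origin/devel"])
```

After the fetch, the local repo has the new commit, but its own devel still points at the old one. Moving it forward is a separate step (a merge, rebase, or reset in real git).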

