GIT is based on a simple model, with a lot of shorthands for common use cases. This model is sometimes hard to guess just from the everyday commands. To illustrate how GIT works, we'll implement a stripped-down clone of GIT in a few lines of JavaScript.
We will simulate the operating system's filesystem with a very simple key-value store. In this filesystem, directories are entries mapped to null and files are entries mapped to strings.
The filesystem exposes functions to read an entire file, create or replace an entire file, and create a directory.
It will be handy for some operations to list the contents of a directory.
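The filesystem described above can be sketched roughly as follows. This is only a minimal illustration: the names createFilesystem, mkdir, write, read and readdir, the file main.js and the file contents are all assumptions made for this example.

```javascript
// Hypothetical in-memory filesystem: a Map from path to contents.
// A directory is an entry mapped to null, a file is an entry mapped to a string.
function createFilesystem() {
  const entries = new Map();
  return {
    mkdir(path) {
      entries.set(path, null);
    },
    write(path, contents) {
      // Creates the file, or replaces it entirely if it already exists.
      entries.set(path, String(contents));
    },
    read(path) {
      const value = entries.get(path);
      if (typeof value !== "string") throw new Error("not a file: " + path);
      return value;
    },
    readdir(path) {
      if (entries.get(path) !== null) throw new Error("not a directory: " + path);
      const prefix = path + "/";
      const names = [];
      for (const key of entries.keys()) {
        const rest = key.slice(prefix.length);
        // Keep direct children only: same prefix, no further "/" in the name.
        if (key.startsWith(prefix) && !rest.includes("/")) names.push(rest);
      }
      return names.sort();
    },
  };
}

// The imaginary user creates a proj directory and adds two files.
const filesystem = createFilesystem();
filesystem.mkdir("proj");
filesystem.write("proj/README", "This is my project.\n");
filesystem.write("proj/main.js", "console.log('hello');\n");
console.log(filesystem.readdir("proj"));
```

Storing full paths as keys keeps the store flat, at the cost of scanning every key to list a directory, which is acceptable for a toy filesystem.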
Our imaginary user will create a proj directory and start filling in some files.
git init (creating .git)
The first thing to do is to initialize the GIT directory.
For now, only the .git folder is needed. The rest of the function implementing git init will be written later.
git hash-object (storing a copy of a file in .git)
The most basic element of a GIT repository is an object. It is a copy of a file that is stored in GIT's database. That copy is stored under a unique name. The unique name is obtained by hashing the contents of the file.
So far, our GIT database does not know about any of the user's files. In order to add the contents of the README file to the database, we use git hash-object -w -t blob README, where -w tells GIT to write the object to its database, and -t blob indicates that we want to create a blob object, i.e. the contents of a file.
The objects stored in the GIT database are compressed with zlib (using the "deflate" compression method). The filesystem view shows the marker deflated: followed by the uncompressed data. Click on the file contents to toggle between this pretty-printed view and the raw compressed data.
You will notice that the database does not contain the name of the file, only its contents, stored under a unique identifier which is derived by hashing its contents. Let's add the second user file to the database.
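To make the naming scheme concrete, here is a minimal sketch of git hash-object using Node's built-in crypto and zlib modules. Real GIT prefixes the contents with a header of the form "type length" followed by a NUL byte, and hashes the result with SHA-1. The function name hashObject and the objects map standing in for the filesystem are assumptions of this sketch.

```javascript
const crypto = require("crypto");
const zlib = require("zlib");

// Hypothetical stand-in for the filesystem: path -> compressed object.
const objects = new Map();

function hashObject(type, contents, write) {
  const body = Buffer.from(contents);
  // The header is "<type> <length in bytes>" followed by a NUL byte.
  const header = Buffer.from(`${type} ${body.length}\0`);
  const object = Buffer.concat([header, body]);
  const hash = crypto.createHash("sha1").update(object).digest("hex");
  if (write) {
    // The first two hex digits name a directory, the remaining 38 the file.
    const path = `.git/objects/${hash.slice(0, 2)}/${hash.slice(2)}`;
    objects.set(path, zlib.deflateSync(object));
  }
  return hash;
}

// Equivalent in spirit to: git hash-object -w -t blob some-file
const hash = hashObject("blob", "hello world\n", true);
console.log(hash); // 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The filename never appears in the hashed data, which is why two files with identical contents end up as a single object.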
zlib compression
The real implementation of GIT compresses objects with zlib. To view a zlib-compressed object in your terminal, simply write this declaration in your shell, and then call e.g. unzlib .git/objects/95/d318ae78cee607a77c453ead4db344fc1221b7
# Print the decompressed contents of a zlib-compressed file (needs Python 3).
unzlib() {
  python3 -c \
    "import sys, zlib; \
    sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1], 'rb').read()))" \
    "$1"
}
Now GIT knows about the contents of both of the user's files, but it would be nice to also store the filenames. This is done by creating a tree object.
A tree object can contain files (by associating the file's blob to its name), or directories (by associating the hash of other subtrees to their name).
The mode (100644 for the file and 40000 for the directory) indicates the permissions, and is given in octal using the values used by *nix.
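As an illustration of how a tree object is laid out, here is a minimal sketch. In real GIT each entry is the mode, a space, the name, a NUL byte, and then the 20 raw bytes of the hash, and entries are sorted by name, with directories compared as if their name ended in "/". The function name makeTree and the entry named src are assumptions of this sketch.

```javascript
const crypto = require("crypto");

// entries: [{ mode: "100644" or "40000", name, hash (40 hex digits) }]
function makeTree(entries) {
  // Directories sort as if their name had a trailing "/".
  const sortKey = (e) => Buffer.from(e.mode === "40000" ? e.name + "/" : e.name);
  const sorted = [...entries].sort((a, b) => Buffer.compare(sortKey(a), sortKey(b)));
  const body = Buffer.concat(
    sorted.map((e) =>
      Buffer.concat([
        Buffer.from(`${e.mode} ${e.name}\0`),
        Buffer.from(e.hash, "hex"), // the hash is stored as 20 raw bytes
      ])
    )
  );
  const object = Buffer.concat([Buffer.from(`tree ${body.length}\0`), body]);
  const hash = crypto.createHash("sha1").update(object).digest("hex");
  return { hash, object };
}

const tree = makeTree([
  // A subdirectory (hypothetical name) pointing at the empty tree.
  { mode: "40000", name: "src", hash: "4b825dc642cb6eb9a060e54bf8d69288fbee4904" },
  // A file pointing at the blob whose contents are "hello world\n".
  { mode: "100644", name: "README", hash: "3b18e512dba79e4c8300dd08aeb37f8e728b8dad" },
]);
console.log(tree.hash);
```

Because a subtree is referenced by its hash, changing any file deep in the hierarchy changes the hash of every tree above it.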
Now that the GIT database contains the entire tree for the current version, a commit can be created. A commit contains the hash of its tree, the hashes of its parent commits, the author and committer, and a message.
It is now possible to store a commit in the database. This saves a copy of the tree along with some metadata about this version. The first commit has no parent, which is represented by passing the empty list.
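The layout of a commit object can be sketched as follows. Real GIT stores a commit as a few text lines (tree, one parent line per parent, author, committer), a blank line, and the message. The function name makeCommit, the user identity and the timestamps are assumptions of this sketch.

```javascript
const crypto = require("crypto");

function makeCommit({ tree, parents, author, committer, message }) {
  const lines = [`tree ${tree}`];
  // The first commit has no parent, so an empty list adds no line here.
  for (const parent of parents) lines.push(`parent ${parent}`);
  lines.push(`author ${author}`, `committer ${committer}`, "", message);
  const body = Buffer.from(lines.join("\n"));
  const object = Buffer.concat([Buffer.from(`commit ${body.length}\0`), body]);
  const hash = crypto.createHash("sha1").update(object).digest("hex");
  return { hash, text: body.toString() };
}

// Hypothetical identity: name, e-mail, Unix timestamp and timezone.
const who = "A. User <user@example.com> 1700000000 +0000";
const emptyTree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904";

const first = makeCommit({
  tree: emptyTree,
  parents: [],
  author: who,
  committer: who,
  message: "Initial commit\n",
});
const second = makeCommit({
  tree: emptyTree,
  parents: [first.hash],
  author: who,
  committer: who,
  message: "Second commit\n",
});
console.log(second.text);
```

Since the parent hash is part of the hashed text, a commit's identifier depends on its entire history.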
A branch is a pointer to a commit, stored in a file in .git/refs/heads/name_of_the_branch.
The branch can be overwritten with git branch -f. Also, as will be explained later, git commit can update the pointer of a branch.
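Since a branch is just a small file, creating one can be sketched in a few lines. The function name gitBranch, the files map standing in for the filesystem, and the placeholder hashes are assumptions of this sketch. Like the real command, it refuses to overwrite an existing branch unless forced.

```javascript
// Hypothetical stand-in for the filesystem: path -> file contents.
const files = new Map();

function gitBranch(name, commitHash, { force = false } = {}) {
  const path = `.git/refs/heads/${name}`;
  if (files.has(path) && !force) {
    throw new Error(`a branch named '${name}' already exists`);
  }
  // The file contains nothing but the commit hash and a newline.
  files.set(path, commitHash + "\n");
}

// Placeholder hashes, not real commits.
const oldCommit = "1".repeat(40);
const newCommit = "2".repeat(40);

gitBranch("main", oldCommit);
gitBranch("main", newCommit, { force: true }); // like git branch -f
console.log(files.get(".git/refs/heads/main"));
```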
HEAD
The HEAD indicates the "current" commit. It is set at first as part of the git init routine.
git commit
If the HEAD points to a commit hash, then git commit updates the HEAD to point to the new commit. Otherwise, when the HEAD points to a branch, then the target branch (represented by a file named .git/refs/heads/the_branch_name) is updated.
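The two cases above can be sketched as follows. A symbolic HEAD is stored as a line starting with ref: followed by the path of the branch file relative to .git. The function name advanceHead, the files map and the placeholder hashes are assumptions of this sketch, which only covers the pointer update, not the creation of the commit itself.

```javascript
// files: hypothetical stand-in for the filesystem, path -> file contents.
function advanceHead(files, newCommitHash) {
  const head = files.get(".git/HEAD").trim();
  if (head.startsWith("ref: ")) {
    // HEAD points to a branch: update the branch file, leave HEAD untouched.
    const branchPath = ".git/" + head.slice("ref: ".length);
    files.set(branchPath, newCommitHash + "\n");
  } else {
    // HEAD holds a commit hash directly: overwrite HEAD itself.
    files.set(".git/HEAD", newCommitHash + "\n");
  }
}

// Placeholder hashes, not real commits.
const commitA = "a".repeat(40);
const commitB = "b".repeat(40);

// Case 1: HEAD points to a branch.
const onBranch = new Map([
  [".git/HEAD", "ref: refs/heads/main\n"],
  [".git/refs/heads/main", commitA + "\n"],
]);
advanceHead(onBranch, commitB);

// Case 2: HEAD points to a commit hash.
const detached = new Map([[".git/HEAD", commitA + "\n"]]);
advanceHead(detached, commitB);

console.log(onBranch.get(".git/refs/heads/main"), detached.get(".git/HEAD"));
```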
Tags are like branches, but are stored in .git/refs/tags/the_tag_name and a tag is not normally modified. Once created, it's supposed to always point to the same version. GIT does offer a git tag -f existing-tag new-hash command, but using it should be a rare occurrence.
More importantly, the HEAD does not normally point to a tag. Although nothing actually prevents writing ref: refs/tags/v1.0 into .git/HEAD, the GIT commands will not automatically do this. For example, git checkout tag-or-branch-or-hash will put a symbolic ref: in .git/HEAD only if the argument is a branch.
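The difference between checking out a branch and checking out a tag can be sketched by looking only at what ends up in .git/HEAD. The function name checkoutHead, the files map and the placeholder hashes are assumptions of this sketch. It ignores the working tree entirely and assumes that a tag file directly contains a commit hash.

```javascript
// files: hypothetical stand-in for the filesystem, path -> file contents.
function checkoutHead(files, target) {
  if (files.has(`.git/refs/heads/${target}`)) {
    // A branch: HEAD becomes a symbolic ref, so later commits move the branch.
    files.set(".git/HEAD", `ref: refs/heads/${target}\n`);
  } else if (files.has(`.git/refs/tags/${target}`)) {
    // A tag: HEAD receives the hash the tag points to, not a ref to the tag.
    files.set(".git/HEAD", files.get(`.git/refs/tags/${target}`));
  } else {
    // Anything else is treated as a commit hash in this sketch.
    files.set(".git/HEAD", target + "\n");
  }
}

// Placeholder hashes, not real commits.
const tagged = "c".repeat(40);
const latest = "d".repeat(40);
const files = new Map([
  [".git/refs/heads/main", latest + "\n"],
  [".git/refs/tags/v1.0", tagged + "\n"],
]);

checkoutHead(files, "v1.0");
console.log(files.get(".git/HEAD")); // the hash stored in the tag file
checkoutHead(files, "main");
console.log(files.get(".git/HEAD")); // ref: refs/heads/main
```

With this behaviour, a commit made after checking out a tag moves only the HEAD, so the tag keeps pointing to the same version.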
git init