<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
|
<title>GIT tutorial</title>
|
|
<link rel="stylesheet" href="codemirror-5.60.0/lib/codemirror.css">
|
|
<script src="codemirror-5.60.0/lib/codemirror.js"></script>
|
|
<script src="codemirror-5.60.0/mode/javascript/javascript.js"></script>
|
|
<script src="sha1.js"></script>
|
|
<script src="pako.min.js"></script>
|
|
|
|
<link rel="stylesheet" href="git-tutorial.css">
|
|
</head>
|
|
<body>
|
|
|
|
<h1>Under construction</h1>
|
|
|
|
<article id="git-tutorial">
|
|
<p>The main reference for this tutorial is the <a href="https://git-scm.com/book/en/v2/Git-Internals-Git-Objects">Pro Git book</a> section on GIT internals.</p>
|
|
|
|
<p>This tutorial uses three libraries:</p>
|
|
<ul>
|
|
<li><a href="https://codemirror.net/">CodeMirror</a>, released under the MIT license</li>
|
|
<li><a href="https://www.movable-type.co.uk/scripts/sha1.html">sha1.js</a>, released under the MIT license</li>
|
|
<li><a href="https://github.com/nodeca/pako">pako 2.0.3</a>, released under the MIT and Zlib licenses, see the project page for details.</li>
|
|
</ul>
|
|
|
|
<div id="lines"></div>
|
|
|
|
<section id="introduction">
|
|
<h1>Introduction</h1>
|
|
<p>
|
|
GIT is based on a simple model, with a lot of shorthands for common
|
|
use cases. This model is sometimes hard to guess just from the
|
|
everyday commands. To illustrate how GIT works, we'll implement a
|
|
stripped down clone of GIT in <span class="loc-count">a few</span> lines of
|
|
JavaScript.
|
|
<span style="font-size: small">* empty lines and single closing braces
|
|
excluded, <span class="loc-count-total">a few more</span> in total.</span>
|
|
</p>
|
|
</section>
|
|
|
|
<section id="os-filesystem">
|
|
<h1>The Operating System's filesystem</h1>
|
|
|
|
<section id="os-filesystem-model">
|
|
<h1>Model of the filesystem</h1>
|
|
<p>We will simulate the Operating System's filesystem with a very
|
|
simple key-value store. In this very simple filesystem, directories
|
|
are entries mapped to <code>null</code> and files are entries mapped
|
|
to strings. The path to the current directory is stored in a separate
|
|
variable.</p>
|
|
<textarea id="in0">
|
|
var filesystem = {};
|
|
var current_directory = '';
|
|
</textarea>
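<p>As a minimal sketch of this model, a store holding one directory and one file could look like the following. The names <code>example_filesystem</code>, <code>is_directory</code> and <code>is_file</code> are hypothetical and are not used by the rest of the tutorial.</p>

```javascript
// Hypothetical snapshot of the key-value store; not used by the tutorial itself.
var example_filesystem = {
  'proj': null,                // a directory is an entry mapped to null
  'proj/README': 'Hello\n'     // a file is an entry mapped to a string
};

// Illustrative helpers to tell the two kinds of entries apart.
function is_directory(store, path) { return store[path] === null; }
function is_file(store, path) { return typeof store[path] === 'string'; }

console.log(is_directory(example_filesystem, 'proj'));
console.log(is_file(example_filesystem, 'proj/README'));
```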
|
|
</section>
|
|
|
|
<section id="os-filesystem-functions">
|
|
<h1>Filesystem access functions<span class="notoc"> (<code>read</code>, <code>write</code>, <code>mkdir</code>, <code>exists</code>, <code>cd</code>)</span></h1>
|
|
<p>The filesystem exposes functions to read an entire file, create or
|
|
replace an entire file, create a directory, test the existence of a filesystem entry, and change the current directory.</p>
|
|
<textarea id="in1">
|
|
function read(filename) {
|
|
return filesystem[filename];
|
|
}
|
|
|
|
function write(filename, data) {
|
|
return filesystem[filename] = ""+data;
|
|
}
|
|
|
|
function exists(filename) {
|
|
return typeof(filesystem[filename]) !== 'undefined';
|
|
}
|
|
|
|
function mkdir(dirname) {
|
|
return filesystem[dirname] = null;
|
|
}
|
|
|
|
function cd(d) {
|
|
current_directory = d;
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="os-filesystem-listdir">
|
|
<h1>Filesystem access functions<span class="notoc"> (<code>listdir</code>)</span></h1>
|
|
<p>It will be handy for some operations to list the contents of a
|
|
directory.</p>
|
|
<textarea id="in2">
|
|
function listdir(dirname) {
|
|
// index of the path component located just below dirname
var depth = dirname.split('/').length;
// filesystem is a plain object, so iterate over its keys
var descendents = Object.keys(filesystem)
.filter(function (filename) { return filename.startsWith(dirname + '/'); });
var children = descendents
.map(function (filename) { return filename.split('/')[depth]; });
|
|
// remove duplicates:
|
|
return Array.from(new Set(children));
|
|
}
|
|
</textarea>
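<p>To make the expected result concrete, here is a self-contained sketch of the same idea applied to a small sample store. The names <code>list_children</code> and <code>sample_store</code> are hypothetical, and the store is assumed to be keyed by <code>/</code>-separated paths as described above.</p>

```javascript
// Illustrative only: list the direct children of a directory in a
// path-keyed store, assuming keys are '/'-separated paths.
function list_children(store, dirname) {
  var depth = dirname.split('/').length;
  var children = Object.keys(store)
    .filter(function (path) { return path.startsWith(dirname + '/'); })
    .map(function (path) { return path.split('/')[depth]; });
  return Array.from(new Set(children)); // remove duplicates
}

var sample_store = {
  'proj': null,
  'proj/README': 'text',
  'proj/src': null,
  'proj/src/main.scm': 'code'
};

console.log(list_children(sample_store, 'proj'));
console.log(list_children(sample_store, 'proj/src'));
```

<p>Both <code>proj/src</code> and <code>proj/src/main.scm</code> contribute the child name <code>src</code>, which is why duplicates have to be removed.</p>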
|
|
</section>
|
|
</section>
|
|
|
|
<section id="example-working-directory">
|
|
<h1>Example working directory</h1>
|
|
<p>Our imaginary user will create a <code>proj</code> directory,
|
|
and start filling in some files.</p>
|
|
<textarea id="in3">
|
|
cd('proj');
|
|
mkdir('proj');
|
|
write('proj/README', 'This is my Scheme project.\n');
|
|
mkdir('proj/src');
|
|
write('proj/src/main.scm', '(map (lambda (x) (+ x 1)) (list 1 2 3))\n');
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-init">
|
|
<h1><code>git init</code> (creating <code>.git</code>)</h1>
|
|
<p>The first thing to do is to initialize the GIT directory.
For now, only the <code>.git</code> folder is needed. The rest
of <code>git init</code> will be implemented later.</p>
|
|
<textarea id="in4">
|
|
function join_paths(a, b) {
|
|
return (a == "") ? b : (a + "/" + b);
|
|
}
|
|
|
|
function git_init_mkdir() {
|
|
mkdir(join_paths(current_directory, '.git'));
|
|
}
|
|
|
|
git_init_mkdir();
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-hash-object">
|
|
<h1><code>git hash-object</code><span class="notoc"> (storing a copy of a file in <code>.git</code>)</span></h1>
|
|
<p>The most basic element of a GIT repository is an object. It is a
|
|
copy of a file that is stored in GIT's database. That copy is
|
|
stored under a unique name. The unique name is obtained by hashing the
|
|
contents of the file. <!-- or have a hash oracle that always returns a
|
|
new number. --></p>
|
|
<textarea id="in5">
|
|
function hash_object(must_write, type, is_data, path_or_data) {
|
|
var data = is_data ? path_or_data : read(join_paths(current_directory, path_or_data));

// An object is stored as "<type> <length>\0<data>".
var object_contents = type + ' ' + data.length + '\0' + data;

var hash = sha1(object_contents);
|
|
|
|
if (must_write) {
|
|
mkdir(join_paths(current_directory, '.git/objects'));
|
|
mkdir(join_paths(current_directory, '.git/objects/' + hash.slice(0,2)));
|
|
var object_path = join_paths(current_directory, '.git/objects/' + hash.slice(0,2) + '/' + hash.slice(2));
|
|
write(object_path, deflate(object_contents));
|
|
}
|
|
|
|
return hash;
|
|
}
|
|
</textarea>
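<p>The naming scheme can be checked outside the browser with a short sketch. It assumes Node.js, with its built-in <code>crypto</code> module standing in for <code>sha1.js</code>; the names <code>object_hash</code> and <code>object_path</code> are hypothetical. Unlike the tutorial's function, it measures the length in bytes, which gives the same result for ASCII content.</p>

```javascript
// Assumption: Node.js, with its built-in crypto module standing in for sha1.js.
var crypto = require('crypto');

// Hash data the way GIT names an object: sha1 of "<type> <size>\0<data>".
function object_hash(type, data) {
  var body = Buffer.from(data, 'utf8');
  var header = Buffer.from(type + ' ' + body.length + '\0', 'utf8');
  return crypto.createHash('sha1')
    .update(Buffer.concat([header, body]))
    .digest('hex');
}

// Split a hash into the directory and file name used under .git/objects.
function object_path(hash) {
  return '.git/objects/' + hash.slice(0, 2) + '/' + hash.slice(2);
}

var example_hash = object_hash('blob', 'test content\n');
console.log(example_hash);
console.log(object_path(example_hash));
```

<p>The first two hexadecimal digits become a directory name and the remaining 38 become the file name, which is what the two <code>mkdir</code> calls and the <code>write</code> call in <code>hash_object</code> reproduce.</p>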
|
|
|
|
<section id="add-file-to-git">
|
|
<h1>Adding a file to the GIT database</h1>
|
|
<p>So far, our GIT database does not know about any of the user's
|
|
files. In order to add the contents of the <code>README</code> file in
|
|
the database, we use <code>git hash-object -w -t blob README</code>,
|
|
where <code>-w</code> tells GIT to <em>write</em> the object in its
|
|
database, and <code>-t blob</code> indicates that we want to create
|
|
a <em>blob</em> object, i.e. the contents of a file.</p>
|
|
<textarea id="in6">
|
|
// git hash-object -w -t blob README
|
|
hash_object(true, 'blob', false, 'README');
|
|
</textarea>
|
|
<p>The objects stored in the GIT database are compressed with zlib
(using the "deflate" compression method). The filesystem view shows
the marker <span class="deflated">deflated:</span> followed by the uncompressed
data. Click on the file contents to toggle between this pretty-printed
view and the raw compressed data.</p>
|
|
|
|
<p>You will notice that the database does not contain the name of the
|
|
file, only its contents, stored under a unique identifier which is
|
|
derived by hashing its contents. Let's add the second user file
|
|
to the database.</p>
|
|
<textarea id="in7">
|
|
// git hash-object -w -t blob src/main.scm
|
|
hash_object(true, 'blob', false, 'src/main.scm');
|
|
</textarea>
|
|
</section>
|
|
</section>
|
|
|
|
<section id="zlib-compression-note">
|
|
<h1><code>zlib</code> compression</h1>
|
|
<p>The real implementation of GIT compresses objects with zlib. To
view a zlib-compressed object in your terminal, paste this function
definition into your shell, and then call e.g.
<code>unzlib .git/objects/95/d318ae78cee607a77c453ead4db344fc1221b7</code>.</p>
|
|
|
|
<pre>
|
|
unzlib() {
|
|
python3 -c \
|
|
"import sys,zlib; \
|
|
sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1], 'rb').read()));" \
|
|
"$1"
|
|
}
|
|
</pre>
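<p>The same round trip can be sketched in JavaScript. This assumes Node.js and its built-in <code>zlib</code> module, whereas the tutorial itself relies on pako in the browser; the variable names are only illustrative.</p>

```javascript
// Assumption: Node.js and its built-in zlib module (the tutorial uses pako in the browser).
var zlib = require('zlib');

// An uncompressed blob object: header, null byte, then the file contents.
var object_contents = Buffer.from('blob 13\0test content\n', 'utf8');

var compressed = zlib.deflateSync(object_contents); // the form GIT writes to disk
var restored = zlib.inflateSync(compressed);        // the form unzlib prints

console.log(restored.equals(object_contents));
```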
|
|
</section>
|
|
|
|
<section id="storing-trees">
|
|
<h1>Storing trees (list of hashed files and subtrees)</h1>
|
|
<p>Now GIT knows about the contents of both of the user's
files, but it would be nice to also store the filenames.
This is done by creating a <em>tree</em> object.</p>

<p>A tree object can contain files (by associating each file's blob hash with its name), or directories (by associating the hash of each subtree with its name).
The mode (<code>100644</code> for a regular file and <code>40000</code> for a directory) indicates the entry's type and permissions, and is given in octal using <a href="https://unix.stackexchange.com/a/145118/19059">the values used by *nix</a>.</p>
|
|
|
|
<textarea id="in8">
|
|
// base_directory is a string
|
|
// filenames is a list of strings
|
|
// subtrees is a list of {name, hash} objects.
|
|
function store_tree(base_directory, filenames, subtrees) {
|
|
function get_file_hash(filename) {
|
|
return from_hex(hash_object(true, 'blob', false, join_paths(base_directory, filename)));
|
|
}
|
|
|
|
var blobs = filenames.map(function (filename) {
|
|
return "100644 " + filename + "\0" + get_file_hash(filename)
|
|
});
|
|
|
|
var trees = subtrees.map(function (subtree) {
|
|
return "40000 " + subtree.name + "\0" + from_hex(subtree.hash);
|
|
});
|
|
|
|
var tree_contents = blobs.join('') + trees.join('');
|
|
|
|
// cat tree_contents | git hash-object -w -t tree --stdin
|
|
return hash_object(true, 'tree', true, tree_contents);
|
|
}
|
|
</textarea>
|
|
|
|
<p>This function needs a small utility to convert hashes encoded in hexadecimal to a binary form.</p>
|
|
<textarea id="in9">
|
|
function from_hex(hex) {
|
|
var hex = String(hex);
|
|
var str = ""
|
|
for (var i = 0; i < hex.length; i+=2) {
|
|
str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
|
|
}
|
|
return str;
|
|
}
|
|
</textarea>
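<p>To visualize the layout of a single tree entry, here is a small sketch. It assumes Node.js and uses <code>Buffer</code> objects instead of the tutorial's binary strings; <code>tree_entry</code> is a hypothetical name.</p>

```javascript
// Assumption: Node.js Buffers are used here instead of the tutorial's binary strings.
// Build one tree entry: "<mode> <name>\0" followed by the 20 raw bytes of the hash.
function tree_entry(mode, name, hex_hash) {
  return Buffer.concat([
    Buffer.from(mode + ' ' + name + '\0', 'utf8'),
    Buffer.from(hex_hash, 'hex')
  ]);
}

// e69de29b... is the hash of the empty blob.
var entry = tree_entry('100644', 'README', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391');
console.log(entry.length); // 6 + 1 + 6 + 1 + 20 = 34 bytes
```

<p>The hash takes 20 bytes rather than 40 characters because it is stored in binary form, which is why <code>store_tree</code> needs <code>from_hex</code>.</p>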
|
|
|
|
<section id="store-tree-example">
|
|
<h1>Example use of store_tree</h1>
|
|
<textarea id="in10">
|
|
//hash_src_tree = store_tree("src", ["main.scm"], []);
//hash_root_tree = store_tree("", ["README"], [{name:"src", hash:hash_src_tree}]);
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="store-tree-from-paths">
|
|
<h1>Storing a tree from a list of paths</h1>
|
|
<p>Making trees out of the subfolders one by one is cumbersome. Here's a utility function which takes a list of paths, and builds a tree from those.</p>
|
|
|
|
<textarea id="in11">
|
|
function paths_to_tree(paths) {
|
|
var hierarchy = { subfolders: {}, files: [] };
|
|
for (var i = 0; i < paths.length; i++) {
|
|
var path_components = paths[i].split('/');
|
|
var h = hierarchy;
|
|
for (var j = 0; j < path_components.length - 1; j++) {
|
|
if (! h.subfolders.hasOwnProperty(path_components[j])) {
|
|
h.subfolders[path_components[j]] = { subfolders: {}, files: [] };
|
|
}
|
|
h = h.subfolders[path_components[j]];
|
|
}
|
|
// the last component of the path is the file name
h.files.push(path_components[path_components.length - 1]);
|
|
}
|
|
|
|
var to_tree = function(base_directory, hierarchy) {
|
|
var subtrees = [];
|
|
for (var i in hierarchy.subfolders) {
|
|
if (hierarchy.subfolders.hasOwnProperty(i)) {
|
|
subtrees.push({ name: i, hash: to_tree(join_paths(base_directory, i), hierarchy.subfolders[i]) });
|
|
}
|
|
}
|
|
return store_tree(base_directory, hierarchy.files, subtrees);
|
|
}
|
|
return to_tree("", hierarchy);
|
|
}
|
|
|
|
// git add README src/main.scm
|
|
paths_to_tree(["README", "src/main.scm"]);
|
|
</textarea>
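<p>The intermediate hierarchy is easier to picture with an example. The sketch below only performs the grouping step, without storing anything in the database; <code>build_hierarchy</code> is a hypothetical name.</p>

```javascript
// Illustrative only: group a flat list of paths into nested folders.
function build_hierarchy(paths) {
  var root = { subfolders: {}, files: [] };
  paths.forEach(function (path) {
    var components = path.split('/');
    var file_name = components.pop();      // the last component is the file name
    var node = root;
    components.forEach(function (folder) { // the remaining components are folders
      if (!Object.prototype.hasOwnProperty.call(node.subfolders, folder)) {
        node.subfolders[folder] = { subfolders: {}, files: [] };
      }
      node = node.subfolders[folder];
    });
    node.files.push(file_name);
  });
  return root;
}

console.log(JSON.stringify(build_hierarchy(['README', 'src/main.scm']), null, 2));
```

<p>Each node of this hierarchy then becomes one tree object, with the innermost folders stored first so that their hashes are available to their parents.</p>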
|
|
</section>
|
|
</section>
|
|
|
|
<section id="store-commit">
|
|
<h1>Storing a commit in the GIT database</h1>
|
|
<p>Now that the GIT database contains the entire tree for the current version,
|
|
a commit can be created. A commit contains</p>
|
|
<ul>
|
|
<li>a pointer to the tree</li>
|
|
<li>a pointer to the previous ("parent") commit (or to several parent commits for a merge, or to no parent at all for the initial commit)</li>
<li>information about the author (the person who initially wrote the code)</li>
<li>information about the committer (the person who adds the code to the GIT
database, often the same person as the author, but it can be a different person
e.g. when someone else makes changes to the history or applies a patch received
by e-mail)</li>
|
|
<li>a description</li>
|
|
</ul>
|
|
<p>The author and committer information contain</p>
|
|
<ul>
|
|
<li>the person's name</li>
|
|
<li>the person's email</li>
|
|
<li>the *nix timestamp at which the version was authored or committed</li>
|
|
<li>the <a href="https://www.youtube.com/watch?v=q2nNzNo_Xps">timezone for that timestamp</a></li>
|
|
</ul>
|
|
<textarea id="in12">
|
|
function store_commit(tree, parents, author, committer, message) {
|
|
var commit_contents = '';
|
|
commit_contents += 'tree ' + tree + '\n';
|
|
for (var i = 0; i < parents.length; i++) {
|
|
commit_contents += 'parent ' + parents[i] + '\n';
|
|
}
|
|
commit_contents += 'author ' + author.name
|
|
+ ' <' + author.email + '> '
|
|
+ format_date(author.date) + ' '
|
|
+ format_timezone(author.timezoneMinutes) + '\n';
|
|
commit_contents += 'committer ' + committer.name
|
|
+ ' <' + committer.email + '> '
|
|
+ format_date(committer.date) + ' '
|
|
+ format_timezone(committer.timezoneMinutes) + '\n';
|
|
commit_contents += '\n';
|
|
commit_contents += '' + message + (message[message.length-1] == '\n' ? '' : '\n');
|
|
// cat commit_contents | git hash-object -w -t commit --stdin
|
|
return hash_object(true, 'commit', true, commit_contents);
|
|
}
|
|
|
|
function format_date(d) {
|
|
return Math.floor((+d) / 1000);
|
|
}
|
|
function left_pad(s, char, len) {
|
|
while ((''+s).length < len) { s = '' + char + s; }
|
|
return s;
|
|
}
|
|
function format_timezone(tm) {
|
|
var h = Math.floor(Math.abs(+tm)/60);
|
|
var m = Math.abs(+tm)%60;
|
|
return (tm >= 0 ? '+' : '-') + left_pad(h, '0', 2) + left_pad(m, '0', 2);
|
|
}
|
|
</textarea>
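<p>To see what the resulting text looks like, the sketch below builds the body of a commit object for a single person acting as both author and committer. The name <code>commit_text</code> is hypothetical and the tree hash is a placeholder rather than a real object.</p>

```javascript
// Illustrative only: the text of a commit object, before the
// "commit <size>\0" header is prepended and the result is hashed.
function commit_text(tree, parents, person, message) {
  var offset = person.timezoneMinutes;
  var abs = Math.abs(offset);
  var tz = (offset >= 0 ? '+' : '-')
    + String(Math.floor(abs / 60)).padStart(2, '0')
    + String(abs % 60).padStart(2, '0');
  var who = person.name + ' <' + person.email + '> '
    + Math.floor(person.date.getTime() / 1000) + ' ' + tz;
  return 'tree ' + tree + '\n'
    + parents.map(function (p) { return 'parent ' + p + '\n'; }).join('')
    + 'author ' + who + '\n'
    + 'committer ' + who + '\n'
    + '\n'
    + message + '\n';
}

var ada = { name: 'Ada Lovelace', email: 'ada@analyti.cal',
            date: new Date(1617120803000), timezoneMinutes: 60 };
// The tree hash below is a placeholder, not a real object.
console.log(commit_text('0123456789012345678901234567890123456789', [], ada, 'Initial commit'));
```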
|
|
|
|
<section id="store-commit-example">
|
|
<h1>Storing an example commit</h1>
|
|
<p>It is now possible to store a commit in the database. This saves
|
|
a copy of the tree along with some metadata about this version.
|
|
The first commit has no parent, which is represented by passing
|
|
the empty list.</p>
|
|
<textarea id="in13">
|
|
initial_commit = store_commit(
|
|
paths_to_tree(["README", "src/main.scm"]),
|
|
[],
|
|
{name:'Ada Lovelace', email:'ada@analyti.cal', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
{name:'Ada Lovelace', email:'ada@analyti.cal', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
'Initial commit');
|
|
</textarea>
|
|
</section>
|
|
</section>
|
|
|
|
<section id="git-branch">
|
|
<h1><code>git branch</code></h1>
|
|
<p>A branch is a pointer to a commit, stored in a file in <code>.git/refs/heads/name_of_the_branch</code>.
|
|
The branch can be overwritten with <code>git branch -f</code>. Also, as will be explained later,
|
|
<code>git commit</code> can update the pointer of a branch.</p>
|
|
<textarea id="in14">
|
|
function git_branch(branch_name, commit_ref, force) {
|
|
var commit_hash = git_rev_parse(commit_ref);
|
|
mkdir(join_paths(current_directory, '.git/refs'));
|
|
mkdir(join_paths(current_directory, '.git/refs/heads'));
|
|
if (!force && exists(join_paths(current_directory, '.git/refs/heads/' + branch_name))) {
|
|
alert("branch already exists");
|
|
return false;
|
|
} else {
|
|
write(join_paths(current_directory, '.git/refs/heads/' + branch_name), commit_hash + '\n');
|
|
return true;
|
|
}
|
|
}
|
|
|
|
// git branch main 0123456789012345678901234567890123456789
|
|
git_branch('main', initial_commit, false);
|
|
|
|
// git branch -f main 0123456789012345678901234567890123456789
|
|
git_branch('main', initial_commit, true);
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="HEAD">
|
|
<h1><code>HEAD</code></h1>
|
|
<p>
|
|
The HEAD indicates the "current" commit. It is set at first as part of the <code>git init</code> routine.
|
|
</p>
|
|
<textarea id="in15">
|
|
function git_init_head() {
|
|
write(join_paths(current_directory, '.git/HEAD'), 'ref: refs/heads/main\n');
|
|
}
|
|
|
|
git_init_head();
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-commit">
|
|
<h1><code>git commit</code></h1>
|
|
<p>If the <code>HEAD</code> points to a commit hash, then <code>git commit</code> updates the <code>HEAD</code> to point to the new commit.
|
|
Otherwise, when the <code>HEAD</code> points to a branch, then the target branch (represented by a file named <code>.git/refs/heads/the_branch_name</code>) is updated.</p>
|
|
<textarea id="in16">
|
|
var gitconfig = {
|
|
user: {
|
|
name: 'Ada Lovelace',
|
|
email: 'ada@analyti.cal',
|
|
}
|
|
};
|
|
var $EDITOR = function() { return window.prompt('Commit message:'); }
|
|
</textarea>
|
|
<textarea>
|
|
function git_commit(file_paths, message) {
|
|
var now = new Date();
|
|
var timestamp = (+now)/1000;
|
|
var timezoneMinutes = -(now.getTimezoneOffset());
|
|
|
|
var parent = git_rev_parse('HEAD');
|
|
var parents = parent ? [parent] : []
|
|
|
|
var new_commit_hash = store_commit(
|
|
paths_to_tree(file_paths),
|
|
parents,
|
|
{name:gitconfig.user.name, email:gitconfig.user.email, date:now, timezoneMinutes:timezoneMinutes },
|
|
{name:gitconfig.user.name, email:gitconfig.user.email, date:now, timezoneMinutes:timezoneMinutes },
|
|
message || $EDITOR());
|
|
|
|
advance_head(new_commit_hash);
|
|
|
|
return new_commit_hash;
|
|
}
|
|
</textarea>
|
|
<textarea>
|
|
function advance_head(new_commit_hash) {
|
|
var referenced_branch = git_symbolic_ref('HEAD');
|
|
if (referenced_branch) {
|
|
// Update the target of the ref:
|
|
write(join_paths(current_directory, '.git/' + referenced_branch), new_commit_hash + '\n');
|
|
} else {
|
|
// Detached HEAD, update .git/HEAD directly.
|
|
write(join_paths(current_directory, '.git/HEAD'), new_commit_hash + '\n');
|
|
}
|
|
}
|
|
</textarea>
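<p>The difference between the two cases can be sketched on a tiny store that only holds the files under <code>.git</code>. The function <code>advance</code> and the placeholder hashes are hypothetical.</p>

```javascript
// Illustrative only: a tiny store holding just the files under .git.
function advance(store, new_hash) {
  var head = store['.git/HEAD'];
  if (head.startsWith('ref: ')) {
    // Attached HEAD: move the branch, leave .git/HEAD untouched.
    var branch_file = '.git/' + head.slice('ref: '.length).trim();
    store[branch_file] = new_hash + '\n';
  } else {
    // Detached HEAD: .git/HEAD itself holds the hash.
    store['.git/HEAD'] = new_hash + '\n';
  }
  return store;
}

var new_hash = 'a'.repeat(40); // placeholder hash
var attached = advance({ '.git/HEAD': 'ref: refs/heads/main\n' }, new_hash);
var detached = advance({ '.git/HEAD': 'b'.repeat(40) + '\n' }, new_hash);
console.log(attached);
console.log(detached);
```

<p>In the attached case the branch file is created if it did not exist yet, which is how the very first commit brings the default branch into existence.</p>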
|
|
<textarea>
|
|
function git_rev_parse(ref) {
|
|
var symbolic_ref_target = git_symbolic_ref(ref);
|
|
if (symbolic_ref_target) {
|
|
// symbolic ref like "ref: refs/heads/main"
|
|
return git_rev_parse(symbolic_ref_target);
|
|
} else if (/^[0-9a-f]{40}$/.test(ref)) {
|
|
// hash like "0123456789abcdef0123456789abcdef01234567"
|
|
return ref;
|
|
} else if (ref == 'HEAD') {
|
|
// user-friendly reference like "HEAD"
|
|
return git_rev_parse(trim_newline(read(join_paths(current_directory, '.git/' + ref))));
|
|
} else if (ref.startsWith('refs/') && exists(join_paths(current_directory, '.git/' + ref))) {
|
|
// user-friendly reference like "refs/heads/main"
|
|
return git_rev_parse(trim_newline(read(join_paths(current_directory, '.git/' + ref))));
|
|
} else if (exists(join_paths(current_directory, '.git/refs/heads/' + ref))) {
|
|
// user-friendly reference like "main" (a branch)
|
|
return git_rev_parse(trim_newline(read(join_paths(current_directory, '.git/refs/heads/' + ref))));
|
|
} else if (exists(join_paths(current_directory, '.git/refs/tags/' + ref))) {
|
|
// user-friendly reference like "v1.0" (a tag)
|
|
return git_rev_parse(trim_newline(read(join_paths(current_directory, '.git/refs/tags/' + ref))));
|
|
} else {
|
|
// unknown ref
|
|
return false;
|
|
}
|
|
}
|
|
</textarea>
|
|
<textarea>
|
|
function git_symbolic_ref(ref) {
|
|
var ref_file = join_paths(current_directory, '.git/' + ref);
|
|
if (exists(ref_file) && read(ref_file).startsWith('ref: ')) {
|
|
return trim_newline(read(ref_file)).substr('ref: '.length);
|
|
} else {
|
|
return false;
|
|
}
|
|
}
|
|
</textarea>
|
|
<textarea>
|
|
function trim_newline(s) {
|
|
if (s.endsWith('\n')) { return s.substr(0, s.length-1); } else { return s; }
|
|
}
|
|
</textarea>
|
|
<textarea>
|
|
write('proj/README', 'This is my Scheme project -- with updates!');
|
|
var second_commit = git_commit(['README', 'src/main.scm'], 'Some updates');
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-tag">
|
|
<h1><code>git tag</code></h1>
|
|
<p>Tags are like branches, but are stored in <code>.git/refs/tags/the_tag_name</code>
|
|
and a tag is not normally modified. Once created, it's supposed to always point
|
|
to the same version.</p>
|
|
<p>GIT does offer a <code>git tag -f existing-tag new-hash</code> command,
|
|
but using it should be a rare occurrence.</p>
|
|
<textarea id="in17">
|
|
function git_tag(tag_name, commit_hash) {
|
|
mkdir(join_paths(current_directory, '.git/refs'));
|
|
mkdir(join_paths(current_directory, '.git/refs/tags'));
|
|
if (exists(join_paths(current_directory, '.git/refs/tags/' + tag_name))) {
|
|
alert("tag already exists");
|
|
return false;
|
|
} else {
|
|
write(join_paths(current_directory, '.git/refs/tags/' + tag_name), commit_hash + '\n');
|
|
return true;
|
|
}
|
|
}
|
|
|
|
// git tag v1.0 0123456789012345678901234567890123456789
|
|
git_tag('v1.0', second_commit);
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-checkout">
|
|
<h1><code>git checkout</code></h1>
|
|
<section id="checkout-branch-vs-other">
|
|
<h1>Checkout, branches and other references</h1>
|
|
<p>The HEAD does not normally point to a tag. Although nothing actually
prevents writing <code>ref: refs/tags/v1.0</code> into <code>.git/HEAD</code>, the GIT
commands will not automatically do this. For example, <code>git checkout tag-or-branch-or-hash</code>
will put a symbolic <code>ref: </code> in <code>.git/HEAD</code> only if the argument is a branch.
In every other case, the HEAD is detached and contains the commit hash directly.</p>
|
|
<textarea id="in18">
|
|
function git_checkout(tag_or_branch_or_hash) {
|
|
if (exists(join_paths(current_directory, '.git/refs/heads/' + tag_or_branch_or_hash))) {
|
|
// Normal (attached) HEAD, points to 'ref: refs/heads/the_branch_name'
|
|
write(join_paths(current_directory, '.git/HEAD'), 'ref: refs/heads/' + tag_or_branch_or_hash + '\n');
|
|
} else {
|
|
// Detached HEAD, points directly to commit hash
|
|
write(join_paths(current_directory, '.git/HEAD'), git_rev_parse(tag_or_branch_or_hash) + '\n');
|
|
}
|
|
checkout_files(git_rev_parse('HEAD'));
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="checkout-files">
|
|
<h1>Checking out files</h1>
|
|
<textarea>
|
|
function checkout_files(hash) {
|
|
var commit = parse_commit(hash);
|
|
checkout_tree(current_directory, commit.tree);
|
|
}
|
|
|
|
function checkout_tree(path_prefix, hash) {
|
|
var entries = parse_tree(hash);
|
|
for (var i = 0; i < entries.length; i++) {
|
|
var o = parse_object(entries[i].hash);
|
|
var entry_path = join_paths(path_prefix, entries[i].name);
|
|
if (o.type == 'blob') {
|
|
write(entry_path, o.contents);
|
|
} else {
|
|
checkout_tree(entry_path, entries[i].hash)
|
|
}
|
|
}
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="parse-assert">
|
|
<h1>Assert</h1>
|
|
<p>The parsers will check that their input looks reasonably well-formed, using <code>assert()</code>.</p>
|
|
<textarea>
|
|
function assert(boolean, text) {
|
|
if (! boolean) { alert("assertion failed: " + text); throw new Error(text); }
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="parsed-compressed">
|
|
<h1>Reading compressed objects</h1>
|
|
<textarea>
|
|
function parse_object(hash) {
|
|
var compressed = read(join_paths(current_directory, '.git/objects/' + hash.substr(0,2) + '/' + hash.substr(2)));
|
|
var inflated = inflate(compressed);
|
|
var split = inflated.match(/^([\s\S]*?) ([\s\S]*?)\0([\s\S]*)$/);
|
|
|
|
assert(split, "ill-formed object");
|
|
var type = split[1];
|
|
var length = split[2];
|
|
var contents = split[3];
|
|
assert(contents.length == length, "object has incorrect length");
|
|
|
|
return { type: type, length: length, contents: contents };
|
|
}
|
|
</textarea>
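<p>The header parsing can be tried on an already inflated object. The sketch below uses <code>indexOf</code> instead of a regular expression; <code>split_object</code> is a hypothetical name.</p>

```javascript
// Illustrative only: split an inflated object into type, size and contents.
function split_object(inflated) {
  var space = inflated.indexOf(' ');
  var nul = inflated.indexOf('\0');
  if (space < 0 || nul < space) { throw new Error('ill-formed object'); }
  var type = inflated.slice(0, space);
  var size = parseInt(inflated.slice(space + 1, nul), 10);
  var contents = inflated.slice(nul + 1);
  if (contents.length !== size) { throw new Error('object has incorrect length'); }
  return { type: type, size: size, contents: contents };
}

console.log(split_object('blob 13\0test content\n'));
```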
|
|
</section>
|
|
|
|
<section id="parse-tree">
|
|
<h1>Parsing tree objects</h1>
|
|
<textarea>
|
|
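function parse_tree(hash) {
  var tree = parse_object(hash);
  var rest = tree.contents;
  var entries = [];
  // Each entry is "<mode> <name>\0" followed by a 20-byte binary hash.
  // The hash may itself contain null bytes, so entries are consumed one
  // at a time instead of splitting on a pattern.
  while (rest.length > 0) {
    var end = rest.indexOf('\0');
    assert(end >= 0 && rest.length >= end + 21, 'invalid contents of tree object');
    entries.push(parse_tree_entry(rest.substr(0, end + 21)));
    rest = rest.substr(end + 21);
  }
  return entries;
}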
|
|
</textarea>
|
|
|
|
<textarea>
|
|
function parse_tree_entry(entry) {
|
|
var split = entry.match(/^([0-9]+) ([\s\S]*)\0([\s\S]{20})$/);
|
|
assert(split, 'invalid entry in tree object');
|
|
var mode = split[1];
|
|
var name = split[2];
|
|
var hash = to_hex(split[3]);
|
|
return { mode: mode, name: name, hash: hash };
|
|
}
|
|
</textarea>
|
|
|
|
<p>The <code>parse_tree</code> function above needs a small utility to convert hashes in binary form to a hexadecimal representation.</p>
|
|
<textarea id="in19">
|
|
function to_hex(bin) {
|
|
var bin = String(bin);
|
|
var hex = "";
|
|
for (var i = 0; i < bin.length; i++) {
|
|
hex += left_pad(bin.charCodeAt(i).toString(16), '0', 2);
|
|
}
|
|
return hex;
|
|
}
|
|
</textarea>
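<p>The two conversions are inverses of each other, which the following self-contained sketch illustrates. The names <code>hex_to_binary</code> and <code>binary_to_hex</code> are hypothetical stand-ins for <code>from_hex</code> and <code>to_hex</code>.</p>

```javascript
// Illustrative only: convert between a hexadecimal hash and a binary string
// in which each character code is one byte.
function hex_to_binary(hex) {
  var out = '';
  for (var i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

function binary_to_hex(bin) {
  var out = '';
  for (var i = 0; i < bin.length; i++) {
    // pad to two digits so that bytes below 0x10 keep their leading zero
    out += bin.charCodeAt(i).toString(16).padStart(2, '0');
  }
  return out;
}

var sample_hex = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391';
console.log(hex_to_binary(sample_hex).length);
console.log(binary_to_hex(hex_to_binary(sample_hex)) === sample_hex);
```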
|
|
</section>
|
|
|
|
<section id="parse-commit">
|
|
<h1>Parsing commit objects</h1>
|
|
<textarea>
|
|
function parse_commit(hash) {
|
|
var commit = parse_object(hash);
|
|
var lines = commit.contents.split('\n');
|
|
var tree = null;
|
|
var parents = [];
|
|
var author = null;
|
|
var committer = null;
|
|
var i;
|
|
// A blank line separates the headers from the message.
|
|
for (i = 0; i < lines.length && lines[i] != ''; i++) {
|
|
var split = lines[i].match(/^(.*?) (.*)$/);
|
|
assert(split, "ill-formed commit header: " + lines[i]);
|
|
var header = split[1];
|
|
var value = split[2];
|
|
switch (header) {
|
|
case 'tree':
|
|
assert(!tree, 'duplicate tree header in commit');
|
|
assert(/^[0-9a-f]{40}$/.test(value), "invalid tree header in commit");
|
|
tree = value;
|
|
break;
|
|
case 'parent':
|
|
assert(/^[0-9a-f]{40}$/.test(value), "invalid parent header in commit");
|
|
parents.push(value);
|
|
break;
|
|
case 'author':
|
|
assert(!author, 'duplicate author header in commit');
|
|
author = parse_author(value, 'author');
|
|
break;
|
|
case 'committer':
|
|
assert(!committer, 'duplicate committer header in commit');
|
|
committer = parse_author(value, 'committer');
|
|
break;
|
|
default: /* unknown field, skipping */ break;
|
|
}
|
|
}
|
|
// The message is everything after the blank line.
|
|
var message = lines.slice(i+1).join('\n');
|
|
|
|
assert(tree, 'commit lacks tree header');
|
|
assert(author, 'commit lacks author header');
|
|
assert(committer, 'commit lacks committer header');
|
|
|
|
return {
|
|
tree: tree,
|
|
parents: parents,
|
|
author: author,
|
|
committer: committer,
|
|
message: message
|
|
};
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="parse-author-committer">
|
|
<h1>Parsing author and committer metadata</h1>
|
|
<textarea>
|
|
function parse_author(value, field) {
|
|
var split = value.match(/^(.*?) <(.*?)> ([0-9]+) ([+-])([0-9][0-9])([0-9][0-9])$/);
|
|
assert(split, 'ill-formed ' + field)
|
|
var name = split[1];
|
|
var email = split[2];
|
|
var date = new Date(parseInt(split[3], 10) * 1000);
|
|
var timezone_sign = (split[4] == '+' ? 1 : -1);
|
|
var timezone_hours = parseInt(split[5], 10);
|
|
var timezone_minutes = parseInt(split[6], 10);
|
|
var timezone = timezone_sign * (timezone_hours * 60 + timezone_minutes);
|
|
return { name: name, email: email, date: date, timezone: timezone };
|
|
}
|
|
</textarea>
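<p>As a quick check of the format, the sketch below parses the author line of the example commit stored earlier. The name <code>parse_person</code> is hypothetical, and the offset is returned in minutes under the key <code>timezoneMinutes</code> to mirror the input of <code>store_commit</code>.</p>

```javascript
// Illustrative only: parse "Name <email> timestamp +hhmm" into its parts.
function parse_person(value) {
  var m = value.match(/^(.*?) <(.*?)> ([0-9]+) ([+-])([0-9]{2})([0-9]{2})$/);
  if (!m) { throw new Error('ill-formed author or committer'); }
  var sign = (m[4] === '+') ? 1 : -1;
  return {
    name: m[1],
    email: m[2],
    date: new Date(parseInt(m[3], 10) * 1000), // seconds to milliseconds
    timezoneMinutes: sign * (parseInt(m[5], 10) * 60 + parseInt(m[6], 10))
  };
}

var person = parse_person('Ada Lovelace <ada@analyti.cal> 1617120803 +0100');
console.log(person.name, person.email, person.date.getTime(), person.timezoneMinutes);
```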
|
|
</section>
|
|
|
|
<section id="checkout-example">
|
|
<h1>Example checkout</h1>
|
|
<p></p>
|
|
<textarea id="in20">
|
|
git_checkout(initial_commit);
|
|
</textarea>
|
|
</section>
|
|
</section>
|
|
|
|
<section id="git-init">
|
|
<h1><code>git init</code></h1>
|
|
<p>The <code>git init</code> command creates the <code>.git</code> directory and points <code>.git/HEAD</code>
|
|
to the default branch (a file which does not exist yet, as this branch does not contain any commit at this point).</p>
|
|
<textarea id="in21">
|
|
function git_init() {
|
|
git_init_mkdir();
|
|
git_init_head();
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="index">
|
|
<h1>The index</h1>
|
|
<p>When adding files with <code>git add</code>, GIT does not immediately create a commit object.
Instead, it adds the files to the index, which uses a binary format with lots of metadata.
The mock filesystem used here lacks most of these pieces of information, so the value <code>0</code>
will be used for most fields. See <a href="https://mincong.io/2018/04/28/git-index/">this blog post</a>
for a more in-depth study of the index.</p>
|
|
<textarea id="index-binary-utils">
|
|
function binary(val, bytes) {
|
|
return from_hex(left_pad(val.toString(16), '0', bytes*2));
|
|
}
|
|
|
|
function binary16(val) { return binary(val, 2); }
|
|
function binary32(val) { return binary(val, 4); }
|
|
function binary64(val) { return binary(val, 8); }
|
|
</textarea>
|
|
|
|
<textarea id="make-index">
|
|
function store_index(paths) {
|
|
var magic = 'DIRC' // DIRectory Cache
|
|
var version = binary32(2);
|
|
var entries = binary32(paths.length);
|
|
var header = magic + version + entries;
|
|
|
|
var index = header;
|
|
|
|
for (var i = 0; i < paths.length; i++) {
|
|
var ctime = binary64(0);
|
|
var mtime = binary64(0);
|
|
var device = binary32(0);
|
|
var inode = binary32(0);
|
|
// default permissions for files, in octal.
|
|
var mode = binary32(0o100644);
|
|
var uid = binary32(0);
|
|
var gid = binary32(0);
|
|
var size = binary32(read(join_paths(current_directory, paths[i])).length);
|
|
var hash = from_hex(hash_object(true, 'blob', false, paths[i]));
|
|
// for this simple index, the flags (the 4 higher bits) are 0.
assert(paths[i].length < 0xfff, 'path is too long for the index');
var flags_and_file_path_length = binary16(paths[i].length);
var file_path = paths[i] + '\0';
var entry = ctime + mtime + device + inode + mode + uid + gid + size
+ hash + flags_and_file_path_length + file_path;
|
|
while (entry.length % 8 != 0) {
|
|
// pad with null bytes to a multiple of 8 bytes (64-bits).
|
|
entry += '\0';
|
|
}
|
|
|
|
index += entry;
|
|
}
|
|
|
|
index += from_hex(sha1(index));
|
|
|
|
write(join_paths(current_directory, '.git/index'), index)
|
|
}
|
|
</textarea>
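<p>The sizes involved can be checked with a short sketch. It assumes Node.js and uses <code>Buffer</code> objects instead of the tutorial's binary strings; <code>index_header</code> and <code>index_entry_size</code> are hypothetical names, and paths are assumed to be ASCII.</p>

```javascript
// Assumption: Node.js Buffers are used here instead of the tutorial's binary strings.
// The 12-byte index header: "DIRC", version 2, number of entries (big-endian).
function index_header(entry_count) {
  var header = Buffer.alloc(12);
  header.write('DIRC', 0, 'ascii');
  header.writeUInt32BE(2, 4);
  header.writeUInt32BE(entry_count, 8);
  return header;
}

// Size of one entry: 62 bytes of fixed fields, the path, a null byte,
// then padding up to the next multiple of 8 bytes.
function index_entry_size(path) {
  var unpadded = 62 + path.length + 1;
  return Math.ceil(unpadded / 8) * 8;
}

console.log(index_header(2).toString('hex'));
console.log(index_entry_size('README'), index_entry_size('src/main.scm'));
```

<p>The 62 bytes of fixed fields are the sum of the field widths used in <code>store_index</code>: 8 + 8 + 4 + 4 + 4 + 4 + 4 + 4 for the metadata, 20 for the hash and 2 for the flags and path length.</p>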
|
|
</section>
|
|
|
|
<section id="playground">
|
|
<h1>Playground</h1>
|
|
<p>The implementation is now sufficiently complete to create a small repository.</p>
|
|
<textarea id="playground-reset">
|
|
// Reset the filesystem to its initial state
|
|
filesystem = {};
|
|
current_directory = '';
|
|
</textarea>
|
|
|
|
<textarea id="playground-play">
|
|
cd('proj');
|
|
mkdir('proj');
|
|
write('proj/README', 'This is my implementation of GIT.\n');
|
|
mkdir('proj/src');
|
|
write('proj/src/main.scm', "(define filesystem '())\n...\n");
|
|
|
|
git_init();
|
|
git_commit(['README', 'src/main.scm'], 'A well-understood initial commit.');
|
|
|
|
git_branch('dev', 'HEAD');
|
|
git_checkout('dev');
|
|
|
|
write('proj/src/main.scm', "(define filesystem '())\n(define current_directory \"\")\n");
|
|
git_commit(['README', 'src/main.scm'], 'What an update!');
|
|
|
|
git_checkout('main');
|
|
|
|
// update the cache of the working directory. Without this,
|
|
// GIT finds an empty cache, and thinks all files are scheduled
|
|
// for deletion, until "git add ." allows it to realize that
|
|
// the working directory matches the contents of HEAD.
|
|
store_index(['README', 'src/main.scm']);
|
|
</textarea>
|
|
|
|
<p>By clicking on "Copy commands to recreate in *nix terminal.", it is possible to copy a series of <code>mkdir …</code> and <code>printf … > …</code> commands that, when executed, will recreate the virtual filesystem on a real system. The resulting
|
|
folder is binary-compatible with the official <code>git log</code>, <code>git status</code>, <code>git checkout</code> etc.
|
|
commands.</p>
|
|
</section>
|
|
|
|
<section id="conclusion">
|
|
<h1>Conclusion</h1>
|
|
<p>This article shows that a large part of the core of GIT can be re-implemented in <span class="loc-count">a few</span> source lines of code* (<a href="javascript:___copy_all_code(); void(0);">copy all the code</a>).
|
|
<span style="font-size: small">* empty lines and single closing braces excluded, <span class="loc-count-total">a few more</span> in total.</span></p>
|
|
<div id="copy-all-code" style="display: none;"></div>
|
|
<ul>
|
|
<li>Some of the features which may appear mysterious at first sight (e.g. detached HEAD) should be clearer with the knowledge of how GIT works behind the scenes.</li>
|
|
<li>Furthermore, branches are often associated with an intuition (containers into which commits are added) which does not match the implementation (mutable pointers to commits).</li>
|
|
<li>Finally, it is tempting to think of commits as patches. While <code>darcs</code> tries to expose an interface which matches this intuition, the implementation of GIT clearly treats commits as copies of the entire repository, each linked to the previous version solely by the <code>parent</code> metadata in the commit headers.</li>
|
|
</ul>
|
|
<p>A few core commands like <code>git diff</code> and <code>git apply</code> are not described in this tutorial.
|
|
They are little more than improved versions of the classical *nix commands <code>diff</code> and <code>patch</code>.</p>
|
|
<p>Most other commands provided by GIT are merely convenience wrappers around these commands. For example, <code>git cherry-pick</code> is simply a combination of <code>git diff</code> between the tree of a commit and the tree of its parent, followed by <code>git apply</code> to apply the patch and <code>git commit</code> to create a new commit whose diff is equivalent to the diff of the original commit. As another example, the command <code>git rebase</code> performs a succession of <code>cherry-pick</code> operations.</p>
|
|
<p>By keeping in mind the internal model of GIT, it becomes easier to understand the usual commands and their quirks. By understanding the design philosophy behind the implementation, the day-to-day usage can become, hopefully, less surprising.</p>
|
|
</section>
|
|
|
|
<div id="toc"></div>
|
|
</article>
|
|
|
|
<script src="git-tutorial.js"></script>
|
|
</body>
|
|
</html>
|