1582 lines
68 KiB
HTML
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
|
<title>GIT tutorial</title>
|
|
|
|
<!-- Third-party libraries: -->
|
|
<link rel="stylesheet" href="codemirror-5.60.0/lib/codemirror.css">
|
|
<script src="codemirror-5.60.0/lib/codemirror.js"></script>
|
|
<script src="codemirror-5.60.0/mode/javascript/javascript.js"></script>
|
|
<script src="sha1.js"></script>
|
|
<script src="pako.min.js"></script>
|
|
<script src="viz.js"></script>
|
|
|
|
|
|
<!-- Implementation of the tutorial's helper tools (code editor, graph view, table of contents, table output and arrows): -->
|
|
<link rel="stylesheet" href="git-tutorial.css">
|
|
<script src="git-tutorial.js"></script>
|
|
<script class="example">
|
|
var examples=[];
|
|
function ___h2f(hash) { return 'proj/.git/objects/'+hash.substring(0,2)+'/'+hash.substring(2); }
|
|
function ___example(id, f) {
|
|
examples.push(function () {
|
|
var result = f();
|
|
var fs = {};
|
|
for (var i = 0; i < result.names.length; i++) {
|
|
fs[result.names[i]] = filesystem[result.names[i]];
|
|
}
|
|
var previous_fs = {};
|
|
for (var i = 0; i < result.previous_names.length; i++) {
|
|
previous_fs[result.previous_names[i]] = filesystem[result.previous_names[i]];
|
|
}
|
|
___eval_result_to_html(id, fs, previous_fs, [], true, result.omit_graph);
|
|
});
|
|
}
|
|
</script>
|
|
</head>
|
|
<body>
|
|
|
|
<article id="git-tutorial">
|
|
<h1>Under construction</h1>
|
|
|
|
<p>The main reference for this tutorial is the <a href="https://git-scm.com/book/en/v2/Git-Internals-Git-Objects">Pro Git book</a> section on GIT internals.</p>
|
|
|
|
<p>This tutorial uses four libraries:</p>
|
|
<ul>
|
|
<li><a href="https://codemirror.net/">CodeMirror</a>, released under the MIT license,</li>
|
|
<li><a href="https://www.movable-type.co.uk/scripts/sha1.html">sha1.js</a>, released under the MIT license,</li>
|
|
<li><a href="https://github.com/nodeca/pako">pako 2.0.3</a>, released under the MIT and Zlib licenses, see the project page for details,</li>
|
|
<li><a href="https://github.com/mdaines/viz.js">Viz.js</a> (<a href="https://github.com/mdaines/viz.js/releases/tag/v1.8.2">v1.8.2</a> which has a synchronous API), released under the MIT license.</li>
|
|
</ul>
|
|
|
|
<section id="introduction">
|
|
<h1>Introduction</h1>
|
|
<p>
|
|
GIT is based on a simple model, with a lot of shorthands for common
|
|
use cases. This model is sometimes hard to guess just from the
|
|
everyday commands. To illustrate how GIT works, we'll implement a
|
|
stripped down clone of GIT in <span class="loc-count">a few</span> lines of
|
|
JavaScript.
|
|
<span style="font-size: small">* empty lines and single closing braces
|
|
excluded, <span class="loc-count-total">a few more</span> in total.</span>
|
|
</p>
|
|
</section>
|
|
|
|
<section id="os-filesystem">
|
|
<h1>The Operating System's filesystem</h1>
|
|
|
|
<section id="os-filesystem-model">
|
|
<h1>Model of the filesystem</h1>
|
|
<p>The Operating System's filesystem will be simulated by a simple
key-value store. In this simplified filesystem, directories
are entries mapped to <code>null</code> and files are entries mapped
to strings. The path to the current directory is stored in a separate
variable.</p>
|
|
<textarea id="in0">
|
|
var filesystem = {};
|
|
var current_directory = '';
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="os-filesystem-functions">
|
|
<h1>Filesystem access functions<span class="notoc"> (<code>read</code>, <code>write</code>, <code>mkdir</code>, <code>exists</code>, <code>remove</code>, <code>cd</code>)</span></h1>
|
|
<p>The filesystem exposes functions to read an entire file, create or
replace an entire file, create a directory, test the existence of a filesystem entry,
remove an entry (optionally along with its descendants), and change the current directory.</p>
|
|
<textarea id="in1">
|
|
function read(filename) {
|
|
return filesystem[filename];
|
|
}
|
|
|
|
function write(filename, data) {
|
|
filesystem[filename] = String(data);
|
|
}
|
|
|
|
function exists(filename) {
|
|
return typeof(filesystem[filename]) !== 'undefined';
|
|
}
|
|
|
|
function mkdir(dirname) {
|
|
filesystem[dirname] = null;
|
|
}
|
|
|
|
function cd(dirname) {
|
|
current_directory = dirname;
|
|
}
|
|
|
|
function remove(path, recursive) {
|
|
if (recursive && filesystem[path] === null) {
|
|
var children = listdir(path);
|
|
for (var i = 0; i < children.length; i++) {
|
|
remove(path + '/' + children[i], true);
|
|
}
|
|
}
|
|
delete filesystem[path];
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="os-filesystem-listdir">
|
|
<h1>Filesystem access functions<span class="notoc"> (<code>listdir</code>)</span></h1>
|
|
<p>It will be handy for some operations to list the contents of a
|
|
directory.</p>
|
|
<textarea id="in2">
|
|
function listdir(dirname) {
|
|
var depth = dirname.split('/').length;
|
|
// Get all paths in the filesystem
|
|
var paths = Object.keys(filesystem);
|
|
// Filter to keep only the paths starting with the given dirname
|
|
var prefix = dirname + '/';
|
|
var descendents = paths.filter(function (filename) {
|
|
return filename.startsWith(prefix) && (filename.length > prefix.length);
|
|
});
|
|
// Keep only the next path component
|
|
var children = descendents.map(function (filename) {
|
|
return filename.split('/')[depth];
|
|
});
|
|
// remove duplicates, listdir('a') with paths a/b/c and a/b/d and a/x
|
|
// should only return ['b', 'x'], not 'b', 'b', x.
|
|
return Array.from(new Set(children));
|
|
}
|
|
</textarea>
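<p>As a quick check of the de-duplication step, the sketch below runs the same algorithm against a local object instead of the global <code>filesystem</code>. The names <code>sample_store</code> and <code>list_children</code> are made up for this illustration and are not part of the tutorial's code.</p>

```javascript
// Standalone sample store: directories map to null, files map to strings.
var sample_store = {
  'a': null,
  'a/b': null,
  'a/b/c': 'contents of c',
  'a/b/d': 'contents of d',
  'a/x': 'contents of x'
};

// Same steps as listdir(), but reading from the store passed as an argument.
function list_children(store, dirname) {
  var depth = dirname.split('/').length;
  var prefix = dirname + '/';
  var children = Object.keys(store)
    .filter(function (p) { return p.startsWith(prefix) && p.length > prefix.length; })
    .map(function (p) { return p.split('/')[depth]; });
  // 'b' is produced three times (a/b, a/b/c, a/b/d); the Set keeps one copy.
  return Array.from(new Set(children));
}

var listed = list_children(sample_store, 'a');
console.log(JSON.stringify(listed)); // ["b","x"]
```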
|
|
</section>
|
|
</section>
|
|
|
|
<section id="example-working-directory">
|
|
<h1>Example working tree</h1>
|
|
<p>Our imaginary user will create a <code>proj</code> directory,
|
|
and start filling in some files.</p>
|
|
<div class="trivia">
|
|
<p trivia>
|
|
A <em>working tree</em> designates the directory (and the subdirectories and files within) in which
|
|
the user will normally view and edit the files. GIT has commands to save the state of the working tree
|
|
(git commit), in order to be able to go back in time later on, and view older versions of the files.
|
|
The command <code>git worktree</code> allows the user to create multiple working trees using the same
|
|
local repository. This effectively allows the user to easily have two or more versions of the project
|
|
side-by-side. GIT commands can be invoked in either copy. It is worth noting that the <code>.git/</code>
|
|
directory exists only in the original working tree; while it is safe to remove other worktrees (followed by
|
|
an invocation of <code>git worktree prune</code> from one of the remaining working trees to let GIT
detect the deletion), the removal of the original working tree will discard the <code>.git/</code>
|
|
directory, and all versions of the project that have not been published elsewhere (usually via
|
|
<code>git push</code>) will be lost.
|
|
</p>
|
|
</div>
|
|
<textarea id="in3">
|
|
mkdir('proj');
|
|
cd('proj');
|
|
write('proj/README', 'This is my Scheme project.\n');
|
|
mkdir('proj/src');
|
|
write('proj/src/main.scm', '(map (lambda (x) (+ x 1)) (list 1 2 3))\n');
|
|
</textarea>
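<p>To make the model concrete, here is a standalone sketch of what the key-value store holds after these calls. It uses its own local <code>store</code>, <code>make_dir</code> and <code>write_file</code> (hypothetical names chosen for this example) so that it does not interfere with the tutorial's global <code>filesystem</code>.</p>

```javascript
// Standalone copy of the model: one flat object, keyed by full path.
var store = {};
function make_dir(path) { store[path] = null; }
function write_file(path, data) { store[path] = String(data); }

make_dir('proj');
write_file('proj/README', 'This is my Scheme project.\n');
make_dir('proj/src');
write_file('proj/src/main.scm', '(map (lambda (x) (+ x 1)) (list 1 2 3))\n');

// Directories are the keys mapped to null, files are the keys mapped to strings.
var directories = Object.keys(store).filter(function (p) { return store[p] === null; });
var files = Object.keys(store).filter(function (p) { return typeof store[p] === 'string'; });
console.log(JSON.stringify(directories)); // ["proj","proj/src"]
console.log(JSON.stringify(files));       // ["proj/README","proj/src/main.scm"]
```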
|
|
</section>
|
|
|
|
<section id="git-init-dot-git">
|
|
<h1><code>git init</code> (creating <code>.git</code>)</h1>
|
|
<p>The first thing to do is to initialize the GIT directory.
|
|
For now, only the <code>.git</code> folder is needed. The rest
|
|
of the function implementing <code>git init</code> will be
|
|
written later.</p>
|
|
<textarea id="in4">
|
|
function join_paths(a, b) {
|
|
return (a == "") ? b : (a + "/" + b);
|
|
}
|
|
|
|
// git init (partial implementation: create the .git directory)
|
|
function git_init_mkdir() {
|
|
mkdir(join_paths(current_directory, '.git'));
|
|
}
|
|
|
|
git_init_mkdir();
|
|
</textarea>
|
|
<p>Click on the <em>eval</em> button to see the files and directories that were
|
|
created so far.</p>
|
|
</section>
|
|
|
|
<section id="git-hash-object">
|
|
<h1><code>git hash-object</code><span class="notoc"> (storing a copy of a file in <code>.git</code>)</span></h1>
|
|
<p>The most basic element of a GIT repository is an <em>object</em>. Objects have a type which can be
|
|
<code>blob</code> (individual files), <code>tree</code> (directories),
|
|
<code>commit</code> (pointers to a specific version of the root directory,
|
|
with a description and some metadata) or <code>tag</code> (named pointers to a specific commit,
|
|
with a description and some metadata).
|
|
|
|
When a file is added to the GIT repository, a compressed copy is stored in GIT's database,
|
|
in the <code>.git/objects/</code> folder. This copy is a <em>blob</em> object.</p>
|
|
<p>The compressed copy is given a unique filename, which is obtained by hashing the contents of the original file.
|
|
Some filesystems have poor performance when a single directory contains a large number of files, and some filesystems
|
|
have a limit on the number of files that a directory may contain. To circumvent these issues, the first two characters
|
|
of the hash are used as the name of an intermediate directory: if a file's hash is <code>0a1bd…</code>, its compressed
|
|
copy will be stored in <code>.git/objects/0a/1bd…</code>.</p>
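<p>The mapping from a hash to its storage path can be sketched as follows. The helper name <code>object_path</code> is an assumption made for this example, and the hash is only a sample value.</p>

```javascript
// Split a 40-character hexadecimal hash into a two-character directory name
// and a 38-character file name, under .git/objects/.
function object_path(hash) {
  return '.git/objects/' + hash.substring(0, 2) + '/' + hash.substring(2);
}

var sample_hash = '95d318ae78cee607a77c453ead4db344fc1221b7';
var sample_path = object_path(sample_hash);
console.log(sample_path);
// .git/objects/95/d318ae78cee607a77c453ead4db344fc1221b7
```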
|
|
|
|
<p>The <code>hash_object()</code> function defined below creates a file that looks like this:</p>
|
|
|
|
<div id="example-blob-object-template"></div>
|
|
<script class="example">
|
|
___example('example-blob-object-template', function() {
|
|
var object_contents = 'type length\000Contents of path_or_data';
|
|
var hash = sha1_from_bytes_returns_hex(object_contents);
|
|
var path = ___h2f(hash);
|
|
write(path, deflate(object_contents));
|
|
return { filesystem: filesystem, names: [path], previous_names: [] };
|
|
});
|
|
</script>
|
|
|
|
<p>The objects stored in the GIT database are compressed with zlib
|
|
(using the "deflate" compression method). The filesystem view shows
|
|
the marker <span class="deflated">deflated:</span> followed by the
|
|
uncompressed data. Click on the (un)compressed data to toggle between
|
|
this pretty-printed view and the raw compressed data.</p>
|
|
|
|
<p>When creating some <code>blob</code> objects, the result could be, for example:</p>
|
|
|
|
<div id="example-blob-objects"></div>
|
|
<script class="example">
|
|
___example('example-blob-objects', function() {
|
|
var names = [
|
|
___h2f(hash_object(true, 'blob', false, 'src/main.scm')),
|
|
___h2f(hash_object(true, 'blob', false, 'README')),
|
|
];
|
|
return { filesystem: filesystem, names: names, previous_names: [] };
|
|
});
|
|
</script>
|
|
|
|
<p>The <code>hash_object()</code> function below faithfully reproduces the behaviour of (a subset of the options of)
|
|
the <code>git hash-object</code> command which can be called on a real git command-line.</p>
|
|
|
|
<textarea id="in5">
|
|
// git hash-object [-w] -t <type> [--stdin] [path]
|
|
function hash_object(must_write, type, is_data, path_or_data) {
|
|
if (is_data) {
|
|
var data = path_or_data;
|
|
} else {
|
|
var data = read(join_paths(current_directory, path_or_data));
|
|
}
|
|
|
|
var object_contents = type + ' ' + data.length + '\0' + data;
|
|
|
|
var hash = sha1_from_bytes_returns_hex(object_contents);
|
|
|
|
if (must_write) {
|
|
mkdir(join_paths(current_directory, '.git/objects'));
|
|
mkdir(join_paths(current_directory, '.git/objects/' + hash.substring(0,2)));
|
|
var path = '.git/objects/' + hash.substring(0,2) + '/' + hash.substring(2);
|
|
var object_full_path = join_paths(current_directory, path);
|
|
// deflate() compresses using zlib
|
|
write(object_full_path, deflate(object_contents));
|
|
}
|
|
|
|
return hash;
|
|
}
|
|
</textarea>
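<p>The bytes that get hashed and compressed can be inspected without any hashing library. The sketch below only builds the uncompressed object for the <code>README</code> contents; <code>object_bytes</code> is a hypothetical helper name. Note that <code>data.length</code> counts UTF-16 code units, which equals the byte count only for ASCII content such as the files used in this tutorial.</p>

```javascript
// Build "<type> <length>\0<data>", the uncompressed form of a GIT object.
function object_bytes(type, data) {
  return type + ' ' + data.length + '\0' + data;
}

var readme_object = object_bytes('blob', 'This is my Scheme project.\n');
// JSON.stringify makes the null byte and the newline visible.
console.log(JSON.stringify(readme_object));
// "blob 27\u0000This is my Scheme project.\n"
```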
|
|
|
|
<section id="add-file-to-git">
|
|
<h1>Adding a file to the GIT database</h1>
|
|
<p>So far, our GIT database does not know about any of the user's
|
|
files. In order to add the contents of the <code>README</code> file in
|
|
the database, we use <code>git hash-object -w -t blob README</code>,
|
|
where <code>-w</code> tells GIT to <em>write</em> the object in its
|
|
database, and <code>-t blob</code> indicates that we want to create
|
|
a <em>blob</em> object, i.e. the contents of a file.</p>
|
|
<textarea id="in6">
|
|
// git hash-object -w -t blob README
|
|
hash_object(true, 'blob', false, 'README');
|
|
</textarea>
|
|
<p>Click on the <em>eval</em> button to see the file that was
|
|
created by this call.</p>
|
|
|
|
<p>You can notice that the database does not contain the name of the
|
|
original file, only its content, stored under a unique identifier which is
|
|
derived by hashing that content. Let's add the second user file
|
|
to the database.</p>
|
|
<textarea id="in7">
|
|
// git hash-object -w -t blob src/main.scm
|
|
hash_object(true, 'blob', false, 'src/main.scm');
|
|
</textarea>
|
|
</section>
|
|
</section>
|
|
|
|
<section id="zlib-compression-note">
|
|
<h1><code>zlib</code> compression</h1>
|
|
<p>GIT compresses objects with zlib. The <code>deflate()</code> function used in
|
|
the script above comes from the <a href="https://github.com/nodeca/pako">pako 2.0.3</a> library.
|
|
To view a zlib-compressed object in your *nix terminal, simply write this
|
|
declaration in your shell.</p>
|
|
<pre>
|
|
unzlib() {
|
|
python3 -c \
|
|
"import sys,zlib; \
|
|
sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1], 'rb').read()));" \
|
|
"$1"
|
|
}
|
|
</pre>
|
|
<p>You can then inspect git objects as follows, using <code>hexdump</code> to view the null bytes and other non-printable bytes.</p>
|
|
<pre>unzlib .git/objects/95/d318ae78cee607a77c453ead4db344fc1221b7 | hexdump -Cv</pre>
|
|
</section>
|
|
|
|
<section id="storing-trees">
|
|
<h1>Storing trees (list of hashed files and subtrees)</h1>
|
|
<p>At this point GIT knows about the contents of both of the user's
|
|
files, but it would be nice to also store the filenames.
|
|
This is done by creating a <em>tree</em> object.</p>
|
|
|
|
<p>A tree object can contain files (by associating the blob's hash to its name), or directories (by associating the hash of other subtrees to their name).
|
|
The mode (<code>100644</code> for the file and <code>40000</code> for the folder) indicates the kind of entry and its permissions, and is given in octal using <a href="https://unix.stackexchange.com/a/145118/19059">the values used by *nix</a>.</p>
|
|
|
|
<div id="example-tree-objects"></div>
|
|
<script class="example">
|
|
___example('example-tree-objects', function() {
|
|
var main = ___h2f(hash_object(true, 'blob', false, 'src/main.scm'));
|
|
var readme = ___h2f(hash_object(true, 'blob', false, 'README'));
|
|
var src = ___h2f(store_tree("src", ["main.scm"], []));
|
|
var proj = ___h2f(paths_to_tree(["README", "src/main.scm"]));
|
|
var previous_names = [ main, readme ];
|
|
var names = [ main, readme, src, proj ];
|
|
return { filesystem: filesystem, names: names, previous_names: previous_names };
|
|
});
|
|
</script>
|
|
|
|
<p>In the contents of a tree, files (blobs) are listed before subdirectories (trees);
within each group the entries are ordered alphabetically.</p>
|
|
|
|
<textarea id="in8">
|
|
// base_directory is a string
|
|
// filenames is a list of strings
|
|
// subtrees is a list of {name, hash} objects.
|
|
function store_tree(base_directory, filenames, subtrees) {
|
|
function get_file_hash(filename) {
|
|
var path = join_paths(base_directory, filename);
|
|
var hash = hash_object(true, 'blob', false, path);
|
|
return hex_to_raw_bytes(hash);
|
|
}
|
|
|
|
var blobs = filenames.map(function (filename) {
|
|
return "100644 " + filename + "\0" + get_file_hash(filename);
|
|
});
|
|
|
|
var trees = subtrees.map(function (subtree) {
|
|
return "40000 " + subtree.name + "\0" + hex_to_raw_bytes(subtree.hash);
|
|
});
|
|
|
|
// blobs are listed before subtrees
|
|
var tree_content = blobs.join('') + trees.join('');
|
|
|
|
// cat tree_content | git hash-object -w -t tree --stdin
|
|
return hash_object(true, 'tree', true, tree_content);
|
|
}
|
|
</textarea>
|
|
|
|
<p>This function needs a small utility to convert hashes encoded in hexadecimal to raw bytes.</p>
|
|
<textarea id="in9">
|
|
function hex_to_raw_bytes(hex) {
|
|
var hex = String(hex);
|
|
var str = "";
|
|
for (var i = 0; i < hex.length; i+=2) {
|
|
str += String.fromCharCode(parseInt(hex.substring(i, i + 2), 16));
|
|
}
|
|
return str;
|
|
}
|
|
</textarea>
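<p>The layout of a single tree entry can be checked in isolation. The following sketch assumes a placeholder hash (not the hash of any real object) and uses the hypothetical helper names <code>hex_to_bytes</code> and <code>tree_entry</code>; it shows that the 40 hexadecimal characters of a hash become 20 raw bytes inside the entry.</p>

```javascript
// Convert pairs of hexadecimal digits to one character (byte) each.
function hex_to_bytes(hex) {
  var out = '';
  for (var i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.substring(i, i + 2), 16));
  }
  return out;
}

// One tree entry: "<mode> <name>\0" followed by the 20 raw bytes of the hash.
function tree_entry(mode, name, hex_hash) {
  return mode + ' ' + name + '\0' + hex_to_bytes(hex_hash);
}

var placeholder_hash = '0123456789abcdef0123456789abcdef01234567';
var file_entry = tree_entry('100644', 'README', placeholder_hash);
var dir_entry = tree_entry('40000', 'src', placeholder_hash);

console.log(hex_to_bytes(placeholder_hash).length); // 20
console.log(file_entry.length); // 34 = 6 + 1 + 6 + 1 + 20
console.log(dir_entry.length);  // 30 = 5 + 1 + 3 + 1 + 20
```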
|
|
|
|
<section id="store-tree-example">
|
|
<h1>Example use of <code>store_tree()</code></h1>
|
|
|
|
<p>The following code, once uncommented, stores into the GIT database the trees for <code>src</code>
|
|
and for the root directory of the GIT project.</p>
|
|
<textarea id="in10">
|
|
//hash_src_tree = store_tree("src", ["main.scm"], []);
|
|
//hash_root_tree = store_tree("", ["README"], [{name:"src", hash:hash_src_tree}]);
|
|
</textarea>
|
|
<p>The <code>store_tree()</code> function needs to be called for the contents of subdirectories
|
|
first, and that result can be used to store the trees of upper directories. In the next section,
|
|
we will write a function which takes a list of paths, constructs an internal representation of
|
|
the hierarchy, and stores the corresponding trees bottom-up.</p>
|
|
</section>
|
|
|
|
<section id="store-tree-from-paths">
|
|
<h1>Storing a tree from a list of paths</h1>
|
|
<p>Making trees out of the subfolders one by one is cumbersome.
|
|
The following utility function takes a list of paths, and builds
|
|
a tree from those.</p>
|
|
|
|
<textarea id="in11">
|
|
function paths_to_tree(paths) {
|
|
// This temporary mutable object will store a hierarchy of
|
|
// subfolders and files, e.g.
|
|
// {
|
|
// subfolders: { src: { subfolders: {}, files: ['main.scm'] } },
|
|
// files: ['README']
|
|
// }
|
|
var hierarchy = { subfolders: {}, files: [] };
|
|
|
|
// This splits the input paths on occurrences of "/",
|
|
// and inserts them into the "hierarchy" object.
|
|
for (var i = 0; i < paths.length; i++) {
|
|
var path_components = paths[i].split('/');
|
|
var h = hierarchy;
|
|
for (var j = 0; j < path_components.length - 1; j++) {
|
|
if (! h.subfolders.hasOwnProperty(path_components[j])) {
|
|
h.subfolders[path_components[j]] = {
|
|
subfolders: {},
|
|
files: []
|
|
};
|
|
}
|
|
h = h.subfolders[path_components[j]];
|
|
}
|
|
h.files[h.files.length] = path_components[path_components.length - 1];
|
|
}
|
|
|
|
// This function takes the path to a directory, e.g. "src",
|
|
// and a hierarchy object e.g. { subfolders: {}, files: ['main.scm'] }.
|
|
// It recursively stores the tree object for that directory into
|
|
// GIT's database.
|
|
var to_tree = function(base_directory, hierarchy) {
|
|
var subtrees = [];
|
|
for (var i in hierarchy.subfolders) {
|
|
if (hierarchy.subfolders.hasOwnProperty(i)) {
|
|
subtrees[subtrees.length] = {
|
|
name: i,
|
|
hash: to_tree(join_paths(base_directory, i), hierarchy.subfolders[i])
|
|
};
|
|
}
|
|
}
|
|
return store_tree(base_directory, hierarchy.files, subtrees);
|
|
}
|
|
|
|
// Store the trees for the whole hierarchy, starting from the
|
|
// root directory of the GIT repository (which is represented
|
|
// as an empty path "")
|
|
return to_tree("", hierarchy);
|
|
}
|
|
|
|
// git add README src/main.scm
|
|
paths_to_tree(["README", "src/main.scm"]);
|
|
</textarea>
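<p>The first half of <code>paths_to_tree()</code>, which turns a flat list of paths into a nested object, can be tried on its own without touching the GIT database. The sketch below is a standalone rewrite of that step under the hypothetical name <code>build_hierarchy</code>; it stores nothing and only prints the intermediate structure.</p>

```javascript
// Group a flat list of paths into nested { subfolders, files } objects.
function build_hierarchy(paths) {
  var root = { subfolders: {}, files: [] };
  paths.forEach(function (path) {
    var components = path.split('/');
    var file_name = components.pop(); // the last component is the file itself
    var node = root;
    components.forEach(function (dir) {
      if (!Object.prototype.hasOwnProperty.call(node.subfolders, dir)) {
        node.subfolders[dir] = { subfolders: {}, files: [] };
      }
      node = node.subfolders[dir];
    });
    node.files.push(file_name);
  });
  return root;
}

var sample_hierarchy = build_hierarchy(['README', 'src/main.scm']);
console.log(JSON.stringify(sample_hierarchy));
// {"subfolders":{"src":{"subfolders":{},"files":["main.scm"]}},"files":["README"]}
```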
|
|
</section>
|
|
</section>
|
|
|
|
<section id="store-commit">
|
|
<h1>Storing a commit in the GIT database</h1>
|
|
<p>Now that the GIT database contains the entire tree for the current version,
|
|
a commit can be created. A commit contains</p>
|
|
<ul>
|
|
<li>the hash of the tree object,</li>
|
|
<li>the hash of the previous commit, which is dubbed the <code>parent</code> (merge commits have two or more parents, and the initial commit has no parent commit),</li>
|
|
<li>information about the author (the person who initially wrote the code),</li>
|
|
<li>information about the committer (the person who adds the code to the GIT
|
|
database, often the same person as the author, but it can be a different person
|
|
e.g. when someone else rewrites the history with a rebase or applies a patch received
|
|
by e-mail),</li>
|
|
<li>and a description.</li>
|
|
</ul>
|
|
|
|
<div id="example-commit-object"></div>
|
|
<script class="example">
|
|
___example('example-commit-object', function() {
|
|
var main = ___h2f(hash_object(true, 'blob', false, 'src/main.scm'));
|
|
var readme = ___h2f(hash_object(true, 'blob', false, 'README'));
|
|
var src = ___h2f(store_tree("src", ["main.scm"], []));
|
|
var proj = ___h2f(paths_to_tree(["README", "src/main.scm"]));
|
|
var initial_commit = ___h2f(store_commit(
|
|
paths_to_tree(["README", "src/main.scm"]),
|
|
[],
|
|
{name:'Ada Lovelace', email:'ada@analyti.cal', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
{name:'Ada Lovelace', email:'ada@analyti.cal', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
'Initial commit'));
|
|
var previous_names = [ main, readme, src, proj ];
|
|
var names = [ main, readme, src, proj, initial_commit ];
|
|
return { filesystem: filesystem, names: names, previous_names: previous_names };
|
|
});
|
|
</script>
|
|
|
|
<p>The author and committer information contain</p>
|
|
<ul>
|
|
<li>the person's name,</li>
|
|
<li>the person's email,</li>
|
|
<li>the *nix timestamp at which the version was authored or committed,</li>
|
|
<li>and the <a href="https://www.youtube.com/watch?v=q2nNzNo_Xps">timezone for that timestamp</a>.</li>
|
|
</ul>
|
|
<textarea id="in12">
|
|
function store_commit(tree, parents, author, committer, message) {
|
|
var commit_contents = '';
|
|
commit_contents += 'tree ' + tree + '\n';
|
|
for (var i = 0; i < parents.length; i++) {
|
|
commit_contents += 'parent ' + parents[i] + '\n';
|
|
}
|
|
commit_contents += 'author ' + author.name
|
|
+ ' <' + author.email + '> '
|
|
+ format_date(author.date) + ' '
|
|
+ format_timezone(author.timezoneMinutes) + '\n';
|
|
commit_contents += 'committer ' + committer.name
|
|
+ ' <' + committer.email + '> '
|
|
+ format_date(committer.date) + ' '
|
|
+ format_timezone(committer.timezoneMinutes) + '\n';
|
|
commit_contents += '\n';
|
|
commit_contents += '' + message;
|
|
if (message[message.length-1] != '\n') { commit_contents += '\n'; }
|
|
// cat commit_contents | git hash-object -w -t commit --stdin
|
|
return hash_object(true, 'commit', true, commit_contents);
|
|
}
|
|
|
|
function format_date(d) {
|
|
return Math.floor((+d) / 1000);
|
|
}
|
|
function left_pad(s, char, len) {
|
|
while ((''+s).length < len) { s = '' + char + s; }
|
|
return s;
|
|
}
|
|
function format_timezone(tm) {
|
|
var h = Math.floor(Math.abs(+tm)/60);
|
|
var m = Math.abs(+tm)%60;
|
|
return (tm >= 0 ? '+' : '-') + left_pad(h, '0', 2) + left_pad(m, '0', 2);
|
|
}
|
|
</textarea>
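<p>To see the text that ends up inside a commit object, the sketch below assembles it by hand. It is a standalone illustration: the tree hash is a placeholder rather than the hash of a stored tree, and the helpers <code>pad2</code>, <code>timezone</code> and <code>signature</code> are names invented for this example.</p>

```javascript
// Two-digit, zero-padded rendering of a small non-negative integer.
function pad2(n) { return (n < 10 ? '0' : '') + n; }

// Offset in minutes -> "+HHMM" or "-HHMM".
function timezone(minutes) {
  var abs = Math.abs(minutes);
  return (minutes >= 0 ? '+' : '-') + pad2(Math.floor(abs / 60)) + pad2(abs % 60);
}

// One "author" or "committer" line: name, e-mail, timestamp in seconds, timezone.
function signature(role, person) {
  return role + ' ' + person.name + ' <' + person.email + '> '
    + Math.floor(person.date.getTime() / 1000) + ' '
    + timezone(person.timezoneMinutes) + '\n';
}

var ada = { name: 'Ada Lovelace', email: 'ada@analyti.cal',
            date: new Date(1617120803000), timezoneMinutes: 60 };
var placeholder_tree = '0123456789abcdef0123456789abcdef01234567';

var commit_text = 'tree ' + placeholder_tree + '\n'
  + signature('author', ada)
  + signature('committer', ada)
  + '\n'
  + 'Initial commit\n';

console.log(commit_text);
console.log(timezone(-330)); // -0530
```

<p>An initial commit has no <code>parent</code> line; later commits would insert one such line per parent between the <code>tree</code> line and the <code>author</code> line.</p>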
|
|
|
|
<section id="store-commit-example">
|
|
<h1>Storing an example commit</h1>
|
|
<p>It is now possible to store a commit in the database. This saves
|
|
a copy of the tree along with some metadata about this version.
|
|
The first commit has no parent, which is represented by passing
|
|
the empty list.</p>
|
|
<textarea id="in13">
|
|
var author = {
|
|
name: 'Ada Lovelace',
|
|
email: 'ada@analyti.cal',
|
|
date: new Date(1617120803000),
|
|
timezoneMinutes: +60
|
|
}
|
|
var committer = author; // in this case, Ada commits her own changes.
|
|
var initial_commit = store_commit(
|
|
paths_to_tree(["README", "src/main.scm"]),
|
|
[],
|
|
author,
|
|
committer,
|
|
'Initial commit');
|
|
</textarea>
|
|
</section>
|
|
</section>
|
|
|
|
<section id="resolving-references">
|
|
<h1>Resolving references</h1>
|
|
|
|
<p>The next few subsections will introduce <em>symbolic references</em>
|
|
and other references like branch names, the special name <code>HEAD</code>
|
|
or tag names.</p>
|
|
|
|
<p>Most GIT commands accept as an argument a commit hash or a named reference to a hash.
|
|
In order to implement those, we need to be able to resolve these references first.</p>
|
|
|
|
<p>Symbolic references are nothing more than regular files containing a hexadecimal
|
|
hash or a string of the form <code>ref: path/to/other/symbolic/reference</code>.
|
|
The <code>HEAD</code> reference is stored in <code>.git/HEAD</code>, and can point
|
|
directly to a commit hash like
|
|
<span id="example-reference-head-hash">0123456789abcdef0123456789abcdef01234567</span>,
|
|
or can point to another symbolic reference, in which case the <code>.git/HEAD</code> file
|
|
will contain e.g. <code>ref: refs/heads/main</code>.</p>
|
|
|
|
<p>Branches are simple files stored in <code>.git/refs/heads/name-of-the-branch</code>
|
|
and usually contain a hash like
|
|
<span id="example-reference-branch-hash">0123456789abcdef0123456789abcdef01234567</span>.</p>
|
|
|
|
<p>Tags are identical to branches in terms of representation. It seems that the only difference
|
|
between tags and branches is the behaviour of <code>git checkout</code> and similar commands.
|
|
These commands, as explained in <a href="#git-checkout">the section about <code>git checkout</code></a> below,
|
|
normally write <code>ref: refs/heads/name-of-branch</code> in <code>.git/HEAD</code> when
|
|
checking out a branch, but write the hash of the target commit when checking out a tag or
|
|
any other non-branch reference.</p>
|
|
|
|
<div id="example-reference"></div>
|
|
<script class="example">
|
|
___example('example-reference', function() {
|
|
var h2f = function(hash) { return 'proj/.git/objects/'+hash.substring(0,2)+'/'+hash.substring(2); }
|
|
var main = h2f(hash_object(true, 'blob', false, 'src/main.scm'));
|
|
var readme = h2f(hash_object(true, 'blob', false, 'README'));
|
|
var src = h2f(store_tree("src", ["main.scm"], []));
|
|
var proj = h2f(paths_to_tree(["README", "src/main.scm"]));
|
|
|
|
var initial_commit_hash = store_commit(
|
|
paths_to_tree(["README", "src/main.scm"]),
|
|
[],
|
|
{name:'Ada Lovelace', email:'ada@analyti.cal', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
{name:'Ada Lovelace', email:'ada@analyti.cal', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
'Initial commit');
|
|
var initial_commit = h2f(initial_commit_hash);
|
|
|
|
git_branch('main', initial_commit_hash, true);
|
|
var main_branch = 'proj/.git/refs/heads/main';
|
|
|
|
git_tag('v1.0', initial_commit_hash, true);
|
|
var v1_0_tag = 'proj/.git/refs/tags/v1.0';
|
|
|
|
git_init_head();
|
|
var head = 'proj/.git/HEAD';
|
|
|
|
document.getElementById('example-reference-head-hash').innerText = initial_commit_hash;
|
|
document.getElementById('example-reference-branch-hash').innerText = initial_commit_hash;
|
|
|
|
var previous_names = [ main, readme, src, proj, initial_commit ];
|
|
var names = [ main, readme, src, proj, initial_commit, main_branch, v1_0_tag, head ];
|
|
return { filesystem: filesystem, names: names, previous_names: previous_names }
|
|
});
|
|
</script>
|
|
|
|
<p>We'll start with a small utility to remove the newline at the end of a string.
|
|
GIT references are usually files containing a hexadecimal hash, and following
|
|
*NIX tradition these files finish with a newline byte. When reading these
|
|
references, we need to get rid of the newline first.</p>
|
|
|
|
<textarea>
|
|
// Removes the newline at the end of a string, if present.
|
|
function trim_newline(s) {
|
|
return (s[s.length-1] == '\n') ? s.substring(0, s.length-1) : s;
|
|
}
|
|
</textarea>
|
|
|
|
<section id="git-symbolic-ref">
|
|
<h1><code>git symbolic-ref</code></h1>
|
|
<p><code>git symbolic-ref</code> is a low-level command which reads
|
|
(and in the official GIT implementation also writes and updates)
|
|
symbolic references given a path relative to <code>.git/</code>.
|
|
For example, <code>git symbolic-ref HEAD</code> will read the
|
|
contents of the file <code>.git/HEAD</code>, and if that file starts
|
|
with <code>ref: </code>, the rest of the line will be returned.</p>
|
|
|
|
<textarea>
|
|
function git_symbolic_ref(ref) {
|
|
var ref_file = join_paths(current_directory, '.git/' + ref);
|
|
if (exists(ref_file) && read(ref_file).startsWith('ref: ')) {
|
|
var result = trim_newline(read(ref_file)).substring('ref: '.length);
|
|
var recursive = git_symbolic_ref(result);
|
|
return recursive || result;
|
|
} else {
|
|
return false;
|
|
}
|
|
}
|
|
</textarea>
|
|
<div class="trivia">
|
|
<p>The official implementation of GIT follows references recursively
|
|
and returns the <code>path/to/file</code> of the last file of the
|
|
form <code>ref: path/to/file</code>. In the example below,
|
|
<code>git symbolic-ref HEAD</code> would
|
|
<ul>
|
|
<li>read the file <code>proj/.git/HEAD</code> which contains <code>ref: refs/heads/main</code>,</li>
|
|
<li>follow that indirection and read the file <code>proj/.git/refs/heads/main</code> which contains <code>ref: refs/heads/other</code></li>
|
|
<li>follow that indirection and read the file <code>proj/.git/refs/heads/other</code> which contains a hash</li>
|
|
<li>return the last file path that contained a <code>ref:</code>, i.e. return the string <code>refs/heads/other</code></li>
|
|
</ul>
|
|
<div id="example-recursive-ref"></div>
|
|
<script class="example">
|
|
___example('example-recursive-ref', function() {
|
|
var h2f = function(hash) { return 'proj/.git/objects/'+hash.substring(0,2)+'/'+hash.substring(2); }
|
|
var main = h2f(hash_object(true, 'blob', false, 'src/main.scm'));
|
|
var readme = h2f(hash_object(true, 'blob', false, 'README'));
|
|
var src = h2f(store_tree("src", ["main.scm"], []));
|
|
var proj = h2f(paths_to_tree(["README", "src/main.scm"]));
|
|
|
|
var initial_commit_hash = store_commit(
|
|
paths_to_tree(["README", "src/main.scm"]),
|
|
[],
|
|
{name:'Ada', email:'ada@...', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
{name:'Ada', email:'ada@...', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
'Initial commit');
|
|
var initial_commit = h2f(initial_commit_hash);
|
|
|
|
write('proj/.git/refs/heads/main', 'ref: refs/heads/other\n');
|
|
var main_branch = 'proj/.git/refs/heads/main';
|
|
|
|
git_branch('other', initial_commit_hash, true);
|
|
var other_branch = 'proj/.git/refs/heads/other';
|
|
|
|
git_init_head();
|
|
var head = 'proj/.git/HEAD';
|
|
|
|
document.getElementById('example-reference-head-hash').innerText = initial_commit_hash;
|
|
document.getElementById('example-reference-branch-hash').innerText = initial_commit_hash;
|
|
|
|
var previous_names = [ initial_commit ];
|
|
var names = [ initial_commit, main_branch, other_branch, head ];
|
|
return { filesystem: filesystem, names: names, previous_names: previous_names }
|
|
});
|
|
</script>
|
|
</div>
|
|
</section>
|
|
|
|
<section id="git-rev-parse">
|
|
<h1><code>git rev-parse</code></h1>
|
|
<p><code>git rev-parse</code> is another low-level command. It takes a symbolic reference or other reference,
|
|
and returns the hash. The difference with <code>git symbolic-ref</code> is that <code>symbolic-ref</code> follows indirections
|
|
to other references, and returns the last named reference in the chain of indirections, whereas <code>rev-parse</code>
|
|
goes one step further and returns the hash pointed to by the last named reference.</p>
|
|
<textarea>
|
|
function follow_ref(path) {
|
|
return git_rev_parse(trim_newline(read(join_paths(current_directory, path))));
|
|
}
|
|
function git_rev_parse(ref) {
|
|
var symbolic_ref_target = git_symbolic_ref(ref);
|
|
if (symbolic_ref_target) {
|
|
// symbolic ref like "ref: refs/heads/main"
|
|
return git_rev_parse(symbolic_ref_target);
|
|
} else if (/^[0-9a-f]{40}$/.test(ref)) {
|
|
// hash like "0123456789abcdef0123456789abcdef01234567"
|
|
return ref;
|
|
} else if (ref == 'HEAD') {
|
|
// user-friendly reference like "HEAD"
|
|
return follow_ref('.git/' + ref);
|
|
} else if (ref.startsWith('refs/')
|
|
&& exists(join_paths(current_directory, '.git/' + ref))) {
|
|
// user-friendly reference like "refs/heads/main"
|
|
return follow_ref('.git/' + ref);
|
|
} else if (exists(join_paths(current_directory, '.git/refs/heads/' + ref))) {
|
|
// user-friendly reference like "main" (a branch)
|
|
return follow_ref('.git/refs/heads/' + ref);
|
|
} else if (exists(join_paths(current_directory, '.git/refs/tags/' + ref))) {
|
|
// user-friendly reference like "v1.0" (a tag)
|
|
return follow_ref('.git/refs/tags/' + ref);
|
|
} else {
|
|
// unknown ref
|
|
return false;
|
|
}
|
|
}
|
|
</textarea>
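<p>To see the chain of indirections at work without the rest of the tutorial's machinery, here is a minimal, self-contained sketch. The <code>mockGit</code> object and the <code>resolveRef</code> function are hypothetical names introduced for this illustration only: they mimic the files under <code>.git/</code> with a plain object, and handle just symbolic references and hashes, not the short names such as <code>main</code> that the function above supports.</p>
<pre>
```javascript
// Hypothetical in-memory stand-in for a few files under .git/
// (this is not the tutorial's own filesystem object).
var mockGit = {
  'HEAD': 'ref: refs/heads/main\n',
  'refs/heads/main': '0123456789abcdef0123456789abcdef01234567\n'
};

// Follow "ref: ..." indirections until a 40-character hash is reached.
// Returns false for unknown references or malformed contents.
function resolveRef(name) {
  var contents = mockGit[name];
  if (contents === undefined) { return false; }
  contents = contents.replace(/\n$/, '');
  if (contents.indexOf('ref: ') === 0) {
    return resolveRef(contents.substring('ref: '.length));
  }
  return /^[0-9a-f]{40}$/.test(contents) ? contents : false;
}

console.log(resolveRef('HEAD'));
console.log(resolveRef('refs/heads/missing'));
```
</pre>
<p>Resolving <code>HEAD</code> takes two steps here: one to reach <code>refs/heads/main</code>, and one to read the hash stored there.</p>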
|
|
</section>
|
|
</section>
|
|
|
|
<section id="git-branch">
|
|
<h1><code>git branch</code></h1>
|
|
|
|
<p>A branch is a pointer to a commit, stored in a file in <code>.git/refs/heads/name_of_the_branch</code>.
|
|
The branch can be overwritten with <code>git branch -f</code>. Also, as will be explained later,
|
|
<code>git commit</code> can update the pointer of a branch.</p>
|
|
|
|
<textarea id="in14">
|
|
function git_branch(branch_name, commit_ref, force) {
|
|
var commit_hash = git_rev_parse(commit_ref);
|
|
mkdir(join_paths(current_directory, '.git/refs'));
|
|
mkdir(join_paths(current_directory, '.git/refs/heads'));
|
|
var branch_path = '.git/refs/heads/' + branch_name;
|
|
var full_branch_path = join_paths(current_directory, branch_path);
|
|
if (!force && exists(full_branch_path)) {
|
|
alert("branch already exists");
|
|
return false;
|
|
} else {
|
|
write(full_branch_path, commit_hash + '\n');
|
|
return true;
|
|
}
|
|
}
|
|
</textarea>
|
|
|
|
<p>When we call <code>git branch main HEAD</code> or equivalently
|
|
<code>git branch main <span id="example-git-branch-head-hash">0123456789012345678901234567890123456789</span></code>,
|
|
a file containing that hash is created in <code>.git/refs/heads/main</code>. This file is the branch itself: it acts as a pointer
to a commit, and this pointer can be read e.g. by <code>git rev-parse</code>.</p>
|
|
|
|
<div id="example-git-branch"></div>
|
|
<script class="example">
|
|
___example('example-git-branch', function() {
|
|
var h2f = function(hash) { return 'proj/.git/objects/'+hash.substring(0,2)+'/'+hash.substring(2); }
|
|
var main = h2f(hash_object(true, 'blob', false, 'src/main.scm'));
|
|
var readme = h2f(hash_object(true, 'blob', false, 'README'));
|
|
var src = h2f(store_tree("src", ["main.scm"], []));
|
|
var proj = h2f(paths_to_tree(["README", "src/main.scm"]));
|
|
|
|
var initial_commit_hash = store_commit(
|
|
paths_to_tree(["README", "src/main.scm"]),
|
|
[],
|
|
{name:'Ada', email:'ada@...', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
{name:'Ada', email:'ada@...', date:new Date(1617120803000), timezoneMinutes: +60},
|
|
'Initial commit');
|
|
var initial_commit = h2f(initial_commit_hash);
|
|
|
|
git_branch('main', initial_commit_hash, true);
|
|
var main_branch = 'proj/.git/refs/heads/main';
|
|
|
|
//git_init_head();
|
|
//var head = 'proj/.git/HEAD';
|
|
|
|
document.getElementById('example-git-branch-head-hash').innerText = initial_commit_hash;
|
|
|
|
var previous_names = [ main, readme, src, proj, initial_commit ];
|
|
var names = [ main, readme, src, proj, initial_commit, main_branch ];
|
|
return { filesystem: filesystem, names: names, previous_names: previous_names }
|
|
});
|
|
</script>
|
|
|
|
<p>After creating the branch, we show how the file <code>.git/refs/heads/main</code> can be overwritten
|
|
using <code>git branch -f</code>.</p>
|
|
|
|
<textarea id="inex14">
|
|
// git branch main 0123456789012345678901234567890123456789
|
|
git_branch('main', initial_commit, false);
|
|
|
|
// git branch -f main 0123456789012345678901234567890123456789
|
|
git_branch('main', initial_commit, true);
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="HEAD">
|
|
<h1><code>HEAD</code></h1>
|
|
<p>
|
|
The <code>HEAD</code> indicates the "current" commit. It is first set as part of the <code>git init</code> routine.
|
|
</p>
|
|
<textarea id="in15">
|
|
function git_init_head() {
|
|
write(join_paths(current_directory, '.git/HEAD'), 'ref: refs/heads/main\n');
|
|
}
|
|
|
|
git_init_head();
|
|
</textarea>
|
|
|
|
<p>
|
|
Usually, the <code>HEAD</code> is a symbolic reference to a branch, i.e. the
|
|
file <code>.git/HEAD</code> contains <code>ref: refs/heads/name-of-branch</code>.
|
|
When checking out a commit by specifying its hash directly, or when checking out
|
|
a non-branch reference, the file <code>.git/HEAD</code> contains the hash of the
|
|
commit instead.
|
|
</p>
|
|
|
|
<p>
|
|
The state in which <code>.git/HEAD</code> contains a commit hash is called
|
|
"detached HEAD", and often sounds alarming to people who have not encountered this
|
|
before. As we will see in the following sections, the only difference between detached
|
|
HEAD and the normal state is that <code>git commit</code> updates the branch to point
|
|
to the new commit in the normal mode of operation. When the <code>HEAD</code> is detached,
|
|
it does not point to a specific branch, and <code>git commit</code> updates the HEAD
|
|
directly instead, overwriting it with the new commit hash.
|
|
</p>
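<p>The difference between the two states can be previewed with a small sketch. It assumes a hypothetical <code>files</code> object standing in for the contents of <code>.git/</code>, and a hypothetical <code>recordCommit</code> function which only performs the final pointer update of a commit (it does not create any commit object).</p>
<pre>
```javascript
// Hypothetical helper: return a copy of `files` (a stand-in for .git/)
// as it would look after creating a commit whose hash is `newHash`.
function recordCommit(files, newHash) {
  var updated = Object.assign({}, files);
  var head = files['HEAD'].replace(/\n$/, '');
  if (head.indexOf('ref: ') === 0) {
    // Normal state: HEAD names a branch, so the branch file receives the new hash.
    updated[head.substring('ref: '.length)] = newHash + '\n';
  } else {
    // Detached HEAD: the HEAD file itself receives the new hash.
    updated['HEAD'] = newHash + '\n';
  }
  return updated;
}

// Made-up hashes, for illustration only.
var oldHash = '0123456789abcdef0123456789abcdef01234567';
var newHash = '89abcdef0123456789abcdef0123456789abcdef';

var attached = recordCommit(
  { 'HEAD': 'ref: refs/heads/main\n', 'refs/heads/main': oldHash + '\n' }, newHash);
var detached = recordCommit(
  { 'HEAD': oldHash + '\n', 'refs/heads/main': oldHash + '\n' }, newHash);

console.log(JSON.stringify(attached));
console.log(JSON.stringify(detached));
```
</pre>
<p>In the first case only <code>refs/heads/main</code> changes; in the second case only <code>HEAD</code> changes, and no branch remembers the new hash.</p>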
|
|
|
|
<p>
|
|
Since the HEAD is supposed to be a transient pointer, it is easy to lose track of the hash of
|
|
an important commit. For example, the following sequence of operations:
|
|
|
|
<pre>
|
|
git checkout 0123456789abcdef0123456789abcdef01234567
|
|
|
|
touch new_file
|
|
git add new_file
|
|
git commit -m 'This is a commit adding a new file'
|
|
|
|
git checkout branch-of-feature-foobar
|
|
</pre>
|
|
|
|
roughly means:
|
|
|
|
<pre>
|
|
HEAD = 0123456789abcdef0123456789abcdef01234567
|
|
// overwrite the contents of the working tree with
|
|
// the contents of commit 0123456789abcdef0123456789abcdef01234567
|
|
checkout(0123456789abcdef0123456789abcdef01234567)
|
|
|
|
// create commit with the new file:
|
|
HEAD = commit(…)
|
|
|
|
// Checkout other branch
|
|
HEAD = git_rev_parse('branch-of-feature-foobar')
|
|
</pre>
|
|
</p>
|
|
|
|
<p>
|
|
The hash of the new commit which is stored in HEAD on the second step is overwritten
|
|
in the third step. In order to later retrieve that specific version with the precious
|
|
new_file, one needs that hash. It would be possible to note down these hashes in a
|
|
simple text file, but GIT offers a mechanism for that: branches. After all, branches are
|
|
merely named text files containing the hash of the latest commit in that line of work.
|
|
</p>
|
|
|
|
<p>
|
|
The hash of a commit created with <code>git commit</code> does not only exist in the
|
|
HEAD file (when in detached HEAD) or in the current branch file (normal mode). The official
|
|
implementation of GIT keeps a log of the changes being made to the various references.
|
|
<code>.git/logs/HEAD</code> contains a log of the hashes pointed to by <code>.git/HEAD</code>,
|
|
and <code>.git/logs/refs/heads/main</code> contains a log of the hashes pointed to by
|
|
<code>.git/refs/heads/main</code>, and the commands <code>git reflog</code> and
|
|
<code>git reflog main</code> pretty-print these files.
|
|
</p>
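<p>In the official implementation, each line of these log files records the previous hash, the new hash, the identity and date of whoever made the change, then a tab character and a short message. The sketch below parses one such line. The <code>parseReflogLine</code> name and the sample line are made up for this illustration, and the pattern is an assumption based on that layout rather than a complete parser.</p>
<pre>
```javascript
// Hypothetical helper: split one reflog line into its fields.
// Returns null when the line does not follow the assumed layout.
function parseReflogLine(line) {
  var m = line.match(/^([0-9a-f]{40}) ([0-9a-f]{40}) (.*) <(.*)> ([0-9]+) ([+-][0-9]{4})\t(.*)$/);
  if (!m) { return null; }
  return {
    oldHash: m[1], newHash: m[2], name: m[3], email: m[4],
    timestamp: parseInt(m[5], 10), timezone: m[6], message: m[7]
  };
}

// An all-zero hash stands for "no previous value" (the reference was just created).
var zeroHash = '0'.repeat(40);
var sample = zeroHash + ' 0123456789abcdef0123456789abcdef01234567'
  + ' Ada Lovelace <' + 'ada@analyti.cal' + '> 1617120803 +0100'
  + '\tcommit (initial): Initial commit';

var entry = parseReflogLine(sample);
console.log(entry.newHash, entry.message);
```
</pre>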
|
|
|
|
<p>
|
|
There are a few more ways to find a lost commit hash, including a careful invocation of
|
|
<code>git fsck</code> which checks that the files stored in <code>.git/</code> are not
|
|
corrupted, and that no reference (to another reference or a commit, tree or blob) points
|
|
to a non-existing file. The <code>--unreachable</code> option of <code>git fsck</code> tells this command
|
|
to print all object hashes which are not pointed to indirectly by any named reference
|
|
(so-called unreachable objects, which are well-formed but are not indirectly linked to
|
|
from a branch or other kind of named pointer).
|
|
</p>
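<p>The notion of an unreachable commit can be sketched as a graph traversal. The example below is a simplified model, not the algorithm of <code>git fsck</code>: it uses made-up short hashes, a hypothetical <code>unreachableCommits</code> function, and considers only commits and the references passed to it (no trees, blobs, reflogs or index).</p>
<pre>
```javascript
// Hypothetical commit graph: each (made-up, shortened) hash maps to its parents.
var commits = { a1: [], b2: ['a1'], c3: ['a1'] };
// Named references and the commit each one points to.
var refs = { 'HEAD': 'b2', 'refs/heads/main': 'b2' };

// Walk the parents of every referenced commit, then report the commits never visited.
function unreachableCommits(commits, refs) {
  var seen = {};
  var todo = Object.keys(refs).map(function (name) { return refs[name]; });
  while (todo.length > 0) {
    var hash = todo.pop();
    if (seen[hash]) { continue; }
    seen[hash] = true;
    todo = todo.concat(commits[hash] || []);
  }
  return Object.keys(commits).filter(function (hash) { return !seen[hash]; });
}

console.log(unreachableCommits(commits, refs));
```
</pre>
<p>Here <code>c3</code> plays the role of the commit made on a detached HEAD: it is well-formed and has a parent, but nothing points to it any more.</p>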
|
|
|
|
<p>
|
|
The reflog can be used to recover a lost hash but handling hashes manually like this is
|
|
somewhat error-prone, and most new users are not aware of those features; for this reason
|
|
GIT commands tend to display a warning when switching to a detached HEAD state.
|
|
</p>
|
|
</section>
|
|
|
|
|
|
<section id="git-config">
|
|
<h1><code>git config</code></h1>
|
|
<p>
|
|
The official implementation of GIT stores the settings in various files (<code>.git/config</code> within a repository,
|
|
<code>~/.gitconfig</code> in the user's home folder, and several other places).
|
|
</p>
|
|
<textarea id="in16">
|
|
var gitconfig = {
|
|
user: {
|
|
name: 'Ada Lovelace',
|
|
email: 'ada@analyti.cal',
|
|
}
|
|
};
|
|
var $EDITOR = function() { return window.prompt('Commit message:'); }
|
|
</textarea>
|
|
<p>
|
|
These files use a <code>.ini</code> syntax
|
|
with <code>key = value</code> lines grouped under some <code>[section]</code> headings. The configuration above could be
|
|
stored in <code>~/.gitconfig</code> or <code>.git/config</code> using the following syntax:
|
|
</p>
|
|
<pre>
|
|
[user]
|
|
name = Ada Lovelace
|
|
email = ada@analyti.cal
|
|
</pre>
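<p>As an illustration of how this syntax maps to the <code>gitconfig</code> object used in this tutorial, here is a rough sketch of a reader for it. The <code>parseSimpleIni</code> function is a hypothetical helper: it only understands plain <code>[section]</code> headings and <code>key = value</code> lines, and ignores comments, quoting, subsections and the other features of the real configuration format.</p>
<pre>
```javascript
// Hypothetical helper: read "[section]" headings and "key = value" lines
// into a nested object. Anything else is silently skipped.
function parseSimpleIni(text) {
  var config = {};
  var section = null;
  text.split('\n').forEach(function (rawLine) {
    var line = rawLine.trim();
    var heading = line.match(/^\[([A-Za-z0-9.-]+)\]$/);
    var pair = line.match(/^([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*)$/);
    if (heading) {
      section = heading[1];
      config[section] = config[section] || {};
    } else if (pair !== null) {
      if (section !== null) { config[section][pair[1]] = pair[2]; }
    }
  });
  return config;
}

var parsed = parseSimpleIni('[user]\n    name = Ada Lovelace\n    email = ada@analyti.cal\n');
console.log(JSON.stringify(parsed));
```
</pre>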
|
|
<p>
|
|
The <code>$EDITOR</code> variable is a traditional *NIX environment variable, and could e.g. be declared with
|
|
<code>EDITOR=nano</code> in <code>~/.profile</code> or <code>~/.bashrc</code>.
|
|
</p>
|
|
</section>
|
|
|
|
<section id="git-commit">
|
|
<h1><code>git commit</code></h1>
|
|
|
|
<p>
|
|
The <code>git commit</code> command stores a commit (metadata and a pointer to a tree
|
|
containing the files given on the command-line), and updates the <code>HEAD</code> or
|
|
current branch to point to the new commit.
|
|
</p>
|
|
<textarea>
|
|
function git_commit(file_paths, message) {
|
|
var now = new Date();
|
|
var timestamp = (+now)/1000;
|
|
var timezoneMinutes = -(now.getTimezoneOffset());
|
|
|
|
var parent = git_rev_parse('HEAD');
|
|
var parents = parent ? [parent] : [];
|
|
|
|
var new_commit_hash = store_commit(
|
|
paths_to_tree(file_paths),
|
|
parents,
|
|
{name:gitconfig.user.name, email:gitconfig.user.email, date:now, timezoneMinutes:timezoneMinutes },
|
|
{name:gitconfig.user.name, email:gitconfig.user.email, date:now, timezoneMinutes:timezoneMinutes },
|
|
message || $EDITOR());
|
|
|
|
advance_head_or_branch(new_commit_hash);
|
|
|
|
return new_commit_hash;
|
|
}
|
|
</textarea>
|
|
|
|
<p>If the <code>HEAD</code> points to a commit hash, then <code>git commit</code> updates the <code>HEAD</code> to point to the new commit.
|
|
Otherwise, when the <code>HEAD</code> points to a branch, then the target branch (represented by a file named <code>.git/refs/heads/the_branch_name</code>) is updated.</p>
|
|
|
|
<textarea>
|
|
function advance_head_or_branch(new_commit_hash) {
|
|
var referenced_branch = git_symbolic_ref('HEAD');
|
|
if (referenced_branch) {
|
|
// Update the target of the ref:
|
|
write(join_paths(current_directory, '.git/' + referenced_branch), new_commit_hash + '\n');
|
|
} else {
|
|
// Detached HEAD, update .git/HEAD directly.
|
|
write(join_paths(current_directory, '.git/HEAD'), new_commit_hash + '\n');
|
|
}
|
|
}
|
|
</textarea>
|
|
|
|
<p>
|
|
The official implementation of <code>git commit</code> makes use of <a href="#index">the index</a>.
|
|
When a file is scheduled for the next commit using <code>git add path/to/file</code>, it is added to
|
|
the index. The index is a representation of a collection of copies of files, which can efficiently be
|
|
compared to the working tree. It uses a different representation, but its role is very similar
|
|
to that of a tree object along with the subtrees and blob objects of individual files. When
|
|
<code>git commit</code> is called without specifying any files, it creates a commit containing the
|
|
version of the files stored in the index.
|
|
</p>
|
|
<p>
|
|
In this simplified implementation, we only support creating commits by specifying all the files that
|
|
must be present in the commit (including unchanged files). This contrasts with the official implementation
|
|
which would create a tree containing the files from the current HEAD, as well as the added, modified or
|
|
deleted files specified by <code>git add</code> or specified directly on the <code>git commit</code>
|
|
command-line.
|
|
</p>
|
|
|
|
<textarea>
|
|
write('proj/README', 'This is my Scheme project -- with updates!');
|
|
var second_commit = git_commit(['README', 'src/main.scm'], 'Some updates');
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-tag">
|
|
<h1><code>git tag</code></h1>
|
|
<p>Tags behave like branches, but are stored in <code>.git/refs/tags/the_tag_name</code>
|
|
and a tag is not normally modified. Once created, it's supposed to always point
|
|
to the same version.</p>
|
|
<p>GIT does offer a <code>git tag -f existing-tag new-hash</code> command,
|
|
but using it should be a rare occurrence.</p>
|
|
<textarea id="in17">
|
|
function git_tag(tag_name, commit_hash, force) {
|
|
mkdir(join_paths(current_directory, '.git/refs'));
|
|
mkdir(join_paths(current_directory, '.git/refs/tags'));
|
|
if (!force && exists(join_paths(current_directory, '.git/refs/tags/' + tag_name))) {
|
|
alert("tag already exists");
|
|
return false;
|
|
} else {
|
|
write(join_paths(current_directory, '.git/refs/tags/' + tag_name), commit_hash + '\n');
|
|
return true;
|
|
}
|
|
}
|
|
</textarea>
|
|
<p>Intuitively, tags differ from branches in the following way: when a branch is checked out
and a subsequent commit is made, the branch is updated to point to the new commit's hash, whereas a tag is left untouched.
|
|
As we've seen in the implementation of <code>git commit</code>, the difference is actually
|
|
in the contents of the <code>.git/HEAD</code> file. If it is a symbolic reference (generally
|
|
a pointer to a branch), then the target of that reference is updated every time a new commit
|
|
is created. If the <code>.git/HEAD</code> file contains the hash of a commit, then the
|
|
<code>.git/HEAD</code> file itself is updated every time a new commit is created.
|
|
</p>
|
|
<p>
|
|
Therefore, tags and branches differ only in their usage and in the path under which they are
|
|
stored (<code>.git/refs/heads/name-of-the-branch</code> vs. <code>.git/refs/tags/name-of-the-tag</code>).
|
|
The file <code>.git/HEAD</code> is overwritten by <code>git commit</code> and <code>git checkout</code>.
|
|
It is the latter command which will behave differently for tags and branches; <code>git checkout branch-name</code>
|
|
turns the HEAD into a symbolic reference, whereas <code>git checkout tag-name</code> resolves the tag name to
|
|
a commit hash, and writes that hash directly into <code>.git/HEAD</code>.
|
|
</p>
|
|
<textarea id="inex17">
|
|
// git tag v1.0 0123456789012345678901234567890123456789
|
|
git_tag('v1.0', second_commit);
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="git-checkout">
|
|
<h1><code>git checkout</code></h1>
|
|
<p>
|
|
The <code>git checkout commit-hash-or-reference</code> command modifies the HEAD to point to the given commit,
|
|
and modifies the working tree to match the contents of the tree object pointed to by that commit.
|
|
</p>
|
|
<textarea id="in18">
|
|
function git_checkout(tag_or_branch_or_hash) {
|
|
if (exists(join_paths(current_directory, '.git/refs/heads/' + tag_or_branch_or_hash))) {
|
|
// Normal (attached) HEAD, points to 'ref: refs/heads/the_branch_name'
|
|
write(join_paths(current_directory, '.git/HEAD'), 'ref: refs/heads/' + tag_or_branch_or_hash + '\n');
|
|
} else {
|
|
// Detached HEAD, points directly to commit hash
|
|
write(join_paths(current_directory, '.git/HEAD'), git_rev_parse(tag_or_branch_or_hash) + '\n');
|
|
}
|
|
checkout_files(git_rev_parse('HEAD'));
|
|
}
|
|
</textarea>
|
|
<section id="checkout-branch-vs-other">
|
|
<h1>Checkout, branches and other references</h1>
|
|
<p>The HEAD does not normally point to a tag. Although nothing actually
|
|
prevents writing <code>ref: refs/tags/v1.0</code> into <code>.git/HEAD</code>, the GIT
|
|
commands will not automatically do this. For example, <code>git checkout tag-or-branch-or-hash</code>
|
|
will put a symbolic <code>ref: </code> in <code>.git/HEAD</code> only if the argument is a branch.</p>
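<p>The rule can be summarized by the following sketch, which computes what would be written to <code>.git/HEAD</code>. The <code>headAfterCheckout</code> function and the <code>refs</code> object are hypothetical stand-ins for this illustration; anything which is neither a known branch nor a known tag is assumed to already be a commit hash.</p>
<pre>
```javascript
// Hypothetical helper: contents of .git/HEAD after checking out `target`.
// `refs.heads` and `refs.tags` map names to (made-up) commit hashes.
function headAfterCheckout(target, refs) {
  if (Object.prototype.hasOwnProperty.call(refs.heads, target)) {
    // A branch: HEAD becomes a symbolic reference to it.
    return 'ref: refs/heads/' + target + '\n';
  } else if (Object.prototype.hasOwnProperty.call(refs.tags, target)) {
    // A tag: HEAD receives the hash the tag points to (detached HEAD).
    return refs.tags[target] + '\n';
  } else {
    // Assumed to be a hash already (detached HEAD).
    return target + '\n';
  }
}

var refs = {
  heads: { main: '0123456789abcdef0123456789abcdef01234567' },
  tags: { 'v1.0': '0123456789abcdef0123456789abcdef01234567' }
};
console.log(JSON.stringify(headAfterCheckout('main', refs)));
console.log(JSON.stringify(headAfterCheckout('v1.0', refs)));
```
</pre>
<p>Although <code>main</code> and <code>v1.0</code> point to the same commit in this example, only the branch leaves a symbolic reference in the HEAD.</p>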
|
|
</section>
|
|
|
|
<section id="checkout-files">
|
|
<h1>Checking out files</h1>
|
|
<p>
|
|
In order to replace the contents of the working tree with those of the given commit, we
|
|
recursively compare the subtrees, deleting from the working tree the files or directories
|
|
that are not present in the tree object, and overwriting the others.
|
|
</p>
|
|
<p>
|
|
The official implementation of GIT will record the diff between the current working tree
|
|
and the current commit, and will re-apply these changes on top of the freshly checked-out commit.
|
|
The official <code>git checkout</code> command will print warnings and refuse to proceed when
|
|
these changes cannot be re-applied without conflict, encouraging the user to create a commit
|
|
containing this updated version or to stash the changes (effectively creating a temporary commit
|
|
containing this version, pointed to by <code>.git/refs/stash</code>). Our simple implementation
|
|
will always overwrite the changes.
|
|
</p>
|
|
<textarea>
|
|
function checkout_files(hash) {
|
|
var commit = parse_commit(hash);
|
|
checkout_tree(current_directory, commit.tree);
|
|
}
|
|
|
|
function checkout_tree(path_prefix, hash) {
|
|
var entries = parse_tree(hash);
|
|
var entries_names = entries.map(function (entry) { return entry.name; });
|
|
|
|
var working_directory_contents = listdir(path_prefix);
|
|
|
|
for (var i = 0; i < working_directory_contents.length; i++) {
|
|
if (entries_names.indexOf(working_directory_contents[i]) == -1
|
|
&& working_directory_contents[i] != '.git') {
|
|
// The file or directory exists in the working tree, but
|
|
// not in the commit that is being checked out, remove it recursively.
|
|
remove(join_paths(path_prefix, working_directory_contents[i]), true);
|
|
}
|
|
}
|
|
|
|
for (var i = 0; i < entries.length; i++) {
|
|
var o = parse_object(entries[i].hash);
|
|
var entry_path = join_paths(path_prefix, entries[i].name);
|
|
if (o.type == 'blob') {
|
|
write(entry_path, o.contents);
|
|
} else {
|
|
checkout_tree(entry_path, entries[i].hash)
|
|
}
|
|
}
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="parse-assert">
|
|
<h1>Assert</h1>
|
|
<p>
|
|
The <code>checkout_tree()</code> function needs to read the commit, tree and blob objects from the
|
|
<code>.git/</code> folder. The following sections will introduce some parsers for these objects.
|
|
The parsers will check that their input looks reasonably well-formed, using <code>assert()</code>.</p>
|
|
<textarea>
|
|
function assert(boolean, text) {
|
|
if (! boolean) { alert("GIT: assertion failed: " + text);
|
|
throw new Error("GIT: assertion failed: " + text); }
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="parsed-compressed">
|
|
<h1>Reading compressed objects</h1>
|
|
<p>The GIT objects which are stored in <code>.git/objects</code> are compressed with <code>zlib</code>, and need to be
|
|
uncompressed before they can be parsed. The actual implementation of GIT also stores some objects in <em>packs</em>. Packs
|
|
contain a large number of objects, and use a form of delta compression, which effectively stores objects as the diff with
|
|
another similar object, in order to optimize the disk space usage.</p>
|
|
<p>Our simplified implementation only deals with zlib-compressed objects, and cannot read from pack files. The function below
|
|
extracts the type and length, which form the header present in all objects, and returns those along with the contents of the
|
|
object.
|
|
</p>
|
|
<textarea>
|
|
function parse_object(hash) {
|
|
var compressed = read(join_paths(current_directory, '.git/objects/' + hash.substring(0,2) + '/' + hash.substring(2)));
|
|
var inflated = inflate(compressed);
|
|
var split = inflated.match(/^([\s\S]*?) ([\s\S]*?)\0([\s\S]*)$/);
|
|
|
|
assert(split, "ill-formed object");
|
|
var type = split[1];
|
|
var length = split[2];
|
|
var contents = split[3];
|
|
assert(contents.length == length, "object has incorrect length");
|
|
|
|
return { type: type, length: length, contents: contents };
|
|
}
|
|
</textarea>
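<p>Outside the browser, the same round trip can be tried with the <code>zlib</code> module bundled with Node.js, instead of the library used by this page. This is only a sketch under that assumption: it builds the uncompressed form of a small blob, compresses it, then inflates it and splits the header from the contents in the same spirit as <code>parse_object</code>.</p>
<pre>
```javascript
// Assumption: this runs under Node.js, whose standard library includes zlib.
var zlib = require('zlib');

// Uncompressed form of a blob: the type, a space, the length, a null byte, the contents.
var contents = 'hello\n';
var raw = Buffer.from('blob ' + contents.length + '\0' + contents, 'latin1');
var compressed = zlib.deflateSync(raw);

// Reverse the process: inflate, then split the header from the contents.
var inflated = zlib.inflateSync(compressed).toString('latin1');
var headerEnd = inflated.indexOf('\0');
var header = inflated.substring(0, headerEnd).split(' ');
var parsed = {
  type: header[0],
  length: parseInt(header[1], 10),
  contents: inflated.substring(headerEnd + 1)
};

console.log(JSON.stringify(parsed));
```
</pre>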
|
|
</section>
|
|
|
|
<section id="parse-tree">
|
|
<h1>Parsing tree objects</h1>
|
|
<p>We will start by parsing tree objects. As a reminder, a tree object has the following form:</p>
|
|
<div id="example-tree-objects-parse"></div>
|
|
<script class="example">
|
|
___example('example-tree-objects-parse', function() {
|
|
var main = ___h2f(hash_object(true, 'blob', false, 'src/main.scm'));
|
|
var readme = ___h2f(hash_object(true, 'blob', false, 'README'));
|
|
var src = ___h2f(store_tree("src", ["main.scm"], []));
|
|
var proj = ___h2f(paths_to_tree(["README", "src/main.scm"]));
|
|
var previous_names = [ ];
|
|
var names = [ proj ];
|
|
return { filesystem: filesystem, names: names, previous_names: previous_names, omit_graph: true };
|
|
});
|
|
</script>
|
|
<p>
|
|
After the object header, each entry consists of a mode, a space, a filename, a null byte and a hash of 20 raw bytes.
The null byte cannot appear in the mode or filename, so we use it as the delimiter which precedes the hash
(searching for the first null byte of the entry ensures the terminator is not confused with a <code>00</code> byte in the hash).
|
|
</p>
|
|
<textarea>
|
|
function parse_tree(hash) {
|
|
var tree = parse_object(hash);
|
|
var i = 0;
|
|
var entries = [];
|
|
while (i < tree.contents.length) {
|
|
// skip to the null terminator
|
|
var space_offset = tree.contents.indexOf(' ', i);
|
|
var null_offset = tree.contents.indexOf('\0', i);
|
|
|
|
// add 20 bytes for the hash that follows, and check the object isn't shorter than that
|
|
if (space_offset < null_offset && null_offset + 20 < tree.contents.length) {
|
|
var mode = tree.contents.substring(i, space_offset);
|
|
var name = tree.contents.substring(space_offset+1, null_offset);
|
|
var hash = to_hex(tree.contents.substring(null_offset + 1, null_offset + 1 + 20));
|
|
entries.push({ mode: mode, name: name, hash: hash });
|
|
} else {
|
|
assert(false, 'invalid contents of tree object');
|
|
}
|
|
|
|
i = null_offset + 20 + 1;
|
|
}
|
|
return entries;
|
|
}
|
|
</textarea>
|
|
|
|
<p>
|
|
The <code>parse_tree</code> function above needs a small utility to convert hashes represented using
|
|
raw bytes to a hexadecimal representation.
|
|
</p>
|
|
<textarea id="in19">
|
|
function to_hex(bin) {
|
|
var bin = String(bin);
|
|
var hex = "";
|
|
for (var i = 0; i < bin.length; i++) {
|
|
hex += left_pad(bin.charCodeAt(i).toString(16), '0', 2);
|
|
}
|
|
return hex;
|
|
}
|
|
</textarea>
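<p>The layout of a single tree entry can be checked with a small round trip. The sketch below is self-contained and uses hypothetical helpers (<code>hexToBinary</code>, <code>binaryToHex</code>) rather than the tutorial's own functions; the hash is made up, and deliberately starts with a <code>00</code> byte.</p>
<pre>
```javascript
// Hypothetical helper: 40 hexadecimal digits to a 20-character binary string.
function hexToBinary(hex) {
  var bin = '';
  for (var i = 0; i < hex.length; i += 2) {
    bin += String.fromCharCode(parseInt(hex.substring(i, i + 2), 16));
  }
  return bin;
}

// Hypothetical helper: binary string back to hexadecimal, two digits per byte.
function binaryToHex(bin) {
  var hex = '';
  for (var i = 0; i < bin.length; i++) {
    hex += ('0' + bin.charCodeAt(i).toString(16)).slice(-2);
  }
  return hex;
}

// Made-up hash whose first byte is 00.
var hash = '00aa456789abcdef0123456789abcdef01234567';
var entry = '100644 README\0' + hexToBinary(hash);

var spaceOffset = entry.indexOf(' ');
var nullOffset = entry.indexOf('\0'); // first null byte: the end of the filename
var decoded = {
  mode: entry.substring(0, spaceOffset),
  name: entry.substring(spaceOffset + 1, nullOffset),
  hash: binaryToHex(entry.substring(nullOffset + 1, nullOffset + 1 + 20))
};

console.log(JSON.stringify(decoded));
```
</pre>
<p>The null byte inside the hash comes after the terminator, so the search for the first null byte stops at the end of the filename as intended.</p>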
|
|
</section>
|
|
|
|
<section id="parse-commit">
|
|
<h1>Parsing commit objects</h1>
|
|
<p>The following function is fairly long, but only parses lines of the form <code>header-name header-value</code>
|
|
(with some restrictions depending on the header), followed by a blank line, and a free-form description.</p>
|
|
<textarea>
|
|
function parse_commit(hash) {
|
|
var commit = parse_object(hash);
|
|
var lines = commit.contents.split('\n');
|
|
var tree = null;
|
|
var parents = [];
|
|
var author = null;
|
|
var committer = null;
|
|
var i;
|
|
// A blank line separates the headers from the message.
|
|
for (i = 0; i < lines.length && lines[i] != ''; i++) {
|
|
var split = lines[i].match(/^(.*?) (.*)$/);
|
|
assert(split, "ill-formed commit header: " + lines[i]);
|
|
var header = split[1];
|
|
var value = split[2];
|
|
switch (header) {
|
|
case 'tree':
|
|
assert(!tree, 'duplicate tree header in commit');
|
|
assert(/^[0-9a-f]{40}$/.test(value), "invalid tree header in commit");
|
|
tree = value;
|
|
break;
|
|
case 'parent':
|
|
assert(/^[0-9a-f]{40}$/.test(value), "invalid parent header in commit");
|
|
parents.push(value);
|
|
break;
|
|
case 'author':
|
|
assert(!author, 'duplicate author header in commit');
|
|
author = parse_author(value, 'author');
|
|
break;
|
|
case 'committer':
|
|
assert(!committer, 'duplicate committer header in commit');
|
|
committer = parse_author(value, 'committer');
|
|
break;
|
|
default: /* unknown field, skipping */ break;
|
|
}
|
|
}
|
|
// The message is everything after the blank line.
|
|
var message = lines.slice(i+1).join('\n');
|
|
|
|
assert(tree, 'commit lacks tree header');
|
|
assert(author, 'commit lacks author header');
|
|
assert(committer, 'commit lacks committer header');
|
|
|
|
return {
|
|
tree: tree,
|
|
parents: parents,
|
|
author: author,
|
|
committer: committer,
|
|
message: message
|
|
};
|
|
}
|
|
</textarea>
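<p>To make the shape of the input more concrete, here is a standalone sketch which splits the text of a made-up merge commit (two <code>parent</code> headers) into its headers and its message. The <code>splitCommit</code> function is a hypothetical, much less strict cousin of <code>parse_commit</code>: it validates nothing and does not handle headers spanning several lines.</p>
<pre>
```javascript
// Hypothetical helper: separate the header lines from the message
// at the first blank line, without validating anything.
function splitCommit(text) {
  var blank = text.indexOf('\n\n');
  var headerText = blank === -1 ? text : text.substring(0, blank);
  var message = blank === -1 ? '' : text.substring(blank + 2);
  var headers = headerText.split('\n').map(function (line) {
    var space = line.indexOf(' ');
    return { name: line.substring(0, space), value: line.substring(space + 1) };
  });
  return { headers: headers, message: message };
}

// Made-up commit contents, for illustration only.
var who = 'Ada Lovelace <' + 'ada@analyti.cal' + '> 1617120803 +0100';
var text = 'tree 0123456789abcdef0123456789abcdef01234567\n'
  + 'parent 89abcdef0123456789abcdef0123456789abcdef\n'
  + 'parent 456789abcdef0123456789abcdef0123456789ab\n'
  + 'author ' + who + '\n'
  + 'committer ' + who + '\n'
  + '\n'
  + 'Merge two lines of work\n\nLonger description.\n';

var parsed = splitCommit(text);
var parentCount = parsed.headers.filter(function (h) { return h.name === 'parent'; }).length;
console.log(parentCount, JSON.stringify(parsed.message));
```
</pre>
<p>Only the first blank line acts as a separator; later blank lines belong to the message.</p>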
|
|
</section>
|
|
|
|
<section id="parse-author-committer">
|
|
<h1>Parsing author and committer metadata</h1>
|
|
<p>The author and committer metadata has the form <code>Name &lt;email@domain.tld&gt; timestamp +timezone</code>,
for example <code>Ada Lovelace &lt;ada@analyti.cal&gt; 1617120803 +0100</code>.</p>
|
|
<textarea>
|
|
function parse_author(value, field) {
|
|
var split = value.match(/^(.*?) <(.*?)> ([0-9]+) ([+-])([0-9][0-9])([0-9][0-9])$/);
|
|
assert(split, 'ill-formed ' + field);
|
|
var name = split[1];
|
|
var email = split[2];
|
|
var date = new Date(parseInt(split[3], 10) * 1000);
|
|
var timezone_sign = (split[4] == '+' ? 1 : -1);
|
|
var timezone_hours = parseInt(split[5], 10);
|
|
var timezone_minutes = parseInt(split[6], 10);
|
|
var timezone = timezone_sign * (timezone_hours * 60 + timezone_minutes);
|
|
return { name: name, email: email, date: date, timezone: timezone };
|
|
}
|
|
</textarea>
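<p>The timezone is parsed into a signed number of minutes. As a sanity check of that arithmetic, the sketch below performs the opposite conversion, from minutes back to the <code>+HHMM</code> notation. The <code>formatTimezone</code> name is introduced for this illustration only.</p>
<pre>
```javascript
// Hypothetical inverse of the timezone parsing: minutes east of UTC to "+HHMM".
function formatTimezone(offsetMinutes) {
  var sign = offsetMinutes < 0 ? '-' : '+';
  var absolute = Math.abs(offsetMinutes);
  var hours = Math.floor(absolute / 60);
  var minutes = absolute % 60;
  // Pad both parts to two digits.
  return sign + ('0' + hours).slice(-2) + ('0' + minutes).slice(-2);
}

console.log(formatTimezone(60));   // one hour east of UTC
console.log(formatTimezone(-330)); // five and a half hours west of UTC
```
</pre>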
|
|
</section>
|
|
|
|
<section id="checkout-example">
|
|
<h1>Example checkout</h1>
|
|
<p>
|
|
Now that we can parse blob, tree and commit objects, it is possible to check out a given commit.
The following operation will revert the working tree to the state that was recorded in the initial commit.
|
|
</p>
|
|
<textarea id="in20">
|
|
git_checkout(initial_commit);
|
|
</textarea>
|
|
</section>
|
|
</section>
|
|
|
|
<section id="git-init">
|
|
<h1><code>git init</code></h1>
|
|
<p>The <code>git init</code> command creates the <code>.git</code> directory and points <code>.git/HEAD</code>
|
|
to the default branch (a file which does not exist yet, as this branch does not contain any commit at this point).</p>
|
|
<textarea id="in21">
|
|
function git_init() {
|
|
git_init_mkdir();
|
|
git_init_head();
|
|
}
|
|
</textarea>
|
|
</section>
|
|
|
|
<section id="index">
|
|
<h1>The index</h1>
|
|
<p>When adding files with <code>git add</code>, GIT does not immediately create a commit object.
|
|
Instead, it adds the files to the index, which uses a binary format with lots of metadata.
|
|
The mock filesystem used here lacks most of these pieces of information, so the value <code>0</code>
|
|
will be used for most fields. See <a href="https://mincong.io/2018/04/28/git-index/">this blog post</a>
|
|
for a more in-depth study of the index.</p>
|
|
<textarea id="index-raw-bytes-utils">
|
|
function raw_bytes(val, bytes) {
|
|
return hex_to_raw_bytes(left_pad(val.toString(16), '0', bytes*2));
|
|
}
|
|
|
|
function raw_bytes16(val) { return raw_bytes(val, 2); }
|
|
function raw_bytes32(val) { return raw_bytes(val, 4); }
|
|
function raw_bytes64(val) { return raw_bytes(val, 8); }
|
|
</textarea>
|
|
|
|
<textarea id="make-index">
|
|
function store_index(paths) {
|
|
var magic = 'DIRC' // DIRectory Cache
|
|
var version = raw_bytes32(2);
|
|
var entries = raw_bytes32(paths.length);
|
|
var header = magic + version + entries;
|
|
|
|
var index = header;
|
|
|
|
for (var i = 0; i < paths.length; i++) {
|
|
var ctime = raw_bytes64(0);
|
|
var mtime = raw_bytes64(0);
|
|
var device = raw_bytes32(0);
|
|
var inode = raw_bytes32(0);
|
|
// default permissions for files, in octal.
|
|
var mode = raw_bytes32(0100644);
|
|
var uid = raw_bytes32(0);
|
|
var gid = raw_bytes32(0);
|
|
var size = raw_bytes32(read(join_paths(current_directory, paths[i])).length);
|
|
var hash = hex_to_raw_bytes(hash_object(true, 'blob', false, paths[i]));
|
|
// for this simple index, the flags (the 4 higher bits) are 0.
|
|
assert(paths[i].length < 0xfff, 'file path is too long for the index');
|
|
var flags_and_file_path_length = raw_bytes16(paths[i].length)
|
|
var file_path = paths[i] + '\0';
|
|
var entry = ctime + mtime + device + inode + mode + uid + gid + size
|
|
+ hash + flags_and_file_path_length + file_path;
|
|
while (entry.length % 8 != 0) {
|
|
// pad with null bytes to a multiple of 8 bytes (64-bits).
|
|
entry += '\0';
|
|
}
|
|
|
|
index += entry;
|
|
}
|
|
|
|
index += hex_to_raw_bytes(sha1_from_bytes_returns_hex(index));
|
|
|
|
write(join_paths(current_directory, '.git/index'), index)
|
|
}
|
|
</textarea>
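<p>The padding rule used by <code>store_index</code> can be checked with a little arithmetic. The sketch below adds up the field sizes of one entry as written above, assuming one byte per character in the path (plain ASCII); the <code>indexEntrySize</code> function is a hypothetical helper for this illustration.</p>
<pre>
```javascript
// Hypothetical helper: size in bytes of one index entry for an ASCII path,
// following the field sizes used by store_index above.
function indexEntrySize(path) {
  // ctime, mtime (8 bytes each), then device, inode, mode, uid, gid, size
  // (4 bytes each), the 20-byte hash and the 2-byte flags and path length.
  var fixed = 8 + 8 + 4 + 4 + 4 + 4 + 4 + 4 + 20 + 2;
  // The path is followed by at least one null byte...
  var unpadded = fixed + path.length + 1;
  // ...then by as many null bytes as needed to reach a multiple of 8.
  return Math.ceil(unpadded / 8) * 8;
}

console.log(indexEntrySize('README'));
console.log(indexEntrySize('src/main.scm'));
```
</pre>
<p>The fixed fields add up to 62 bytes, so a path of 6 characters gives 69 bytes before padding and 72 bytes after.</p>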
|
|
</section>
|
|
|
|
<section id="playground">
|
|
<h1>Playground</h1>
|
|
<p>The implementation is now sufficiently complete to create a small repository.</p>
|
|
<textarea id="playground-reset">
|
|
// Reset the filesystem to its initial state
|
|
filesystem = {};
|
|
current_directory = '';
|
|
</textarea>
|
|
|
|
<textarea id="playground-play">
|
|
mkdir('proj');
|
|
cd('proj');
|
|
write('proj/README', 'This is my implementation of GIT.\n');
|
|
mkdir('proj/src');
|
|
write('proj/src/main.scm', "(define filesystem '())\n...\n");
|
|
|
|
git_init();
|
|
git_commit(['README', 'src/main.scm'], 'A well-understood initial commit.');
|
|
|
|
git_branch('dev', 'HEAD');
|
|
git_checkout('dev');
|
|
|
|
write('proj/src/main.scm', "(define filesystem '())\n(define current_directory \"\")\n");
|
|
git_commit(['README', 'src/main.scm'], 'What an update!');
|
|
|
|
git_checkout('main');
|
|
|
|
// update the cache of the working tree. Without this,
|
|
// GIT finds an empty cache, and thinks all files are scheduled
|
|
// for deletion, until "git add ." allows it to realize that
|
|
// the working tree matches the contents of HEAD.
|
|
store_index(['README', 'src/main.scm']);
|
|
</textarea>
|
|
|
|
<p>By clicking on "Copy commands to recreate in *nix terminal.", it is possible to copy a series of <code>mkdir …</code> and <code>printf … > …</code> commands that, when executed, will recreate the virtual filesystem on a real system. The resulting
|
|
folder is bit-compatible with the official <code>git log</code>, <code>git status</code>, <code>git checkout</code> etc.
|
|
commands.</p>
|
|
</section>
|
|
|
|
<section id="suggested-exercises">
|
|
<h1>Suggested exercises</h1>
|
|
<p>
|
|
The reader willing to improve their grasp of GIT's mental model, and reduce their reliance on a few learned recipes, might
|
|
be interested in the following warm-up exercises:
|
|
</p>
|
|
|
|
<section class="exercise" id="exercise-cat-file">
|
|
<h1>Inspection using <code>git cat-file</code></h1>
|
|
<p class="exercise-task">
|
|
Inspect an existing repository, starting with <code>cat .git/HEAD</code> and using <code>git cat-file -p some-hash</code>
|
|
to pretty-print an object given its hash.
|
|
</p>
|
|
<p class="exercise-reason">
|
|
This will help the points explained in this tutorial sink in, and give a better
|
|
understanding of the internals of GIT. This knowledge is helpful for day-to-day tasks, as the GIT commands usually perform
|
|
simple changes to this internal representation. Understanding the representation better can demystify the semantics of
|
|
the daily GIT commands. Furthermore, equipped with a better understanding of GIT's implementation, the dreamy reader will
|
|
be tempted to compare this lack of intrinsic complexity with the apparent complexity, and be entitled to expect a better,
|
|
less arcane user interface for a tool with such a simple implementation.
|
|
</p>
|
|
</section>
|
|
<section class="exercise" id="exercise-files-in-dot-git">
|
|
<h1>Inspection of the files in <code>.git/</code></h1>
|
|
<p class="exercise-task">
|
|
Inspect a small existing repository, starting with <code>cat .git/HEAD</code> and using the <code>zlib</code> decompression
|
|
tool from the <a href="#zlib-compression-note"><code>zlib</code> compression</a> section. Larger repositories will make use
|
|
of GIT packs, which are compressed archives containing a number of objects. GIT packs only matter as an optimization of the
|
|
disk space used by large repositories, but other tools would be necessary to inspect those.
|
|
</p>
|
|
<p class="exercise-reason">
|
|
This should help understand
the internal representation of GIT commits and branches, and should help develop an intuitive idea of how the data store is
|
|
modified by the various commands. This in turn could come in handy in case of apparent data loss (a lost stash or a checkout
|
|
leaving an unreferenced commit on a detached HEAD), as this would help understand the work done by the various
|
|
disaster-recovery one-liners that a quick panicked online search provides.
|
|
</p>
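<p>The decompression step can be sketched in Node.js as follows. This is only an illustration under assumptions: it uses Node's built-in <code>zlib</code> module rather than the tool from this tutorial, the helper name <code>parseLooseObject</code> is hypothetical, and the compressed bytes are built in memory as a stand-in for reading a file under <code>.git/objects/</code>.</p>

```javascript
const zlib = require('zlib');

// Hypothetical helper: decompresses a loose object and splits it into
// its "type size\0" header and its body.
function parseLooseObject(compressed) {
  const raw = zlib.inflateSync(compressed);
  const nul = raw.indexOf(0); // position of the NUL byte ending the header
  const [type, size] = raw.subarray(0, nul).toString('ascii').split(' ');
  const body = raw.subarray(nul + 1);
  if (Number(size) !== body.length) {
    throw new Error('Size mismatch in object header');
  }
  return { type, size: Number(size), body };
}

// Stand-in for the bytes of a file such as .git/objects/xx/yyyy...:
// a blob object is "blob <size>\0<content>", compressed with zlib.
const content = Buffer.from('hello world\n');
const stored = zlib.deflateSync(
  Buffer.concat([Buffer.from('blob ' + content.length + '\0'), content])
);

const obj = parseLooseObject(stored);
console.log(obj.type, obj.size, JSON.stringify(obj.body.toString()));
```

<p>On a real repository, <code>stored</code> would instead come from reading the object file itself. Tree objects contain raw binary hashes, so their body should be inspected as bytes rather than printed as text.</p>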
|
|
</section>
|
|
<section class="exercise" id="exercise-repo-from-statch">
|
|
<h1>Creating a repository from scratch</h1>
|
|
<p class="exercise-task">
|
|
Run <code>git init new-directory</code> in a terminal, and create an initial single-file commit from scratch, using only
|
|
<code>git hash-object</code>, <code>printf</code> and overwriting <code>.git/HEAD</code> and/or
|
|
<code>.git/refs/heads/name-of-a-branch</code>. This will involve retracing the steps in this tutorial to create a blob
|
|
object for the file, a tree object to be the directory containing just that file, and a commit object.
|
|
</p>
|
|
<p class="exercise-reason">
|
|
This exercise should
help reinforce the feeling that the internal representation of GIT commits is not very complex, and that many commands with
convoluted options have very simple semantics. For example, <code>git reset --soft other-commit</code> is little more than
writing that other commit's hash in <code>.git/refs/heads/name-of-the-current-branch</code>, or in <code>.git/HEAD</code> when HEAD is detached.
Furthermore, equipped with an even better understanding of GIT's implementation, the dreamy reader will
be tempted to compare this lack of intrinsic complexity with the sheer complexity of the systems they are working with on
a day-to-day basis, and be entitled to expect better features in a versioning tool. After all, writing those
<span class="loc-count">few</span> lines of code to reimplement the core of a versioning tool shouldn't take more than a
couple of afternoons. Surely our community can do better?
|
|
</p>
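<p>Before attempting the exercise in a terminal, the three objects can be sketched in Node.js as below. This is a minimal illustration under assumptions: it uses Node's built-in <code>crypto</code> module, the helper names <code>hashObject</code> and <code>treeEntry</code> are hypothetical, the file name, author identity and timestamp are placeholders, and nothing is written to disk.</p>

```javascript
const crypto = require('crypto');

// The id of an object is the SHA-1 of "type size\0" followed by the raw content.
function hashObject(type, content) {
  const header = Buffer.from(type + ' ' + content.length + '\0');
  const store = Buffer.concat([header, content]);
  const hash = crypto.createHash('sha1').update(store).digest('hex');
  return { hash, store };
}

// A tree entry is "mode name\0" followed by the 20 raw bytes of the hash.
function treeEntry(mode, name, hash) {
  return Buffer.concat([
    Buffer.from(mode + ' ' + name + '\0'),
    Buffer.from(hash, 'hex'),
  ]);
}

// Blob for the single file, tree for the directory containing just that file.
const blob = hashObject('blob', Buffer.from('hello world\n'));
const tree = hashObject('tree', treeEntry('100644', 'README', blob.hash));

// Placeholder identity and timestamp; a first commit has no "parent" line.
const commitText =
  'tree ' + tree.hash + '\n' +
  'author A U Thor <author@example.com> 0 +0000\n' +
  'committer A U Thor <author@example.com> 0 +0000\n' +
  '\n' +
  'Initial commit\n';
const commit = hashObject('commit', Buffer.from(commitText));

console.log('blob  ', blob.hash);
console.log('tree  ', tree.hash);
console.log('commit', commit.hash);
```

<p>To turn this into a real repository, each <code>store</code> buffer would be compressed with <code>zlib</code> and written to <code>.git/objects/</code> under the first two characters of its hash, and the commit hash would be written to the branch file. The blob hash printed here can be compared with the output of <code>git hash-object</code> on a file with the same content.</p>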
|
|
</section>
|
|
<section class="exercise" id="exercise-only-basic-commands">
|
|
<h1>Using only basic GIT commands</h1>
|
|
<p class="exercise-task">
|
|
For a couple of weeks, only use the GIT commands <code>commit</code>, <code>diff</code>, <code>checkout</code>,
<code>merge</code>, <code>cherry-pick</code>, <code>log</code>, <code>clone</code>, <code>fetch</code> and
<code>push remote hash-of-commit:refs/heads/name-of-the-branch</code>. In particular, don't use <code>rebase</code>,
which is just a wrapper around a sequence of <code>cherry-pick</code> commands, and don't use <code>pull</code>, which is
just a wrapper around <code>fetch</code> and <code>merge</code>. Don't use <code>git push</code> as-is either: instead,
explicitly give the name (e.g. <code>origin</code>) or URL of the remote, the hash of the commit to push, and the reference that should be
updated on the remote. For example, with the usual default configuration, <code>git push</code> while the <code>main</code> branch is checked out locally is equivalent
to <code>git push origin HEAD:refs/heads/main</code>, where <code>HEAD</code> can be replaced by the actual hash of
the commit.
|
|
</p>
|
|
<p class="exercise-reason">
|
|
This should help reinforce the feeling that the internals of GIT are very simple (most of these commands
are implemented in this tutorial, and the other ones are merely wrappers around enhanced versions of the *NIX commands
<code>diff</code>, <code>patch</code> and <code>scp</code>), and that the rest of the GIT toolkit consists mostly of
convenience wrappers to help seasoned users perform common tasks more efficiently.
|
|
</p>
|
|
</section>
|
|
<section class="exercise" id="exercise-commits-are-copies">
|
|
<h1>Understanding commits as copies of the root directory</h1>
|
|
<p class="exercise-task">
|
|
Try not even using <code>git cherry-pick</code> or <code>git diff</code> a few times. Instead, make two copies of the git
directory, check out a different commit in each copy, and use the traditional *NIX commands <code>diff</code> and
<code>patch</code>.
|
|
</p>
|
|
<p class="exercise-reason">
|
|
This should help reinforce the feeling that commits are not diffs, but are actual (deduplicated)
copies of the entire project directory. GIT commits are quite similar to the age-old manual versioning technique of
copying the entire directory under a new name at each version, except that the metadata keeps track of which version
was the previous one (or which versions were merged together to obtain the new one), and the deduplication avoids
excessive space usage, much like <code>cp --reflink</code> does on a filesystem supporting Copy-On-Write (COW).
|
|
</p>
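<p>The idea of deduplicated full copies can be illustrated with the following Node.js sketch. It is a simplified model, not GIT's actual format: contents are hashed without the object header, directories are flat, and the names <code>store</code>, <code>put</code> and <code>snapshot</code> are hypothetical.</p>

```javascript
const crypto = require('crypto');

// Content-addressed store shared by all snapshots: hash -> content.
const store = new Map();

function put(content) {
  const hash = crypto.createHash('sha1').update(content).digest('hex');
  store.set(hash, content); // storing identical content twice keeps one copy
  return hash;
}

// A snapshot lists every file of the project (not a diff), plus its parent.
function snapshot(files, parent) {
  const tree = {};
  for (const name of Object.keys(files)) {
    tree[name] = put(files[name]);
  }
  return { tree, parent };
}

const v1 = snapshot({ 'README': 'hello\n', 'main.js': 'console.log(1);\n' }, null);
const v2 = snapshot({ 'README': 'hello\n', 'main.js': 'console.log(2);\n' }, v1);

console.log(Object.keys(v2.tree).length);             // 2: v2 lists both files
console.log(store.size);                              // 3: README is stored once
console.log(v1.tree['README'] === v2.tree['README']); // true: same hash, same copy
```

<p>Each snapshot is complete on its own, yet the unchanged <code>README</code> occupies space only once because both versions refer to the same hash.</p>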
|
|
</section>
|
|
<section class="exercise" id="exercise-branches-as-pointers">
|
|
<h1>Branches as pointers: living without branches</h1>
|
|
<p class="exercise-task">
|
|
For a couple of weeks, don't use any local branch, and stay in detached HEAD state all the time. When checking out a
|
|
colleague's work, use <code>git fetch &amp;&amp; git checkout origin/remote-branch</code>, and use the reflog and a text file
|
|
outside of the repository to keep track of the latest commit in a current "branch" instead of relying on GIT.
|
|
</p>
|
|
<p class="exercise-reason">
|
|
This should help reinforce the feeling that branches are not containers in which commits pile up, but are merely pointers
to the latest commit, automatically updated whenever a new commit is created on that branch.
|
|
</p>
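<p>The difference between committing on a branch and committing on a detached HEAD can be modelled with the following JavaScript sketch. It is a toy in-memory model under assumptions: commit ids are simple counters (<code>c1</code>, <code>c2</code>, ...) instead of hashes, and the names <code>commits</code>, <code>refs</code>, <code>head</code> and <code>currentCommit</code> are hypothetical.</p>

```javascript
// Toy model: commits only know their parent; branches are entries in "refs".
const commits = {};           // id -> { parent, message }
const refs = {};              // branch name -> commit id
let head = { ref: 'main' };   // either { ref: name } or { detached: id }
let nextId = 1;

function currentCommit() {
  return head.ref !== undefined ? (refs[head.ref] || null) : head.detached;
}

function commit(message) {
  const id = 'c' + nextId++;
  commits[id] = { parent: currentCommit(), message };
  if (head.ref !== undefined) {
    refs[head.ref] = id;  // on a branch: the branch pointer moves to the new commit
  } else {
    head.detached = id;   // detached HEAD: only HEAD itself moves
  }
  return id;
}

commit('first');
const second = commit('second');

head = { detached: second };  // similar in spirit to checking out a hash
const third = commit('third');

console.log(refs.main);             // c2: the branch did not follow
console.log(currentCommit());       // c3
console.log(commits[third].parent); // c2
```

<p>After the last commit, nothing but HEAD refers to <code>c3</code>, which is why the exercise suggests keeping track of the latest hash in a text file or through the reflog.</p>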
|
|
</section>
|
|
</section>
|
|
|
|
<section id="conclusion">
|
|
<h1>Conclusion</h1>
|
|
<p>This article shows that a large part of the core of GIT can be re-implemented in <span class="loc-count">a few</span> source lines of code* (<a href="javascript:___copy_all_code(); void(0);">copy all the code</a>).
|
|
<span style="font-size: small">* empty lines and single closing braces excluded, <span class="loc-count-total">a few more</span> in total.</span></p>
|
|
<div id="copy-all-code" style="display: none;"></div>
|
|
<ul>
<li>Some of the features which may appear mysterious at first sight (e.g. detached HEAD) should be clearer with the knowledge of how GIT works behind the scenes.</li>
<li>Furthermore, branches are often associated with an intuition (containers into which commits are added) which does not match the implementation (mutable pointers to commits).</li>
<li>Finally, it is tempting to think of commits as patches. While <code>darcs</code> tries to expose an interface which matches this intuition, the implementation of GIT clearly treats commits as copies of the entire repository, linked to the previous version solely by the <code>parent</code> metadata in the commit headers.</li>
</ul>
|
|
<p>A few core commands like <code>git diff</code> and <code>git apply</code> are not described in this tutorial.
|
|
They are little more than improved versions of the classical *NIX commands <code>diff</code> and <code>patch</code>.</p>
|
|
<p>Most other commands provided by GIT are merely convenience wrappers around these commands. For example, <code>git cherry-pick</code> is simply a combination of <code>git diff</code> between the tree of a commit and the tree of its parent, followed by <code>git apply</code> to apply the patch and <code>git commit</code> to create a new commit whose diff is equivalent to the diff of the original commit. As another example, the command <code>git rebase</code> performs a succession of <code>cherry-pick</code> operations.</p>
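<p>This decomposition can be sketched in JavaScript as follows. The model is deliberately simplified and is not how GIT is implemented: trees are flat objects mapping file names to contents, the diff works on whole files rather than lines, conflicts are ignored, and the names <code>diffTrees</code>, <code>applyChanges</code>, <code>cherryPick</code> and <code>rebase</code> are hypothetical.</p>

```javascript
// Trees are modelled as flat objects: file name -> content.
function diffTrees(before, after) {
  const changes = [];
  const names = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const name of names) {
    if (before[name] !== after[name]) {
      changes.push({ name, content: after[name] }); // undefined means "deleted"
    }
  }
  return changes;
}

function applyChanges(tree, changes) {
  const result = { ...tree };
  for (const { name, content } of changes) {
    if (content === undefined) delete result[name];
    else result[name] = content;
  }
  return result;
}

// A commit is { tree, parent, message }; "onto" is the commit checked out.
function cherryPick(commit, onto) {
  const changes = diffTrees(commit.parent.tree, commit.tree); // the "diff" step
  const tree = applyChanges(onto.tree, changes);              // the "apply" step
  return { tree, parent: onto, message: commit.message };     // the "commit" step
}

// A rebase replays a list of commits, oldest first, on top of "onto".
function rebase(commitsOldestFirst, onto) {
  return commitsOldestFirst.reduce((tip, c) => cherryPick(c, tip), onto);
}

const base = { tree: { a: '1' }, parent: null, message: 'base' };
const feature = { tree: { a: '1', b: '2' }, parent: base, message: 'add b' };
const main = { tree: { a: '1', c: '3' }, parent: base, message: 'add c' };

const picked = cherryPick(feature, main);
console.log(picked.tree); // { a: '1', c: '3', b: '2' }
```

<p>The new commit has a different parent and a different tree from the original, but the change it introduces relative to its parent is the same.</p>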
|
|
<p>By keeping in mind the internal model of GIT, it becomes easier to understand the usual commands and their quirks. By understanding the design philosophy behind the implementation, the day-to-day usage can become, hopefully, less surprising.</p>
|
|
</section>
|
|
|
|
<div id="toc"></div>
|
|
</article>
|
|
|
|
<script>
|
|
(function() {
|
|
var script = ___script_log_header;
|
|
var ta = document.getElementsByTagName('textarea');
|
|
for (var j = 0; j < ta.length; j++) {
|
|
if (ta[j] == document.getElementById('playground-reset')) {
|
|
break;
|
|
}
|
|
script += ta[j].value + "\n\n";
|
|
}
|
|
var js = document.getElementsByTagName('script');
|
|
for (var j = 0; j < js.length; j++) {
|
|
if (js[j].className.indexOf('example') != -1) {
|
|
script += js[j].innerText;
|
|
}
|
|
}
|
|
script += '\nfor (var i = 0; i < examples.length; i++) { examples[i](); }';
|
|
eval(script);
|
|
})();
|
|
___git_tutorial_onload()
|
|
</script>
|
|
</body>
|
|
</html>
|