2009-12-07

Building Commits from scratch using Git::PurePerl

Anyone who's tried modifying a git repository with code is likely to have discovered a few problems with the experience.

The most common one I encounter is the result of changes having to be done in the git working directory.

This occurs mostly when there are uncommitted changes, or when a file is present in the directory in an untracked state and the same file exists on another branch in a tracked state.

Here you have a blocking scenario when you try to check out that branch.

You can't check out the other branch because of the file system collision.

You also have a problem when 2 processes try to work with the same checkout simultaneously, e.g. you have a cron job that copies files into the master branch, or a cron job that copies files out of some other branch, or programs that rely on always seeing a given file from a given branch.

Generally, the easy way around the file system collision is git stash, then git checkout $otherbranch, do what you need to do, git checkout $original, and git stash apply to get your changes back. That, however, is too much of a dance for a human who knows everything, let alone for a bit of naïve code with a stack of broken conditional checks that also has to cope with everything else that can go wrong.

Do it in RAM

The great thing about git is that you don't *need* to do everything in the working directory. If you know the internals well enough, you can perform the whole commit on the other branch without ever needing to check it out. This is especially handy when one branch is generated purely automatically from another.

With a helping hand from Git::PurePerl it's possible to build everything from scratch without needing to modify files in the working directory. Unfortunately, the documentation is a bit sparse and the whole dist could use better documentation, but the good news is that, for the most part, it works great, and it's structured well enough that you can work it out by reading the code.

The Phases of commit generation

A commit is composed of 3 main pieces of data:
  • Commit metadata: comprised of authors (the author and the committer), timestamps (commit timestamp, author timestamp), a commit message, and an optional parent commit.
  • A tree object: every commit refers to a tree object. A tree object is essentially a list of files and metadata about those files. Tree objects can also refer to other tree objects, and this forms a sort of directory structure.
  • A file object: essentially a blob of data.

You can read how these objects interact, the conceptual details, their implementation, and how to create them directly in the filesystem via the git command line in Pro Git, Chapter 9: Git Internals, but this guide will focus more on how to do it with Git::PurePerl, and entirely in-memory.

Injecting the file objects

DIY commits have to be composed in reverse order: you need to create the files, then create the trees, then create the commits.

use strict;
use warnings;
use Git::PurePerl;
use Git::PurePerl::NewObject::Blob;

my $git = Git::PurePerl->new(
    gitdir => '/some/dir/foo/.git'    # Or use the less direct 'directory' form.
);

my $blob = Git::PurePerl::NewObject::Blob->new(
    content => 'String Of File Content',
);

$git->put_object( $blob );

Congratulations. You just stored a file blob in your git database. Nothing refers to it at present, though: it's not unlike a detached node in a graph, and the next time somebody calls git prune in the repository, that file blob will vanish again.

Now, let's try that again with a scattering of objects:

use strict;
use warnings;
use Git::PurePerl;
use Git::PurePerl::NewObject::Blob;

my $git = Git::PurePerl->new(
    gitdir => '/some/dir/foo/.git'    # Or use the less direct 'directory' form.
);

# 10 Blobs please.
my @blobs = map {
    my $blob = Git::PurePerl::NewObject::Blob->new(
        content => 'String Of File Content, no' . $_,
    );
} 1 .. 10;

$git->put_object( $_ ) for @blobs;

If you now execute git prune -n in your working directory, you'll note that there are 10 unattached objects pending prune. This number should NOT change if you re-run the above code multiple times, as content with an identical SHA1 is only added to the data store once.
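
The reason re-running the script adds nothing is that git names every object after the SHA-1 of a short header plus its content. The following is a minimal sketch of that naming rule using only the core Digest::SHA module; git_blob_sha1 is a hypothetical helper name for illustration, not part of Git::PurePerl, and the content is assumed to be a plain byte string.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: compute the object id git assigns to a blob.
# The hashed data is "blob <byte length>\0" followed by the raw content.
sub git_blob_sha1 {
    my ($content) = @_;
    return sha1_hex( 'blob ' . length($content) . "\0" . $content );
}

my $first  = git_blob_sha1('String Of File Content, no1');
my $second = git_blob_sha1('String Of File Content, no1');
my $other  = git_blob_sha1('String Of File Content, no2');

print "same content, same id\n"           if $first eq $second;
print "different content, different id\n" if $first ne $other;
```

Because the id depends only on the content, storing the same string twice resolves to the same object both times.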

Building Tree Objects

You now have a scattering of files. Well, more precisely, data that represents files: there's no file name or permissions metadata yet. Tree objects connect these files with their names and attributes. A tree is basically a blob with a specific content and format. The content of this blob is a series of entries.

Every entry has 3 parts:
  • mode
  • filename
  • object sha1

The sha1 is the SHA1 of a thing, either a Blob or another Tree object. The filename is the name given to that entry. The 'mode' I don't fully understand yet; all I know is that files work with 100644 and trees work with 040000 (which git itself stores without the leading zero, as 40000).
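
To make the entry layout concrete, here is a rough sketch of how a tree's content can be assembled and hashed by hand, using only core modules. The helper names tree_entry and git_tree_sha1 are made up for this example. It also shows a Perl pitfall with modes: they are octal text, so they are safest passed as strings.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: serialise one tree entry the way git stores it:
# "<mode> <filename>\0" followed by the 20 raw bytes of the object id.
sub tree_entry {
    my ( $mode, $filename, $sha1_hex ) = @_;
    return $mode . ' ' . $filename . "\0" . pack( 'H*', $sha1_hex );
}

# Hypothetical helper: object id of a tree built from serialised entries.
sub git_tree_sha1 {
    my (@entries) = @_;
    my $content = join '', @entries;
    return sha1_hex( 'tree ' . length($content) . "\0" . $content );
}

my $blob_sha1 = sha1_hex("blob 0\0");    # id of an empty file
my $entry     = tree_entry( '100644', 'FooFile', $blob_sha1 );

print 'entry length: ', length($entry), "\n";
print 'tree id:      ', git_tree_sha1($entry), "\n";

# A bare Perl literal with a leading zero is parsed as an octal number,
# so 040000 stringifies as 16384 rather than as the text "40000".
print 'bare 040000 stringifies as: ', 040000, "\n";
print 'intended mode text:         ', sprintf( '%o', 040000 ), "\n";
```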

use strict;
use warnings;
use Git::PurePerl;
use Git::PurePerl::NewObject::Blob;
use Git::PurePerl::NewObject::Tree;
use Git::PurePerl::NewDirectoryEntry;

my $git = Git::PurePerl->new(
    gitdir => '/some/dir/foo/.git'    # Or use the less direct 'directory' form.
);

# 10 Blobs please.
my @blobs = map {
    my $blob = Git::PurePerl::NewObject::Blob->new(
        content => 'String Of File Content, no' . $_,
    );
} 1 .. 10;

# only put the first blob on the tree.
my $tree = Git::PurePerl::NewObject::Tree->new(
    directory_entries => [
        Git::PurePerl::NewDirectoryEntry->new(
            mode     => 100644,
            filename => "FooFile",
            sha1     => $blobs[0]->sha1,
        )
    ],
);

# stash blobs
$git->put_object( $_ ) for @blobs;
# stash tree
$git->put_object( $tree );


Now, as above, there's still no commit these are bound to. They're just floating bits of data.
Also, we probably want a dir or 2.

use strict;
use warnings;
use Git::PurePerl;
use Git::PurePerl::NewObject::Blob;
use Git::PurePerl::NewObject::Tree;
use Git::PurePerl::NewDirectoryEntry;

my $git = Git::PurePerl->new(
    gitdir => '/some/dir/foo/.git'    # Or use the less direct 'directory' form.
);

# 10 Blobs please.
my @blobs = map {
    my $blob = Git::PurePerl::NewObject::Blob->new(
        content => 'String Of File Content, no' . $_,
    );
} 1 .. 10;

# One directory entry per blob.
my $i = 0;
my @direntries = map {
    $i++;
    Git::PurePerl::NewDirectoryEntry->new(
        mode     => 100644,
        filename => "FooFile_$i",
        sha1     => $_->sha1,
    )
} @blobs;

my (@treeblobs);

# 5 entries go into SubDir_A, 3 into SubDir_B, the remaining 2 stay in the root.
my ( @dira, @dirb );
@dira = splice @direntries, 0, 5, ();
@dirb = splice @direntries, 0, 3, ();

my $tree_dira = Git::PurePerl::NewObject::Tree->new(
    directory_entries => \@dira,
);
push @treeblobs, $tree_dira;

my $tree_dirb = Git::PurePerl::NewObject::Tree->new(
    directory_entries => \@dirb,
);
push @treeblobs, $tree_dirb;

my $root_tree = Git::PurePerl::NewObject::Tree->new(
    directory_entries => [
        @direntries,
        Git::PurePerl::NewDirectoryEntry->new(
            # A string: the bare literal 040000 is Perl octal and would stringify as 16384.
            mode     => '40000',
            filename => 'SubDir_A',
            sha1     => $tree_dira->sha1,
        ),
        Git::PurePerl::NewDirectoryEntry->new(
            mode     => '40000',
            filename => 'SubDir_B',
            sha1     => $tree_dirb->sha1,
        ),
    ]
);
push @treeblobs, $root_tree;

# stash blobs
$git->put_object( $_ ) for @blobs;
# stash trees
$git->put_object( $_ ) for @treeblobs;



Hooray, all going to plan, you now have a simple digraph of data in git!
There is still no root node, so git prune will still delete them all, but we're almost there.

Commit it!

This is the finishing touch. Once you do this, the objects will hold their own in the datastore, and everything will be recorded in the metadata.

We need:
  1. Authors
  2. Timestamps
  3. Commit messages
  4. Optional: parent commit
  5. Target branch

Important Note about parent

Although parent is optional, you shouldn't treat it as such unless you know what you're doing, or there is in fact no parent (i.e. it's a brand spanking new branch, aka a new symbolic ref). Git history is represented as a chain:
[branch]->{commit}
             -V
            {commit}
             -V
            {commit}
and a "branch" is pretty much a pointer to the head commit of a series. Creating a single commit at the end of a branch with no parent therefore behaves the same as if you had DELETED THE WHOLE BRANCH, created a new, history-less symbolic ref with the same name, and committed the commit to it, leaving a branch history of 1 item.
We strongly recommend you use this field.
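
To see why the parent matters, it helps to look at what a commit object contains. The sketch below builds the text of two commit objects by hand with core modules only. commit_content is a hypothetical helper, and the fixed +0000 timezone and epoch value are assumptions made for the example. The only thing linking the second commit to the first is its parent line.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: build the text of a commit object.
# The "parent" line is only emitted when a parent id is supplied.
sub commit_content {
    my (%args) = @_;
    my $who   = "$args{name} <$args{email}> $args{epoch} +0000";
    my @lines = ("tree $args{tree}");
    push @lines, "parent $args{parent}" if defined $args{parent};
    push @lines, "author $who", "committer $who", '', $args{message};
    return join( "\n", @lines ) . "\n";
}

my %common = (
    tree  => '4b825dc642cb6eb9a060e54bf8d69288fbee4904',    # the empty tree
    name  => 'Bob Smith',
    email => 'BobSmith@example.com',
    epoch => 1260144000,                                    # arbitrary example timestamp
);

# A root commit: no parent, so history starts here.
my $root    = commit_content( %common, message => 'First commit' );
my $root_id = sha1_hex( 'commit ' . length($root) . "\0" . $root );

# A child commit: its parent line chains it onto the root commit.
my $child = commit_content( %common, parent => $root_id, message => 'Second commit' );

print $child;
```

Leaving out the parent on the second commit would produce another root commit, and a branch pointed at it would no longer reach the first one.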

use strict;
use warnings;
use Git::PurePerl;
use Git::PurePerl::NewObject::Blob;
use Git::PurePerl::NewObject::Tree;
use Git::PurePerl::NewDirectoryEntry;
use Git::PurePerl::NewObject::Commit;
use Git::PurePerl::Actor;
use DateTime;

my $git = Git::PurePerl->new(
    gitdir => '/some/dir/foo/.git'    # Or use the less direct 'directory' form.
);
# Snip
# ....
# Snip

my $root_tree = something();

# Create the commit author.
# Single quotes, so that @example is not interpolated as an array.
my $author = Git::PurePerl::Actor->new(
    name  => 'Bob Smith',
    email => 'BobSmith@example.com',
);
my $timestamp = DateTime->now();

my @parent;
if ( 0 ) {
    # Optional code block to determine parent commit id.
    my $p = $git->ref_sha1('refs/heads/thebranchname');
    @parent = ( parent => $p );
}

my $commit = Git::PurePerl::NewObject::Commit->new(
    tree           => $root_tree->sha1,
    @parent,    # empty unless a parent was looked up above
    author         => $author,
    authored_time  => $timestamp,
    committer      => $author,
    committed_time => $timestamp,
    comment        => <<'EOF',
This is a commit message!11!!.
EOF
);

# stash blobs
$git->put_object( $_ ) for @blobs;
$git->put_object( $_ ) for @treeblobs;
$git->put_object( $_, 'thebranchname' ) for ( $commit );



And that's it! There is a new commit in the repository with the data you ascribed =)

There has been only one visible change as far as the filesystem is concerned: the currently checked out branch has been changed from whatever it was to 'thebranchname'. For our purposes this is not what we want, as it defeats the whole goal of "nothing outside of this code should have any substantial visible effect".

Disabling the branch switch

The comments in the Git::PurePerl code indicate there may be a future time at which we don't have to work around this behaviour, but until then, here is a good WorksForMe™ way to do it.

use Path::Class ();

sub Git::PurePerl::put_object_noswitch {
    my ( $self, $object, $refname ) = @_;

    # Store the object itself as a loose object.
    $self->loose->put_object($object);

    # Only commits need a branch ref updating.
    return unless ( $object->kind eq 'commit' );

    $refname = 'master' unless $refname;

    # Write the branch ref directly, and leave HEAD alone.
    my $ref = Path::Class::file( $self->gitdir, 'refs', 'heads', $refname );
    $ref->parent->mkpath;
    my $ref_fh = $ref->openw;
    $ref_fh->print( $object->sha1 ) || die "Error writing to $ref";
}
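
The idea behind the workaround can be tried out without Git::PurePerl at all. This is a small sketch, assuming the usual layout where a branch is a file under refs/heads and HEAD is a separate file naming the checked out branch; write_branch_ref and slurp are hypothetical helpers, and a temporary directory stands in for a real .git directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: point refs/heads/<name> at a commit id, leaving HEAD alone.
sub write_branch_ref {
    my ( $gitdir, $refname, $sha1 ) = @_;
    my $dir = File::Spec->catdir( $gitdir, 'refs', 'heads' );
    make_path($dir);
    my $file = File::Spec->catfile( $dir, $refname );
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$sha1\n";
    close $fh or die "Cannot close $file: $!";
    return $file;
}

# Hypothetical helper: read a whole file into a string.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    return $data;
}

# A throwaway directory standing in for a real .git directory.
my $gitdir = tempdir( CLEANUP => 1 );
my $head   = File::Spec->catfile( $gitdir, 'HEAD' );
open my $out, '>', $head or die "Cannot write $head: $!";
print {$out} "ref: refs/heads/master\n";
close $out or die "Cannot close $head: $!";

# 'a' x 40 is a placeholder commit id, not a real object.
my $ref_file = write_branch_ref( $gitdir, 'thebranchname', 'a' x 40 );

print 'HEAD:   ', slurp($head);
print 'branch: ', slurp($ref_file);
```

HEAD still names master afterwards, which is the "no visible effect" property the workaround is after.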

Thanks for reading/Inspiration/Rationale

This document is presently serving as a convenient way to get my mind state into print, a brain dump, so I can get the logic of the things I'm doing nice and clear. As a result of the above code, which I think is a bit too complicated for somebody who just wants to make commits, I've started on Git::PurePerl::CommitBuilder, which will hopefully provide a much more friendly UI for that purpose than the one above.
