Friday, 30 July 2010

Git Internals: An Executive Summary in 30 Lines of Perl, for smart newbies.

Update: Modified the code a bit to handle the 'pack' specials. They're not so straightforward; I'll blog more on that later.

This blog post is not intended as a replacement for a real in-depth understanding of Git's command line interface, but it does aim to maximise exposure of how Git works internally. Its internal logic is astoundingly simple, and anyone with a good background in graph theory and databases will quickly see the elegance in it. For more details, check out the excellent book Pro Git, especially the internals chapter.

The code

Git's core essentials are almost nothing more than a bunch of deflated (zlib) text files. I'm going to assume you've got enough intelligence to RTFM and get a copy of something gitty and text-based checked out. Perl modules are good examples of this. I'm using my Dist::Zilla::PluginBundle::KENTNL::Lite tree.

git clone git://github.com/kentfredric/Dist-Zilla-PluginBundle-KENTNL-Lite.git /tmp/SomeDirName

I'm going to show you the core of git's system, which is just the "object" store.

cd /tmp/SomeDirName/.git
find objects/

Woot, there are all your files and stuff in git. How does it work? That's where the Perl script comes in.

#!/usr/bin/perl
use strict;
use warnings;

use Compress::Zlib;
use Carp qw( croak );

sub inflate_file {
    my ( $filename, $OFH ) = @_;

    # In list context inflateInit returns ( $stream, $status ).
    my ( $inflator, $status ) = Compress::Zlib::inflateInit();
    croak("Cannot create inflator: $status") unless $status == Compress::Zlib::Z_OK;

    # open() reports its failure reason in $!, not $@.
    open my $fh, '<', $filename or croak("Can't open $filename: $!");
    binmode $fh;
    binmode $OFH;

    my ( $input, $output ) = ( '', '' );
    while ( read( $fh, $input, 4096 ) ) {
        ( $output, $status ) = $inflator->inflate( \$input );
        print {$OFH} $output
          if $status == Compress::Zlib::Z_OK or $status == Compress::Zlib::Z_STREAM_END;
        last if $status != Compress::Zlib::Z_OK;
    }
    croak("Inflation of $filename failed: $status")
      unless $status == Compress::Zlib::Z_STREAM_END;
}

for ( @ARGV ) {
    next if $_ =~ /\.(idx|pack)|packs/;
    print qq{<--------BEGIN $_ --------->\n};
    inflate_file( $_ , *STDOUT );
    print qq{<--------END $_ --------->\n};

}

Pretend you cargo-cult dump that code to /tmp/deflate.pl.

Now check this out:

perl /tmp/deflate.pl $( find objects/ -type f ) | cat -v | less

Awesome, you're now seeing the guts of how your repository works. For real. All we did was inflate each and every object (the script is called deflate.pl, but what it does is undo zlib's deflate). You'll see 3 types of object (each object says at the front what type it is, before the ^@): trees, blobs, and commits (with trees being the most complicated of all).

Blobs are just a file's contents.

Commits are just a blob of text, with commit messages and stuff, timestamps, etc., and with text references (pretend it's like an a-href in a web page or something) to preceding (parent) commits, and to a commit tree.

Trees are probably the hardest to work out just by looking at them. A tree is more or less just another file with another list of references, except the references point at either blobs or other trees. So, you can pretend a "tree" is like a "dir" in some ways. There's data besides this, like file/dir names and permissions, but that's the gist of it.
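
To make the layout a bit more concrete, here is a minimal sketch of picking apart one already-inflated object. It assumes the object is laid out as "<type> <size>", a NUL byte (the ^@), then the body, and that each tree entry is "<mode> <name>", a NUL byte, then 20 raw bytes of SHA-1 (which would explain why trees look so garbled under cat -v). parse_object and parse_tree are names made up for this illustration, not anything git ships.

```perl
use strict;
use warnings;

# Split an already-inflated object into ( type, size, body ).
# The header is everything before the first NUL byte.
sub parse_object {
    my ($raw) = @_;
    my ( $header, $body ) = split /\0/, $raw, 2;
    my ( $type, $size ) = split / /, $header, 2;
    return ( $type, $size, $body );
}

# Walk a tree body. Assumed entry layout: "<mode> <name>\0" followed by
# 20 raw bytes of SHA-1 naming a blob or another tree.
sub parse_tree {
    my ($body) = @_;
    my @entries;
    while ( $body =~ /\G([0-7]+) ([^\0]+)\0(.{20})/gs ) {
        push @entries, { mode => $1, name => $2, sha => unpack( 'H*', $3 ) };
    }
    return @entries;
}
```

Capture the output of inflate_file for a single tree object into a string, hand it to parse_object, then pass the body to parse_tree, and you get one hash per directory entry, with the SHA-1 in the same hex form used for the file names under objects/.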

This has been your executive summary =)

Monday, 19 July 2010

Current Limitations In Exception Driven Perl: Stringy Core Exceptions

Let's just assume for one moment that we have a proper Exception Hierarchy, and that this wasn't a huge gaping hole in the current Exception landscape.

There's still the other problem of so much Perl code being not designed in Exception friendly ways.

die "$string" and croak "$string" are about as detailed as you get from most things.

And I'm sure everyone agrees that this only passes for the bare minimum of exception handling. There's no runtime stack introspection ( Edit: OK, not without mangling $SIG{__DIE__}, yuck ), no re-throwing exceptions without losing the source failure point ( Edit: to clarify, not all 'die' calls are represented in the error ), let alone problem classification without resorting to regexing the failure string ( and that's far from reliable: those strings are targeted at humans, not machines, so they are prone to being modified later in a way your regex won't recognise, breaking your code ).

autodie is a good start to solving this problem, but it doesn't have all the bells and whistles I'd hoped for. It has an error hierarchy, but it doesn't appear very flexible or extensible into other projects ( the whole thing is defined in a 'my' variable in Fatal.pm, it seems ). Additionally, it doesn't supplement any of the things in Perl that already die by throwing their own stringy exception because, as far as autodie appears concerned, if it's already throwing an exception, why replace it?

One builtin with this type of problem is require.

There are at least 3 unique separate failure conditions that I know 'require' can spit out.

  • File not Found in @INC
  • require returned false value
  • compilation failed in require

All of the above being reported merely as strings leaves much to be desired. Sure, it's great when things fail in obvious ways, but handling it in code is far too pesky.

Not everyone will have experienced this problem of course, but let me demonstrate a scenario.

sub findFirst { 
  my $plugin = shift;
  my $parent = "SomeApp";
  my @guessOrder = ( $plugin . "::" . $parent , $plugin );
  my @fails;
  for( @guessOrder ){
     local $@;
     eval "require $_; 1";
     if ( $@ ) {
        die $@ if $@ !~ /^Can't locate /;    # perl words "not found" as "Can't locate Foo.pm in @INC"
        push @fails, $@;
     } else { 
        return $_ ; 
     }
  }
  die "Couldn't load any of @guessOrder : @fails ";
}

my $plug = findFirst("Foo::Bar");

This is about as semantically clean as I can get it. The goal here is to permit "Not Found" family of require failures, but upon encountering something that exists but is merely broken, then push that failure up to userland, and, in the event none are found, dump all the errors out showing all the attempted paths that were searched and what was searched for.

But there are several problems with this code. The most obvious is that stringy eval is a really bad idea. I had hoped that at least one of the workarounds for this silliness on CPAN came with something that threw an Exception object instead of a string... but no, all I can find is ones that rely on the stock Perl system, and ones that go contrary to all logic and require you to check a return value for failure.

Another problem is the check for a string in the error. This is not as big a problem, but somebody malicious I guess could break something by explicitly crafting a death message that matched that line.
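
One way to at least confine the string matching to a single place is a small classifier like the sketch below. The three patterns are assumptions based on the wording perl uses for these failures ( "Can't locate ... in @INC", "did not return a true value", "Compilation failed in require" ), and classify_require_error and try_require are hypothetical names, not something from CPAN. It also sidesteps the stringy eval by handing require a file name.

```perl
use strict;
use warnings;

# Map a stringy require failure onto a symbolic kind.
# "Compilation failed" is checked first, because a module that exists but
# fails to load one of its own dependencies also starts with "Can't locate".
sub classify_require_error {
    my ($error) = @_;
    return 'compilation_failed' if $error =~ /Compilation failed in require/;
    return 'returned_false'     if $error =~ /did not return a true value/;
    return 'not_found'          if $error =~ /^Can't locate \S+ in \@INC/;
    return 'unknown';
}

# Try to load a module by name.
# Returns ( 1, undef ) on success or ( 0, { kind => ..., message => ... } ).
sub try_require {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    my $ok = eval { require $file; 1 };
    return ( 1, undef ) if $ok;
    my $message = $@;
    return ( 0, { kind => classify_require_error($message), message => $message } );
}
```

It is still regexes all the way down, so it breaks the moment the wording changes, which is rather the point of this post.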

Another lovely problem is that death-rethrowing thing. Finding everywhere that the problem occurred in a non-insane way is hard. Ideally, not only should you have a trace depth from top level down to the point of the failure, but also a trace of everywhere the error was re-thrown, because the failure is really a domino effect, and not being able to see how it propagates without dropping into a debugger is hell. You tend to need more complex cases to see why this is happening though.

#!/usr/bin/perl

use strict;
use warnings;

sub fail {
  die "Hurp Durp!";
}

sub maybfail {
  unless ( eval { fail; 1; } ) {
    die "maybfail: $@";
  }
}

sub moarfail {
  unless ( eval { maybfail; 1; } ) {
    die "Moarfail: $@";
  }
}

moarfail;

To me, I'd like to be able to see that:

  • the root error occurred at main:22 { moarfail:17 { maybfail:11 { fail:7 { die } } } }
  • the error was rethrown at main:22 { moarfail:17 { maybfail:12 } }
  • the error was rethrown at main:22 { moarfail:18 }

At present, here's the best I can get out of that simple structure:

$ perl -MCarp::Always /tmp/die.pl 
Moarfail: maybfail: Hurp Durp! at /tmp/die.pl line 18
 main::moarfail() called at /tmp/die.pl line 22
$ perl /tmp/die.pl 
Moarfail: maybfail: Hurp Durp! at /tmp/die.pl line 7.
$ perl -MDevel::SimpleTrace /tmp/die.pl 
Moarfail: maybfail: Hurp Durp!
 at main::fail(/tmp/die.pl:7)
 at (/tmp/die.pl:11)
 at main::maybfail(/tmp/die.pl:11)
 at (/tmp/die.pl:17)
 at main::moarfail(/tmp/die.pl:17)
 at main::(/tmp/die.pl:22)

Note how none of those traces reflect the fact I call "die" on line 12? Be glad the die isn't like 30 lines away in a different method where it might go completely unnoticed.

In fact, each and every one of these backtraces confuses me, because I can't work out why some know about the failure origin and others don't ... ( Carp::Always seems to let you down by being completely unable to see the stack. :/ )

I would, in fact, much prefer something like this, if it actually worked:

#!/usr/bin/perl

use strict;
use warnings;

sub fail {
    BasicException->throw( error => 'HurpDurp' );
}

sub maybfail {
  try { 
      fail;
  } catch ( BasicException $e ) { 
     MoreComplexException->adopt( $e )->throw( error => 'Maybfail');
  }
}

sub moarfail {
  try { 
      maybfail;
  } catch ( MoreComplexException $e ) { 
     EvenMoreComplexException->adopt( $e )->throw( error => 'Moarfail');
 }
}

moarfail;

Nothing I've seen handles that "adopt" thing, but it's my little way of saying "We are in fact creating a new exception, because we want to provide more information about the problem, and increase the meaning of the problem relative to this context, but we also want to recognise that this problem is likely caused by another problem(s) that we identify here."
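
For what it's worth, the "adopt" idea can be roughed out in plain core Perl. This is only a sketch under my own assumptions: BasicException and MoreComplexException here are hypothetical hand-rolled classes, not Exception::Class or anything else from CPAN, and eval/die stands in for the try/catch syntax in the wish-list code above.

```perl
use strict;
use warnings;

package BasicException;

# "cause" holds the adopted exception, if there is one.
sub new {
    my ( $class, %args ) = @_;
    return bless { error => undef, cause => undef, %args }, $class;
}

# Class method: start a new exception that remembers $cause.
# Returns an object so that ->throw can be chained onto it.
sub adopt {
    my ( $class, $cause ) = @_;
    return $class->new( cause => $cause );
}

# Works as a class method ( fresh exception ) or on an adopted object.
# Records where it was thrown from, then dies with the object.
sub throw {
    my ( $self, %args ) = @_;
    $self = $self->new() unless ref $self;
    $self->{$_} = $args{$_} for keys %args;
    my ( undef, $file, $line ) = caller;
    $self->{thrown_at} = "$file:$line";
    die $self;
}

# Flatten the chain of causes, newest first.
sub chain {
    my ($self) = @_;
    my @links;
    for ( my $e = $self ; $e ; $e = $e->{cause} ) {
        push @links, ref($e) . ": $e->{error} (thrown at $e->{thrown_at})";
    }
    return @links;
}

package MoreComplexException;
our @ISA = ('BasicException');

package main;

sub fail { BasicException->throw( error => 'HurpDurp' ) }

sub maybfail {
    eval { fail(); 1 } or do {
        my $e = $@;
        die $e unless ref $e && $e->isa('BasicException');
        MoreComplexException->adopt($e)->throw( error => 'Maybfail' );
    };
}
```

Catching whatever maybfail throws and calling ->chain on it gives one line per throw point, which is the re-throw trail that none of the tracers above could show. It does nothing for the stringy exceptions core Perl throws by itself, of course.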

In case you TL;DR'd here ( and because my train of thought was just snapped -_- ), the summary of this is: It's really challenging doing proper exception-oriented Perl when so many code features still throw those nasty stringy exceptions. :(

Sunday, 18 July 2010

Current Limitations In Exception Driven Perl: Exception Base Classes.

I've started re-attempting to do Exception Oriented Perl Programming recently, and quickly discovered a whole raft of things that got in my way.

This is the first of such things.

I was very much appreciative of Exception::Class. It mostly looks to Do The Right Thing, and it's mostly simple and straightforward. It has some apparent limitations of its own with regard to exception driven code, but I'll cover those later.

The biggest annoyance I have at present is there is no apparent de-facto base set of Exception classes to derive everything else from. I was expecting some sort of Exception Hierarchy much like Moose's Type Hierarchy, but none is to be found anywhere, and this stinks.

Is everyone to have their own base hierarchy for everything? The idea of every project shipping its own FileException class feels like Fail to me, and I feel this problem will need to be addressed before more people start taking exception driven Perl seriously.

In addition to this fun, all the exception classes presently share the same name-space as everything else in Perl, because they're just Perl packages. I accept this limitation is mostly Perl's fault, but I still dislike it. The 'Type' name-space suffers a similar problem, but it's not quite so bad.

The challenge here is having adequate classes to accurately represent all the classes of exception one wishes to provide, and having them sanely organised, without people needing to type out 100-character incantations just to throw an exception.

Something akin to MooseX::Types, which injects subs into the context, would be nice-ish. The only problem there is when you do something stupid like create/import an exception with a name identical to a child namespace, i.e.:

   package Bar;
   use SomeTypePackage qw( Foo );
   use Bar::Foo; # Hurp durp. Bar::Foo->import() ==> Bar::Foo()->import() 
   Bar::Foo->new(); # moar hurp durp. Bar::Foo()->import() 

It's reasonably easy to work around, but discovering you've failed in this way is slightly less than obvious.
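
A self-contained sketch of that collision and two workarounds follows. The sub Bar::Foo below is a stand-in I made up for an imported type/exception sub, and Some::Other::Class is equally hypothetical. The workarounds rely on a quoted string or a trailing :: always being treated as a class name.

```perl
use strict;
use warnings;

package Bar::Foo;
sub new { return bless {}, shift }

package Some::Other::Class;
sub new { return bless {}, shift }

package Bar;

# Stand-in for a sub injected by an import such as
# "use SomeTypePackage qw( Foo )"; it ends up named Bar::Foo.
sub Foo { return 'Some::Other::Class' }

package main;

# With a sub called Bar::Foo visible, this parses as Bar::Foo()->new().
my $surprise = Bar::Foo->new();

# Both of these always mean the class Bar::Foo.
my $quoted = 'Bar::Foo'->new();
my $colons = Bar::Foo::->new();

printf "%s / %s / %s\n", ref $surprise, ref $quoted, ref $colons;
```

Neither spelling is pretty, which is probably why the failure tends to be discovered the hard way.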

Sunday, 27 June 2010

Todays amusing Perl parser confusion

Have a look at this very simple code and see what you expect it will do:

#!/usr/bin/perl
use strict;
use warnings;


print "hello";

1

=pod

=cut
__END__

It looks trivial right?

Not so.

$ perl /tmp/pl.pl 
Can't modify constant item in scalar assignment at /tmp/pl.pl line 13, at EOF
Bareword "cut" not allowed while "strict subs" in use at /tmp/pl.pl line 8.
Bareword "pod" not allowed while "strict subs" in use at /tmp/pl.pl line 8.
Execution of /tmp/pl.pl aborted due to compilation errors.

Wait.

Wut?

Running it through Deparse reveals the culprit:

$ perl -MO=Deparse /tmp/pl.pl 
Can't modify constant item in scalar assignment at /tmp/pl.pl line 13, at EOF
Bareword "cut" not allowed while "strict subs" in use at /tmp/pl.pl line 8.
Bareword "pod" not allowed while "strict subs" in use at /tmp/pl.pl line 8.
/tmp/pl.pl had compilation errors.
use warnings;
use strict 'refs';
print 'hello';
1 = 'pod' = 'cut';
__DATA__

Pesky indeed!

The solution? Insert the humble ; like your mother taught you to.

#!/usr/bin/perl
use strict;
use warnings;


print "hello";

1;

=pod

=cut
__END__

$ perl /tmp/pl.pl 
hello

Perhaps this is worthy of applying a bugfix. Perl version = 5.12.1 =).

Friday, 25 June 2010

Any good advice on focusing on the one scope in this massively metarecursive language?

The recursivity of the meta-programming these days in Perl is astounding.

This is not necessarily a bad thing, but it has its drawbacks in various fields.

While I love authoring modules, and I love contributing to various projects, I often find this becomes a need in itself, when I would rather be focusing on the thing I actually need.

An Example

Let me give you an example: one of my family members asked me to work on a website for them, for one of their businesses, and as a result I want to produce the best product I possibly can for this.

The first concern I encountered was shipping it. I need to be able to develop this website in a way that I can ship it somewhere ( target unknown ) and have a relatively quick, relatively hassle-free installation that Just Works, so in the event I have to hand the code over to somebody else to work with, or ship it to a different server where I may have less control over the environment or distribution it runs, it will still mostly just work.

This led me to my state-of-packaging post, where I started wasting time trying to work out the best way to bundle, package and otherwise get the software to just work.

This need sort-of emerged out of the want to use the latest and greatest tools, such as Plack, the latest editions of Moose, etc.

However, as discovered in the aforementioned article, the state of Linux distributions with regard to Perl largely sucks, and pretty much the "best" option tends to result in "using CPAN".

CPAN is great and all, don't get me wrong, but compared to existing Linux distribution package management techniques, Perl dependency and file management leaves much to be desired. Sure, it's miles ahead of Ruby and Python ( not to mention evolutions of species better than PHP, Java and C/C++'s native package management ), but since when do we use the lesser tools as our measure of standard?

So anyhow, after musing for several days on this dilemma, researching various options, talking to various people, and blogging about it, and not getting very far, I decided I was just wasting my time again and should just hack something up on my box, and worry about this package management crap later.

Distraction 2.0

So, I decide to get it working on my machine first, worry about everywhere else later, you know, when it matters. This is of course a potentially dangerous decision from a reliability standpoint, because you may discover whatever technique you decided to use on your system is completely non-viable on another.

On my machine, the first thing I do is go through my toolkit and update all the various packages I'll need using my distribution's package management tools. ( In my experience this, surprisingly, sucks less than it does on the other distributions I've tried. )

Then I discover a discrepancy in how another developer has mapped Perl dependencies to Package Manager dependencies, that is different to how I've been doing them, and I then have to work out if it's merely an error, or intentional. ( The specifics of this I won't bore you with here ). As part of diagnosis, while I'm waiting for a response on IRC from the developer who wrote that mapping, I of course write a Perl script to work out where else this style of mapping is being used, in an attempt to gauge how often it is used.

This eventually diverges until I'm parsing individual build scripts with Perl and am trying to extract balanced bracket sets from these files with context. ( Bad me, I should have just used Text::Balanced )

Fortunately, I disregarded that script eventually, because I realised how much of the day I'd wasted on this problem already. Argh. Still no closer to even starting the actual code :|

Other times, when doing the update phase, I discover a package incompatibility with Perl, for whatever reason. A recent example is some bizarre failure with Eval::Context. This failure is a bit hard to trace down, because it occurs, as far as I can make out, in Carp. The usual techniques such as -MCarp::Always or -MDevel::SimpleTrace do not want to work because, for some reason, their presence causes the wonderful Heisenbug scenario: the bug vanishes! ( well, and a new one appears in its place ). And to make matters worse ( much much worse ), when I run the build + test by hand instead of under the packager's sandbox installation system, the bug also vanishes. Pesky indeed. ( I haven't filed a bug for the above yet, in case you're asking; there's simply no point filing one until I can reliably recreate the scenario in a sterile way. And as a general rule I've found with Perl, most of the time, if I figure out what the problem is, I figure out a solution at the same time. )

Let's assume for a moment I was able to actually work out what was going on. After dicking around for a few hours, I'd probably have found a patch that worked too, possibly submitted a bug report and patch upstream, applied the workaround to the Perl overlay, and then I'd be able to get on my way to the next package.

Granted, at the moment, the number of failing packages I'm encountering is much much higher, as I'm helping test the Perl 5.12.1 release preceding its integration into the main tree, and I'm voluntarily fixing these things because somebody has to test this stuff before it hits Luser land.

More Recursion

It's not the case this time, I mean, not yet with this project ( mostly because it has yet to have any code! ), but I often find myself swimming deeper and deeper into the metaprogrammy sea.

In the beginning, it was just writing modules that made my life easier.

Then comes the fun of distribution of those modules to make others' lives easier

Then comes the want to make distribution of Modules easier

Then comes the awesome madness that is Dist::Zilla

Then comes you writing plugins for Dist::Zilla

Then you're writing plugin bundles for the above

Then you're working on Dist::Zilla itself( Patches ! :D )

Then you're contributing code to other people's Dist::Zilla plugins

Then you're contributing code to fix various packages that other people's Dist::Zilla plugins use.

All this is great stuff, really, community++, but something in the back of my mind says "Hey, you're lost in the meta, you're so far removed from what you were actually trying to achieve you can no longer see the woods for the trees, in fact, you can't even see trees, all you're seeing is carbon atoms and you're trying to compute the spin on their electrons!"

My Problem Really

I think my problem is really I don't see a viable way of staying strictly a "high-level abstraction" consumer, and just using the abstractions that exist to achieve my goal, and I'm always drilling down into the guts of things, patching their core, getting all low-level into the implementation of things and forgetting my original goal for weeks.

The best I can come up with is "hey, perhaps you'll have to be a bit anti-contributive and, er, yuck, write code that is probably redundant somewhere, in a way that's not really optimally reusable, because the long-term maintenance requirements of publicly shared code are a bit high".

I think I just sicked up in my mouth at the idea of that :/

But I have to find some way to focus on the project level, food doesn't put itself on the table!

Some basic statistics on "Line Noise"

I was reading another blog about somebody intending to analyse what amount of Perl code constitutes "Line Noise", but they didn't appear to have Actually Done It.

I took a naïve approach and didn't make any assumptions about what "line noise" constitutes, and just did basic statistics on the prevalence of various characters for the sake of interest.

Partial Dump
  0.2 % :   511319 x char   64 : "\@"
  0.2 % :   564540 x char   55 : 7
  0.2 % :   593117 x char   79 : "O"
  0.2 % :   601710 x char   77 : "M"
  0.3 % :   675072 x char   92 : "\\"
  0.3 % :   684986 x char   68 : "D"
  0.3 % :   698665 x char   78 : "N"
  0.3 % :   709768 x char   76 : "L"
  0.3 % :   712074 x char   80 : "P"
  0.3 % :   763426 x char   56 : 8
  0.3 % :   784577 x char  107 : "k"
  0.3 % :   797560 x char   82 : "R"
  0.3 % :   833723 x char   54 : 6
  0.4 % :   912737 x char   52 : 4
  0.4 % :   920716 x char   93 : "]"
  0.4 % :   921001 x char   91 : "["
  0.4 % :   924075 x char   73 : "I"
  0.4 % :   947539 x char  118 : "v"
  0.4 % :   956653 x char   67 : "C"
  0.4 % :   996323 x char   65 : "A"
  0.4 % :  1000637 x char   83 : "S"
  0.5 % :  1125435 x char  119 : "w"
  0.5 % :  1151874 x char   46 : "."
  0.5 % :  1220735 x char   34 : "\""
  0.5 % :  1222341 x char    9 : "\t"
  0.5 % :  1222927 x char   51 : 3
  0.5 % :  1241600 x char   69 : "E"
  0.5 % :  1243448 x char   53 : 5
  0.5 % :  1332828 x char   84 : "T"
  0.6 % :  1443662 x char   57 : 9
  0.6 % :  1491434 x char  120 : "x"
  0.6 % :  1499376 x char  125 : "}"
  0.6 % :  1500792 x char  123 : "{"
  0.7 % :  1718028 x char  103 : "g"
  0.7 % :  1739054 x char   40 : "("
  0.7 % :  1739695 x char   41 : ")"
  0.7 % :  1792258 x char   59 : ";"
  0.7 % :  1825133 x char  121 : "y"
  0.8 % :  1837291 x char   98 : "b"
  0.8 % :  1842316 x char   35 : "#"
  0.8 % :  1960600 x char   50 : 2
  0.9 % :  2149806 x char   62 : ">"
  1.0 % :  2410416 x char   49 : 1
  1.1 % :  2594921 x char   61 : "="
  1.1 % :  2684166 x char   95 : "_"
  1.1 % :  2709633 x char  112 : "p"
  1.2 % :  2818643 x char   58 : ":"
  1.2 % :  2952175 x char  104 : "h"
  1.2 % :  2995621 x char   45 : "-"
  1.3 % :  3151943 x char  109 : "m"
  1.3 % :  3283418 x char   36 : "\$"
  1.3 % :  3291138 x char  102 : "f"
  1.4 % :  3339529 x char   39 : "'"
  1.4 % :  3355931 x char  117 : "u"
  1.5 % :  3638254 x char   99 : "c"
  1.6 % :  4016055 x char  100 : "d"
  1.9 % :  4598003 x char   44 : ","
  2.0 % :  4786703 x char  108 : "l"
  2.2 % :  5472272 x char   48 : 0
  2.6 % :  6279579 x char  110 : "n"
  2.6 % :  6306811 x char  111 : "o"
  2.7 % :  6625715 x char  105 : "i"
  2.8 % :  6872608 x char  114 : "r"
  3.0 % :  7315145 x char  115 : "s"
  3.1 % :  7522087 x char   97 : "a"
  3.6 % :  8711403 x char   10 : "\n"
  3.7 % :  8972142 x char  116 : "t"
  5.4 % : 13289205 x char  101 : "e"
 24.2 % : 59186425 x char   32 : " "

I find it quite intriguing how the various bracketings are unbalanced. Also the significantly greater use of ">" vs "<" indicates people write more than they read. Edit: more likely it's just =>
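
To put numbers on that imbalance, a few lines over the same kind of character => count hash will do. This is just a sketch: pair_imbalance is a name invented here, and the sample counts are copied from the partial dump above rather than recomputed.

```perl
use strict;
use warnings;

# Given a hashref of character => count, return opener minus closer
# for each bracket pair, keyed by the pair itself.
sub pair_imbalance {
    my ($stats) = @_;
    my %pairs = ( '(' => ')', '[' => ']', '{' => '}' );
    my %diff;
    for my $open ( keys %pairs ) {
        my $close = $pairs{$open};
        $diff{"$open$close"} = ( $stats->{$open} || 0 ) - ( $stats->{$close} || 0 );
    }
    return \%diff;
}

# Counts taken from the partial dump.
my %sample = (
    '(' => 1739054, ')' => 1739695,
    '[' => 921001,  ']' => 920716,
    '{' => 1500792, '}' => 1499376,
);

my $diff = pair_imbalance( \%sample );
printf "%s : %+d\n", $_, $diff->{$_} for sort keys %$diff;
```

On those figures there are 641 more ")" than "(", 285 more "[" than "]", and 1416 more "{" than "}".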

Also, what is extremely amusing, is in this sort order, ignoring "r" "a" and "t" and all whitespace going down, a word is formed. That word.... is "noise". Weird.

For a full dump of my diagnostics, see my github gist.

The code I used to generate these stats is pretty straightforward, and I would be interested in seeing what sort of results other people get, and possibly the result of adapting the code for C and other non-Perl languages to work out how much "line noise" they have.

#!/usr/bin/perl
use strict;
use warnings;

use 5.12.1;
use File::Find::Rule            ();
use File::Find::Rule::Perl      ();
use Data::Dumper                qw( Dumper );

say $_ for ( @INC );

my @pmfiles = File::Find::Rule->perl_file->in( @INC );

my %stats;

for my $file ( @pmfiles ){
    say "scanning $file";
    open my $fh, '<', $file or next;
    my $char;
    while( read $fh, $char, 1 ){
        $stats{$char}++;
    }
#    last;
}

my @data = sort { $a->[0] <=> $b->[0] } map { [ $stats{$_} , $_ ] } keys %stats;

$Data::Dumper::Terse = 1;
$Data::Dumper::Useqq = 1;

my $numchars;
$numchars += $_ for values %stats;

for( @data ){
    printf "%5.1f %% : %8d x char %4d : %s" ,
       ( $_->[0] / $numchars * 100 ) , 
       $_->[0] , 
       ord( $_->[1] ),
       Dumper( $_->[1] );
}

Thursday, 17 June 2010

The Search for the Perfect Project Setup

I feel a bit like a retard today.

Perhaps, a spectacular one. I don't even know what to search for with regard to my problem as follows, and I guess I don't have the best idea of what I want, so I'm blogging about it in the hope I can linearise my thought process a bit and work out what to do, and perhaps somebody can point me in the right direction.

NB. There's a fair bit of "TL;DR" content here, but it stands in case people try to suggest I use these solutions instead. It's primarily a demonstration of what I've tried, and the logic I've obtained from that, which I used to reach my current conclusion and thus my actual request.

Firstly, My current situation

At the moment, I install all my modules, not via any of the CPAN clients, but through my distribution. This yields a much cleaner system, and dependency tracking is more reversible, which files were installed by which distribution is more reliable, and distribution collisions are explicitly barred.

This is moderately straightforward. In Gentoo, we have these ebuilds which automate most of the hard work, and the technical debt of building a CPAN module and installing it is pretty much 0: a single 30 line text file, most of which is boiler-plate ( and generated ), and it's essentially bash code, almost FreeBSD in nature.

I'm not a fan-boy for Gentoo for any of the traditional reasons people ascribe to it ( i.e. as funrolloops portrays ). I actually like how the package management works, I like having access to all the source, I like being able to break stuff and report reasonable bug reports to get actual bugs fixed, and I like being able to Just Fix It myself when I want to. I'm not going to go and rubbish anybody else for their distribution choices or why they choose them, just for me, Gentoo is the sweet spot in my taste system. ( I just expect people to return the favour and not treat me like the retard because I'm not using $THEIR_SYSTEM )

As a general rule, other distributions have given me various headaches for various reasons, I haven't tried Arch yet, so I can't write that off as unfit for my way of working yet, but from what I see its mostly nice.

Perceived Obstacles: In walks Deb/Buntu

For various reasons, my way of working with Perl on Gentoo is not very friendly on some other distros. At present, I have a box running Ubuntu, which I initially set up to JustWork and be pretty simple for flatmates to use as an Internet terminal. It has since lost this role, and it's really too much effort for me to wipe it and install $OtherDistro from scratch on it. And fundamentally, needing to do that just to work in Perl on that distro in a satisfactory way is either a failure in that distro ( Snarky comments about Ubuntu here ), a failure of Perl ( I hope not ), or a failure of myself ( Pretty likely ).

I've seen and tried using dh-make-perl, and its behaviour is very unsatisfactory. Unfortunately, the most recent Perl I can get on Ubuntu is 5.10.1, and the most recent version of dh-make-perl I can get on Ubuntu is the geriatric 0.62, which is goodness knows how many versions behind Debian's equivalent.

dh-make-perl problems

  1. Non Recursive nature

    I can handle this, that's OK. I'm used to walking deps by myself on Gentoo where needed and satisfying them; it's not challenging. But that said, these files are generated build scripts which are just text files, essentially generated from a naive template, and this is *really fast*. The dh-make-perl script by comparison takes as long to generate and build the .deb file as it would take me to generate and edit the text file myself by hand!

    Additionally, at present I only generate my files by hand by choice. I only do it by hand to guarantee quality in the generation, so that I can redistribute it.

    I could just use Vincent Pit(VPIT)++'s marvellous CPANPLUS::Dist::Gentoo which for the most part JustWorks™. It does all the cool recursive traversal, generation of ebuilds where needed, and its hands free, and fast.

    I attempted to use CPANPLUS::Dist::Deb, and that kinda just failed, which I'll go into later

  2. On half the things I've tried to build with it so far, it's failed

    Again, possibly I'm a retard, or possibly Ubuntu is failing again, but it keeps dying with weird problems trying to find dependencies, or computing dependencies, and sometimes it can't even detect things that have been built earlier and installed. ( For the record, I've been banging my head against the wall trying to get Plack to build )

    Sure, due to the nature of Perl stuff it's a bit hellish to extract dependencies reliably in all cases, but even then, this is Plack, man; it's pretty straightforward.

    Gentoo dependencies are reasonably simple to sort out when automation gets it wrong. The Debian format? I don't even know where to start.

    Granted, I haven't spent much time reading the Debian Developer Guides to learn how to fix this sort of problem, and what sort of incantations to call to get something to build once I've manually fixed the problem, but it's really overkill to even need to do that. I didn't need to read anything to start hacking on ebuilds. It's all self-contained and it's bash, a language I already know, and extremely straightforward. Sure, I needed to learn a bit for supremely advanced edge cases, but I don't see demand for those on a regular basis.

I guess the obvious solution to the above would be learning more about Debian? But I've already exercised more than my share of WTF quota in this avenue.

CPANPLUS::Dist::Deb

Either this module sucks, or it's just terribly broken, or it's sucking due to Ubuntu-isms. My impression is it's starting to be a little under-maintained, but I'm not sure. The first time I tried to use it ( well, install it, that is ), the majority of its tests just failed hard. So, I upgraded from Karmic to Lucid, and as a result, tests just hang instead for about 5 minutes, before running the tests again and failing most of them. Brilliant.

make[1]: Entering directory `/home/anyone/pl/CPANPLUS-Dist-Deb-0.12'
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00_constants.t .. ok     
t/01_load.t ....... ok    
t/02_debs.t ....... 1/? # Taking care of Build / xs  # massive hang here.

And then the rest of the Massive Failure is too big to include even in this inordinately large blog. I eventually managed to get it to build and install, but I had to use --notest to get it to work.

Actually, I had to define DEB_BUILD_OPTIONS="nocheck" .. because for some lovely reason, --notest, despite being very helpful, is deprecated!

Then the real fun started

Using cpan2dist --format CPANPLUS::Dist::Deb Plack went off and decided to build packages with stupid names ( 'cpan-libplack-perl' anyone? ), which then fubared for some reason I still don't even want to understand. Hell, it makes Java back-traces look simple.

Conclusion: Perhaps relying on distro-packaged CPAN packages on most distros still sucks too hard

I've come to understand at long last why people JustUseCpan™ instead of relying on their distros. Just look at the massive hell-hole of problems I encountered on just one distribution of Linux. Woe be unto him who wants to develop a Perl Project and then ship it and hope it's easy to install using the tools provided by the recipient's distribution of choice. I've been lulled into a false sense of security by my lovely system which is so simple to use.

So, You're doing a Project and relying on CPAN.pm and friends

There's a variety of goals a person like myself wants to achieve with this scenario.

  1. Low Pollution

    Pooping over /usr and friends is unacceptable, especially if it's not 100% guaranteed reversible. No 2 modules should be able to modify each other's files, either by intent or accident. In some distributions, this is guaranteed by building and installing into a clean directory-tree with a "sandbox" mechanism that prohibits writing outside the build environment, then collision-testing all the files in the clean-install directory prior to unpacking them into the file-system, and bailing if a collision occurs. I like to have this degree of certainty with modules, and in fact all software, which is the primary reason I rely on my distro's package manager: it can give me these guarantees.

    You should NOT need elevated permissions to ever perform configure/build/test or install. Final application to the file-system should be performed by an externality with the needed permissions, that has no way of being "scripted" during the install phase by the package that is being installed.

    If another mechanism can exist within a context ( think perhaps something like local::lib ) that gives me this same certainty without resorting to, say, putting the whole bastard in git and relying on the ability to revert commits, then ThatConcept++, I want it! ( It's not that I'm averse to gitifying an install tree, it's just that when you install lots of modules, you don't want to have to halt things between installations just to maintain a 1:1 commit:distribution ratio -_-. I tried something like this once, and it was masochism. )

  2. Ease of Roll-Out/Distribution

    Ideally, you want Some Way to minimise the amount of work one needs to do on any given target to make sure the installed modules are the very same ones that were on the platform it was developed on. Having to do the above dicking around on various distributions with their rubbishy package management crap is a real nightmare, especially if you don't have the luxury of knowing in advance what the target machine will be running. Sure, you try to know, but sometimes requirements change, and sometimes you don't get much choice about the machine you're working with, so it's great to have it completely not matter where you're taking it.

    If you can assume it's going to have a working, reasonably recent version of Perl, and that it's not a completely different platform to the original ( ie: transitioning from Linux to Win32 ( or worse, Win64 ) is a nightmare; it would be nice to be unilaterally transformable, but that's too much "dream" at the moment ), then you can dump your code tree on it and have it more-or-less JustWork without having to waste more time working out how to get the bastard up and running.

    For me, this means I'd want a way to have a mostly-perl-version agnostic local::lib-ish installation, which essentially requires

    1. Checkout
    2. Some way to rebuild .XS stuff for $arch_target without needing to reinstall everything from scratch
    3. Optionally run t/* tests for everything that's installed
    4. Run/Serve up the code

  3. Somehow avoid the need to build a second instance of Perl on the target machine

    Having to do this is both very annoying and very time consuming. Having a system, a methodology, that avoids this need and Just Works for everyone who uses it would be great.
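The collision-testing step described under "Low Pollution" can be roughed out in a few lines. This is only a sketch of the idea, not how any particular distro's package manager does it: the name find_collisions and the two directory arguments ( a staged "fake-install" tree and the real target tree ) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Find ();
use File::Spec;

# Return the relative path of every file in $staging that already
# exists under $target. An empty list means nothing would be overwritten.
sub find_collisions {
    my ( $staging, $target ) = @_;
    my @collisions;
    File::Find::find(
        {
            no_chdir => 1,
            wanted   => sub {
                # Directories are allowed to overlap, only files collide.
                return if -d $File::Find::name;
                my $rel  = File::Spec->abs2rel( $File::Find::name, $staging );
                my $dest = File::Spec->catfile( $target, $rel );
                # -l also catches dangling symlinks already sitting in $target.
                push @collisions, $rel if -e $dest or -l $dest;
            },
        },
        $staging
    );
    @collisions = sort @collisions;
    return @collisions;
}

# Hypothetical usage: perl collide.pl /path/to/staging /path/to/target
if ( @ARGV == 2 ) {
    my @hits = find_collisions(@ARGV);
    print "COLLISION: $_\n" for @hits;
    exit( @hits ? 1 : 0 );
}
```

The point of doing this as a separate pass is that the install step itself never touches the real tree. Something else, with the needed permissions, only merges the staged files once the list comes back empty.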

Kicking around the idea

/
 build/
      tars/
         Source tar.gz's 
      tmp/
         "Scratch" directory where things are configured/built/fake-installed
      installed-t/
        dist-name-version/
          Some attempt at extracting t/ from each dist
 cpan/
      main/
        primary @INC Path
      profile_a/
        supplementary @INC for experiments
 project/
      project_code*

Those are some theoretical layout ideas, some borrowed from how CPAN currently works.
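For anyone wanting to poke at the layout, here is a throwaway sketch that just creates the empty skeleton. The directory names are the ones from the tree above; the function name create_layout and the idea of passing the root as an argument are my own assumptions, nothing more.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Path qw( make_path );
use File::Spec;

# Directories taken from the proposed layout, relative to the project root.
my @LAYOUT = qw(
    build/tars
    build/tmp
    build/installed-t
    cpan/main
    cpan/profile_a
    project
);

# Create every directory of the layout under $root and return the
# full paths. make_path croaks if a directory cannot be created.
sub create_layout {
    my ($root) = @_;
    my @dirs = map { File::Spec->catdir( $root, split m{/}, $_ ) } @LAYOUT;
    make_path(@dirs);
    return @dirs;
}

# Hypothetical usage: perl mklayout.pl /path/to/new/project
if (@ARGV) {
    print "$_\n" for create_layout( $ARGV[0] );
}
```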

To facilitate this layout, however, some theoretical tools are needed:

  1. Firstly, some way to create an @INC path that includes only the modules shipped with Perl itself, if that. This would be like local::lib, except we explicitly do not want modules that are provided by the system to be visible. This is to ensure that when new modules are added to the project's dependencies, they have to be installed in the project's custom @INC path in order to work, to avoid the issue of moving to a different machine later on, and then and only then discovering you need them.
    If there is no practical way to modify @INC that satisfies these criteria, then a combination of Module::CoreList and require hijacking would be needed to prohibit loading non-core modules from the system.
  2. Secondly, some way to "bootstrap" an environment for anything that might be using the project, be it hacking up $ENV vars like local::lib does, or something that loads itself via perl -M to mess with stuff before the rest of the code runs.
  3. A variation on the above to be able to run a cpan client without vision of "system" Perl libraries, in order to install things as if they were nowhere on the system already.
  4. Optionally, some tool that hooks into the cpan client to extract information to facilitate rebuilding XS files and running tests at a later install
  5. Some method to bundle an entire project tree for network-redistribution ( Git is the most logical option to me, but Rsync or tar.gz + scp would suffice here too )
  6. A recipient tool on the receiving end that can re-inflate the code directory back in place ( git checkout for example )
  7. An ability to, like on the design machine, "bootstrap" into the controlled environment scenario.
  8. Optional/Nice to have: Automated XS Rebuild for all applicable items if needed
  9. Optional/Nice to have: Automated re-test of everything installed, preferably without having to re-unpack, re-configure, re-build and re-install every single package. ( The idea is to have the system be able to make itself useful in the shortest possible time, without having to connect to the internet to download more data at any stage. )
  10. Run the "bootstrapped" services.
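As a very rough sketch of the first tool on that list: Perl's Config module records where the core library directories live ( archlibexp and privlibexp ), so one way to get a "core only" @INC is to keep just those and prepend the project's own directory. The function name restricted_inc and the cpan/main path are assumptions for illustration. This also assumes the distro hasn't dropped its own packaged modules into the core directories, which I can't promise for every distro.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Config;

# Build an include path made of the given project directories followed
# by the directories Perl itself was installed with. The site and
# vendor directories ( where system-installed CPAN modules usually
# live ) are deliberately left out.
sub restricted_inc {
    my (@project_dirs) = @_;
    my @core = grep { defined($_) && length($_) }
        @Config{qw( archlibexp privlibexp )};
    my %seen;
    return grep { !$seen{$_}++ } @project_dirs, @core;
}

# Show what the restricted path would look like for a hypothetical
# cpan/main directory relative to the current directory.
print "$_\n" for restricted_inc('cpan/main');
```

To actually take effect, the assignment would need to happen before anything else is loaded, for example BEGIN { @INC = restricted_inc('cpan/main') } at the very top of the entry script, or in a small module pulled in with perl -M. Anything loaded before that point stays loaded.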

This is about as far as I've gotten in fleshing out my desirables, let alone building a solution that works. I am sort-of hoping there is something simple and straightforward that already exists that I can just go use, and then recommend to everyone else I see because it's just so damn awesome. But as I stated half-an-hour of reading ago, I don't have a good idea how to look :/

In the famous words of one too many lazy coders: "Plz Halps"

In case something in the above has made you want to mock me, please remember, I already said I feel like a retard.