2010-06-25

Any good advice on staying focused on one scope in this massively metarecursive language?

The recursiveness of metaprogramming in Perl these days is astounding.

This is not necessarily a bad thing, but it has its drawbacks in various areas.

While I love authoring modules and contributing to various projects, I often find that doing so becomes a necessity just when I would rather be focusing on the thing I actually need.

An Example

Let me give you an example: a family member asked me to build a website for one of their businesses, and as a result I want to produce the best product I possibly can.

The first concern I encountered was shipping it. I need to be able to develop this website so that I can ship it somewhere ( target unknown ) and get a relatively quick, relatively hassle-free installation that Just Works. That way, if I have to hand the code over to somebody else, or ship it to a different server where I may have less control over the environment or the distribution it runs, it will still mostly just work.

This led me to my state-of-packaging post, where I wasted a good deal of time trying to work out the best way to bundle, package and otherwise get the software to just work.

This need partly emerged from wanting to use the latest and greatest tools, such as Plack, the latest releases of Moose, etc.

However, as I discovered in that article, the state of Linux distributions with regard to Perl at the larger scale largely sucks, and the "best" option pretty much always ends up being "use CPAN".

CPAN is great and all, don't get me wrong, but compared to existing Linux distribution package management, Perl dependency and file management leaves much to be desired. Sure, it's miles ahead of Ruby and Python ( not to mention evolutionary leaps ahead of PHP's, Java's and C/C++'s native package management ), but since when do we measure ourselves against the lesser tools?

So anyhow, after musing for several days on this dilemma, researching various options, talking to various people, blogging about it and not getting very far, I decide I'm just wasting my time again, and that I should just hack something up on my box and worry about this package management crap later.

Distraction 2.0

So, I decide to get it working on my machine first and worry about everywhere else later, you know, when it matters. This is of course a potentially dangerous decision from a reliability standpoint, because you may discover that whatever technique you chose on your system is completely non-viable on another.

On my machine, the first thing I do is go through my toolkit and update all the various packages I'll need using my distribution's package management tools. ( Surprisingly, in my experience this sucks less than it does on the other distributions I've tried. )

Then I discover a discrepancy between how another developer has mapped Perl dependencies to package manager dependencies and how I've been doing it, and I then have to work out whether it's merely an error or intentional. ( I won't bore you with the specifics here. ) As part of the diagnosis, while I'm waiting for a response on IRC from the developer who wrote that mapping, I of course write a Perl script to find where else this style of mapping is used, in an attempt to gauge how common it is.

This eventually spirals until I'm parsing individual build scripts with Perl and trying to extract balanced bracket sets, with context, from these files. ( Bad me, I should have just used Text::Balanced. )
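
For the record, here's roughly what leaning on Text::Balanced for that job could have looked like. This is a minimal sketch, not the script I actually wrote: the build-script fragment is made up for illustration, and it only looks at curly braces. In list context extract_bracketed returns the balanced text, the remainder and the prefix it skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

# Hypothetical build-script fragment with nested braces.
my $text = 'src_install() { if true; then { echo hi; }; fi } # trailing';

# '{}'     : only treat curly braces as brackets (parens are ignored).
# '[^{]*'  : prefix pattern, skip everything before the first '{'.
my ( $block, $rest, $skipped ) = extract_bracketed( $text, '{}', '[^{]*' );

print "skipped: $skipped\n";   # the text before the outermost '{'
print "block:   $block\n";     # the outermost balanced {...} group
print "rest:    $rest\n";      # whatever follows the closing '}'
```

Nested braces are handled for you, which is exactly the part I was hand-rolling.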

Fortunately, I eventually abandoned that script when I realised how much of the day I'd already wasted on this problem. Argh. Still no closer to even starting the actual code :|

Other times, during the update phase, I discover an incompatibility between a package and Perl, for whatever reason. A recent example is a bizarre failure in Eval::Context. It is proving hard to track down because, as far as I can make out, the failure occurs in Carp. The usual techniques, such as -MCarp::Always or -MDevel::SimpleTrace, don't help, because for some reason their presence triggers the wonderful Heisenbug scenario: the bug vanishes! ( Well, and a new one appears in its place. ) And to make matters worse ( much, much worse ), when I run the build and tests by hand instead of under the packager's sandboxed installation system, the bug also vanishes. Pesky indeed. ( In case you're asking, I haven't filed a bug for this yet; there's simply no point filing one until I can reliably recreate the scenario in a sterile way. And as a general rule I've found with Perl, most of the time, if I figure out what the problem is, I figure out a solution at the same time. )

Let's assume for a moment that I was able to work out what was going on after dicking around for a few hours. I'd probably have found a patch that worked too, possibly submitted a bug report and patch upstream, and applied the workaround to the Perl overlay, and then I'd be on my way to the next package.

Granted, at the moment the number of failing packages I'm encountering is much, much higher, as I'm helping test the Perl 5.12.1 release ahead of its integration into the main tree, and I'm voluntarily fixing these things because somebody has to test this stuff before it hits Luser land.

More Recursion

It's not the case yet with this project ( mostly because it's yet to have any code! ), but I often find myself swimming deeper and deeper into the metaprogrammy sea.

In the beginning, it was just writing modules that made my life easier.

Then comes the fun of distributing those modules to make other people's lives easier.

Then comes the desire to make distributing modules easier.

Then comes the awesome madness that is Dist::Zilla.

Then you're writing plugins for Dist::Zilla.

Then you're writing plugin bundles for the above.

Then you're working on Dist::Zilla itself ( patches! :D ).

Then you're contributing code to other people's Dist::Zilla plugins.

Then you're contributing code to fix various packages that other people's Dist::Zilla plugins use.

All this is great stuff, really, community++, but something in the back of my mind says "Hey, you're lost in the meta, you're so far removed from what you were actually trying to achieve you can no longer see the woods for the trees, in fact, you can't even see trees, all you're seeing is carbon atoms and you're trying to compute the spin on their electrons!"

My Problem Really

I think my real problem is that I don't see a viable way of staying strictly a "high-level abstraction" consumer, just using the abstractions that exist to achieve my goal. Instead, I'm always drilling down into the guts of things, patching their core, getting low-level in their implementation and forgetting my original goal for weeks.

The best I can come up with is "hey, perhaps you'll have to be a bit anti-contributive and, er, yuck, write code that is probably redundant with something somewhere and not really optimally reusable, because the long-term maintenance requirements of publicly shared code are a bit high"?

I think I just sicked up in my mouth at the idea of that :/

But I have to find some way to focus on the project level, food doesn't put itself on the table!

Some basic statistics on "Line Noise"

I was reading another blog post by somebody intending to analyse how much Perl code constitutes "line noise", but they didn't appear to have Actually Done It.

I took a naïve approach and didn't make any assumptions about what constitutes "line noise"; I just gathered basic statistics, for the sake of interest, on how often each character appears across every Perl file in my @INC.

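Partial dump (rarest characters omitted; columns are share of all characters, count, character code, character):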
  0.2 % :   511319 x char   64 : "\@"
  0.2 % :   564540 x char   55 : 7
  0.2 % :   593117 x char   79 : "O"
  0.2 % :   601710 x char   77 : "M"
  0.3 % :   675072 x char   92 : "\\"
  0.3 % :   684986 x char   68 : "D"
  0.3 % :   698665 x char   78 : "N"
  0.3 % :   709768 x char   76 : "L"
  0.3 % :   712074 x char   80 : "P"
  0.3 % :   763426 x char   56 : 8
  0.3 % :   784577 x char  107 : "k"
  0.3 % :   797560 x char   82 : "R"
  0.3 % :   833723 x char   54 : 6
  0.4 % :   912737 x char   52 : 4
  0.4 % :   920716 x char   93 : "]"
  0.4 % :   921001 x char   91 : "["
  0.4 % :   924075 x char   73 : "I"
  0.4 % :   947539 x char  118 : "v"
  0.4 % :   956653 x char   67 : "C"
  0.4 % :   996323 x char   65 : "A"
  0.4 % :  1000637 x char   83 : "S"
  0.5 % :  1125435 x char  119 : "w"
  0.5 % :  1151874 x char   46 : "."
  0.5 % :  1220735 x char   34 : "\""
  0.5 % :  1222341 x char    9 : "\t"
  0.5 % :  1222927 x char   51 : 3
  0.5 % :  1241600 x char   69 : "E"
  0.5 % :  1243448 x char   53 : 5
  0.5 % :  1332828 x char   84 : "T"
  0.6 % :  1443662 x char   57 : 9
  0.6 % :  1491434 x char  120 : "x"
  0.6 % :  1499376 x char  125 : "}"
  0.6 % :  1500792 x char  123 : "{"
  0.7 % :  1718028 x char  103 : "g"
  0.7 % :  1739054 x char   40 : "("
  0.7 % :  1739695 x char   41 : ")"
  0.7 % :  1792258 x char   59 : ";"
  0.7 % :  1825133 x char  121 : "y"
  0.8 % :  1837291 x char   98 : "b"
  0.8 % :  1842316 x char   35 : "#"
  0.8 % :  1960600 x char   50 : 2
  0.9 % :  2149806 x char   62 : ">"
  1.0 % :  2410416 x char   49 : 1
  1.1 % :  2594921 x char   61 : "="
  1.1 % :  2684166 x char   95 : "_"
  1.1 % :  2709633 x char  112 : "p"
  1.2 % :  2818643 x char   58 : ":"
  1.2 % :  2952175 x char  104 : "h"
  1.2 % :  2995621 x char   45 : "-"
  1.3 % :  3151943 x char  109 : "m"
  1.3 % :  3283418 x char   36 : "\$"
  1.3 % :  3291138 x char  102 : "f"
  1.4 % :  3339529 x char   39 : "'"
  1.4 % :  3355931 x char  117 : "u"
  1.5 % :  3638254 x char   99 : "c"
  1.6 % :  4016055 x char  100 : "d"
  1.9 % :  4598003 x char   44 : ","
  2.0 % :  4786703 x char  108 : "l"
  2.2 % :  5472272 x char   48 : 0
  2.6 % :  6279579 x char  110 : "n"
  2.6 % :  6306811 x char  111 : "o"
  2.7 % :  6625715 x char  105 : "i"
  2.8 % :  6872608 x char  114 : "r"
  3.0 % :  7315145 x char  115 : "s"
  3.1 % :  7522087 x char   97 : "a"
  3.6 % :  8711403 x char   10 : "\n"
  3.7 % :  8972142 x char  116 : "t"
  5.4 % : 13289205 x char  101 : "e"
 24.2 % : 59186425 x char   32 : " "

I find it quite intriguing that the various bracket pairs are unbalanced. Also, the significantly greater use of ">" than "<" indicates that people write more than they read. Edit: more likely it's all the "=>" and "->" operators.

What is extremely amusing is that if you read down the last few entries in this sort order, ignoring "r", "a", "t" and all whitespace, a word is formed. That word.... is "noise". Weird.
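
For the curious, the bracket imbalance is tiny in absolute terms. Perl code outside of literals has to have balanced brackets to compile at all, so the strays presumably come from strings, regular expressions, comments, POD and __END__/__DATA__ sections. A quick sketch computing the differences from the counts in the partial dump above (the imbalance helper is just for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bracket counts copied from the partial dump above.
my %count = (
    '(' => 1_739_054, ')' => 1_739_695,
    '[' =>   921_001, ']' =>   920_716,
    '{' => 1_500_792, '}' => 1_499_376,
);

# Positive: more openers than closers. Negative: more closers.
sub imbalance {
    my ( $open, $close ) = @_;
    return $count{$open} - $count{$close};
}

for my $pair ( [ '(', ')' ], [ '[', ']' ], [ '{', '}' ] ) {
    printf "%s%s : %+d\n", @$pair, imbalance(@$pair);
}
```

That works out to 641 surplus ")" characters, 285 surplus "[" and 1416 surplus "{".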

For a full dump of my diagnostics, see my GitHub gist.

The code I used to generate these stats is pretty straightforward. I'd be interested in seeing what sort of results other people get, and possibly in adapting the code to C and other non-Perl languages to work out how much "line noise" they contain.

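#!/usr/bin/perl
use strict;
use warnings;

use 5.12.1;
use File::Find::Rule            ();
use File::Find::Rule::Perl      ();
use Data::Dumper                qw( Dumper );

# @INC can contain code refs, missing directories and nested paths
# (e.g. an arch-specific dir inside its parent), so keep only real
# directories and de-duplicate the files found, or some get counted twice.
my %seen_dir;
my @dirs = grep { !ref($_) && -d $_ && !$seen_dir{$_}++ } @INC;
say STDERR $_ for @dirs;

my %seen_file;
my @pmfiles = grep { !$seen_file{$_}++ } File::Find::Rule->perl_file->in( @dirs );

my %stats;

for my $file ( @pmfiles ){
    say STDERR "scanning $file";
    open my $fh, '<:raw', $file or next;
    # Slurp the whole file and count it byte by byte; much faster than
    # calling read() once per character.
    my $content = do { local $/; <$fh> };
    next unless defined $content;
    $stats{$_}++ for split //, $content;
}

# Sort ascending by count, so the most common characters come last.
my @data = sort { $a->[0] <=> $b->[0] } map { [ $stats{$_} , $_ ] } keys %stats;

$Data::Dumper::Terse = 1;   # no "$VAR1 =" prefix
$Data::Dumper::Useqq = 1;   # show "\n", "\t" etc. as escapes

my $numchars = 0;
$numchars += $_ for values %stats;

for( @data ){
    # Dumper() output already ends in a newline, so the format has none.
    printf "%5.1f %% : %8d x char %4d : %s" ,
       ( $_->[0] / $numchars * 100 ) ,
       $_->[0] ,
       ord( $_->[1] ),
       Dumper( $_->[1] );
}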
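
Adapting the scanner to C, as suggested above, mostly means swapping File::Find::Rule::Perl's perl_file rule for a filename match. Here's a minimal sketch using only the core File::Find module, assuming .c and .h files are what you want to count and that you pass the directory to scan on the command line; the char_stats and report helpers are hypothetical names, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();
use Data::Dumper ();

# Count byte frequencies in every regular file under $root whose path
# matches $name_re. Files are read raw, so multi-byte UTF-8 characters
# are counted byte by byte.
sub char_stats {
    my ( $root, $name_re ) = @_;
    my %stats;
    File::Find::find(
        {
            no_chdir => 1,    # $File::Find::name holds the full path
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ $name_re;
                open my $fh, '<:raw', $path or return;
                my $content = do { local $/; <$fh> };
                return unless defined $content;
                for my $char ( split //, $content ) {
                    $stats{$char}++;
                }
            },
        },
        $root,
    );
    return \%stats;
}

# Print the same "percent : count x char code : char" report, rarest first.
sub report {
    my ($stats) = @_;
    my $total = 0;
    $total += $_ for values %$stats;
    return unless $total;
    local $Data::Dumper::Terse = 1;
    local $Data::Dumper::Useqq = 1;
    for my $char ( sort { $stats->{$a} <=> $stats->{$b} } keys %$stats ) {
        # Dumper() output already ends in a newline.
        printf "%5.1f %% : %8d x char %4d : %s",
            $stats->{$char} / $total * 100, $stats->{$char}, ord($char),
            Data::Dumper::Dumper($char);
    }
}

report( char_stats( $ARGV[0], qr/\.[ch]\z/ ) ) if @ARGV;
```

For example, point it at /usr/include; any other extension list is just a different regex.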