Showing posts with label module. Show all posts

2010-12-19

Introducing Data::Handle

Coming to a mirror near you, soon, is Data::Handle.

What does Data::Handle do?

Data::Handle solves 2 very simple problems that occur with the __DATA__ section and the associated *DATA Glob, and both of them are to do with "multiple modules trying to access the section".

1. Provide a reliable way to get a file-handle with the position at the start of the __DATA__ segment

  1. *DATA is really a pointer to the entire file, and not just the data segment
  2. The Perl interpreter sets the current position in the file to be after the __DATA__ line

The first time you read from *DATA this of course works fine. The issue is that reading moves the internal file cursor, so after one complete read of the section the cursor points to EOF. For a second block of code to re-read this data without communicating with the first block of code, it has to rewind the file cursor back to the start of the section prior to reading, and there is no natural way to know where that point is.

Other modules so far have remedied this by trying to rewind to the start of the file, and manually emulate various parts of the Perl Parser to re-find the start of the __DATA__ section before re-reading its contents.

This module however takes a different approach, and assumes that hopefully, the first person to read that file handle will know what they're doing, and use this module to do it. This module will then record the file offset the __DATA__ section began at, so from that point onwards, rewinding to the start is a trivial exercise.

And all this happens for you simply by doing:

my $handle = Data::Handle->new( __PACKAGE__ ); 

instead of doing

my $handle = do { no strict 'refs'; \*{ __PACKAGE__ . "::DATA"} };
( Note: Side perk, the new syntax is simpler, more straightforward, easier to remember, and no dicking around with strict! ;D )
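
As a rough illustration of the offset-recording idea ( not Data::Handle's actual internals ), the sketch below remembers where a handle's cursor was the first time it is seen, and rewinds to that point before every read. The names simulated_data_handle and rewound_lines are hypothetical, and an in-memory handle stands in for a real *DATA so the example can run on its own.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

# Start offsets, keyed by the address of the handle they belong to.
my %start_offset;

# Stand-in for *DATA: a handle onto a whole "file", with the cursor left
# just after the __DATA__ line, which is how the interpreter hands it over.
sub simulated_data_handle {
    my $source = "package Foo;\n1;\n__DATA__\nline one\nline two\n";
    open my $fh, '<', \$source or die "open failed: $!";
    while ( defined( my $line = <$fh> ) ) {
        last if $line eq "__DATA__\n";
    }
    return $fh;
}

# Record the cursor position on first sight, then always rewind to it
# before reading, so every caller gets the full section.
sub rewound_lines {
    my ($fh) = @_;
    my $key = refaddr($fh);
    $start_offset{$key} = tell($fh) unless exists $start_offset{$key};
    seek( $fh, $start_offset{$key}, 0 ) or die "seek failed: $!";
    my @lines = <$fh>;
    return @lines;
}
```

Calling rewound_lines( $fh ) repeatedly on the same handle returns the same two lines each time. That only holds because the first reader went through the helper, which is the same assumption the module makes.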

2. Provide a reliable way for 2 separate logical code units to access the same __DATA__ segment without interfering with each other

Because *DATA is a filehandle, and there is only one of them, seeking around in it can be problematic.

Especially if you have 2 code units that are trying to read it from different places. For a contrived example, prior to this module if you wanted to go back and re-read the start of the section, or skip forwards and read something later in the section, without forgetting where you are now, you'd need a contrived dance of seek/tell. Instead, now, you can just create another worker that will read that stuff for you, and the original handle will retain its position.

my $handle = Data::Handle->new( 'Foo' );
while( <$handle> ){ 

   if ( $_ =~ /something/ ){ 
       # get line 1. 
       my $slave = Data::Handle->new('Foo');
       my $firstline = <$slave>;
       do_stuff_with_first_line($firstline);
   }
   
   # continue as normal.
}

Internally, there is a lovely dance of seek() going on there, but from an interface perspective, you don't need to know it's seeking. All you need to know is "Get reference to DATA, get data from it".

Sure, you can probably argue you could do it easily with lots of seek() in a nice way, but that logic falls apart when you have code in 2 separate places reading the same *DATA.

It's much smarter to be defensive about it, and have some assurance that you can read a file descriptor in a safe way without something evil like this tampering with it.


my $handle = do { no strict 'refs'; \*{ __PACKAGE__ . "::DATA"} };
while(<$handle>){ 
   do_something_with_($_);
   evil_function();   
}

....
sub evil_function { 
  my $handle = do { no strict 'refs'; \*{ __PACKAGE__ . "::DATA"} };
  seek $handle, 0, 2; # seek to EOF ( whence 2 is SEEK_END ).
}

That is spooky action at a distance!

Data::Handle solves this by meticulously tracking position in each instance, and re-seeking the file handle to the place it was at the end of the last tracked read, so regardless of how much seeking around some other module did, as long as you got on the scene first, you should be unstoppable ;)
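
The per-instance tracking described above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the module's real code: the class name SharedCursor is hypothetical, and each object simply re-seeks the shared handle to its own remembered offset before every read.

```perl
use strict;
use warnings;

package SharedCursor;    # hypothetical name, not part of Data::Handle

# $fh is the shared handle, $start is the offset this reader begins at.
sub new {
    my ( $class, $fh, $start ) = @_;
    return bless { fh => $fh, pos => $start }, $class;
}

# Re-seek to our own position, read one line, remember where we ended up.
# Whatever anybody else did to the handle in between is simply overridden.
sub next_line {
    my ($self) = @_;
    seek( $self->{fh}, $self->{pos}, 0 ) or die "seek failed: $!";
    my $line = readline( $self->{fh} );
    $self->{pos} = tell( $self->{fh} );
    return $line;
}

package main;
```

Two SharedCursor objects over one handle each see the lines in order, even if something else seeks the underlying handle to EOF between their reads.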

2010-06-17

The Search for the Perfect Project Setup

I feel a bit like a retard today.

Perhaps, a spectacular one. I don't even know what to search for with regard to the problem described below, and I guess I don't have the best idea of what I want, so I'm blogging about it in the hope I can linearise my thought process a bit and work out what to do, and perhaps somebody can point me in the right direction.

NB. There's a fair bit of "TL;DR" content here, but it stands in case people try to suggest I use these solutions instead. It's primarily a demonstration of what I've tried, and the logic I used to reach my current conclusion, and thus, my actual request.

Firstly, My current situation

At the moment, I install all my modules, not via any of the CPAN clients, but through my distribution. This yields a much cleaner system, and dependency tracking is more reversible, which files were installed by which distribution is more reliable, and distribution collisions are explicitly barred.

This is moderately straightforward: in Gentoo, we have these ebuilds which automate most of the hard work, and the technical debt of building a CPAN module and installing it is pretty much 0. Each one is a single 30 line text file, most of which is boiler-plate ( and generated ), and it's essentially bash code, almost FreeBSD in nature.

I'm not a fan-boy for Gentoo for any of the traditional reasons people ascribe to it ( i.e. as funrolloops portrays ). I actually like how the package management works, I like having access to all the source, I like being able to break stuff and report reasonable bug reports to get actual bugs fixed, and I like being able to Just Fix It myself when I want to. I'm not going to go and rubbish anybody else for their distribution choices or why they choose them, just for me, Gentoo is the sweet spot in my taste system. ( I just expect people to return the favour and not treat me like the retard because I'm not using $THEIR_SYSTEM )

As a general rule, other distributions have given me various headaches for various reasons. I haven't tried Arch yet, so I can't write that off as unfit for my way of working, but from what I see it's mostly nice.

Perceived Obstacles: In walks Deb/Buntu

For various reasons, my way of working with Perl on Gentoo is not very friendly on some other distros. At present, I have a box running Ubuntu, which I initially set up to JustWork and be pretty simple for flatmates to use as an Internet terminal. It has since lost this role, and it's really too much effort for me to wipe it and install $OtherDistro from scratch on it. And fundamentally, needing to do that just to work in Perl on that distro in a satisfactory manner is either a failure in that distro ( Snarky comments about Ubuntu here ), a failure of Perl ( I hope not ), or a failure of myself ( Pretty likely ).

I've seen and tried using dh-make-perl and its behaviour is very unsatisfactory. Unfortunately, the most recent Perl I can get on Ubuntu is 5.10.1, and the most recent version of dh-make-perl I can get on Ubuntu is the geriatric 0.62, which is goodness knows how many versions behind Debian's equivalent.

dh-make-perl problems

  1. Non Recursive nature

    I can handle this, that's OK. I'm used to walking deps by myself on Gentoo where needed and satisfying them; it's not challenging. But that said, ebuilds are build scripts which are just text files, essentially generated from a naive template, and this is *really fast*. The dh-make-perl script by comparison takes as long to generate and build the .deb file as it would take me to generate and edit the text file myself by hand!

    Additionally, at present I only generate my files by hand by choice. I only do it by hand to guarantee quality in the generation, so that I can redistribute it.

    I could just use Vincent Pit(VPIT)++'s marvellous CPANPLUS::Dist::Gentoo which for the most part JustWorks™. It does all the cool recursive traversal, generation of ebuilds where needed, and its hands free, and fast.

    I attempted to use CPANPLUS::Dist::Deb, and that kinda just failed, which I'll go into later

  2. On half the things I've tried to build with it so far, it has failed

    Again, possibly I'm a retard, or possibly Ubuntu is failing again, but it keeps dying with weird problems trying to find dependencies, or computing dependencies, and sometimes even can't detect things that have been built earlier and installed. ( For the record, I've been banging my head against the wall trying to get Plack to build )

    Sure, due to the nature of Perl stuff it's a bit hellish to extract dependencies reliably in all cases, but even then, this is Plack man, it's pretty straightforward.

    Gentoo dependencies are reasonably simple to sort out when automation gets it wrong, the Debian format? I don't even know where to start.

    Granted I haven't spent much time reading the Debian Developer Guides to learn how to fix this sort of problem, and what sort of incantations to call to get something to build once I've manually fixed the problem, but it's really overkill to even need to do that. I didn't need to read anything to start hacking on ebuilds. It's all self-contained and it's bash, a language I already know, and extremely straightforward. Sure, I needed to learn a bit for supremely advanced edge cases, but I don't see demand for those on a regular basis.

I guess the obvious solution to the above would be learning more about Debian? But I've already exercised more than my share of WTF quota in this avenue.

CPANPLUS::Dist::Deb

Either this module sucks, or it's just terribly broken, or it's sucking due to Ubuntu-isms. My impression is it's starting to be a little under-maintained, but I'm not sure. The first time I tried to use it ( well, install it that is ), the majority of its tests just failed hard. So, I upgraded from Karmic to Lucid, and as a result, tests just hang instead for about 5 minutes, before running the tests again, and failing most of them. Brilliant.

make[1]: Entering directory `/home/anyone/pl/CPANPLUS-Dist-Deb-0.12'
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00_constants.t .. ok     
t/01_load.t ....... ok    
t/02_debs.t ....... 1/? # Taking care of Build / xs  # massive hang here.

And then the rest of the Massive Failure is too big to include even in this inordinately large blog. I eventually managed to get it to build and install, but I had to use --notest to get it to work.

Actually, I had to define DEB_BUILD_OPTIONS="nocheck", because for some lovely reason, --notest, despite being very helpful, is deprecated!

Then the real fun started

Using cpan2dist --format CPANPLUS::Dist::Deb Plack went off and decided to build packages with stupid names ( 'cpan-libplack-perl' anyone? ), which then fubared for some reason I still don't even want to understand. Hell, it makes Java back-traces look simple.

Conclusion: Perhaps relying on distro-packaged CPAN packages on most distros still sucks too hard

I've come to understand at long last why people JustUseCpan™ instead of relying on their distros. Just look at the massive hell-hole of problems I encountered on just one distribution of Linux. Woe be unto him who wants to develop a Perl Project and then ship it and hope it's easy to install using the tools provided by the recipient's distribution of choice. I've been lulled into a false sense of security by my lovely system which is so simple to use.

So, You're doing a Project and relying on CPAN.pm and friends

There's a variety of goals a person like myself wants to achieve with this scenario.

  1. Low Pollution

    Pooping over /usr and friends is unacceptable. Especially if it's not 100% guaranteed reversible. No 2 modules should be able to modify each other's files, either by intent or accident. In some distributions, this is guaranteed by building and installing into a clean directory-tree with a "sandbox" mechanism that prohibits writing outside the build environment, and then collision-testing all the files in the clean-install directory prior to unpacking them into the file-system, and then bailing if a collision occurs. I like to have this degree of certainty with modules, and in fact, all software, which is the primary reason I rely on my distro's package manager, because it can give me these guarantees.

    You should NOT need elevated permissions to ever perform configure/build/test or install. Final application to the file-system should be performed by an externality with the needed permissions, that has no way of being "scripted" during the install phase by the package that is being installed.

    If another mechanism can exist within a context ( think perhaps something like local::lib ) that gives me this same certainty without resorting to, say, putting the whole bastard in git and relying on the ability to revert commits, then ThatConcept++, I want it! ( It's not that I'm averse to gitifying an install tree, it's just that when you install lots of modules, you don't want to have to halt things between installations just to maintain a 1:1 commit:distribution ratio -_-. I tried something like this once, and it was masochism. )

  2. Ease of Roll-Out/Distribution

    Ideally, you want Some Way to minimise the amount of work one needs to do on any given target to make sure the installed modules are the very same ones that were on the platform it was developed in. Having to do the above dicking around on various distributions with their rubbishy package management crap is a real nightmare. Especially if you don't have the luxury of knowing in advance what the target machine will be running. Sure, you try to know, but sometimes requirements change, and sometimes you don't get much choice about the machine you're working with, so it's great to have it completely not matter where you're taking it.

    If you can assume it's going to have a working version of some recent version of Perl, and that it's not a completely different platform to the original ( ie: transitioning from Linux to Win32 ( or worse, Win64 ) is a nightmare, it would be nice to be unilaterally transformable, but that's too much "dream" at the moment ), then you can dump your code tree on it and have it more-or-less JustWork without having to waste more time working out how to get the bastard up and running.

    For me, this means I'd want a way to have a mostly-perl-version agnostic local::lib-ish installation, which essentially requires

    1. Checkout
    2. Some way to rebuild .XS stuff for $arch_target without needing to reinstall everything from scratch
    3. Optionally run t/* tests for everything that's installed
    4. Run/Serve up the code

  3. Somehow avoid the need to build a second instance of Perl on the target machine

    Having to do this is both very annoying, and very time consuming. Having a system, a methodology that avoids this need and Just Works for everyone who uses this methodology would be great

Kicking around the idea

/
 build/
      tars/
         Source tar.gz's 
      tmp/
         "Scratch" directory where things are configured/built/fake-installed
      installed-t/
        dist-name-version/
          Some attempt at extracting t/ from each dist
 cpan/
      main/
        primary @INC Path
      profile_a/
        supplementary @INC for experiments
 project/
      project_code*

There's some theoretical layout ideas. Some borrowed from how CPAN currently works.

To facilitate this layout however, some theoretical tools are needed

  1. Firstly, some way to create an @INC path that includes only the modules shipped with Perl itself, if that. This would be like local::lib, except we explicitly do not want modules that are provided by the system to be visible. This is to ensure that when new modules are added to the project's dependencies, they have to be installed in the project's custom inc path in order to work, to avoid the issue of going later on to a different machine, and then and only then discovering you need it.
    If there is no practical way to modify @INC that satisfies these criteria, then a combination of Module::CoreList and require hijacking would be needed to prohibit loading non-core modules from the system.
  2. Secondly, some way to "bootstrap" an environment for anything that might be using the project, be it hacking up $ENV vars like local::lib does, or something that loads itself via perl -M to mess with stuff before the rest of the code runs.
  3. A variation on the above to be able to run a cpan client without vision of "system" Perl libraries, in order to install things as if they were nowhere on the system already.
  4. Optionally, some tool that hooks into the cpan client to extract information to facilitate rebuilding XS files and running tests at a later install
  5. Some method to bundle an entire project tree for network-redistribution ( Git is the most logical option to me, but Rsync or tar.gz + scp would suffice here too )
  6. A recipient tool on the receiving end that can re-inflate the code directory back in place ( git checkout for example )
  7. An ability to, like on the design machine, "bootstrap" into the controlled environment scenario.
  8. Optional/Nice to have: Automated XS Rebuild for all applicable items if needed
  9. Optional/Nice to have: Automated re-test of everything installed, preferably without having to re-unpack, re-configure, re-build and re-install every single package. ( The idea is to have the system be able to make itself useful in the shortest possible time, without having to connect to the internet to download more data at any stage )
  10. Run the "bootstrapped" services.
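
Item 1 in the list above might be approximated with nothing more than Config, by rebuilding @INC from perl's own core library directories plus the project's private tree. This is only a sketch under that assumption: the helper name core_only_inc is made up for illustration, and some distributions install upgraded copies of core modules into vendor directories, which this would hide as well.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: return an @INC made of the project's own library
# directories followed by perl's core library directories, and nothing
# else ( no site or vendor directories ).
sub core_only_inc {
    my (@project_dirs) = @_;
    my %seen;
    return grep { defined($_) && length($_) && !$seen{$_}++ }
        ( @project_dirs, $Config{archlibexp}, $Config{privlibexp} );
}
```

A bootstrap script could then do something like BEGIN { @INC = core_only_inc('cpan/main') } before loading anything else, so that a dependency missing from cpan/main fails immediately on the development machine rather than later on the target.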

This is about as far as I've gotten in my fleshing out of my desirables, let alone building a solution that works. I am sort-of hoping there is something simple and straightforward that already exists and I can just go use and then recommend to everyone else I see because it's just so damn awesome. But as I stated half-an-hour of reading ago, I don't have a good idea how to look :/

In the famous words of one too many lazy coders: "Plz Halps"

In case something in the above has made you want to mock me, please remember, I already said I feel like a retard.

2009-12-03

Dist-Zilla-Plugin-Git-CommitBuild

I've contemplated this for a while. I might get a round tuit, and do this myself, so this blog entry is here to jog my brain, jot down ideas, possibly collect info.

If you look at the repositories behind any of my CPAN dists ( well, most of them ), you'll see I maintain both release and source branches for the entire history ( http://github.com/kentfredric/ELF-Extract-Sections/network ), and more recently, maintaining a sort of "pre-release/release" sub-system, where stuff I build just for testing/preview purposes may have a life on their own branch, sort of like release candidates. 

This is essentially to provide a branch that is containing a full copy of all the generated code, as posted on CPAN, as opposed to the source that it is generated from, for posterity reasons mostly, and so I can deprecate versions on CPAN for incompatibility reasons one day , and people won't be left in the lurch to get an identical copy of it somewhere, as it will always be in the git history, just grab the right tag and you're set.

They could always use the backpan, but that has 2 caveats in my experience.

  1. No diff mechanism. This feature is very important to people who do release maintenance for distributions, as it's the only good way to conclusively see what exactly changed between 2 consecutive versions, in order to update their internal dependency data that controls the shipping of the built copies.

    For this reason also, I loathe every time somebody deletes an older copy of their dist when it has been outdated for less than 3 months, because it can take that long to notice that the shipped copy is outdated and for somebody to request a version bump. Not being able to use CPAN's diff feature makes this task much more challenging. ( At least for me, for that is how I do my work-flow, and I kind-of help out lots with Gentoo's perl-experimental overlay ).

  2. Sometimes, versions live too short a time to be backed up on backpan. This is very problematic, for the above reason, and because you have no historical record of what happened outside the Changes and the original commit history. You could probably argue there's no reason to ever want these versions that never made it to backpan, and you'd probably be right.

So to remedy this problem, here is what I do.
  1. Commit, and tag the exact source tree that was used to generate the released code in the notation %v-source. This theoretically guarantees that anyone can check out that exact same release, run "dzil release", and produce more or less the exact same output, with the only difference possibly being the version numbers emitted if you're using an [AutoVersion] or [AutoVersion::Relative] plugin.


    Here is the code snippet I use to do this that uses Jerome Quelin's [Git] plugin suite.
    [Git::Check]
    filename = Changes

    [NextRelease]

    [Git::Tag]
    filename = Changes
    tag_format = %v-source

    [Git::Commit]

    This order is important. In the Build phase, [NextRelease] formats the Changes template into an exportable form, and puts the datestamp in it.

    In the pre-release phase, [Git::Check] makes sure there's nothing in the tree that isn't committed.

    [UploadToCPAN] uploads the dist to CPAN, and the post-release phase kicks in.

    [NextRelease]  then kicks in again, and reformats the Changes so it resembles the previously released Changes except with that {{$NEXT}} stuff in it ready for hacking on. 

    [Git::Tag]  tags the last commit ( that is, not the current tree with the modified Changes, that's not committed yet, but the commit that it was at still when we released ) with %v-source, and then [Git::Commit] commits the updated Changes as a new commit ( with a copy of the first segment of the Changes file as its commit message )

  2. Have a separate commit history just for releases to be copied into.
    git symbolic-ref HEAD refs/heads/releases
    The first commit of this is built, generally from the first release's files. At present, I do this first release as so:
    rsync -avp Some-Dist-Name-0.010101/ ./
    then weed out all the files I'm pretty sure weren't in the generated tree by hand. ( I had some code that did it all with rsync, and had an ignore list so that the --delete-after argument didn't accidentally erase all of .git, which would be very sad, but I accidentally deleted it :[  )

    This tree now represents an exact copy of the generated code, and it is committed as follows:
    git commit -m "Build of 0deadbeef0, version 0.010101 on cpan"
    or similar, to assure that every commit on the release branch, is a direct derivative of another commit on the source branch, and there's an intrinsic link between them.  ( I avoided having a direct link, because that gives cleaner histories ).


  3. That commit is tagged as the released version, ( ie: 0.010101 )

Now all this is wonderfully tedious. At present, the best I have is a script that makes the "commit and tag" phase on the release branch reasonably painless, but what I want to do is have a nice way to automate all of the above, every single bit of it, with a plugin.

Here is some proposed syntax.

[Git::CommitBuild / prerelease ]
branch = prereleases
autocreate = 1
phase = build

[Git::CommitBuild / release]
branch = releases
autocreate = 1
phase = after_release

Why this notation? Well, I guess it just seems the right amount of flexible to me.
The text after the / is totally optional, and it's just a way to let dzil differentiate between copies of the same plugin.

branch is to tell it what git branch to work with. I figured I could just use the name part after the /, but it seemed nasty to me ( spaces for instance ). At very best, it could default to that value if branch = is not specified.

autocreate = 1 would magick the branch out of thin air the first time you tried to commit to it and it wasn't there. This would be off by default, as it could be annoying to you if you'd already created another branch with a different name for that purpose, and typoed and it created another branch. This way it fails instead of annoying you.

phase = build is sadly the most scary bit I'm trying to eliminate the stink of. Essentially, I have one plugin that does only one thing, but there are 2 different times I may want to run it at. ( And there are possibly more places people might want to say "stop the build, store this somewhere, then continue" ). 

In the above scenario, 'prerelease' I envisage as only getting run when I call "dzil build" explicitly. NOT 'dzil test' and NOT 'dzil release', only 'dzil build'.

Also, ideally, the whole commit phase should be done, magically, entirely in memory, with some magical git magic, to eliminate the whole "write it out to the filesystem before creating the actual commit data" part of the equation, so that nowhere anywhere does there transpire something like
git checkout releases
git commit stuff
git checkout master
which causes anarchy in the event anything else happened to be using the file-system.
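
One way this might be approached, sketched here purely as an assumption about how such a plugin could work and not as an existing implementation, is to drive git's plumbing commands with a throwaway index file, so the checked-out branch and the real index are never touched. The helper names git_output and commit_build_to_branch are hypothetical, and it is not truly "in memory": the temporary index is still a file, just not one anybody else is using.

```perl
use strict;
use warnings;
use File::Spec;

# Run a git subcommand and return its trimmed stdout, dying on failure.
sub git_output {
    my (@cmd) = @_;
    open my $out, '-|', 'git', @cmd or die "cannot run git @cmd: $!";
    my $text = do { local $/; <$out> };
    close $out or die "git @cmd failed ( exit status $? )\n";
    $text = '' unless defined $text;
    chomp $text;
    return $text;
}

# Hypothetical helper: record the contents of $build_dir as a new commit on
# $branch, without checking anything out or touching the real index.
sub commit_build_to_branch {
    my ( $git_dir, $build_dir, $branch, $message ) = @_;

    local $ENV{GIT_DIR}       = File::Spec->rel2abs($git_dir);
    local $ENV{GIT_WORK_TREE} = File::Spec->rel2abs($build_dir);

    # A throwaway index ( this path must not exist beforehand ).
    local $ENV{GIT_INDEX_FILE} =
        File::Spec->rel2abs("$git_dir/index.commit-build.$$");

    # Stage the whole build directory into the temporary index and turn
    # that index into a tree object.
    git_output( 'add', '-A' );
    my $tree = git_output('write-tree');

    # The branch may not exist yet; if so the commit simply has no parent.
    my $parent = eval {
        git_output( 'rev-parse', '--verify', '--quiet', "refs/heads/$branch" );
    };
    my @parent_args = $parent ? ( '-p', $parent ) : ();

    # Create the commit object and move the branch ref to it.
    my $commit =
        git_output( 'commit-tree', $tree, @parent_args, '-m', $message );
    git_output( 'update-ref', "refs/heads/$branch", $commit );

    unlink $ENV{GIT_INDEX_FILE};
    return $commit;
}
```

Something like commit_build_to_branch( '.git', 'Some-Dist-Name-0.010101', 'releases', 'Build of 0deadbeef0, version 0.010101 on cpan' ) would then add a commit to the releases branch while master stays checked out. Creating the branch when it is missing corresponds to the autocreate behaviour proposed above; a stricter version would die instead when autocreate is off.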

Thoughts/Suggestions anyone?