2010-06-17

The Search for the Perfect Project Setup

I feel a bit like an idiot today.

Perhaps a spectacular one. I don't even know what to search for with regard to the problem below, and I guess I don't have the best idea of what I want, so I'm blogging about it in the hope that I can linearise my thought process a bit and work out what to do, and perhaps somebody can point me in the right direction.

NB. There's a fair bit of "TL;DR" content here, but it stands in case people try to suggest I use these solutions instead. It's primarily a demonstration of what I've tried, and the logic I used to reach my current conclusion, and thus my actual request.

Firstly, My current situation

At the moment, I install all my modules, not via any of the CPAN clients, but through my distribution. This yields a much cleaner system: installs are more reversible, tracking which files were installed by which distribution is more reliable, and distribution collisions are explicitly barred.

This is moderately straightforward. In Gentoo, we have these ebuilds which automate most of the hard work, and the technical debt of building a CPAN module and installing it is pretty much 0. An ebuild is a single 30-line text file, most of which is boilerplate ( and generated ), and it's essentially bash code, almost FreeBSD in nature.

I'm not a fan-boy for Gentoo for any of the traditional reasons people ascribe to it ( i.e. as funrolloops portrays ). I actually like how the package management works, I like having access to all the source, I like being able to break stuff and file reasonable bug reports to get actual bugs fixed, and I like being able to Just Fix It myself when I want to. I'm not going to go and rubbish anybody else for their distribution choices or why they make them; for me, Gentoo is just the sweet spot for my tastes. ( I just expect people to return the favour and not treat me like an idiot because I'm not using $THEIR_SYSTEM )

As a general rule, other distributions have given me various headaches for various reasons. I haven't tried Arch yet, so I can't write that off as unfit for my way of working, but from what I see it's mostly nice.

Perceived Obstacles: In walks Deb/Buntu

For various reasons, my way of working with Perl on Gentoo is not very friendly on some other distros. At present, I have a box running Ubuntu, which I initially set up to JustWork and be pretty simple for flatmates to use as an Internet terminal. It has since lost this role, and it's really too much effort for me to wipe it and install $OtherDistro from scratch on it. And fundamentally, needing to do that just to work in Perl on that distro in a satisfactory manner is either a failure in that distro ( Snarky comments about Ubuntu here ), a failure of Perl ( I hope not ), or a failure of myself ( Pretty likely ).

I've seen and tried using dh-make-perl and its behaviour is very unsatisfactory. Unfortunately, the most recent Perl I can get on Ubuntu is 5.10.1, and the most recent version of dh-make-perl I can get on Ubuntu is the geriatric 0.62, which is goodness knows how many versions behind Debian's equivalent.

dh-make-perl problems

  1. Non Recursive nature

    I can handle this, that's OK. I'm used to walking deps by myself on Gentoo where needed and satisfying them, it's not challenging. But that said, ebuilds are just text files, essentially generated from a naive template, and generating them is *really fast*. The dh-make-perl script by comparison takes as long to generate and build the .deb file as it would take me to generate and edit the text file by hand!

    Additionally, at present I only generate my files by hand by choice. I only do it by hand to guarantee quality in the generation, so that I can redistribute it.

    I could just use Vincent Pit(VPIT)++'s marvellous CPANPLUS::Dist::Gentoo which for the most part JustWorks™. It does all the cool recursive traversal, generation of ebuilds where needed, and it's hands-free, and fast.

    I attempted to use CPANPLUS::Dist::Deb, and that kinda just failed, which I'll go into later.

  2. On half the things I've tried to build with it so far, it's failed

    Again, possibly I'm an idiot, or possibly Ubuntu is failing again, but it keeps dying with weird problems trying to find dependencies, or computing dependencies, and sometimes it can't even detect things that have been built earlier and installed. ( For the record, I've been banging my head against the wall trying to get Plack to build )

    Sure, due to the nature of Perl stuff it's a bit hellish to extract dependencies reliably in all cases, but even then, this is Plack, man, it's pretty straightforward.

    Gentoo dependencies are reasonably simple to sort out when automation gets it wrong. The Debian format? I don't even know where to start.

    Granted, I haven't spent much time reading the Debian Developer Guides to learn how to fix this sort of problem, and what sort of incantations to call to get something to build once I've manually fixed the problem, but it's really overkill to even need to do that. I didn't need to read anything to start hacking on ebuilds. It's all self-contained and it's bash, a language I already know, and extremely straightforward. Sure, I needed to learn a bit for supremely advanced edge cases, but I don't see demand for those on a regular basis.

I guess the obvious solution to the above would be learning more about Debian? But I've already exercised more than my share of WTF quota in this avenue.

CPANPLUS::Dist::Deb

Either this module sucks, or it's just terribly broken, or it's sucking due to Ubuntu-isms. My impression is it's starting to be a little under-maintained, but I'm not sure. The first time I tried to use it ( well, install it that is ), the majority of its tests just failed hard. So, I upgraded from Karmic to Lucid, and as a result, the tests just hang instead for about 5 minutes, before running again and failing most of them. Brilliant.

make[1]: Entering directory `/home/anyone/pl/CPANPLUS-Dist-Deb-0.12'
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00_constants.t .. ok     
t/01_load.t ....... ok    
t/02_debs.t ....... 1/? # Taking care of Build / xs  # massive hang here.

And then the rest of the Massive Failure is too big to include even in this inordinately large blog. I eventually managed to get it to build and install, but I had to use --notest to get it to work.

Actually, I had to define DEB_BUILD_OPTIONS="nocheck" .. because for some lovely reason, --notest, despite being very helpful, is deprecated!

Then the real fun started

Using cpan2dist --format CPANPLUS::Dist::Deb Plack went off and decided to build packages with stupid names ( 'cpan-libplack-perl' anyone? ), which then fubared for some reason I still don't even want to understand. Hell, it makes Java back-traces look simple.

Conclusion: Perhaps relying on distro-packaged CPAN packages on most distros still sucks too hard

I've come to understand at long last why people JustUseCpan™ instead of relying on their distros. Just look at the massive hell-hole of problems I encountered on just one distribution of Linux. Woe be unto him who wants to develop a Perl project and then ship it and hope it's easy to install using the tools provided by the recipient's distribution of choice. I've been lulled into a false sense of security by my lovely system which is so simple to use.

So, You're doing a Project and relying on CPAN.pm and friends

There's a variety of goals a person like myself wants to achieve with this scenario.

  1. Low Pollution

    Pooping over /usr and friends is unacceptable, especially if it's not 100% guaranteed reversible. No 2 modules should be able to modify each other's files, either by intent or accident. In some distributions, this is guaranteed by building and installing into a clean directory-tree with a "sandbox" mechanism that prohibits writing outside the build environment, then collision-testing all the files in the clean-install directory prior to unpacking them into the file-system, and bailing if a collision occurs. I like to have this degree of certainty with modules, and in fact all software, which is the primary reason I rely on my distro's package manager: it can give me these guarantees.

    You should NOT need elevated permissions to ever perform configure/build/test or install. Final application to the file-system should be performed by an externality with the needed permissions, that has no way of being "scripted" during the install phase by the package that is being installed.

    If another mechanism can exist within a context ( think perhaps something like local::lib ) that gives me this same certainty without resorting to, say, putting the whole bastard in git and relying on the ability to revert commits ( it's not that I'm averse to gitifying an install tree, it's just that when you install lots of modules, you don't want to have to halt things between installations just to maintain a 1:1 commit:distribution ratio -_-. I tried something like this once, and it was masochism ), then ThatConcept++, I want it!

  2. Ease of Roll-Out/Distribution

    Ideally, you want Some Way to minimise the amount of work one needs to do on any given target to make sure the installed modules are the very same ones that were on the platform it was developed on. Having to do the above dicking around on various distributions with their rubbishy package management crap is a real nightmare, especially if you don't have the luxury of knowing in advance what the target machine will be running. Sure, you try to know, but sometimes requirements change, and sometimes you don't get much choice about the machine you're working with, so it's great to have it completely not matter where you're taking it.

    If you can assume it's going to have a working, reasonably recent version of Perl, and that it's not a completely different platform to the original ( i.e. transitioning from Linux to Win32 ( or worse, Win64 ) is a nightmare; it would be nice to be unilaterally transformable, but that's too much "dream" at the moment ), then you can dump your code tree on it and have it more-or-less JustWork without having to waste more time working out how to get the bastard up and running.

    For me, this means I'd want a way to have a mostly Perl-version-agnostic, local::lib-ish installation, which essentially requires:

    1. Checkout
    2. Some way to rebuild .XS stuff for $arch_target without needing to reinstall everything from scratch
    3. Optionally run t/* tests for everything that's installed
    4. Run/Serve up the code

  3. Somehow avoid the need to build a second instance of Perl on the target machine

    Having to do this is both very annoying and very time consuming. Having a system or a methodology that avoids this need and Just Works for everyone who uses it would be great.

Kicking around the idea

/
 build/
      tars/
         Source tar.gz's 
      tmp/
         "Scratch" directory where things are configured/built/fake-installed
      installed-t/
        dist-name-version/
          Some attempt at extracting t/ from each dist
 cpan/
      main/
        primary @INC Path
      profile_a/
        supplementary @INC for experiments
 project/
      project_code*

Those are some theoretical layout ideas, some borrowed from how CPAN currently works.

To facilitate this layout, however, some theoretical tools are needed:
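
As a rough sketch of the layout above, the skeleton could be created with a few lines of shell. The function name make_layout is made up for illustration and nothing here is an existing tool; the directory names are simply the ones from the tree.

```shell
# Hypothetical helper: create the empty project skeleton under ROOT.
make_layout() {
    root=${1:?usage: make_layout ROOT}
    # build/ holds source tarballs, a scratch area, and extracted t/ dirs;
    # cpan/ holds the primary and experimental @INC trees;
    # project/ holds the project's own code.
    mkdir -p \
        "$root/build/tars" \
        "$root/build/tmp" \
        "$root/build/installed-t" \
        "$root/cpan/main" \
        "$root/cpan/profile_a" \
        "$root/project"
}
```

Calling make_layout "$HOME/myproject" ( a placeholder path ) would give an empty tree to start installing into.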

  1. Firstly, some way to create an @INC path that includes only the modules shipped with Perl itself, if that. This would be like local::lib, except we explicitly do not want modules that are provided by the system to be visible. This is to ensure that when new modules are added to the project's dependencies, they have to be installed in the project's custom @INC path in order to work, to avoid the issue of going later on to a different machine, and then and only then discovering you need them.
    If there is no practical way to modify @INC that satisfies these criteria, then a combination of Module::CoreList and require hijacking would be needed to prohibit loading non-core modules from the system.
  2. Secondly, some way to "bootstrap" an environment for anything that might be using the project, be it hacking up $ENV vars like local::lib does, or something that loads itself via perl -M to mess with stuff before the rest of the code runs.
  3. A variation on the above to be able to run a cpan client without vision of "system" Perl libraries, in order to install things as if they were nowhere on the system already.
  4. Optionally, some tool that hooks into the cpan client to extract information to facilitate rebuilding XS files and running tests at a later install
  5. Some method to bundle an entire project tree for network-redistribution ( Git is the most logical option to me, but Rsync or tar.gz + scp would suffice here too )
  6. A recipient tool on the receiving end that can re-inflate the code directory back in place ( git checkout for example )
  7. An ability to, like on the design machine, "bootstrap" into the controlled environment scenario.
  8. Optional/Nice to have: Automated XS Rebuild for all applicable items if needed
  9. Optional/Nice to have: Automated re-test of everything installed ( preferably without having to re-unpack, re-configure, re-build and re-install every single package ). The idea is to have the system be able to make itself useful in the shortest possible time, without having to connect to the internet to download more data at any stage.
  10. Run the "bootstrapped" services.
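
The "bootstrap" step ( items 2 and 7 ) might look something like the following in shell. This is only a sketch under assumptions: project_env is a hypothetical name, cpan/main is the directory from the layout earlier, and the variables are the kind that current local::lib exports. Note that it does not satisfy item 1: PERL5LIB only adds directories to the front of @INC, so modules installed system-wide remain visible.

```shell
# Hypothetical helper: point Perl tooling at the project's own module tree.
project_env() {
    root=${1:?usage: project_env PROJECT_ROOT}
    base="$root/cpan/main"

    # Replace (rather than extend) PERL5LIB so a stray user setting is dropped.
    PERL5LIB="$base/lib/perl5"
    PERL_LOCAL_LIB_ROOT="$base"
    # Tell ExtUtils::MakeMaker and Module::Build to install under $base.
    PERL_MM_OPT="INSTALL_BASE=$base"
    PERL_MB_OPT="--install_base \"$base\""
    # Scripts installed by distributions end up in $base/bin.
    PATH="$base/bin:$PATH"

    export PERL5LIB PERL_LOCAL_LIB_ROOT PERL_MM_OPT PERL_MB_OPT PATH
}
```

After running project_env /path/to/checkout in a shell, anything started from that shell ( a cpan client, the project's own scripts ) installs into and loads from cpan/main first.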

This is about as far as I've gotten in fleshing out my desirables, let alone building a solution that works. I am sort-of hoping there is something simple and straightforward that already exists that I can just go use and then recommend to everyone else I see because it's just so damn awesome. But as I stated half-an-hour of reading ago, I don't have a good idea how to look :/

In the famous words of one too many lazy coders: "Plz Halps"

In case something in the above has made you want to mock me, please remember, I already said I feel like an idiot.

2 comments:

  1. Lately I've switched to using local::lib in most of my deployments.

    The few times I've used dh-make-perl it worked like a charm (no recursion, as well as dpkg-buildpackage for backports); you only need to have the right -dev (or whatever) packages installed when compiling XS is needed.

  2. I've been discussing / brainstorming another (similar) part of this problem with several people in my github project, http://github.com/robinsmidsrod/unnamed-perl-cms-project. Read through the thoughts in the deployment section and I think you'd find a lot of similarities in our wishes. If you'd like to talk, I'm idling in #perlcms on irc.perl.org in the hope that others would want to discuss some of the ideas I have expressed in the README (wrt. a perl cms or deployment issues).

    If you don't fancy going into that thing you might consider looking at the Nix (nixos.org) package manager, which is possible to install on any Linux version (other Unixes as well, I think), and is supposed to handle a lot of these dependency issues. I personally haven't had the time to try it yet, but I'd love some feedback on whether or not it is useful for perl/cpan.
