2010-11-13

Searching / Design spec for the Ultimate 'require' tool.

Perl's de facto require mechanism hides a confusing amount of complexity, and that complexity is often overlooked in the edge cases. It looks straightforward:

require Class::Name;

And you think you're done, right?

Not so.

Most of the problems come from one of 2 avenues.

  1. Things that happen when the module specified cannot, for whatever reason, be sourced
  2. Things that happen when you want to require a module by string name

Point 2 is probably the most commonly covered one, and it seems to be the primary objective of practically every require module I can find on CPAN.

However, many of the existing modules, in attempting to solve the string-name issue, make the handling of 'this module cannot be sourced' WORSE!

Module Sourcing Headaches

The mysterious Perl 5.8 double-require hell

The following code, in my testing, works without issue:

     eval "require Foo;1"; 
     require Foo;

Now, if Foo happens to be broken and cannot be sourced, then on Perl < 5.10 nothing will happen in the above code! Scary, but true. It's even scarier if those 2 lines of code are worlds apart.

A quirk of how Perl 5.8 functions ( which is now solved in 5.10 ) is that once a module is require'd, as long as that file existed on disk, $INC{ } will be updated to map the module name to the found file name. This doesn't seem too bad, until you see how it behaves when require is called again for the same module somewhere else. Take a look at this sample code from perlfunc:

sub require {
    my ( $filename ) = @_;
    if ( exists $INC{$filename} ){
       return 1 if $INC{$filename};
       die "Compilation failed in require";
    }
    ...
}

Now on 5.10 this is fine, because $INC{$filename} is set to undef if an error was encountered. But on everything prior to 5.10, the value of $INC{$filename} is in every way identical to the value it would have if the module had loaded successfully. And as you do not want to require the module again once it has loaded, require falsely thinks "Oh, that's already loaded" and doesn't tell anyone anywhere that there is a problem.
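If you want to see what your own perl does, here is a minimal sketch. It assumes a throwaway module called BrokenDemo ( a name I made up for this demo ) written into a temp directory, and it only reports what it finds rather than asserting anything:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a deliberately broken module into a throwaway directory.
# "BrokenDemo" is a made-up name used only for this demonstration.
my $dir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$dir/BrokenDemo.pm" or die "Cannot write demo module: $!";
print {$fh} qq{package BrokenDemo;\ndie "broken on purpose\\n";\n1;\n};
close $fh or die "Cannot close demo module: $!";
unshift @INC, $dir;

# First require: the failure is swallowed by the string eval.
eval "require BrokenDemo; 1";

# What did the failed attempt leave behind in %INC?
my $entry_exists  = exists $INC{'BrokenDemo.pm'}  ? 1 : 0;
my $entry_defined = defined $INC{'BrokenDemo.pm'} ? 1 : 0;

# Second require, somewhere "worlds apart": does it complain again?
my $second_ok = eval { require BrokenDemo; 1 } ? 1 : 0;

print "exists: $entry_exists, defined: $entry_defined, ",
    "second require ok: $second_ok\n";
```

Going by the perlfunc logic above, on 5.10 and later I would expect the entry to exist but be undefined, and the second require to die again. On 5.8 the entry would be defined and the second require would quietly "succeed".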

If that's too much reading for you, here's the executive summary of the problem: You need everyone, everywhere, who either directly or indirectly calls require inside an eval, to make sure any compilation/parsing errors from require are handled immediately. Failing this, everything else that requires that same broken file will treat the file as successfully loaded, will not error, and you'll just get some confusing problem where the module's contents will not be anywhere you can see them.

From a debugging perspective, this behaviour frankly scares me, and I'm very glad it's fixed in 5.10, and glad I can use 5.10. But for you poor suckers stuck working with 5.8, or trying to make 5.8 backwards compatible modules, this problem will crop up eventually, if not for you, then for somebody who uses your modules.

Awful exceptions are awful

The following code looks fine at first glance, but there are many things wrong with it:

     if( eval "require Foo; 1" ){ 
         # behaviour to perform if there is a Foo
     } else {
         # behaviour to perform if there is no Foo
     }

A nice and elegant way of saying "Try to use this module, and if it's not there, resort to some default behaviour".

But what about the magical middle condition, where it's there, but it's broken? In this code, it will silently fall back to the default behaviour, and nothing anywhere will tell you that Foo is broken, and you'll spend several hours with a dumb look on your face while you prod completely unrelated code.

What we really need is a way to disambiguate between "it's not there" and "it's broken", because ideally, if it's there, and broken, we want a small nuclear explosion.

On Perl 5.10 and higher, this isn't so hard: we can just prod $INC{} to see what happened.

Test                      Value           Implication
exists $INC{'Foo.pm'}     a false value   The module couldn't be found on disk, or nobody required it yet
exists $INC{'Foo.pm'}     a true value    The module exists on disk, and somebody has required it
defined $INC{'Foo.pm'}    a false value   The module exists, somebody required it, but it failed ( >= 5.10 only )
defined $INC{'Foo.pm'}    a true value    The module loaded successfully ( >= 5.10 ), or absolutely nothing useful ( < 5.10 )

So that approach is not exactly very nice, or very portable.
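For what it's worth, the prodding can be wrapped up in a few lines. This is only a sketch: inc_state is a name I made up, and the 'loaded' answer is only trustworthy on 5.10 and later, for the reasons already covered.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a file name ( e.g. 'Foo.pm' ) by prodding %INC.
# On Perl < 5.10 the 'loaded' answer tells you nothing useful.
sub inc_state {
    my ($file) = @_;
    return 'not found, or never required' unless exists $INC{$file};
    return 'required, but failed'         unless defined $INC{$file};
    return 'loaded';
}

print "strict.pm: ", inc_state('strict.pm'), "\n";
print "No/Such/Thing.pm: ", inc_state('No/Such/Thing.pm'), "\n";
```

Note that it takes the file-name form ( 'Foo/Bar.pm' ), not the package name, because that is what %INC is keyed on.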

The next option you have, if you're fortunate enough to actually get require to die for you when it should, is regexing the exception it throws. But that is just horrible. Regexing messages from die is stupid: it's limited, and prone to breaking. Proper object exceptions are our salvation. What we really need for this situation is different exceptions that indicate the type of problem encountered, so we're not left guessing with kludgy code.
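To make that wish concrete, here is a rough sketch of the kind of thing I mean. Everything in it is hypothetical: the My::X::Require classes and require_or_throw are names invented for this example, it only looks at plain directories in @INC ( no @INC hooks ), and it trusts %INC, so it is only sane on 5.10 and later.

```perl
use strict;
use warnings;

# Hypothetical exception classes, invented for this sketch.
{
    package My::X::Require;
    sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub throw   { my $class = shift; die $class->new(@_) }
    sub module  { return $_[0]{module} }
    sub message { return $_[0]{message} }
}
{ package My::X::Require::NotFound; our @ISA = ('My::X::Require'); }
{ package My::X::Require::Broken;   our @ISA = ('My::X::Require'); }

# Hypothetical loader: throws a different class for "missing" and "broken".
sub require_or_throw {
    my ($module) = @_;
    ( my $file = $module ) =~ s{::}{/}g;
    $file .= '.pm';

    # Already loaded successfully ( a true %INC value is only reliable on 5.10+ ).
    return 1 if $INC{$file};

    # Is there a candidate file at all? Only plain directory entries are checked.
    my @found = grep { -f "$_/$file" } grep { !ref } @INC;
    My::X::Require::NotFound->throw(
        module  => $module,
        message => "$file not found in \@INC",
    ) unless @found;

    # The file is there, so any failure now means it is broken.
    return 1 if eval { require $file; 1 };
    My::X::Require::Broken->throw( module => $module, message => "$@" );
}

# Usage: fall back only when the module is missing, explode when it is broken.
my $module = 'Some::Optional::Plugin';    # made-up module name
if ( !eval { require_or_throw($module); 1 } ) {
    my $err = $@;
    die $err unless ref $err && $err->isa('My::X::Require::NotFound');
    print "no $module here, using the default behaviour\n";
}
```

The caller gets to ask "what kind of failure was this?" with isa(), and no regex ever touches the error message.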

Stringy require headaches

This is the lesser evil, but not without its perils.

At some stage, if you write anything moderately interesting, you'll find the need to programmatically divine the name of a module to require. This is where require tends to bite you in the ass.

sub load_plugin { 
    my $plugin = shift;
    my $fullname = 'MyPackage::' . $plugin;
    require $fullname;
}

This is simply prohibited by the Perl Gods of Yore. You have to find some other way, and there are many modules targeted at this. There are some simple approaches, but some of them are also rather dangerous.
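You can watch it fail with a module that is certainly installed. This is just a small sketch using File::Spec ( a core module ) as the victim:

```perl
use strict;
use warnings;

my $name = 'File::Spec';    # a core module, so it is certainly installed

# A string argument is treated as a file name, so perl goes looking for a
# file literally called "File::Spec" in @INC, and fails.
my $as_string    = eval { require $name; 1 } ? 'loaded' : 'failed';
my $string_error = $@;

# The bareword form maps :: to / and appends .pm before searching.
my $as_bareword = eval { require File::Spec; 1 } ? 'loaded' : 'failed';

print "string form: $as_string\n";
print "bareword form: $as_bareword\n";
print "error was: $string_error";
```

The error from the string form should start with "Can't locate File::Spec in @INC", which is the giveaway that perl wanted a path, not a package name.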

Bad Approach

Here is something you should really avoid if you're expecting the code to be used anywhere that security matters. DO NOT DO THIS:

sub load_plugin { 
    my $plugin = shift;
    my $fullname = 'MyPackage::' . $plugin;
    eval "require $fullname;1" or die $@;
}

Firstly, you just pretty much wrote a wide open security hole. Somebody just needs to call:

   load_plugin( 'Bobby; unlink "/etc/some/important/document";' ); 

and the show is pretty much over. That's not necessarily so tragic if it's your own code, and you're the only person who ever invokes it, but if it's public facing ( and especially if the code is published somewhere ), then avoid that style like cancer, because in my opinion, it's not "if" it's exploitable, but "when" it's exploitable. Taint mode may help you a little bit, but don't bet on it.
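If it isn't obvious why, build the string without eval'ing it and look at what perl would have been asked to run. A harmless sketch, with a print standing in for the unlink:

```perl
use strict;
use warnings;

# Same string building as the bad load_plugin above, minus the eval.
my $plugin   = 'Bobby; print "injected\n";';
my $fullname = 'MyPackage::' . $plugin;
my $source   = "require $fullname;1";

# Show the code that eval would have been handed.
print "$source\n";
```

Everything after the first semicolon is the attacker's code, not a module name.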

Secondly, if you were foolish enough to have accidentally left out that 'or die $@' part, then you will have just created an invisible bug to be discovered later for everyone using Perl 5.8. Congratulations.

Less insane approach

The less insane approach is to emulate how perl maps package names to file names internally, and pass that value to require. ( Because when you pass something as a string to require, it's expecting a path of sorts, not a module name. )

sub load_plugin { 
    my $plugin = shift;
    my $fullname = 'MyPackage::' . $plugin;
    $fullname =~ s{::}{/}g;
    $fullname .= '.pm';
    require $fullname;
}

This is good, because there's no room for accidentally forgetting to call die $@, and the worst somebody can do is specify an arbitrary file on disk to read, which is what you were doing to begin with anyway. This is way way less dangerous than allowing execution of arbitrary code. Both these code samples are still plagued by the 5.8 double-require situation, if somebody manages to require() the broken code before you do and hide the error, but that's substantially less likely to happen.
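If even "an arbitrary file on disk" makes you nervous, you can refuse anything that isn't shaped like a package name before mapping it. A sketch only: module_file_for is a name I made up, and the pattern is deliberately stricter than what perl itself accepts.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only names shaped like Perl package names,
# so path tricks such as '../../etc/passwd' never reach require.
sub module_file_for {
    my ($module) = @_;
    die "Invalid module name\n"
        unless defined $module
        && $module =~ /\A[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*\z/;
    ( my $file = $module ) =~ s{::}{/}g;
    return "$file.pm";
}

sub load_plugin {
    my ($plugin) = @_;
    my $file = module_file_for( 'MyPackage::' . $plugin );
    require $file;    # still dies with perl's own message on failure
    return 1;
}

print module_file_for('MyPackage::Foo'), "\n";
```

This does nothing for the 5.8 double-require situation, it just narrows what a caller can make you load.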

Existing Modules, and what is wrong with them

I've seriously looked at many many modules on CPAN for this task. And sadly, none fit the bill perfectly.

UNIVERSAL::require

This seems to be the most popular one. But it only solves the stringy-require issue, and in reality, adds MORE potential for failure.
  • Victim to the double-load on 5.8 issue.

    This one line of code is enough to make it weak to the double-require issue.

    return eval { 1 } if $INC{$file};
    
    As discussed above, on 5.8, if the file has already been 'required' but failed, $INC{$file} will be set to the path to that file. As a result, UNIVERSAL::require will just respond with "Oh right".

  • No Exceptions

    This module doesn't help us at all with regard to exception objects. It relies entirely on Perl's native ( virtually non-existent ) exception system.

  • Actually exacerbates the 5.8 issue

    In my opinion, this module actually makes us take a step backwards in progressive coding. It replaces useful informative exception throwing with silence, and requires you to check a return value. The result is that whenever somebody does Foo->require() without checking the return value, the very next thing that tries to require Foo, and expects an exception when it's broken, will silently succeed, but there will be no "Foo".

  • 2005 called and wants its Perl style back

    Seriously, we've been trying to encourage people to use stuff like 'autodie' because checking the return value of every open, every close, and every print ( yes, print can fail! ) is tedium, and lazy people often forget to. Adding 'die "$@ $? $!" ' at the end of everything SUCKS, let alone throwing actual exceptions that explain /what/ the problem was.
    Try working out, via code, whether the reason open failed was that the file just wasn't there, or there was a permissions issue, or one of the other dozens of possible reasons, and you're stuck using regular expressions. Yuck.

  • Monkey patching

    A lot of people really dislike the monkey-patch style that bolts into UNIVERSAL. Magically turning up everywhere on every object is really nasty, and really magical, and far too much magic for something that could be achieved by using an exported method instead. Seriously, string_require("Foo::Bar") vs "Foo::Bar"->require(); the difference is not big enough to warrant the nastiness of the latter.

Module::Load

  • 5.8 Double-Load Weak

    Still relies entirely on require to die if it can't load something.

  • No Exceptions

    Relies on $@ being a useful enough value to the user.

  • Implicitly treats Exceptions like scalars

    Even if in some future fantasy land Perl's require started throwing useful Exceptions ( backtrace, attributes that explained the problem type, introspection, and so forth ), the code concatenates it into another scalar, so any exceptions that may exist will get squashed.

Module::Locate

  • Holy hell, what?

    The code is from 2005 and has 2005 written all over it. If it was less chaotic, I might be able to see how it works.

  • Doesn't invoke require

    It doesn't use require anywhere, so it doesn't even populate $@.

  • Recommended use is to pass discovered variables to require

    Doesn't sound like much of a win to me. Probably prone to the 5.8 issue.

  • Doesn't throw exception objects

    Seems in 2005, nobody had discovered exceptions yet really.

Module::Require

  • Module is not really designed for one-off module requires
  • Code is weak vs 5.8 issues.
  • Code is pretty high on the wtfometer
  • Code aggravates 5.8 issues with suppressed failures by ignoring $@ after failures
  • No exception objects

Mrequire

  • Mangles $@ with chomp
  • No exception objects
  • 5.8 double-require weak
  • Oh dear, please, not AUTOLOAD :(

File::Where

  • Mostly an over the top file finding library, doesn't handle any of the require stuff
  • The usual: no exceptions, 5.8 double-require weak

Module::Use

  • Not for requiring modules at all.

autorequire

  • Not really for this job, but...
  • Has a method for detecting package loading, however....
  • That method is subject to the 5.8 double-require weakness and its friends

Acme::RequireModule

  • pro: Despite being Acme::, it sucks less than everything else so far!
  • Still depends on native require for exceptions
  • XS
  • Still defers to the internal require() op, so probably still suffers the 5.8 problems.
  • Depends on >= 5.10 anyway

ClassLoader

  • Just as bad interface wise as UNIVERSAL::require
  • But worse, AUTOLOAD magics
  • Documented in German
  • eval "use $string", very bad
  • Substitutes Perl require-fails string-only exceptions with alternative, German, string-only exceptions. Joy.
  • Prone to 5.8 issues

Module::Runtime

  • Prone to 5.8 issues
  • Standard Perl native exceptions only

THE BEST SO FAR

Class::Load

  • pro: Actually appears to have work-arounds in place for heuristically solving the 5.8 problem!
  • pro: Tests for the above claimed fact!
  • pro: Tests pass!
  • Still no exception objects ( perl default exceptions )
  • pro: Reasonably sane API
  • pro: No need to check silly return values

tl;dr summary

Class::Load is awesome, you should use it everywhere you need require to actually work sanely with possibly-missing or possibly-broken classes (ie: everywhere that there is a user-part in a require ).
You can probably use it for more, but that might be overkill =).

The only way I can see something being better than it is if something decides to implement object exceptions with failure metadata in them, instead of needing to re-explore the failure manually.
