by Steven J. Owens (unless otherwise attributed)
This is a tour of the things that an experienced programmer should know about when learning (or considering learning) perl. This is a combination of gems and flaws. The gems explain why you should want to learn perl - perl without these things is Just Another Programming Language, without much magic. At the same time, I want to warn the experienced programmer about the flaws, especially the ones that prey upon an experienced programmer's expectations (for example, function prototypes).
Note: On a casual reading, it might look like this article is negative about Perl. Don't be mistaken, I'm pretty fond of perl. Although there are probably more flaws, and more writing about the flaws, than gems and writing about the gems, that's because there are several small flaws and the gems are really bright and shiny and don't need as much text to explain them.
Note: Since writing this, I've actually been doing a fair bit of perl (go figure) and I'm finding that quite a lot has changed. There are some new features - many of them backported from the Gonna-Be-Here-Real-Soon-No-Shit-We-Mean-It-This-Time perl 6 project, others just packages and conventions strongly adopted by the perl community. Moose provides a drop-in overhaul of the perl OO stuff and looks really, really nifty. Some of the uglier perlisms have been stamped on by convention, etc.
There is, of course, a great deal of written material out there about perl, much of it excellent. In a sense, this is one of perl's gems. The perl community has some truly good writing.
I shouldn't even have to mention the fine books by Larry Wall, Randal Schwartz and Tom Christiansen. You can't turn over a rock in the perl world without finding a recommendation to read their works, and with very good reason.
There's also the perldoc utility, part of the standard perl install, which has tons of good information. Over the past decade the rest of the world has caught up a lot to the quality level of the Perl documentation, so you may underestimate how great Perl's documentation has always been. Particularly recommended reading here is "perldoc perlfaq" and "perldoc perltoot".
I'll try to get around to including a more specific recommended reading list. But for the purposes of this article, I really, really like some of Mark-Jason Dominus's tutorials, which highlight and explain stuff that's not covered well elsewhere. His other stuff (http://perl.plover.com/) is also fun and worth reading, but is less directed to the key issue of perl gems and flaws.
On skimming this article again, I notice that I mention Mark-Jason Dominus in several places below. Like I said above, this is only a small part of the great writing that the perl world has, I just found he addressed some key topics that I think an experienced but new-to-perl programmer should look out for.
But having said that, I recommend reading:
Coping with Scoping
http://perl.plover.com/FAQs/Namespaces.html
Understand References Today
http://perl.plover.com/FAQs/references.html
Other good ones:
Suffering from Buffering?
http://perl.plover.com/FAQs/Buffering.html
Seven Useful Uses of local
http://perl.plover.com/local.html
Seven Useful Uses of local isn't really that handy and useful, per se, but it does explore a well-known, but (in my experience) little-understood aspect of Perl's scoping. Not something you have to read, at least at first, but probably something you'll get some good out of reading, sooner or later.
Just the FAQs: Precedence Problems
http://perl.plover.com/FAQs/Precedence.html
I don't find the Precedence FAQ as interesting/useful, because I just always use parentheses to make sure my intended precedence is explicit (see my comments about implicit vs. explicit behaviors). What the fuck is the big deal here? But apparently some people do get bitten by precedence, so for them, this is a good article to read.
Mark-Jason Dominus posting on Perl list context and scalar context
http://perl.plover.com/context.html
This isn't actually one of Mark-Jason's FAQs, but it should be, because it thoroughly explains one of the twistier things that newcomers to Perl will probably struggle with, how the difference between list and scalar context can alter the behavior of a perl expression.
Also good, for waxing eloquent on some of the flaws I point out below:
Sins of Perl Revisited by Mark-Jason Dominus
http://www.perl.com/pub/a/1999/11/sins.html
Addendum:
Stas Bekman from the mod_perl world wrote a nice little trio of articles, "The Perl You Need To Know". A lot of the time perl scripts are very straightforward and directed -- either CGI scripts or a command line utility that is run, does a single job and then exits -- rather than a full-blown, complex, persistent application. A lot of the documentation out there reflects that, glossing over certain details. When you start getting into mod_perl, you can't take them for granted anymore. But even if you're not doing a mod_perl app anytime soon, if you're a real programmer you'll probably appreciate having this stuff laid out for you explicitly:
The Perl You Need To Know, Part 1
http://www.perl.com/pub/a/2002/04/23/mod_perl.html
The Perl You Need To Know, Part 2
http://www.perl.com/pub/a/2002/05/07/mod_perl.html
The Perl You Need To Know, Part 3
http://www.perl.com/pub/a/2002/05/14/mod_perl.html
Addendum II: More recently, Somni on #perl wrote this:
http://shoebox.net/articles/perl-warts.html
Okay, moving on to the gems and flaws:
Perl has lots and lots of implicit behaviors. These behaviors are handy when you're writing a one-off, but frankly I find them annoying in general, and really annoying when I'm trying to decipher somebody else's code. Let me say it as simply and baldly as I can:
I know I'm a bit more hard line about this than most programmers. For example, in many/most languages, flow control structures (if/then, etc) make the block delimiters (typically {curly-braces}) optional if the block is only one line long. I never leave the block delimiters off. If you ask me why not, I'll answer why the hell should I?
Oh my god, my poor fingers, having to type two more keys. Oh my god, my poor eyes, having to read two more characters. There's no practical reason to leave them off. On the other hand, leaving them on will prevent an accidental newline from ever breaking your code. Even the best programmers sooner or later run into this.
But if you really need this general principle justified, then you're an idiot (and we're off and running! :-).
Perl doesn't really do OO, it just provides some handy shortcuts to make it feasible to fake it. This is annoying and will waste time and confuse you if you take at face value the claim that perl does OO.
I'll try to get back here and summarize how OO is shoehorned into perl.
(And if you want to argue what "really do OO" means, I'll clarify: OO is not as deeply integrated into perl as many of the other features (lists, hashes, etc.) that make perl so powerful. It's so not-as-deeply-integrated that it's almost more trouble than it's worth.)
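For readers who want to see the shoehorning, here is a minimal sketch of classic (pre-Moose) perl OO. The class name Counter, its methods and its "start" argument are made up for illustration; the only real machinery is package, bless and the -> method call.

```perl
use strict;
use warnings;

# A "class" is just a package; an "object" is just a reference
# that has been bless()ed into that package.
package Counter;

sub new {
    my ($class, %args) = @_;                     # the class name arrives as the first argument
    my $self = { count => $args{start} || 0 };   # a plain hash holds the state
    return bless $self, $class;                  # tag the hash ref with the package name
}

sub increment {
    my ($self) = @_;                             # the object arrives as the first argument
    $self->{count}++;
    return $self->{count};
}

package main;

my $counter = Counter->new(start => 5);
$counter->increment;
my $value = $counter->increment;

print "count is $value\n";     # count is 7
print ref($counter), "\n";     # Counter
```

Nothing stops other code from reaching straight into $counter->{count}, which is a fair illustration of how thin the OO layer is.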
Addendum: Recently Moose has been taking the perl world by storm. Moose is a perl package that adds seriously revamped OO support to perl. If half of what I'm hearing holds true, Moose will make me a lot happier about doing complex projects in Perl.
Globals are, of course, generally a bad thing, because they create the ability for code to arbitrarily modify data in other parts of the program, which almost inevitably leads to confusion. In general, if you can possibly do something without using a global to do it, then that's the right way to do it.
I hate the fact that everything is global by default in perl.
Apparently this is a bit of a holy war in the perl community, or touches on one, because I once got some perl notables pulling passive-aggressive autistic "What do you mean? That makes no sense!" crap when I asked if there was a switch or setting to reverse this behavior - make all variables locally scoped by default.
(Oh, and by the way, that was only possible after I'd managed to explain that I meant "locally scoped" in the generic programming sense, not in the perl local sense - here's a clue guys, stop acting like Microsoft programmers and insisting that everybody abandon common usage of terms in favor of Perl's special little world).
There are two keywords for tighter-than-global scoping in perl, "local" and "my". Use "my". Don't use "local".
"local" is the vermiform appendix of perl. It came before "my" and is generally superseded by "my". Local works kind of oddly, and it's not for any intentional reason, but just because local was shoehorned into perl and that was the best they could do at the time.
It is generally recommended not to use "local" at all.
There are, of course, some nifty special cases where local can be handy, see Mark-Jason Dominus' article, Seven Useful Uses of local, referenced above.
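A small sketch of the difference, with made-up names ($level, show_level, with_my, with_local): "my" creates a new variable that only the enclosing block can see, while "local" temporarily changes the value of an existing global for everything called from that block.

```perl
use strict;
use warnings;

our $level = 'global';             # a package (global) variable

sub show_level { return $level }   # reads whatever the global currently holds

sub with_my {
    my $level = 'lexical';         # brand-new variable, visible only in this block
    return show_level();           # show_level() still sees the global
}

sub with_local {
    local $level = 'temporary';    # saves the global's value, restores it on exit
    return show_level();           # show_level() sees the temporary value
}

my $from_my    = with_my();
my $from_local = with_local();
my $afterwards = show_level();

print "my:    $from_my\n";         # my:    global
print "local: $from_local\n";      # local: temporary
print "after: $afterwards\n";      # after: global
```

The "local" result is the one that tends to surprise people: a function far down the call chain sees a value it never asked for.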
Perl doesn't have ANSI-C-style function prototypes. Perl has something called prototypes, that are somewhat related to function signatures, but they're not ANSI C prototypes.
http://www.perl.com/pub/a/1999/11/sins.html#5
See my note about common usage of terms in the global section. Particularly ironic in this case since (I vaguely recall - really need to track down a reference for this before publishing (oops, too late!)) perl's prototypes were shoehorned into perl in response to ANSI prototypes.
It was the best they could do at the time, and I'm certainly not criticizing the folks who did it (I'll save that for until after I've created my own wildly successful and popular programming language), but it's annoying to deal with people who have no clue about the history and insist that "Perl does too have prototypes!"
"Perl's prototypes are not the prototypes in other languages. Other languages use prototypes to define, name and type the arguments to a subroutine. Perl prototypes are about altering the context in which the subroutine's arguments are evaluated, instead of the normal list context. They are fraught with peril, full of traps and should not be used without considerable thought." -- http://www.perlfoundation.org/perl5/index.cgi?prototype
Addendum: As I said in the OO section, recently Moose has been taking the perl world by storm. Besides adding seriously revamped OO support to perl, the Moose ecosystem (through extensions such as MooseX::Method::Signatures) also adds real method signatures - maybe even better than ANSI prototypes.
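To make the "altering the context" point concrete, here is a minimal sketch with two made-up subs that have identical bodies. The only difference is the ($) prototype, which makes perl evaluate the argument in scalar context before the sub ever sees it.

```perl
use strict;
use warnings;

# No prototype: arguments are evaluated in list context and flattened into @_.
sub first_plain { return $_[0] }

# ($) prototype: the single argument is evaluated in SCALAR context.
# It does not name or type-check parameters the way a C prototype would.
sub first_proto($) { return $_[0] }

my @items = (10, 20, 30);

my $plain = first_plain(@items);   # @items flattens, so $_[0] is 10
my $proto = first_proto(@items);   # @items in scalar context is its length, 3

print "without prototype: $plain\n";   # without prototype: 10
print "with prototype:    $proto\n";   # with prototype:    3
```

A programmer expecting C-style prototypes would reasonably assume both calls return 10.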
Besides lack of ANSI prototypes, the most annoying thing about Perl subroutines is that all of your parameters get flattened into a single list. That is, if you pass a scalar and two lists into a function, the first several lines of the function code have to pick apart one big list. The only sane way to do this is to, instead, pass in the scalar and two references to lists. See the next section.
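A short sketch of the flattening problem and the reference workaround. The sub names (flattened, by_reference) and the sample data are invented for the example.

```perl
use strict;
use warnings;

# Passing arrays directly: everything lands in one flat @_.
sub flattened {
    my ($label, @rest) = @_;      # @rest swallows BOTH lists; the boundary is gone
    return scalar @rest;
}

# Passing references: each list arrives as a single scalar.
sub by_reference {
    my ($label, $first_ref, $second_ref) = @_;
    return (scalar @{$first_ref}, scalar @{$second_ref});
}

my @odds  = (1, 3, 5);
my @evens = (2, 4);

my $flat_count = flattened('numbers', @odds, @evens);
my ($odd_count, $even_count) = by_reference('numbers', \@odds, \@evens);

print "flattened saw $flat_count values\n";              # flattened saw 5 values
print "by_reference saw $odd_count and $even_count\n";   # by_reference saw 3 and 2
```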
Reference syntax is ugly and painful and finicky, and it isn't going to get better anytime soon. If it were better, using references would be a lot easier and other perl flaws would be a lot easier to live with (I'm talking about you, subroutines!).
Fortunately, Mark-Jason Dominus's excellent article, Understand References Today, makes it easier to figure them out. MJD also makes some acerbic comments in Seven Deadly Sins of Perl Revisited:
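For a taste of what that looks like, here is a small sketch of the common reference spellings. The variable names and values are made up; the three ways of reaching the first element are equivalent.

```perl
use strict;
use warnings;

my @langs   = ('perl', 'java', 'c');
my %version = (perl => 5, java => 8);   # illustrative values only

my $list_ref = \@langs;       # reference to an existing array
my $hash_ref = \%version;     # reference to an existing hash
my $anon_ref = [1, 2, 3];     # reference to a new anonymous array

# Three spellings of "first element of the array behind $list_ref"
my $first_a = $$list_ref[0];
my $first_b = ${$list_ref}[0];
my $first_c = $list_ref->[0];         # arrow form, usually the most readable

my @copy  = @{$list_ref};             # dereference the whole array
my $perl  = $hash_ref->{perl};        # hash element through a reference
my $count = scalar @{$anon_ref};      # number of elements

# Nested structures are references all the way down
my %report = (langs => $list_ref, versions => $hash_ref);
my $second = $report{langs}->[1];

print "$first_a $first_b $first_c\n";   # perl perl perl
print "$second\n";                      # java
```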
"Tom's complaint seems to be that reference syntax is too complicated. I don't think anyone can argue with that. Reference syntax is awful. It isn't going to get any better, either."
Perl has several extremely powerful native datatypes. There are the usual variable types that hold simple values like ints or strings (called scalar variables - in perl a scalar is an int or a string depending on what you try to do with it). In addition, perl also has lists (self-resizing arrays), and hashes (dictionaries). Also regexes (regular expressions).
I say "native datatypes" because, while other languages certainly have equivalents to lists and hashes, in perl they are worked deeply into the fabric of the language. They are not something you use and then return to regular programming, they are the stuff of programming, in perl. This makes working in perl incredibly handy, most of the time. Read The Perl Cookbook to truly understand why this is so.
This is equally true for regular expressions. See the section below.
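As a rough illustration of lists, hashes and regexes working together, here is a sketch that tallies words in a made-up sentence. Nothing in it needs a library; it is all core syntax.

```perl
use strict;
use warnings;

my $text = "the quick brown fox jumps over the lazy dog the end";

my %count;
$count{$_}++ for split ' ', $text;     # list from split, hash as the tally

# sort the keys by descending count, then alphabetically
my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

# a regex as a filter: keep only the three-letter words
my @short_words = grep { /^\w{3}$/ } @ranked;

print "$_: $count{$_}\n" for @ranked[0 .. 2];
print "three-letter words: @short_words\n";   # three-letter words: the dog end fox
```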
As a general rule of thumb, in Perl, Try the stupid way first.
Although there are perl fanatics out there who build complex and sophisticated applications in perl, most of the time I (and I suspect most people) use perl as a tool language, to solve specific, limited problems. Most of the time, those problems are of a scope and scale such that it really is pointless to get all tricksy, and you're better off just trying the brute-force, brain-dead stupid way first, and seeing if that's good enough.
As a simple example: many of my perl scripts seem to be about munging log files and coming up with a conclusion. Although log files can get quite large, most log files are plain text, and even a truly massive plain text log file is still, in this day and age, only a fraction of the memory on your average workstation. You can engage in mental gymnastics to minimize the amount of memory your script uses, or you can say the hell with it and just try to do what you need - including loading the entire dang file in as a single list of strings. Most of the time, you'll find that works just fine.
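The stupid way, sketched out. To keep the example runnable anywhere it reads from an in-memory string ($fake_log, with invented contents) instead of a real file; with a real log you would open its path in place of \$fake_log.

```perl
use strict;
use warnings;

# Stand-in for a real log file so the sketch runs anywhere.
my $fake_log = join '',
    "2009-01-01 INFO started\n",
    "2009-01-01 ERROR disk full\n",
    "2009-01-02 INFO restarted\n",
    "2009-01-02 ERROR disk full\n";

open my $fh, '<', \$fake_log or die "cannot open log: $!";
my @lines = <$fh>;            # list context: read every line at once
close $fh;

chomp @lines;
my @errors = grep { / ERROR / } @lines;

print scalar(@lines), " lines, ", scalar(@errors), " errors\n";   # 4 lines, 2 errors
```

If the file ever does get too big for this, switching to a line-at-a-time while loop is a small change, which is the point of trying this version first.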
In some senses, the native list/hash/regex support mentioned above makes perl prone to, uhm, how shall I say this... "meatball programming".
In the old tv series MASH, about a US field surgery camp in the Korean War, they introduced the character Charles Emerson Winchester The Third, a good surgeon but with no battlefield experience. The other doctors have to explain to him (paraphrasing), "this is meatball surgery, get the kid patched together and shipped back to base for real medical care while you get to the next dying soldier."
http://www.urbandictionary.com/define.php?term=meatball+surgery
Perl excels at meatball programming. Often there is a "right" way to do something, but in perl you can just fake it using a list or a hash. Getting into the habit of using these idioms is a big part of using any language effectively, but perl is especially idiom-prone. As I say in the "Gem: scalars, list, etc" section, this is a strength, but the awful (flaw-ful? ugh) side effect of this is that you have to decipher the idioms, and the idioms are not always used to simplify the code.
As with English, while there are often many ways to say it, many of those ways often suck.
Speaking of English, programmers are notorious for playing with language, and for incorporating jokes and puns into their technical work. The perl crowd is especially (in)famous for doing this.
This is either a gem or a flaw, depending on whether you have a sense of humor or not. And yes, that sentence is deliberately ambiguous. :-)
The Perl Cookbook is my favorite way to learn perl idioms. Effective Perl Programming was good, but I found the Perl Cookbook just insanely helpful, and far more effective at getting me to use and learn perl idioms.
Note: Today the next several sections will be somewhat easier to write than they would have been ten or fifteen years ago, because a lot of other languages have adopted some of the common perl syntax, like foreach. I'm not crediting Perl with inventing these idioms (I have no idea who thought of them first) but Perl definitely used them long before the mainstream languages that have adopted them relatively recently.
Scatter assignment is nifty and handy. Scatter assignment allows you to assign multiple return values to multiple variables, in one line:
my ($first, $second, $third) = (1, 2, 3);
Okay, so what? Here's where that actually comes in handy:
my ($first, $second, $third) = someFunction() ;
For a more specific example:
my $csvline = "ubuntu,linux,perl,5" ;
my ($distro, $os, $language, $version) = split(/,/, $csvline) ;
Perl scalar variables can hold strings or numbers and Perl will convert between them on the fly, depending on what the expression looks like. You can add a string to an int and Perl will add the int value of the string:
$birthyear = 1967 ;
$age = "42" ;
$currentyear = $birthyear + $age ;   # 2009
This can be quite handy for a quick script, but of course it can also be quite complicated and mysterious.
Also, while it's nifty, it also means we need some idiosyncratic syntax (say that five times fast) for explicitly doing concatenation and comparison. Hm, I'm actually not sure that's a flaw - I like explicit behaviors, after all. I guess it's more of a gotcha, a discontinuity that may trip you up, so I'm describing it here.
String concatenation is done with a "." instead of a "+". E.g.
my $string = "one" . " plus " . "one" ;
print $string ;   # prints "one plus one"
Perl also has some short-hand for appending; this ends up being surprisingly handy at times.
my $string = "one" . " plus " . "one" ;
$string .= " equals two" ;
print $string ;   # prints "one plus one equals two"
There's an equivalent for ints:
my $someint = 1;
$someint += 1 ;
String comparison is done with eq (lower case) instead of ==. Why the fuck they couldn't just use "EQUALS", or even "equals", I don't know.
Oddly enough, in perl, I so seldom do straight-out string comparison that I end up looking this up every time. I suspect that use of hashes and especially regexes obviates much of what we normally use string comparison for.
While we're at it, Perl has a number of handy little operators, like the numeric comparison operator <=>. Google on "perl operators" or do "perldoc perlop" to learn more.
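A quick sketch of why the string and numeric operators are kept separate. The sample values are made up; eq, ==, cmp and <=> are the real operators.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort { $a cmp $b } @numbers;   # string order
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric order

my $num_equal = ("1.0" == 1)   ? 'yes' : 'no';  # compared as numbers
my $str_equal = ("1.0" eq "1") ? 'yes' : 'no';  # compared as strings

print "cmp: @as_strings\n";    # cmp: 1 10 100 9
print "<=>: @as_numbers\n";    # <=>: 1 9 10 100
print "==:  $num_equal\n";     # ==:  yes
print "eq:  $str_equal\n";     # eq:  no
```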
One of the things that makes perl wonderful and terrible is that many things work differently depending on whether you're assigning to a list or a scalar. In other words, the following two examples might have very different behavior:
$somescalar = someFunction();
@somelist = someFunction();
In perl parlance this is called "list context" or "scalar context", and it's one of those implicit behaviors that can mess with your head if you're not ready for them. Fortunately, Mark-Jason Dominus explains the topic pretty clearly in the posting referenced above: http://perl.plover.com/context.html
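A minimal sketch of context in action. The names are invented; wantarray is the real built-in a sub can use to ask which context it was called in.

```perl
use strict;
use warnings;

my @items = ('a', 'b', 'c');

my @copy    = @items;     # list context: all three elements
my $count   = @items;     # scalar context: the number of elements
my ($first) = @items;     # parentheses on the left force list context

# A sub can check its calling context with wantarray()
sub context_report {
    return wantarray ? 'list' : 'scalar';
}

my @in_list   = context_report();
my $in_scalar = context_report();

print "count=$count first=$first\n";     # count=3 first=a
print "$in_list[0] / $in_scalar\n";      # list / scalar
```

The difference between "my $count" and "my ($first)" is two parentheses, which is exactly the kind of thing that bites newcomers.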
Perl geeks love to rhapsodize about closures, so I suppose I should include something about them here.
http://www.perl.com/pub/a/2002/05/29/closure.html
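For the impatient, a minimal closure sketch: an anonymous sub that keeps its own private copy of a "my" variable alive after the enclosing sub has returned. The name make_counter is made up for the example.

```perl
use strict;
use warnings;

sub make_counter {
    my ($start) = @_;
    my $n = $start;                 # private to this call of make_counter
    return sub { return $n++ };     # the anonymous sub captures $n
}

my $c1 = make_counter(1);
my $c2 = make_counter(100);         # gets its own, separate $n

my @seen = ($c1->(), $c1->(), $c2->(), $c1->());

print "@seen\n";   # 1 2 100 3
```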
One of the more tedious things about the Java programming language is that it won't let you get away with being sloppy. I agree that it's good to avoid being sloppy, but often this means that you can't compile your code until you've gotten a lot of make-work out of the way - for example, you can't sketch in a bunch of methods and leave out the body until later, you have to at least have a literal return statement (if the method has a return value).
This can be tedious, but on the other hand, it's comforting - you know that certain kinds of problems just can't bite you on the ass. I actually miss that comfort when I work in perl. Perl doesn't have quite the neurotic rigidity that Java has, but it does have some handy mechanisms that help a bit:
"use strict" will tell the perl interpreter to disallow a number of the shortcuts and half-assery that you can normally get away with. In essence, it tells perl "Don't give me too much rope", starting with requiring you to explicitly declare each variable using "my $varname;" (or "our $varname;" for a package global; "local $varname;" only gives an existing global a temporary value and doesn't count as a declaration, so use my instead). This is generally a really good idea in any serious programming project with perl:
use strict;
Also "use warnings ;" (or perl -w) will make the compiler print a variety of warnings for commonly known bad style, things that may bite you in the ass down the road.
#!/usr/bin/perl -w
Also, "perl -c scriptname.pl" will compile the script but not actually run it, giving you a quick and easy way to check that you haven't added any syntax errors but without risking the program actually doing anything, and possibly munging your data if the script isn't done yet.
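Here is a sketch of what "use strict" buys you. Normally the error below would stop the script at compile time; to keep this example runnable, the offending snippet is wrapped in a string eval so the failure can be caught and printed. The variable names are invented.

```perl
use strict;
use warnings;

# String eval compiles the snippet at run time, so the strict failure
# can be inspected instead of stopping this script from compiling.
my $ok = eval q{
    use strict;
    my $total = 10;
    $totl = $total + 1;    # typo: $totl was never declared
    1;
};
my $error = $@;

print "compiled ok\n" if $ok;
print "strict said: $error" unless $ok;
```

Without strict, the typo would silently create a new global and the bug would show up somewhere far away.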
Even if you don't actually use "use English" (capital E; module names are case-sensitive), it's a good idea to read up on it. The "use English" documentation is one of the few places that lists and summarizes a bunch of the built-in perl variables (the Perl Cookbook gives you lots of examples of using them).
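A small sketch of the long names next to the punctuation variables they alias. The -no_match_vars import is a real option of the English module that skips the regex-match aliases; the rest is illustrative.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # skip the regex-match aliases

my $pid_matches  = ($PROCESS_ID   == $$) ? 1 : 0;   # $PROCESS_ID is $$
my $name_matches = ($PROGRAM_NAME eq $0) ? 1 : 0;   # $PROGRAM_NAME is $0

eval { die "oops\n" };
my $eval_error = $EVAL_ERROR;      # same variable as $@

{
    local $OUTPUT_FIELD_SEPARATOR  = '-';    # same as $, ; goes between print's arguments
    local $OUTPUT_RECORD_SEPARATOR = "\n";   # same as $\ ; appended after each print
    print 'a', 'b', 'c';                     # a-b-c
}
```

This is also one of the places where "local" earns its keep: the separators go back to their old values when the block ends.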
One of the first things I was taught about perl (and which is actually not true, but a good thing to tell a beginning perl programmer, I suspect) was, "Perl is basically a programming language built around a regular expression engine" and "in perl, if you can do it with a regular expression, you probably should."
Like I said, not really true, but probably a good thing to say to a programmer just learning perl, because otherwise they're going to tend not to really get armpit deep in regexes, and never really get the habit of taking full advantage of them.
This is a really good little tutorial to get you a bit deeper into using regular expressions, and also to show you some of the optional-but-mandatory-for-your-sanity conveniences that perl has for working with regexes more easily:
http://www.cs.uvic.ca/~gtzan/seng265/Summer2006/slides/05_perl_regex.pdf
Particularly the /xms options (e.g. m/pattern/xms or s/before/after/xms).
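A minimal sketch of what the /xms options buy you, using a made-up log line. With /x the pattern can be spread over several lines with comments; /m and /s make ^, $ and . behave consistently on multi-line strings.

```perl
use strict;
use warnings;

my $line = '2009-03-14 ERROR disk full';

# /x lets the pattern contain whitespace and comments,
# /m makes ^ and $ match at line boundaries,
# /s lets . match newlines as well.
my ($year, $month, $day, $level, $message) = $line =~ m{
    ^ (\d{4}) - (\d{2}) - (\d{2})   # date: year, month, day
    \s+ ([A-Z]+)                    # log level in capitals
    \s+ (.*)                        # the rest is the message
    $
}xms;

print "$level on $year/$month/$day: $message\n";   # ERROR on 2009/03/14: disk full
```

Compare that with the same pattern crammed onto one line, and the "mandatory for your sanity" part should be clear.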
For a sysadmin-tool/shell-scripting/glue language, you'd think Perl would make it a bit easier and friendlier to fork processes and invoke other commands. I label this as "Gem or Flaw" because I suspect that it's probably worlds easier than most other languages, so I shouldn't judge it as an argument against using Perl. But it's not nearly as friendly as you've come to expect from Perl, particularly given that this is perl's raison d'être.
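For reference, here is a sketch of three common ways to run another process, assuming a Unix-like system. It uses $^X (the path of the running perl) as the command so the example does not depend on any particular program being installed; substitute whatever you actually want to run.

```perl
use strict;
use warnings;

# 1. system() in list form: runs the command with no shell involved.
system($^X, '-e', 'exit 3');
my $exit_code = $? >> 8;                 # the exit status lives in the high byte of $?

# 2. Piped open: read the child's output as if it were a file.
open(my $child, '-|', $^X, '-e', 'print "line $_\n" for 1 .. 3')
    or die "cannot start child: $!";
my @output = <$child>;
close $child;
chomp @output;

# 3. fork by hand: the child gets 0, the parent gets the child's pid.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    exit 7;                              # the child does its work and exits
}
waitpid($pid, 0);                        # the parent waits for the child
my $child_status = $? >> 8;

print "system: $exit_code\n";            # system: 3
print "pipe:   @output\n";               # pipe:   line 1 line 2 line 3
print "fork:   $child_status\n";         # fork:   7
```

The bit-shifting on $? and the hand-rolled fork/waitpid dance are the unfriendly parts.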
Perl's eval command allows you to evaluate perl code dynamically, which means you can generate code and execute it on the fly. Java is (finally) starting to get better at this sort of thing, but it's still a much more well-explored problem space in perl. It's a lot of rope, so you can easily hang yourself, but it's also very powerful.
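A short sketch of both flavors of eval: the block form, which traps run-time errors, and the string form, which compiles code built on the fly. The %op_for table and the generated sub are made up for the example.

```perl
use strict;
use warnings;

# Block eval: traps run-time errors instead of letting them kill the script.
my $divisor  = 0;
my $quotient = eval { 10 / $divisor };
my $caught   = $@;                       # "Illegal division by zero at ..."

# String eval: build code as text, then compile and run it.
my %op_for   = (add => '+', multiply => '*');
my $source   = 'sub { $_[0] ' . $op_for{multiply} . ' $_[1] }';
my $multiply = eval $source or die "generated code failed: $@";
my $product  = $multiply->(6, 7);

print "caught: $caught";
print "product: $product\n";             # product: 42
```

The rope in question: anything that ends up in $source gets executed, so never build it from untrusted input.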
CPAN is the Comprehensive Perl Archive Network, basically a huge database of perl libraries that are available for installation, as well as really handy tools that make the installation process easy.
There was a time when I would have said that CPAN was a massive advantage that the perl community had over Java. Since then, CPAN has gotten even better, but the open source Java world, as well as Linux package management technology, have done some serious catching up. Still, Perl has an incredibly strong community of bright, helpful, annoying people. And a lot of those bright people have put a lot of really good code into CPAN.