A Regular Expression (hereafter "regex") is a pattern of characters defined by a specific format or "sub-language". The regex sub-language is simultaneously extremely terse and extremely expressive.
Regular expressions aren't unique to Perl, by the way, but Perl supports a fairly broad extension of the standard regex syntax. In this lesson I'll start with the basics and move on to advanced topics in later lessons. I don't promise I'll stay away from Perl-specific stuff here.
To the uninitiated (and even the initiated!), regexes can be completely mind-boggling. I've got employees who have been proficient in Perl for years who still come to me cross-eyed over some regular expression or other. I believe it's the regex sub-language that gives Perl its reputation for being about as readable as modem line noise.
The point of a regular expression is to test whether the pattern described in the expression is found (or "matched") in a given string. I'll say that another way. You hand the regex engine a string and a pattern, and you ask, "Is this pattern found in that string?"
The basic structures of a regex are the "=~" and "m//" operators. The "=~" is the binding operator: it looks like an equals sign, and it tells the Perl engine to run the match on its right against the string on its left. The two slashes in "m//" are delimiters around the expression itself; in fact, when the delimiters are slashes, the "m" is optional in real live Perl scripts. Used as a test, a match returns true or false: either it matched the given string, or it didn't.
Here's an example of a regular expression in action:
Code:
my $matchtext = 'The rain in Spain falls mainly on the plain';
if ($matchtext =~ /ain/) {
    print "Found a match!\n";    # we expect to see this
}
A couple of things to notice here. First, notice the operator "=~", which cues Perl that we're talking about a regular expression. The mnemonic I use for it is "approximating". There's a "not found in" operator too, spelled "!~", which simply inverts the result of the test.
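Here's a small sketch of "=~" and "!~" side by side, reusing the sentence from the example above. The variable names $found and $not_found are just for illustration, and /snow/ is an arbitrary pattern chosen because it doesn't occur in the string.

```perl
use strict;
use warnings;

my $matchtext = 'The rain in Spain falls mainly on the plain';

# =~ asks "is /ain/ found in $matchtext?"; !~ asks the opposite question.
my $found     = ($matchtext =~ /ain/) ? 1 : 0;
my $not_found = ($matchtext !~ /ain/) ? 1 : 0;
print "=~ says $found, !~ says $not_found\n";    # prints "=~ says 1, !~ says 0"

# A pattern that does not occur: here it's !~ that succeeds.
if ($matchtext !~ /snow/) {
    print "No snow in this sentence.\n";
}
```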
Second, notice that I left the "m" off the m// operator. That's totally legal; in fact, I recommend it for readability, and I'll be doing it in all my examples.
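To show that leaving off the "m" changes nothing, this sketch runs the same test with and without it. It also shows the one case where the "m" is required: when you pick a delimiter other than slashes, such as braces, which is handy when the pattern itself contains slashes. The file path is a made-up example string.

```perl
use strict;
use warnings;

my $matchtext = 'The rain in Spain falls mainly on the plain';

# With slashes as delimiters, the "m" is optional: these two tests are identical.
my $short_form = ($matchtext =~ /ain/)  ? 1 : 0;
my $long_form  = ($matchtext =~ m/ain/) ? 1 : 0;
print "$short_form $long_form\n";    # prints "1 1"

# With any other delimiter the "m" is required. Braces avoid escaping slashes.
my $path    = '/usr/local/bin/perl';     # hypothetical example string
my $escaped = ($path =~ /\/local\//) ? 1 : 0;    # slashes escaped with backslashes
my $braced  = ($path =~ m{/local/})  ? 1 : 0;    # same pattern, no escaping needed
print "$escaped $braced\n";          # prints "1 1"
```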
Third, notice what's between the slashes: "ain". That means "match the LITERAL STRING 'ain'", which the regex engine did ONCE AND ONLY ONCE. It stopped at the first occurrence, the "ain" in "rain": "The rain in Spain falls mainly on the plain." Regular expressions match left-to-right, and a plain match like this one reports only the leftmost hit. If the expression had been /lain/, it would have matched only in the word "plain" at the end of the string.
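One way to see the left-to-right behavior is Perl's @- array: after a successful match, $-[0] holds the 0-based offset where the match started. The sketch below records that offset for /ain/ and /lain/. For contrast only, it also uses the /g modifier (not covered in this lesson) to count every occurrence, which a plain match never does.

```perl
use strict;
use warnings;

my $matchtext = 'The rain in Spain falls mainly on the plain';

# A plain match stops at the leftmost occurrence; $-[0] is where it started.
my ($ain_at, $lain_at);
if ($matchtext =~ /ain/)  { $ain_at  = $-[0]; }
if ($matchtext =~ /lain/) { $lain_at = $-[0]; }
print "/ain/ matched at offset $ain_at\n";      # 5, inside "rain"
print "/lain/ matched at offset $lain_at\n";    # 39, inside "plain"

# For contrast: /g in list context returns every match; assigning that
# list to () and then to a scalar yields the number of matches.
my $count = () = $matchtext =~ /ain/g;
print "/ain/ occurs $count times\n";            # 4: rain, Spain, mainly, plain
```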
Those are the basics of how to spell a regular expression. In the next lesson we'll get into wildcards, alternation, and backreferences.