So our example from the last lesson wasn't all that interesting. How hard is it to find a static string in another static string? Booooring!
First, here's the code template I'll use to demonstrate regular expressions. It looks like this:
Code:
my @teststrings = ('string1', 'string2');
foreach my $string (@teststrings) {
    if ($string =~ /regex/) {
        print "$string matched!\n";
    } else { # optional else clause
        print "$string didn't match!\n";
    }
}
I'll embed comments in there saying which strings we'd expect to match, given the values of string1, string2, and regex.
Oooookay! So what if you have a list of names, and you want to find everyone in that list with the same first name as you? Just for fun, we'll say your name is Stu.
Your list looks like this:
Dave White
Adrian Wapcaplet
Stu Peterson
Sally Smith
Michael Jones
Stu Watkins
Micheline Mann
Stu Pott
Here's the code I'd write to pick out the "Stu"s:
Code:
my @names = ('Dave White','Adrian Wapcaplet','Stu Peterson',
             'Sally Smith','Michael Jones','Stu Watkins','Micheline Mann','Stu Pott');
foreach my $name (@names) {
    if ($name =~ /Stu .*/) {
        print "$name matched!\n";
    }
}
The output we'd expect from this is:<pre>Stu Peterson matched!
Stu Watkins matched!
Stu Pott matched!</pre>
So let's look closer at the regex I wrote there. What I said was "/Stu .*/". There are two special characters in here. The dot (spelled ".") means "any character". (Strictly speaking, it means "any character except a newline, unless the regex is modified with /s", but for now let's ignore that.)
So . means "any character". And * means "zero or more of the preceding thing". So together they're "zero or more of any character". So all strung together, this regular expression says, in the given string, match the literal characters S, t, u, and a space, followed by zero or more of any character. Hence the Stus show up as matched.
Now... What if I want to make sure I only get Stus with last names? ANYTHING that contains "Stu " (notice the space) will be matched by that regex, even if there's nothing following the space. We'd match zero characters after "Stu ", and the regex would be perfectly happy with that.
So fine, instead of "*" (zero or more), we want to use "+", meaning ONE or more. The string "Stu " is NOT matched by the regular expression /Stu .+/. That'll match Stu Meatt and Stu Pidd, but it won't match Stuart Little or plain old Stu.
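If you want to see the difference between "*" and "+" for yourself, here's a rough sketch that runs both regexes over the same strings. The test strings and the array names @star and @plus are my own picks for illustration, nothing special about them.

```perl
use strict;
use warnings;

# Hypothetical test strings, chosen to show where * and + disagree.
my @teststrings = ('Stu Pott', 'Stu ', 'Stu', 'Stuart Little');
my (@star, @plus);   # strings matched by each regex

foreach my $string (@teststrings) {
    # "Stu " followed by zero or more characters
    push @star, $string if $string =~ /Stu .*/;
    # "Stu " followed by at least one character
    push @plus, $string if $string =~ /Stu .+/;
}

print "* matched: ", join(', ', map { "'$_'" } @star), "\n";
print "+ matched: ", join(', ', map { "'$_'" } @plus), "\n";
```

Only 'Stu ' (with the trailing space) should land in one list and not the other. 'Stu' and 'Stuart Little' fail both regexes, because neither contains "Stu" followed by a space.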
Be sure you're clear on the above before reading further in this lesson.
Okay, next thing. Let's say we want to grab the last names off of there, so we've got them in a variable to use all by themselves. Maybe later we'll want to alphabetize the Stus by last name or something. Here's how we do that.
Surround any portion of a regular expression with (parentheses), and you'll capture its matched value in a magic variable called $1. The second set of parens will be stored in $2, the third in $3, etc.
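To see $1 and $2 working together, here's a small sketch with two sets of parens in one regex. It assumes each name is exactly "First Last" with a single space between them, and the @swapped array is just a name I made up to hold the results.

```perl
use strict;
use warnings;

my @names = ('Dave White', 'Stu Peterson', 'Sally Smith');
my @swapped;   # hypothetical array holding "Last, First" strings

foreach my $name (@names) {
    # First set of parens goes into $1, second set into $2.
    if ($name =~ /(.+) (.+)/) {
        push @swapped, "$2, $1";
    }
}

print "$_\n" foreach @swapped;
```

For these three names that should print "White, Dave", "Peterson, Stu", and "Smith, Sally", one per line. A name with more than one space in it would split differently, so treat this as a demo of the numbering rather than a real name parser.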
So this code:
Code:
my @names = ('Dave White','Adrian Wapcaplet','Stu Peterson',
             'Sally Smith','Michael Jones','Stu Watkins','Micheline Mann','Stu Pott');
foreach my $name (@names) {
    if ($name =~ /Stu (.+)/) {
        print "$name has the last name $1!\n";
    }
}
will print out:<pre>Stu Peterson has the last name Peterson!
Stu Watkins has the last name Watkins!
Stu Pott has the last name Pott!</pre>
Just be sure to assign $1 to another variable or push it onto an array or something--the next successful regex match will overwrite it.
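Here's one way that advice could look in practice, as a sketch: push each $1 onto an array as soon as the match succeeds, then sort the array once the loop is done. The names @last_names and @sorted are my own choices.

```perl
use strict;
use warnings;

my @names = ('Dave White','Adrian Wapcaplet','Stu Peterson',
             'Sally Smith','Michael Jones','Stu Watkins','Micheline Mann','Stu Pott');
my @last_names;   # hypothetical array for the captured last names

foreach my $name (@names) {
    if ($name =~ /Stu (.+)/) {
        # Copy $1 right away, before a later match replaces it.
        push @last_names, $1;
    }
}

# Perl's default sort compares strings, which is fine for plain names.
my @sorted = sort @last_names;
print join(', ', @sorted), "\n";
```

With the list above this should print "Peterson, Pott, Watkins".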
We've covered a lot of ground so far, so I'm going to leave some time for questions before my next tutorial. Fire away, people!